Every object is in place. We have a window, a DXGI factory, a GPU adapter, a D3D12 device, a command queue, a swap chain, render target views, allocators, and a command list. This chapter connects them into a running render loop.
By the end of this chapter you will have:
- a render loop that produces real frames,
- resource barriers that transition the back buffer between present and render target states,
- a ClearRenderTargetView call that fills the screen with a solid colour,
- a fence that safely synchronises the CPU with the GPU,
- and a window that turns solid blue.
Nothing is drawn. The purpose of this chapter is the loop itself — recording, submitting, presenting, and waiting. Everything that comes later (geometry, shaders, textures) plugs into this foundation.
Resource States and Barriers
Every GPU resource in DX12 exists in a state. The state tells the driver how the resource is currently being used and what memory layout it is in. Using a resource in a state it was not transitioned to is undefined behaviour — on some hardware it produces garbage pixels, on others a GPU crash.
The two states we care about today:
- D3D12_RESOURCE_STATE_PRESENT: the back buffer is ready to be displayed. DXGI owns it.
- D3D12_RESOURCE_STATE_RENDER_TARGET: the back buffer is ready to be written to by the output merger.
Before clearing the back buffer you must transition it from PRESENT to RENDER_TARGET. Before calling Present you must transition it back.
A barrier is recorded as a command in the command list:
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
barrier.Transition.pResource = backBuffer;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
commandList->ResourceBarrier(1, &barrier);
The reverse barrier uses the same structure with StateBefore and StateAfter swapped:
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PRESENT;
commandList->ResourceBarrier(1, &barrier);
Forgetting one of these two barriers is the most common beginner error. With the debug layer enabled, it is reported with a message about an invalid resource state.
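To make the state rule concrete, here is a minimal CPU-only sketch of the bookkeeping a validation layer might perform. It is not D3D12 code: `State`, `TrackedBuffer`, and `RunFrame` are hypothetical names invented for illustration, and the buffer is simply modelled as starting in the PRESENT state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for the two resource states used in this chapter (not the D3D12 enum).
enum class State { Present, RenderTarget };

// Hypothetical tracker: remembers the current state and collects violations.
struct TrackedBuffer {
    State current = State::Present;   // modelled as starting in PRESENT
    std::vector<std::string> errors;  // messages a validation layer might emit

    // Mirrors a transition barrier: StateBefore must match the tracked state.
    void Transition(State before, State after) {
        if (before != current) errors.push_back("StateBefore does not match current state");
        current = after;
    }

    // Clearing is only valid while the buffer is a render target.
    void Clear() {
        if (current != State::RenderTarget) errors.push_back("clear outside RENDER_TARGET");
    }

    // Presenting is only valid once the buffer is back in PRESENT.
    void Present() {
        if (current != State::Present) errors.push_back("present outside PRESENT");
    }
};

// Runs one frame and returns the number of violations recorded so far.
// forgetSecondBarrier simulates omitting the RENDER_TARGET -> PRESENT barrier.
inline std::size_t RunFrame(TrackedBuffer& buf, bool forgetSecondBarrier) {
    buf.Transition(State::Present, State::RenderTarget);
    buf.Clear();
    if (!forgetSecondBarrier) buf.Transition(State::RenderTarget, State::Present);
    buf.Present();
    return buf.errors.size();
}
```

Calling `RunFrame` with `forgetSecondBarrier` set to true records one violation at the present step, which is the kind of mistake the real debug layer reports.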
The Frame Sequence
One frame follows this sequence:
1. Ask the swap chain which back buffer to render to this frame.
2. Reset the allocator and command list for that frame index.
3. Barrier: PRESENT → RENDER_TARGET
4. Set the render target.
5. Clear the render target.
6. Barrier: RENDER_TARGET → PRESENT
7. Close the command list.
8. Submit: ExecuteCommandLists.
9. Present.
10. Signal the fence and wait for the GPU.
Steps 3–6 are where rendering will eventually go. Right now steps 4 and 5 are the entire visual output.
Fences
A fence is a synchronisation primitive. You ask the GPU to signal it at a particular value after it finishes a batch of work, and the CPU waits for that value before touching anything the GPU was using.
Create the fence before the loop:
Microsoft::WRL::ComPtr<ID3D12Fence> fence;
UINT64 fenceValue = 0;
HANDLE fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
d3d.Device()->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
After each submit and present, increment the fence value, signal the queue, and wait:
++fenceValue;
queue.Get()->Signal(fence.Get(), fenceValue);
if (fence->GetCompletedValue() < fenceValue)
{
    fence->SetEventOnCompletion(fenceValue, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}
Signal is a GPU-side command. The queue executes it after all previously submitted work finishes, then writes fenceValue into the fence. GetCompletedValue reads the fence from the CPU. If it is already at fenceValue or higher, the GPU is done and we skip the wait entirely.
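SetEventOnCompletion registers a Win32 event that is signalled when the fence reaches the target value. WaitForSingleObject blocks the CPU thread until that event is signalled. The event was created as auto-reset (the second argument to CreateEvent is FALSE), so it returns to the unsignalled state once the wait is satisfied and can be reused next frame.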
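The ordering guarantee can be sketched without a GPU. The following is a single-threaded simulation, not the D3D12 API: `MockFence`, `MockQueue`, and `WaitForFence` are hypothetical stand-ins, and the "GPU" only makes progress when `Step` is called. It shows the one property that matters here: a signal queued after a batch of work completes only after that work has run.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D12Fence: just a monotonically increasing value.
struct MockFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Hypothetical stand-in for a command queue: items run strictly in submission order.
struct MockQueue {
    std::deque<std::function<void()>> pending;

    void Execute(std::function<void()> work) { pending.push_back(std::move(work)); }

    // Like Signal: queued behind earlier work, so the fence value is written
    // only after everything submitted before it has run.
    void Signal(MockFence& fence, std::uint64_t value) {
        pending.push_back([&fence, value] { fence.completed = value; });
    }

    // Simulates the GPU finishing one queued item. Returns false when idle.
    bool Step() {
        if (pending.empty()) return false;
        auto item = std::move(pending.front());
        pending.pop_front();
        item();
        return true;
    }
};

// CPU-side wait: let the mock GPU run until the fence reaches `value`.
// Returns early if the queue empties first, which in real code would be a hang.
inline void WaitForFence(MockQueue& queue, const MockFence& fence, std::uint64_t value) {
    while (fence.GetCompletedValue() < value) {
        if (!queue.Step()) break;
    }
}
```

In the real loop the wait is an OS-level block rather than a polling loop, but the check against `GetCompletedValue` plays the same role.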
This approach waits for the GPU after every single frame. It is simple and provably safe — the GPU always finishes before the CPU starts the next frame. It is not efficient; it leaves both the GPU and CPU idle for a portion of every frame while the other side drains. That is a fine starting point. The standard improvement (tracking per-frame fence values and only waiting when you are about to overwrite an in-flight frame) is a natural next step once this is working.
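As a rough sketch of that improvement, the bookkeeping reduces to remembering which fence value was signalled for each frame slot and comparing it with the completed value before the slot is reused. `FrameFenceTracker` and `kFrameCount` are hypothetical names, and a count of two frame slots is an assumption; the real version would pass the returned value to Signal and the result of GetCompletedValue to `MustWait`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed number of frame slots (back buffers); adjust to match the swap chain.
constexpr std::size_t kFrameCount = 2;

// Hypothetical helper: tracks the fence value signalled for each frame slot.
struct FrameFenceTracker {
    std::array<std::uint64_t, kFrameCount> frameFenceValue{};  // 0 = never submitted
    std::uint64_t nextValue = 0;

    // Call after submitting work for slot `index`.
    // Returns the value that should be signalled on the queue.
    std::uint64_t OnSubmit(std::size_t index) {
        frameFenceValue[index] = ++nextValue;
        return nextValue;
    }

    // Call before resetting the allocator for slot `index`.
    // True means the GPU may still be using that slot, so the CPU must wait.
    bool MustWait(std::size_t index, std::uint64_t completedValue) const {
        return completedValue < frameFenceValue[index];
    }
};
```

With this scheme the CPU blocks only when it wraps around to a slot whose work has not finished, instead of after every frame.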
Putting the Loop Together
Here is the complete main.cpp with everything wired:
#include "Window.h"
#include "DXGIContext.h"
#include "D3DDevice.h"
#include "CommandQueue.h"
#include "SwapChain.h"
#include "RTVHeap.h"
#include "FrameResources.h"
#include <d3d12.h>
#include <wrl/client.h>
static void RenderFrame(
    const SwapChain& swapChain,
    const RTVHeap& rtvHeap,
    FrameResources& frames,
    ID3D12CommandQueue* queue,
    IDXGISwapChain3* dxgiSwapChain,
    ID3D12Fence* fence,
    HANDLE fenceEvent,
    UINT64& fenceValue)
{
    UINT frameIndex = swapChain.CurrentBackBufferIndex();
    frames.ResetFor(frameIndex);
    auto* cmd = frames.CommandList();
    // Transition back buffer: PRESENT → RENDER_TARGET
    ID3D12Resource* backBuffer = swapChain.BackBuffer(frameIndex);
    D3D12_RESOURCE_BARRIER toRT = {};
    toRT.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    toRT.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
    toRT.Transition.pResource = backBuffer;
    toRT.Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
    toRT.Transition.StateAfter = D3D12_RESOURCE_STATE_RENDER_TARGET;
    toRT.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    cmd->ResourceBarrier(1, &toRT);
    // Clear
    D3D12_CPU_DESCRIPTOR_HANDLE rtv = rtvHeap.HandleFor(frameIndex);
    const float clearColor[] = { 0.1f, 0.1f, 0.4f, 1.0f };
    cmd->OMSetRenderTargets(1, &rtv, FALSE, nullptr);
    cmd->ClearRenderTargetView(rtv, clearColor, 0, nullptr);
    // Transition back buffer: RENDER_TARGET → PRESENT
    D3D12_RESOURCE_BARRIER toPresent = toRT;
    toPresent.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    toPresent.Transition.StateAfter = D3D12_RESOURCE_STATE_PRESENT;
    cmd->ResourceBarrier(1, &toPresent);
    // Submit
    cmd->Close();
    ID3D12CommandList* lists[] = { cmd };
    queue->ExecuteCommandLists(1, lists);
    // Present
    UINT presentFlags = swapChain.TearingSupported() ? DXGI_PRESENT_ALLOW_TEARING : 0;
    UINT syncInterval = swapChain.TearingSupported() ? 0 : 1;
    dxgiSwapChain->Present(syncInterval, presentFlags);
    // Wait for GPU
    ++fenceValue;
    queue->Signal(fence, fenceValue);
    if (fence->GetCompletedValue() < fenceValue)
    {
        fence->SetEventOnCompletion(fenceValue, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
}
int WINAPI wWinMain(
    HINSTANCE hInstance,
    HINSTANCE,
    PWSTR,
    int nCmdShow)
{
    Window window;
    if (!window.Create(hInstance, 1280, 720, nCmdShow))
    {
        MessageBoxW(nullptr, L"Window creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
    DXGIContext dxgi;
    if (!dxgi.Init(true))
    {
        MessageBoxW(nullptr, L"No suitable GPU found.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
    D3DDevice d3d;
    if (!d3d.Init(dxgi.Adapter(), true))
    {
        MessageBoxW(nullptr, L"D3D12 device creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
    CommandQueue queue;
    if (!queue.Init(d3d.Device()))
    {
        MessageBoxW(nullptr, L"Command queue creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
    SwapChain swapChain;
    if (!swapChain.Init(dxgi.Factory(), queue.Get(), window.Handle(),
                        window.Width(), window.Height()))
    {
        MessageBoxW(nullptr, L"Swap chain creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
    RTVHeap rtvHeap;
    if (!rtvHeap.Init(d3d.Device(), swapChain))
    {
        MessageBoxW(nullptr, L"RTV heap creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
    FrameResources frames;
    if (!frames.Init(d3d.Device()))
    {
        MessageBoxW(nullptr, L"Frame resources creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }
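    // Create the fence and its wait event; treat failure like any other startup error.
    Microsoft::WRL::ComPtr<ID3D12Fence> fence;
    UINT64 fenceValue = 0;
    HANDLE fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    if (!fenceEvent ||
        FAILED(d3d.Device()->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence))))
    {
        if (fenceEvent) CloseHandle(fenceEvent);
        MessageBoxW(nullptr, L"Fence creation failed.", L"Startup Error", MB_ICONERROR);
        return -1;
    }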
    MSG msg = {};
    while (msg.message != WM_QUIT)
    {
        if (PeekMessageW(&msg, nullptr, 0, 0, PM_REMOVE))
        {
            TranslateMessage(&msg);
            DispatchMessageW(&msg);
        }
        else
        {
            RenderFrame(
                swapChain,
                rtvHeap,
                frames,
                queue.Get(),
                swapChain.Get(),
                fence.Get(),
                fenceEvent,
                fenceValue);
        }
    }
    // Drain the GPU before destroying objects
    ++fenceValue;
    queue.Get()->Signal(fence.Get(), fenceValue);
    fence->SetEventOnCompletion(fenceValue, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
    CloseHandle(fenceEvent);
    return static_cast<int>(msg.wParam);
}
When you build and run this, the window should fill with a solid dark blue. The title bar and borders belong to Windows; the solid colour is the GPU clearing the back buffer every frame.
The Drain at Shutdown
Notice the wait after the message loop exits:
++fenceValue;
queue.Get()->Signal(fence.Get(), fenceValue);
fence->SetEventOnCompletion(fenceValue, fenceEvent);
WaitForSingleObject(fenceEvent, INFINITE);
When wWinMain returns, C++ destroys the local objects — the swap chain, the heap, the device. If the GPU is still executing commands that reference those objects when the destructor runs, you get a crash or silent corruption. Waiting here ensures the GPU has drained completely before anything is destroyed.
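C++ destroys local objects in reverse order of declaration, which is why the position of the drain matters. The sketch below illustrates that ordering with plain objects that log their destruction. `Resource`, `DrainGuard`, and `RunScope` are hypothetical names, and the guard is one possible alternative to the explicit wait shown above, not what main.cpp does.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which events happen.
using Log = std::vector<std::string>;

// Stand-in for any object the GPU might still be referencing.
struct Resource {
    Log& log;
    std::string name;
    Resource(Log& l, std::string n) : log(l), name(std::move(n)) {}
    ~Resource() { log.push_back("destroy " + name); }
};

// Hypothetical guard: its destructor stands in for the signal-and-wait drain.
struct DrainGuard {
    Log& log;
    explicit DrainGuard(Log& l) : log(l) {}
    ~DrainGuard() { log.push_back("drain GPU"); }
};

// Declares objects in the same spirit as wWinMain and returns the teardown order.
inline Log RunScope() {
    Log log;
    {
        Resource device(log, "device");
        Resource swapChain(log, "swap chain");
        DrainGuard guard(log);  // declared last, so it is destroyed first
    }
    return log;
}
```

Because the guard is declared after everything it protects, its destructor runs before any of the resources are released, including on early-return paths.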
Common Mistakes
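If the window appears black instead of blue, check that ClearRenderTargetView is recorded between the two barriers, that the RTV handle matches the current back buffer index, and that the command list is closed and passed to ExecuteCommandLists. ClearRenderTargetView takes the RTV handle directly, so it does not depend on OMSetRenderTargets; that binding starts to matter once draw calls are added.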
If the debug layer reports “resource is in the wrong state”, you have a missing barrier. Check that you have both the PRESENT → RENDER_TARGET barrier before the clear and the RENDER_TARGET → PRESENT barrier before Present.
If allocator->Reset() fails, or the debug layer reports that the allocator is still in use by the GPU, the fence wait is not working. Add an OutputDebugStringW after the fence wait to confirm it is reached, and check that Signal is called on the queue before SetEventOnCompletion.
If Present returns DXGI_ERROR_DEVICE_REMOVED, the GPU has been lost. This can happen after a fatal error on a previous frame, such as invalid commands or a GPU hang. Call ID3D12Device::GetDeviceRemovedReason for the specific HRESULT and check the debug output for the underlying cause.
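If the frame rate is capped at the display refresh rate, you are presenting with syncInterval = 1 (V-Sync). That is the expected behaviour on the path where tearing is not supported. To uncap it, present with syncInterval = 0, and pass DXGI_PRESENT_ALLOW_TEARING only when tearing is supported, because that flag is valid only with a sync interval of 0.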
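The present-parameter choice can be isolated into a small pure function, which makes it easy to reason about. This is a sketch under assumptions: `PresentParams`, `ChoosePresentParams`, and the `wantVSync` switch are hypothetical, and `kPresentAllowTearing` is a local stand-in that mirrors the value of DXGI_PRESENT_ALLOW_TEARING so the example compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in mirroring DXGI_PRESENT_ALLOW_TEARING (0x00000200).
constexpr std::uint32_t kPresentAllowTearing = 0x00000200u;

// The two arguments passed to Present.
struct PresentParams {
    std::uint32_t syncInterval;
    std::uint32_t flags;
};

// Picks Present arguments. Tearing is requested only when it is supported and
// V-Sync is not wanted; the tearing flag is never combined with a non-zero interval.
inline PresentParams ChoosePresentParams(bool tearingSupported, bool wantVSync) {
    if (tearingSupported && !wantVSync) {
        return PresentParams{0, kPresentAllowTearing};
    }
    return PresentParams{1, 0};
}
```

With `wantVSync` fixed to false this reproduces the choice made in RenderFrame; the extra switch only shows where a user-facing V-Sync option could plug in.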
Quick Checkpoint
You have a working render loop. These ideas should feel solid:
- Resources must be in the correct state for each operation. Transitions are explicit barriers.
- PRESENT → RENDER_TARGET before writing. RENDER_TARGET → PRESENT before presenting.
- OMSetRenderTargets binds the render target for subsequent draw calls.
- A fence lets the CPU wait for the GPU. Signal is GPU-side; SetEventOnCompletion is CPU-side.
- Always drain the GPU before destroying DX12 objects.
- The parameter list of RenderFrame will grow as rendering complexity grows. Eventually this becomes a Renderer class.
What’s Next
The render loop works. The next step is geometry: a vertex buffer, a vertex shader, a pixel shader, a root signature, and a pipeline state object. At the end of that sequence the clear colour gives way to your first triangle.