Code repository:
https://github.com/jiabaodan/Direct12BookReadingNotes
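Learning objectives
- Understand this chapter's change to command queue handling (we no longer flush the command queue every frame), which improves performance;
- Understand the two other root parameter types in a root signature: root descriptors and root constants;
- Learn how to procedurally generate common geometric shapes: boxes, cylinders, and spheres;
- Learn how to animate vertices on the CPU and upload the vertex data to GPU memory through dynamic vertex buffers.
1 Frame Resources
In the earlier code we called D3DApp::FlushCommandQueue at the end of every frame to synchronize the CPU and GPU. This works, but it is inefficient:
- At the start of each frame the GPU has no commands to execute, so it sits idle until the CPU submits some;
- At the end of each frame the CPU has to wait for the GPU to finish executing the commands.
One solution is to create a circular array of the resources the CPU updates each frame. We call these frame resources, and the array usually has 3 elements. After the CPU submits the commands for one frame, it moves on to the next available frame resource (one the GPU is not currently using) and keeps updating data. With 3 elements the CPU can get up to 2 frames ahead of the GPU, which keeps the GPU busy. Below is the frame resource class used in the Shapes demo; because the CPU only needs to update constant buffers there, the per-frame data consists only of constant buffers: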
// Stores the resources needed for the CPU to build the command lists
// for a frame. The contents here will vary from app to app based on
// the needed resources.
struct FrameResource
{
public:
FrameResource(ID3D12Device* device, UINT passCount, UINT objectCount);
FrameResource(const FrameResource& rhs) = delete;
FrameResource& operator=(const FrameResource& rhs) = delete;
~FrameResource();
// We cannot reset the allocator until the GPU is done processing the
// commands. So each frame needs their own allocator.
Microsoft::WRL::ComPtr<ID3D12CommandAllocator> CmdListAlloc;
// We cannot update a cbuffer until the GPU is done processing the
// commands that reference it. So each frame needs their own cbuffers.
std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;
std::unique_ptr<UploadBuffer<ObjectConstants>> ObjectCB = nullptr;
// Fence value to mark commands up to this fence point. This lets us
// check if these frame resources are still in use by the GPU.
UINT64 Fence = 0;
};
FrameResource::FrameResource(ID3D12Device* device, UINT passCount, UINT objectCount)
{
ThrowIfFailed(device->CreateCommandAllocator(
D3D12_COMMAND_LIST_TYPE_DIRECT,
IID_PPV_ARGS(CmdListAlloc.GetAddressOf())));
PassCB = std::make_unique<UploadBuffer<PassConstants>> (device, passCount, true);
ObjectCB = std::make_unique<UploadBuffer<ObjectConstants>> (device, objectCount, true);
}
FrameResource::~FrameResource()
{
}
Our application then uses a std::vector to instantiate 3 frame resources and keeps track of the current one:
const int gNumFrameResources = 3;
std::vector<std::unique_ptr<FrameResource>> mFrameResources;
FrameResource* mCurrFrameResource = nullptr;
int mCurrFrameResourceIndex = 0;
void ShapesApp::BuildFrameResources()
{
for(int i = 0; i < gNumFrameResources; ++i)
{
mFrameResources.push_back(std::make_unique<FrameResource> (
md3dDevice.Get(), 1,
(UINT)mAllRitems.size()));
}
}
For frame N on the CPU, the algorithm is now:
void ShapesApp::Update(const GameTimer& gt)
{
// Cycle through the circular frame resource array.
mCurrFrameResourceIndex = (mCurrFrameResourceIndex + 1) % gNumFrameResources;
mCurrFrameResource = mFrameResources[mCurrFrameResourceIndex].get();
// Has the GPU finished processing the commands of the current frame
// resource? If not, wait until the GPU has completed commands up to
// this fence point.
if(mCurrFrameResource->Fence != 0
&& mFence->GetCompletedValue() < mCurrFrameResource->Fence)
{
HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
ThrowIfFailed(mFence->SetEventOnCompletion(
mCurrFrameResource->Fence, eventHandle));
WaitForSingleObject(eventHandle, INFINITE);
CloseHandle(eventHandle);
}
// […] Update resources in mCurrFrameResource (like cbuffers).
}
void ShapesApp::Draw(const GameTimer& gt)
{
// […] Build and submit command lists for this frame.
// Advance the fence value to mark commands up to this fence point.
mCurrFrameResource->Fence = ++mCurrentFence;
// Add an instruction to the command queue to set a new fence point.
// Because we are on the GPU timeline, the new fence point won’t be
// set until the GPU finishes processing all the commands prior to
// this Signal().
mCommandQueue->Signal(mFence.Get(), mCurrentFence);
// Note that GPU could still be working on commands from previous
// frames, but that is okay, because we are not touching any frame
// resources associated with those frames.
}
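This scheme does not eliminate waiting entirely: if one processor works through frames much faster than the other, it still ends up waiting for the other to catch up.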
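To see how the fence values bound how far the CPU can run ahead, here is a minimal CPU-only simulation. It is not Direct3D code: `SimulatedGpu`, `FrameSlot`, `RunFrames`, and `kNumFrameResources` are hypothetical names introduced for illustration, and the GPU is modeled as a counter that only advances when the CPU is forced to wait.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the GPU timeline: it only tracks the last
// fence value it has "completed". Not part of Direct3D 12.
struct SimulatedGpu
{
    std::uint64_t completedFence = 0;

    // Pretend the GPU finishes all work up to and including `fence`.
    void FinishUpTo(std::uint64_t fence)
    {
        if (completedFence < fence)
            completedFence = fence;
    }
};

// One slot of the circular array; mirrors FrameResource::Fence.
struct FrameSlot
{
    std::uint64_t fence = 0;
};

constexpr int kNumFrameResources = 3;

// Runs `frameCount` CPU frames against a GPU that never makes progress
// on its own, and returns how many times the CPU had to wait.
inline int RunFrames(int frameCount, SimulatedGpu& gpu)
{
    assert(frameCount >= 0);

    std::array<FrameSlot, kNumFrameResources> slots{};
    std::uint64_t currentFence = 0;
    int index = 0;
    int waits = 0;

    for (int frame = 0; frame < frameCount; ++frame)
    {
        // Cycle through the circular array (as in Update()).
        index = (index + 1) % kNumFrameResources;
        FrameSlot& slot = slots[index];

        // The slot is still in flight: block until the GPU reaches it.
        if (slot.fence != 0 && gpu.completedFence < slot.fence)
        {
            ++waits;
            gpu.FinishUpTo(slot.fence);
        }

        // Record and "signal" a new fence point (as in Draw()).
        slot.fence = ++currentFence;
    }
    return waits;
}
```

Under this worst-case assumption the first three frames are recorded without blocking, and each later frame waits once because its slot is still in use. This is the case described above where one processor outpaces the other.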
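2 Render Items
Drawing an object requires setting many parameters, such as setting up the vertex and index buffers, binding the constant buffer, setting the primitive topology, and specifying the DrawIndexedInstanced parameters. When we draw many objects, it helps to design a lightweight structure that stores all of this data. We call the set of data needed to submit a single draw call a render item. In the current demo, the RenderItem structure looks like this: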
// Lightweight structure stores parameters to draw a shape. This will
// vary from app-to-app.
struct RenderItem
{
RenderItem() = default;
// World matrix of the shape that describes the object’s local space
// relative to the world space, which defines the position,
// orientation, and scale of the object in the world.
XMFLOAT4X4 World = MathHelper::Identity4x4();
// Dirty flag indicating the object data has changed and we need
// to update the constant buffer. Because we have an object
// cbuffer for each FrameResource, we have to apply the
// update to each FrameResource. Thus, when we modify object data we
// should set NumFramesDirty = gNumFrameResources so that each frame
// resource gets the update.
int NumFramesDirty = gNumFrameResources;
// Index into GPU constant buffer corresponding to the ObjectCB
// for this render item.
UINT ObjCBIndex = -1;
// Geometry associated with this render-item. Note that multiple
// render-items can share the same geometry.
MeshGeometry* Geo = nullptr;
// Primitive topology.
D3D12_PRIMITIVE_TOPOLOGY PrimitiveType = D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST;
// DrawIndexedInstanced parameters.
UINT IndexCount = 0;
UINT StartIndexLocation = 0;
int BaseVertexLocation = 0;
};
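Our application maintains lists of render items according to how they need to be drawn; render items that require different PSOs are kept in different lists: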
// List of all the render items.
std::vector<std::unique_ptr<RenderItem>> mAllRitems;
// Render items divided by PSO.
std::vector<RenderItem*> mOpaqueRitems;
std::vector<RenderItem*> mTransparentRitems;
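The ownership split between these lists can be sketched in isolation. In this simplified sketch, `Item`, `ItemLists`, and `AddItem` are hypothetical names, and the `transparent` flag is an assumption made for the example (the original RenderItem has no such member). One vector owns every item through std::unique_ptr, while the per-PSO vectors hold non-owning raw pointers into it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for RenderItem. The `transparent`
// flag exists only for this sketch.
struct Item
{
    unsigned objCBIndex = 0;
    bool transparent = false;
};

struct ItemLists
{
    std::vector<std::unique_ptr<Item>> all; // owns every item
    std::vector<Item*> opaque;              // non-owning, one list per PSO
    std::vector<Item*> transparent;         // non-owning
};

// Adds an item to the owning list and to exactly one per-PSO list.
inline Item* AddItem(ItemLists& lists, bool transparent)
{
    // Invariant: every owned item appears in exactly one per-PSO list.
    assert(lists.all.size() == lists.opaque.size() + lists.transparent.size());

    auto item = std::make_unique<Item>();
    item->objCBIndex = static_cast<unsigned>(lists.all.size());
    item->transparent = transparent;

    // The heap object does not move when the unique_ptr is moved into
    // the vector, so the raw pointer stays valid.
    Item* raw = item.get();
    (transparent ? lists.transparent : lists.opaque).push_back(raw);
    lists.all.push_back(std::move(item));
    return raw;
}
```

Because the per-PSO lists store raw pointers, the owning vector has to outlive them. Keeping both as members of the same application class, as the notes do, satisfies that.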
3 PASS CONSTANTS
In the previous section we introduced a new constant buffer:
std::unique_ptr<UploadBuffer<PassConstants>> PassCB = nullptr;
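It mainly holds constants that are shared by all objects, such as the eye position, the perspective projection matrix, screen resolution (render target) data, and timing information. Our demo does not need all of this data yet, but having it available is convenient and costs very little extra space. For example, the render target size is useful when implementing post-processing effects: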
cbuffer cbPass : register(b1)
{
float4x4 gView;
float4x4 gInvView;
float4x4 gProj;
float4x4 gInvProj;
float4x4 gViewProj;
float4x4 gInvViewProj;
float3 gEyePosW;
float cbPerObjectPad1;
float2 gRenderTargetSize;
float2 gInvRenderTargetSize;
float gNearZ;
float gFarZ;
float gTotalTime;
float gDeltaTime;
};
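The CPU-side structure written into this buffer has to match the HLSL layout byte for byte. HLSL packs cbuffer members into 16-byte registers, which is why the float3 eye position is followed by a one-float pad. The sketch below checks that layout with plain float arrays instead of DirectXMath types so that it compiles anywhere. `PassConstantsLayout`, `AlignTo256`, and `ElementByteOffset` are hypothetical names; the rounding to 256 bytes reflects Direct3D 12's requirement that constant buffer views be sized and placed in multiples of 256 bytes.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Hypothetical CPU-side mirror of cbPass, using raw floats.
struct PassConstantsLayout
{
    float View[16];
    float InvView[16];
    float Proj[16];
    float InvProj[16];
    float ViewProj[16];
    float InvViewProj[16];
    float EyePosW[3];
    float cbPerObjectPad1;        // fills the rest of EyePosW's register
    float RenderTargetSize[2];
    float InvRenderTargetSize[2];
    float NearZ;
    float FarZ;
    float TotalTime;
    float DeltaTime;
};

// Each group below starts on a 16-byte register boundary.
static_assert(offsetof(PassConstantsLayout, EyePosW) % 16 == 0, "EyePosW");
static_assert(offsetof(PassConstantsLayout, RenderTargetSize) % 16 == 0,
              "RenderTargetSize");
static_assert(offsetof(PassConstantsLayout, NearZ) % 16 == 0, "NearZ");

// Rounds a byte size up to the next multiple of 256.
constexpr std::size_t AlignTo256(std::size_t byteSize)
{
    return (byteSize + 255) & ~static_cast<std::size_t>(255);
}

// Byte offset of element `index` in a buffer of padded pass constants.
inline std::size_t ElementByteOffset(std::size_t index, std::size_t elementCount)
{
    assert(index < elementCount);
    return index * AlignTo256(sizeof(PassConstantsLayout));
}
```

The structure is 432 bytes, so each element occupies 512 bytes once padded. The real demo would use XMFLOAT4X4, XMFLOAT3, and XMFLOAT2 members, which have the same sizes as the float arrays used here.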
We also need to modify the per-object constant buffer. For now it only needs the world matrix:
cbuffer cbPerObject : register(b0)
{
float4x4 gWorld;
};
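The benefit of this split is that constant buffers can be grouped by how often they are updated. Per-pass constants need to be updated once per rendering pass; per-object constants only need to be updated when an object's world matrix changes; a static object only needs its constants set once at initialization. In our demo, the following methods update the constant buffers, and each is called once per frame from Update: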
void ShapesApp::UpdateObjectCBs(const GameTimer& gt)
{
auto currObjectCB = mCurrFrameResource->ObjectCB.get();
for(auto& e : mAllRitems)
{
// Only update the cbuffer data if the constants have changed.
// This needs to be tracked per frame resource.
if(e->NumFramesDirty > 0)
{
XMMATRIX world = XMLoadFloat4x4(&e->World);
ObjectConstants objConstants;
XMStoreFloat4x4(&objConstants.World, XMMatrixTranspose(world));
currObjectCB->CopyData(e->ObjCBIndex, objConstants);
// Next FrameResource need to be updated too.
e->NumFramesDirty--;
}
}
}
void ShapesApp::UpdateMainPassCB(const GameTimer& gt)
{
XMMATRIX view = XMLoadFloat4x4(&mView);
XMMATRIX proj = XMLoadFloat4x4(&mProj);
XMMATRIX viewProj = XMMatrixMultiply(view, proj);
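// Taking the address of a temporary is non-standard C++ (an MSVC
// extension), so store each determinant in a named variable first.
XMVECTOR viewDet = XMMatrixDeterminant(view);
XMVECTOR projDet = XMMatrixDeterminant(proj);
XMVECTOR viewProjDet = XMMatrixDeterminant(viewProj);
XMMATRIX invView = XMMatrixInverse(&viewDet, view);
XMMATRIX invProj = XMMatrixInverse(&projDet, proj);
XMMATRIX invViewProj = XMMatrixInverse(&viewProjDet, viewProj);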
XMStoreFloat4x4(&mMainPassCB.View, XMMatrixTranspose(view));
XMStoreFloat4x4(&mMainPassCB.InvView, XMMatrixTranspose(invView));
XMStoreFloat4x4(&mMainPassCB.Proj, XMMatrixTranspose(proj));
XMStoreFloat4x4(&mMainPassCB.InvProj, XMMatrixTranspose(invProj));
XMStoreFloat4x4(&mMainPassCB.ViewProj, XMMatrixTranspose(viewProj));
XMStoreFloat4x4(&mMainPassCB.InvViewProj, XMMatrixTranspose(invViewProj));
mMainPassCB.EyePosW = mEyePos;
mMainPassCB.RenderTargetSize = XMFLOAT2((float)mClientWidth, (float)mClientHeight);
mMainPassCB.InvRenderTargetSize = XMFLOAT2(1.0f / mClientWidth, 1.0f / mClientHeight);
mMainPassCB.NearZ = 1.0f;
mMainPassCB.FarZ = 1000.0f;
mMainPassCB.TotalTime = gt.TotalTime();
mMainPassCB.DeltaTime = gt.DeltaTime();
auto currPassCB = mCurrFrameResource->PassCB.get();
currPassCB->CopyData(0, mMainPassCB);
}
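The reason NumFramesDirty starts at gNumFrameResources can be checked with a small CPU-only sketch. Each frame resource holds its own copy of the object constants, so one change has to be written once into every copy over the following frames. `SimItem`, `FrameCopy`, `UpdateObjectCopies`, and `kFrameCount` are hypothetical names, and a single float stands in for the world matrix.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kFrameCount = 3; // plays the role of gNumFrameResources

// Hypothetical, simplified render item.
struct SimItem
{
    float world = 0.0f;               // stands in for the World matrix
    int numFramesDirty = kFrameCount; // every frame copy starts stale
    std::size_t objCBIndex = 0;
};

// One per-frame copy of the object constants (stands in for ObjectCB).
using FrameCopy = std::vector<float>;

// Mirrors UpdateObjectCBs: writes only dirty items into the current
// frame's copy and returns how many items were written.
inline int UpdateObjectCopies(std::vector<SimItem>& items, FrameCopy& current)
{
    int copied = 0;
    for (SimItem& item : items)
    {
        if (item.numFramesDirty > 0)
        {
            assert(item.objCBIndex < current.size());
            current[item.objCBIndex] = item.world;
            --item.numFramesDirty; // the next frame resource still needs it
            ++copied;
        }
    }
    return copied;
}
```

Calling UpdateObjectCopies over five consecutive frames while cycling through three copies writes the item exactly three times, once per copy. After that the counter is zero and the item is skipped until its data changes again.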
We update the vertex shader accordingly to support these constant buffer changes:
VertexOut VS(VertexIn vin)
{
VertexOut vout;
// Transform to homogeneous clip space.
float4 posW = mul(float4(vin.PosL, 1.0f), gWorld);
vout.PosH = mul(posW, gViewProj);
// Just pass vertex color into the pixel shader.
vout.Color = vin.Color;
return vout;
}
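The extra per-vertex matrix multiplication here is negligible on today's powerful GPUs.
Because the resources the shader expects have changed, the root signature must be updated accordingly to take two descriptor tables: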
CD3DX12_DESCRIPTOR_RANGE cbvTable0;
cbvTable0.Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 0);
CD3DX12_DESCRIPTOR_RANGE cbvTable1;
cbvTable1.Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 1);
// Root parameter can be a table, root descriptor or root constants.
CD3DX12_ROOT_PARAMETER slotRootParameter[2];
// Create root CBVs.
slotRootParameter[0].InitAsDescriptorTable(1, &cbvTable0);
slotRootParameter[1].InitAsDescriptorTable(1, &cbvTable1);
// A root signature is an array of root parameters.
CD3DX12_ROOT_SIGNATURE_DESC rootSigDesc(2,
slotRootParameter, 0, nullptr,
D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
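Do not use too many constant buffers in your shaders; for performance, [Thibieroz13] recommends keeping the number below 5.
4 Shape Geometry
This section shows how to create ellipsoids, spheres, cylinders, and cones. These shapes are useful for drawing sky domes, debugging, visualizing collision detection, and deferred rendering.
We put the code that procedurally generates geometry in the GeometryGenerator class (GeometryGenerator.h/.cpp). The data it creates lives in system memory, so we still have to copy it into vertex/index buffers. MeshData is a simple structure nested inside GeometryGenerator that stores the vertex and index lists: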
class GeometryGenerator
{
public:
using uint16 = std::uint16_t;
using uint32 = std::uint32_t;
struct Vertex
{
Vertex(){}
Vertex(
const DirectX::XMFLOAT3& p,
const DirectX::XMFLOAT3& n,
const DirectX::XMFLOAT3& t,
const DirectX::XMFLOAT2& uv) :
Position(p),
Normal(n),
TangentU(t),
TexC(uv){}
Vertex(
float px, float py, float pz,
float nx, float ny, float nz,
float tx, float ty, float tz,
float u, float v) :
Position(px,py,pz),
Normal(nx,ny,nz),
TangentU(tx, ty, tz),
TexC(u,v){}
DirectX::XMFLOAT3 Position;
DirectX::XMFLOAT3 Normal;
DirectX::XMFLOAT3 TangentU;
DirectX::XMFLOAT2 TexC;
};
struct MeshData
{
std::vector<Vertex> Vertices;
std::vector<uint32> Indices32;
std::vector<uint16>& GetIndices16()
{
if(mIndices16.empty())
{
mIndices16.resize(Indices32.size());
for(size_t i = 0; i < Indices32.size(); ++i)
mIndices16[i] = static_cast<uint16> (Indices32[i]);
}
return mIndices16;
}
private:
std::vector<uint16> mIndices16;
};
…
};
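One thing to watch in GetIndices16 is that the static_cast silently truncates any index of 65536 or more. The sketch below separates the range check from the narrowing copy. `FitsIn16Bit` and `ToIndices16` are hypothetical helpers and not part of GeometryGenerator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest value a 16-bit index can hold.
constexpr std::uint32_t kMaxIndex16 = 0xFFFFu;

// Hypothetical helper: true when every 32-bit index is representable
// as a 16-bit index.
inline bool FitsIn16Bit(const std::vector<std::uint32_t>& indices32)
{
    return std::all_of(indices32.begin(), indices32.end(),
                       [](std::uint32_t i) { return i <= kMaxIndex16; });
}

// Narrowing copy equivalent to the loop in GetIndices16(), with the
// range requirement made explicit.
inline std::vector<std::uint16_t> ToIndices16(
    const std::vector<std::uint32_t>& indices32)
{
    assert(FitsIn16Bit(indices32));

    std::vector<std::uint16_t> out(indices32.size());
    for (std::size_t i = 0; i < indices32.size(); ++i)
        out[i] = static_cast<std::uint16_t>(indices32[i]);
    return out;
}
```

A mesh with more than 65536 vertices would fail the check and should keep its 32-bit indices. Note also that GetIndices16 caches its result, so it appears to assume Indices32 is not modified after the first call.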
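4.1 Creating a Cylinder Mesh
We define a cylinder mesh by its bottom and top radii, its height, and the number of slices and stacks. As shown in the figure below, we divide the cylinder into three parts: the side, the bottom cap, and the top cap.
4.1.1 Cylinder Side Geometry
The cylinder we build is centered at the origin and aligned with the y-axis, and all of its vertices lie on rings. A cylinder has stackCount + 1 rings, and each ring has sliceCount unique vertices. The radius changes between adjacent rings by (topRadius - bottomRadius)/stackCount. The basic idea for building the cylinder is therefore to iterate over each ring and generate its vertices: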
GeometryGenerator::MeshData
GeometryGenerator::CreateCylinder(
float bottomRadius, float topRadius,
float height, uint32 sliceCount, uint32 stackCount)
{
MeshData meshData;
//
// Build Stacks.
//
float stackHeight = height / stackCount;
// Amount t