本部分使用间接绘制命令在一个绘制调用中为不同的模型发布多个实例化绘制。
一、间接绘制原理
本示例主要演示间接绘制命令的用法。在vulkan除了诸如vkCmdDraw和vkCmdDrawIndexed指定绘制内容的参数直接传递到函数(“直接绘制”)的绘制函数外,vulkan还存在间接绘制命令。
vkCmdDrawIndirect和vkCmdDrawIndexedIndirect从包含要发布的绘制命令的描述的缓冲区对象中获取绘制命令,包括实例和索引计数,顶点偏移等。这还允许使用单个绘制命令绘制多个几何图形,只要它们支持通过相同的顶点(和索引)缓冲区。
间接绘制增加了生成(和更新)实际绘图命令的方式,因为该缓冲区可以离线生成和更新,而无需实际更新包含实际绘图功能的命令缓冲区。
使用间接绘图,您可以提前在CPU上离线生成绘图命令,甚至可以使用着色器对其进行更新(因为它们存储在设备本地缓冲区中)。这增加了许多新的可能性来更新绘制命令,而无需使用CPU,包括基于GPU的剔除。
本示例生成单个间接缓冲区,该缓冲区包含在随机位置,比例和旋转下使用不同实例渲染对象多次的12种不同植物的绘制命令。屏幕上看到的整个树叶(和树木)仅使用一次绘制调用即可绘制。
从单个文件加载不同的植物网格,并将其存储在单个索引和顶点缓冲区中,将索引偏移存储在间接绘制命令中。
二、间接绘制步骤
2.1 准备间接绘制缓冲区数据
// 间接绘制实例数
#define OBJECT_INSTANCE_COUNT 2048
// 地坪半径
#define PLANT_RADIUS 40.0f
本示例从头开始生成间接绘图缓冲区。第一步是为间接绘制生成数据。Vulkan为此有一个专用的结构,该VkDrawIndexedIndirectCommand示例std::vector< VkDrawIndexedIndirectCommand >在将其上载到GPU之前使用来存储它们:
//准备和存储一个包含间接绘制命令的缓冲区
void prepareIndirectData()
{
...
//为场景中的每个网格创建一个间接命令
uint32_t m = 0;
for (auto& meshDescriptor : meshes.plants.meshDescriptors)
{
VkDrawIndexedIndirectCommand indirectCmd{};
indirectCmd.instanceCount = OBJECT_INSTANCE_COUNT;
indirectCmd.firstInstance = m * OBJECT_INSTANCE_COUNT;
indirectCmd.firstIndex = meshDescriptor.indexBase;
indirectCmd.indexCount = meshDescriptor.indexCount;
indirectCommands.push_back(indirectCmd);
m++;
}
...
}
meshDescriptor由网格加载程序生成,并在全局索引缓冲区内包含索引基数和该网格的计数,该全局索引缓冲区包含用于风景的所有植物网格。
第二个植物网格的间接绘制命令如下所示:
indirectCmd.indexCount = 1668;
indirectCmd.instanceCount = 2048;
indirectCmd.firstIndex = 960;
indirectCmd.vertexOffset = 0;
indirectCmd.firstInstance = 2048;
这将导致绘制从索引960开始的2048个索引数据实例(使用1668个索引),第一个实例也很重要,因为着色器将其用于该对象的实例属性(位置,旋转,缩放)。
填充该向量后,我们需要创建用于GPU读取以下间接命令的缓冲区:
VK_CHECK_RESULT(vulkanDevice->createBuffer(
VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
&indirectCommandsBuffer,
stagingBuffer.size));
vulkanDevice->copyBuffer(&stagingBuffer, &indirectCommandsBuffer, queue);
要将缓冲区用于间接绘制命令,您需要VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT在创建时指定用法标记。由于缓冲区不会再在主机端更改,因此我们将其暂存到GPU以最大化性能。
2.2 准备实例位置等缓冲区数据
创建完间接绘制模型数据后,我们需要随机生成各实例模型的位置及大小参数供着色器使用
//准备和存储一个包含网格绘制实例数据的缓冲区
void prepareInstanceData()
{
std::vector<InstanceData> instanceData;
instanceData.resize(objectCount);
//在地坪半径内随机生成植被的位置及大小
std::default_random_engine rndEngine(benchmark.active ? 0 : (unsigned)time(nullptr));
std::uniform_real_distribution<float> uniformDist(0.0f, 1.0f);
for (uint32_t i = 0; i < objectCount; i++) {
instanceData[i].rot = glm::vec3(0.0f, float(M_PI) * uniformDist(rndEngine), 0.0f);
float theta = 2 * float(M_PI) * uniformDist(rndEngine);
float phi = acos(1 - 2 * uniformDist(rndEngine));
instanceData[i].pos = glm::vec3(sin(phi) * cos(theta), 0.0f, cos(phi)) * PLANT_RADIUS;
instanceData[i].scale = 1.0f + uniformDist(rndEngine) * 2.0f;
instanceData[i].texIndex = i / OBJECT_INSTANCE_COUNT;
}
vks::Buffer stagingBuffer;
VK_CHECK_RESULT(vulkanDevice->createBuffer(
VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
&stagingBuffer,
instanceData.size() * sizeof(InstanceData),
instanceData.data()));
VK_CHECK_RESULT(vulkanDevice->createBuffer(
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
&instanceBuffer,
stagingBuffer.size));
vulkanDevice->copyBuffer(&stagingBuffer, &instanceBuffer, queue);
stagingBuffer.destroy();
}
之后就是简单的传统步骤,不再赘述;
2.3 着色器
本案例用了三个着色器:天空盒、地坪、植被。我们主要来说一下植被所使用的的着色器:
顶点着色器:
#version 450
// mesh数据
layout (location = 0) in vec4 inPos;
layout (location = 1) in vec3 inNormal;
layout (location = 2) in vec2 inUV;
layout (location = 3) in vec3 inColor;
// 实例位置大小数据
layout (location = 4) in vec3 instancePos;
layout (location = 5) in vec3 instanceRot;
layout (location = 6) in float instanceScale;
layout (location = 7) in int instanceTexIndex;
layout (binding = 0) uniform UBO
{
mat4 projection;
mat4 modelview;
} ubo;
layout (location = 0) out vec3 outNormal;
layout (location = 1) out vec3 outColor;
layout (location = 2) out vec3 outUV;
layout (location = 3) out vec3 outViewVec;
layout (location = 4) out vec3 outLightVec;
out gl_PerVertex
{
vec4 gl_Position;
};
void main()
{
outColor = inColor;
outUV = vec3(inUV, instanceTexIndex);
outUV.t = 1.0 - outUV.t;
//角度变换矩阵
mat4 mx, my, mz;
// 围绕x轴旋转
float s = sin(instanceRot.x);
float c = cos(instanceRot.x);
mx[0] = vec4(c, s, 0.0, 0.0);
mx[1] = vec4(-s, c, 0.0, 0.0);
mx[2] = vec4(0.0, 0.0, 1.0, 0.0);
mx[3] = vec4(0.0, 0.0, 0.0, 1.0);
// 围绕y轴旋转
s = sin(instanceRot.y);
c = cos(instanceRot.y);
my[0] = vec4(c, 0.0, s, 0.0);
my[1] = vec4(0.0, 1.0, 0.0, 0.0);
my[2] = vec4(-s, 0.0, c, 0.0);
my[3] = vec4(0.0, 0.0, 0.0, 1.0);
// 围绕z轴旋转
s = sin(instanceRot.z);
c = cos(instanceRot.z);
mz[0] = vec4(1.0, 0.0, 0.0, 0.0);
mz[1] = vec4(0.0, c, s, 0.0);
mz[2] = vec4(0.0, -s, c, 0.0);
mz[3] = vec4(0.0, 0.0, 0.0, 1.0);
mat4 rotMat = mz * my * mx;
outNormal = inNormal * mat3(rotMat);
vec4 pos = vec4((inPos.xyz * instanceScale) + instancePos, 1.0) * rotMat;
gl_Position = ubo.projection * ubo.modelview * pos;
vec4 wPos = ubo.modelview * vec4(pos.xyz, 1.0);
vec4 lPos = vec4(0.0, -5.0, 0.0, 1.0);
outLightVec = lPos.xyz - pos.xyz;
outViewVec = -pos.xyz;
}
从顶点着色器中,我们可以到,为了模拟真实的场景植被模型错乱分布,我们使用了不同的矩阵来处理随机方向。
片元着色器:
#version 450
layout (binding = 1) uniform sampler2DArray samplerArray;
layout (location = 0) in vec3 inNormal;
layout (location = 1) in vec3 inColor;
layout (location = 2) in vec3 inUV;
layout (location = 3) in vec3 inViewVec;
layout (location = 4) in vec3 inLightVec;
layout (location = 0) out vec4 outFragColor;
void main()
{
vec4 color = texture(samplerArray, inUV);
if (color.a < 0.5)
{
discard;
}
vec3 N = normalize(inNormal);
vec3 L = normalize(inLightVec);
vec3 ambient = vec3(0.65);
vec3 diffuse = max(dot(N, L), 0.0) * inColor;
outFragColor = vec4((ambient + diffuse) * color.rgb, 1.0);
}
片元着色器很简单仅是采样植被图片及简单光照处理。
2.4 渲染
在渲染的时候,我们首选需要绑定两个缓冲区,一个是mesh数据对应点饿缓冲区0,还有一个是实例位置大小的缓冲区数据1:
// 植被
vkCmdBindPipeline(drawCmdBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, pipelines.plants);
// Binding point 0 : Mesh vertex buffer
vkCmdBindVertexBuffers(drawCmdBuffers[i], VERTEX_BUFFER_BIND_ID, 1, &models.plants.vertices.buffer, offsets);
// Binding point 1 : Instance data buffer
vkCmdBindVertexBuffers(drawCmdBuffers[i], INSTANCE_BUFFER_BIND_ID, 1, &instanceBuffer.buffer, offsets);
vkCmdBindIndexBuffer(drawCmdBuffers[i], models.plants.indices.buffer, 0, VK_INDEX_TYPE_UINT32);
如果multiDrawIndirect支持,我们可以通过一次绘制来发出所有间接绘制:
void buildCommandBuffers()
{
...
for (int32_t i = 0; i < drawCmdBuffers.size(); ++i)
{
vkCmdDrawIndexedIndirect(drawCmdBuffers[i], indirectCommandsBuffer.buffer, 0, indirectDrawCount, sizeof(VkDrawIndexedIndirectCommand));
}
}
我们只是将缓冲区句柄传递给包含间接绘图命令和drawCounts数量的缓冲区。
此类的非间接,非实例化等效项是:
for (auto indirectCmd : indirectCommands)
{
for (uint32_t j = 0; j < indirectCmd.instanceCount; j++)
{
vkCmdDrawIndexed(drawCmdBuffers[i], indirectCmd.indexCount, 1, indirectCmd.firstIndex, 0, indirectCmd.firstInstance + j);
}
}
如果GPU不支持,multiDrawIndirect我们必须使用缓冲区偏移量一次发出间接绘制命令:
for (auto j = 0; j < indirectCommands.size(); j++)
{
vkCmdDrawIndexedIndirect(drawCmdBuffers[i], indirectCommandsBuffer.buffer, j * sizeof(VkDrawIndexedIndirectCommand), 1, sizeof(VkDrawIndexedIndirectCommand));
}
运行,我们可以看到,我们通过调用一次间接绘制,共生产了2048*12=24576个模型,这种技术非常适合用于创建植被等小型琐碎模型。