Rasterized Voxel-Based Dynamic Global Illumination

__浅墨__

Last modified 2024-08-01 13:18:43

Category: GPU Pro. Tags: graphics rendering

First published 2024-07-30 09:36:44

Article link: https://blog.csdn.net/qq_41940565/article/details/140727139

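Introduction
For modern games it is increasingly important to offer a realistic environment, and convincing lighting is central to that. Direct illumination alone is not enough; indirect illumination has to be taken into account as well. At the same time, modern games give the player more and more ways to interact with the game world, such as moving objects, destroying buildings, or changing the lighting dynamically. This is where dynamic global illumination comes into play: unlike static precalculated solutions, it can account for highly dynamic environments.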

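Rasterized voxel-based dynamic global illumination takes into account that scenes, especially indoor ones, can contain a large number of light sources of different types (for example point, spot, and directional lights). It produces visually good and stable results while maintaining high interactive frame rates.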

Overview
The technique makes efficient use of features introduced with DirectX 11 hardware to achieve the results described above.

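In a first step, the hardware rasterizer is used to create a voxel-grid representation of the scene. The scene is rendered without depth testing into a small 2D render target, but instead of being output to the bound render target, the rasterized pixels are written into a 3D read-write buffer using atomic functions. In this way the rasterizer is turned into a "voxelizer" that efficiently creates a highly populated 3D grid representation of the scene. This voxel grid contains the diffuse albedo and normal information of the geometry and is used later to generate the indirect illumination and to test for geometry occlusion. Because the voxel grid is recreated every frame, the proposed technique is fully dynamic and does not rely on any precalculation.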

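In a second step, the voxels inside the grid are illuminated by each light source. The illumination is then converted into virtual point lights (VPLs), stored as second-order spherical harmonics coefficients (SH-coefficients). The graphics hardware is exploited again by using the built-in blending stage to combine the results of each light source. Afterwards the generated VPLs are propagated within the grid to produce the indirect illumination. In contrast to the light propagation volume (LPV) technique proposed by [Kaplanyan and Dachsbacher 10], it is neither necessary to create a reflective shadow map for each light source nor to inject VPLs into a grid afterwards. Furthermore, occlusion information does not have to be generated separately, and the information obtained is more precise than that obtained from, for example, depth peeling.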

Implementation

Creating the Voxel-Grid Representation of the Scene

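First we need to define the properties of the cubic voxel grid: its extents, its position, and its view/projection matrices. To avoid flickering caused by the discrete voxel-grid representation of the scene, the grid moves synchronously with the viewer camera and is permanently snapped to the grid cell boundaries. To map the scene correctly onto the voxel grid we need an orthographic projection; we use three view matrices to get a higher coverage of the scene: one for the back-to-front view, one for the right-to-left view, and one for the top-to-down view. All other calculations are done entirely on the GPU.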
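
The snapping of the grid to cell boundaries can be illustrated with a minimal CPU-side sketch. The function name `snap_grid_center` and the choice of `floor` for the snapping are assumptions made for illustration; the article does not show how the snapped center is computed.

```python
import math

def snap_grid_center(camera_pos, cell_size):
    """Snap each coordinate of the camera position down to a multiple of cell_size.

    Hypothetical helper: the article only states that the grid follows the
    camera and stays aligned to cell boundaries.
    """
    return tuple(math.floor(c / cell_size) * cell_size for c in camera_pos)

# Two camera positions inside the same cell yield the same grid center,
# so the voxelization does not shift between those frames.
print(snap_grid_center((10.3, 2.1, -4.7), 2.0))
print(snap_grid_center((10.9, 2.9, -4.1), 2.0))
```

Because the center only changes in whole-cell steps, a static object always falls into the same voxels relative to the world, which is what prevents flickering.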

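Next we render the scene geometry located inside the grid boundaries, with color writing and depth testing disabled, into a small 2D render target. We use a 32×32×32 grid; for this a render target of 64×64 pixels is entirely sufficient, and the results are output into a read-write buffer. Basically, the triangle vertices are passed through the vertex shader to the geometry shader. In the geometry shader the view matrix is chosen in which the triangle is most visible, in order to achieve the highest number of rasterized pixels. In addition, the triangle size in normalized device coordinates is increased by the texel size of the currently bound render target. In this way, pixels that would have been discarded because of the low resolution of the currently bound render target are still rasterized. In the pixel shader the rasterized pixels are written atomically into a 3D read-write structured buffer. In contrast to [Mavridis and Papaioannou 11], this yields a highly populated 3D grid representation of the scene without amplifying geometry in the geometry shader.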

Listing 7.1 shows how this is done in DirectX 11 with HLSL.

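// Vertex shader

VS_OUTPUT main(VS_INPUT input)
{
    VS_OUTPUT output;
    output.position = float4(input.position, 1.0f);
    // The geometry shader below reads texCoords, so pass them through.
    output.texCoords = input.texCoords;
    output.normal = input.normal;
    return output;
}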

// Geometry shader
static float3 viewDirections[3] =
{
    float3(0.0f, 0.0f, -1.0f), // back to front
    float3(-1.0f, 0.0f, 0.0f), // right to left
    float3(0.0f, -1.0f, 0.0f)  // top to down
};

// Get the index of the view in which the triangle is most visible.
int GetViewIndex(in float3 normal)
{
    float3x3 directionMatrix;
    directionMatrix[0] = -viewDirections[0];
    directionMatrix[1] = -viewDirections[1];
    directionMatrix[2] = -viewDirections[2];

    float3 dotProducts = abs(mul(directionMatrix, normal));
    float maximum = max(max(dotProducts.x, dotProducts.y), dotProducts.z);

    int index;
    if(maximum == dotProducts.x)
        index = 0;
    else if(maximum == dotProducts.y)
        index = 1;
    else
        index = 2;

    return index;
}

[maxvertexcount(3)]
void main(triangle VS_OUTPUT input[3], inout TriangleStream<GS_OUTPUT>
            outputStream)
{
    float3 faceNormal = normalize(input[0].normal + input[1].normal + input[2].normal);

    // Get the view in which the current triangle is most visible,
    // in order to achieve the highest possible rasterization of the primitive.
    int viewIndex = GetViewIndex(faceNormal);

    GS_OUTPUT output[3];
    [unroll]
    for(int i = 0; i < 3; i++)
    {
        output[i].position = mul(constBuffer.gridViewProjMatrices[viewIndex],
                                input[i].position);
        // World-space position
        output[i].positionWS = input[i].position.xyz;
        output[i].texCoords = input[i].texCoords;
        output[i].normal = input[i].normal;
    }

    // Increase the size of the triangle in normalized device coordinates
    // by the texel size of the currently bound render target.
    float2 side0N = normalize(output[1].position.xy - output[0].position.xy);
    float2 side1N = normalize(output[2].position.xy - output[1].position.xy);
    float2 side2N = normalize(output[0].position.xy - output[2].position.xy);
    float texelSize = 1.0f / 64.0f;
    output[0].position.xy += normalize(-side0N + side2N) * texelSize;
    output[1].position.xy += normalize(side0N - side1N) * texelSize;
    output[2].position.xy += normalize(side1N - side2N) * texelSize;

    [unroll]
    for(int j = 0; j < 3; j++)
        outputStream.Append(output[j]);

    outputStream.RestartStrip();
}

// Pixel shader
struct VOXEL
{
    uint colorMask;
    uint4 normalMasks;
    uint occlusion; // voxel contains geometry info only if occlusion > 0
};

RWStructuredBuffer<VOXEL> gridBuffer : register(u1);

// Normalized directions of the four faces of a tetrahedron
static float3 faceVectors[4] = 
{
    float3(0.0f,-0.57735026f,0.81649661f),
    float3(0.0f,-0.57735026f,-0.81649661f),
    float3(-0.81649661f,0.57735026f,0.0f),
    float3(0.81649661f,0.57735026f,0.0f)
};

 int GetNormalIndex(in float3 normal,out float dotProduct)
 {
     float4x3 faceMatrix;
     faceMatrix[0] = faceVectors[0];
     faceMatrix[1] = faceVectors[1];
     faceMatrix[2] = faceVectors[2];
     faceMatrix[3] = faceVectors[3];
 
     float4 dotProducts = mul(faceMatrix,normal);
     float maximum = max (max(dotProducts.x,dotProducts.y),
     max(dotProducts.z,dotProducts.w));
     int index;
     if(maximum==dotProducts.x)
         index = 0;
     else if(maximum==dotProducts.y)
         index = 1;
     else if(maximum==dotProducts.z)
         index = 2;
     else
         index = 3;

     dotProduct = dotProducts[index];
     return index;
 }

void main(GS_OUTPUT input)
{
    float3 base = colorMap.Sample(colorMapSampler, input.texCoords).rgb;

    // Encode color into the lower 24 bits of an unsigned integer,
    // using 8 bits for each color channel.
    uint colorMask = EncodeColor(base.rgb);

    // Calculate the color-channel contrast of the color and write the value
    // into the highest 8 bits of the color mask.
    float contrast = length(base.rrg - base.gbb) / (sqrt(2.0f) + base.r + base.g + base.b);
    int iContrast = int(contrast * 255.0f);
    colorMask |= iContrast << 24;

    // Encode normal into the lower 27 bits of an unsigned integer,
    // using for each axis 8 bits for the value and 1 bit for the sign.
    float3 normal = normalize(input.normal);
    uint normalMask = EncodeNormal(normal.xyz);

    // Calculate which face of a tetrahedron the current normal is closest to
    // and write the corresponding dot product into the highest 5 bits of the normal mask.
    float dotProduct;
    int normalIndex = GetNormalIndex(normal, dotProduct);
    int iDotProduct = int(saturate(dotProduct) * 31.0f);
    normalMask |= iDotProduct << 27;

    // Get offset into the voxel grid.
    float3 offset = (input.positionWS - constBuffer.snappedGridCenter) *
                    constBuffer.invGridCellSize;
    offset = round(offset);

    // Get position in the voxel grid.
    int3 voxelPos = int3(16, 16, 16) + int3(offset);

    // Only output voxels that are inside the boundaries of the grid.
    if((voxelPos.x > -1) && (voxelPos.x < 32) && (voxelPos.y > -1) &&
       (voxelPos.y < 32) && (voxelPos.z > -1) && (voxelPos.z < 32))
    {
        // Get index into the voxel grid.
        int gridIndex = (voxelPos.z * 1024) + (voxelPos.y * 32) + voxelPos.x;

        // Output color.
        InterlockedMax(gridBuffer[gridIndex].colorMask, colorMask);

        // Output normal.
        InterlockedMax(gridBuffer[gridIndex].normalMasks[normalIndex],
                       normalMask);

        // Mark voxel that contains geometry information.
        InterlockedMax(gridBuffer[gridIndex].occlusion, 1);
    }
}


// Listing 7.1. Generation of the voxel grid.
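
The view selection performed by `GetViewIndex` in the geometry shader can be mirrored on the CPU. This is only a sketch in Python for checking the logic; `get_view_index` is an illustrative name, and the index order follows `viewDirections` in the listing.

```python
def get_view_index(normal):
    """Return the index of the orthographic view in which a face is most visible.

    0 = back-to-front (z axis), 1 = right-to-left (x axis), 2 = top-to-down (y axis),
    matching the order of viewDirections in the listing.
    """
    nx, ny, nz = normal
    dots = (abs(nz), abs(nx), abs(ny))
    # index() returns the first maximum, the same tie-breaking as the if/else chain.
    return dots.index(max(dots))

print(get_view_index((0.0, 0.0, 1.0)))   # face looking along z
print(get_view_index((0.1, -0.9, 0.2)))  # face looking mostly along y
```

A face whose normal is mostly aligned with one axis covers the most pixels when projected along that axis, which is why the largest absolute component decides the view.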

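To avoid race conditions between multiple threads that write into the same location, atomic functions have to be used. Since atomic operations are only supported in DirectX 11 for integer types, all values have to be converted into integers. Among the variety of DirectX 11 buffers, the RWStructuredBuffer was chosen, since this is the only possibility to hold multiple integer variables in one single buffer and at the same time perform atomic operations on them.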

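Since voxels are a simplified representation of the actual scene, detailed geometric information is lost. To amplify color bleeding in the final global illumination output, colors with a high difference between their color channels ("contrast") are preferred. By writing the contrast value into the highest 8 bits of the integer color mask, colors with the highest contrast automatically dominate, because the results are written into the voxel grid with InterlockedMax(). Normals need the same care as colors: thin geometry, for example, can have opposite normals within a single voxel. Therefore it is determined which face of a tetrahedron the current normal is closest to. By writing the corresponding dot product into the highest 5 bits of the integer normal mask, the normal closest to each tetrahedron face is selected automatically, since InterlockedMax() is used again to write the results. According to the retrieved tetrahedron face, the normal is written into the corresponding normal channel of the voxel. Later, when the voxels are illuminated, the normal closest to the light vector is chosen so that the best illumination is obtained. In this way the normal may sometimes come from a different geometry face than the color. However, since voxels condense the actual geometry information within their boundaries, this is completely fine and does not have any negative impact on the global illumination result.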
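
How the contrast bits make a saturated color win the atomic maximum can be sketched on the CPU. The listing does not show `EncodeColor`, so the 8-bits-per-channel layout below is an assumption, and `encode_color_mask` is a hypothetical name; only the contrast formula is taken from the pixel shader.

```python
import math

def encode_color_mask(r, g, b):
    """Pack an RGB color (floats in [0, 1]) plus its contrast into a 32-bit integer.

    Assumed layout: contrast in bits 24-31, then red, green, blue with 8 bits each.
    """
    ri, gi, bi = (int(round(c * 255.0)) for c in (r, g, b))
    mask = (ri << 16) | (gi << 8) | bi
    # Same formula as the shader: length(base.rrg - base.gbb) / (sqrt(2) + r + g + b)
    diff = (r - g, r - b, g - b)
    contrast = math.sqrt(sum(d * d for d in diff)) / (math.sqrt(2.0) + r + g + b)
    return mask | (int(contrast * 255.0) << 24)

white = encode_color_mask(1.0, 1.0, 1.0)  # bright but zero contrast
red = encode_color_mask(1.0, 0.0, 0.0)    # strongly saturated
# max() stands in for InterlockedMax(): the saturated color dominates.
print(hex(max(white, red)))
```

Because the contrast occupies the most significant bits, the integer comparison is decided by contrast first, and the color bits only matter as a tie-breaker.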

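Since all operations are performed with a very small render target (64×64 pixels), this step is very fast. The accompanying figure shows a screenshot of the voxel-grid representation of the Sponza scene. To create this image, the world-space position of each visible pixel of the rendered scene is reconstructed. With the reconstructed position we can determine the location of the corresponding voxel inside the grid. Finally, each visible pixel is colored with the diffuse albedo stored in the corresponding voxel. To cover the entire scene, two nested voxel grids are used: a fine-resolution grid for the area near the viewer and a coarse-resolution grid for the distant area.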
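
The mapping from a world-space position to a voxel, as used both during voxelization and for the visualization described above, can be sketched as follows. The function name `voxel_index` is illustrative; the arithmetic follows the pixel shader of Listing 7.1, with the caveat that Python's `round` may treat exact .5 ties differently from the shader.

```python
def voxel_index(position_ws, snapped_grid_center, grid_cell_size, res=32):
    """Return the linear index of the voxel containing position_ws, or None if outside."""
    offset = [round((p - c) / grid_cell_size)
              for p, c in zip(position_ws, snapped_grid_center)]
    # The grid center corresponds to voxel (res/2, res/2, res/2).
    x, y, z = (res // 2 + o for o in offset)
    if not all(0 <= v < res for v in (x, y, z)):
        return None
    return z * res * res + y * res + x

print(voxel_index((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0))    # center voxel
print(voxel_index((100.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0))  # outside the grid
```

With `res = 32` the multipliers are 1024 and 32, the same constants that appear in the listing.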

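Creating Virtual Point Lights (VPLs) in Voxel Space

In this step we create VPLs entirely from the previously generated voxel grid. For each light source located inside the grid boundaries we render a small quad of 32×32 pixels. Using hardware instancing, we render 32 instances of each quad with a single draw call. After the vertices have passed through the vertex shader, the geometry shader selects the corresponding render-target slice in the currently bound 2D texture arrays. The pixel shader then illuminates all voxels that contain geometry information according to the type of the current light source. Finally, the illuminated voxels are converted into a second-order spherical harmonics representation of VPLs. With additive hardware blending the results of all light sources are combined automatically. Listing 7.2 shows how this is done in DirectX 11 with HLSL.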

// Vertex shader

VS_OUTPUT main(VS_INPUT input, uint instanceID : SV_InstanceID)
{
    VS_OUTPUT output;
    output.position = float4(input.position, 1.0f);
    output.instanceID = instanceID;
    return output;
}

// Geometry shader

struct GS_OUTPUT
{
    float4 position : SV_POSITION;
    uint rtIndex : SV_RenderTargetArrayIndex;
};

[maxvertexcount(4)]
void main(line VS_OUTPUT input[2], inout TriangleStream<GS_OUTPUT> outputStream)
{
    // Generate a quad from the two corner vertices.
    GS_OUTPUT output[4];

    // Lower-left vertex
    output[0].position = float4(input[0].position.x, input[0].position.y,
                                input[0].position.z, 1.0f);

    // Lower-right vertex
    output[1].position = float4(input[1].position.x, input[0].position.y,
                                input[0].position.z, 1.0f);

    // Upper-left vertex
    output[2].position = float4(input[0].position.x, input[1].position.y,
                                input[0].position.z, 1.0f);

    // Upper-right vertex
    output[3].position = float4(input[1].position.x, input[1].position.y,
                                input[0].position.z, 1.0f);

    // By using hardware instancing, the geometry shader is invoked 32 times
    // with the corresponding instance ID. For each invocation the instance ID
    // determines into which slice of the 2D texture array the current quad
    // is rasterized.
    [unroll]
    for(int i = 0; i < 4; i++)
    {
        output[i].rtIndex = input[0].instanceID;
        outputStream.Append(output[i]);
    }
    outputStream.RestartStrip();
}


// Pixel shader
StructuredBuffer<VOXEL> gridBuffer : register(t0);
struct PS_OUTPUT
{
    float4 fragColor0 : SV_TARGET0; // red SH-coefficients
    float4 fragColor1 : SV_TARGET1; // green SH-coefficients
    float4 fragColor2 : SV_TARGET2; // blue SH-coefficients
};

// Calculate the second-order SH-coefficients of a clamped cosine lobe function.
float4 ClampedCosineSHCoeffs(in float3 dir)
{
    float4 coeffs;
    coeffs.x = PI / (2.0f * sqrt(PI));
    coeffs.y = -((2.0f * PI) / 3.0f) * sqrt(3.0f / (4.0f * PI));
    coeffs.z = ((2.0f * PI) / 3.0f) * sqrt(3.0f / (4.0f * PI));
    coeffs.w = -((2.0f * PI) / 3.0f) * sqrt(3.0f / (4.0f * PI));
    coeffs.wyz *= dir;
    return coeffs;
}

// Determine which of the four specified normals is closest to the specified direction.
// The function returns the closest normal and, as an output parameter, the corresponding dot product.
float3 GetClosestNormal(in uint4 normalMasks, in float3 direction,
                        out float dotProduct)
{
    float4x3 normalMatrix;
    normalMatrix[0] = DecodeNormal(normalMasks.x);
    normalMatrix[1] = DecodeNormal(normalMasks.y);
    normalMatrix[2] = DecodeNormal(normalMasks.z);
    normalMatrix[3] = DecodeNormal(normalMasks.w);
    float4 dotProducts = mul( normalMatrix , direction);
    float maximum = max (max( dotProducts.x,dotProducts.y),
    max(dotProducts.z,dotProducts.w));
    int index;
    if(maximum == dotProducts.x)
        index = 0;
    else if(maximum == dotProducts.y)
        index = 1;
    else if(maximum == dotProducts.z)
        index = 2;
    else
        index = 3;

    dotProduct = dotProducts[ index];
    return normalMatrix[ index];
}

PS_OUTPUT main(GS_OUTPUT input)
{
    PS_OUTPUT output;

    // Get index of the current voxel.
    int3 voxelPos = int3(input.position.xy, input.rtIndex);
    int gridIndex = (voxelPos.z * 1024) + (voxelPos.y * 32) + voxelPos.x;

    // Get voxel data and early out if the voxel has no geometry information.
    VOXEL voxel = gridBuffer[gridIndex];
    if(voxel.occlusion == 0)
        discard;

    // Get world-space position of the voxel.
    int3 offset = voxelPos - int3(16, 16, 16);
    float3 position = (float3(offset) * constBuffer.gridCellSize) +
                      constBuffer.snappedGridCenter;

    // Decode color of the voxel.
    float3 albedo = DecodeColor(voxel.colorMask);

    // Get the normal of the voxel that is closest to the light direction.
    // lightVecN is the normalized light vector; its computation depends on
    // the light type and is not shown here.
    float nDotL;
    float3 normal = GetClosestNormal(voxel.normalMasks, lightVecN, nDotL);

    // Calculate diffuse lighting according to the current light type.
    float3 diffuse = CalcDiffuseLighting(albedo, nDotL);
#ifdef USE_SHADOWS
    // Calculate the shadow term according to the current light type
    // with the help of a shadow map.
    float shadowTerm = ComputeShadowTerm(position);
    diffuse *= shadowTerm;
#endif
    // Calculate the clamped cosine lobe SH-coefficients for the VPL.
    float4 coeffs = ClampedCosineSHCoeffs(normal);

    // Output the SH-coefficients for each color channel.
    output.fragColor0 = coeffs * diffuse.r;
    output.fragColor1 = coeffs * diffuse.g;
    output.fragColor2 = coeffs * diffuse.b;
    return output;
}

// Listing 7.2. VPL creation.

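To output the second-order SH-coefficients for all three color channels, this time we render into three 2D texture arrays with half-floating-point precision. Since all calculations are done entirely in voxel space and are limited to voxels that actually contain geometry information, the technique scales very well with an increasing number of light sources of all types.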
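
To get a feeling for what the four stored coefficients represent, the clamped cosine lobe of Listing 7.2 can be projected and evaluated again on the CPU. This is a sketch in Python; `sh_basis` and `evaluate` are illustrative helpers, with the basis constants taken from the propagation shader and the component order (constant, y, z, x with the listing's signs) from `ClampedCosineSHCoeffs`.

```python
import math

def clamped_cosine_sh_coeffs(d):
    """Second-order SH coefficients of a clamped cosine lobe around direction d."""
    x, y, z = d
    c0 = math.pi / (2.0 * math.sqrt(math.pi))
    c1 = ((2.0 * math.pi) / 3.0) * math.sqrt(3.0 / (4.0 * math.pi))
    return (c0, -c1 * y, c1 * z, -c1 * x)

def sh_basis(d):
    """Second-order SH basis functions evaluated in direction d."""
    x, y, z = d
    k0 = 1.0 / (2.0 * math.sqrt(math.pi))
    k1 = math.sqrt(3.0 / (4.0 * math.pi))
    return (k0, -k1 * y, k1 * z, -k1 * x)

def evaluate(coeffs, d):
    """Reconstruct the function value in direction d from its coefficients."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(d)))

lobe = clamped_cosine_sh_coeffs((0.0, 0.0, 1.0))
print(evaluate(lobe, (0.0, 0.0, 1.0)))   # along the normal
print(evaluate(lobe, (0.0, 0.0, -1.0)))  # opposite to the normal
```

With only two SH bands the reconstruction is a coarse approximation of max(cos θ, 0): it peaks at 0.75 instead of 1 and becomes slightly negative on the back side, which is one reason the shaders clamp results with `max(0, ...)`.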

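In many cases we can even do without shadow maps for point lights and spotlights without noticeably affecting the final rendered output. However, for large point lights, spotlights, and directional lights we do need shadow maps to avoid light leaking. Here we can simply reuse the shadow maps that have already been created for the direct illumination step.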

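Propagating the Virtual Point Lights (VPLs)

In this step the previously created VPLs are propagated iteratively across the grid according to the LPV technique proposed by [Kaplanyan and Dachsbacher 10]. Basically, each VPL cell propagates its light to its six surrounding neighbor cells along the three axes of a Cartesian coordinate system. During propagation, the voxel grid from the first step is used to determine how strongly the light transport to the neighbor cells is occluded. The result of the first propagation step is then used to perform a second propagation step, and so on, until we get a visually satisfying light distribution. The first iteration uses no occlusion so that the light can initially spread; from the second iteration on, geometry occlusion is used to avoid light leaking.

The iterative propagation is performed in DirectX 11 with a compute shader, since the rasterization pipeline is not needed for this job. Listing 7.3 demonstrates this.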

// compute shader
Texture2DArray inputRedSHTexture: register(t0);
Texture2DArray inputGreenSHTexture: register(t1);
Texture2DArray inputBlueSHTexture: register(t2);
StructuredBuffer < VOXEL > gridBuffer: register(t3);
RWTexture2DArray <float4> outputRedSHTexture: register(u0);
RWTexture2DArray <float4> outputGreenSHTexture: register(u1);
RWTexture2DArray <float4> outputBlueSHTexture: register(u2);

// Directions to the centers of the six neighbor cells
static float3 directions[6] =
{
    float3(0.0f,0.0f,1.0f), float3(1.0f,0.0f,0.0f), float3(0.0f,0.0f, -1.0f),
    float3( -1.0f ,0.0f,0.0f), float3(0.0f,1.0f,0.0f), float3(0.0f, -1.0f ,0.0f)
};

// SH-coefficients for the six faces (ClampedCosineSHCoeffs(directions[0-5]))
static float4 faceCoeffs[6] =
{
    float4(PI/(2*sqrt(PI)),0.0f,((2*PI)/3.0f)*sqrt(3.0f/(4*PI)),0.0f),
    float4(PI/(2*sqrt(PI)),0.0f,0.0f,-((2*PI)/3.0f)*sqrt(3.0f/(4*PI))),
    float4(PI/(2*sqrt(PI)),0.0f,-((2*PI)/3.0f)*sqrt(3.0f/(4*PI)),0.0f),
    float4(PI/(2*sqrt(PI)),0.0f,0.0f,((2*PI)/3.0f)*sqrt(3.0f/(4*PI))),
    float4(PI/(2*sqrt(PI)), -((2*PI )/3.0f)*sqrt(3.0f/(4*PI)),0.0f,0.0f),
    float4(PI/(2*sqrt(PI)),((2*PI)/3.0f)*sqrt(3.0f/(4*PI)),0.0f,0.0f)
};

// Offsets to the centers of the six neighbor cells
static int3 offsets[6] =
{
    int3(0,0,1), int3(1,0,0), int3(0,0,-1),
    int3(-1,0,0), int3(0,1,0), int3(0,-1,0)
};

[numthreads(8,8 ,8)]
void main(uint3 GroupID: SV_GroupID ,uint3 DispatchThreadID:
        SV_DispatchThreadID, uint3 GroupThreadID: SV_GroupThreadID ,
        uint GroupIndex: SV_GroupIndex)
{
    // Get the grid position of the current cell.
    int3 elementPos = DispatchThreadID.xyz;

    // Initialize the SH-coefficients with the values of the current cell.
    float4 sumRedSHCoeffs = inputRedSHTexture.Load(int4(elementPos, 0));
    float4 sumGreenSHCoeffs = inputGreenSHTexture.Load(int4(elementPos, 0));
    float4 sumBlueSHCoeffs = inputBlueSHTexture.Load(int4(elementPos, 0));

    [unroll]
    for(int i = 0; i < 6; i++)
    {
        // Get the grid position of the six neighbor cells.
        int3 samplePos = elementPos + offsets[i];

        // Continue if the cell is out of bounds.
        if((samplePos.x < 0) || (samplePos.x > 31) || (samplePos.y < 0) ||
           (samplePos.y > 31) || (samplePos.z < 0) || (samplePos.z > 31))
            continue;

        // Load the SH-coefficients of the neighbor cell.
        float4 redSHCoeffs = inputRedSHTexture.Load(int4(samplePos, 0));
        float4 greenSHCoeffs = inputGreenSHTexture.Load(int4(samplePos, 0));
        float4 blueSHCoeffs = inputBlueSHTexture.Load(int4(samplePos, 0));
#ifdef USE_OCCLUSION
        float4 occlusionCoeffs = float4(0.0f, 0.0f, 0.0f, 0.0f);

        // Get the index of the corresponding voxel.
        int gridIndex = (samplePos.z * 1024) + (samplePos.y * 32) + samplePos.x;
        VOXEL voxel = gridBuffer[gridIndex];

        // If the voxel contains geometry information, find the normal that is
        // closest to the current direction. In this way the highest occlusion
        // can be generated. Then calculate the SH-coefficients for the
        // retrieved normal.
        if(voxel.occlusion > 0)
        {
            float dotProduct;
            float3 occlusionNormal = GetClosestNormal(voxel.normalMasks,
                                                      -directions[i], dotProduct);
            occlusionCoeffs = ClampedCosineSHCoeffs(occlusionNormal);
        }
#endif
        [unroll]
        for(int j = 0; j < 6; j++)
        {
            // Get the direction from the current neighbor cell center
            // to the face of the current cell.
            float3 neighborCellCenter = directions[i];
            float3 facePosition = directions[j] * 0.5f;
            float3 dir = facePosition - neighborCellCenter;
            float fLength = length(dir);
            dir /= fLength;

            // Get the corresponding solid angle.
            float solidAngle = 0.0f;
            if(fLength > 0.5f)
                solidAngle = (fLength >= 1.5f) ? (22.95668f / (4 * 180.0f)) :
                                                 (24.26083f / (4 * 180.0f));

            // Calculate the SH-coefficients for the direction.
            float4 dirSH;
            dirSH.x = 1.0f / (2 * sqrt(PI));
            dirSH.y = -sqrt(3.0f / (4 * PI));
            dirSH.z = sqrt(3.0f / (4 * PI));
            dirSH.w = -sqrt(3.0f / (4 * PI));
            dirSH.wyz *= dir;

            // Calculate the flux from the neighbor cell to the face of the current cell.
            float3 flux;
            flux.r = dot(redSHCoeffs, dirSH);
            flux.g = dot(greenSHCoeffs, dirSH);
            flux.b = dot(blueSHCoeffs, dirSH);
            flux = max(0.0f, flux) * solidAngle;
#ifdef USE_OCCLUSION
            // Apply occlusion.
            float occlusion = 1.0f - saturate(dot(occlusionCoeffs, dirSH));
            flux *= occlusion;
#endif
            // Add the contribution to the SH-coefficient sums.
            float4 coeffs = faceCoeffs[j];
            sumRedSHCoeffs += coeffs * flux.r;
            sumGreenSHCoeffs += coeffs * flux.g;
            sumBlueSHCoeffs += coeffs * flux.b;
        }
    }

    // Write out the generated red, green, and blue SH-coefficients.
    outputRedSHTexture[elementPos] = sumRedSHCoeffs;
    outputGreenSHTexture[elementPos] = sumGreenSHCoeffs;
    outputBlueSHTexture[elementPos] = sumBlueSHCoeffs;
}
// Listing 7.3. Propagation of VPLs.

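For each propagation step the compute shader is dispatched with 4×4×4 thread groups of 8×8×8 threads each, so that altogether 32×32×32 threads are used, which corresponds to the total cell count of the grid.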
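
The face geometry inside the inner loop of Listing 7.3 can be checked with a small CPU-side sketch. `face_solid_angle` is an illustrative name; the constants and distance thresholds are copied from the shader, and the reading of the three cases in the comments is my interpretation of that code.

```python
import math

DIRECTIONS = [(0, 0, 1), (1, 0, 0), (0, 0, -1), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

def face_solid_angle(i, j):
    """Solid-angle weight for light from neighbor i reaching face j of the current cell."""
    # Vector from the neighbor cell center to the center of face j (cell size 1).
    delta = [0.5 * fj - ni for fj, ni in zip(DIRECTIONS[j], DIRECTIONS[i])]
    length = math.sqrt(sum(c * c for c in delta))
    if length <= 0.5:
        return 0.0  # face shared with the neighbor: no contribution
    # Distance 1.5 is the face opposite the neighbor; the four side faces lie closer.
    return (22.95668 if length >= 1.5 else 24.26083) / (4 * 180.0)

for j in range(6):
    print(j, face_solid_angle(0, j))
```

For every neighbor exactly one face is skipped, one face (the far one) uses the smaller constant, and the four side faces use the larger one.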

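Applying the Indirect Illumination

In this step the previously propagated VPLs are finally applied to the scene. For this we need a depth buffer, from which the world-space position of the visible pixels can be reconstructed, and a normal buffer that contains the perturbed normal of each pixel. Deferred rendering as the direct illumination approach is obviously well suited here, since both pieces of information are already available and no extra work is required.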

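While rendering a full-screen quad, the pixel shader reconstructs the world-space position and the normal of each pixel. Based on the world-space position, the previously generated grid (in the form of three 2D texture arrays) is sampled with linear hardware filtering. Since hardware filtering only covers the two dimensions within a slice, the filtering in the third dimension is done manually to get smooth results. With the sampled SH-coefficients and the surface normal we can perform SH lighting for each pixel. See Listing 7.4 for details.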

// pixel shader
Texture2DArray inputRedSHTexture : register(t0);
Texture2DArray inputGreenSHTexture : register(t1);
Texture2DArray inputBlueSHTexture : register(t2);

PS_OUTPUT main(GS_OUTPUT input)
{
	PS_OUTPUT output;
	float depth = depthBuffer.Sample(depthBufferSampler, input.texCoords).r;
	float4 position = ReconstructPositionFromDepth(depth);
	float3 albedo = colorMap.Sample(colorMapSampler, input.texCoords).rgb;
	float3 normal =
		normalBuffer.Sample(normalBufferSampler, input.texCoords).xyz;
	// Get offset into the grid.
	float3 offset = (position.xyz - constBuffer.snappedGridCenter) *
		constBuffer.invGridCellSize;

	// Get texCoords into the Texture2DArray.
	float3 texCoords = float3(16.5f, 16.5f, 16.0f) + offset;
	texCoords.xy /= 32.0f;

	// Get texCoords for trilinear sampling.
	int lowZ = floor(texCoords.z);
	int highZ = min(lowZ + 1, 32 - 1);
	float highZWeight = texCoords.z - lowZ;
	float lowZWeight = 1.0f - highZWeight;
	float3 tcLow = float3(texCoords.xy, lowZ);
	float3 tcHigh = float3(texCoords.xy, highZ);

	// Perform trilinear sampling of the red, green, and blue SH-coefficients
	// from the Texture2DArray.
	float4 redSHCoeffs =
		lowZWeight * inputRedSHTexture.Sample(linearSampler, tcLow) +
		highZWeight * inputRedSHTexture.Sample(linearSampler, tcHigh);
	float4 greenSHCoeffs =
		lowZWeight * inputGreenSHTexture.Sample(linearSampler, tcLow) +
		highZWeight * inputGreenSHTexture.Sample(linearSampler, tcHigh);
	float4 blueSHCoeffs =
		lowZWeight * inputBlueSHTexture.Sample(linearSampler, tcLow) +
		highZWeight * inputBlueSHTexture.Sample(linearSampler, tcHigh);

	// Calculate the clamped cosine lobe SH-coefficients for the surface normal.
	float4 surfaceNormalLobe = ClampedCosineSHCoeffs(normal);

	// Perform diffuse SH lighting.
	float3 diffuseGlobalIllum;
	diffuseGlobalIllum.r = dot(redSHCoeffs, surfaceNormalLobe);
	diffuseGlobalIllum.g = dot(greenSHCoeffs, surfaceNormalLobe);
	diffuseGlobalIllum.b = dot(blueSHCoeffs, surfaceNormalLobe);
	diffuseGlobalIllum = max(diffuseGlobalIllum, float3(0.0f, 0.0f, 0.0f));
	diffuseGlobalIllum /= PI;
	output.fragColor = float4(diffuseGlobalIllum * albedo, 1.0f);
	return output;
}
// Listing 7.4. Indirect illumination.
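
The manual filtering along the third dimension in Listing 7.4 amounts to a linear blend of two neighboring slices. A minimal sketch in Python, with `z_filter_weights` and `blend` as illustrative names and one scalar per slice standing in for the bilinearly filtered texture sample:

```python
import math

def z_filter_weights(tex_z, res=32):
    """Return (low slice, high slice, low weight, high weight) for a slice coordinate."""
    low_z = math.floor(tex_z)
    high_z = min(low_z + 1, res - 1)  # clamp at the last slice, as in the shader
    high_w = tex_z - low_z
    return low_z, high_z, 1.0 - high_w, high_w

def blend(slice_values, tex_z):
    """Blend per-slice values; each value stands for one filtered 2D sample."""
    low_z, high_z, low_w, high_w = z_filter_weights(tex_z, len(slice_values))
    return low_w * slice_values[low_z] + high_w * slice_values[high_z]

print(z_filter_weights(10.25))
print(blend([0.0, 4.0], 0.5))
```

Together with the hardware's bilinear filtering inside each slice, this two-sample blend gives the trilinear result that a real 3D texture would provide directly.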

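As described above, only the diffuse indirect illumination is calculated. However, a dominant light source can be extracted from the SH-coefficients [Sloan 08]. With this extracted light source, conventional specular lighting can be performed. This gives us a fast but coarse approximation of specular indirect illumination.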
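
One possible way to extract such a dominant direction is to use the linear SH band, following the idea in [Sloan 08]. The sketch below is an assumption on my part rather than the article's code: the function name, the luminance weights (0.3, 0.59, 0.11), and the sign handling (derived from the coefficient order used by `ClampedCosineSHCoeffs`) are all illustrative.

```python
import math

def dominant_light_direction(red_sh, green_sh, blue_sh):
    """Estimate a dominant direction from three sets of second-order SH coefficients.

    Each argument is a 4-tuple laid out like the shader's float4:
    (constant, -k*y, k*z, -k*x). Returns a unit vector or None.
    """
    # Collapse RGB to a single luminance-like set of coefficients (assumed weights).
    lum = [0.3 * r + 0.59 * g + 0.11 * b
           for r, g, b in zip(red_sh, green_sh, blue_sh)]
    # Undo the sign convention of the linear band to recover (x, y, z).
    d = (-lum[3], -lum[1], lum[2])
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        return None  # purely ambient: no preferred direction
    return tuple(c / length for c in d)

# Coefficients of a lobe pointing along +x, identical in all three channels.
c0 = math.pi / (2.0 * math.sqrt(math.pi))
c1 = ((2.0 * math.pi) / 3.0) * math.sqrt(3.0 / (4.0 * math.pi))
lobe_x = (c0, 0.0, 0.0, -c1)
print(dominant_light_direction(lobe_x, lobe_x, lobe_x))
```

The returned vector can then be fed into an ordinary specular term; a matching light color would have to be derived separately.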

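Clearing the Voxel Grid

In this final step the RWStructuredBuffer used for the voxel grid is cleared with a simple compute shader. As in the propagation step, the compute shader is dispatched with 4×4×4 thread groups, each running 8×8×8 threads, so that 32×32×32 threads are used in total, which corresponds to the total voxel count of the grid.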

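Handling Large Environments

Since we use a 32×32×32 voxel grid, it is clear that this alone cannot handle realistic, large game environments. Following the cascaded approach of the LPV technique proposed by [Kaplanyan and Dachsbacher 10], several nested voxel grids can be used. Each grid has the same number of cells, but the size of the grid cells increases. In this way detailed indirect illumination is maintained near the viewer, while a sufficiently coarse indirect illumination is used in the distance. To avoid a hard transition at the borders between the grids, however, the global illumination values should be linearly interpolated in the step where the indirect illumination is applied to each visible pixel. In the border region, the global illumination value is computed for both adjacent grids. The distance from the current pixel's world-space position to the center of the finer grid can be used to interpolate linearly between the two global illumination values.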
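
The blend between two nested grids might look as follows. The article gives no formula, so the parameters `fine_radius` and `blend_width` and the function name `blend_grids` are hypothetical; only the idea of a distance-based linear interpolation comes from the text.

```python
def blend_grids(fine_gi, coarse_gi, dist_to_fine_center, fine_radius, blend_width):
    """Linearly interpolate between fine-grid and coarse-grid global illumination.

    Inside (fine_radius - blend_width) only the fine grid is used; at fine_radius
    and beyond only the coarse grid is used; in between the two are mixed.
    """
    start = fine_radius - blend_width
    t = min(max((dist_to_fine_center - start) / blend_width, 0.0), 1.0)
    return tuple((1.0 - t) * f + t * c for f, c in zip(fine_gi, coarse_gi))

fine, coarse = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
for dist in (0.0, 14.0, 16.0):
    print(dist, blend_grids(fine, coarse, dist, fine_radius=16.0, blend_width=4.0))
```

Only pixels inside the blend band need both grids sampled; everywhere else one lookup is enough.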

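Results

Table 7.1 shows the performance results of the proposed technique in the Sponza scene. All lights are fully dynamic and contribute to the indirect illumination.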

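Conclusion
By utilizing new DirectX 11 features, this technique is capable of producing visually good and stable results for a high number of light sources contained in realistic game environments while maintaining high interactive frame rates. Moreover, the video memory consumption is kept at a low level.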

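However, besides the fact that the technique only runs on DirectX 11 or higher hardware, it also requires depth and normal buffers to reconstruct the position and normal of the visible pixels. It is therefore best suited for direct illumination approaches such as deferred rendering.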

References

[Kaplanyan and Dachsbacher 10] Anton Kaplanyan and Carsten Dachsbacher. “Cascaded Light Propagation Volumes for Real-Time Indirect Illumination.” In Symposium on Interactive 3D Graphics and Games (I3D), pp. 99–107. New York: ACM, 2010.

[Mavridis and Papaioannou 11] Pavlos Mavridis and Georgios Papaioannou. “Global Illumination Using Imperfect Volumes.” Presentation, International Conference on Computer Graphics Theory and Applications (GRAPP), Algarve, Portugal, 2011.

[Sloan 08] Peter-Pike Sloan. “Stupid Spherical Harmonics (SH) Tricks.” Companion to Game Developers Conference 2008 lecture, http://www.ppsloan.org/publications/StupidSH36.pdf, February 2008.