CS001-32-ddgi

wodownload2

Last modified 2024-07-26 16:05:52

Tags: unity

First published 2024-07-26 11:26:47

Article link: https://blog.csdn.net/wodownload2/article/details/140711104

Overview

DDGI stands for Dynamic Diffuse Global Illumination with Ray-Traced Irradiance Fields: dynamic diffuse global illumination built on a ray-traced irradiance field.

DDGI

This article mainly follows:
https://github.com/SanYue-TechArt/RTXGI-DDGI-URP (code reference)
https://zhuanlan.zhihu.com/p/404520592
The code above contains the following passes:
DDGI Ray Trace Pass
DDGI Update Irradiance Pass
DDGI Update Distance Pass
DDGI Relocate Probe Pass
DDGI Classify Probe Pass

Ray tracing stage

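Each probe shoots a specified number of rays in random directions. Each ray is intersected with the scene geometry; on a hit, data about the hit point is recorded: position, normal, albedo and emission. This data is finally combined into a float4(r,g,b,a), where rgb is the radiance and a is the distance from the ray origin to the hit point.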

// Primary Tracing.
DDGIPayload primaryPayload = GetPrimaryPayload();
TraceRay(_AccelerationStructure, RAY_FLAG_NONE, 0xFF, 0, 0, 0, rayDesc, primaryPayload);

DDGIPayload and its initializer are defined as:

DDGIPayload GetPrimaryPayload()
{
   const DDGIPayload payload = (DDGIPayload) 0;
   return payload;
}

struct DDGIPayload
{
   // For recursive shadow ray tracing.
   bool isShadowPayload;
   bool isInShadow;

   // Ray tracing api data.
   float  distance;
   uint   hitKind;
   float3 worldRayDirection;

   // Ray miss (sky evaluate)
   bool   isMissed;
   float3 skySample;

   // Intersection geometry and brdf data.
   float3 worldPos;
   float3 worldNormal;
   float3 albedo;
   float3 emission;
};

The radiance and distance are packed into a float4 and written to RayBuffer.

// Record the tracing result of each ray in RayBuffer
RayBuffer[rayBufferIdx] = float4(radiance, distance);

The radiance of a ray is the sum of: directional lights + punctual lights + emission + previous frame
Diffuse from directional lights:

[loop]
for(int i = 0; i < _DirectionalLightCount; ++i)
{
    Light dir_light = GetDDGIDirectionalLight(i);
    
    RayDesc shadowRayDesc;
    shadowRayDesc.Origin    = P;
    shadowRayDesc.Direction = dir_light.direction;
    shadowRayDesc.TMin      = 1e-1f;
    shadowRayDesc.TMax      = FLT_MAX;
    
    DDGIPayload shadowPayload = GetShadowPayload();
    TraceRay(_AccelerationStructure, RAY_FLAG_NONE, 0xFF, 0, 1, 0, shadowRayDesc, shadowPayload);
    
    if(shadowPayload.isInShadow) dir_light.shadowAttenuation = 0.0f;
    
    // Accumulate Radiance.
    radiance += dir_light.color * dir_light.shadowAttenuation * 
            saturate(dot(dir_light.direction, N)) * Lambert() * albedo;
}

Diffuse from the punctual lights:

[loop]
for(int j = 0; j < _PunctualLightCount; ++j)
{
    Light punctual_light = GetDDGIPunctualLight(j, P);
    
    float3 lightPos = PunctualLightBuffer[j].position.xyz;
    
    RayDesc shadowRayDesc;
    shadowRayDesc.Origin    = P;
    shadowRayDesc.Direction = punctual_light.direction;
    shadowRayDesc.TMin      = 1e-1f;
    shadowRayDesc.TMax      = length(lightPos - P);
    
    DDGIPayload shadowPayload = GetShadowPayload();
    TraceRay(_AccelerationStructure, RAY_FLAG_NONE, 0xFF, 0, 1, 0, shadowRayDesc, shadowPayload);
    
    if(shadowPayload.isInShadow) punctual_light.shadowAttenuation = 0.0f;
    
    // Accumulate Radiance.
    radiance += punctual_light.color * punctual_light.shadowAttenuation * punctual_light.distanceAttenuation *
        saturate(dot(punctual_light.direction, N)) * Lambert() * albedo;
}
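
The two light loops above share one formula: light color, times shadow and distance attenuation, times the clamped cosine, times the Lambert term, times albedo. A minimal CPU-side sketch in Python (not the repository's shader code) makes that explicit. It assumes Lambert() returns 1/π, and diffuse_contribution is a hypothetical helper name used only for illustration.

```python
import math

def lambert():
    # Assumption: the shader's Lambert() is the 1/pi normalization of a diffuse BRDF.
    return 1.0 / math.pi

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def saturate(x):
    return max(0.0, min(1.0, x))

def diffuse_contribution(light_color, light_dir, normal, albedo,
                         shadow_atten=1.0, distance_atten=1.0):
    """Radiance added by one light at a diffuse hit point (hypothetical helper)."""
    n_dot_l = saturate(dot(light_dir, normal))
    scale = shadow_atten * distance_atten * n_dot_l * lambert()
    return tuple(c * scale * a for c, a in zip(light_color, albedo))
```

For a directional light, distance_atten stays at 1.0; a shadowed hit passes shadow_atten=0.0 and contributes nothing.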

Emission

radiance += primaryPayload.emission;

Previous frame

radiance += (min(albedo, float3(0.9f, 0.9f, 0.9f)) * Lambert()) * 
SampleDDGIIrradiance(P, N, primaryPayload.worldRayDirection);

SampleDDGIIrradiance is the core code here.

Probe irradiance update stage

https://zhuanlan.zhihu.com/p/507469990
The ray tracing results from the previous step, RayData(radiance, distance), are accumulated over time and stored in two 2D textures.

// ------------------------------
// Irradiance Evaluation
// ------------------------------
while (remainingRays > 0)
{
    uint numRays = min(CACHE_SIZE, remainingRays);
    if(GroupIndex < numRays)
    {
        RadianceCache[GroupIndex]  = RayBuffer[probeIndex * _MaxRaysPerProbe + offset + GroupIndex].rgb;
        DirectionCache[GroupIndex] = DDGIGetProbeRayDirection(offset + GroupIndex);
    }
    GroupMemoryBarrierWithGroupSync();

    for(uint i = 0; i < numRays; ++i)
    {
        float3 radiance  = RadianceCache[i].rgb;
        float3 direction = DirectionCache[i];
        float  weight    = saturate(dot(probeDirection, direction));
        
        result  += float4(weight * radiance, weight);
    }

    remainingRays -= numRays;
    offset        += numRays;
}

result holds the accumulated radiance. Many steps are omitted here, such as the comparison with the history frame. Finally:

_ProbeIrradiance[texelLocation] = float4(result.rgb, 1);
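
Two of the steps skipped above are normalizing the weighted sum and blending it with the previous frame. The Python sketch below shows one simplified way to do both. It is an assumption about the general DDGI approach, not a copy of the repository's shader: the hysteresis value 0.97 and both function names are hypothetical, and any encoding or thresholding a real implementation applies is left out.

```python
def normalize_irradiance(weighted_sum, weight_sum, eps=1e-9):
    """Divide the accumulated weight * radiance by the accumulated weight."""
    inv = 1.0 / max(weight_sum, eps)
    return tuple(c * inv for c in weighted_sum)

def blend_with_history(current, history, hysteresis=0.97):
    """Keep `hysteresis` of the previous value and take the rest from the new one."""
    return tuple(h * hysteresis + c * (1.0 - hysteresis)
                 for c, h in zip(current, history))
```

A hysteresis close to 1 makes the probe react slowly but suppresses noise from the random ray directions; a lower value reacts faster to lighting changes.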

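Because of the border, some directions are stored in more than one texel; the border texels are copied from interior texels: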

// ------------------------------
// Irradiance Border Update
// ------------------------------
for (uint index = GroupIndex; index < BORDER_TEXELS; index += PROBE_IRRADIANCE_TEXELS * PROBE_IRRADIANCE_TEXELS)
{
    const uint3 sourceIndex = uint3(cornerTexelLocation + BORDER_OFFSETS[index].xy, cornerTexelLocation.z);
    const uint3 targetIndex = uint3(cornerTexelLocation + BORDER_OFFSETS[index].zw, cornerTexelLocation.z);
    _ProbeIrradiance[targetIndex] = _ProbeIrradiance[sourceIndex];
}

Probe distance update stage

Compute the dot product of the two directions and raise it to a power to get the weight.

float weight = saturate(dot(probeDirection, direction));
weight       = pow(weight, 64);

weightSum   += weight;
_ProbeDistance[texelLocation] = result;
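
The fragment above only shows the weight. A fuller picture, sketched in Python under stated assumptions: each distance texel stores the weighted mean distance and the weighted mean squared distance, which is what a later Chebyshev test needs. That two-channel layout and the name update_distance_texel are assumptions for illustration; the repository's shader may differ.

```python
def update_distance_texel(texel_dir, rays, exponent=64):
    """rays: iterable of (direction, hit_distance) pairs for one probe.
    Returns (mean distance, mean squared distance) for this texel."""
    sum_d = 0.0
    sum_d2 = 0.0
    weight_sum = 0.0
    for direction, dist in rays:
        cos_theta = sum(a * b for a, b in zip(texel_dir, direction))
        # A high exponent keeps only rays closely aligned with the texel direction.
        weight = max(0.0, min(1.0, cos_theta)) ** exponent
        sum_d += weight * dist
        sum_d2 += weight * dist * dist
        weight_sum += weight
    if weight_sum <= 0.0:
        return (0.0, 0.0)
    return (sum_d / weight_sum, sum_d2 / weight_sum)
```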

Probe relocation stage

This step mainly prevents light leaking: relocation moves probes that are stuck inside a wall to the outside of the wall. The algorithm is described in:
https://zhuanlan.zhihu.com/p/698127347
https://zhuanlan.zhihu.com/p/507469990

Probe classification stage

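https://zhuanlan.zhihu.com/p/698757068
Probe classification disables unimportant probes so that they do not take part in the GI computation, which saves performance.
So what counts as an unimportant probe?
The default implementation in the NVIDIA SDK takes this view: every probe has a voxel cell that it influences. If that cell contains no geometry, nothing will ever use the data for it, so the probe belonging to that cell can be disabled. If an object moves into the cell in a later frame, the probe resumes updating. The rough steps are:
1) First enable all probes; each probe shoots rays to query the scene geometry and obtain distance data, as before.
2) Compare each probe's distance data against its voxel cell. If every ray the probe shot returned a distance larger than the cell's edge length, the cell contains no valid geometry, and the probe is marked as disabled.
3) In later frames a disabled probe shoots only a small number of fixed rays (fixed rays are the ones used to stabilize probe relocation; see the referenced article for their definition) to check whether geometry has entered its cell. If geometry is detected, the probe resumes updating and shoots its full set of rays again.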

if(hitDistance <= maxDistance)
{
    float4 rawData           = _ProbeData[outputCoords];
    _ProbeData[outputCoords] = float4(rawData.xyz, DDGI_PROBE_STATE_ACTIVE);
    return;
}
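
The rule in step 2 can be sketched on the CPU in a few lines of Python. This is an illustration of the description above, not the repository's shader: the numeric state values and the function name classify_probe are hypothetical.

```python
# Hypothetical state values; the shader's DDGI_PROBE_STATE_* constants may differ.
PROBE_STATE_ACTIVE = 0
PROBE_STATE_INACTIVE = 1

def classify_probe(hit_distances, max_distance):
    """Active if any ray hit geometry within the probe's cell extent."""
    for hit_distance in hit_distances:
        if hit_distance <= max_distance:
            return PROBE_STATE_ACTIVE
    return PROBE_STATE_INACTIVE
```

The early return mirrors the shader fragment above: the first ray that lands inside the cell is enough to keep the probe active.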

Query stage

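This is the stage that uses the probe data computed above; the core of it is the weight computation.
https://zhuanlan.zhihu.com/p/507469990
For a pixel, look up the data of the 8 surrounding probes, then take a weighted sum of the probes' contributions.
A probe's weight has three parts: direction weight, trilinear weight and Chebyshev weight.
Direction weight: the dot product of the normal and the direction to the probe
Trilinear weight: a weight based on the distance between the probe and the pixel
Chebyshev weight: https://zhuanlan.zhihu.com/p/404520592

Chebyshev's inequality is used to estimate the probability that an occluder lies between the probe and the pixel, following the VSM (variance shadow map) algorithm.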
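
Two of the three weights can be sketched in Python as follows. This is a hedged illustration rather than the repository's SampleDDGIIrradiance: it assumes the distance texture provides a mean and a mean squared distance per direction, cubes the Chebyshev bound as a commonly used sharpening step, and uses hypothetical function names.

```python
def trilinear_weights(alpha):
    """alpha: the pixel's position inside its probe cell, each component in [0, 1].
    Returns a dict mapping each of the 8 corner offsets to its weight."""
    weights = {}
    for i in range(8):
        offset = (i & 1, (i >> 1) & 1, (i >> 2) & 1)
        w = 1.0
        for a, o in zip(alpha, offset):
            w *= a if o else (1.0 - a)
        weights[offset] = w
    return weights

def chebyshev_weight(dist_to_probe, mean, mean_sq):
    """Upper bound on the chance that the probe sees the pixel (VSM-style)."""
    if dist_to_probe <= mean:
        return 1.0  # closer than the average occluder: treated as fully visible
    variance = abs(mean * mean - mean_sq)
    diff = dist_to_probe - mean
    p = variance / (variance + diff * diff)
    return max(p ** 3, 0.0)
```

The final weight of a probe would be the product of the direction, trilinear and Chebyshev terms, and the weighted sum is divided by the total weight.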

LitRayTracingForwardPass.hlsl

// Indirect Lighting Evaluate.
float3 indirectRadiance = SampleDDGIIrradiance(inputData.positionWS, inputData.normalWS, 
										-inputData.viewDirectionWS);
float3 indirectLighting = surfaceData.albedo * Lambert() * indirectRadiance * 
							aoFactor.indirectAmbientOcclusion;
#ifdef DDGI_SHOW_INDIRECT_ONLY
    color.rgb           = indirectLighting;
#elif DDGI_SHOW_PURE_INDIRECT_RADIANCE
    color.rgb           = indirectRadiance;
#else
    color.rgb           += indirectLighting;
#endif
return color;

Ray tracing

https://www.bilibili.com/read/cv8375716/
https://developer.nvidia.com/rtx/raytracing/dxr/DX12-Raytracing-tutorial-Part-2
https://www.zhihu.com/question/503182026/answer/3208956976
https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#worldraydirection (reference for the API's built-in functions)

DispatchRaysIndex

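Built-in function: DispatchRaysIndex
Whether in a compute shader or a ray tracing shader, each invocation needs to know which texel or work item it is processing. In DDGI_RayGen, how is the index of the ray currently being processed computed?
With the built-in function DispatchRaysIndex(), which returns a uint3.
The meaning of its x, y and z components is defined on the C# side, as follows: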


cmd.DispatchRays(mDDGIRayTraceShader, GpuParams.RayGenShaderName, 
    (uint)mDDGIVolumeCpu.NumRays, 
    (uint)numProbesFlat, 
    1, camera);

width=(uint)mDDGIVolumeCpu.NumRays
height=(uint)numProbesFlat
depth=1
So, looking at the code:

uint probeIdx       = DispatchRaysIndex().y;
uint rayIdx         = DispatchRaysIndex().x;
uint rayBufferIdx   = probeIdx * _MaxRaysPerProbe + rayIdx;

DispatchRaysIndex().x is therefore the index of the ray within a probe, and DispatchRaysIndex().y is the index of the probe; rayBufferIdx is then computed as probeIdx * _MaxRaysPerProbe + rayIdx. _MaxRaysPerProbe is set to 512 from C#: mDDGIVolumeCpu.MaxNumRays = 512.
This index is the slot each ray writes its data to:

// Record the tracing result of each ray in RayBuffer
RayBuffer[rayBufferIdx] = float4(radiance, distance);
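
The index arithmetic is easy to check on the CPU. A small Python sketch, using the 512 rays per probe mentioned above; both function names are hypothetical:

```python
def ray_buffer_index(dispatch_index, max_rays_per_probe=512):
    """dispatch_index: (x, y, z) as returned by DispatchRaysIndex()."""
    ray_idx, probe_idx, _ = dispatch_index
    return probe_idx * max_rays_per_probe + ray_idx

def dispatch_index_from(ray_buffer_idx, max_rays_per_probe=512):
    """Inverse mapping: recover (ray index, probe index, 0) from a buffer slot."""
    return (ray_buffer_idx % max_rays_per_probe,
            ray_buffer_idx // max_rays_per_probe,
            0)
```

Because the stride is the maximum ray count rather than the current NumRays, each probe always owns a fixed 512-slot range even when fewer rays are dispatched.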

RayTracingAccelerationStructure

Acceleration structure
RayTracingAccelerationStructure: "a data structure used to represent the geometry in the scene for gpu ray tracing"
It is created in C# and passed to the shader, in the same way as ordinary data.
Creation in C#:

// create
RayTracingAccelerationStructure.RASSettings setting = new RayTracingAccelerationStructure.RASSettings
    (RayTracingAccelerationStructure.ManagementMode.Automatic, RayTracingAccelerationStructure.RayTracingModeMask.Everything,  255);
mAccelerationStructure = new RayTracingAccelerationStructure(setting);

// transfer to shader
cmd.BuildRayTracingAccelerationStructure(mAccelerationStructure);
cmd.SetRayTracingAccelerationStructure(mDDGIRayTraceShader, GpuParams._AccelerationStructure, mAccelerationStructure);

Usage in the shader:

TraceRay(_AccelerationStructure, RAY_FLAG_NONE, 0xFF, 0, 0, 0, rayDesc, primaryPayload);

RayDesc

Built-in data type; reference: https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#worldraydirection
Its layout is:

struct RayDesc
{
    float3 Origin;
    float TMin;
    float3 Direction;
    float TMax;
};

DDGIPayload

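A user-defined data structure. This confused me at first: because it is passed directly into TraceRay, I assumed it had to follow a prescribed layout, but it turns out the structure can be defined freely.
If the payload structure is user-defined, when are its fields assigned? Answering that requires understanding how TraceRay executes.
Step 1: generate the rays. The article https://developer.nvidia.com/rtx/raytracing/dxr/DX12-Raytracing-tutorial-Part-2 calls this stage the ray generation program.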
The ray generation program, that will be the starting point of the raytracing, and called for each pixel: it will typically initialize a ray descriptor RayDesc starting at the location of the camera, in a direction given by evaluating the camera lens model at the pixel location. It will then invoke TraceRay(), that will shoot the ray in the scene. Other shaders below will process further events, and return their result to the ray generation shader through the ray payload.
Step 2: handle the case with no intersection. This stage is called the miss shader.
The miss shader is executed when a ray does not intersect any geometry. It can typically sample an environment map, or return a simple color through the ray payload.
Step 3: handle the closest intersection. This stage is called the closest hit shader.
The closest hit shader is called upon hitting a the geometric instance closest to the starting point of the ray. This shader can for example perform lighting calculations, and return the results through the ray payload. There can be as many closest hit shaders as needed, in the same spirit as a rasterization-based application has multiple pixel shaders depending on the objects.
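There are also two optional shaders, the intersection shader and the any hit shader, which are not covered here.
The execution flow of the shaders above is: the ray generation shader calls TraceRay(); the ray traverses the acceleration structure, running the optional intersection and any hit shaders along the way; then either the closest hit shader or the miss shader runs and returns its result through the payload.

In the source code at https://github.com/SanYue-TechArt/RTXGI-DDGI-URP, the payload is filled in the DDGIRayTracing pass of LitRayTracing.shader.
In other words, the DXR framework only handles hardware-accelerated ray traversal; what happens on a hit, on a miss and so on is left for the graphics programmer to implement.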

HitKind

Built-in function: HitKind
From https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html:
It returns a uint. For a hit on a front face it returns HIT_KIND_TRIANGLE_FRONT_FACE, whose value is 254; for a hit on a back face it returns HIT_KIND_TRIANGLE_BACK_FACE, whose value is 255.

RayTCurrent

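Built-in function: RayTCurrent
It returns a float: the distance from the ray origin to the hit point, measured along the ray (in world units when Direction is normalized). According to https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html, the hit point is computed as Origin+(Direction*RayTCurrent).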
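
As a worked example of that formula, a short Python sketch (hit_position is a hypothetical helper name):

```python
def hit_position(origin, direction, t):
    """World-space hit point: Origin + Direction * RayTCurrent."""
    return tuple(o + d * t for o, d in zip(origin, direction))
```

With origin (1, 2, 3), direction (0, 0, 1) and t = 2.5, the hit point is (1, 2, 5.5).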

WorldRayDirection

Built-in function: WorldRayDirection()
The direction of the ray in world space.

DDGI implementation details

Probe grid

The probes form a uniform volume whose probe counts along the three dimensions are (xNum,yNum,zNum).

cmd.DispatchCompute(mUpdateIrradianceCS, mUpdateIrradianceKernel, 
    mDDGIVolumeCpu.NumProbes.x, 
    mDDGIVolumeCpu.NumProbes.z, 
    mDDGIVolumeCpu.NumProbes.y);

If the volume is split along the y axis, each slice contains xNum*zNum probes; if it is split along the z axis, each slice contains xNum*yNum probes.
The corresponding code:

// Get the number of probes in each slice along the vertical axis
int DDGIGetProbesPerPlane(int3 probeCounts)
{
#if 1
    // Left or Right Y-UP
    return (probeCounts.x * probeCounts.z);
#elif 0
    // Left or Right Z-UP
    return (probeCounts.x * probeCounts.y);
#endif
}

A probe's position can be described either by 3D coordinates (x,y,z) or by a single int index, which is computed as follows:

int DDGIGetProbeIndex(int3 probeCoords)
{
    int probesPerPlane      = DDGIGetProbesPerPlane(_ProbeCount);
    int planeIndex          = DDGIGetPlaneIndex(probeCoords);
    int probeIndexInPlane   = DDGIGetProbeIndexInPlane(probeCoords, _ProbeCount);

    return (planeIndex * probesPerPlane) + probeIndexInPlane;
}

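The argument is a 3D coordinate inside the volume.
DDGIGetProbesPerPlane, described above, returns the number of probes in each slice.
DDGIGetPlaneIndex returns the index of the plane the probe lies in: the y component when splitting along y, the z component when splitting along z.
DDGIGetProbeIndexInPlane returns the probe's index within its plane.
The probe's unique index is then (planeIndex * probesPerPlane) + probeIndexInPlane;
that is, plane index * probes per plane + index within the plane = the probe's final 1D index.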
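
The index computation and its inverse can be sketched in Python for the Y-up case. The in-plane ordering, x + xNum * z, is an assumption about what DDGIGetProbeIndexInPlane does; the source does not show that function, and all names below are hypothetical.

```python
def probes_per_plane(counts):
    """Y-up: one horizontal slice holds xNum * zNum probes."""
    return counts[0] * counts[2]

def probe_index(coords, counts):
    x, y, z = coords
    plane_index = y                      # Y-up: planes are stacked along y
    index_in_plane = x + counts[0] * z   # assumed row-major order inside a plane
    return plane_index * probes_per_plane(counts) + index_in_plane

def probe_coords(index, counts):
    """Inverse of probe_index."""
    y, rest = divmod(index, probes_per_plane(counts))
    z, x = divmod(rest, counts[0])
    return (x, y, z)
```

For a 4 x 3 x 5 volume, probe (1, 2, 3) gets index 2 * 20 + 3 * 4 + 1 = 53.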

Probe mapping

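How are the values computed from a probe's many rays mapped to a block of a 2D image?
The process is simple. Take 2*2 probes as an example, with each probe mapped to an 8*8 texel block.
[Figure: 2*2 probes, each mapped to an 8*8 texel block]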

The mapping code:

uint3 DDGIGetProbeBaseTexelCoords(int probeIndex, int numProbeInteriorTexels=6)
{
    uint3 coords = DDGIGetProbeTexelCoordsOneByOne(probeIndex);
    int numProbeTexels = numProbeInteriorTexels + 2;
    return uint3(coords.x * numProbeTexels, coords.y * numProbeTexels, coords.z);
}

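DDGIGetProbeTexelCoordsOneByOne converts the 1D index back into 3D coordinates, which is somewhat redundant, since SV_GroupID could be passed in directly. Either way, the result is the probe's coordinates in the volume.
numProbeInteriorTexels defaults to 6; one texel of padding is then added on each side, giving an 8*8 texel block.
For the figure above:
probe coords=(0,0,0) maps to a block whose top-left texel in the texture is (0,0,0)
probe coords=(0,1,0) maps to a block whose top-left texel in the texture is (0,8,0)
probe coords=(1,0,0) maps to a block whose top-left texel in the texture is (8,0,0)
probe coords=(1,1,0) maps to a block whose top-left texel in the texture is (8,8,0)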
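
The same mapping as a small Python sketch, which reproduces the four examples above (probe_base_texel is a hypothetical name standing in for the final multiplication in DDGIGetProbeBaseTexelCoords):

```python
def probe_base_texel(tile_coords, num_interior_texels=6):
    """Top-left texel of a probe's block, including its 1-texel border."""
    num_probe_texels = num_interior_texels + 2
    x, y, slice_index = tile_coords
    return (x * num_probe_texels, y * num_probe_texels, slice_index)
```

The interior 6*6 region of a block starts one texel in from this corner, at base + (1, 1).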

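Border expansion

Reference: https://zhuanlan.zhihu.com/p/507469990
Probe irradiance/distance border update: both textures are later sampled with a bilinear sampler, so each probe block needs its border filled in to keep sampling near the edges correct. The border is filled as described below.
When allocating the RT in C#, set the filter mode:
mProbeIrradiance.filterMode = FilterMode.Bilinear;
Shader:
float3 irradiance = SAMPLE_TEXTURE2D_ARRAY_LOD(
    _ProbeIrradianceHistory,
    sampler_LinearClamp,
    probeIrradianceUV.xy,
    probeIrradianceUV.z,
    0).rgb;
How is the border filled? The number of border texels is 4*6+4=28.
[Figure: border texel copy pattern]

Read the array below as follows: each entry is a uint4, where xy is the source coordinate and zw is the destination coordinate; the value at the source position is copied to the destination position.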

static const uint4 BORDER_OFFSETS[BORDER_TEXELS] = 
{
    uint4(6, 6, 0, 0), uint4(6, 1, 1, 0), uint4(5, 1, 2, 0), uint4(4, 1, 3, 0), 
    uint4(3, 1, 4, 0), uint4(2, 1, 5, 0), uint4(1, 1, 6, 0), uint4(1, 6, 7, 0),
    uint4(1, 6, 0, 1), uint4(1, 5, 0, 2), uint4(1, 4, 0, 3), uint4(1, 3, 0, 4), 
    uint4(1, 2, 0, 5), uint4(1, 1, 0, 6), uint4(6, 1, 0, 7), uint4(6, 6, 7, 1), 
    uint4(6, 5, 7, 2), uint4(6, 4, 7, 3), uint4(6, 3, 7, 4), uint4(6, 2, 7, 5), 
    uint4(6, 1, 7, 6), uint4(1, 1, 7, 7), uint4(6, 6, 1, 7), uint4(5, 6, 2, 7), 
    uint4(4, 6, 3, 7), uint4(3, 6, 4, 7), uint4(2, 6, 5, 7), uint4(1, 6, 6, 7) 
};

for (uint index = GroupIndex; index < BORDER_TEXELS; index += PROBE_IRRADIANCE_TEXELS * PROBE_IRRADIANCE_TEXELS)
{
    const uint3 sourceIndex = uint3(cornerTexelLocation + BORDER_OFFSETS[index].xy, cornerTexelLocation.z);
    const uint3 targetIndex = uint3(cornerTexelLocation + BORDER_OFFSETS[index].zw, cornerTexelLocation.z);
    _ProbeIrradiance[targetIndex] = _ProbeIrradiance[sourceIndex];
}
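
The 28 entries follow a simple rule: each edge texel copies the mirrored texel of the adjacent interior row or column, and each corner copies the diagonally opposite interior corner. The Python sketch below regenerates the pairs from that rule. The rule is inferred from the table itself rather than taken from the repository, and border_copies is a hypothetical name.

```python
def border_copies(n=6):
    """Return (source_xy, target_xy) pairs for an (n+2) x (n+2) probe block."""
    last = n + 1
    copies = []
    for i in range(1, n + 1):
        mirrored = last - i
        copies.append(((mirrored, 1), (i, 0)))      # top edge
        copies.append(((mirrored, n), (i, last)))   # bottom edge
        copies.append(((1, mirrored), (0, i)))      # left edge
        copies.append(((n, mirrored), (last, i)))   # right edge
    # Each corner copies the diagonally opposite interior corner.
    copies += [((n, n), (0, 0)), ((1, n), (last, 0)),
               ((n, 1), (0, last)), ((1, 1), (last, last))]
    return copies
```

For n = 6 this yields 4*6 edge copies plus 4 corner copies, matching the 28 entries of BORDER_OFFSETS (in a different order).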

The result of the border expansion above:
[Figure: probe texture after border expansion]

Texel direction

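The direction corresponding to one texel within a probe's 6*6 block:
float3 probeDirection =
    DecodeNormalOctahedron(((GroupThreadId.xy + 0.5f) / (float)PROBE_IRRADIANCE_TEXELS) * 2 - 1);
PROBE_IRRADIANCE_TEXELS=6. The texel is first shifted to its center by adding 0.5f to the x and y components, then divided by (6,6) to get a coordinate in (0,1), and finally remapped to (-1,1) with the usual *2-1.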
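
For reference, a Python sketch of the same computation using the standard octahedral decoding. It assumes DecodeNormalOctahedron follows that standard mapping; the repository's version may use a different axis convention, and the function names here are hypothetical.

```python
import math

def sign_not_zero(v):
    return 1.0 if v >= 0.0 else -1.0

def decode_octahedral(u, v):
    """Map a point in [-1, 1]^2 to a unit direction (standard octahedral mapping)."""
    x, y = u, v
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Fold the outer triangles of the square over onto the lower hemisphere.
        x, y = ((1.0 - abs(v)) * sign_not_zero(u),
                (1.0 - abs(u)) * sign_not_zero(v))
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

def texel_direction(tx, ty, texels=6):
    """Direction of interior texel (tx, ty), sampled at the texel center."""
    u = (tx + 0.5) / texels * 2.0 - 1.0
    v = (ty + 0.5) / texels * 2.0 - 1.0
    return decode_octahedral(u, v)
```

The center of the square decodes to +z, and texels near the corners of the block decode to directions in the -z hemisphere.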
Irradiance

while (remainingRays > 0)
{
    uint numRays = min(CACHE_SIZE, remainingRays);
    if(GroupIndex < numRays)
    {
        RadianceCache[GroupIndex]  = RayBuffer[probeIndex * _MaxRaysPerProbe + 
                                                offset + 
                                                GroupIndex].rgb;
        DirectionCache[GroupIndex] = DDGIGetProbeRayDirection(offset + GroupIndex);
    }
    GroupMemoryBarrierWithGroupSync();
    for(uint i = 0; i < numRays; ++i)
    {
        float3 radiance  = RadianceCache[i].rgb;
        float3 direction = DirectionCache[i];
        float  weight    = saturate(dot(probeDirection, direction));
        result  += float4(weight * radiance, weight);
    }
    remainingRays -= numRays;
    offset        += numRays;
}

GroupIndex ranges from 0 to 35.
GroupMemoryBarrierWithGroupSync() ensures that all 36 threads in the group, one per texel, have finished writing RadianceCache and DirectionCache before any of them enters the for loop below.
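
The batching can be simulated sequentially in Python for a single texel. In the sketch below a list slice stands in for the shared-memory fill plus the barrier; the cache size of 36 and the name accumulate_texel are assumptions for illustration.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def accumulate_texel(rays, texel_dir, cache_size=36):
    """rays: list of (radiance_rgb, direction) pairs for one probe.
    Returns (sum of weight * radiance, sum of weight), like `result` in the shader."""
    result = [0.0, 0.0, 0.0, 0.0]
    offset = 0
    remaining = len(rays)
    while remaining > 0:
        num_rays = min(cache_size, remaining)
        # Stands in for filling the caches and waiting at the group barrier.
        cache = rays[offset:offset + num_rays]
        for radiance, direction in cache:
            weight = saturate(dot(texel_dir, direction))
            for c in range(3):
                result[c] += weight * radiance[c]
            result[3] += weight
        remaining -= num_rays
        offset += num_rays
    return tuple(result)
```

The batch size only changes how many rays are staged at once, not the sum, so any cache_size gives the same result.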