DirectX 11 Advanced Tutorial: Tiled Based Deferred Shading




Point Light Computation in Traditional Forward Rendering



Traditional Deferred Rendering

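Traditional deferred rendering is simple. First, every object in the scene is rendered into several geometry textures (the G-Buffer). Then a full-screen pass reads those textures to compute the shading. A popular way to render point lights is to treat each light as a piece of sphere geometry (a light sphere volume) and rasterize it. This way, shading is only evaluated for the pixels that lie within that light's radius. The render target uses additive blending, so N lights are accumulated in N draws, which yields the combined contribution of all point lights.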
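The light-volume approach can be illustrated with a small CPU-side sketch in C++ (the host language of a D3D11 renderer, since the shaders themselves cannot run standalone). The sketch makes several simplifying assumptions:

- The toy G-Buffer stores only world positions.
- Each light has a constant intensity inside its radius.
- Rasterizing the sphere is approximated by a per-pixel distance test.

All names are hypothetical. What the sketch does show is the structure: one pass per light, a G-Buffer read for every covered pixel, and an additive blend into the target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct ToyPointLight {
    Vec3 pos;        // world-space position
    float radius;    // radius of the light volume
    float intensity; // constant contribution inside the radius (toy model)
};

float distanceSquared(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the accumulated light per pixel. shadedSamples counts the
// (pixel, light) pairs that were evaluated, i.e. G-Buffer reads + blends.
std::vector<float> shadeWithLightVolumes(const std::vector<Vec3>& gbufferPos,
                                         const std::vector<ToyPointLight>& lights,
                                         std::size_t& shadedSamples) {
    std::vector<float> target(gbufferPos.size(), 0.0f);  // cleared render target
    shadedSamples = 0;
    for (const ToyPointLight& light : lights) {           // one pass per light
        assert(light.radius >= 0.0f);
        for (std::size_t p = 0; p < gbufferPos.size(); ++p) {
            // Stand-in for sphere rasterization: skip pixels outside the radius.
            if (distanceSquared(gbufferPos[p], light.pos) > light.radius * light.radius)
                continue;
            ++shadedSamples;               // this pixel's G-Buffer is read again
            target[p] += light.intensity;  // additive blend (read-modify-write)
        }
    }
    return target;
}
```

A pixel covered by two overlapping lights is read and blended twice. That is where the repeated memory traffic of this approach comes from.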


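Summary: compared with forward rendering, only the front-most layer of pixels is shaded, so overdraw drops sharply and so does the computation it wastes. However, N point lights mean N light-sphere render passes. In every pass, each covered pixel reads all of the G-Buffer textures again and writes (blends) its shading result again. This wastes a great deal of GPU bandwidth, as shown below: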
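To put rough numbers on this, the traffic of both approaches can be estimated with a simple model. This is a back-of-the-envelope sketch, not a measurement, and it rests on three assumptions:

- For each light, every covered pixel reads `gbufferBytes` of G-Buffer data.
- Because of additive blending, every covered pixel also reads and writes the render target, costing `2 * rtBytes` per light.
- A tiled pass reads the G-Buffer once and writes once per pixel, and its light-list reads are ignored.

The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Light volumes: every covered pixel reads the G-Buffer and blends into the
// target (read + write) once per light. coveredPixels[i] is light i's coverage.
std::uint64_t lightVolumeTraffic(const std::vector<std::uint64_t>& coveredPixels,
                                 std::uint64_t gbufferBytes, std::uint64_t rtBytes) {
    std::uint64_t total = 0;
    for (std::uint64_t covered : coveredPixels)
        total += covered * (gbufferBytes + 2 * rtBytes);
    return total;
}

// Tiled: each pixel reads the G-Buffer once and writes its result once
// (per-tile light lists and light data are small and ignored here).
std::uint64_t tiledTraffic(std::uint64_t screenPixels,
                           std::uint64_t gbufferBytes, std::uint64_t rtBytes) {
    assert(screenPixels > 0);
    return screenPixels * (gbufferBytes + rtBytes);
}
```

As a hypothetical example, take 400 lights that each cover 10,000 pixels, a 64-byte G-Buffer (four float4 targets) and a 16-byte float4 target. The light-volume approach moves 384,000,000 bytes, while one tiled pass over 1920x1080 moves 165,888,000 bytes. Larger or more heavily overlapping lights widen the gap further.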

To address this, graphics engineers proposed a more efficient deferred pipeline for point lights: Tiled Deferred Shading.

Tiled Based Deferred Shading

The diagram of traditional deferred rendering above shows its main drawback: high GPU bandwidth. The ideal improved model looks like this:


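Toward this ideal model, graphics engineers proposed tiling. On top of deferred rendering, the screen is divided into a grid of tiles, each typically 16x16 pixels. A compute shader, which is well suited to this kind of parallel work, determines which lights affect which tiles. Only those lights are then used to shade the pixels of the corresponding tile.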
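The tile grid itself is simple arithmetic. Below is a minimal sketch, assuming 16x16 tiles with one thread group per tile and one thread per pixel, as the shader's `[numthreads(GroundThreadSize, GroundThreadSize, 1)]` suggests. `kTileSize` and the function names are hypothetical. Partial tiles at the right and bottom edges are rounded up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTileSize = 16;  // tile edge in pixels (hypothetical)

struct TileGrid { std::uint32_t tilesX, tilesY; };  // thread groups to dispatch
struct TileCoord { std::uint32_t x, y; };           // SV_GroupID of a pixel

// Ceiling division so that partially covered edge tiles still get a group.
TileGrid computeTileGrid(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    return { (width + kTileSize - 1) / kTileSize, (height + kTileSize - 1) / kTileSize };
}

// The tile a pixel falls into; its position inside the tile
// (SV_GroupThreadID) would be (x % kTileSize, y % kTileSize).
TileCoord tileOfPixel(std::uint32_t x, std::uint32_t y) {
    return { x / kTileSize, y / kTileSize };
}
```

On the host, the group counts would be passed to `ID3D11DeviceContext::Dispatch(grid.tilesX, grid.tilesY, 1)`. A 1920x1080 target gives 120 x 68 groups, and the last row of tiles is only partially covered (1080 / 16 = 67.5).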

In the rest of this article, Tiled Based Deferred Shading is abbreviated as TBDS.



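(2) In the compute shader, split the screen into tiles (typically 16x16 or 32x32 pixels). For each tile, compute the minimum and maximum depth (PosZ) over all of its pixels, preferably in view (camera) space.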

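```hlsl
// G-Buffer inputs and light data. PointLight, the constant-buffer values used
// below (lightCount, View, cameraPos, ...), the helper functions and
// GroundThreadSize (the tile edge in pixels, e.g. 16) are defined elsewhere.
Texture2D<float4> DepthTex : register(t0);
Texture2D<float4> WorldPosTex : register(t1);
Texture2D<float4> WorldNormalTex : register(t2);
Texture2D<float4> SpecularRoughMetalTex : register(t3);
Texture2D<float4> AlbedoTex : register(t4);
SamplerState clampLinearSample : register(s0);
StructuredBuffer<PointLight> PointLights : register(t5);
RWTexture2D<float4> OutputTexture : register(u0);

// Per-tile shared state. Thread 0 resets it explicitly at the start of every
// thread group instead of relying on a declaration initializer.
groupshared uint minDepthInt;
groupshared uint maxDepthInt;
groupshared uint visibleLightCount;
groupshared uint visibleLightIndices[1024];

[numthreads(GroundThreadSize, GroundThreadSize, 1)]
void CS(
	uint3 groupId : SV_GroupID,
	uint3 groupThreadId : SV_GroupThreadID,
	uint groupIndex : SV_GroupIndex,
	uint3 dispatchThreadId : SV_DispatchThreadID)
{
	// Reset the shared state once; every other thread waits for it.
	if (groupIndex == 0)
	{
		minDepthInt = 0xFFFFFFFF;
		maxDepthInt = 0;
		visibleLightCount = 0;
	}
	GroupMemoryBarrierWithGroupSync();

	float depth = DepthTex[dispatchThreadId.xy].r;
	float viewZ = DepthBufferConvertToLinear(depth);
	// For non-negative floats, the bits read as uint sort in the same order
	// as the float values, so integer atomics give the float min/max.
	uint depthInt = asuint(viewZ);

	// Pixels with depth 0.0 do not contribute to the tile's depth range.
	if (depth != 0.0)
	{
		InterlockedMin(minDepthInt, depthInt);
		InterlockedMax(maxDepthInt, depthInt);
	}
	// Wait until every thread of the tile has contributed its depth.
	GroupMemoryBarrierWithGroupSync();

	float minViewZ = asfloat(minDepthInt);
	float maxViewZ = asfloat(maxDepthInt);
	// ... CS() continues in the following snippets
```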
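The depth reduction above relies on the `asuint` trick. The C++ sketch below (hypothetical names, with `memcpy` playing the role of `asuint`/`asfloat`) computes one tile's min/max sequentially, which is what the `InterlockedMin`/`InterlockedMax` calls do in parallel. The trick only works because view-space depth is non-negative: for non-negative IEEE-754 floats, the bit pattern read as an unsigned integer has the same ordering as the float value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

std::uint32_t asUint(float f) {   // same role as HLSL asuint()
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float asFloat(std::uint32_t u) {  // same role as HLSL asfloat()
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

struct DepthRange { float minZ, maxZ; };

// viewZ: linear view-space depth of each pixel in the tile (must be >= 0).
// rawDepth: the depth-buffer value; pixels with 0.0 are skipped, like the shader.
DepthRange tileDepthRange(const std::vector<float>& viewZ,
                          const std::vector<float>& rawDepth) {
    assert(viewZ.size() == rawDepth.size());
    std::uint32_t minInt = 0xFFFFFFFFu, maxInt = 0;  // same reset values as the shader
    for (std::size_t i = 0; i < viewZ.size(); ++i) {
        if (rawDepth[i] == 0.0f) continue;
        assert(viewZ[i] >= 0.0f);  // the ordering trick needs non-negative floats
        std::uint32_t bits = asUint(viewZ[i]);
        if (bits < minInt) minInt = bits;
        if (bits > maxInt) maxInt = bits;
    }
    return { asFloat(minInt), asFloat(maxInt) };
}
```

If every pixel of a tile is skipped, the minimum stays at 0xFFFFFFFF, which reinterprets as a NaN. Under IEEE-754 rules, comparisons against it evaluate to false, so a depth test written like the one in step (4) keeps no light for that tile.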


```hlsl
	// ... continuing CS(): build the four side planes of this tile's frustum.
	// Each plane passes through the eye (the view-space origin) and two adjacent
	// tile corners unprojected into view space; the z = 1 depth is arbitrary
	// because every plane contains the eye.
	float3 frustumEqn0, frustumEqn1, frustumEqn2, frustumEqn3;
	uint tileResWidth = GroundThreadSize * GetNumTilesX();
	uint tileResHeight = GroundThreadSize * GetNumTilesY();
	// Pixel bounds of this tile: x in [pxm, pxp), y in [pym, pyp).
	uint pxm = GroundThreadSize * groupId.x;
	uint pym = GroundThreadSize * groupId.y;
	uint pxp = GroundThreadSize * (groupId.x + 1);
	uint pyp = GroundThreadSize * (groupId.y + 1);

	// four corners of the tile, clockwise from top-left
	// (y is flipped because NDC y points up while pixel rows go down)
	float3 frustum0 = ConvertProjToView(float4(pxm / (float)tileResWidth*2.f - 1.f, (tileResHeight - pym) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	float3 frustum1 = ConvertProjToView(float4(pxp / (float)tileResWidth*2.f - 1.f, (tileResHeight - pym) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	float3 frustum2 = ConvertProjToView(float4(pxp / (float)tileResWidth*2.f - 1.f, (tileResHeight - pyp) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	float3 frustum3 = ConvertProjToView(float4(pxm / (float)tileResWidth*2.f - 1.f, (tileResHeight - pyp) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	frustumEqn0 = CreatePlaneEquation(frustum0, frustum1);
	frustumEqn1 = CreatePlaneEquation(frustum1, frustum2);
	frustumEqn2 = CreatePlaneEquation(frustum2, frustum3);
	frustumEqn3 = CreatePlaneEquation(frustum3, frustum0);
```
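The tile frustum construction and the frustum half of the culling test can also be checked on the CPU. The article does not show `CreatePlaneEquation`, `ConvertProjToView` or `TestFrustumSides`, so this sketch makes several assumptions about them:

- Each plane passes through the eye, with normal `normalize(cross(b, c))`.
- A sphere is kept while its signed distance to every plane is below its radius.
- View space is D3D-style left-handed (+z forward, +y up).
- The projection is a symmetric perspective described by tan(fovY/2) and the aspect ratio.

It works in tile units, since `pxm / tileResWidth` equals `tx / tilesX`. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return { v.x / len, v.y / len, v.z / len };
}

// Unprojects an NDC corner onto the z = 1 view plane (the role of
// ConvertProjToView; any depth works because the planes contain the eye).
Vec3 ndcToViewDir(float ndcX, float ndcY, float tanHalfFovY, float aspect) {
    return { ndcX * tanHalfFovY * aspect, ndcY * tanHalfFovY, 1.0f };
}

// Plane through the eye and two corners; with clockwise corners the normal
// points out of the tile frustum.
Vec3 createPlaneEquation(Vec3 b, Vec3 c) { return normalize(cross(b, c)); }

// Keeps the sphere unless it lies completely outside one of the four planes.
bool testFrustumSides(Vec3 center, float radius, const Vec3 planes[4]) {
    for (int i = 0; i < 4; ++i)
        if (dot(planes[i], center) >= radius) return false;
    return true;
}

// Builds the four side planes of tile (tx, ty) in a tilesX x tilesY grid.
void buildTilePlanes(unsigned tx, unsigned ty, unsigned tilesX, unsigned tilesY,
                     float tanHalfFovY, float aspect, Vec3 planes[4]) {
    auto ndcX = [&](unsigned px) { return px / float(tilesX) * 2.0f - 1.0f; };
    auto ndcY = [&](unsigned py) { return (tilesY - py) / float(tilesY) * 2.0f - 1.0f; };
    Vec3 c0 = ndcToViewDir(ndcX(tx), ndcY(ty), tanHalfFovY, aspect);          // top-left
    Vec3 c1 = ndcToViewDir(ndcX(tx + 1), ndcY(ty), tanHalfFovY, aspect);      // top-right
    Vec3 c2 = ndcToViewDir(ndcX(tx + 1), ndcY(ty + 1), tanHalfFovY, aspect);  // bottom-right
    Vec3 c3 = ndcToViewDir(ndcX(tx), ndcY(ty + 1), tanHalfFovY, aspect);      // bottom-left
    planes[0] = createPlaneEquation(c0, c1);
    planes[1] = createPlaneEquation(c1, c2);
    planes[2] = createPlaneEquation(c2, c3);
    planes[3] = createPlaneEquation(c3, c0);
}
```

With a single tile (tilesX = tilesY = 1), the four planes reduce to the side planes of the camera frustum. The test is conservative: a sphere near a frustum corner can pass all four plane tests while still lying outside the tile. That only costs some wasted shading and never drops a light that matters.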

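(4) For each tile, loop over all point lights and cull them with both the tile frustum and the tile's depth range. Append the global index of every light that affects the tile to the tile's visible-light list.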

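```hlsl
	// ... continuing CS(): the threads of the group stride over the global
	// light list, threadCount lights per pass, and cull each light for this tile.
	uint threadCount = GroundThreadSize * GroundThreadSize;
	uint passCount = (lightCount + threadCount - 1) / threadCount;

	for (uint i = 0; i < passCount; ++i)
	{
		uint lightIndex = i * threadCount + groupIndex;
		if (lightIndex >= lightCount)
			break;
		PointLight light = PointLights[lightIndex];
		float3 viewLightPos = mul(float4(light.pos, 1.0), View).xyz;
		// Frustum test, then depth test: [z - r, z + r] must overlap [minViewZ, maxViewZ].
		if (TestFrustumSides(viewLightPos, light.radius, frustumEqn0, frustumEqn1, frustumEqn2, frustumEqn3)
			&& minViewZ - viewLightPos.z < light.radius
			&& viewLightPos.z - maxViewZ < light.radius)
		{
			uint offset;
			InterlockedAdd(visibleLightCount, 1, offset);
			if (offset < 1024) // do not overflow visibleLightIndices
				visibleLightIndices[offset] = lightIndex;
		}
	}
	// The list must be complete before any thread starts shading.
	GroupMemoryBarrierWithGroupSync();
```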
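The work distribution of step (4) can be emulated sequentially. In this sketch (hypothetical names), an outer loop plays the passes and an inner loop plays the threads of one group. `std::atomic::fetch_add` returns the previous value, like the `original_value` output of `InterlockedAdd`, so each surviving light gets a unique slot. Only the depth-range test is modeled; in the shader it is combined with `TestFrustumSides`. Because the inner loop runs sequentially here, the resulting order is deterministic. On the GPU, the order of indices within a tile's list is not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

struct ViewSpaceLight { float viewZ; float radius; };  // hypothetical minimal light

// The sphere [z - r, z + r] overlaps the tile's depth range [minZ, maxZ].
bool overlapsDepthRange(const ViewSpaceLight& l, float minZ, float maxZ) {
    return minZ - l.viewZ < l.radius && l.viewZ - maxZ < l.radius;
}

std::vector<std::uint32_t> cullLightsForTile(const std::vector<ViewSpaceLight>& lights,
                                             float minZ, float maxZ,
                                             std::uint32_t threadCount,
                                             std::uint32_t maxLightsPerTile) {
    assert(threadCount > 0);
    const std::uint32_t lightCount = static_cast<std::uint32_t>(lights.size());
    const std::uint32_t passCount = (lightCount + threadCount - 1) / threadCount;
    std::atomic<std::uint32_t> visibleLightCount{0};
    std::vector<std::uint32_t> visibleLightIndices(maxLightsPerTile);

    for (std::uint32_t i = 0; i < passCount; ++i) {
        // On the GPU, these groupIndex iterations run in parallel.
        for (std::uint32_t groupIndex = 0; groupIndex < threadCount; ++groupIndex) {
            std::uint32_t lightIndex = i * threadCount + groupIndex;
            if (lightIndex >= lightCount) break;
            if (!overlapsDepthRange(lights[lightIndex], minZ, maxZ)) continue;
            std::uint32_t offset = visibleLightCount.fetch_add(1);  // like InterlockedAdd
            if (offset < maxLightsPerTile) visibleLightIndices[offset] = lightIndex;
        }
    }
    std::uint32_t count = visibleLightCount.load();
    visibleLightIndices.resize(count < maxLightsPerTile ? count : maxLightsPerTile);
    return visibleLightIndices;
}
```

The cap mirrors the fixed-size `groupshared` index array. Lights beyond it are counted but not stored, so the shading loop has to clamp the count to the array size.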


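(5) Iterate over the lights in the tile's visible-light list and use them to shade every pixel in the tile. Each G-Buffer render target is read only once and the shading result is written only once, so GPU bandwidth stays low.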

```hlsl
	// ... continuing CS(): shade this pixel with the tile's visible lights.
	float3 color = float3(0.0, 0.0, 0.0);
	// Clamp to the size of visibleLightIndices in case more lights were counted.
	uint tileLightCount = min(visibleLightCount, 1024u);

	if (tileLightCount > 0)
	{
		// Read the G-Buffer with integer texel coordinates. Sampling a linear
		// sampler at uv = pixel / screenSize would land on texel corners and
		// blend four neighbouring texels.
		//G-Buffer-Pos (one float channel wasted)
		float3 worldPos = WorldPosTex[dispatchThreadId.xy].xyz;

		//G-Buffer-Normal (one float channel wasted)
		float3 worldNormal = normalize(WorldNormalTex[dispatchThreadId.xy].xyz);

		float3 albedo = AlbedoTex[dispatchThreadId.xy].xyz;

		//G-Buffer-Specular-Rough-Metal (one float channel wasted)
		float3 gBufferAttribute = SpecularRoughMetalTex[dispatchThreadId.xy].xyz;
		float specular = gBufferAttribute.x;
		float roughness = gBufferAttribute.y;
		float metal = gBufferAttribute.z;

		// V and F0 do not depend on the light, so compute them once per pixel.
		float3 V = normalize(cameraPos - worldPos);
		float3 F0 = GetFresnelF0(albedo, metal);

		for (uint index = 0; index < tileLightCount; ++index)
		{
			uint lightIndex = visibleLightIndices[index];
			PointLight light = PointLights[lightIndex];
			float3 pixelToLightDir = light.pos - worldPos;
			float dist = length(pixelToLightDir);
			float3 L = normalize(pixelToLightDir);
			float3 H = normalize(L + V);
			// attenuation = (constant, linear, quadratic, unused)
			float4 attenuation = light.attenuation;
			float attenua = 1.0 / (attenuation.x + attenuation.y * dist + attenuation.z * dist * dist);
			float3 radiance = light.color * attenua;

			//f(cook_torrance) = D * F * G / (4 * (wo.n) * (wi.n))
			float D = DistributionGGX(worldNormal, H, roughness);
			float G = GeometrySmith(worldNormal, V, L, roughness);
			float cosTheta = max(dot(V, H), 0.0);
			float3 F = FresnelSchlick(cosTheta, F0);
			float3 ks = F;
			float3 kd = float3(1.0, 1.0, 1.0) - ks;
			kd *= 1.0 - metal;

			float3 dfg = D * G * F;
			float nDotl = max(dot(worldNormal, L), 0.0);
			float nDotv = max(dot(worldNormal, V), 0.0);
			float denominator = 4.0 * nDotv * nDotl;
			float3 specularFactor = dfg / max(denominator, 0.001);
			// Accumulate diffuse + specular; the trailing 2.2 is an extra intensity scale.
			color += (kd * albedo / PI + specularFactor * specular) * radiance * nDotl * 2.2;
		}
	}

	OutputTexture[dispatchThreadId.xy] = float4(color, 1.0);
} // end of CS
```
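The shading loop calls `DistributionGGX`, `GeometrySmith`, `GetFresnelF0` and `FresnelSchlick`, but the article does not show their bodies. For reference, here is a scalar C++ sketch of the commonly used definitions, plus the attenuation formula the shader uses:

- GGX distribution with alpha = roughness².
- Schlick-GGX geometry with k = (roughness + 1)² / 8 for direct lighting.
- Fresnel-Schlick.
- F0 blended from 0.04 toward albedo by metalness.

These are assumptions about the helpers, not the author's code. They take precomputed dot products instead of vectors to stay short, and F0 is shown for a single color channel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// GGX normal distribution D, with alpha = roughness^2.
float distributionGGX(float nDotH, float roughness) {
    float a = roughness * roughness;
    float a2 = a * a;
    float nh = std::max(nDotH, 0.0f);
    float denom = nh * nh * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * denom * denom);
}

// Schlick-GGX for one direction, k = (roughness + 1)^2 / 8 (direct lighting).
float geometrySchlickGGX(float nDotX, float roughness) {
    float r = roughness + 1.0f;
    float k = r * r / 8.0f;
    return nDotX / (nDotX * (1.0f - k) + k);
}

// Smith: product of the view and light terms.
float geometrySmith(float nDotV, float nDotL, float roughness) {
    return geometrySchlickGGX(std::max(nDotV, 0.0f), roughness) *
           geometrySchlickGGX(std::max(nDotL, 0.0f), roughness);
}

// F0: 0.04 for dielectrics, blended toward albedo by metalness (one channel).
float fresnelF0(float albedo, float metal) {
    return 0.04f + (albedo - 0.04f) * metal;
}

float fresnelSchlick(float cosTheta, float f0) {
    return f0 + (1.0f - f0) * std::pow(1.0f - cosTheta, 5.0f);
}

// Point-light attenuation as in the shader: 1 / (c + l*d + q*d^2).
float attenuation(float c, float l, float q, float d) {
    float denom = c + l * d + q * d * d;
    assert(denom > 0.0f);
    return 1.0f / denom;
}
```

Some quick sanity checks: at normal incidence the Fresnel term returns F0, a fully rough surface gives D = 1/π when n and h coincide, and a non-metal ends up with F0 = 0.04 regardless of albedo.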


Traditional deferred rendering with SphereLightVolume, 400 point lights







