DirectX 11 Advanced Tutorial: Tile-Based Deferred Shading

Introduction

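Many games contain a large number of point lights (PointLight). To give a scene a realistic atmosphere, environment artists may well place more than a thousand point lights in a single scene.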

Let's first look at how the traditional rendering pipelines cope with a large number of point lights.

Point light computation in traditional forward rendering (Traditional Forward Rendering)

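The general idea is that each object is drawn in one render pass, and the point lights that affect the object are passed to the shader as an array and evaluated there.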

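Summary: in traditional forward rendering, many objects may cover the same pixel, so overdraw is high and much of the shading work is wasted: for opaque objects, only the frontmost fragment of each pixel ends up on screen. This is what motivates deferred rendering.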

Traditional deferred rendering (Traditional Deferred Rendering)

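Traditional deferred rendering is simple: render all objects in the scene into several geometry textures (the GBuffer), then use those textures to compute shading in a full-screen pass. A popular way to render point lights is to treat each one as a sphere mesh (LightSphereVolume) and rasterize it to the screen, so that shading runs only for the pixels inside that light's radius instead of every light being evaluated for every pixel on screen. The render target is set to additive blending, so N lights accumulate N times, which yields the final result lit by all point lights.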

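Summary: compared with forward rendering, only the frontmost layer of pixels is shaded, so overdraw and wasted computation drop sharply. However, N point lights mean N light-sphere render passes, and each pass reads the GBuffer textures once and writes the shading result once for every pixel it covers. This wastes a great deal of GPU bandwidth.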

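For this reason, graphics engineers proposed a rendering pipeline that evaluates point lights more efficiently on top of deferred rendering: Tiled Deferred Shading.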

Tile-based deferred shading (Tiled Based Deferred Shading)

As discussed above, the weakness of traditional deferred rendering is its high GPU bandwidth usage. The ideal improved model is as follows:

That is, ideally, shading each pixel should read the GBuffer only once and write the shading result only once.
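To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in C++. It only counts GBuffer reads and shading-result writes for a single pixel; the struct and function names are hypothetical and the model ignores caches and blending reads, so treat it as an illustration rather than a measurement.

```cpp
#include <cassert>
#include <cstdint>

// Memory operations performed for ONE pixel (illustrative model only).
struct PixelTraffic {
    std::uint32_t gbufferReads;   // how many times the GBuffer is fetched
    std::uint32_t resultWrites;   // how many times the shading result is written
};

// Light-volume deferred rendering: every light whose sphere covers the pixel
// runs its own pass, so the GBuffer is read and the target written per light.
PixelTraffic LightVolumeTraffic(std::uint32_t lightsCoveringPixel) {
    return { lightsCoveringPixel, lightsCoveringPixel };
}

// Ideal model: read the GBuffer once, loop over the lights in registers,
// and write the accumulated result once, regardless of the light count.
PixelTraffic SinglePassTraffic(std::uint32_t lightsCoveringPixel) {
    (void)lightsCoveringPixel;  // the light count does not change the traffic
    return { 1u, 1u };
}
```

Under this simplified model, a pixel covered by 32 light spheres costs 32 GBuffer reads and 32 writes with light volumes, but 1 read and 1 write in the ideal single-pass model.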

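To approach this ideal model, graphics engineers introduced the idea of tiling: on top of deferred rendering, divide the whole screen into NxN tiles, each with a resolution of 16x16 pixels; use a compute shader, with its strong parallelism, to work out which lights affect which tiles; then let only those lights shade the pixels of the corresponding tile.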
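As a small illustration of the tiling arithmetic, the following C++ sketch computes how many tiles are needed to cover the screen and which tile owns a given pixel. The names kTileSize, NumTiles and TileOf are hypothetical; the 16-pixel tile edge is the value mentioned above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTileSize = 16;  // assumed tile edge length in pixels

// Number of tiles (thread groups) needed to cover `pixels` along one axis.
// Rounds up so that a partial tile at the screen edge is still processed.
constexpr std::uint32_t NumTiles(std::uint32_t pixels) {
    return (pixels + kTileSize - 1) / kTileSize;
}

// Index of the tile that contains a given pixel coordinate along one axis.
constexpr std::uint32_t TileOf(std::uint32_t pixel) {
    return pixel / kTileSize;
}

static_assert(NumTiles(1920) == 120, "1920 pixels split evenly into 120 tiles");
```

On the host side, values like these would typically be passed to ID3D11DeviceContext::Dispatch as the X and Y thread-group counts, with one thread group per tile.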

Below, Tiled Based Deferred Shading is abbreviated as TBDS.

TBDS rendering flow:

(1) Render the GBuffer for the whole scene

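(2) In the compute shader, split the screen into tiles (a tile is usually 16x16 or 32x32 pixels) and compute the maximum and minimum PosZ over all pixels of each tile (view space is usually the most convenient choice)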

Texture2D<float4> DepthTex:register(t0);
Texture2D<float4> WorldPosTex:register(t1);
Texture2D<float4> WorldNormalTex:register(t2);
Texture2D<float4> SpecularRoughMetalTex:register(t3);
Texture2D<float4> AlbedoTex:register(t4);
SamplerState clampLinearSample:register(s0);
StructuredBuffer<PointLight> PointLights : register(t5);
RWTexture2D<float4> OutputTexture : register(u0);

// Per-tile state shared by all threads of one thread group.
// Group-shared memory is not initialized by a declaration, so it is reset inside CS.
groupshared uint minDepthInt;
groupshared uint maxDepthInt;
groupshared uint visibleLightCount;
groupshared uint visibleLightIndices[1024];

[numthreads(GroundThreadSize, GroundThreadSize, 1)]
void CS(
	uint3 groupId :  SV_GroupID,
	uint3 groupThreadId : SV_GroupThreadID,
	uint groupIndex : SV_GroupIndex,
	uint3 dispatchThreadId : SV_DispatchThreadID)
{
	// (2) Compute the view-space MaxZ and MinZ of each tile
	float depth = DepthTex[dispatchThreadId.xy].r;
	float viewZ = DepthBufferConvertToLinear(depth);
	// For non-negative floats the bit pattern sorts in the same order as the value,
	// so the integer atomics below can be used for the min/max reduction
	uint depthInt = asuint(viewZ);

	// One thread resets the shared state before the group starts using it
	if (groupIndex == 0)
	{
		minDepthInt = 0xFFFFFFFF;
		maxDepthInt = 0;
		visibleLightCount = 0;
	}
	GroupMemoryBarrierWithGroupSync();

	// Pixels whose depth is 0 are skipped and do not contribute to the range
	if (depth != 0.0)
	{
		InterlockedMin(minDepthInt, depthInt);
		InterlockedMax(maxDepthInt, depthInt);
	}

	GroupMemoryBarrierWithGroupSync();

	float minViewZ = asfloat(minDepthInt);
	float maxViewZ = asfloat(maxDepthInt);
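The min/max reduction above relies on the fact that, for non-negative IEEE 754 floats, comparing the raw bit patterns as unsigned integers gives the same ordering as comparing the values. The C++ sketch below mimics that on the CPU; AsUint, AsFloat and TileDepthRange are hypothetical helper names standing in for the HLSL intrinsics and the InterlockedMin/InterlockedMax calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU stand-ins for HLSL asuint()/asfloat(): reinterpret the bits unchanged.
std::uint32_t AsUint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float AsFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

struct DepthRange { float minZ; float maxZ; };

// Min/max over one tile, done on bit patterns the way the integer atomics
// would do it. Only valid for non-negative depths and a non-empty tile.
DepthRange TileDepthRange(const std::vector<float>& viewZ) {
    std::uint32_t minBits = 0xFFFFFFFFu;
    std::uint32_t maxBits = 0u;
    for (float z : viewZ) {
        assert(z >= 0.0f);  // negative floats would sort in reverse as uints
        minBits = std::min(minBits, AsUint(z));
        maxBits = std::max(maxBits, AsUint(z));
    }
    return { AsFloat(minBits), AsFloat(maxBits) };
}
```

Note that if no pixel in the tile contributes, the minimum stays at 0xFFFFFFFF, which is a NaN bit pattern when read back as a float, so an empty tile needs separate handling.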

(3) Compute the frustum of each tile (a view frustum in view space)

	float3 frustumEqn0, frustumEqn1, frustumEqn2, frustumEqn3;
	uint tileResWidth = GroundThreadSize * GetNumTilesX();
	uint tileResHeight = GroundThreadSize * GetNumTilesY();
	uint pxm = GroundThreadSize * groupId.x;
	uint pym = GroundThreadSize * groupId.y;
	uint pxp = GroundThreadSize * (groupId.x + 1);
	uint pyp = GroundThreadSize * (groupId.y + 1);

	// four corners of the tile, clockwise from top-left
	float3 frustum0 = ConvertProjToView(float4(pxm / (float)tileResWidth*2.f - 1.f, (tileResHeight - pym) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	float3 frustum1 = ConvertProjToView(float4(pxp / (float)tileResWidth*2.f - 1.f, (tileResHeight - pym) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	float3 frustum2 = ConvertProjToView(float4(pxp / (float)tileResWidth*2.f - 1.f, (tileResHeight - pyp) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	float3 frustum3 = ConvertProjToView(float4(pxm / (float)tileResWidth*2.f - 1.f, (tileResHeight - pyp) / (float)tileResHeight*2.f - 1.f, 1.f, 1.f)).xyz;
	frustumEqn0 = CreatePlaneEquation(frustum0, frustum1);
	frustumEqn1 = CreatePlaneEquation(frustum1, frustum2);
	frustumEqn2 = CreatePlaneEquation(frustum2, frustum3);
	frustumEqn3 = CreatePlaneEquation(frustum3, frustum0);
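The helpers CreatePlaneEquation and TestFrustumSides are not listed in this post. The C++ sketch below shows one plausible way they could work, assuming each side plane passes through the eye (the view-space origin) and two adjacent tile corners, and that the corners are given clockwise from top-left so the normals point out of the tile frustum. The Vec3 type and BuildTilePlanes are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Normalize(Vec3 v) {
    float len = std::sqrt(Dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Plane through the origin and the two corner points b and c.
// Because it contains the origin, the normal alone describes the plane.
Vec3 CreatePlaneEquation(Vec3 b, Vec3 c) { return Normalize(Cross(b, c)); }

// Build the four side planes from corners listed clockwise from top-left.
void BuildTilePlanes(const Vec3 corners[4], Vec3 planes[4]) {
    for (int i = 0; i < 4; ++i)
        planes[i] = CreatePlaneEquation(corners[i], corners[(i + 1) % 4]);
}

// A sphere touches the tile frustum if, for every side plane, its center is
// less than `radius` away on the outer side of that plane.
bool TestFrustumSides(Vec3 center, float radius, const Vec3 planes[4]) {
    for (int i = 0; i < 4; ++i)
        if (Dot(planes[i], center) >= radius)
            return false;
    return true;
}
```

This test only covers the four side planes; the near and far limits come from the per-tile minViewZ and maxViewZ computed in step (2).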

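(4) For each tile, iterate over all point lights, cull them with both the tile frustum and the tile depth range, and append the global index of every light that affects the tile to the tile's visible-light list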

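	// (4) Find the point lights that intersect this tile and record their indices
	uint threadCount = GroundThreadSize * GroundThreadSize;
	uint passCount = (int(lightCount) + threadCount - 1) / threadCount;

	// Each thread tests a strided subset of the lights, so the whole group
	// covers all lights in passCount iterations
	for (uint i = 0; i < passCount; ++i)
	{
		uint lightIndex = i * threadCount + groupIndex;
		if (lightIndex >= lightCount)
			continue;

		PointLight light = PointLights[lightIndex];
		float3 viewLightPos = mul(float4(light.pos, 1.0), View).xyz;
		// Side-plane test first, then the tile's [minViewZ, maxViewZ] depth range
		if(TestFrustumSides(viewLightPos, light.radius, frustumEqn0, frustumEqn1, frustumEqn2, frustumEqn3))
		{
			if (minViewZ - viewLightPos.z < light.radius && viewLightPos.z - maxViewZ < light.radius)
			{
				// visibleLightIndices holds 1024 entries, so lightCount must not exceed 1024
				uint offset;
				InterlockedAdd(visibleLightCount, 1, offset);
				visibleLightIndices[offset] = lightIndex;
			}
		}
	}

	// Wait until every thread has finished appending to the list
	GroupMemoryBarrierWithGroupSync();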
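To see how the culling work is divided among the threads of a group, the following C++ sketch replays the strided loop for a single thread and isolates the depth-range test. LightsForThread and SphereOverlapsDepthRange are hypothetical names; the arithmetic mirrors the shader code above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Light indices that the thread with the given groupIndex would test.
std::vector<std::uint32_t> LightsForThread(std::uint32_t groupIndex,
                                           std::uint32_t threadCount,
                                           std::uint32_t lightCount) {
    assert(threadCount > 0);
    std::vector<std::uint32_t> indices;
    // Round up so the last, partially filled pass is not dropped.
    std::uint32_t passCount = (lightCount + threadCount - 1) / threadCount;
    for (std::uint32_t i = 0; i < passCount; ++i) {
        std::uint32_t lightIndex = i * threadCount + groupIndex;
        if (lightIndex < lightCount)
            indices.push_back(lightIndex);
    }
    return indices;
}

// Depth part of the culling test: does a sphere at view-space depth
// lightViewZ reach into the tile's [minViewZ, maxViewZ] interval?
bool SphereOverlapsDepthRange(float lightViewZ, float radius,
                              float minViewZ, float maxViewZ) {
    return (minViewZ - lightViewZ < radius) && (lightViewZ - maxViewZ < radius);
}
```

With a 16x16 tile (256 threads) and 1024 lights, each thread tests exactly 4 lights; for example, the thread with groupIndex 3 handles lights 3, 259, 515 and 771.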

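(5) Iterate over the lights in the tile's visible-light list and shade every pixel in the tile. This way each GBuffer render target is read only once and the shading result is written only once, so GPU bandwidth usage is low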


	// Accumulated lighting for this pixel (alpha of 1.0 is assumed here)
	float4 color = float4(0.0, 0.0, 0.0, 1.0);

	if (visibleLightCount > 0)
	{
		// G-Buffer position (1 float wasted)
		// The +0.5 offset samples at the texel center so the linear sampler does not blend neighbors
		float2 uv = (float2(dispatchThreadId.xy) + 0.5) / float2(ScreenWidth, ScreenHeight);
		float3 worldPos = WorldPosTex.SampleLevel(clampLinearSample, uv, 0).xyz;

		// G-Buffer normal (1 float wasted)
		float3 worldNormal = WorldNormalTex.SampleLevel(clampLinearSample, uv, 0).xyz;
		worldNormal = normalize(worldNormal);

		float3 albedo = AlbedoTex.SampleLevel(clampLinearSample, uv, 0).xyz;

		// G-Buffer specular, roughness, metalness (1 float wasted)
		float3 gBufferAttribute = SpecularRoughMetalTex.SampleLevel(clampLinearSample, uv, 0).xyz;
		float specular = gBufferAttribute.x;
		float roughness = gBufferAttribute.y;
		float metal = gBufferAttribute.z;

		// The GBuffer was read once above; every visible light is now accumulated in registers
		for (uint index = 0; index < visibleLightCount; ++index)
		{
			uint lightIndex = visibleLightIndices[index];
			PointLight light = PointLights[lightIndex];
			float3 pixelToLightDir = light.pos - worldPos;
			float distance = length(pixelToLightDir);
			float3 L = normalize(pixelToLightDir);
			float3 V = normalize(cameraPos - worldPos);
			float3 H = normalize(L + V);
			float4 attenuation = light.attenuation;
			float attenua = 1.0 / (attenuation.x + attenuation.y * distance + distance * distance * attenuation.z);
			float3 radiance = light.color * attenua;

			// f(cook_torrance) = D * F * G / (4 * (wo.n) * (wi.n))
			float D = DistributionGGX(worldNormal, H, roughness);
			float G = GeometrySmith(worldNormal, V, L, roughness);
			float3 fo = GetFresnelF0(albedo, metal);
			float cosTheta = max(dot(V, H), 0.0);
			float3 F = FresnelSchlick(cosTheta, fo);
			float3 ks = F;
			float3 kd = float3(1.0, 1.0, 1.0) - ks;
			kd *= 1.0 - metal;

			float3 dfg = D * G * F;
			float nDotl = max(dot(worldNormal, L), 0.0);
			float nDotv = max(dot(worldNormal, V), 0.0);
			float denominator = 4.0 * nDotv * nDotl;
			float3 specularFactor = dfg / max(denominator, 0.001);

			color.xyz += (kd * albedo / PI + specularFactor * specular) * radiance * nDotl * 2.2;
		}
	}

	// Single write of the shading result for this pixel
	OutputTexture[dispatchThreadId.xy] = color;
}
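The shading helpers called in the loop (DistributionGGX, FresnelSchlick and so on) are not listed in this post. As a rough reference, the C++ sketch below shows scalar versions of the attenuation term and two of the Cook-Torrance factors in forms that are widely used; the alpha = roughness^2 remapping in DistributionGGX and the scalar F0 are assumptions and may differ from what the project actually implements.

```cpp
#include <cassert>
#include <cmath>

// Distance attenuation as used in the shader: 1 / (c + l*d + q*d^2).
float Attenuation(float constant, float linear, float quadratic, float d) {
    return 1.0f / (constant + linear * d + quadratic * d * d);
}

// Schlick's Fresnel approximation for a scalar base reflectance f0.
float FresnelSchlick(float cosTheta, float f0) {
    return f0 + (1.0f - f0) * std::pow(1.0f - cosTheta, 5.0f);
}

// GGX (Trowbridge-Reitz) normal distribution function.
// Assumes the common remapping alpha = roughness^2, so alpha^2 = roughness^4.
float DistributionGGX(float nDotH, float roughness) {
    const float kPi = 3.14159265358979f;
    float alpha = roughness * roughness;
    float alpha2 = alpha * alpha;
    float denom = nDotH * nDotH * (alpha2 - 1.0f) + 1.0f;
    return alpha2 / (kPi * denom * denom);
}
```

For example, with attenuation coefficients (1, 0, 1) a light at distance 1 is attenuated to one half, and at normal incidence (cosTheta = 1) the Fresnel term reduces to f0.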

Rendering results compared

Traditional deferred rendering with SphereLightVolume, 400 point lights

Tile-based deferred shading (TiledBasedDeferredShading), 1024 point lights

Project source code

https://github.com/2047241149/SDEngine
