Terrain Rendering Using GPU-Based Geometry Clipmaps (Part 2)


wenxiongkjsm


Translated by: clayman
clayman_joe@yahoo.com.cn
For personal study only. Do not use for any commercial purpose. Please credit the author when reposting ^_^

1.3.5 Vertex Shader

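We use the same vertex shader to render all of the 2D footprints described earlier. First, given footprint coordinates (x, y), the shader computes the world coordinates (x, y) with a simple scale and translation. Next, it reads the height value z from the vertex texture. No filtering is needed here, because vertices correspond one-to-one with texture samples.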

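For smooth transitions, the vertex shader geometrically blends vertices near the boundary between two levels. The blend parameter alpha is computed from the position of the vertex (x, y) relative to the viewer position (Vx, Vy):

alpha = max(alpha_x, alpha_y)

alpha_x = clamp((|x - Vx| - ((n - 1)/2 - w - 1)) / w, 0, 1)

alpha_y is computed in the same way.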

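Here all coordinates are expressed in the clipmap grid range [0 ... n-1], and w is the width of the transition region (we chose w = n/10). We want alpha to be 0 everywhere except in the transition region, where it ramps linearly from 0 to 1 and reaches 1 at the outer perimeter of the level. The blue area in the accompanying figure is the transition region.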
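The blend parameter can be checked on the CPU with a few lines of Python. This is only a minimal sketch of the formula above, not the shader itself; the function name `transition_alpha` and the default `w = n / 10` argument are illustrative assumptions.

```python
def transition_alpha(x, y, vx, vy, n, w=None):
    """Blend parameter for a vertex at grid position (x, y).

    (vx, vy) is the viewer position in the same grid coordinates,
    n is the clipmap size and w the transition width.
    """
    if w is None:
        w = n / 10.0  # transition width chosen in the text
    # distance from the viewer at which blending starts
    offset = (n - 1) / 2.0 - w - 1

    def clamp01(t):
        return max(0.0, min(1.0, t))

    alpha_x = clamp01((abs(x - vx) - offset) / w)
    alpha_y = clamp01((abs(y - vy) - offset) / w)
    return max(alpha_x, alpha_y)
```

With n = 255 and the viewer at the centre (127, 127), blending starts 100.5 samples from the viewer and is complete 25.5 samples further out; vertices closer than that get alpha = 0.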

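For geometric blending, we linearly interpolate, at the same position (x, y), between the height Zf at the current (finer) level and the height Zc at the coarser level:

Z' = (1 - alpha) * Zf + alpha * Zc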

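In the general case the sample lies on an edge of the coarser grid, and Zc is the average of the two coarse samples at the ends of that edge. We could compute this at run time, but that would take three vertex-texture lookups (one for Zf and two for Zc = (Zc1 + Zc2)/2), and vertex-texture reads are still expensive on current graphics hardware.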

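Instead, we compute Zc as part of the clipmap update and pack both Zf and Zc into the same 1-channel floating-point texture. Zf is stored in the integer part of the float, and the difference Zd = Zc - Zf is stored in the fractional part. The packing is done in the pixel shader during upsampling (see Section 1.4.1).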
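The packing can be tried out on the CPU. The sketch below mirrors the scale factors that appear in the shader listings (offset 256, divisor 512); the function names and the range check on zd are assumptions added for illustration. Python floats are 64-bit, so this says nothing about the precision left in a 32-bit texture.

```python
import math

def pack_elevation(zf, zd):
    """Pack integer fine height zf and difference zd = zc - zf into one float."""
    assert zf == math.floor(zf), "zf must be an integer"
    assert -256 <= zd < 256, "zd must fit the range assumed by the packing"
    # integer part: zf, fractional part: zd remapped to [0, 1)
    return zf + (zd + 256) / 512.0

def unpack_elevation(zf_zd):
    """Inverse of pack_elevation: returns (zf, zd)."""
    zf = math.floor(zf_zd)
    zd = (zf_zd - zf) * 512.0 - 256.0
    return zf, zd
```

For example, `unpack_elevation(pack_elevation(1200, -37.5))` gives back `(1200, -37.5)`; the blended height is then `zf + alpha * zd`.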

The following is part of the HLSL code for the vertex shader that renders the clipmap:

struct OUTPUT
{
  vector pos   : POSITION;
  float2 uv    : TEXCOORD0; // coordinates for normal-map lookup
  float  z     : TEXCOORD1; // coordinates for elevation-map lookup
  float  alpha : TEXCOORD2; // transition blend on normal map
};

uniform float4 ScaleFactor, FineBlockOrig;
uniform float2 ViewerPos, AlphaOffset, OneOverWidth;
uniform float  ZScaleFactor, ZTexScaleFactor;
uniform matrix WorldViewProjMatrix;

uniform sampler ElevationSampler = sampler_state // fine level height sampler
{
  Texture   = (fineLevelTexture);
  MipFilter = None;
  MinFilter = Point;
  MagFilter = Point;
  AddressU  = Wrap;
  AddressV  = Wrap;
};

OUTPUT RenderVS(float2 gridPos : TEXCOORD0)
{
  OUTPUT output;

  // convert from grid xy to world xy coordinates
  //  ScaleFactor.xy: grid spacing of current level
  //  ScaleFactor.zw: origin of current block within world
  float2 worldPos = gridPos * ScaleFactor.xy + ScaleFactor.zw;

  // compute coordinates for vertex texture
  //  FineBlockOrig.xy: 1/(w, h) of texture
  //  FineBlockOrig.zw: origin of block in texture
  float2 uv = gridPos * FineBlockOrig.xy + FineBlockOrig.zw;

  // sample the vertex texture (single channel, so read .x)
  float zf_zd = tex2Dlod(ElevationSampler, float4(uv, 0, 1)).x;

  // unpack to obtain zf and zd = (zc - zf)
  //  zf is elevation value in current (fine) level
  //  zc is elevation value in coarser level
  float zf = floor(zf_zd);
  float zd = frac(zf_zd) * 512 - 256; // zd = zc - zf

  // compute alpha (transition parameter) and blend elevation
  float2 alpha = clamp((abs(worldPos - ViewerPos) - AlphaOffset) * OneOverWidth, 0, 1);
  alpha.x = max(alpha.x, alpha.y);
  float z = zf + alpha.x * zd;
  z = z * ZScaleFactor;

  // transform the blended position to clip space
  output.pos = mul(float4(worldPos.x, worldPos.y, z, 1), WorldViewProjMatrix);
  output.uv = uv;
  output.z = z * ZTexScaleFactor;
  output.alpha = alpha.x;
  return output;
}

1.3.6 The Pixel Shader

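The pixel shader accesses the normal map and shades the surface. The normal map has twice the resolution of the geometry, because one normal per vertex looks too blurry. The normal map is computed incrementally as the clipmap is updated (see Section 1.4.3).

For smooth shading transitions, the shader looks up the normals for the current level and for the next-coarser level, then blends them using the alpha value computed in the vertex shader. Ordinarily this would take two texture lookups. Instead, we pack the two normals as (Nx, Ny, Ncx, Ncy) into a single 4-channel, 8-bit-per-channel texture, implicitly assuming Nz = 1 and Ncz = 1. The normal therefore has to be renormalized after it is read.

The shading color is obtained by a lookup into a z-based 1D texture map.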

The following is part of the HLSL code for the pixel shader that implements clipmap rendering (see the attachment for the Upsample and ZCoarser functions):

// Parameters uv, alpha, and z are interpolated from the vertex shader.
// Two texture samplers have min and mag filters set to linear:
//   NormalMapSampler:   2D texture containing the normal map
//   ZBasedColorSampler: 1D texture containing elevation-based color

uniform float3 LightDirection;

// Pixel shader for rendering the geometry clipmap
float4 RenderPS(float2 uv    : TEXCOORD0,
                float  z     : TEXCOORD1,
                float  alpha : TEXCOORD2) : COLOR
{
  float4 normal_fc = tex2D(NormalMapSampler, uv);
  // normal_fc.xy contains normal at current (fine) level
  // normal_fc.zw contains normal at coarser level

  // blend normals using alpha computed in vertex shader
  float3 normal = float3((1 - alpha) * normal_fc.xy + alpha * normal_fc.zw, 1);

  // unpack coordinates from [0,1] to [-1,1] range, and renormalize
  normal = normalize(normal * 2 - 1);

  // compute simple diffuse lighting
  float s = clamp(dot(normal, LightDirection), 0, 1);

  // assign terrain color based on its elevation
  return s * tex1D(ZBasedColorSampler, z);
}
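As a rough CPU-side check of the normal blending in the listing above, the same arithmetic can be written in Python. This is a sketch only: `shade_intensity` is a hypothetical name, the texel is passed as a plain 4-tuple with channels in [0, 1], and the light direction is assumed to be a unit vector.

```python
import math

def shade_intensity(normal_fc, alpha, light_dir):
    """Diffuse term from a packed (Nx, Ny, Ncx, Ncy) texel."""
    nx, ny, ncx, ncy = normal_fc
    # blend fine and coarse normals with the vertex-shader alpha
    bx = (1 - alpha) * nx + alpha * ncx
    by = (1 - alpha) * ny + alpha * ncy
    # as in the listing: z is fixed to 1, xy are mapped from [0,1] to [-1,1]
    v = (bx * 2 - 1, by * 2 - 1, 1.0)
    length = math.sqrt(sum(c * c for c in v))
    n = tuple(c / length for c in v)
    s = sum(a * b for a, b in zip(n, light_dir))
    return max(0.0, min(1.0, s))
```

A texel of (0.5, 0.5, 0.5, 0.5) decodes to a normal pointing straight up, so with the light at (0, 0, 1) the intensity is 1 for any alpha.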

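1.4 Update

As the viewer moves through the scene, the window at each clipmap level must translate so that it stays centered on the viewer, and the windows have to be updated accordingly. Because the viewer's motion is continuous, only an L-shaped region of each window generally needs updating per frame. Moreover, the relative motion of the viewer within the windows decreases exponentially at coarser levels, so the coarse levels seldom need updating.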

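We update the active clipmap levels in coarse-to-fine order. Recall that each clipmap level stores two textures: a 1-channel floating-point elevation map and a 4-channel, 8-bit-per-channel normal map. During the update, we modify regions of these textures by rendering into them with a pixel shader. To avoid moving the existing data, all accesses are toroidal, as shown in the figure below:

(The update region is shown in red; the green arrow shows the viewer's direction of motion.)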

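We use wraparound addressing to position the clipmap window within the texture. Under this toroidal access, the L-shaped update region generally becomes a cross-shaped region in the texture, as in the figure above. We write to this region by rendering two rectangles. Note that three or four rectangles may be needed if the update region crosses the texture boundary.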
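The wraparound can be illustrated with a small sketch that splits an update strip into the rectangles that actually have to be rendered. It assumes an n x n texture addressed modulo n and rectangles given as (x, y, width, height) in texels; the function names are hypothetical.

```python
def wrap_interval(start, length, n):
    """Split [start, start + length) into pieces that stay inside [0, n)."""
    assert 0 < length <= n
    s = start % n  # toroidal position of the first texel
    if s + length <= n:
        return [(s, length)]
    first = n - s  # part that fits before the texture edge
    return [(s, first), (0, length - first)]

def wrap_rect(x, y, w, h, n):
    """Split one update strip into 1, 2 or 4 rectangles inside the texture."""
    return [(px, py, pw, ph)
            for (px, pw) in wrap_interval(x, w, n)
            for (py, ph) in wrap_interval(y, h, n)]
```

An L-shaped region is one vertical and one horizontal strip, so it normally costs two rectangles; a strip that straddles the texture edge, such as `wrap_rect(250, 10, 10, 20, 255)`, is split in two and raises the total.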

1.4.1 Upsampling

We use interpolatory subdivision to predict the geometry of the finer level from the coarser level. Specifically, we use the tensor-product version of the well-known four-point subdivision curve interpolant, which has mask weights (-1/16, 9/16, 9/16, -1/16). This upsampling filter has the desirable property of being C1 smooth.

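A different mask is used depending on the position of the sample in the grid (even-even, even-odd, odd-even, or odd-odd). An even-even pixel needs only one texture lookup, because the scheme is interpolatory; an odd-odd pixel needs 4x4 lookups; each of the other two cases needs 4 lookups. On a CPU, an if block would conveniently select the mask, but branching is expensive in a pixel shader. We therefore always perform 16 texture lookups and select the mask weights with a lookup into a 2x2 texture.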
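The four mask cases can be made concrete with a CPU reference in Python. This sketch takes the branching route that the text says is convenient on a CPU; it is not the 16-tap shader. It assumes that fine index k maps to coarse index k/2, that the coarse grid is a list of rows, and it omits boundary handling; `taps` and `upsample` are hypothetical names.

```python
# four-point interpolatory subdivision weights from the text
W = (-1 / 16, 9 / 16, 9 / 16, -1 / 16)

def taps(k):
    """1D coarse indices and weights used for fine index k."""
    if k % 2 == 0:
        return [(k // 2, 1.0)]  # even: the coarse sample is kept as is
    base = (k - 1) // 2         # coarse sample just below the fine one
    return [(base - 1 + t, W[t]) for t in range(4)]

def upsample(coarse, x, y):
    """Predicted fine height at fine-grid position (x, y)."""
    total = 0.0
    for j, wy in taps(y):
        for i, wx in taps(x):
            total += wx * wy * coarse[j][i]
    return total
```

The tensor product gives 1 lookup for even-even, 4 for the mixed cases and 16 for odd-odd, matching the counts above. On a linear ramp the filter reproduces the midpoint exactly, which is a handy sanity check.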

The following is part of the HLSL code for the upsampling shader:

// residualSampler: 2D texture containing residuals, which can
//   be either decompressed data or synthesized noise
// p_uv: coordinates of the grid sample

uniform float2 Scale;

// Pixel shader for updating the elevation map
float4 UpsamplePS(float2 p_uv : TEXCOORD0) : COLOR
{
  float2 uv = floor(p_uv);

  // The Upsample() function samples the coarser elevation map
  // using a linear interpolatory filter with 4x4 taps
  // (depending on the even/odd configuration of location uv,
  // it applies 1 of 4 possible masks).
  float z_predicted = Upsample(uv); // details omitted here

  // add the residual to get the actual elevation
  float residual = tex2D(residualSampler, p_uv * Scale).x;
  float zf = z_predicted + residual;

  // zf should always be an integer, since it gets packed
  // into the integer component of the floating-point texture
  zf = floor(zf);

  // compute zc by linearly interpolating the vertices of the
  // coarse-grid edge on which the sample p_uv lies
  float zc = ZCoarser(uv); // details omitted here
  float zd = zc - zf;

  // pack the signed difference zd into the fractional component
  float zf_zd = zf + (zd + 256) / 512;
  return float4(zf_zd, 0, 0, 0);
}

1.4.2 Residuals

The residuals are obtained either by decompressing data or by synthesis, and they are added to the data upsampled from the coarser level.

In the compressed case, we encode the residuals using image compression. We use a lossy image coder called PTC because it supports efficient region-of-interest decoding (Malvar 2000). Decompression runs on the CPU, and the result is stored in a residual texture.

In the synthesis case, we use uncorrelated Gaussian noise (Fournier et al. 1982) as the residuals to synthesize fractal terrain detail. The GPU reads a small precomputed 2D texture of Gaussian noise, which is very fast. We enable texture wrapping to extend the Gaussian texture infinitely, and apply a small magnification to the texture coordinates to break the regular periodicity. The noise is superimposed across levels, so the resulting terrain relief shows no visible pattern, as the accompanying figure shows.
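The wrapped noise lookup might be sketched as follows. Everything here is an illustrative assumption: the table size, the seed, the point sampling, and the magnification factor 1.03 do not come from the text, and a real implementation reads a texture on the GPU rather than a Python list.

```python
import math
import random

def make_noise(size, seed=0):
    """Small precomputed table of uncorrelated Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]

def noise_residual(noise, x, y, scale=1.03, amplitude=1.0):
    """Residual for grid sample (x, y), read with wraparound addressing."""
    size = len(noise)
    # scale != 1 makes the repeat length a non-integer number of grid
    # samples, so the tiling no longer lines up with the grid
    u = math.floor(x * scale) % size
    v = math.floor(y * scale) % size
    return amplitude * noise[v][u]
```

With `scale=1.0` the residuals repeat exactly every `size` samples; a factor slightly different from 1 is what removes that exact repetition.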

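1.4.3 Normal Map

The shader updates the normal map of a level from that level's elevation map. The current normal is the cross product of two grid-aligned tangent vectors. In addition, a texture lookup into the coarser level fetches its normal. Finally, (Nx, Ny, Ncx, Ncy) is packed into a 4-channel texture. See the attachment for the code.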
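The cross-product step can be sketched in Python. The choice of central differences for the two tangents is an assumption (the text only says the tangents are grid-aligned), as are the function name and the `spacing` parameter; the packing into 8-bit channels is left out.

```python
import math

def grid_normal(height, x, y, spacing=1.0):
    """Unit normal at interior sample (x, y) of a height grid (list of rows)."""
    dzx = height[y][x + 1] - height[y][x - 1]
    dzy = height[y + 1][x] - height[y - 1][x]
    tx = (2 * spacing, 0.0, dzx)  # tangent along the grid x axis
    ty = (0.0, 2 * spacing, dzy)  # tangent along the grid y axis
    # cross product tx x ty
    n = (tx[1] * ty[2] - tx[2] * ty[1],
         tx[2] * ty[0] - tx[0] * ty[2],
         tx[0] * ty[1] - tx[1] * ty[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)
```

Flat terrain gives (0, 0, 1), and a slope that rises along x tilts the normal toward negative x.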

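1.5 Results and Discussion

Our main dataset is a 216,000 x 93,600 height map of the United States, with a horizontal resolution of 30 meters and a vertical resolution of 1 meter. After compression by a factor of more than 100, it fits in only 355 MB of memory. We rendered the terrain at a resolution of 1024 x 768 on a PC with a 2.4 GHz Pentium 4, 1 GB of RAM, and an NVIDIA GeForce 6800 GT.

Rendering rate: With L = 11 levels, grid size n = 255, and culling enabled, we obtain 130 frames/sec, rendering 60 million triangles per second. The vertex-texture lookups are the bottleneck: removing them raises the rate to 185 frames/sec. (For n = 127, the rate is 298 frames/sec even with vertex textures.)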

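Update rate: The table below shows the time required by each step of the update. The times are for the worst case, in which an entire level is updated at once. Decompression, which runs on the CPU, is clearly the bottleneck.

Step                      Previous Implementation   GPU-Based Implementation
Upsampling                3 ms                      1.0 ms
Decompression (on CPU)    8 ms                      8 ms
Synthesis                 3 ms                      ~0 ms
Normal-Map Computation    11 ms                     0.6 ms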

Final frame rate: For decompressed terrain, the frame rate is about 87 frames/sec while the viewer is moving; for synthesized terrain, it is about 120 frames/sec.

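1.6 Summary and Improvements

We have described a framework for implementing geometry clipmaps on the GPU. Nearly all of the work of producing the displayed geometry is done on the GPU, which reduces the load on the CPU.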

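1.6.1 Vertex Textures

Geometry clipmaps use vertex textures in a very special and restricted way: the texels are essentially accessed in raster-scan order, in one-to-one correspondence with the vertices. We therefore hope that future access mechanisms will improve rendering efficiency.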

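1.6.2 Eliminating Normal Maps

It should be possible to compute normals directly in the pixel shader by accessing four filtered samples of the elevation map. At present, vertex-texture lookups only work with 32-bit floating-point images, which cannot be bilinearly filtered efficiently.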

1.6.3 Memory-free Terrain Synthesis

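Synthesizing terrain detail on the GPU is so fast that we can consider rebuilding the clipmap levels every frame, which saves video memory. We allocate two textures, T1 and T2, and ping-pong between them. T1 is initialized with coarse geometry that seeds the synthesis. First, the update pixel shader uses T1 as the source texture, upsampling and synthesizing the next-finer level into T2. Next, T1 is used as the vertex texture to render its clipmap ring. Then the roles of the two textures are swapped and the process repeats until all levels have been synthesized and rendered. Initial experiments are promising: the frame rate is about 59 frames/sec for L = 9.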
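The ping-pong loop can be outlined in Python. This is a schematic under stated assumptions, not the authors' implementation: `upsample_and_synthesize` and `render_ring` are hypothetical callbacks standing in for the update pass and the draw call, and the textures are represented by whatever values those callbacks exchange.

```python
def render_all_levels(seed, num_levels, upsample_and_synthesize, render_ring):
    """Synthesize and render levels coarse to fine using two textures."""
    source, target = seed, None  # T1 and T2 in the text
    for level in range(num_levels):
        if level + 1 < num_levels:
            # update pass: read T1, write the next-finer level into T2
            target = upsample_and_synthesize(source)
        # draw the ring of the current level with T1 as the vertex texture
        render_ring(level, source)
        # swap the roles of the two textures
        source, target = target, source
```

Only two textures are live at any time, whatever the number of levels, which is where the memory saving comes from.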