Thoughts on computing screen-space partial derivatives in GPU pixel shaders and the logic behind it

Author: 小姚同志

Category: Computer Graphics

Article link: https://blog.csdn.net/yaorongzhen123/article/details/89483720

While reading the shader source code in the NVIDIA samples, I came across this comment:

//Compute gradient using ddx/ddy before any branching

How should this comment be understood?

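Pixel shaders have offered ddx and ddy, functions that compute the derivatives of per-pixel attributes (such as texture coordinates), since around SM 3.0 (more precisely, SM 2.x and later). How is this functionality implemented?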

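My initial guess: from earlier reading on GPU architecture I knew that a GPU contains many SP units (see GPU architecture material for details). Pixel shaders run in parallel: many pixels execute the pixel shader at the same time, and each fragment, loosely speaking, runs independently. Computing a derivative, however, needs the values at neighboring pixels, so how is that done? I guessed that for each fragment generated by the raster stage the GPU also produces several (number to be checked) so-called samples, the ones MSAA uses for anti-aliasing. These would essentially be fragments, except that the extra samples do not take part in fragment shader execution yet still carry some useful attribute information (which attributes, to be checked), so the fragment shader could read the attributes of these neighboring samples to compute ddx/ddy.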

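The guess above is wrong. MSAA has to be explicitly enabled before it can be used, and it takes up more memory, so it cannot be what ddx/ddy relies on.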

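The explanation below makes this clear. In short, instructions like ddx/ddy require that neighboring pixels be on the same code path when they run the pixel shader, that is, they must execute the same logic, for the ddx/ddy values to be correct. These instructions essentially take the difference of an attribute between neighboring pixels. If neighboring pixels do not run the same code, for example this pixel's depth value is computed by formula 1 while a neighboring pixel takes a different code path and its depth value stays at 0, then the ddx/ddy result is wrong and meaningless. Hence the rule: in shader code, a flow control branch block must not contain ddx/ddy instructions or functions that invoke them.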
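To make the point above concrete, here is a rough CPU-side sketch in Python that simulates a single 2x2 quad. The lane layout, the helper names `quad_ddx`/`quad_ddy`, and `depth_formula` (standing in for "formula 1") are all assumptions made for this illustration; real hardware may form the differences differently (for example, fine versus coarse derivatives).

```python
# Simulate one 2x2 pixel quad. Lane layout (an assumption for this sketch):
#   lane 0 = (x,   y)     lane 1 = (x+1, y)
#   lane 2 = (x,   y+1)   lane 3 = (x+1, y+1)

def quad_ddx(values):
    """Coarse ddx: right neighbor minus left pixel, taken on the top row."""
    return values[1] - values[0]

def quad_ddy(values):
    """Coarse ddy: lower neighbor minus upper pixel, taken on the left column."""
    return values[2] - values[0]

def depth_formula(x, y):
    # Hypothetical stand-in for "formula 1": any smooth per-pixel quantity.
    return 0.25 * x + 0.5 * y

pixels = [(10, 20), (11, 20), (10, 21), (11, 21)]

# Case 1: all four lanes run the same code path.
uniform = [depth_formula(x, y) for x, y in pixels]

# Case 2: lanes with odd x take another path that leaves the value at 0.
divergent = [depth_formula(x, y) if x % 2 == 0 else 0.0 for x, y in pixels]

print(quad_ddx(uniform), quad_ddy(uniform))      # 0.25 0.5
print(quad_ddx(divergent), quad_ddy(divergent))  # -12.5 0.5
```

In the uniform case the differences match the true slopes of the formula (0.25 per pixel in x, 0.5 in y). In the divergent case the x difference is taken against a lane that never evaluated the formula, so the result says nothing about how the quantity varies on screen.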

Related explanations found online:

The Microsoft documentation addresses this issue:

Pixel shader flow control instructions have limits affecting how many levels of nesting can be included in the instructions. In addition, there are some limitations for implementing per-pixel flow control with gradient instructions.

The pixel shader instruction set includes several instructions that produce or use gradients of quantities with respect to screen space x and y. The most common use for gradients is to compute level-of-detail calculations for texture sampling, and in the case of anisotropic filtering, selecting samples along the axis of anisotropy. Typically, hardware implementations run the pixel shader on multiple pixels simultaneously (such as a 2x2 grid), so that gradients of quantities computed in the shader can be reasonably approximated as deltas of the values at the same point of execution in adjacent pixels.

When flow control is present in a shader, the result of a gradient calculation requested inside a given branch path is ambiguous when adjacent pixels may execute separate flow control paths. Therefore, it is deemed illegal to use any pixel shader operation that requests a gradient calculation to occur at a location that is inside a flow control construct which could vary across pixels for a given primitive being rasterized.

All pixel shader instructions are partitioned into those operations that are permitted and into those that are not permitted inside of flow control:

Scenario A: Operations that are not permitted inside flow control that could vary across the pixels in a primitive. These include the operations listed in the following table.

Instruction	Is [Not] Permitted in Flow Control when:
texld - ps_2_0 and up, texldb - ps and texldp - ps	A temporary register is used for the texture coordinate.
dsx - ps and dsy - ps	A temporary register is used for the operand.

Scenario B: Operations that are permitted anywhere. These include the operations listed in the following table.

Instruction	Is Permitted Anywhere when:
texld - ps_2_0 and up, texldb - ps and texldp - ps	A read-only quantity is used for the texture coordinate (may vary per-pixel, such as interpolated texture coordinates).
dsx - ps and dsy - ps	A read-only quantity is used for the input operand (may vary per-pixel, such as interpolated texture coordinates).
texldl - ps	The user provides level-of-detail as an argument, so there are no gradients, and thus no issue with flow control.
texldd - ps	The user provides gradients as input arguments, so there is no issue with flow control.

These restrictions are strictly enforced in shader validation. Scenarios having a branch condition that looks like it would branch consistently across a primitive, even though an operand in the condition expression is a pixel-shader-computed quantity, nevertheless still fall into scenario A and are not permitted. Similarly, scenarios where gradients are requested on some shader-computed quantity x from inside dynamic flow control, yet where it appears that x is not modified across any of the branches, nevertheless still fall into scenario A and are not permitted.

Predication is included in these restrictions on flow control, so that implementations remain free to trivially interchange the implementation of branch instructions with predicated instructions.

The user can use instructions from scenarios A and B together. For example, suppose the user needs an anisotropic texture sample given a shader computed texture coordinate; however, the texture load is only needed for pixels satisfying some per-pixel condition. To meet these requirements, the user can compute the texture coordinate for all pixels, outside per-pixel varying flow control, immediately computing gradients using dsx - ps and dsy - ps instructions. Then, within a per-pixel if bool - ps/endif - ps block, the user can use texldd - ps (texture load with user provided gradients), passing the precalculated gradients. Another way to describe this usage pattern is that, while all pixels in the primitive had to compute the texture coordinates and be involved with gradient calculation, only the pixels that needed to sample a texture actually did so.

Regardless of these rules, the burden is still on the user to ensure that before computing any gradient (or performing a texture sample that implicitly computes a gradient), the register containing the source data must have been initialized for all execution paths beforehand. Initialization of temporary registers is not validated or enforced in general.
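The quoted text says the most common use of gradients is the level-of-detail calculation for texture sampling. As a minimal sketch of how a mip level could be derived from screen-space UV gradients, the Python below uses the commonly described isotropic approximation (log2 of the larger gradient length in texels). It is not the exact formula of any particular GPU, and `mip_level` and its parameters are hypothetical names chosen for this example.

```python
import math

def mip_level(duv_dx, duv_dy, tex_size, num_levels):
    """Isotropic LOD estimate from screen-space UV gradients (a sketch)."""
    # Scale the UV deltas to texel units.
    dx = (duv_dx[0] * tex_size[0], duv_dx[1] * tex_size[1])
    dy = (duv_dy[0] * tex_size[0], duv_dy[1] * tex_size[1])
    # Footprint: the longer of the two gradient vectors, in texels per pixel.
    rho = max(math.hypot(*dx), math.hypot(*dy))
    if rho <= 1.0:
        return 0.0  # magnification or 1:1 mapping: use the base level
    # Clamp to the smallest mip level that exists.
    return min(math.log2(rho), num_levels - 1)

# A 256x256 texture with 9 mip levels; UV advances 1/64 per screen pixel,
# i.e. 4 texels per pixel, which lands on mip level 2.
print(mip_level((1 / 64, 0.0), (0.0, 1 / 64), (256, 256), 9))  # 2.0
```

Without valid gradients there is no footprint and therefore no well-defined mip level, which is why an ordinary texture load falls under the same restriction as dsx/dsy.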

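In other words, in a pixel shader, the code block of a conditional branch such as if/else (one that can vary per pixel) must not contain calls that trigger derivative computation such as ddx/ddy. That includes texture sampling, because a texture sample may need ddx/ddy to compute mip information. So a plain sample cannot be used there; use the sampling functions where the caller supplies the mip level or the gradients instead. Alternatively, if the derivative values are needed, call ddx/ddy before the branching statement to compute them first.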
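The "compute gradients first, branch afterwards" pattern can be sketched on the CPU as follows. This is a Python simulation of one quad, not shader code: `quad_gradients`, `sample_grad`, and `shade_quad` are hypothetical helpers, and `sample_grad` only stands in for a gradient-taking load such as texldd (or `SampleGrad` in HLSL), returning the mip level it would pick rather than a color.

```python
import math

def quad_gradients(uvs):
    """Coarse per-quad UV gradients. Assumed lane order: TL, TR, BL, BR."""
    ddx = (uvs[1][0] - uvs[0][0], uvs[1][1] - uvs[0][1])
    ddy = (uvs[2][0] - uvs[0][0], uvs[2][1] - uvs[0][1])
    return ddx, ddy

def sample_grad(uv, ddx, ddy, tex_size=256):
    """Hypothetical gradient-taking sample: returns the chosen mip level."""
    rho = max(math.hypot(*ddx), math.hypot(*ddy)) * tex_size
    return math.log2(rho) if rho > 1.0 else 0.0

def shade_quad(uvs, needs_texture):
    # Step 1: every lane computes its coordinate and takes part in the
    # gradient calculation, before any per-pixel branch.
    ddx, ddy = quad_gradients(uvs)
    results = []
    for uv, active in zip(uvs, needs_texture):
        # Step 2: per-pixel branch. Only active lanes sample, and they use
        # the gradients computed above, so divergence cannot corrupt them.
        results.append(sample_grad(uv, ddx, ddy) if active else None)
    return results

step = 1 / 32  # UV advance per screen pixel (8 texels on a 256-wide texture)
uvs = [(0.0, 0.0), (step, 0.0), (0.0, step), (step, step)]
print(shade_quad(uvs, [True, False, False, True]))  # [3.0, None, None, 3.0]
```

The two active lanes get the same, correct mip level even though their neighbors skipped the sample, because the gradients were fixed while all four lanes were still executing the same code.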

Some familiarity with GPU architecture makes this issue easier to understand.

References:

1. https://docs.microsoft.com/en-us/windows/desktop/direct3dhlsl/dx9-graphics-reference-asm-ps-instructions-flow-control

2. http://download.nvidia.com/developer/cuda/seminar/TDCI_Arch.pdf