3.7 The Geometry Shader
The geometry shader can turn primitives into other primitives, something the tessellation stage cannot do. For example, a triangle mesh could be transformed to a wireframe view by having each triangle create line edges. Alternatively, the lines could be replaced by quadrilaterals facing the viewer, producing a wireframe rendering with thicker edges [1492]. The geometry shader was added to the hardware-accelerated graphics pipeline with the release of DirectX 10, in late 2006. It is located after the tessellation shader in the pipeline, and its use is optional. While a required part of Shader Model 4.0, it is not available in earlier shader models. OpenGL 3.2 and OpenGL ES 3.2 support this type of shader as well.
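The wireframe conversion described above can be sketched on the CPU. This is only an emulation of what a geometry shader would do per primitive, not GPU code; the function names `triangle_to_edges` and `edge_to_quad` are hypothetical, and the edge is assumed to be already in 2D screen space so that the resulting quad faces the viewer by construction.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Segment = Tuple[Vec2, Vec2]


def triangle_to_edges(tri: Tuple[Vec2, Vec2, Vec2]) -> List[Segment]:
    """Emit the three edges of one triangle as line segments."""
    a, b, c = tri
    return [(a, b), (b, c), (c, a)]


def edge_to_quad(seg: Segment, width: float) -> List[Vec2]:
    """Expand a screen-space segment into a quad of the given width.

    Vertices are returned in triangle-strip order.
    """
    (x0, y0), (x1, y1) = seg
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5
    if length == 0.0:
        return []  # degenerate edge: emit nothing
    # Unit normal to the edge, scaled to half the width.
    nx, ny = -dy / length * width / 2, dx / length * width / 2
    return [(x0 + nx, y0 + ny), (x0 - nx, y0 - ny),
            (x1 + nx, y1 + ny), (x1 - nx, y1 - ny)]
```

One triangle in, three segments (or three four-vertex strips) out: the primitive type changes, which is the capability the text attributes to this stage.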
The input to the geometry shader is a single object and its associated vertices. The object typically consists of triangles in a strip, a line segment, or simply a point. Extended primitives can be defined and processed by the geometry shader. In particular, three additional vertices outside of a triangle can be passed in, and the two adjacent vertices on a polyline can be used. See Figure 3.12. With DirectX 11 and Shader Model 5.0, you can pass in more elaborate patches, with up to 32 control points. That said, the tessellation stage is more efficient for patch generation [175].
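As a rough illustration of why the three extra vertices are useful, the sketch below uses a triangle-with-adjacency input to find silhouette edges. It is a CPU emulation under several assumptions: the six-vertex layout places the triangle at indices 0, 2, 4 and the adjacent vertices at 1, 3, 5 (the common convention, taken here as given), the viewer looks down the -z axis with an orthographic projection, and every edge has a neighbor. The names `normal_z` and `silhouette_edges` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normal_z(a: Vec3, b: Vec3, c: Vec3) -> float:
    """z component of the (unnormalized) normal of triangle a, b, c."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def silhouette_edges(v: Sequence[Vec3]) -> List[Tuple[Vec3, Vec3]]:
    """Return edges of the main triangle whose neighbor faces away.

    Assumed layout: v[0], v[2], v[4] form the triangle; v[1], v[3], v[5]
    are the adjacent vertices across edges (0,2), (2,4), (4,0).
    """
    if normal_z(v[0], v[2], v[4]) <= 0.0:
        return []  # back-facing triangle: emit nothing
    edges = []
    for i in (0, 2, 4):
        a, adj, b = v[i], v[i + 1], v[(i + 2) % 6]
        # The neighbor shares edge (a, b) and has winding (a, adj, b).
        if normal_z(a, adj, b) <= 0.0:
            edges.append((a, b))
    return edges
```

Without the adjacent vertices the shader would see only one triangle at a time and could not compare its facing with that of its neighbors.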
Figure 3.12. The input to a geometry shader program is of a single type: point, line segment, or triangle. The two rightmost primitives include vertices adjacent to the line and triangle objects. More elaborate patch types are possible.
The geometry shader processes this primitive and outputs zero or more vertices, which are treated as points, polylines, or strips of triangles. Note that the geometry shader may generate no output at all. In this way, a mesh can be selectively modified by editing vertices, adding new primitives, and removing others.
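The "zero or more vertices" behavior can be illustrated with a small CPU-side sketch that turns a point into a four-vertex triangle strip, or into nothing at all. The function name `expand_particle` and the rule that a non-positive size marks a dead particle are assumptions made for the example.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def expand_particle(center: Vec2, size: float) -> List[Vec2]:
    """Turn one input point into a quad (triangle-strip order) or nothing."""
    if size <= 0.0:
        return []  # dead particle: the primitive is removed
    h = size / 2
    x, y = center
    return [(x - h, y - h), (x + h, y - h), (x - h, y + h), (x + h, y + h)]


particles = [((0.0, 0.0), 2.0), ((5.0, 5.0), 0.0), ((1.0, 1.0), 1.0)]
strips = [expand_particle(c, s) for c, s in particles]
```

Here the second particle produces an empty list, so three input points yield only two output strips.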
The geometry shader is designed for modifying incoming data or making a limited number of copies. For example, one use is to generate six transformed copies of data to simultaneously render the six faces of a cube map; see Section 10.4.3. It can also be used to efficiently create cascaded shadow maps for high-quality shadow generation. Other algorithms that take advantage of the geometry shader include creating variable-sized particles from point data, extruding fins along silhouettes for fur rendering, and finding object edges for shadow algorithms. See Figure 3.13 for more examples. These and other uses are discussed throughout the rest of the book.
Figure 3.13. Some uses of the geometry shader (GS). On the left, metaball isosurface tessellation is performed on the fly using the GS. In the middle, fractal subdivision of line segments is done using the GS and stream out, and billboards are generated by the GS for display of the lightning. On the right, cloth simulation is performed by using the vertex and geometry shader with stream out. (Images from NVIDIA SDK 10 [1300] samples, courtesy of NVIDIA Corporation.)
DirectX 11 added the ability for the geometry shader to use instancing, where the geometry shader can be run a set number of times on any given primitive [530, 1971]. In OpenGL 4.0 this is specified with an invocation count. The geometry shader can also output up to four streams. One stream can be sent on down the rendering pipeline for further processing. All these streams can optionally be sent to stream output render targets.
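Geometry shader instancing can be pictured as running the same shader several times on one primitive, with each run receiving its own invocation index. In GLSL this index is exposed as `gl_InvocationID` and the count is declared with `layout(invocations = N) in;`. The Python sketch below only emulates that idea for the cube-map case mentioned earlier: `run_instanced` and `cube_face_shader` are hypothetical names, and the per-face test is a crude stand-in for the real per-face view transform and clipping.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]
Output = Tuple[int, Triangle]  # (target layer, triangle)

# Forward axes of the six cube faces, in +X, -X, +Y, -Y, +Z, -Z order.
FACE_FORWARD: List[Vec3] = [
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
]


def cube_face_shader(tri: Triangle, invocation_id: int) -> List[Output]:
    """One invocation handles one cube face and may emit nothing."""
    f = FACE_FORWARD[invocation_id]
    depths = [v[0] * f[0] + v[1] * f[1] + v[2] * f[2] for v in tri]
    if max(depths) <= 0.0:
        return []  # triangle lies entirely behind this face
    return [(invocation_id, tri)]


def run_instanced(tri: Triangle, invocations: int,
                  shader: Callable[[Triangle, int], List[Output]]) -> List[Output]:
    """Run the shader `invocations` times on the same input primitive."""
    out: List[Output] = []
    for invocation_id in range(invocations):
        out.extend(shader(tri, invocation_id))
    return out
```

Calling `run_instanced(tri, 6, cube_face_shader)` yields one tagged copy of the triangle for each face it could be visible in, rather than a single shader run looping over all six faces.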
The geometry shader is guaranteed to output results from primitives in the same order that they are input. This affects performance, because if several shader cores run in parallel, results must be saved and ordered. This and other factors work against the geometry shader being used to replicate or create a large amount of geometry in a single call [175, 530].
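The cost of the ordering guarantee can be seen in a toy model. In the sketch below, worker threads stand in for shader cores and a hypothetical `amplify` function stands in for a geometry shader that emits a varying number of vertices. Results are buffered per primitive and committed in input order, regardless of which worker finishes first. This models the constraint only and says nothing about how any particular GPU implements it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def amplify(primitive_id: int) -> List[str]:
    """Stand-in shader: primitive i emits i % 3 output vertices."""
    return [f"p{primitive_id}.v{k}" for k in range(primitive_id % 3)]


def run_in_order(primitive_ids: List[int], workers: int = 4) -> List[str]:
    """Process primitives concurrently but commit results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so each result is
        # held until all earlier primitives have been committed.
        per_primitive = list(pool.map(amplify, primitive_ids))
    stream: List[str] = []
    for outputs in per_primitive:
        stream.extend(outputs)
    return stream
```

Because the output size per primitive is not known in advance, every in-flight result needs its own storage until its turn comes, which is one reason large amplification factors are costly.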
After a draw call is issued, there are only three places in the pipeline where work can be created on the GPU: rasterization, the tessellation stage, and the geometry shader. Of these, the geometry shader’s behavior is the least predictable when considering resources and memory needed, since it is fully programmable. In practice the geometry shader usually sees little use, as it does not map well to the GPU’s strengths. On some mobile devices it is implemented in software, so its use is actively discouraged there [69].
3.7.1 Stream Output
The standard use of the GPU’s pipeline is to send data through the vertex shader, then rasterize the resulting triangles and process these in the pixel shader. It used to be that the data always passed through the pipeline and intermediate results could not be accessed. The idea of stream output was introduced in Shader Model 4.0. After vertices are processed by the vertex shader (and, optionally, the tessellation and geometry shaders), these can be output in a stream, i.e., an ordered array, in addition to being sent on to the rasterization stage. Rasterization could, in fact, be turned off entirely and the pipeline then used purely as a non-graphical stream processor. Data processed in this way can be sent back through the pipeline, thus allowing iterative processing. This type of operation can be useful for simulating flowing water or other particle effects, as discussed in Section 13.8. It could also be used to skin a model and then have these vertices available for reuse (Section 4.4).
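The iterative use of stream output is often organized as two buffers that swap roles each pass. The sketch below emulates that pattern on the CPU for a very simple particle system; the names `update_pass` and `simulate`, the Euler integration step, and the rule that particles below height zero are dropped are all assumptions made for illustration. On a GPU, dropping a particle would require the optional geometry shader, since a vertex shader cannot discard its input.

```python
from typing import List, Tuple

Particle = Tuple[float, float]  # (height, vertical velocity)


def update_pass(source: List[Particle], dt: float,
                gravity: float = -9.8) -> List[Particle]:
    """One pass: read every particle from `source`, write to a new stream."""
    target: List[Particle] = []
    for height, velocity in source:
        velocity += gravity * dt
        height += velocity * dt
        if height >= 0.0:  # particles below the ground are dropped
            target.append((height, velocity))
    return target


def simulate(particles: List[Particle], steps: int, dt: float) -> List[Particle]:
    """Feed each pass's output stream back in as the next pass's input."""
    buffer_a = list(particles)
    for _ in range(steps):
        buffer_b = update_pass(buffer_a, dt)
        buffer_a = buffer_b  # swap roles of the two buffers
    return buffer_a
```

No rasterization is involved at any point, which matches the description of the pipeline being used purely as a stream processor.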
Stream output returns data only in the form of floating point numbers, so it can have a noticeable memory cost. Stream output works on primitives, not directly on vertices. If meshes are sent down the pipeline, each triangle generates its own set of three output vertices. Any vertex sharing in the original mesh is lost. For this reason a more typical use is to send just the vertices through the pipeline as a point set primitive. In OpenGL the stream output stage is called transform feedback, since the focus of much of its use is transforming vertices and returning them for further processing. Primitives are guaranteed to be sent to the stream output target in the order that they were input, meaning the vertex order will be maintained [530].
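The loss of vertex sharing, and its memory cost, can be made concrete with a small count. The sketch below compares streaming out an indexed quad as triangles with streaming out its vertices as a point list. The helper names are hypothetical, and the size estimate assumes each vertex carries only a three-component position stored as 32-bit floats.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FLOATS_PER_VERTEX = 3  # assumption: position only
BYTES_PER_FLOAT = 4    # assumption: 32-bit floats


def stream_out_triangles(vertices: Sequence[Vec3],
                         indices: Sequence[int]) -> List[Vec3]:
    """Each triangle writes its own three vertices; index sharing is lost."""
    return [vertices[i] for i in indices]


def stream_out_points(vertices: Sequence[Vec3]) -> List[Vec3]:
    """Sending the vertices as a point list writes each vertex once, in order."""
    return list(vertices)


def size_in_bytes(stream: Sequence[Vec3]) -> int:
    """Storage needed for the stream under the assumptions above."""
    return len(stream) * FLOATS_PER_VERTEX * BYTES_PER_FLOAT


# A quad built from two triangles that share an edge.
quad_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
quad_indices = [0, 1, 2, 0, 2, 3]
```

For this quad the triangle path writes six vertices (72 bytes) while the point path writes four (48 bytes), and because order is preserved the original index buffer can still be used with the point-path output.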