Real-Time Rendering: 5.5.2 Order-Independent Transparency

Article link: https://blog.csdn.net/m0_37609239/article/details/123421741

The under equations are used by drawing all transparent objects to a separate color buffer, then merging this color buffer atop the opaque view of the scene using over. Another use of the under operator is for performing an order-independent transparency (OIT) algorithm known as depth peeling. Order-independent means that the application does not need to perform sorting. The idea behind depth peeling is to use two z-buffers and multiple passes. First, a rendering pass is made so that all surfaces’ z-depths, including transparent surfaces, are in the first z-buffer. In the second pass, all transparent objects are rendered. If the z-depth of an object matches the value in the first z-buffer, we know this is the closest transparent object and save its RGBα to a separate color buffer. We also “peel” this layer away by saving the z-depth of whichever transparent object, if any, is beyond the first z-depth and is closest. This z-depth is the distance of the second-closest transparent object. Successive passes continue to peel and add transparent layers using under. We stop after some number of passes and then blend the transparent image atop the opaque image. See Figure 5.35.
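
To make the pass structure concrete, here is a minimal per-pixel simulation of front-to-back depth peeling. It is a CPU sketch, not GPU code. Each "pass" picks the closest transparent fragment strictly beyond the previously peeled depth, which is the role of the second z-buffer. It accumulates that fragment with under into premultiplied color and alpha, and finally composites the result over the opaque color. The fragment layout `(depth, (r, g, b), alpha)` and the names `peel_pixel` and `over_sorted` are assumptions for illustration. Fragments at exactly equal depths are not handled, a limitation real depth peeling shares.

```python
# CPU sketch of front-to-back depth peeling for a single pixel.
# A fragment is (depth, (r, g, b), alpha); smaller depth means closer to the eye.

def peel_pixel(fragments, opaque_depth, opaque_color, max_passes=8):
    acc_c = [0.0, 0.0, 0.0]      # premultiplied color of the layers peeled so far
    acc_a = 0.0                  # accumulated alpha of those layers
    last_depth = float("-inf")   # depth peeled in the previous pass
    passes = 0
    while passes < max_passes:
        # One pass: the closest transparent fragment that lies beyond the last
        # peeled depth and in front of the opaque surface.
        candidates = [f for f in fragments if last_depth < f[0] < opaque_depth]
        if not candidates:
            break
        depth, color, alpha = min(candidates, key=lambda f: f[0])
        # Under operator: the new layer sits behind everything accumulated so far.
        for k in range(3):
            acc_c[k] += (1.0 - acc_a) * alpha * color[k]
        acc_a += (1.0 - acc_a) * alpha
        last_depth = depth
        passes += 1
    # Blend the transparent image atop the opaque image (the over operator).
    result = tuple(acc_c[k] + (1.0 - acc_a) * opaque_color[k] for k in range(3))
    return result, passes


def over_sorted(fragments, opaque_color):
    """Reference: sort back to front and apply over, for comparison."""
    c = list(opaque_color)
    for _, color, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        c = [alpha * color[k] + (1.0 - alpha) * c[k] for k in range(3)]
    return tuple(c)


# Fragments arrive in arbitrary order; the caller does no sorting.
frags = [(0.5, (0.0, 0.0, 1.0), 0.5), (0.2, (1.0, 0.0, 0.0), 0.5)]
print(peel_pixel(frags, 1.0, (0.0, 1.0, 0.0)))   # ((0.5, 0.25, 0.25), 2)
print(over_sorted(frags, (0.0, 1.0, 0.0)))       # (0.5, 0.25, 0.25)
```

The peeled result matches the sorted reference, even though the input list was never sorted by the caller.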

Figure 5.35. Each depth peel pass draws one of the transparent layers. On the left is the first pass, showing the layer directly visible to the eye. The second layer, shown in the middle, displays the second-closest transparent surface at each pixel, in this case the backfaces of objects. The third layer, on the right, is the set of third-closest transparent surfaces. Final results can be found in Figure 14.33 on page 624. (Images courtesy of Louis Bavoil.)

Several variants on this scheme have been developed. For example, Thibieroz gives an algorithm that works back to front, which has the advantage of being able to blend the transparent values immediately, meaning that no separate alpha channel is needed. One problem with depth peeling is knowing how many passes are sufficient to capture all the transparent layers. One hardware solution is to provide a pixel draw counter, which tells how many pixels were written during rendering; when no pixels are rendered by a pass, rendering is done. The advantage of using under is that the most important transparent layers (those the eye first sees) are rendered early on. Each transparent surface always increases the alpha value of the pixel it covers. If the alpha value for a pixel nears 1.0, the blended contributions have made the pixel almost opaque, and so more distant objects will have a negligible effect [394]. Front-to-back peeling can be cut short when the number of pixels rendered by a pass falls below some minimum, or a fixed number of passes can be specified. This does not work as well with back-to-front peeling, as the closest (and usually most important) layers are drawn last and so may be lost by early termination.
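
The termination rules above can be sketched over a tiny "image" of pixels. In this hypothetical CPU simulation, `written` stands in for the hardware pixel draw counter. Peeling stops when a pass writes no pixels, when it writes no more than `min_pixels` (the cut-short rule), or after `max_passes`. Because peeling runs front to back, the layers skipped by early termination are the deepest ones, whose contribution is already attenuated by the accumulated alpha. The function name and parameters are assumptions, not an API.

```python
# Front-to-back peeling over several pixels, with the termination rules from the text.
# A fragment is (depth, (r, g, b), alpha); each pixel holds a list of fragments.

def peel_image(pixels, min_pixels=0, max_passes=16):
    state = [{"c": [0.0, 0.0, 0.0], "a": 0.0, "last": float("-inf")} for _ in pixels]
    passes = 0  # number of passes that wrote at least one pixel
    while passes < max_passes:
        written = 0  # stand-in for a hardware pixel draw counter
        for frags, s in zip(pixels, state):
            behind = [f for f in frags if f[0] > s["last"]]
            if not behind:
                continue  # this pixel has no layers left to peel
            depth, color, alpha = min(behind, key=lambda f: f[0])
            for k in range(3):  # under operator, front to back
                s["c"][k] += (1.0 - s["a"]) * alpha * color[k]
            s["a"] += (1.0 - s["a"]) * alpha
            s["last"] = depth
            written += 1
        if written == 0:
            break  # nothing was drawn: every layer has been captured
        passes += 1
        if written <= min_pixels:
            break  # cut short: too few pixels still receive new layers
    return state, passes


RED, GREEN, BLUE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
image = [
    [(0.1, RED, 0.5), (0.2, GREEN, 0.5), (0.3, BLUE, 0.5)],  # three layers
    [(0.1, RED, 0.5)],
    [(0.1, GREEN, 0.5)],
]
full, n_full = peel_image(image)
short, n_short = peel_image(image, min_pixels=1)
print(n_full, n_short)               # 3 2
print(full[0]["a"], short[0]["a"])   # 0.875 0.75
```

Cutting short saves a pass here at the cost of the third, most distant layer in the first pixel.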

While depth peeling is effective, it can be slow, as each layer peeled is a separate rendering pass of all transparent objects. Bavoil and Myers presented dual depth peeling, where two depth peel layers, the closest and the farthest remaining, are stripped off in each pass, thus cutting the number of rendering passes in half. Liu et al. explore a bucket sort method that captures up to 32 layers in a single pass. One drawback of this type of approach is that it needs considerable memory to keep a sorted order for all layers. Antialiasing via MSAA or similar would increase the costs astronomically.
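
Dual depth peeling can be simulated per pixel in the same style. In this hypothetical sketch, each pass removes both the nearest and the farthest remaining fragment. Near layers are accumulated front to back with under. Far layers are accumulated back to front with over, starting from the opaque color. The two partial results are joined at the end. With n layers this takes ceil(n/2) passes instead of n. The `min`/`max` calls only mimic the depth comparisons a GPU implementation would perform.

```python
import math

# Per-pixel sketch of dual depth peeling. A fragment is (depth, (r, g, b), alpha).

def dual_peel_pixel(fragments, opaque_color):
    remaining = list(fragments)
    front_c, front_a = [0.0, 0.0, 0.0], 0.0   # near layers, premultiplied, via under
    back_c = list(opaque_color)               # far layers composited over the opaque color
    passes = 0
    while remaining:
        passes += 1
        near = min(remaining, key=lambda f: f[0])
        far = max(remaining, key=lambda f: f[0])
        remaining.remove(near)
        _, color, alpha = near
        for k in range(3):
            front_c[k] += (1.0 - front_a) * alpha * color[k]
        front_a += (1.0 - front_a) * alpha
        if far is not near:  # with one layer left, near and far are the same fragment
            remaining.remove(far)
            _, color, alpha = far
            back_c = [alpha * color[k] + (1.0 - alpha) * back_c[k] for k in range(3)]
    # Every near layer lies in front of every far layer, so join them with over.
    result = tuple(front_c[k] + (1.0 - front_a) * back_c[k] for k in range(3))
    return result, passes


frags = [(0.3, (0.0, 0.0, 1.0), 0.5), (0.1, (1.0, 0.0, 0.0), 0.5), (0.2, (0.0, 1.0, 0.0), 0.5)]
color, passes = dual_peel_pixel(frags, (1.0, 1.0, 1.0))
print(color, passes)                          # (0.625, 0.375, 0.25) 2
print(passes == math.ceil(len(frags) / 2))    # True
```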

The difficulty in blending transparent objects together properly at interactive rates is not a lack of algorithms; it is mapping those algorithms efficiently to the GPU. In 1984, Carpenter presented the A-buffer, another form of multisampling. In the A-buffer, each triangle rendered creates a coverage mask for each screen grid cell it fully or partially covers. Each pixel stores a list of all relevant fragments. Opaque fragments can cull out fragments behind them, similar to the z-buffer. All the fragments are stored for transparent surfaces. Once all lists are formed, a final result is produced by walking through the fragments and resolving each sample.
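
Here is a heavily simplified sketch of the A-buffer idea. It assumes 4 coverage samples per pixel and one bitmask per fragment. Opaque fragments cull stored fragments they fully hide, and transparent fragments are all kept. The resolve step blends the fragments covering each sample back to front, then averages the samples. Carpenter's original design differs in its data layout and in how fragments are merged. The names and the `(depth, color, alpha, mask)` layout here are hypothetical.

```python
# Simplified A-buffer for one pixel. A fragment is (depth, (r, g, b), alpha, mask),
# where bit s of mask is set if the triangle covers sample s of the pixel.
NUM_SAMPLES = 4  # assumed sample count


def hides(front, back):
    """True if opaque fragment `front` lies in front of `back` and covers all its samples."""
    return front[2] >= 1.0 and front[0] < back[0] and (back[3] & ~front[3]) == 0


def insert(pixel_list, frag):
    if any(hides(f, frag) for f in pixel_list):
        return  # culled, similar to a failed z-test
    # Drop stored fragments that the new fragment hides completely (only if it is opaque).
    pixel_list[:] = [f for f in pixel_list if not hides(frag, f)]
    pixel_list.append(frag)


def resolve(pixel_list, background):
    total = [0.0, 0.0, 0.0]
    for s in range(NUM_SAMPLES):
        covering = [f for f in pixel_list if f[3] & (1 << s)]
        c = list(background)
        for _, color, alpha, _ in sorted(covering, key=lambda f: f[0], reverse=True):
            c = [alpha * color[k] + (1.0 - alpha) * c[k] for k in range(3)]
        for k in range(3):
            total[k] += c[k]
    return tuple(t / NUM_SAMPLES for t in total)


pixel = []
insert(pixel, (0.5, (0.0, 1.0, 0.0), 1.0, 0b1111))  # opaque green, full coverage
insert(pixel, (0.8, (0.0, 0.0, 1.0), 0.5, 0b0011))  # behind the green: culled
insert(pixel, (0.2, (1.0, 0.0, 0.0), 0.5, 0b0011))  # transparent red, half coverage
print(len(pixel), resolve(pixel, (0.0, 0.0, 0.0)))  # 2 (0.25, 0.75, 0.0)
```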

The idea of creating linked lists of fragments on the GPU was made possible through new functionality exposed in DirectX 11. The features used include unordered access views (UAVs) and atomic operations, described in Section 3.8. Antialiasing via MSAA is enabled by the ability to access the coverage mask and to evaluate the pixel shader at every sample. This algorithm works by rasterizing each transparent surface and inserting the fragments generated in a long array. Along with the colors and depths, a separate pointer structure is generated that links each fragment to the previous fragment stored for the pixel. A separate pass is then performed, where a screen-filling quadrilateral is rendered so that a pixel shader is evaluated at every pixel. This shader retrieves all the transparent fragments at each pixel by following the links. Each fragment retrieved is sorted in turn with the previous fragments. This sorted list is then blended back to front to give the final pixel color. Because blending is performed by the pixel shader, different blend modes can be specified per pixel, if desired. Continuing evolution of the GPU and APIs has improved performance by reducing the cost of using atomic operators.
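
The linked-list construction and resolve passes can be simulated on the CPU. In this sketch, `head` plays the role of the per-pixel start buffer and `nodes` the long fragment array. `len(self.nodes)` stands in for the atomically incremented counter. On a GPU, updating the head would be an atomic exchange, so that concurrent fragments for the same pixel still link correctly. The class name, the fixed `capacity`, and the drop-on-overflow policy are assumptions for illustration.

```python
class FragmentListBuffer:
    """CPU simulation of per-pixel fragment linked lists for OIT (hypothetical class)."""

    def __init__(self, num_pixels, capacity):
        self.head = [-1] * num_pixels  # per-pixel index of the newest node; -1 = empty
        self.nodes = []                # the long fragment array, preallocated on a GPU
        self.capacity = capacity       # number of nodes that fit in that array
        self.dropped = 0               # fragments lost because the array was full

    def insert(self, pixel, depth, color, alpha):
        index = len(self.nodes)        # GPU: atomic increment of a global counter
        if index >= self.capacity:
            self.dropped += 1          # out of memory: this fragment is simply lost
            return
        # Store the fragment with a link to the pixel's previous head, then make it
        # the new head (GPU: atomic exchange on the head buffer).
        self.nodes.append((depth, color, alpha, self.head[pixel]))
        self.head[pixel] = index

    def resolve(self, pixel, background):
        # Full-screen pass: gather this pixel's fragments by following the links.
        frags, i = [], self.head[pixel]
        while i != -1:
            depth, color, alpha, i = self.nodes[i]
            frags.append((depth, color, alpha))
        frags.sort(key=lambda f: f[0], reverse=True)  # farthest first
        c = list(background)
        for _, color, alpha in frags:  # blend back to front with over
            c = [alpha * color[k] + (1.0 - alpha) * c[k] for k in range(3)]
        return tuple(c)


buf = FragmentListBuffer(num_pixels=2, capacity=3)
buf.insert(0, 0.2, (1.0, 0.0, 0.0), 0.5)   # red, nearer, arrives first
buf.insert(0, 0.5, (0.0, 0.0, 1.0), 0.5)   # blue, farther, arrives second
buf.insert(1, 0.3, (0.0, 1.0, 0.0), 1.0)
buf.insert(1, 0.4, (1.0, 1.0, 1.0), 0.5)   # exceeds capacity: dropped
print(buf.resolve(0, (0.0, 1.0, 0.0)))      # (0.5, 0.25, 0.25)
print(buf.dropped)                          # 1
```

Because the resolve sorts per pixel, the arrival order of fragments does not matter. The per-pixel blend in `resolve` is also where a different blend mode per pixel could be applied.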

The A-buffer has the advantage that only the fragments needed for each pixel are allocated, as does the linked list implementation on the GPU. This in a sense can also be a disadvantage, as the amount of storage required is not known before rendering of a frame begins. A scene with hair, smoke, or other objects with a potential for many overlapping transparent surfaces can produce a huge number of fragments. Andersson notes that, for complex game scenes, up to 50 transparent meshes of objects such as foliage and up to 200 semitransparent particles may overlap.
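
A back-of-the-envelope estimate shows why unbounded fragment counts matter. The sketch assumes a hypothetical 12-byte node (RGBA8 color, 32-bit depth, 32-bit next index) plus a 32-bit head index per pixel. Actual layouts vary. Andersson's figures describe how many surfaces may overlap, not an average per pixel. Plugging 250 layers into every pixel therefore gives a worst-case bound rather than a typical frame.

```python
# Back-of-the-envelope memory for a per-pixel linked-list OIT buffer.
# Assumed layout: 4-byte head index per pixel, 12-byte nodes (RGBA8 + depth + next).

def linked_list_bytes(width, height, layers_per_pixel, bytes_per_node=12, head_bytes=4):
    pixels = width * height
    return pixels * head_bytes + pixels * layers_per_pixel * bytes_per_node


print(linked_list_bytes(1920, 1080, 8))          # 207360000  (about 198 MiB)
print(linked_list_bytes(1920, 1080, 50 + 200))   # 6229094400 (about 5.8 GiB)
```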

GPUs normally have memory resources such as buffers and arrays allocated in advance, and linked-list approaches are no exception. Users need to decide how much memory is enough, and running out of memory causes noticeable artifacts. Salvi and Vaidyanathan present an approach tackling this problem, multi-layer alpha blending, using a GPU feature introduced by Intel called pixel synchronization. See Figure 5.36. This capability provides programmable blending with less overhead than atomics. Their approach reformulates storage and blending so that it gracefully degrades if memory runs out. A rough sort order can benefit their scheme. DirectX 11.3 introduced rasterizer order views (Section 3.8), a type of buffer that allows this transparency method to be implemented on any GPU supporting this feature. Mobile devices have a similar technology called tile local storage that permits them to implement multi-layer alpha blending. Such mechanisms have a performance cost, however, so this type of algorithm can be expensive.
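
Graceful degradation can be illustrated with a fixed budget of k layers per pixel. This is a simplified sketch in the spirit of multi-layer alpha blending, not Salvi and Vaidyanathan's exact formulation. Each stored layer holds depth, premultiplied color, and transmittance, and fragments are inserted in depth order. When the budget is exceeded, the two farthest layers are merged with the over operator instead of being dropped. The result stays exact while fragments arrive in a favorable order and becomes an approximation otherwise. The function names and the layer tuple are assumptions.

```python
# Fixed-budget layer blending for one pixel, in the spirit of multi-layer alpha blending.
# A stored layer is (depth, premultiplied (r, g, b), transmittance).

def mlab_insert(layers, depth, color, alpha, k):
    layers.append((depth, tuple(alpha * c for c in color), 1.0 - alpha))
    layers.sort(key=lambda layer: layer[0])  # keep layers in depth order
    if len(layers) > k:
        # Out of storage: merge the two farthest layers (nearer one over farther one).
        (d0, c0, t0), (_, c1, t1) = layers[-2], layers[-1]
        layers[-2:] = [(d0, tuple(c0[i] + t0 * c1[i] for i in range(3)), t0 * t1)]


def mlab_resolve(layers, background):
    out, transmittance = [0.0, 0.0, 0.0], 1.0
    for _, c, t in layers:  # front to back
        for i in range(3):
            out[i] += transmittance * c[i]
        transmittance *= t
    return tuple(out[i] + transmittance * background[i] for i in range(3))


RED, GREEN, BLUE, WHITE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
arrivals = [(0.2, RED), (0.6, BLUE), (0.4, GREEN)]  # green arrives after red and blue merged
for k in (3, 1):
    layers = []
    for depth, color in arrivals:
        mlab_insert(layers, depth, color, 0.5, k)
    print(k, mlab_resolve(layers, WHITE))
# 3 (0.625, 0.375, 0.25)   exact: red over green over blue
# 1 (0.625, 0.25, 0.375)   approximate: green ended up behind blue
```

With k = 1 the nearest surface (red) is still blended correctly, and only the ordering of the deeper layers is lost. That is the sense in which the error degrades gracefully.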

Figure 5.36. In the upper left, traditional back-to-front alpha blending is performed, leading to rendering errors due to incorrect sort order. In the upper right, the A-buffer is used to give a perfect, non-interactive result. The lower left presents the rendering with multi-layer alpha blending. The lower right shows the differences between the A-buffer and multi-layer images, multiplied by 4 for visibility. (Images courtesy of Marco Salvi and Karthik Vaidyanathan, Intel Corporation.)

This approach builds on the idea of the k-buffer, introduced by Bavoil et al., where the first few visible layers are saved and sorted as possible, with deeper layers discarded and merged as possible. Maule et al. use a k-buffer and account for these more distant deep layers by using weighted averaging. Weighted sum and weighted average transparency techniques are order-independent, are single-pass, and run on almost every GPU. The problem is that they do not take into account the ordering of the objects. So, for example, using alpha to represent coverage, a gauzy red scarf atop a gauzy blue scarf gives a violet color, versus properly seeing a red scarf with a little blue showing through. While nearly opaque objects give poor results, this class of algorithms is useful for visualization and works well for highly transparent surfaces and particles. See Figure 5.37.
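
The scarf example and the hybrid idea can both be seen in one small sketch. It keeps the k nearest fragments exactly (sorted, then blended with under) and folds everything farther away into a single weighted-average layer. With k = 0 it reduces to a pure weighted average. This loosely follows the description of Maule et al.'s approach above, not their published algorithm. Sorting the whole list only simulates what a k-buffer maintains incrementally. The name `hybrid_blend` and the fragment layout are assumptions.

```python
# Hybrid sketch: exact blending for the k nearest fragments, weighted average for the rest.
# A fragment is (depth, (r, g, b), alpha).

def hybrid_blend(fragments, k, background):
    ordered = sorted(fragments, key=lambda f: f[0])  # simulation only; a k-buffer avoids a full sort
    near, far = ordered[:k], ordered[k:]
    tail = list(background)
    a_sum = sum(a for _, _, a in far)
    if a_sum > 0.0:
        # Weighted-average layer for the distant fragments, composited over the background.
        c_wavg = [sum(a * c[i] for _, c, a in far) / a_sum for i in range(3)]
        u = (1.0 - a_sum / len(far)) ** len(far)
        tail = [(1.0 - u) * c_wavg[i] + u * background[i] for i in range(3)]
    acc_c, acc_a = [0.0, 0.0, 0.0], 0.0
    for _, color, alpha in near:  # front to back with under
        for i in range(3):
            acc_c[i] += (1.0 - acc_a) * alpha * color[i]
        acc_a += (1.0 - acc_a) * alpha
    return tuple(acc_c[i] + (1.0 - acc_a) * tail[i] for i in range(3))


red_scarf = (0.2, (1.0, 0.0, 0.0), 0.5)   # gauzy red scarf on top
blue_scarf = (0.6, (0.0, 0.0, 1.0), 0.5)  # gauzy blue scarf underneath
black = (0.0, 0.0, 0.0)
print(hybrid_blend([blue_scarf, red_scarf], 2, black))  # (0.5, 0.0, 0.25): red with a little blue
print(hybrid_blend([blue_scarf, red_scarf], 0, black))  # (0.375, 0.0, 0.375): violet
```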

Figure 5.37. The object order becomes more important as opacity increases. (Images after Dunn.)

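In weighted sum transparency the formula is

$$\mathbf{c}_o = \sum_{i=1}^{n} (\alpha_i \mathbf{c}_i) + \mathbf{c}_d \left(1 - \sum_{i=1}^{n} \alpha_i\right)$$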

where n is the number of transparent surfaces, c_i and α_i represent the set of transparency values, and c_d is the color of the opaque portion of the scene. The two sums are accumulated and stored separately as transparent surfaces are rendered, and at the end of the transparency pass, the equation is evaluated at each pixel. Problems with this method are that the first sum saturates, i.e., generates color values greater than (1.0, 1.0, 1.0), and that the background color can have a negative effect: once the sum of the alphas surpasses 1.0, the background's weight (1 minus that sum) becomes negative.
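
A direct transcription of the weighted sum formula shows both problems. In this sketch (function name and `(color, alpha)` fragment layout assumed), three white surfaces with alpha 0.6 over a mid-gray background produce a channel value above 1.0. The background term is multiplied by a negative factor, because the alphas sum to 1.8. The result is the same for any fragment order, which is the point of the method.

```python
# Weighted sum transparency: c_o = sum(alpha_i * c_i) + c_d * (1 - sum(alpha_i)).
# A fragment is ((r, g, b), alpha); depth is not needed because order is ignored.

def weighted_sum(fragments, background):
    c_sum = [sum(a * c[k] for c, a in fragments) for k in range(3)]  # first buffer
    a_sum = sum(a for _, a in fragments)                              # second buffer
    return tuple(c_sum[k] + background[k] * (1.0 - a_sum) for k in range(3))


white = ((1.0, 1.0, 1.0), 0.6)
result = weighted_sum([white, white, white], (0.5, 0.5, 0.5))
print(result)  # each channel is about 1.4: saturated, with a negative background term
```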

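The weighted average equation is usually preferred because it avoids these problems:

$$
\begin{aligned}
\mathbf{c}_{\text{sum}} &= \sum_{i=1}^{n} (\alpha_i \mathbf{c}_i), \qquad \alpha_{\text{sum}} = \sum_{i=1}^{n} \alpha_i,\\
\mathbf{c}_{\text{wavg}} &= \frac{\mathbf{c}_{\text{sum}}}{\alpha_{\text{sum}}},\\
\alpha_{\text{avg}} &= \frac{\alpha_{\text{sum}}}{n},\\
u &= (1 - \alpha_{\text{avg}})^{n},\\
\mathbf{c}_o &= (1 - u)\,\mathbf{c}_{\text{wavg}} + u\,\mathbf{c}_d.
\end{aligned}
$$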

The first line represents the results in the two separate buffers generated during transparency rendering. Each surface contributing to c_sum is given an influence weighted by its alpha; nearly opaque surfaces contribute more of their color, and nearly transparent surfaces have little influence. By dividing c_sum by α_sum we get a weighted average transparency color. The value α_avg is the average of all alpha values. The value u is the estimated visibility of the destination (the opaque scene) after this average alpha is applied n times, for n transparent surfaces. The final line is effectively the over operator, with (1 − u) representing the source’s alpha.
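
The weighted average equations map line for line onto code. This sketch (hypothetical name, `(color, alpha)` fragments) reuses the inputs that broke the weighted sum. The result stays below 1.0 because c_wavg is a true average and u can never go negative. It also shows the limitation discussed next: swapping the red and blue scarves gives the same violet.

```python
# Weighted average transparency, one line of code per line of the equations.
# A fragment is ((r, g, b), alpha).

def weighted_average(fragments, background):
    n = len(fragments)
    a_sum = sum(a for _, a in fragments)
    if n == 0 or a_sum == 0.0:
        return tuple(background)
    c_sum = [sum(a * c[k] for c, a in fragments) for k in range(3)]
    c_wavg = [c_sum[k] / a_sum for k in range(3)]   # alpha-weighted average color
    a_avg = a_sum / n                               # average alpha
    u = (1.0 - a_avg) ** n                          # estimated visibility of the background
    return tuple((1.0 - u) * c_wavg[k] + u * background[k] for k in range(3))  # "over"


white = ((1.0, 1.0, 1.0), 0.6)
print(weighted_average([white] * 3, (0.5, 0.5, 0.5)))   # about 0.968 per channel
red, blue = ((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 0.5)
print(weighted_average([red, blue], (0.0, 0.0, 0.0)))   # (0.375, 0.0, 0.375)
print(weighted_average([blue, red], (0.0, 0.0, 0.0)))   # same violet either way
```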

One limitation with weighted average is that, for identical alphas, it blends all colors equally, regardless of order. McGuire and Bavoil introduced weighted blended order-independent transparency to give a more convincing result. In their formulation, the distance to the surface also affects the weight, with closer surfaces given more influence. Also, rather than averaging the alphas, u is computed by multiplying the terms (1 − α_i) together, so that 1 − u gives the true alpha coverage of the set of surfaces. This method produces more visually convincing results, as seen in Figure 5.38.
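
The weighted blended variant changes two things relative to the weighted average: each weight includes a depth term, and u becomes the product of (1 − α_i). The depth weight below is a hypothetical stand-in, chosen only so that it decreases with distance. McGuire and Bavoil discuss specific weight functions, which this sketch does not reproduce. On a GPU, the two sums would typically be accumulated with additive blending and the product with multiplicative blending, in separate render targets.

```python
# Weighted blended order-independent transparency for one pixel (sketch).
# A fragment is (z, (r, g, b), alpha) with z > 0 the distance from the viewer.

def depth_weight(z):
    # Hypothetical weight: larger for closer surfaces, bounded near z = 0.
    return 1.0 / (1e-3 + z * z)


def weighted_blended(fragments, background):
    accum = [0.0, 0.0, 0.0]  # sum of w_i * c_i, with w_i = alpha_i * depth_weight(z_i)
    accum_w = 0.0            # sum of w_i
    u = 1.0                  # product of (1 - alpha_i)
    for z, color, alpha in fragments:
        w = alpha * depth_weight(z)
        for k in range(3):
            accum[k] += w * color[k]
        accum_w += w
        u *= 1.0 - alpha
    if accum_w == 0.0:
        return tuple(background)
    # 1 - u is the true coverage of all surfaces; u is what remains of the background.
    return tuple(accum[k] / accum_w * (1.0 - u) + u * background[k] for k in range(3))


red = (1.0, (1.0, 0.0, 0.0), 0.5)   # closer
blue = (2.0, (0.0, 0.0, 1.0), 0.5)  # farther
print(weighted_blended([red, blue], (0.0, 0.0, 0.0)))  # red dominates (about 0.60 vs 0.15)
print(weighted_blended([blue, red], (0.0, 0.0, 0.0)))  # same result: still order-independent
```

With the same hypothetical weight, surfaces at z = 100 and z = 101 get weights that differ by only about 2%. In that case the output would be close to the plain weighted average, which is the drawback discussed after Figure 5.38.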

Figure 5.38. Two different camera locations viewing the same engine model, both rendered with weighted blended order-independent transparency. Weighting by distance helps clarify which surfaces are closer to the viewer. (Images courtesy of Morgan McGuire.)

A drawback is that objects close to one another in a large environment can have nearly equal weightings from distance, making the result little different than the weighted average. Also, as the camera’s distance to the transparent objects changes, the depth weightings may then vary in effect, but this change is gradual.

McGuire and Mara extend this method to include a plausible transmission color effect. As noted earlier, all the transparency algorithms discussed in this section blend various colors instead of filtering them, mimicking pixel coverage. To give a color filter effect, the opaque scene is read by the pixel shader and each transparent surface multiplies the pixels it covers in this scene by its color, saving the result to a third buffer. This buffer, in which the opaque objects are now tinted by the transparent ones, is then used in place of the opaque scene when resolving the transparency buffers. This method works because, unlike transparency due to coverage, colored transmission is order-independent.
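
The tinting step can be sketched as a per-pixel product. Here the filter color of each surface is simply given as a transmission triple. That is an assumption: how such a color is derived from a material is left open. Each covering surface multiplies the opaque color by its transmission. Because the result is a product, it is mathematically the same in any order. The tinted color would then stand in for c_d when resolving the accumulation buffers.

```python
# Colored transmission: tint the opaque scene color by every transparent surface covering it.

def tint_opaque(opaque_color, transmissions):
    tinted = list(opaque_color)
    for t in transmissions:          # one multiply per transparent surface, any order
        for k in range(3):
            tinted[k] *= t[k]
    return tuple(tinted)


glass_red = (1.0, 0.5, 0.5)   # passes red, halves green and blue (assumed filter colors)
glass_cyan = (0.5, 1.0, 1.0)
print(tint_opaque((1.0, 1.0, 1.0), [glass_red, glass_cyan]))  # (0.5, 0.5, 0.5)
print(tint_opaque((1.0, 1.0, 1.0), [glass_cyan, glass_red]))  # (0.5, 0.5, 0.5)
```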

There are yet other algorithms that use elements from several of the techniques presented here. For example, Wyman categorizes previous work by memory requirements, insertion and merge methods, whether alpha or geometric coverage is used, and how discarded fragments are treated. He presents two new methods found by looking for gaps in previous research. His stochastic layered alpha blending method uses k-buffers, weighted average, and stochastic transparency. His other algorithm is a variant on Salvi and Vaidyanathan’s method, using coverage masks instead of alpha.

Given the wide variety of types of transparent content, rendering methods, and GPU capabilities, there is no perfect solution for rendering transparent objects. We refer the interested reader to Wyman’s paper and Maule et al.’s more detailed survey of algorithms for interactive transparency. McGuire’s presentation gives a wider view of the field, running through other related phenomena such as volumetric lighting, colored transmission, and refraction, which are discussed in greater depth later in this book.
