A Quick Overview of MSAA

Previous article in the series: Applying Sampling Theory to Real-Time Graphics

MSAA can be a bit complicated, due to the fact that it affects nearly the entire rasterization pipeline used in GPUs. It’s also complicated because really understanding why it works requires at least a basic understanding of signal processing and image resampling. With that in mind, I wanted to provide a quick overview of how MSAA works on a GPU, in order to provide some background material for the following article where we’ll experiment with MSAA resolves. Like the previous article on signal processing, feel free to skip it if you’re already an expert. Or better yet, read through it and correct my mistakes!

Rasterization Basics

A modern D3D11-capable GPU features hardware-supported rendering of point, line, and triangle primitives through rasterization. The rasterization pipeline on a GPU takes as input the vertices of the primitive being rendered, with vertex positions provided in homogeneous clip space (that is, after transformation by some projection matrix). These positions are used to determine the set of pixels in the current render target where the triangle will be visible. This visible set is determined by two things: coverage and occlusion. Coverage is determined by performing some test to determine if the primitive overlaps a given pixel. In GPUs, coverage is calculated by testing if the primitive overlaps a single sample point located in the exact center of each pixel¹. The following image demonstrates this process for a single triangle:


Coverage being calculated for a rasterized triangle. The blue circles represent a grid of sample points, each located at the center of a pixel. The red circles represent sample points covered by the triangle.
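
To make the single-sample coverage test concrete, here is a minimal sketch using edge functions, a common way to express the "is this sample inside the triangle?" test. It assumes vertices are already in screen space (pixel units) and skips the top-left fill rule that real rasterizers use to break ties on shared edges, so treat it as an illustration rather than how any particular GPU does it.

```python
# Minimal sketch of single-sample coverage: one sample at each pixel center.
# Assumptions: vertices are already in screen space (pixel units), and no
# top-left fill rule is applied (real rasterizers use one to break ties on edges).

def edge(a, b, p):
    """Edge function: its sign tells which side of the line a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covered_pixels(v0, v1, v2, width, height):
    """Return the set of (x, y) pixels whose center sample lies inside the triangle."""
    covered = set()
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # the single sample point at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Accept either winding order: inside means all three share a sign.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.add((x, y))
    return covered

# A right triangle covering the upper-left half of a 4x4 render target.
print(sorted(covered_pixels((0, 0), (4, 0), (0, 4), 4, 4)))
```

Note that a pixel is either covered or not: partial overlap that misses the center sample counts for nothing, which is exactly where geometric aliasing comes from.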

Occlusion tells us whether a pixel covered by a primitive is also covered by any other triangles, and is handled by z-buffering in GPUs. A z-buffer, or depth buffer, stores the depth of the closest primitive relative to the camera at each pixel location. When a primitive is rasterized, its interpolated depth is compared against the value in the depth buffer to determine whether or not the pixel is occluded. If the depth test succeeds, the appropriate pixel in the depth buffer is updated with the new closest depth. One thing to note about the depth test is that while it is often shown as occurring after pixel shading, almost all modern hardware can execute some form of the depth test before shading occurs. This is done as an optimization, so that occluded pixels can skip pixel shading. GPUs still support performing the depth test after pixel shading in order to handle certain cases where an early depth test would produce incorrect results. One such case is where the pixel shader manually specifies a depth value, since the depth of the primitive isn’t known until the pixel shader runs.
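
A minimal sketch of a depth test performed before shading (early-z) may help. It assumes a LESS comparison (smaller depth means closer) and a buffer cleared to 1.0, which is one common configuration but not the only one; `shade` is a hypothetical stand-in for a pixel shader.

```python
# Minimal z-buffer sketch with an early depth test. Assumptions: smaller depth
# means closer (a LESS comparison) and the buffer is cleared to 1.0. `shade` is a
# hypothetical stand-in for a pixel shader.

def make_depth_buffer(width, height, clear=1.0):
    return [[clear] * width for _ in range(height)]

def depth_test(depth_buffer, x, y, z):
    """Return True if depth z at (x, y) is closer than the stored value, and store it."""
    if z < depth_buffer[y][x]:
        depth_buffer[y][x] = z  # update with the new closest depth
        return True
    return False

def draw_fragment_early_z(depth_buffer, x, y, z, shade):
    """Early depth test: occluded fragments never reach the pixel shader."""
    if depth_test(depth_buffer, x, y, z):
        return shade(x, y)
    return None

shader_calls = []

def shade(x, y):
    shader_calls.append((x, y))
    return "lit"

db = make_depth_buffer(2, 2)
draw_fragment_early_z(db, 0, 0, 0.5, shade)  # visible: shaded
draw_fragment_early_z(db, 0, 0, 0.7, shade)  # occluded: shading skipped
print(len(shader_calls))  # 1
```

If the shader produced its own depth, the call to `depth_test` would have to move after `shade`, which is the late depth test case described above.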

Together, coverage and occlusion tell us the visibility of a primitive. Since visibility can be defined as a 2D function of X and Y, we can treat it as a signal and define its behavior in terms of concepts from signal processing. For instance, since coverage and depth testing are performed at each pixel location in the render target, the visibility sampling rate is determined by the X and Y resolution of that render target. We should also note that triangles and lines will inherently have discontinuities, which means that the signal is not bandlimited and thus no sampling rate will be adequate to avoid aliasing in the general case.

Oversampling and Supersampling

While it’s generally impossible to completely avoid aliasing of an arbitrary signal with unbounded frequency content, we can still reduce the appearance of aliasing artifacts through a process known as oversampling. Oversampling is the process of sampling a signal at some rate that’s higher than our intended final output, and then reconstructing and resampling the signal again at the output sample rate. As you’ll recall from the first article, sampling at a higher rate causes the clones of a signal’s spectrum to be further apart. This results in less of the higher-frequency content leaking into the reconstructed version of the signal, which in the case of an image means a reduction in the appearance of aliasing artifacts.

When applied to graphics and 2D images we call this supersampling, often abbreviated as SSAA. Implementing it in a 3D rasterizer is trivial: render to some resolution higher than the screen, and then downsample to screen resolution using a reconstruction filter. The following image shows the results of various supersampling patterns applied to a rasterized triangle:


Supersampling applied to a rasterized triangle, using various sub-pixel patterns. Notice how aliasing is reduced as the sample rate increases, even though the number of pixels is the same in all cases. Image from Real-Time Rendering, 3rd Edition, A K Peters 2008
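
The "render at a higher resolution, then downsample" recipe can be sketched in a few lines. This is an illustration under simplifying assumptions: samples lie on an ordered k x k grid inside each pixel, the scene is a hypothetical half-plane edge instead of rasterized triangles, and the reconstruction filter is a 1-pixel-wide box.

```python
# Minimal SSAA sketch: evaluate coverage on a k x k ordered grid inside each pixel
# (i.e. render at k times the resolution in X and Y), then downsample with a box filter.
# The scene is a hypothetical half-plane test; a real renderer would rasterize
# triangles into the larger render target instead.

def render_supersampled(inside, width, height, k):
    """Return a (height*k) x (width*k) image of 0/1 coverage samples."""
    hi = []
    for sy in range(height * k):
        row = []
        for sx in range(width * k):
            px, py = (sx + 0.5) / k, (sy + 0.5) / k  # sample position in pixel units
            row.append(1.0 if inside(px, py) else 0.0)
        hi.append(row)
    return hi

def box_downsample(hi, k):
    """Average each k x k block into one output pixel (1-pixel-wide box filter)."""
    h, w = len(hi) // k, len(hi[0]) // k
    return [[sum(hi[y * k + j][x * k + i] for j in range(k) for i in range(k)) / (k * k)
             for x in range(w)] for y in range(h)]

def edge_scene(x, y):
    return x < 1.5  # hypothetical scene: everything left of x = 1.5 is covered

print(box_downsample(render_supersampled(edge_scene, 3, 1, 1), 1))  # [[1.0, 0.0, 0.0]]
print(box_downsample(render_supersampled(edge_scene, 3, 1, 2), 2))  # [[1.0, 0.5, 0.0]]
```

With k = 2 the middle pixel picks up an intermediate value, but every sample, and in a real renderer every shader invocation, costs four times as much, which is the performance problem described next.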

The simplicity and effectiveness of supersampling resulted in it being offered as a driver option for many early GPUs. The problem, however, is performance. When the resolution of the render target is increased, the sampling rate of visibility increases. However, since the execution of the pixel shader is also tied to the resolution of the pixels, the pixel shading rate also increases. This means that any work performed in the pixel shader, such as lighting or texture fetches, is performed at a higher rate and thus consumes more resources. The same goes for the bandwidth used when writing the results of the pixel shader to the render target, since the write (and blending, if enabled) is performed for each pixel. Memory consumption also increases, since the render target and corresponding z buffer must be larger. Because of these adverse performance characteristics, supersampling was mostly relegated to a high-end feature for GPUs with spare cycles to burn.

Supersampling Evolves into MSAA

So we’ve established that supersampling works in principle for reducing aliasing in 3D graphics, but that it’s also prohibitively expensive. To keep most of the benefit of supersampling without breaking the bank in terms of performance, we can observe that aliasing of the triangle visibility function (AKA geometric aliasing) only occurs at the edges of rasterized triangles. If we hopped into a time machine and traveled back to 2001, we would also observe that pixel shading mostly consists of texture fetches and thus doesn’t suffer from aliasing (due to mipmaps). These observations would lead us to conclude that geometric aliasing is the primary form of aliasing for games, and should be our main focus. This conclusion is what caused MSAA to be born.

In terms of rasterization, MSAA works in a similar manner to supersampling. The coverage and occlusion tests are both performed at higher-than-normal resolution, which is typically 2x through 8x. For coverage, the hardware implements this by having N sample points within a pixel, where N is the multisample rate. These samples are known as subsamples, since they are sub-pixel samples. The following image shows the subsample placement for a typical 4x MSAA rotated grid pattern:


Typical MSAA 4x Sample Pattern

The triangle is tested for coverage at each of the N sample points, essentially building a bitwise coverage mask representing the portion of the pixel covered by a triangle². For occlusion testing, the triangle depth is interpolated at each covered sample point and tested against the depth value in the z buffer. Since the depth test is performed for each subsample and not for each pixel, the size of the depth buffer must be augmented to store the additional depth values. In practice this means that the depth buffer will be N times the size of the non-MSAA case. So for 2x MSAA the depth buffer will be twice the size, for 4x it will be four times the size, and so on.
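
Here is a small sketch of building the coverage mask and running the per-subsample depth test for one pixel. The offsets in `SAMPLE_OFFSETS` are assumed to match the D3D11 standard 4x rotated-grid pattern (in 1/16-pixel units from the pixel center); actual hardware patterns can differ, and the half-plane `inside` test and constant `depth_at` are hypothetical stand-ins for a rasterized triangle.

```python
# Sketch: per-pixel coverage bitmask for 4x MSAA plus a per-subsample depth test.
# SAMPLE_OFFSETS is assumed to match the D3D11 standard 4x rotated-grid pattern
# (offsets from the pixel center in 1/16-pixel units); hardware patterns can differ.

SAMPLE_OFFSETS = [(-2, -6), (6, -2), (-6, 2), (2, 6)]

def sample_positions(x, y):
    return [(x + 0.5 + dx / 16.0, y + 0.5 + dy / 16.0) for dx, dy in SAMPLE_OFFSETS]

def coverage_mask(inside, x, y):
    """Bit i is set when subsample i of pixel (x, y) is inside the primitive."""
    mask = 0
    for i, (sx, sy) in enumerate(sample_positions(x, y)):
        if inside(sx, sy):
            mask |= 1 << i
    return mask

def depth_test_mask(mask, depth_at, pixel_depths, x, y):
    """Per-subsample LESS depth test. pixel_depths holds the N stored depths for
    this pixel (hence the N-times-larger depth buffer). Returns the passing mask."""
    passed = 0
    for i, (sx, sy) in enumerate(sample_positions(x, y)):
        if mask & (1 << i):
            z = depth_at(sx, sy)  # depth interpolated at the subsample, not the center
            if z < pixel_depths[i]:
                pixel_depths[i] = z
                passed |= 1 << i
    return passed

# Hypothetical primitive covering the left half of pixel (0, 0) at constant depth 0.3.
mask = coverage_mask(lambda px, py: px < 0.5, 0, 0)
depths = [1.0] * 4
print(bin(mask))                                                     # 0b101
print(bin(depth_test_mask(mask, lambda px, py: 0.3, depths, 0, 0)))  # 0b101
```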

Where MSAA begins to differ from supersampling is when the pixel shader is executed. In the standard MSAA case, the pixel shader is not executed for each subsample. Instead, the pixel shader is executed only once for each pixel where the triangle covers at least one subsample. Or in other words, it is executed once for each pixel where the coverage mask is non-zero. At this point pixel shading occurs in the same manner as non-MSAA rendering: the vertex attributes are interpolated to the center of the pixel and used by the pixel shader to fetch textures and perform lighting calculations. This means that the pixel shader cost does not increase substantially when MSAA is enabled, which is the primary benefit of MSAA over supersampling.

Although we only execute the pixel shader once per covered pixel, it is not sufficient to store only one output value per pixel in the render target. We need the render target to support storing multiple samples, so that we can store the results from multiple triangles that may have partially covered a single pixel. Therefore an MSAA render target will have enough memory to store N subsamples for each pixel. This is conceptually similar to an MSAA z buffer, which also has enough memory to store N subsamples. Each subsample in the render target is mapped to one of the subsample points used during rasterization to determine coverage. When a pixel shader outputs its value, the value is only written to subsamples where both the coverage test and the depth test passed for that pixel. So if a triangle covers half the sample points in a 4x sample pattern, then half of the subsamples in the render target receive the pixel shader output value. Or if all of the sample points are covered, then all of the subsamples receive the output value. The following image demonstrates this concept:


Results from non-MSAA and 4x MSAA rendering when a triangle partially covers a pixel. Image from Real-Time Rendering, 3rd Edition
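
The write step can be sketched for a single pixel as follows: the shader runs once, and its one output is copied into every subsample whose bit survived both the coverage and depth tests. The lambdas standing in for pixel shaders are hypothetical, and colors are plain strings to keep things readable.

```python
# Sketch of the MSAA write step for one pixel: the pixel shader runs once
# (at the pixel center) and its single output is copied into every subsample
# whose bit is set in the combined coverage/depth mask.
# The `shade` callables are hypothetical stand-ins for pixel shaders.

def msaa_write(pixel_samples, passed_mask, shade, x, y):
    """pixel_samples: the N stored values for this pixel in the MSAA render target.
    Returns the number of shader invocations (0 or 1)."""
    if passed_mask == 0:
        return 0  # no subsample passed: the shader does not run for this pixel
    color = shade(x + 0.5, y + 0.5)  # one invocation, attributes at the pixel center
    for i in range(len(pixel_samples)):
        if passed_mask & (1 << i):
            pixel_samples[i] = color
    return 1

# Two triangles that each cover half of the same pixel.
samples = ["black"] * 4
msaa_write(samples, 0b0101, lambda px, py: "red", 0, 0)
msaa_write(samples, 0b1010, lambda px, py: "blue", 0, 0)
print(samples)  # ['red', 'blue', 'red', 'blue']
```

The pixel ends up holding outputs from both triangles after only two shader invocations, whereas 4x supersampling would have run the shader eight times.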

By using the coverage mask to determine which subsamples are updated, the end result is that a single pixel can end up storing the output from N different triangles that partially cover the same pixel. This effectively gives us the result we want, which is an oversampled form of triangle visibility. The following image, taken from the Direct3D 10 documentation[1], visually summarizes the rasterization process for the case of 4x MSAA:


An image from the D3D10 documentation detailing the results of rasterizing various primitives with 4xMSAA.

MSAA Resolve

As with supersampling, the oversampled signal must be resampled down to the output resolution before we can display it. With MSAA, this process is referred to as resolving the render target. In its earliest incarnations, the resolve process was carried out in fixed-function hardware on the GPU. The filter commonly used was a 1-pixel-wide box filter, which essentially equates to averaging all subsamples within a given pixel. With such a filter, fully-covered pixels end up with the same result as non-MSAA rendering, which could be considered either good or bad depending on how you look at it (good because you won’t unintentionally reduce details through blurring, bad because a box filter will introduce postaliasing). For pixels with triangle edges, you get a trademark gradient of color values with a number of steps equal to the number of sub-pixel samples. Take a look at the following image to see what this gradient looks like for various MSAA modes:


Trademark MSAA edge gradients resulting from reconstruction using box filtering
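
The box-filter resolve and the resulting gradient are easy to reproduce numerically. In this sketch colors are single floats and the edge is white against black, which is a simplification for illustration.

```python
# Sketch of a box-filter resolve: each output pixel is the plain average of its
# N subsamples. Colors are single floats here to keep the example small.

def box_resolve(subsamples):
    return sum(subsamples) / len(subsamples)

def edge_gradient_levels(n):
    """Resolved values for a white-on-black edge pixel with n subsamples.
    Covering k of n subsamples gives k / n, so there are n + 1 levels,
    i.e. n steps between black and white."""
    return [box_resolve([1.0] * k + [0.0] * (n - k)) for k in range(n + 1)]

print(edge_gradient_levels(2))  # [0.0, 0.5, 1.0]
print(edge_gradient_levels(4))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```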

One notable exception to box filtering was Nvidia’s “Quincunx” AA, which was available as a driver option on their DX8- and DX9-era hardware (which includes the RSX used by the PS3). When enabled, it would use a 2-pixel-wide triangle filter centered on one of the samples in a 2x MSAA pattern. The “quincunx” name comes from the fact that the resolve process ends up using 5 subsamples that are arranged in the cross-shaped quincunx pattern. Since the quincunx resolve uses a wider reconstruction filter, aliasing is reduced compared to the standard box filter resolve. However, using a wider filter can also result in unwanted attenuation of higher frequencies. This can lead to a “blurred” look that appears to lack details, which is a complaint sometimes levied against PS3 games that have used the feature. AMD later added a similar feature to their 3000- and 4000-series GPUs called “Wide Tent” that also made use of a triangle filter with width greater than a pixel.
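
A quincunx-style resolve can be sketched by evaluating a separable 2-pixel-wide triangle (tent) filter at the five sample positions and normalizing. The layout here is an assumption for illustration: one sample at each pixel center and one at each pixel’s top-left corner, so corner samples are shared by four neighboring pixels. With that layout the tent yields the commonly cited weights of 1/2 for the center and 1/8 for each corner.

```python
# Sketch of a quincunx-style resolve. Assumed layout: each pixel stores one sample
# at its center and one at its top-left corner, so corner samples are shared by
# four neighboring pixels; `corners` therefore has one extra row and column.

def tent(d, radius=1.0):
    """1D triangle filter that is 2 * radius pixels wide."""
    return max(0.0, 1.0 - abs(d) / radius)

def quincunx_weights():
    # Separable 2-pixel-wide tent evaluated at the center (0, 0) and the four
    # diagonal corners (+-0.5, +-0.5), then normalized to sum to 1.
    taps = [(0.0, 0.0), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
    raw = [tent(dx) * tent(dy) for dx, dy in taps]
    total = sum(raw)
    return [w / total for w in raw]  # center 0.5, each corner 0.125

def quincunx_resolve(centers, corners, x, y):
    w = quincunx_weights()
    taps = [centers[y][x], corners[y][x], corners[y][x + 1],
            corners[y + 1][x], corners[y + 1][x + 1]]
    return sum(wi * s for wi, s in zip(w, taps))

print(quincunx_weights())  # [0.5, 0.125, 0.125, 0.125, 0.125]
```

Because half of the weight comes from corner samples shared with neighbors, detail from adjacent pixels bleeds in even for fully covered pixels, which is the source of the softer look mentioned above.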

As GPUs became more programmable and the APIs evolved to match them, we eventually gained the ability to perform the MSAA resolve in a custom shader instead of having to rely on an API function to do that. This is an ability we’re going to explore in the following article.

Compression

As we saw earlier, MSAA doesn’t actually improve on supersampling in terms of rasterization complexity or memory usage. At first glance we might conclude that the only advantage of MSAA is that pixel shader costs are reduced. However, this isn’t actually true, since MSAA also makes it possible to reduce bandwidth usage. Recall that the pixel shader is only executed once per pixel with MSAA. As a result, the same value is often written to all N subsamples of an MSAA render target. GPU hardware is able to exploit this by sending the pixel shader value coupled with another value indicating which subsamples should be written, which acts as a form of lossless compression. With such a compression scheme the bandwidth required to fill an MSAA render target can be significantly less than it would be for the supersampling case.
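
A back-of-the-envelope sketch shows why "one value plus a mask" saves bandwidth. The 4-byte RGBA8 color and the byte-rounded mask are assumptions; actual hardware compression formats are not publicly documented, so this only illustrates the idea, not any real encoding.

```python
# Back-of-the-envelope sketch: bytes moved for one pixel write when sending N
# separate values versus one shader output plus a subsample mask.
# Assumptions: 4-byte RGBA8 colors and a mask rounded up to whole bytes;
# real hardware compression formats are not public.

def uncompressed_write_bytes(n_samples, bytes_per_color=4):
    return n_samples * bytes_per_color

def compressed_write_bytes(n_samples, bytes_per_color=4):
    mask_bytes = (n_samples + 7) // 8  # one bit per subsample
    return bytes_per_color + mask_bytes

def expand(color, mask, previous):
    """Decode losslessly: rebuild the N subsample values from one color and a mask."""
    return [color if mask & (1 << i) else old for i, old in enumerate(previous)]

for n in (2, 4, 8):
    print(n, uncompressed_write_bytes(n), compressed_write_bytes(n))
```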

CSAA and EQAA

Since its introduction, the fundamentals of MSAA have not seen significant changes as graphics hardware has evolved. We already discussed the special resolve modes supported by the drivers for certain Nvidia and ATI/AMD hardware as well as the ability to arbitrarily access subsample data in an MSAA render target, which are two notable exceptions. A third exception has been Nvidia’s Coverage Sampling Antialiasing (CSAA)[2] modes supported by their DX10 and DX11 GPUs. These modes seek to improve the quality/performance ratio of MSAA by decoupling the coverage of triangles within a pixel from the subsamples storing the value output by the pixel shader. The idea is that while subsamples have a high storage cost since they store pixel shader outputs, the coverage can be stored as a compact bitmask. This is exploited by rasterizing at a certain subsample rate and storing coverage at that rate, but then storing the actual subsample values at a lower rate. As an example, the “8x” CSAA mode stores 8 coverage samples and 4 pixel shader output values. When performing the resolve, the coverage data is used to augment the quality of the results. Unfortunately, Nvidia does not provide public documentation of this step, and so the specifics will not be discussed here. They also do not provide programmatic access to the coverage data in shaders, thus the data will only be used when performing a standard resolve through D3D or OpenGL functions.

AMD has introduced a very similar feature in their 6900 series GPUs, which they’ve named EQAA[3]. As with Nvidia’s CSAA, the feature can be enabled through driver options or special MSAA quality modes, but it cannot be used in custom resolves performed via shaders.

Working with HDR and Tone Mapping

Before HDR became popular in real-time graphics, we essentially rendered display-ready color values to our MSAA render target with only simple post-processing passes applied after the resolve. This meant that after resolving with a box filter, the resulting gradients along triangle edges would be perceptually smooth between neighboring pixels³. However, when HDR, exposure, and tone mapping are thrown into the mix there is no longer anything close to a linear relationship between the color rendered at each pixel and the perceived color displayed on the screen. As a result, you are no longer guaranteed to get the smooth gradient you would get when using a box filter to resolve LDR MSAA samples. This can seriously affect the output of the resolve, since it can end up appearing as if no MSAA is being used at all if there is extreme contrast on a geometry edge.
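
A tiny numeric sketch shows how extreme contrast breaks the box-filter gradient, comparing the usual "resolve, then tone map" order with tone mapping each subsample first. The Reinhard operator x / (1 + x) is a stand-in chosen for brevity; the article does not say which operator is in use, and the radiance values are made up for illustration.

```python
# Sketch comparing the two orderings on one edge pixel with 4x MSAA.
# The Reinhard operator x / (1 + x) is a stand-in chosen for brevity; the
# radiance values below are hypothetical.

def reinhard(x):
    return x / (1.0 + x)

def resolve(samples):
    return sum(samples) / len(samples)  # box filter

def tonemap_after_resolve(samples):
    return reinhard(resolve(samples))

def tonemap_before_resolve(samples):
    return resolve([reinhard(s) for s in samples])

# Half the subsamples see a very bright background (radiance 100),
# the other half see a dark object (radiance 0).
edge_pixel = [100.0, 100.0, 0.0, 0.0]
print(round(tonemap_after_resolve(edge_pixel), 3))   # 0.98: nearly white, edge looks aliased
print(round(tonemap_before_resolve(edge_pixel), 3))  # 0.495: near the midpoint, smooth gradient
```

The averaged radiance of 50 is still deep in the operator’s saturated range, so the half-covered pixel displays almost the same as a fully bright one, and the edge gradient collapses.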

This strange phenomenon was first pointed out (to my knowledge) by Humus (Emil Persson), who created a sample[4] demonstrating it as well as a corresponding ShaderX6 article. In this same sample he also demonstrated an alternative approach to MSAA resolves, where he used a custom resolve to apply tone mapping to each subsample individually before filtering. His results were pretty striking, as you can see from these images (top is a typical resolve, bottom is a resolve performed after tone mapping):

HDR rendering with MSAA. The top image applies tone mapping after a standard MSAA resolve, while the bottom image applies tone mapping before the MSAA resolve.

It’s important to think about what it actually means to apply tone mapping before the resolve. Before tone mapping, we can actually consider ourselves to be working with values representing physical quantities of light within our simulation. Primarily, we’re dealing with the radiance of light reflecting off of a surface towards the eye. During the tone mapping phase, we attempt to convert from a physical quantity of light to a new value representing the color that should be displayed on the screen. What this means is that by changing where the resolve takes place, we’re actually oversampling a different signal! When resolving before tone mapping we’re oversampling the signal representing physical light being reflected towards the camera, and when resolving after tone mapping we’re oversampling the signal representing colors displayed on the screen. Therefore an important consideration we have to make is which signal we actually want to oversample. This directly ties into post-processing, since a modern game will typically have several post-processing effects needing to work with HDR radiance values rather than display colors. Thus we want to perform tone mapping as the last step in our post-processing chain. This presents a potential difficulty with the approach of tone mapping prior to resolve, since it means that all previous post-processing steps must take a non-resolved MSAA buffer as an input and also produce an MSAA buffer as an output. This can obviously have serious memory and performance implications, depending on how the passes are implemented.
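
To put a rough number on the memory side of that trade-off, here is some simple arithmetic for one render target that has to stay multisampled through the post-processing chain. The RGBA16F format (4 channels x 2 bytes = 8 bytes per sample) and the 1920x1080 resolution are assumptions picked as a typical HDR setup, and the figures ignore any hardware compression or padding.

```python
# Rough memory arithmetic for one HDR render target, resolved vs. kept at 4x MSAA.
# Assumed format: RGBA16F (8 bytes per sample) at 1920x1080; compression and
# padding are ignored.

def render_target_bytes(width, height, samples, bytes_per_sample):
    return width * height * samples * bytes_per_sample

MIB = 1024 * 1024
resolved = render_target_bytes(1920, 1080, 1, 8)
msaa_4x = render_target_bytes(1920, 1080, 4, 8)
print(f"{resolved / MIB:.1f} MiB vs {msaa_4x / MIB:.1f} MiB per buffer")
# 15.8 MiB vs 63.3 MiB per buffer
```

Every intermediate target in the chain that must remain multisampled pays this N-times cost, on top of the extra bandwidth for reading and writing it.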

MLAA and Other Post-Process AA Techniques

Morphological Anti-Aliasing is an anti-aliasing technique originally developed by Intel[5] that initiated a wave of performance-oriented AA solutions commonly referred to as post-process anti-aliasing. This name is due to the fact that they do not fundamentally alter the rendering/rasterization pipeline like MSAA does. Instead, they work with only a non-MSAA render target to produce their results. In this way these techniques are rather interesting, in that they do not actually rely on increasing the sampling rate in order to reduce aliasing. Instead, they use what could be considered an advanced reconstruction filter in order to approximate the results that you would get from oversampling. In the case of MLAA in particular, this reconstruction filter uses pattern-matching in an attempt to detect the edges of triangles. The pattern-matching relies on the fact that for a fixed sample pattern, common patterns of pixels will be produced by the rasterizer for a triangle edge. By examining the color of the local neighborhood of pixels, the algorithm is able to estimate where a triangle edge is located and also the orientation of the line making up that particular edge. The edge and color information is then enough to estimate an analytical description of that particular edge, which can be used to calculate the exact fraction of the pixel that will be covered by the triangle. This is very powerful if the edge was calculated correctly, since it eliminates the need for multiple sub-pixel coverage samples. In fact if the coverage amount is used to blend the triangle color with the color behind that triangle, the results will match the output of standard MSAA rendering with infinite subsamples! The following image shows some of the patterns used for edge detection, and the result after blending:


MLAA edge detection using pattern recognition (from MLAA: Efficiently Moving Antialiasing from the GPU to the CPU)
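
The final blending step described above can be sketched once an edge has been estimated: compute the exact fraction of the pixel on the triangle’s side of the line, then blend. This sketch assumes the edge is given as a half-plane a*x + b*y + c >= 0 in pixel-local coordinates (the pixel is the unit square); it does not attempt the pattern-matching that actually finds the edge.

```python
# Sketch of MLAA's final step: given an estimated edge as the half-plane
# a*x + b*y + c >= 0 (the triangle's side) in pixel-local coordinates, compute
# the exact covered fraction of the unit-square pixel and blend with it.
# Edge detection itself is out of scope here.

def clip_unit_square(a, b, c):
    """Clip the unit square to the half-plane (one Sutherland-Hodgman pass)."""
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    out = []
    for i, p in enumerate(square):
        q = square[(i + 1) % len(square)]
        fp = a * p[0] + b * p[1] + c
        fq = a * q[0] + b * q[1] + c
        if fp >= 0:
            out.append(p)
        if (fp >= 0) != (fq >= 0):  # segment p->q crosses the edge line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def polygon_area(poly):
    """Shoelace formula."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2.0

def mlaa_blend(tri_color, behind_color, a, b, c):
    coverage = polygon_area(clip_unit_square(a, b, c))
    return coverage * tri_color + (1.0 - coverage) * behind_color

# An edge x + y = 0.5 cutting off the pixel's top-left corner; white triangle on black.
print(round(mlaa_blend(1.0, 0.0, 1.0, 1.0, -0.5), 3))  # 0.875
```

Note that `behind_color` has to come from a neighboring pixel in practice, which is the proxy issue raised at the end of this section.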

The major problems with MLAA and similar techniques occur when the algorithm does not accurately estimate the triangle edges. Looking at only a single frame, the resulting artifacts would be difficult or impossible to discern. However in a video stream the problems become apparent due to sub-pixel rotations of triangles that occur as the triangle or the camera move in world space. Take a look at the following image:


Two different triangle edge orientations resulting in the same rasterization pattern

In this image, the blue line represents a triangle edge during one frame and the green line represents the same triangle edge in the following frame. The orientation of the edge relative to the pixels has changed, however in both cases only the leftmost pixel is marked as being “covered” by the rasterizer. Consequently the same pixel pattern (marked by the blue squares in the image) is produced by the rasterizer for both frames, and the MLAA algorithm detects the same edge pattern (denoted by the thick red line in the image). As the edge continues rotating, eventually it will cover the top-middle pixel’s sample point and that pixel will “turn on”. In the resulting video stream that pixel will appear to “pop on”, rather than smoothly transitioning from a non-covered state to a covered state. This is a trademark temporal artifact of geometric aliasing, and MLAA is incapable of reducing it. The artifact can be even more objectionable for thin or otherwise small geometry, where entire portions of the triangle will appear and disappear from frame to frame, causing a “flickering” effect. MSAA and supersampling are able to reduce such artifacts due to the increased sampling rate used by the rasterizer, which results in several “intermediate” steps in the case of sub-pixel movement rather than pixels “popping” on and off. The following animated GIFs demonstrate this effect on a single rotating triangle⁴:

Two animations of a rotating triangle. The top image has FXAA enabled, which uses techniques similar to MLAA to reconstruct edges. The bottom image uses 4x MSAA, which supersamples the visibility test at the edges of the triangle. Notice how in the MSAA image pixels will transition through intermediate values as the triangle moves across the sub-pixel sample points. The FXAA image lacks this characteristic, despite producing smoother gradients along the edges.
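
The popping effect can be reproduced with a few lines. The animations rotate a triangle; to keep things small, this sketch instead slides a vertical edge across one pixel in eighth-pixel steps, which shows the same behavior. The 4x offsets are the same assumed D3D11-style rotated-grid pattern (1/16-pixel units), not something the article specifies.

```python
# Sketch of the temporal behavior described above: a vertical edge slides across
# one pixel in small steps (a simplification of the rotating triangle). With a
# single center sample the pixel jumps from 0 to 1, while 4 subsamples pass
# through intermediate values. The 4x offsets are an assumed rotated-grid pattern.

ONE_SAMPLE = [(0, 0)]
FOUR_SAMPLES = [(-2, -6), (6, -2), (-6, 2), (2, 6)]

def pixel_value(edge_x, offsets):
    """Fraction of samples covered when the triangle fills x < edge_x.
    The pixel spans [0, 1] and its center is at 0.5."""
    hits = sum(1 for dx, _ in offsets if 0.5 + dx / 16.0 < edge_x)
    return hits / len(offsets)

positions = [i / 8.0 for i in range(9)]  # edge moves from x = 0 to x = 1
print([pixel_value(e, ONE_SAMPLE) for e in positions])
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print([pixel_value(e, FOUR_SAMPLES) for e in positions])
# [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0]
```

A post-process filter only ever sees the first sequence, so however well it smooths the edge within a frame, it cannot recover the intermediate steps between frames.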

Another potential issue with MLAA and similar algorithms is that they may fail to detect edges or detect “false” edges if only color information is used. In such cases the accuracy of the edge detection can be augmented by using a depth buffer and/or a normal buffer. Another potential issue is that the algorithm uses the color adjacent to a triangle as a proxy for the color behind the triangle, which could actually be different. However this tends to be non-objectionable in practice.
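
A minimal sketch of a neighbor-difference edge test shows how a depth check can supplement color. The Rec. 709 luma weights are a common choice for this kind of filter, but the thresholds here are hypothetical values picked for illustration, and real post-process AA filters differ in the details.

```python
# Sketch of a neighbor-difference edge test of the kind post-process AA filters use.
# Assumptions: Rec. 709 luma weights and hypothetical thresholds. The optional
# depth check finds edges between surfaces that happen to share a color.

def luma(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_edge(color_a, color_b, depth_a=None, depth_b=None,
            luma_threshold=0.1, depth_threshold=0.01):
    if abs(luma(color_a) - luma(color_b)) > luma_threshold:
        return True  # visible color discontinuity
    if depth_a is not None and depth_b is not None:
        return abs(depth_a - depth_b) > depth_threshold  # same color, different surface
    return False

grey = (0.5, 0.5, 0.5)
print(is_edge(grey, grey))            # False: color alone misses this edge
print(is_edge(grey, grey, 0.2, 0.8))  # True: depth reveals it
```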

Footnotes

1. The rasterization process on a modern GPU can actually be quite a bit more complicated than this, but those details aren’t particularly relevant to the scope of this article.
2. This mask is directly available to ps_5_0 pixel shaders in DX11 via the SV_Coverage system value.
3. The gamma-space rendering commonly used in the days before HDR would actually produce gradients that weren’t completely smooth, although later GPUs supported performing the resolve in linear space. Either way the results were pretty close to being perceptually smooth, at least compared to the results that can occur with HDR rendering.
4. These animations were captured from the sample application that I’m going to discuss in the next article. So if you’d like to see live results without compression, you can download the sample app from that article.

References

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/cc627092%28v=vs.85%29.aspx
[2] http://www.nvidia.com/object/coverage-sampled-aa.html
[3] http://developer.amd.com/Resources/archive/ArchivedTools/gpu/radeon/assets/EQAA Modes for AMD HD 690 Series Cards.pdf
[4] http://www.humus.name/index.php?page=3D&ID=77
[5] MLAA: Efficiently Moving Antialiasing from the GPU to the CPU

Next article in the series: Experimenting with Reconstruction Filters for MSAA Resolve
