Games101 Assignment Draft II: Telling MSAA and SSAA Apart

RzBu11d023r

Categories: Graphics, Comprehension Notes. Tags: linear algebra, graphics

Article link: https://blog.csdn.net/u010180372/article/details/121277823

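I wrote this post on 2021-10-19 after spending most of a day on the topic, and a few small details were still unclear to me by the end.

So far in Games101, apart from the matrix transforms, where I spent some time reviewing linear algebra, nothing had felt worth taking notes on. But the course's treatment of MSAA and SSAA is so brief that much of the code I wrote was wrong and my understanding kept missing something. Many solutions posted online have the same problem, so it is worth marking down. (The key issue is that different sources define the latter inconsistently.)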

深入剖析MSAA - 风恋残雪 - 博客园 (cnblogs.com)

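At first I implemented SSAA according to my earlier understanding, and the result looked good. In that implementation the downsampling (a weighted average) happened during rasterization itself.

Later I learned that a real MSAA implementation has to wait until final shading is done before it can weight the samples and compute the color (from the comment section of 采坑记录之抗锯齿-在软渲染器中实现MSAA - 知乎 (zhihu.com)).

So all the black-line bugs come from blurring directly in the rasterization stage, when that step actually has to wait until the very end, after every triangle has been rasterized.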

请问FXAA、FSAA与MSAA有什么区别？效果和性能上哪个好？ - 知乎

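Then I remembered that when I first read this I did not yet know what shading was. I assumed the incompatibility arose because MSAA downsamples at the end while SSAA samples in the middle. I was wrong: both downsample at the end.

Analysis of the black lines (the scene has blue, green, and black): when the green triangle is rasterized first, black lies underneath it and gets averaged in, so those frame buffer pixels pick up black. Rasterizing the blue triangle afterwards averages once more, and the black contribution is never removed!

If SSAA also downsamples during rasterization, the same thing happens and the final buffer ends up containing black. So why did my result show no black or gray lines? Although I was downsampling into the final buffer at that point, each iteration still sampled from the 4x buffer, so the final result came out correct. Written that way it just does redundant work; the downsampling should be moved to the end, right before presenting the whole buffer.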
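The black-line effect can be reproduced with a toy model of a single pixel. This is only a sketch under assumptions of my own: one pixel with 4 sub-samples, a black background, a green triangle covering samples 0 and 1, a blue triangle covering samples 2 and 3, and no depth test. The names `eager_resolve` and `deferred_resolve` are made up for illustration.

```python
# Toy model: one pixel, 4 sub-samples, RGB tuples in [0, 1].
BLACK, GREEN, BLUE = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)


def average(colors):
    """Box-filter a list of RGB tuples into one RGB tuple."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def eager_resolve(triangles):
    """Wrong: collapse to a single pixel color after every triangle."""
    pixel = BLACK
    for color, mask in triangles:
        # Uncovered samples fall back to the already-blurred pixel color,
        # so the black that was mixed in earlier is never removed.
        pixel = average([color if hit else pixel for hit in mask])
    return pixel


def deferred_resolve(triangles):
    """Keep all 4 samples and average once, after every triangle is drawn."""
    samples = [BLACK] * 4
    for color, mask in triangles:
        samples = [color if hit else old for hit, old in zip(mask, samples)]
    return average(samples)


triangles = [(GREEN, (1, 1, 0, 0)), (BLUE, (0, 0, 1, 1))]
print(eager_resolve(triangles))     # (0.0, 0.25, 0.5)
print(deferred_resolve(triangles))  # (0.0, 0.5, 0.5)
```

The eager version ends with weights that sum to 0.75 instead of 1, which is the darkening seen along the shared edge.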

I mainly relied on this article (a help paper from 3dfx, I think):

Super-sampling Anti-aliasing Analyzed Kristof Beets Dave Barron

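It calls this FULL SCREEN AA.

It states that the downsampling actually happens only after rendering has finished.

(OG here, ordered grid, is just the most ordinary grid supersampling; there are also variants that pick sample points randomly or from some distribution.)

I also found this article, which is very detailed:

A Quick Overview of MSAA – The Danger Zone (wordpress.com) https://mynameismjp.wordpress.com/2012/10/24/msaa-overview/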

As we saw earlier, MSAA doesn’t actually improve on supersampling in terms of rasterization complexity or memory usage. At first glance we might conclude that the only advantage of MSAA is that pixel shader costs are reduced. However this isn’t actually true, since it’s also possible to improve bandwidth usage. Recall that the pixel shader is only executed once per pixel with MSAA. As a result, the same value is often written to all N subsamples of an MSAA render target. GPU hardware is able to exploit this by sending the pixel shader value coupled with another value indicating which subsamples should be written, which acts as a form of lossless compression. With such a compression scheme the bandwidth required to fill an MSAA render target can be significantly less than it would be for the supersampling case.

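Some other articles say MSAA is neighbor sampling, where two adjacent pixels share weights (basically a convolutional neural network at this point); that kind of blur probably counts too.

Only at this point did I understand 文刀秋二's answer on Zhihu:

MSAA (Multi-Sampling AA) is clever: only in the rasterization stage, when determining whether a pixel is covered by a triangle, does it evaluate multiple coverage samples; in the pixel shader stage the color is still computed only once per pixel. For example, in the 4x MSAA case pictured in the original answer, the triangle covers only 2 of the 4 coverage samples. So this triangle needs to generate one fragment to be shaded in the pixel shader, but the generated fragment is still at the pixel center (position, normal, and other attributes are interpolated to the pixel center), and the pixel shader runs only once. The result is multiplied by 0.5 in the resolve stage because the triangle covers only half of the samples. All modern GPUs implement this algorithm in hardware, and today, when shading costs far more than rasterization, this method is much faster than SSAA. Incidentally, NVIDIA's earlier CSAA goes one step further by decoupling coverage samples from the depth and stencil tests.

In other words, SSAA actually has to store 4x the color data, and every point (four times as many) has to evaluate the getColor function, which is not a flat fill: it involves all the lighting work before the RGB color is finally obtained.

MSAA, by contrast, only builds a mask during rasterization (implementations record this in the fragment).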
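To make the cost difference concrete, here is a small sketch that counts shader invocations for one triangle on a 4x4 pixel grid. Everything in it is an assumption for illustration: a 2x2 ordered-grid sample pattern, edge-function coverage with points on an edge counted as inside (no top-left rule), and the hypothetical helper names `coverage_mask` and `count_shader_calls`.

```python
# Assumed 2x2 ordered-grid sample positions inside a unit pixel.
SAMPLE_OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def edge(a, b, p):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """True if p is inside tri (edges included), for either winding order."""
    a, b, c = tri
    e = (edge(a, b, p), edge(b, c, p), edge(c, a, p))
    return all(v >= 0 for v in e) or all(v <= 0 for v in e)


def coverage_mask(tri, px, py):
    """One bit per sample of pixel (px, py) that the triangle covers."""
    mask = 0
    for i, (ox, oy) in enumerate(SAMPLE_OFFSETS):
        if inside(tri, (px + ox, py + oy)):
            mask |= 1 << i
    return mask


def count_shader_calls(tri, width, height):
    """SSAA shades every covered sample; MSAA shades once per touched pixel."""
    ssaa_calls = msaa_calls = 0
    for py in range(height):
        for px in range(width):
            mask = coverage_mask(tri, px, py)
            ssaa_calls += bin(mask).count("1")
            msaa_calls += 1 if mask else 0
    return ssaa_calls, msaa_calls


TRIANGLE = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
ssaa_calls, msaa_calls = count_shader_calls(TRIANGLE, 4, 4)
print(ssaa_calls, msaa_calls)  # 36 10
```

Both variants run the same number of coverage tests; only the number of color evaluations differs.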

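Someone here had the same confusion as I did: 对多重采样（MSAA）原理的一些疑问？ - 知乎 (zhihu.com)

I also put together a few points:

First, the main problem with SSAA is that it antialiases large areas that do not need it (only geometric edges do), so a lot of shading is wasted. For example, with a 4-sample OGSS grid over a gradient, the downsample still yields a single value even though four were computed.

Another question I had: if only one sample is covered and shading still uses the center point, isn't that the same as not shading at all? In practice it simply isn't shaded! With only one sample covered, skipping the shading makes no difference.

If two samples are covered but the center is not, skipping the shading makes no difference either. There is no need to tint the whole pixel a lighter color, you see.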

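I had seen this figure earlier (Rasterization Rules - Win32 apps | Microsoft Docs) but did not understand it; now I do:

Just look at the squares that mark shader invocations:

Sampling that effectively samples nothing:

I'm exhausted, so that is all I'll draw for now. This scheme is also not the only option: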

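By default the shader runs only once per pixel, i.e. at pixel frequency: the PS computes only the color at each pixel center and then copies it to the 4 samples. Of course, as mentioned earlier, this copy does not have to come from the pixel center; it can come from elsewhere in the pixel. For example, in DX, if the color's interpolation mode is declared as centroid, then when the pixel center is not inside the triangle (but some sample is), a sample inside the triangle is chosen instead, avoiding extrapolation.

Link: https://www.zhihu.com/question/58595055/answer/157756410

In that case a pixel where only one sample is hit and the center is missed still gets a color. Another answer covers this as well:

As for which coordinate that invocation uses, for example the pixel center or some sample position, that depends on the hardware.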

From <对多重采样（MSAA）原理的一些疑问？ - 知乎>

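This will probably be easier to understand once I have the whole graphics pipeline figured out; I'll come back to it then.

(I really did waste a lot of time searching Chinese blog posts of very uneven quality. Documentation, papers, or Real-Time Rendering probably cover this in more detail, but for lack of time I can't get through hefty English tomes right now.)

As it turned out, my understanding was still wrong at this point. I rewrote the code and still got black lines (letting the hardware rasterizer do this might actually be better; a software implementation is rather painful).

The thing is, it did behave the way I had reasoned: for the green triangle it took only a portion of the green or the blue, so the brightness was very low, hence the black.

So an average really is required; taking the color of just one triangle cannot be enough.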

计算机图形学四：抗锯齿SSAA及MSAA算法和遮挡剔除Z-Buffer算法_吃人的博客-CSDN博客_ssaa抗锯齿

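According to this post, the only difference from SSAA is that one averages all the sample points while the other uses the center to approximate them.

GPUs generally support two shader iteration modes, per pixel and per sample. Their meaning can be guessed from the names. In per-pixel mode the PS is invoked only once per visible pixel, and the position, normal coord, or color it uses comes from the pixel's center sample, which is equivalent to not doing MSAA. Per-sample mode invokes the PS once for every visible sample, and the parameters come from values interpolated at each sample point. With MSAA enabled, the driver generally uses per-sample mode.

How is this any different from SSAA?

MSAA: when a triangle covers multiple sample points within a fragment, only one fragment computation is performed; if your render target is a multisampled texture, the outputs of those samples are identical (except for depth). With SSAA every sample needs its own fragment computation, and the outputs differ.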

From <对多重采样（MSAA）原理的一些疑问？ - 知乎>

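So using the center is definitely wrong (that is how I got the black lines).

With help from an expert in the Zhihu comment section, who pointed me to reference material, I went straight to RTR4 (Real-Time Rendering, 4th edition). Even if I don't read hefty English tomes cover to cover, I should at least keep them as reference books:

Following the citation there, I found this article:

GLSL: Center or Centroid? (Or When Shaders Attack!) (opengl.org)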

The pixel centroid is the center of gravity of the intersection of a pixel square and the interior of the primitive. For a fully covered pixel this is exactly the pixel center. For a partially covered pixel this is often a location other than the pixel center.

OpenGL allows implementers to choose the ideal centroid, or any location that is inside the intersection of the pixel square and the primitive, such as a sample point or a pixel center.

From <GLSL: Center or Centroid? (Or When Shaders Attack!)>

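I may now have an idea. Since centroid literally means center of mass, how would centroid interpolation pick the blue points in the figure above? In principle the implementation is unspecified. A digital circuit could pick one set bit from the mask: for 0010, with a single 1, take that sample's color; for 0011, take either of the two samples? Or interpolate between the two points; with three points one could take their barycenter. But however I think about it, a hardware pipeline could not feasibly compute a true centroid.

Microsoft's rasterization rules cover this too (my mistake for not reading Microsoft's explanation carefully, especially the last few paragraphs):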

Centroid Sampling of Attributes when Multisample Antialiasing

By default, vertex attributes are interpolated to a pixel center during multisample antialiasing; if the pixel center is not covered, attributes are extrapolated to a pixel center. If a pixel shader input that contains the centroid semantic (assuming the pixel is not fully covered) will be sampled somewhere within the covered area of the pixel, possibly at one of the covered sample locations. A sample mask (specified by the rasterizer state) is applied prior to centroid computation. Therefore, a sample that is masked out will not be used as a centroid location.

The reference rasterizer chooses a sample location for centroid sampling similar to this:

The sample mask allows all samples. Use a pixel center if the pixel is covered or if none of the samples are covered. Otherwise, the first covered sample is chosen, starting from the pixel center and moving outward.
The sample mask turns off all samples but one (a common scenario). An application can implement multipass supersampling by cycling through single-bit sample-mask values and re-rendering the scene for each sample using centroid sampling. This would require that an application adjust derivatives to select appropriately more detailed texture mips for the higher texture sampling density.

From <Rasterization Rules - Win32 apps | Microsoft Docs>

So it performs some bit operations on the mask to keep only one bit?
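The reference-rasterizer rule quoted above can be sketched roughly as follows. This is my own reading, not Microsoft's code: `center_covered` stands in for "the pixel is covered", the sample pattern is an assumed 4x rotated grid, and "moving outward" is modeled by sorting samples by squared distance from the pixel center with ties broken by sample index. The function name `shading_position` is hypothetical.

```python
# Assumed 4x rotated-grid sample positions inside a unit pixel.
SAMPLE_OFFSETS = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]
PIXEL_CENTER = (0.5, 0.5)


def squared_distance_to_center(offset):
    return (offset[0] - 0.5) ** 2 + (offset[1] - 0.5) ** 2


def shading_position(coverage_mask, center_covered):
    """Pick where attributes are evaluated under a centroid-style rule."""
    # Use the pixel center if it is covered, or if no sample is covered.
    if center_covered or coverage_mask == 0:
        return PIXEL_CENTER
    # Otherwise walk the samples from the center outward (stable sort, so
    # equidistant samples keep their index order) and take the first hit.
    order = sorted(range(len(SAMPLE_OFFSETS)),
                   key=lambda i: squared_distance_to_center(SAMPLE_OFFSETS[i]))
    for i in order:
        if coverage_mask & (1 << i):
            return SAMPLE_OFFSETS[i]
    return PIXEL_CENTER  # unreachable for a non-zero 4-bit mask


print(shading_position(0b0010, center_covered=False))  # (0.875, 0.375)
print(shading_position(0b1111, center_covered=True))   # (0.5, 0.5)
```

The position returned for a partially covered pixel always lies on a covered sample, so attributes are interpolated rather than extrapolated.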

Googling the keyword msaa earlier turned up nothing; it turns out the keyword to use is multisampling...

In OpenGL you can set gl_SampleMask to specify which samples receive the output. In the pipeline it is ANDed with the coverage mask, and the color is written to the samples that remain.

From <Multisampling - OpenGL Wiki>
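As a rough illustration of that AND step, the sketch below models it in plain Python rather than GLSL. The function `write_samples` and the string colors are hypothetical; it only shows which samples end up being written.

```python
def write_samples(samples, color, coverage_mask, sample_mask):
    """Write `color` only to samples enabled in both masks."""
    effective = coverage_mask & sample_mask
    return [color if effective & (1 << i) else old
            for i, old in enumerate(samples)]


# The triangle covers samples 0, 1, 2; the shader-side mask allows 0 and 2.
result = write_samples(["bg"] * 4, "red", 0b0111, 0b0101)
print(result)  # ['red', 'bg', 'red', 'bg']
```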

OpenGL-Centroid Sampling - 知乎 (zhihu.com)

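This post translates part of the OpenGL Blue Book (OpenGL SuperBible: Comprehensive Tutorial and Reference) and gives some implementation notes:

Although the exact implementation may differ from one to another, most use discrete differencing, obtaining the interpolated result by adding a delta to the value from a neighboring pixel. This works fine when the interpolation targets sit at the same position within each pixel. In that case it does not matter which sample position you use; the samples can always be treated exactly as individual pixels. However, when centroid sampling is enabled, the sample location used as a reference in a neighboring pixel may differ from the one in the current pixel. Samples and pixels are then no longer in one-to-one correspondence, so the values produced by discrete differencing are not accurate enough.

So it is still some kind of approximation, just slightly more accurate than using the center.

The article cited in RTR4 mentioned above also says when not to use centroid: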

The shading language specification considers derivatives derived from centroid varyings to be so fraught with inaccuracy that it was resolved they are simply undefined.

From <GLSL: Center or Centroid? (Or When Shaders Attack!)>

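It also says not to enable centroid when extrapolation would not occur in most cases.

But how could one possibly predict when extrapolation will occur? For that I would have to know whether a triangle edge passes through a pixel's center point; how could that be feasible??

The upshot is that not reading the primary sources cost me even more time in idle speculation.

I suddenly recalled an abstraction from my digital-circuits course: any bit-string transformation can be completed within a finite number of TTL gate delays!

For example, I only need to write out a mapping over 4-bit strings, from masks with one or more active bits to strings with exactly one active bit: list all possible cases, make the choice for each, express it in Boolean algebra, and simplify. Win!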
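That 4-bit mapping is small enough to enumerate in full. The sketch below assumes one particular selection policy, "keep the lowest set bit", which is my choice for illustration and not something any of the cited sources specify; real hardware may pick differently.

```python
def lowest_set_bit(mask):
    """Isolate the lowest set bit; returns 0 for an empty mask."""
    return mask & -mask


# Full truth table for a 4-sample coverage mask: 16 inputs, each mapped
# to a mask with at most one bit set.
ONE_HOT_TABLE = [lowest_set_bit(m) for m in range(16)]

for m in range(16):
    print(f"{m:04b} -> {ONE_HOT_TABLE[m]:04b}")
```

Because the table has only 16 rows, it reduces to a small fixed block of combinational logic, which is the point of the argument above.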

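So the last piece of the puzzle is in place. Looks like I finally understand MSAA!

The reality, however, is that no matter how well you build an understandable logical model, deep learning, alchemy that it is, wins by a landslide anyway. DLSS has won completely here: it performs far better, and at a stable high frame rate it can rival 64x SSAA.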