Day 5: Deconvolution and Checkerboard Artifacts

Latest recommended article published on 2022-09-04 21:02:03

ttppss

Category: Paper Reading. Tags: deep learning, neural networks, computer vision, convolution, IEEE paper

Article link: https://blog.csdn.net/ttppss/article/details/116646101

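While reading about deconvolution today, I came across a reference to this paper, so I looked it up in the hope of finding ways to improve my own use of deconvolution in the future.

The Zhihu post I was reading says that deconvolution produces checkerboard artifacts, and the paper discussed below analyzes why.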

Core Ideas and Contribution

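In many networks, the last step turns a feature map back into an image, and in a GAN the generator has to produce an image. Both usually involve upsampling a low-resolution input with deconvolution. Deconvolution can be thought of as taking each point of the feature map and "painting" a larger square in the output. (deconvolution layers allow the model to use every point in the small image to “paint” a square in the larger one)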
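The "painting" picture can be sketched in a few lines. This is a minimal 1-D illustration in plain Python, not code from the paper: the function name `transposed_conv1d` and the all-ones kernel are assumptions chosen for readability, and in 1-D the painted "square" becomes a short segment.

```python
def transposed_conv1d(x, kernel, stride):
    """1-D deconvolution (transposed convolution), written as 'painting'.

    Each input value paints a copy of the kernel, scaled by that value,
    into the output, starting at position i * stride.
    """
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        start = i * stride
        for j, w in enumerate(kernel):
            out[start + j] += value * w  # overlapping paint adds up
    return out


# Hypothetical example: kernel size 3, stride 3, so the painted segments
# sit side by side without overlapping.
painted = transposed_conv1d([1.0, 2.0, 3.0], kernel=[1.0, 1.0, 1.0], stride=3)
print(painted)  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
```

With a stride smaller than the kernel size the segments would overlap, and the overlapping contributions are summed.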

[Figure]

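The painted squares can overlap unevenly, and this is where the checkerboard comes from. The problem is most severe when the kernel size is not divisible by the stride (deconvolution has uneven overlap when the kernel size (the output window size) is not divisible by the stride (the spacing between points on the top)).
Below are the cases stride = 2, kernel size = 3 and stride = 3, kernel size = 3; the difference is clear at a glance (with stride 2 neighboring windows share one pixel, while with stride 3 they tile the output without overlap).
In theory the network could learn weights that compensate for the uneven overlap, but in practice networks almost never avoid it completely.
The effect is most visible when the image contains strong colors such as bright red. Because layers have a bias, outputting the average color is easy; the further a color is from the average, the more the deconvolution itself has to contribute, and the stronger the checkerboard artifacts become. (These artifacts tend to be most prominent when outputting unusual colors. Since neural network layers typically have a bias (a learned value added to the output) it’s easy to output the average color. The further a color, like bright red, is away from the average color, the more deconvolution needs to contribute.)
The checkerboard effect is more pronounced for 2-D deconvolution than for 1-D, because the uneven overlaps along the two axes multiply.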
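The overlap pattern for the two cases mentioned above can be counted directly. The sketch below is an illustration under simple assumptions (four input points, a square kernel, a helper named `overlap_counts` that is not from the paper). It counts how many input points touch each output position, then forms the 2-D counts as the product of the 1-D counts along each axis.

```python
def overlap_counts(n_inputs, kernel_size, stride):
    """Count how many input points paint each output position (1-D)."""
    out = [0] * ((n_inputs - 1) * stride + kernel_size)
    for i in range(n_inputs):
        for j in range(kernel_size):
            out[i * stride + j] += 1
    return out


# Kernel size 3 is not divisible by stride 2: uneven overlap.
uneven = overlap_counts(4, kernel_size=3, stride=2)
print(uneven)  # [1, 1, 2, 1, 2, 1, 2, 1, 1]

# Kernel size 3 is divisible by stride 3: every position is painted once.
even = overlap_counts(4, kernel_size=3, stride=3)
print(even)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# In 2-D (square kernel, same stride on both axes) the count at (row, col)
# is the product of the 1-D counts, so the imbalance grows from 2:1 to 4:1.
grid = [[r * c for c in uneven] for r in uneven]
for row in grid:
    print(row)
```

The printed grid alternates between 1, 2 and 4, which is the checkerboard pattern in its simplest form.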

[Figure]

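Stride = 1 deconvolution, which often appears as the last layer of successful models, is quite effective at dampening these artifacts, although some can still leak through. (I don't fully understand why stride = 1 is better; the figure below shows stride = 1, kernel size = 3, and I can't really tell from it.)
The original text says: They can remove artifacts of frequencies that divide their size, and reduce others artifacts of frequency less than their size.

I'm not sure what "frequency" means here. Is it related to the stride size?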
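One possible reading, which is an assumption rather than something the paper spells out: "frequency" refers to the spatial period of the artifact in pixels, and "size" is the kernel size of the stride-1 layer. Under that reading, a stride-1 layer that averages over 4 pixels cancels any pattern repeating every 2 or 4 pixels and only weakens a pattern repeating every 3 pixels. The sketch below uses an equal-weight (box) kernel as a stand-in; a learned kernel need not look like this.

```python
def conv1d_valid(x, kernel):
    """Stride-1 sliding window without padding."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(len(x) - k + 1)
    ]


box4 = [0.25, 0.25, 0.25, 0.25]  # assumed kernel: plain average over 4 pixels

period2 = [1.0, 2.0] * 6       # artifact that repeats every 2 pixels
period3 = [1.0, 1.0, 4.0] * 4  # artifact that repeats every 3 pixels

smoothed2 = conv1d_valid(period2, box4)
smoothed3 = conv1d_valid(period3, box4)

print(smoothed2)  # constant 1.5: period 2 divides the kernel size 4
print(smoothed3)  # still oscillates, but between 1.75 and 2.5 instead of 1 and 4
```

This would also explain why a stride-1 layer helps but does not fully cure the problem: patterns whose period does not divide the kernel size are only attenuated.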

[Figure]

Even in the best case, deconvolution is fragile: artifacts appear easily even when all sizes are carefully chosen. In the worst case, producing artifacts is simply the default behavior of deconvolution. (At best, deconvolution is fragile because it very easily represents artifact creating functions, even when the size is carefully chosen. At worst, creating artifacts is the default behavior of deconvolution.)

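One way to avoid the overlap problem is to choose the sizes carefully so that the kernel size is divisible by the stride, for example kernel size = 3, stride = 3. Even so, this approach still produces artifacts easily.
Another way is to separate upsampling to a higher resolution from the convolution that computes the features. This works quite well. (I didn't really understand how the figure below is computed.) (resize-convolution is implicitly weight-tying in a way that discourages high frequency artifacts)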

[Figure]
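As a rough idea of how resize-convolution is computed, here is a 1-D sketch: first repeat every input value (nearest-neighbor upsampling), then run an ordinary stride-1 convolution over the result. The function names, the replicate padding at the borders and the all-ones kernels are assumptions made for this illustration; the paper's own implementation may differ. A stride-2 deconvolution on the same constant input is included for contrast.

```python
def nearest_upsample(x, factor):
    """Repeat every value `factor` times."""
    return [v for v in x for _ in range(factor)]


def conv1d_same(x, kernel):
    """Stride-1 convolution, odd kernel, borders padded by repeating the edge."""
    pad = len(kernel) // 2
    padded = [x[0]] * pad + list(x) + [x[-1]] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(x))
    ]


def resize_conv1d(x, kernel, factor=2):
    """Upsample first, then convolve: every output sees the same overlap."""
    return conv1d_same(nearest_upsample(x, factor), kernel)


def transposed_conv1d(x, kernel, stride):
    """Plain deconvolution, for comparison."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += value * w
    return out


constant = [1.0, 1.0, 1.0, 1.0]
flat = resize_conv1d(constant, kernel=[1.0, 1.0, 1.0])
bumpy = transposed_conv1d(constant, kernel=[1.0, 1.0, 1.0], stride=2)

print(flat)   # [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print(bumpy)  # [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]
```

On a constant input the resize-convolution output stays constant, while the deconvolution already shows a period-2 pattern before any training.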

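The authors got their best results with nearest-neighbor interpolation and had difficulty making bilinear interpolation work. They suggest two possible explanations: their hyperparameters may simply have happened to suit nearest-neighbor interpolation, or naive bilinear interpolation may resist high-frequency image features too strongly. They do not consider either method the final answer to upsampling.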
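The difference between the two interpolation schemes is easy to see on a sharp edge. The sketch below is a 1-D simplification: linear interpolation stands in for bilinear, and the sampling grid is a convenient assumption that does not match any particular library. It only illustrates the smoothing; whether that smoothing explains the authors' results is their own speculation.

```python
def nearest_upsample(x, factor):
    """Repeat every value `factor` times."""
    return [v for v in x for _ in range(factor)]


def linear_upsample(x, factor):
    """Linearly interpolate between neighboring samples."""
    out = []
    for i in range(len(x) - 1):
        for s in range(factor):
            t = s / factor
            out.append((1 - t) * x[i] + t * x[i + 1])
    out.append(x[-1])
    return out


def largest_jump(values):
    """Biggest difference between adjacent outputs, a crude sharpness measure."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


edge = [0.0, 0.0, 1.0, 1.0]  # a hard step from 0 to 1

nearest = nearest_upsample(edge, 2)
linear = linear_upsample(edge, 2)

print(nearest, largest_jump(nearest))  # step stays sharp: jump of 1.0
print(linear, largest_jump(linear))    # step is softened: jump of 0.5
```

Nearest-neighbor keeps the full step between adjacent pixels, while linear interpolation spreads it over two pixels, so the convolution that follows starts from a blurrier input.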
