A Plain-Language Guide to Transposed Convolution (Conv2DTranspose)

花花少年

Last modified: 2024-10-23 15:27:07

Category: Deep Learning. Tags: Conv2DTranspose, transposed convolution

First published: 2023-12-28 23:56:07

Article link: https://blog.csdn.net/m0_37605642/article/details/135280661

Review the old to learn the new, and you are fit to be a teacher!

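I. References

Paper: A guide to convolution arithmetic for deep learning
GitHub source: Convolution arithmetic
Bilibili video: 转置卷积（transposed convolution）
转置卷积（Transposed Convolution）
【keras/Tensorflow/pytorch】Conv2D和Conv2DTranspose详解 (Conv2D and Conv2DTranspose explained in detail)
怎样通俗易懂地解释反卷积？ (How can deconvolution be explained in plain terms?)
反卷积操作Conv2DTranspose (The deconvolution operation Conv2DTranspose)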

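II. Standard Convolution (Conv2D)

[Figure: standard convolution]

1. `Conv2D` output-size formula

The output size of a standard convolution is:
$o=\lfloor\frac{i+2p-k}{s}\rfloor+1 \quad \begin{array}{l} \\i=\textit{size of input}\\o=\textit{size of output}\\p=\textit{padding}\\k=\textit{size of kernel}\\s=\textit{strides}\end{array}$

where $\lfloor\cdot\rfloor$ denotes rounding down (floor).

Taking the feature-map height as an example, the output height after convolution is:
$H_{out}=\left\lfloor\frac{H_{in}+2p-k}{s}\right\rfloor+1\quad(1)$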

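2. Stride in `Conv2D`

2.1 stride=1, p=0, k=3

[Figure: standard convolution with k=3, s=1, p=0]

Input feature map (blue): $(H_{in},W_{in})=(4,4)$.
Standard convolution kernel: $kernel\_size(k)=3, stride(s)=1, padding(p)=0$.
Output feature map (green): $(H_{out},W_{out})=(2,2)$.

Substituting into formula (1):
$H_{out}=\left\lfloor\frac{H_{in}+2p-k}{s}\right\rfloor+1\\ H_{out}=\left\lfloor\frac{4+2*0-3}{1}\right\rfloor+1=2$

2.2 stride=2, p=1, k=3

[Figure: standard convolution with k=3, s=2, p=1]

Input feature map (blue): $(H_{in},W_{in})=(5,5)$.
Standard convolution kernel: $kernel\_size(k)=3, stride(s)=2, padding(p)=1$.
Output feature map (green): $(H_{out},W_{out})=(3,3)$.

Substituting into formula (1):
$H_{out}=\left\lfloor\frac{H_{in}+2p-k}{s}\right\rfloor+1\\ H_{out}=\left\lfloor\frac{5+2*1-3}{2}\right\rfloor+1=3$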
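Formula (1) is easy to check in a few lines of Python. The helper name `conv_out_size` is made up for this sketch; it only evaluates the floor formula for one spatial dimension.

```python
def conv_out_size(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output size of a standard convolution along one dimension: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


print(conv_out_size(4, k=3, s=1, p=0))  # the 2.1 example
print(conv_out_size(5, k=3, s=2, p=1))  # the 2.2 example
```

Integer division `//` plays the role of the floor, which matters as soon as $i+2p-k$ is not a multiple of $s$.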

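III. Transposed Convolution (Conv2DTranspose)

1. Introduction

Many models have to produce an output that is larger than their intermediate feature maps (semantic segmentation networks, autoencoders, GAN generators, and so on). In these cases we want a transformation that goes in the opposite direction of standard convolution, that is, upsampling. In semantic segmentation, for example, an encoder first extracts feature maps and a decoder then restores the original image size.

The traditional way to upsample is interpolation or hand-crafted rules. Modern architectures such as neural networks prefer to let the network learn a suitable transformation by itself, without human intervention. Transposed convolution makes this possible.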

2. Misunderstandings about the name

This operation is sometimes called “deconvolution” after (Zeiler et al., 2010), but is really the transpose (gradient) of atrous_conv2d rather than an actual deconvolution.
Deconvolutional Networks: Zeiler et al., 2010 (pdf)

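Transposed convolution is also called deconvolution (deconv) or inverse convolution. However, "transposed convolution" is the most accurate and widely accepted name, because it describes what Conv2DTranspose actually computes, whereas the other names are misleading. The mainstream deep learning frameworks (TensorFlow, PyTorch, Keras) all name the operation some form of conv_transpose. So before studying transposed convolution, get the standard name straight, and correct "deconvolution" or "inverse convolution" when you hear them; a poor name invites misunderstanding.

First, why do people like to call transposed convolution deconvolution or inverse convolution? Consider an example: a 4x4 input passed through a 3x3 kernel with ordinary convolution (no padding, stride=1) yields a 2x2 output. A transposed convolution with a kernel of the same 3x3 size turns a 2x2 input into a 4x4 output, which looks like the reverse of ordinary convolution. Just as subtraction undoes addition and division undoes multiplication, people naturally assumed the two operations were inverses of each other. That is where the names "deconvolution" and "inverse convolution" came from.

In formulas, standard convolution can be written as $y = Cx$, where $C$ is the convolution matrix and $x$ is the (flattened) input. A true mathematical inverse would be $x=C^{-1}y$, but what "deconvolution" actually computes has the form $x=C^Ty$. The formula itself shows that the so-called "deconvolution" is more accurately a "transposed convolution".

In some literature, transposed convolution is also called convolution with fractional strides, deconvolution, or backwards strided convolution. Deconvolution is misleading and not recommended. The author strongly recommends two names: Conv2DTranspose in code and convolution with fractional strides in academic writing.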

I think transpose_conv2d or conv2d_transpose are the cleanest names.

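3. Introduction to `Conv2DTranspose`

Transposed convolution is not the inverse of convolution (convolution is generally not invertible); it is itself a convolution. It cannot recover the values of the original input, only its shape. In other words, a transposed convolution can restore the size of the original image but not the image itself.

3.1 The concept of `Conv2DTranspose`

Transposed convolution is a special kind of forward convolution: first enlarge the input by inserting zeros in a proportion set by the stride and padding, then rotate the kernel by 180 degrees (flip it vertically and horizontally), then run an ordinary forward convolution. It is common in semantic segmentation and generative adversarial networks (GANs), where its main role is upsampling.
[Figure: transposed convolution]

3.2 `Conv2D` vs. `Conv2DTranspose`

Transposed and standard convolution differ substantially: standard convolution looks at a "big world" through a "small window", while transposed convolution looks at a "small world" through part of a "big window".

Standard convolution (large map to small map): input (5,5), stride (2,2), output (3,3).

[Figure: standard convolution, (5,5) to (3,3)]

Transposed convolution (small map to large map): input (3,3), output (5,5).
[Figure: transposed convolution, (3,3) to (5,5)]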

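3.3 Describing transposed convolution with matrix multiplication

For an input X and an output Y (both flattened into vectors), standard convolution can be written as a matrix product:
$Y = CX$
Transposed convolution runs this mapping in the opposite direction: from $C$ and something shaped like $Y$ it produces something shaped like $X$. Matching the matrix dimensions immediately gives:
$X=C^{T}Y$
Plugging in numbers shows that this only restores the size of $X$, not its element values. Section 6 demonstrates this.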

3.4 Mathematical derivation of transposed convolution

转置卷积（Transpose Convolution）
抽丝剥茧，带你理解转置卷积（反卷积） (Understanding transposed convolution, step by step)

Define a 4×4 input matrix input:
$input=\begin{bmatrix}x_1&x_2&x_3&x_4\\x_5&x_6&x_7&x_8\\x_9&x_{10}&x_{11}&x_{12}\\x_{13}&x_{14}&x_{15}&x_{16}\end{bmatrix}$
and a 3×3 standard convolution kernel:
$kernel=\begin{bmatrix}w_{0,0}&w_{0,1}&w_{0,2}\\w_{1,0}&w_{1,1}&w_{1,2}\\w_{2,0}&w_{2,1}&w_{2,2}\end{bmatrix}$
With $strides = 1$ and $padding = 0$, i.e. $i = 4, k = 3, s = 1, p = 0$, formula (1) gives a 2×2 output matrix $output$:
$output=\begin{bmatrix}y_1&y_2\\y_3&y_4\end{bmatrix}$
Now express the same thing differently: flatten input and output into column vectors X and Y, of size 16×1 and 4×1 respectively.

Flatten the input matrix into a 16×1 column vector $X$:
$X=\begin{bmatrix}x_{1}&x_{2}&x_{3}&x_{4}&x_{5}&x_{6}&x_{7}&x_{8}&x_{9}&x_{10}&x_{11}&x_{12}&x_{13}&x_{14}&x_{15}&x_{16}\end{bmatrix}^T$
Flatten the output matrix into a 4×1 column vector $Y$:
$Y=\begin{bmatrix}y_1&y_2&y_3&y_4\end{bmatrix}^T$
Standard convolution can then be written as a matrix product, with the matrix C representing the convolution kernel:
$Y = CX$
Working through the indices gives a sparse matrix C of size 4×16:
$C=\begin{bmatrix}w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0&0&0&0&0\\0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0&0&0&0\\0&0&0&0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}&0\\0&0&0&0&0&w_{0,0}&w_{0,1}&w_{0,2}&0&w_{1,0}&w_{1,1}&w_{1,2}&0&w_{2,0}&w_{2,1}&w_{2,2}\end{bmatrix}$
The matrix product is illustrated below:
[Figure: standard convolution as the matrix product Y = CX]
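The structure of $C$ can be reproduced numerically. The sketch below assumes NumPy and uses a made-up helper `conv_matrix`; it builds the 4×16 matrix for a stride-1, no-padding convolution and checks that $CX$ agrees with a direct sliding-window computation.

```python
import numpy as np


def conv_matrix(kernel: np.ndarray, in_size: int) -> np.ndarray:
    """Build C so that C @ x.ravel() equals a stride-1, zero-padding-free convolution."""
    k = kernel.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for r in range(out_size):
        for c in range(out_size):
            for a in range(k):
                for b in range(k):
                    # output (r, c) reads input (r + a, c + b) with weight w[a, b]
                    C[r * out_size + c, (r + a) * in_size + (c + b)] = kernel[a, b]
    return C


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
w = rng.normal(size=(3, 3))
C = conv_matrix(w, in_size=4)

# Direct sliding-window result for comparison.
direct = np.array([[np.sum(x[r:r + 3, c:c + 3] * w) for c in range(2)] for r in range(2)])
print(C.shape)
print(np.allclose((C @ x.ravel()).reshape(2, 2), direct))
```

Each row of `C` is the kernel laid over the flattened input at one output position, which is exactly the zero pattern of the matrix written out above.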

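Transposed convolution goes in the reverse direction of the forward convolution (in terms of shape): from $C$ and $Y$ it produces $X$:
$X=C^{T}Y$
The new sparse matrix is $C^T$, of size 16×4; the figure below shows the transposed operation. Note that the weights used for the transposed convolution need not come from the original convolution kernel; the weight matrix merely has the same shape as the transposed convolution matrix.
[Figure: transposed convolution as the matrix product X = C^T Y]

Reshaping the 16×1 result then yields a 4×4 output matrix from a 2×2 input matrix.

Note: as stated in Section 3.3, plugging in numbers shows that transposed convolution only restores the size of $X$, not its element values. Section 6 demonstrates this.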
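A minimal NumPy sketch of that note, with arbitrarily chosen kernel and input values (they are assumptions for illustration, not values from the article): applying $C^T$ to $Y=CX$ returns a 4×4 array, but not the original $X$.

```python
import numpy as np

in_size, k = 4, 3
out_size = in_size - k + 1
w = np.arange(1.0, 10.0).reshape(k, k)              # arbitrary kernel weights
x = np.arange(1.0, 17.0).reshape(in_size, in_size)  # arbitrary input

# Row (r, c) of C is the kernel laid over the flattened input window at (r, c).
C = np.zeros((out_size ** 2, in_size ** 2))
for r in range(out_size):
    for c in range(out_size):
        window = np.zeros((in_size, in_size))
        window[r:r + k, c:c + k] = w
        C[r * out_size + c] = window.ravel()

y = C @ x.ravel()                                   # forward convolution, 4 values
x_back = (C.T @ y).reshape(in_size, in_size)        # transposed convolution, 4x4

print(x_back.shape == x.shape)   # shape is restored
print(np.allclose(x_back, x))    # values are not
```

`C.T @ C` is a 16×16 matrix of rank at most 4, so it cannot be the identity; that is why only the shape comes back.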

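4. (PyTorch) How `Conv2DTranspose` is computed

For a transposed convolution with kernel size kernel(k), stride(s), and padding(p), the computation can be summarized in three steps:

Step 1: compute the new input feature map;
Step 2: compute the transposed convolution kernel;
Step 3: perform a standard convolution.

4.1 Step 1: compute the new input feature map

Insert zeros between the elements of the input feature map $M$ to obtain the new input feature map $M^{\prime}$.

Taking the height as an example, an input of height $H_{in}$ has $(H_{in}-1)$ gaps between adjacent rows.
Zeros inserted between two adjacent positions: $s - 1$, where $s$ is the stride.
Total zeros inserted along the height: $(H_{in}-1) * (s-1)$.
Size of the new input feature map: $H_{in}^{\prime} = H_{in} + (H_{in}-1)*(s-1)$.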
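Step 1 can be sketched with NumPy slicing. The function name `insert_zeros` is hypothetical; it places the original elements every `s` positions in a zero array of the size given above.

```python
import numpy as np


def insert_zeros(x: np.ndarray, s: int) -> np.ndarray:
    """Insert s - 1 zeros between neighbouring elements of a 2-D feature map."""
    h, w = x.shape
    out = np.zeros((h + (h - 1) * (s - 1), w + (w - 1) * (s - 1)), dtype=x.dtype)
    out[::s, ::s] = x  # original values land on every s-th row and column
    return out


m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(insert_zeros(m, s=2))
```

With `s=1` nothing is inserted and the input is returned unchanged, matching $H_{in}^{\prime}=H_{in}$.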

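4.2 Step 2: compute the transposed convolution kernel

Flip the standard kernel $K$ vertically and horizontally to obtain the transposed kernel $K^{\prime}$.

Given:
standard kernel size: $k$,
standard kernel stride: $s$,
standard kernel padding: $p$

Transposed kernel size: $k^{\prime}=k$.
Transposed kernel stride: $s^{\prime}=1$; this value is always 1.
Transposed kernel padding: $p^{\prime} = k-p-1$. Section 4.4 explains where this formula comes from.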
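As a small sketch of step 2 (NumPy assumed; `transposed_kernel` is an illustrative name), the flip and the derived parameters $(k^{\prime}, s^{\prime}, p^{\prime})$ look like this:

```python
import numpy as np


def transposed_kernel(kernel: np.ndarray, p: int):
    """Return the flipped kernel K' and the (k', s', p') of the final stride-1 convolution."""
    k = kernel.shape[0]
    flipped = np.flip(kernel, axis=(0, 1))  # flip up-down and left-right
    return flipped, (k, 1, k - p - 1)


K = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 0]])
K_t, (k_t, s_t, p_t) = transposed_kernel(K, p=0)
print(K_t)
print(k_t, s_t, p_t)
```

Flipping along both axes is the same as rotating the kernel by 180 degrees.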

4.3 Step 3: perform a standard convolution

Convolve the new input feature map with the transposed kernel using a standard convolution; the result is the output of the transposed convolution.

From the standard convolution formula:
$\mathrm{H_{out}}=\frac{(\mathrm{H_{in}^{\prime}}+2\mathrm{p^{\prime}}-k^{\prime})}{\mathrm{s^{\prime}}}+1\quad(2)$

$H_{in}^{\prime} = H_{in} + (H_{in}-1)*(s-1)$,
$k^{\prime}=k$,
$s^{\prime}=1$,
$p^{\prime} = k-p-1$.

Substituting the results of steps 1 and 2 into the formula above gives:
$\text{H}_{out}=\frac{(\text{H}_{in}+\text{H}_{in}*\text{s}-\text{H}_{in}-\text{s}+1)+2*(\text{k}-\text{p}-1)-\text{k}}{\text{s}'}+1\quad(3)$
Simplifying:
$\text{H}_{out}=\frac{(\text{H}_{in}-1)*\text{s}+\text{k}-2\text{p}-1}{\text{s}'}+1\quad(4)$
Since the denominator $s^{\prime}=1$, the final result is:
$\mathrm{H}_{out}=(\mathrm{H}_{in}-1)*\text{s}-2\text{p}+\mathrm{k}\quad(5.1)$

In summary, the transposed convolution output sizes along the height and width are:
$H_{out}=(H_{in}-1)\times stride[0]-2\times padding[0]+kernel\_size[0]\\ W_{out}=(W_{in}-1)\times stride[1]-2\times padding[1]+kernel\_size[1]$
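The algebra from formula (2) to formula (5.1) can be double-checked numerically. Both function names below are invented for this sketch; one evaluates the closed form, the other walks through the three steps.

```python
def conv_transpose_out_size(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """Formula (5.1): output size of a transposed convolution along one dimension."""
    return (i - 1) * s - 2 * p + k


def via_three_steps(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """The same quantity, computed as a stride-1 convolution over the zero-inserted input."""
    i_new = i + (i - 1) * (s - 1)    # step 1: zero insertion
    p_new = k - p - 1                # step 2: padding of the transposed kernel
    return (i_new + 2 * p_new - k) // 1 + 1  # step 3: formula (2) with s' = 1


for i, k, s, p in [(2, 3, 1, 0), (2, 3, 2, 0), (3, 3, 2, 1)]:
    print((i, k, s, p), conv_transpose_out_size(i, k, s, p), via_three_steps(i, k, s, p))
```

The three parameter sets are the ones used in the stride examples of Section 7.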

4.4 Proof that $p^{\prime}=k-p-1$

Rearranging formula (5.1) gives:
$H_{in}=\frac{H_{out}+2p-k}s+1\quad(5.2)$

Comparing formula (5.2) with formula (1) shows that Conv2D and Conv2DTranspose are inverses of each other with respect to input and output shapes.

Note: torch.nn.ConvTranspose2d
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sizes of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes.

Put simply, the padding argument is defined so that a Conv2d and a ConvTranspose2d built with the same parameters have mutually inverse input and output shapes.

So where does $p^{\prime} = k-p-1$ in step 2 come from? It is derived backwards from exactly this condition, "the input and output shapes of Conv2d and ConvTranspose2d are inverses of each other". A short proof:

Given:
$H_{in}^{\prime} = H_{in} + (H_{in}-1)*(s-1)$,
$k^{\prime}=k$,
$s^{\prime}=1$,
$p^{\prime}$ unknown.

Substituting into formula (2):
$\text{H}_{out}=\frac{ H_{in} + (H_{in}-1)*(s-1)+2*p^{\prime}-k^{\prime}}{\text{s}'}+1$
Simplifying:
$\mathrm{H}_{out}=(H_{in}-1)*s+2*p^{\prime}-k+2\quad(6)$
Because the shapes are inverses of each other, swapping the roles of input and output gives:
$\mathrm{H}_{in}=(H_{out}-1)*s+2*p^{\prime}-k+2\quad(7)$
Rearranging:
$\mathrm{H_{out}}=\frac{(\mathrm{H_{in}}-2*p^{\prime}+k-2)}{s}+1\quad(8)$
Comparing formula (8) with formula (1):
$2p-k=-2*p^{\prime}+k-2$
Solving:
$p^{\prime}=k-p-1\quad(9)$

Q.E.D.
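The "shapes are inverses" property can also be checked by brute force. The sketch below (helper names are illustrative) loops over small parameter ranges and restricts itself to cases where the division in formula (1) is exact, since the floor otherwise discards information.

```python
def conv_out_size(i, k, s, p):
    """Formula (1)."""
    return (i + 2 * p - k) // s + 1


def conv_transpose_out_size(i, k, s, p):
    """Formula (5.1), which already has p' = k - p - 1 built in."""
    return (i - 1) * s - 2 * p + k


mismatches = []
for k in range(1, 6):
    for s in range(1, 4):
        for p in range(0, k):              # keeps p' = k - p - 1 non-negative
            for h in range(k, 20):
                if (h + 2 * p - k) % s != 0:
                    continue               # floor is lossy here; skip
                o = conv_out_size(h, k, s, p)
                if conv_transpose_out_size(o, k, s, p) != h:
                    mismatches.append((h, k, s, p))

print(len(mismatches))
```

When the division is not exact, for example `h=6, k=3, s=2, p=1`, the round trip returns 5 rather than 6, which is the ambiguity discussed for TensorFlow in Section 5.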

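4.5 `Conv2DTranspose` example

[Figure: transposed convolution with k=3, s=2, p=1]

Input feature map $M$: $H_{in}=3$.
Standard kernel $K$: $k = 3, s = 2, p = 1$.
New input feature map $M^{\prime}$: $H_{in}^{\prime}=3+(3-1)*(2-1)=3+2=5$. Note that it becomes 7 only after the padding $p^{\prime}$ is added on both sides.
Transposed kernel $K^{\prime}$: $k^{\prime}=k,s^{\prime}=1,p^{\prime}=3-1-1=1$.
Final result of the transposed convolution: $\mathrm{H_{out}}=(3-1)*2-2*1+3=5$.

[Figure: result of the transposed convolution]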
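To see that the three steps really compute a transposed convolution, the sketch below compares them with the scatter-style definition (each input element stamps a scaled copy of the kernel onto the output). It assumes NumPy, a single channel, square inputs, and random values; both function names are made up for illustration.

```python
import numpy as np


def conv_transpose_scatter(x, w, s, p):
    """Reference definition: every input element adds x[i, j] * w into the output."""
    n, k = x.shape[0], w.shape[0]
    full = np.zeros(((n - 1) * s + k, (n - 1) * s + k))
    for i in range(n):
        for j in range(n):
            full[i * s:i * s + k, j * s:j * s + k] += x[i, j] * w
    return full[p:full.shape[0] - p, p:full.shape[1] - p]  # padding p crops the border


def conv_transpose_three_steps(x, w, s, p):
    """Zero insertion, kernel flip, then an ordinary stride-1 convolution."""
    n, k = x.shape[0], w.shape[0]
    up = np.zeros(((n - 1) * s + 1, (n - 1) * s + 1))
    up[::s, ::s] = x                           # step 1: insert s - 1 zeros
    up = np.pad(up, k - p - 1)                 # pad with p' = k - p - 1
    w_flipped = np.flip(w, axis=(0, 1))        # step 2: flip the kernel
    out = up.shape[0] - k + 1
    return np.array([[np.sum(up[r:r + k, c:c + k] * w_flipped)
                      for c in range(out)] for r in range(out)])  # step 3


rng = np.random.default_rng(0)
x, w = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
a = conv_transpose_scatter(x, w, s=2, p=1)
b = conv_transpose_three_steps(x, w, s=2, p=1)
print(a.shape, np.allclose(a, b))
```

The parameters `s=2, p=1` with a 3×3 input are the ones from the example above, so the printed shape is the 5×5 output computed there.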

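5. (TensorFlow) How `Conv2DTranspose` is computed

The TensorFlow version of Conv2DTranspose first needs an output_shape. The output size is given by:
$o=s(i-1)+a+k-2p,\quad a\in\{0,\ldots,s-1\}$
This formula inverts the convolution output-size formula. The result is not unique because of the floor in the forward formula: up to $s$ different input sizes of the forward convolution produce the same output size, so the output size of the transposed convolution is determined by the input size of the corresponding forward convolution.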
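The ambiguity is easy to enumerate. In this sketch (function names are illustrative), every candidate size produced by the formula maps back to the same forward-convolution output.

```python
def conv_out_size(i, k, s, p):
    """Forward formula with floor."""
    return (i + 2 * p - k) // s + 1


def candidate_output_sizes(i, k, s, p):
    """All sizes o = s(i - 1) + a + k - 2p with a in {0, ..., s - 1}."""
    return [s * (i - 1) + a + k - 2 * p for a in range(s)]


k, s, p = 3, 2, 1
sizes = candidate_output_sizes(3, k, s, p)
print(sizes)
print([conv_out_size(o, k, s, p) for o in sizes])
```

Both candidates are valid "pre-images" of a size-3 feature map, which is why an explicit output_shape is needed to pick one.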

5.1 Step 1: compute the new input feature map

Same as PyTorch.

5.2 Step 2: compute the transposed convolution kernel

Same as PyTorch.

5.3 Step 3: perform a standard convolution

Step 3 differs from PyTorch because TensorFlow's padding algorithm differs from PyTorch's, so the standard convolution produces a different output.
For an introduction to TensorFlow's padding algorithm, see the blog post: 深入浅出理解TensorFlow的padding填充算法.

Taking the height as an example, TensorFlow's transposed convolution formula has two cases:

When $(\mathrm H_{out}+2p-k)\%s=0$, the formula is:
$\mathrm H_{out}=(\mathrm H_{in}-1)*\mathrm s-2p+\mathrm k \quad(10)$

For example, take an input of size 3×3, a kernel of size 3×3, $strides = 2$ and $padding = 1$, i.e. $i = 3, k = 3, s = 2, p = 1$; the output size is $o = (3 - 1) \times 2 - 2 + 3 = 5$.

When $(\mathrm H_{out}+2p-k)\%s\neq0$, the formula is:
$\mathrm H_{out}=(\mathrm H_{in}-1)*\mathrm s-2p+\mathrm k+(\mathrm H_{out}+2p-k)\%s \quad(11)$

For example, with the same input size 3×3, kernel size 3×3, $strides = 2$ and $padding = 1$, i.e. $i = 3, k = 3, s = 2, p = 1$, the output size is $o = (3 - 1) \times 2 - 2 + 3 + 1 = 6$.

In the formulas above, $2p=p_{top}+p_{bottom}$, where $p_{top}$ and $p_{bottom}$ are the padding at the top and bottom along the height.
Usually, given $H_{out}$ or $stride = 1$, the remaining parameters ($p$, $H_{out}$) can be solved for.

6. Transposed convolution restores the size, not the values

Standard convolution:

import tensorflow as tf


value = tf.reshape(tf.constant([[1., 2., 3., 4., 5.],
                                [6., 7., 8., 9., 10.],
                                [11., 12., 13., 14., 15.],
                                [16., 17., 18., 19., 20.],
                                [21., 22., 23., 24., 25.]]), [1, 5, 5, 1])
filter = tf.reshape(tf.constant([[1., 0.],
                                 [0., 1.]]), [2, 2, 1, 1])
output = tf.nn.conv2d(value, filter, [1, 2, 2, 1], 'SAME')
print(output)

"""
tf.Tensor(
[[[[ 8.]
   [12.]
   [ 5.]]

  [[28.]
   [32.]
   [15.]]

  [[21.]
   [23.]
   [25.]]]], shape=(1, 3, 3, 1), dtype=float32)
"""

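The result of the standard convolution is:
$output=\begin{bmatrix}8&12&5\\28&32&15\\21&23&25\end{bmatrix}$
Now apply a transposed convolution to this result with exactly the same parameters as the standard convolution: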

input = tf.reshape(tf.constant([[8., 12., 5.],
                                [28., 32., 15.],
                                [21., 23., 25.]]), [1, 3, 3, 1])

kernel = tf.reshape(tf.constant([[1., 0.],
                               [0., 1.]]), [2, 2, 1, 1])

output = tf.nn.conv2d_transpose(input=input,
                                filters=kernel,
                                output_shape=[1, 5, 5, 1],
                                strides=[1, 2, 2, 1],
                                padding='SAME')
print(output)
"""
tf.Tensor(
[[[[ 8.]
   [ 0.]
   [12.]
   [ 0.]
   [ 5.]]

  [[ 0.]
   [ 8.]
   [ 0.]
   [12.]
   [ 0.]]

  [[28.]
   [ 0.]
   [32.]
   [ 0.]
   [15.]]

  [[ 0.]
   [28.]
   [ 0.]
   [32.]
   [ 0.]]

  [[21.]
   [ 0.]
   [23.]
   [ 0.]
   [25.]]]], shape=(1, 5, 5, 1), dtype=float32)
"""

The result of the transposed convolution is:
$output=\begin{bmatrix}8&0&12&0&5\\0&8&0&12&0\\28&0&32&0&15\\0&28&0&32&0\\21&0&23&0&25\end{bmatrix}$
This shows that transposed convolution restores only the size, not the values.
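The same experiment can be reproduced without TensorFlow. The NumPy sketch below hard-codes the 'SAME' padding for this particular case as one zero row and column at the bottom and right (an assumption that matches the printed results above), runs the forward convolution, and then scatters each output value back through the kernel.

```python
import numpy as np

x = np.arange(1.0, 26.0).reshape(5, 5)
k = np.array([[1.0, 0.0],
              [0.0, 1.0]])
s = 2

# Forward pass: pad one row/column at the bottom/right, then slide with stride 2.
xp = np.pad(x, ((0, 1), (0, 1)))
y = np.array([[np.sum(xp[s * i:s * i + 2, s * j:s * j + 2] * k) for j in range(3)]
              for i in range(3)])

# Transposed pass: each y[i, j] stamps y[i, j] * k onto a 6x6 canvas, then crop to 5x5.
canvas = np.zeros((6, 6))
for i in range(3):
    for j in range(3):
        canvas[s * i:s * i + 2, s * j:s * j + 2] += y[i, j] * k
x_back = canvas[:5, :5]

print(y)
print(x_back)
print(np.allclose(x_back, x))
```

The recovered array has the original 5×5 shape, but its entries are the convolution outputs spread over a diagonal pattern rather than the numbers 1 to 25.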

7. Stride in `Conv2DTranspose`

The figures below show transposed convolution for different values of s and p:

Panels, left to right: s=1, p=0, k=3 | s=2, p=0, k=3 | s=2, p=1, k=3

7.1 stride=1, p=0, k=3

[Figure: transposed convolution with s=1, p=0, k=3]

Input feature map (blue): $(H_{in},W_{in})=(2,2)$.
Standard kernel: $kernel\_size(k)=3,stride(s)=1, padding(p)=0$.
New input feature map: $H_{in}^{\prime} =2+(2-1)*(1-1)=2$. As the figure above shows, the feature map after zero insertion is (2,2).
Transposed kernel: $kernel\_size(k^{\prime})=3,stride(s^{\prime})=1, padding(p^{\prime})=3-0-1=2$. As the figure above shows, the padding is 2.
Output feature map (green): $(H_{out},W_{out})=(4,4)$.

Substituting into formula (5.1):
$\mathrm{H}_{out}=(\mathrm{H}_{in}-1)*\text{s}-2\text{p}+\mathrm{k}\\ \mathrm{H}_{out}=(2-1)*1-2*0+3=4$

7.2 stride=2, p=0, k=3

[Figure: transposed convolution with s=2, p=0, k=3]

Input feature map (blue): $(H_{in},W_{in})=(2,2)$.
Kernel: $k = 3, stride(s) = 2, padding = 0$.
New input feature map: $H_{in}^{\prime} =2+(2-1)*(2-1)=3$. As the figure above shows, the feature map after zero insertion is (3,3).
Transposed kernel: $kernel\_size(k^{\prime})=3,stride(s^{\prime})=1, padding(p^{\prime})=3-0-1=2$. As the figure above shows, the padding is 2.
Output feature map (green): $(H_{out},W_{out})=(5,5)$.

Substituting into formula (5.1):
$\mathrm{H}_{out}=(\mathrm{H}_{in}-1)*\text{s}-2*\text{p}+\mathrm{k}\\ \mathrm{H}_{out}=(2-1)*2-2*0+3=5$

7.3 stride=2, p=1, k=3

[Figure: transposed convolution with s=2, p=1, k=3]

Input feature map (blue): $(H_{in},W_{in})=(3,3)$.
Kernel: $k = 3, stride(s) = 2, padding = 1$.
New input feature map: $H_{in}^{\prime} =3+(3-1)*(2-1)=5$. As the figure above shows, the feature map after zero insertion is (5,5).
Transposed kernel: $kernel\_size(k^{\prime})=3,stride(s^{\prime})=1, padding(p^{\prime})=3-1-1=1$. As the figure above shows, the padding is 1.
Output feature map (green): $(H_{out},W_{out})=(5,5)$.

Substituting into formula (5.1):
$\mathrm{H}_{out}=(\mathrm{H}_{in}-1)*\text{s}-2*\text{p}+\mathrm{k}\\ \mathrm{H}_{out}=(3-1)*2-2*1+3=5$

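8. Checkerboard Artifacts

棋盘效应(Checkerboard Artifacts)
卷积操作总结（三）—— 转置卷积棋盘效应产生原因及解决
Deconvolution and Checkerboard Artifacts

Checkerboard artifacts are caused by the "uneven overlap" of transposed convolution, which makes some parts of the image darker than others.

[Figure: checkerboard artifacts]

If the transposed convolution parameters are poorly chosen, the resulting image easily shows checkerboard artifacts. With several stacked transposed convolutions, the artifacts also propagate from layer to layer.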
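Uneven overlap can be made concrete by counting, for a 1-D transposed convolution without padding, how many kernel footprints land on each output position. The function name `overlap_counts` is an illustrative assumption.

```python
def overlap_counts(n: int, k: int, s: int) -> list:
    """Number of kernel footprints covering each output position (1-D, p = 0)."""
    counts = [0] * ((n - 1) * s + k)
    for i in range(n):            # each input element writes k consecutive outputs
        for a in range(k):
            counts[i * s + a] += 1
    return counts


print(overlap_counts(4, k=3, s=2))  # kernel size not divisible by the stride
print(overlap_counts(4, k=4, s=2))  # kernel size divisible by the stride
```

With `k=3, s=2` the interior alternates between one and two contributions, which is the 1-D version of the checkerboard; with `k=4, s=2` every interior position receives two. Choosing a kernel size divisible by the stride is one of the mitigations discussed in "Deconvolution and Checkerboard Artifacts".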

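9. Summary

1. Conv2D feature-map transform:

$H_{out}=\left\lfloor\frac{H_{in}+2p-k}{s}\right\rfloor+1$

2. Conv2DTranspose feature-map transform:

$\mathrm{H}_{out}=(\mathrm{H}_{in}-1)*\text{s}-2\text{p}+\mathrm{k}$
3. Standard kernel: $s, p, k$; transposed kernel: $s^{\prime}=1,p^{\prime}=k-p-1,k^{\prime}=k$.
4. The stride in step 1 determines the zero insertion: the upscaling factor depends on strides, and the input is enlarged by inserting strides - 1 zeros between elements. The stride in step 3 is always 1.
5. Conv2D and Conv2DTranspose are inverses of each other with respect to input and output shapes.
6. Standard convolution maps a large map to a small one, (5,5) to (3,3); transposed convolution maps a small map to a large one, (3,3) to (5,5).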

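IV. Practical Notes

1. (keras)`tf.keras.layers.Conv2DTranspose`

TF official documentation: tf.keras.layers.Conv2DTranspose
TensorFlow function: tf.layers.Conv2DTranspose
tensorflow中转置卷积 conv2d_transpose 的实现机理及特殊情况处理方式 (how conv2d_transpose is implemented in TensorFlow and how special cases are handled)

Function prototype

The following uses TensorFlow v2.14.0 as the reference version.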

tf.keras.layers.Conv2DTranspose(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    output_padding=None,
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)

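Parameters

filters: Integer, the dimensionality of the output (i.e. the number of output channels).
kernel_size: A tuple or list of 2 positive integers, specifying the spatial dimensions of the filters.
strides: A tuple or list of 2 positive integers, specifying the strides of the convolution.
padding: A string, "valid" or "same", the padding algorithm.
output_padding: A tuple or list of 2 positive integers, specifying the padding along the height and width of the output tensor. If set to None (default), the output shape is inferred automatically.
data_format: A string, either channels_last (default) or channels_first, giving the ordering of the input dimensions. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to inputs with shape (batch, channels, height, width).
dilation_rate: An integer, specifying the dilation rate for all spatial dimensions of the dilated convolution.
activation: The activation function. If set to None (default), no activation is applied.
use_bias: Boolean, whether the layer uses a bias.
kernel_initializer: Initializer for the kernel weights matrix (see keras.initializers). Defaults to "glorot_uniform".
bias_initializer: Initializer for the bias vector. Defaults to "zeros".
kernel_regularizer: Regularizer applied to the kernel weights matrix (see keras.regularizers).
bias_regularizer: Regularizer applied to the bias vector.
activity_regularizer: Regularizer applied to the output of the layer (its activation).
kernel_constraint: Constraint function applied to the kernel.
bias_constraint: Constraint function applied to the bias vector.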

Input shape

4D tensor with shape: (batch_size, channels, rows, cols) if data_format=channels_first or 4D tensor with shape: (batch_size, rows, cols, channels) if data_format=channels_last.

Output shape

4D tensor with shape: (batch_size, filters, new_rows, new_cols) if data_format=channels_first or 4D tensor with shape: (batch_size, new_rows, new_cols, filters) if data_format=channels_last.

new_rows = ((rows - 1) * strides[0] + kernel_size[0] - 2 * padding[0] +
output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1] - 2 * padding[1] +
output_padding[1])
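The new_rows / new_cols formula is plain arithmetic and can be evaluated directly. In the sketch below, `keras_new_size` is a hypothetical helper, and `padding=2`, `output_padding=1` are assumed numeric values chosen only to illustrate the formula, not a statement about what Keras selects internally.

```python
def keras_new_size(size, stride, kernel, padding, output_padding=0):
    """new_rows / new_cols as written in the formula above, for one spatial dimension."""
    return (size - 1) * stride + kernel - 2 * padding + output_padding


# Illustrative numbers only: padding=2 and output_padding=1 are assumptions.
print(keras_new_size(7, stride=2, kernel=5, padding=2, output_padding=1))
print(keras_new_size(7, stride=1, kernel=5, padding=2))
```

With `output_padding=0` the expression reduces to formula (5.1) from Section III.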

2. (TensorFlow)`tf.nn.conv2d_transpose`

TF official documentation: tf.nn.conv2d_transpose

tf.nn.conv2d_transpose is sometimes called “deconvolution” after (Zeiler et al., 2010), but is really the transpose (gradient) of conv2d rather than an actual deconvolution.

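Core source code

The source of tf.nn.conv2d_transpose is at nn_ops.py#L2689-L2773; tracing the call shows that it ends up at nn_ops.py#L2607:

# Reverse direction of the convolution (backprop w.r.t. the input): given the output, compute the input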
# https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/nn_ops.py#L2547-L2609

@tf_export(v1=["nn.conv2d_backprop_input"])
@dispatch.add_dispatch_support
def conv2d_backprop_input(  # pylint: disable=redefined-builtin,dangerous-default-value
    input_sizes,
    filter=None,
    out_backprop=None,
    strides=None,
    padding=None,
    use_cudnn_on_gpu=True,
    data_format="NHWC",
    dilations=[1, 1, 1, 1],
    name=None,
    filters=None):
  
  filter = deprecation.deprecated_argument_lookup(  # renames filter; the value is unchanged
      "filters", filters, "filter", filter)
  padding, explicit_paddings = convert_padding(padding)  # padding is converted here
  return gen_nn_ops.conv2d_backprop_input(
      input_sizes, filter, out_backprop, strides, padding, use_cudnn_on_gpu,
      explicit_paddings, data_format, dilations, name)

# conv2d and conv2d_transpose are inverses of each other with respect to input/output shapes
# https://github.com/tensorflow/tensorflow/blob/v2.14.0/tensorflow/python/ops/nn_ops.py#L2689-L2773

@tf_export("nn.conv2d_transpose", v1=[])
@dispatch.add_dispatch_support
def conv2d_transpose_v2(
    input,  # pylint: disable=redefined-builtin
    filters,  # pylint: disable=redefined-builtin
    output_shape,
    strides,
    padding="SAME",
    data_format="NHWC",
    dilations=None,
    name=None):
    
  with ops.name_scope(name, "conv2d_transpose",
                      [input, filter, output_shape]) as name:
    if data_format is None:
      data_format = "NHWC"
    channel_index = 1 if data_format.startswith("NC") else 3

    strides = _get_sequence(strides, 2, channel_index, "strides")
    dilations = _get_sequence(dilations, 2, channel_index, "dilations")
    padding, explicit_paddings = convert_padding(padding)  

    return gen_nn_ops.conv2d_backprop_input(  # backward pass of the convolution
        input_sizes=output_shape,
        filter=filters,
        out_backprop=input,
        strides=strides,
        padding=padding,
        explicit_paddings=explicit_paddings,
        data_format=data_format,
        dilations=dilations,
        name=name)

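The source above shows that the transposed convolution only converts the padding; the filter is left unchanged, and the call ends up in the backward (input-gradient) computation of the standard convolution.

2.1 Function prototype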

tf.nn.conv2d_transpose(
    input,
    filters,
    output_shape,
    strides,
    padding='SAME',
    data_format='NHWC',
    dilations=None,
    name=None
)

Parameters

input：A 4-D Tensor of type float and shape [batch, height, width,in_channels] for NHWC data format or [batch, in_channels, height,width] for NCHW data format.
filters：A 4-D Tensor with the same type as input and shape [height,width, output_channels, in_channels]. filter’s in_channels dimension must match that of input.
output_shape：A 1-D Tensor representing the output shape of the deconvolution op.
strides：An integer or a list of 1, 2, or 4 positive integers, specifying the stride of the convolution.
padding：Either the string "SAME" or "VALID" indicating the type of padding algorithm to use.
data_format：A string. ‘NHWC’ and ‘NCHW’ are supported.
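dilations：An integer or a list of 1, 2, or 4 positive integers (default 1), specifying the dilation rate for all spatial dimensions of the dilated convolution.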
name：Optional name for the returned tensor.

2.2 Code example

import tensorflow as tf
import numpy as np

def test_conv2d_transpose():
    # input batch shape = (1, 2, 2, 1) -> (batch_size, height, width, channels) - 2x2x1 image in batch of 1
    x = tf.constant(np.array([[
        [[1], [2]], 
        [[3], [4]]
    ]]), tf.float32)

    # filters shape = (3, 3, 1, 1) -> (height, width, output_channels, input_channels) - 3x3x1 filter
    f = tf.constant(np.array([
        [[[1]], [[1]], [[1]]], 
        [[[1]], [[1]], [[1]]], 
        [[[1]], [[1]], [[1]]]
    ]), tf.float32)

    conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 4, 4, 1), strides=[1, 2, 2, 1], padding='SAME')

    # TensorFlow 2.x executes eagerly, so the result can be read directly (tf.Session is TF 1.x only)
    result = conv.numpy()

    assert (np.array([[
        [[1.0], [1.0],  [3.0], [2.0]],
        [[1.0], [1.0],  [3.0], [2.0]],
        [[4.0], [4.0], [10.0], [6.0]],
        [[3.0], [3.0],  [7.0], [4.0]]]]) == result).all()

2.3 Code analysis

Given:

# 2*2*1 -> 4*4*1
(in_height, in_width)=(2, 2)
(filter_height, filter_width)=(3, 3)
(strides[1], strides[2])=(2, 2)

According to TensorFlow's padding algorithm:

in_height % strides[1] = 2%2 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(3-2, 0)=1

pad_top = pad_along_height // 2 = 1 // 2 = 0
pad_bottom = pad_along_height - pad_top = 1-0 = 1

This gives the padding along the height:

(pad_top, pad_bottom)=(0, 1)

Likewise, the padding along the width is:

(pad_left, pad_right)=(0, 1)

The transposed convolution doubles the output size, i.e. ${H}_{out}=4$, so:
$(\mathrm{H}_{out}+2p-k)\%s=(4+(0+1)-3)\%2=0$
Substituting into formula (10):
$H_{out} = (2-1)*2-(0+1)+3=4$
Likewise: $W_{out} = (2-1)*2-(0+1)+3=4$

In summary, the output size is $(4, 4, 1)$, which agrees with the result verified by the code.
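The padding rule used in this analysis can be wrapped in a small function. This is a sketch of the rule exactly as it is written out above (the name `same_padding` is made up), not a copy of TensorFlow's implementation.

```python
def same_padding(in_size: int, kernel: int, stride: int):
    """Total 'SAME' padding split into (before, after), following the rule quoted in this section."""
    if in_size % stride == 0:
        pad_along = max(kernel - stride, 0)
    else:
        pad_along = max(kernel - (in_size % stride), 0)
    pad_before = pad_along // 2
    return pad_before, pad_along - pad_before


print(same_padding(2, kernel=3, stride=2))  # the example analysed above
```

When the total padding is odd, the extra row or column goes to the bottom or right, which is why the result here is asymmetric.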

3. (PyTorch)`torch.nn.ConvTranspose2d`

torch.nn.ConvTranspose2d

3.1 Function prototype

CLASS torch.nn.ConvTranspose2d(in_channels, 
                               out_channels, 
                               kernel_size, 
                               stride=1, 
                               padding=0, 
                               output_padding=0, 
                               groups=1, 
                               bias=True, 
                               dilation=1, 
                               padding_mode='zeros', 
                               device=None, 
                               dtype=None)

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0. Note that output_padding is only used to find output shape, but does not actually add zero-padding to output. In effect, the output feature map grows by that many rows or columns along the height and width, on one side only (not on both sides).
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1. Only relevant for grouped convolution; the default 1 means an ordinary convolution.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1. Only relevant for dilated (atrous) convolution; the default 1 means an ordinary convolution.

Variables

weight (Tensor) – the learnable weights of the module of shape $(\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size[0]}, \text{kernel\_size[1]})$ .
bias (Tensor) – the learnable bias of the module of shape (out_channels) If bias is True.

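3.2 Code example

The following uses PyTorch to simulate a transposed convolution with s=1, p=0, k=3:

[Figure: transposed convolution with s=1, p=0, k=3]
In the code, transposed_conv_official computes the result with the official transposed convolution, while transposed_conv_self pads the input feature map by hand, following the steps described above, and obtains the result with a standard convolution.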

import torch
import torch.nn as nn


def transposed_conv_official():
    # input feature map
    feature_map = torch.as_tensor([[1, 0],
                                   [2, 1]], dtype=torch.float32).reshape([1, 1, 2, 2])
    print(feature_map)
    
    # instantiate the transposed convolution
    trans_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                    kernel_size=3, stride=1, bias=False)
    
    # load the standard kernel (note: the standard kernel, not the flipped transposed kernel)
    trans_conv.load_state_dict({"weight": torch.as_tensor([[1, 0, 1],
                                                           [0, 1, 1],
                                                           [1, 0, 0]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    
    print(trans_conv.weight)
    
    # run the transposed convolution
    output = trans_conv(feature_map)
    print(output)


def transposed_conv_self():
    # new input feature map (the 2x2 input zero-padded by p' = 2 on every side)
    feature_map = torch.as_tensor([[0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 0, 0, 0],
                                   [0, 0, 2, 1, 0, 0],
                                   [0, 0, 0, 0, 0, 0],
                                   [0, 0, 0, 0, 0, 0]], dtype=torch.float32).reshape([1, 1, 6, 6])
    print(feature_map)
    
    # instantiate the standard convolution
    conv = nn.Conv2d(in_channels=1, out_channels=1,
                     kernel_size=3, stride=1, bias=False)
    
    # flip the standard kernel vertically and horizontally to get the transposed kernel
    conv.load_state_dict({"weight": torch.as_tensor([[0, 0, 1],
                                                     [1, 1, 0],
                                                     [1, 0, 1]], dtype=torch.float32).reshape([1, 1, 3, 3])})
    print(conv.weight)
    
    # run the standard convolution
    output = conv(feature_map)
    print(output)


def main():
    transposed_conv_official()
    print("---------------")
    transposed_conv_self()


if __name__ == '__main__':
    main()

Output

tensor([[[[1., 0.],
          [2., 1.]]]])
Parameter containing:
tensor([[[[1., 0., 1.],
          [0., 1., 1.],
          [1., 0., 0.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
          [2., 2., 3., 1.],
          [1., 2., 3., 1.],
          [2., 1., 0., 0.]]]], grad_fn=<SlowConvTranspose2DBackward>)
---------------
tensor([[[[0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 1., 0., 0., 0.],
          [0., 0., 2., 1., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.]]]])
Parameter containing:
tensor([[[[0., 0., 1.],
          [1., 1., 0.],
          [1., 0., 1.]]]], requires_grad=True)
tensor([[[[1., 0., 1., 0.],
          [2., 2., 3., 1.],
          [1., 2., 3., 1.],
          [2., 1., 0., 0.]]]], grad_fn=<ThnnConv2DBackward>)


3.3 Code analysis

Input feature map $M$: $H_{in}=2$.

Standard kernel $K$: $k = 3, s = 1, p = 0$.

New input feature map $M^{\prime}$: $H_{in}^{\prime}=2+(2-1)*(1-1)=2$.

Transposed kernel $K^{\prime}$: $k^{\prime}=k,s^{\prime}=1,p^{\prime}=3-0-1=2$.

Final result of the transposed convolution: $\mathrm{H_{out}}=(2-1)*1-2*0+3=4$.

4. DCGAN

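Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Deep Convolutional Generative Adversarial Network

4.1 Code example

The generator G of a DCGAN uses tf.keras.layers.Conv2DTranspose (upsampling) layers to produce an image from a seed (random noise). It starts with a Dense layer that takes the seed as input, then upsamples several times until it reaches the desired image size of 28x28x1.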

import tensorflow as tf
from tensorflow.keras import layers


def make_generator_model():
    model = tf.keras.Sequential()  # create the model instance
    # The first layer must declare its input shape; it needs 7*7*256 units to match the Reshape below
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))  # fully connected layer on the 100-d noise vector
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size

    # to 7*7*128
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # to 14*14*64
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # to 28*28*1
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model

4.2 Code analysis

step1: `7*7*256 -> 7*7*128`

Analysis of layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False):

Given:

# 7*7*256 -> 7*7*128
(in_height, in_width)=(7, 7)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(1, 1)

According to TensorFlow's padding algorithm:

in_height % strides[1] = 7%1 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(5-1, 0)=4

pad_top = pad_along_height // 2 = 4 // 2 = 2
pad_bottom = pad_along_height - pad_top = 4-2 = 2

This gives the padding along the height:

(pad_top, pad_bottom)=(2, 2)

Likewise, the padding along the width is:

(pad_left, pad_right)=(2, 2)

Since $s = 1$, the condition $(\mathrm{H}_{out}+2p-k)\%s=0$ holds. Substituting into formula (10):
$H_{out} = (7-1)*1-(2+2)+5=7$
Likewise: $W_{out} = (7-1)*1-(2+2)+5=7$.

In summary, the output size is $(7, 7, 128)$, which agrees with the result verified by the code.

step2: `7*7*128 -> 14*14*64`

Analysis of layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False):

Given:

# 7*7*128 -> 14*14*64
(in_height, in_width)=(7, 7)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(2, 2)

According to TensorFlow's padding algorithm:

in_height % strides[1] = 7%2 = 1
pad_along_height = max(filter_height - (in_height % stride_height), 0)=max(5-7%2, 0)=4

pad_top = pad_along_height // 2 = 4 // 2 = 2
pad_bottom = pad_along_height - pad_top = 4-2 = 2

This gives the padding along the height:

(pad_top, pad_bottom)=(2, 2)

Likewise, the padding along the width is:

(pad_left, pad_right)=(2, 2)

The transposed convolution doubles the output size, i.e. ${H}_{out}=14$, so:
$(\mathrm{H}_{out}+2p-k)\%s=(14+(2+2)-5)\%2=1$
Substituting into formula (11):
$H_{out} = (7-1)*2-(2+2)+5+(14+(2+2)-5)\%2=14$
Likewise: $W_{out} = (7-1)*2-(2+2)+5+(14+(2+2)-5)\%2=14$

In summary, the output size is $(14, 14, 64)$, which agrees with the result verified by the code.

step3: `14*14*64 -> 28*28*1`

Analysis of layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'):

Given:

# 14*14*64 -> 28*28*1
(in_height, in_width)=(14, 14)
(filter_height, filter_width)=(5, 5)
(strides[1], strides[2])=(2, 2)

According to TensorFlow's padding algorithm:

in_height % strides[1] = 14%2 = 0
pad_along_height = max(filter_height - stride_height, 0)=max(5-2, 0)=3

pad_top = pad_along_height // 2 = 3 // 2 = 1
pad_bottom = pad_along_height - pad_top = 3-1 = 2

This gives the padding along the height:

(pad_top, pad_bottom)=(1, 2)

Likewise, the padding along the width is:

(pad_left, pad_right)=(1, 2)

The transposed convolution doubles the output size, i.e. ${H}_{out}=28$, so:
$(\mathrm{H}_{out}+2p-k)\%s=(28+(1+2)-5)\%2=0$
Substituting into formula (10):
$H_{out} = (14-1)*2-(1+2)+5=28$
Likewise: $W_{out} = (14-1)*2-(1+2)+5=28$
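The three steps can be traced in a few lines without building the model. The sketch relies on the rule that a Keras Conv2DTranspose with padding='same' and no output_padding multiplies each spatial size by the stride; `same_transpose_size` and `layers_spec` are names invented for this illustration.

```python
def same_transpose_size(size: int, stride: int) -> int:
    """With padding='same' and no output_padding, the spatial size is multiplied by the stride."""
    return size * stride


# (filters, stride) of the three Conv2DTranspose layers in the generator
layers_spec = [(128, 1), (64, 2), (1, 2)]

h = w = 7
shapes = []
for filters, stride in layers_spec:
    h, w = same_transpose_size(h, stride), same_transpose_size(w, stride)
    shapes.append((h, w, filters))

print(shapes)
```

The printed shapes match the `assert model.output_shape == ...` lines in the generator code.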