The Definition and Computation of PyTorch ConvTranspose2d

1. ConvTranspose2d

https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros', device=None, dtype=None)

Applies a 2D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation as it does not compute a true inverse of convolution).


the visualizations: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
the paper: https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf

This module supports TensorFloat32.

On certain ROCm devices, when using float16 inputs this module will use different precision for backward.

  • stride controls the stride for the cross-correlation.

  • padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.

  • output_padding controls the additional size added to one side of the output shape. See note below for details.

  • dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but the link here has a nice visualization of what dilation does.

  • groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups.

The parameters kernel_size, stride, padding, output_padding can either be:

  • a single int – in which case the same value is used for the height and width dimensions

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

    Note:
    The padding argument effectively adds dilation * (kernel_size - 1) - padding
    amount of zero padding to both sides of the input. This is set so that
    when a Conv2d and a ConvTranspose2d are initialized with the same
    parameters, they are inverses of each other in regard to the input and
    output shapes. However, when stride > 1, Conv2d maps multiple input shapes
    to the same output shape. output_padding is provided to resolve this
    ambiguity by effectively increasing the calculated output shape on one
    side. Note that output_padding is only used to find the output shape, but
    does not actually add zero-padding to the output.
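
The following minimal sketch (not part of the official docs; the layer sizes are arbitrary) illustrates this ambiguity and how output_padding resolves it:

import torch
from torch import nn

# With stride=2, two different input sizes produce the same Conv2d output size.
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(torch.randn(1, 1, 7, 7)).shape)  # torch.Size([1, 1, 4, 4])
print(conv(torch.randn(1, 1, 8, 8)).shape)  # torch.Size([1, 1, 4, 4])

# ConvTranspose2d with the same parameters recovers the 7x7 shape by default;
# output_padding=1 selects the 8x8 shape instead.
y = torch.randn(1, 1, 4, 4)
tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
print(tconv(y).shape)  # torch.Size([1, 1, 7, 7])
tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)
print(tconv(y).shape)  # torch.Size([1, 1, 8, 8])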

    Note:
    In some circumstances, when given tensors on a CUDA device and using cuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True.

Parameters

  • in_channels (int): Number of channels in the input image
  • out_channels (int): Number of channels produced by the convolution
  • kernel_size (int or tuple): Size of the convolving kernel
  • stride (int or tuple, optional): Stride of the convolution. Default: 1
  • padding (int or tuple, optional): dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
  • output_padding (int or tuple, optional): Additional size added to one side of each dimension in the output shape. Default: 0
  • groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional): If True, adds a learnable bias to the output. Default: True
  • dilation (int or tuple, optional): Spacing between kernel elements. Default: 1

Shape

  • Input: $(N, C_{in}, H_{in}, W_{in})$ or $(C_{in}, H_{in}, W_{in})$
  • Output: $(N, C_{out}, H_{out}, W_{out})$ or $(C_{out}, H_{out}, W_{out})$, where

$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1$

$W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1$
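
As a quick sanity check of the formula (a sketch with arbitrary hyperparameter values, not taken from the docs):

import torch
from torch import nn

# Arbitrary example values; compute H_out by hand and compare with the layer.
H_in, stride, padding, dilation, kernel_size, output_padding = 6, 3, 2, 2, 4, 1
H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1
print(H_out)  # 19

tconv = nn.ConvTranspose2d(1, 1, kernel_size=kernel_size, stride=stride, padding=padding,
                           dilation=dilation, output_padding=output_padding)
print(tconv(torch.randn(1, 1, H_in, H_in)).shape)  # torch.Size([1, 1, 19, 19])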

Variables

  • weight (Tensor): the learnable weights of the module of shape $(\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{out} \cdot \prod_{i=0}^{1}\text{kernel\_size}[i]}$
  • bias (Tensor): the learnable bias of the module of shape $(\text{out\_channels})$. If bias is True, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{out} \cdot \prod_{i=0}^{1}\text{kernel\_size}[i]}$

Examples:

    >>> # With square kernels and equal stride
    >>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
    >>> # non-square kernels and unequal stride and with padding
    >>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
    >>> input = torch.randn(20, 16, 50, 100)
    >>> output = m(input)
    >>> # exact output size can be also specified as an argument
    >>> input = torch.randn(1, 16, 12, 12)
    >>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
    >>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
    >>> h = downsample(input)
    >>> h.size()
    torch.Size([1, 16, 6, 6])
    >>> output = upsample(h, output_size=input.size())
    >>> output.size()
    torch.Size([1, 16, 12, 12])

2. Transposed Convolution

https://d2l.ai/chapter_computer-vision/transposed-conv.html
https://zh.d2l.ai/chapter_computer-vision/transposed-conv.html

The CNN layers we have seen so far, such as convolutional layers and pooling layers, typically reduce (downsample) the spatial dimensions (height and width) of the input, or keep them unchanged. In semantic segmentation, which classifies at the pixel level, it is convenient if the spatial dimensions of the input and output are the same. For example, the channel dimension at one output pixel can hold the classification results for the input pixel at the same spatial position.

To achieve this, especially after the spatial dimensions are reduced by CNN layers, we can use another type of CNN layer that can increase (upsample) the spatial dimensions of intermediate feature maps. In this section, we will introduce transposed convolution, which is also called fractionally-strided convolution, for reversing downsampling operations by the convolution.

2.1 Basic Operation

Ignoring channels for now, let's begin with the basic transposed convolution operation with stride of 1 and no padding. Suppose that we are given a $n_h \times n_w$ input tensor and a $k_h \times k_w$ kernel. Sliding the kernel window with stride of 1 for $n_w$ times in each row and $n_h$ times in each column yields a total of $n_h n_w$ intermediate results. Each intermediate result is a $(n_h + k_h - 1) \times (n_w + k_w - 1)$ tensor that is initialized as zeros. To compute each intermediate tensor, each element in the input tensor is multiplied by the kernel so that the resulting $k_h \times k_w$ tensor replaces a portion in each intermediate tensor. Note that the position of the replaced portion in each intermediate tensor corresponds to the position of the element in the input tensor used for the computation. In the end, all the intermediate results are summed over to produce the output.

Fig. 2.1 Transposed convolution with a $2 \times 2$ kernel. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the computation.

We can implement this basic transposed convolution operation trans_conv for an input matrix X and a kernel matrix K.

import torch
from torch import nn

def trans_conv(X, K):
    """Basic transposed convolution for a 2-D input X and 2-D kernel K (stride 1, no padding)."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Broadcast X[i, j] through the kernel and accumulate into the output window.
            Y[i: i + h, j: j + w] += X[i, j] * K
    return Y

In contrast to the regular convolution, which reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input.

X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
trans_conv(X, K)
tensor([[ 0.,  0.,  1.],
        [ 0.,  4.,  6.],
        [ 4., 12.,  9.]])
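
For instance, the center entry of the output accumulates four overlapping contributions (a quick check added here): $0 \cdot 3 + 1 \cdot 2 + 2 \cdot 1 + 3 \cdot 0 = 4$, one term from each input element multiplied by the kernel entry that lands on that position.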

2.2 Padding, Strides, and Multiple Channels

Unlike in the regular convolution, where padding is applied to the input, in the transposed convolution padding is applied to the output. For example, when specifying the padding number on either side of the height and width as 1, the first and last rows and columns will be removed from the transposed convolution output.

X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)  # add batch and channel dimensions
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)
tconv.weight.data = K
tconv(X)
tensor([[[[4.]]]], grad_fn=<ConvolutionBackward0>)

In the transposed convolution, strides are specified for intermediate results (thus the output), not for the input. Using the same input and kernel tensors from Fig. 2.1, changing the stride from 1 to 2 increases both the height and width of the intermediate tensors, and hence of the output tensor, as shown in Fig. 2.2.

Fig. 2.2 Transposed convolution with a $2 \times 2$ kernel with stride of 2. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the computation.

The following code snippet can validate the transposed convolution output for stride of 2 in Fig. 2.2.

tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv.weight.data = K
tconv(X)
tensor([[[[0., 0., 0., 1.],
          [0., 0., 2., 3.],
          [0., 2., 0., 3.],
          [4., 6., 6., 9.]]]], grad_fn=<ConvolutionBackward0>)

For multiple input and output channels, the transposed convolution works in the same way as the regular convolution. Suppose that the input has $c_i$ channels, and that the transposed convolution assigns a $k_h \times k_w$ kernel tensor to each input channel. When multiple output channels are specified, we will have a $c_i \times k_h \times k_w$ kernel for each output channel.
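
As a quick check of this layout (a sketch with arbitrary channel counts, not part of the original text), the weight of nn.ConvTranspose2d is stored as (in_channels, out_channels / groups, kernel_size[0], kernel_size[1]), i.e. one $k_h \times k_w$ kernel per pair of input and output channels:

from torch import nn

tconv = nn.ConvTranspose2d(in_channels=3, out_channels=5, kernel_size=(2, 4))
print(tconv.weight.shape)  # torch.Size([3, 5, 2, 4])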

All in all, if we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$. This is illustrated in the following example.

X = torch.rand(size=(1, 10, 16, 16))
conv = nn.Conv2d(10, 20, kernel_size=5, padding=2, stride=3)
tconv = nn.ConvTranspose2d(20, 10, kernel_size=5, padding=2, stride=3)
tconv(conv(X)).shape == X.shape
True

2.3 Connection to Matrix Transposition

The transposed convolution is named after the matrix transposition. To explain, let's first see how to implement convolutions using matrix multiplications. In the example below, we define a $3 \times 3$ input X and a $2 \times 2$ convolution kernel K, and then use the corr2d function to compute the convolution output Y.

from d2l import torch as d2l  # d2l's corr2d computes the 2-D cross-correlation

X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
Y = d2l.corr2d(X, K)
Y
tensor([[27., 37.],
        [57., 67.]])
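
As a quick check of the first entry (arithmetic added here): $\text{Y}[0, 0]$ is the cross-correlation of the top-left $2 \times 2$ window of X with K, i.e. $0 \cdot 1 + 1 \cdot 2 + 3 \cdot 3 + 4 \cdot 4 = 27$.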

Next, we rewrite the convolution kernel K as a sparse weight matrix W containing a lot of zeros. The shape of the weight matrix is (4, 9), where the non-zero elements come from the convolution kernel K.

def kernel2matrix(K):
    """Rewrite the 2x2 kernel K as a sparse (4, 9) weight matrix W."""
    k, W = torch.zeros(5), torch.zeros((4, 9))
    # Pad the flattened kernel so its two rows line up with a 3-wide input row.
    k[:2], k[3:5] = K[0, :], K[1, :]
    # Each row of W is the padded kernel shifted to one sliding-window position.
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W

W = kernel2matrix(K)
W
tensor([[1., 2., 0., 3., 4., 0., 0., 0., 0.],
        [0., 1., 2., 0., 3., 4., 0., 0., 0.],
        [0., 0., 0., 1., 2., 0., 3., 4., 0.],
        [0., 0., 0., 0., 1., 2., 0., 3., 4.]])

Concatenate the input X row by row to get a vector of length 9. Then the matrix multiplication of W and the vectorized X gives a vector of length 4. After reshaping it, we obtain the same result Y as the original convolution operation above: we just implemented convolutions using matrix multiplications.

Y == torch.matmul(W, X.reshape(-1)).reshape(2, 2)
tensor([[True, True],
        [True, True]])

Likewise, we can implement transposed convolutions using matrix multiplications. In the following example, we take the $2 \times 2$ output Y from the above regular convolution as input to the transposed convolution. To implement this operation by multiplying matrices, we only need to transpose the weight matrix W to the new shape $(9, 4)$.

Z = trans_conv(Y, K)
Z == torch.matmul(W.T, Y.reshape(-1)).reshape(3, 3)
tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

Consider implementing the convolution by multiplying matrices. Given an input vector $\mathbf{x}$ and a weight matrix $\mathbf{W}$, the forward propagation function of the convolution can be implemented by multiplying its input with the weight matrix and outputting a vector $\mathbf{y}=\mathbf{W}\mathbf{x}$. Since backpropagation follows the chain rule and $\nabla_{\mathbf{x}}\mathbf{y}=\mathbf{W}^\top$, the backpropagation function of the convolution can be implemented by multiplying its input with the transposed weight matrix $\mathbf{W}^\top$. Therefore, the transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer: its forward propagation and backpropagation functions multiply their input vector with $\mathbf{W}^\top$ and $\mathbf{W}$, respectively.
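
This relationship can be checked with autograd. The sketch below (added here; it reuses X, K, W, and trans_conv defined above, plus an arbitrary upstream gradient g) shows that the input gradient of the matrix-multiplication convolution is $\mathbf{W}^\top \mathbf{g}$, which is exactly a transposed convolution of g with K:

# Forward pass: convolution written as a matrix multiplication y = W x.
x = X.reshape(-1).clone().requires_grad_(True)
y = torch.matmul(W, x)

# Backpropagate an arbitrary upstream gradient g through the convolution.
g = torch.arange(4.0)
y.backward(g)

# The input gradient equals W^T g, i.e. the transposed convolution of g with K.
print(torch.allclose(x.grad, torch.matmul(W.T, g)))                          # True
print(torch.allclose(x.grad.reshape(3, 3), trans_conv(g.reshape(2, 2), K)))  # True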

2.4 Summary

In contrast to the regular convolution, which reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input.

If we feed $\mathsf{X}$ into a convolutional layer $f$ to output $\mathsf{Y}=f(\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\mathsf{X}$, then $g(\mathsf{Y})$ will have the same shape as $\mathsf{X}$.

We can implement convolutions using matrix multiplications. The transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer.

3. transposedConv2dLayer (transposed 2-D convolution layer)

https://www.mathworks.com/help/deeplearning/ref/transposedconv2dlayer.html

3.1 Description

A transposed 2-D convolution layer upsamples two-dimensional feature maps.

This layer is sometimes incorrectly known as a “deconvolution” or “deconv” layer. This layer performs the transpose of convolution and does not perform deconvolution.

3.2 2-D Transposed Convolutional Layer

This image shows a 4-by-4 filter upsampling 2-by-2 input. The lower map represents the input and the upper map represents the output.
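For comparison, a rough PyTorch analogue of this kind of upsampling (a sketch added here, not MATLAB code; the stride and cropping are chosen for illustration and are not taken from the MathWorks figure):

import torch
from torch import nn

# A 4x4 filter with stride 2 and cropping (padding) 1 upsamples a 2x2 input to 4x4:
# (2 - 1) * 2 - 2 * 1 + (4 - 1) + 1 = 4.
tconv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1)
print(tconv(torch.randn(1, 1, 2, 2)).shape)  # torch.Size([1, 1, 4, 4])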

References

https://yongqiang.blog.csdn.net/
https://github.com/vdumoulin/conv_arithmetic
