Convolution and Transposed Convolution in PyTorch

Andy Dennis

Last modified: 2023-03-10 13:59:27

Category: Artificial Intelligence. Tags: pytorch, python, deep learning

First published: 2021-11-03 15:58:10

Article link: https://blog.csdn.net/weixin_43850253/article/details/121121929

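Preface

While studying DCGAN, I came across code that uses transposed convolution, and the dimension calculations left me a bit confused, so I decided to go back over them. At first I thought that since others had already written about this I did not need to, but I will most likely need it for computer vision work later, so I decided to review it anyway.

This post draws on:
How to compute the output sizes of convolution and transposed convolution
Computing the dimensions after a convolution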

Convolution

import torch
import torch.nn as nn

downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
input = torch.randn((1, 16, 13, 13))
h = downsample(input)
print('h.size: ', h.size())
# h.size:  torch.Size([1, 16, 7, 7])

Parameters:
See the PyTorch 1.10 documentation for nn.Conv2d

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1,
 padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros',
 device=None, dtype=None)

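As a special case, when the last two dimensions are equal, i.e. $H_{in}=W_{in}$ (with a square kernel and dilation 1),
$H_{out} = \left\lfloor \frac{H_{in}-k+2\times p}{s} \right\rfloor + 1$
so the output has shape $(N, C_{out}, H_{out}, H_{out})$, where $H_{in}$ is the width or height of the input image (they are equal here), $k$ is kernel_size, i.e. the size of the filter, $p$ is the padding, and $s$ is the stride.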
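The simplified formula above assumes dilation is 1. As a minimal sketch, the helper below implements the general shape rule given in the `nn.Conv2d` documentation, including dilation and rounding down, in plain Python. The name `conv2d_out_size` is hypothetical and is not part of PyTorch.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a 2D convolution.

    Hypothetical helper for illustration. It follows the shape rule
    floor((size + 2*p - d*(k - 1) - 1) / s + 1).
    """
    # A dilated kernel spans d*(k - 1) + 1 input positions
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


# The downsample layer above: 13x13 input, kernel 3, stride 2, padding 1
print(conv2d_out_size(13, kernel_size=3, stride=2, padding=1))  # 7
# With dilation=2 the 3x3 kernel spans a 5x5 window, so the output shrinks
print(conv2d_out_size(13, kernel_size=3, stride=2, padding=1, dilation=2))  # 6
```

With dilation left at 1 the helper reduces to the simplified formula, so it reproduces the 13 to 7 reduction printed by the snippet at the start of this section.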

Transposed convolution

import torch
import torch.nn as nn

upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
h = torch.randn((1, 16, 7, 7))
output = upsample(h)
print('output.size(): ', output.size())
# output.size():  torch.Size([1, 16, 13, 13])

torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1,
padding=0, output_padding=0, groups=1, bias=True, dilation=1,
padding_mode='zeros', device=None, dtype=None)

$H_{out} = (H_{in} - 1) \times stride[0] - 2 \times padding[0] + dilation[0] \times (kernel\_size[0] - 1) + output\_padding[0] + 1$
$W_{out} = (W_{in} - 1) \times stride[1] - 2 \times padding[1] + dilation[1] \times (kernel\_size[1] - 1) + output\_padding[1] + 1$

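As a special case, when the last two dimensions are equal, i.e. $H_{in}=W_{in}$,
$H_{out} = (H_{in} - 1) \times s - 2 \times p + d \times (k-1) + op + 1$
so the output has shape $(N, C_{out}, H_{out}, H_{out})$, where $H_{in}$ is the width or height of the input (they are equal here), $k$ is kernel_size, i.e. the size of the filter, $p$ is the padding, $s$ is the stride, $op$ is the output_padding, and $d$ (dilation) controls the spacing between the kernel points.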
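The role of output_padding is easier to see with numbers. Because the convolution formula rounds down, several input sizes can map to the same output size, so a transposed convolution cannot know which one to restore. The sketch below uses two hypothetical helpers, `conv_out_size` and `conv_transpose_out_size`, which are not PyTorch functions and simply encode the two formulas from this post.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Convolution shape rule, rounded down
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_out_size(size, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    # Transposed convolution shape rule from the formula above
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Both a 13-wide and a 14-wide input shrink to 7 with kernel 3, stride 2, padding 1
print(conv_out_size(13, 3, stride=2, padding=1))  # 7
print(conv_out_size(14, 3, stride=2, padding=1))  # 7

# output_padding selects which of the two sizes the transposed convolution produces
print(conv_transpose_out_size(7, 3, stride=2, padding=1))                    # 13
print(conv_transpose_out_size(7, 3, stride=2, padding=1, output_padding=1))  # 14
```

This matches the upsample snippet above, where output_padding is left at 0 and a 7x7 input becomes 13x13.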

Small examples

Example 1

Compute the output size by hand from the formula.

import torch
import torch.nn as nn


# For demonstration only
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=8, stride=4)
    
    def forward(self, x):
        return self.conv(x)


if __name__ == '__main__':
    # Simulate a color image with height 84 and width 84
    # (batch_size, channel, height, width)
    x = torch.randn(1, 3, 84, 84)

    net = Net()
    y = net(x)
    print(y.shape) # torch.Size([1, 16, 20, 20])

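Let's define a helper function:

def cal_output_feature_map(width, kernel_size, stride, padding=0):
    # Floor division rounds down when the stride does not divide evenly
    return (width - kernel_size + 2 * padding) // stride + 1

In the example above, the image width is 84 (to compute the height, pass it in place of width), kernel_size is 8, stride is 4, and padding is not set, so it defaults to 0. The shape of y can therefore be computed by calling cal_output_feature_map.
(If the division is not exact, the result is rounded down.)

y_w = cal_output_feature_map(84, 8, 4)
print(y_w) # 20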
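To illustrate the rounding-down rule, here is a small sketch that sweeps a few input widths through the same kernel and stride as Example 1. The helper name `out_width` is hypothetical and is defined here only so that the snippet runs on its own.

```python
def out_width(width, kernel_size, stride, padding=0):
    # Hypothetical helper: floor division mirrors the rounding-down rule
    return (width - kernel_size + 2 * padding) // stride + 1


results = {}
for width in (84, 85, 86, 87, 88):
    results[width] = out_width(width, kernel_size=8, stride=4)
    print(width, results[width])
# Widths 84 through 87 all give 20; 88 is the first width that gives 21
```

The last few columns of an 85, 86, or 87 pixel wide input are not covered by any window, which is why they do not change the output size.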

Example 2

Source of this example: How to compute the output sizes of convolution and transposed convolution

import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 3, 3, padding=1) #in, out, kernel
        self.conv2 = nn.Conv2d(3, 3, 3, padding=1)
        self.maxpooling = nn.MaxPool2d(2,2)
        self.trans_conv = nn.ConvTranspose2d(3, 32, 3, stride=2, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        print("after conv1: ", x.size()) # [1, 3, 12, 12]
        x = self.conv2(x)
        print("after conv2: ", x.size()) # [1, 3, 12, 12]
        x = self.maxpooling(x)
        print("after maxpooling: ", x.size()) # [1, 3, 6, 6]
        x = self.trans_conv(x)
        print("after trans_conv: ", x.size()) # [1, 32, 11, 11]
        return x


model = Net()
x = torch.randn(1, 3, 12, 12)
print("input: ", x.size()) # [1, 3, 12, 12]
out = model(x)
print(out.size())
# torch.Size([1, 32, 11, 11])

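Here is also a function that computes the feature map size of a transposed convolution:

def cal_feature_map_transposed2d(width, kernel_size, stride, padding=0):
    # Assumes dilation = 1 and output_padding = 0
    dilation = 1
    return (width - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + 1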
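The spatial sizes printed in Example 2 can be reproduced without running the network. The sketch below chains the two formulas layer by layer. The helper names `conv_size` and `trans_conv_size` are hypothetical, and the max pooling step reuses the convolution formula, which holds for `nn.MaxPool2d` with its default settings (no padding, no dilation, ceil_mode off).

```python
def conv_size(size, kernel_size, stride=1, padding=0):
    # Convolution (and default max pooling) output size, rounded down
    return (size - kernel_size + 2 * padding) // stride + 1


def trans_conv_size(size, kernel_size, stride=1, padding=0, output_padding=0):
    # Transposed convolution output size with dilation = 1
    return (size - 1) * stride - 2 * padding + (kernel_size - 1) + output_padding + 1


sizes = [12]                                                        # input: 12x12
sizes.append(conv_size(sizes[-1], 3, padding=1))                    # conv1
sizes.append(conv_size(sizes[-1], 3, padding=1))                    # conv2
sizes.append(conv_size(sizes[-1], 2, stride=2))                     # maxpooling
sizes.append(trans_conv_size(sizes[-1], 3, stride=2, padding=1))    # trans_conv

for name, size in zip(["input", "conv1", "conv2", "maxpooling", "trans_conv"], sizes):
    print(name, size)
# input 12, conv1 12, conv2 12, maxpooling 6, trans_conv 11
```

Only the height and width are tracked here. The channel count is set directly by each layer's out_channels argument, which is 32 for the final transposed convolution.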