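Preface
While studying DCGAN, I came across some code that uses transposed convolution, and its dimension calculations left me a bit confused, so I decided to review them. At first I figured that since others had already written about this, I didn't need to. But since I will most likely need it again when working on CV, I decided to go over it anyway.
In writing this post I referred to:
"How to calculate the output size of convolution and transposed convolution"
Computing the dimensions after a convolution
Convolution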
import torch
import torch.nn as nn
downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
input = torch.randn((1, 16, 13, 13))
h = downsample(input)
print('h.size: ', h.size())
# h.size: torch.Size([1, 16, 7, 7])
Parameters:
See the PyTorch 1.10 documentation for nn.Conv2d
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1,
padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros',
device=None, dtype=None)
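As a special case, if the last two dimensions are equal, i.e. $H_{in} = W_{in}$ (with the same kernel size, stride, and padding along both axes and the default dilation=1), then

$$H_{out} = \left\lfloor \frac{H_{in} - k + 2 \times p}{s} \right\rfloor + 1$$

so the output has shape $(N, C_{out}, H_{out}, H_{out})$, where $H_{in}$ is the width or height of the input image (they are equal here), $k$ is kernel_size, i.e. the size of the filter, $p$ is the padding size, and $s$ is the stride. The division is rounded down when it is not exact.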
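The square-input formula above assumes the default dilation=1. As a minimal sketch, the helper below applies the general per-axis formula given in the PyTorch `nn.Conv2d` documentation, which also accounts for dilation. The names `_pair` and `conv2d_output_hw` are hypothetical and not part of PyTorch. The code is plain Python: it only predicts the shape and does not run a convolution, and it assumes integer padding (not the string modes such as 'same').

```python
def _pair(value):
    """Expand an int to (value, value); leave a 2-tuple unchanged."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_hw(in_hw, kernel_size, stride=1, padding=0, dilation=1):
    """Predict (H_out, W_out) of a 2-D convolution (hypothetical helper)."""
    k, s, p, d = map(_pair, (kernel_size, stride, padding, dilation))
    return tuple(
        # floor((size + 2p - d*(k-1) - 1) / s) + 1, evaluated per axis
        (size + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i, size in enumerate(_pair(in_hw))
    )


# Same setting as the Conv2d snippet above: 13x13 input, k=3, s=2, p=1.
print(conv2d_output_hw((13, 13), 3, stride=2, padding=1))  # (7, 7)
# A non-square input with dilation=2 (the kernel then spans 5 pixels).
print(conv2d_output_hw((32, 20), 3, dilation=2))  # (28, 16)
```

With dilation=1 the expression reduces to the simpler formula above, since $d \times (k-1) + 1 = k$.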
Transposed convolution
import torch
import torch.nn as nn
upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
h = torch.randn((1, 16, 7, 7))
output = upsample(h)
print('output.size(): ', output.size())
# output.size(): torch.Size([1, 16, 13, 13])
torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1,
padding=0, output_padding=0, groups=1, bias=True, dilation=1,
padding_mode='zeros', device=None, dtype=None)
$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1$$

$$W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1$$
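To make the role of the `[0]` and `[1]` indices concrete, here is a small sketch that evaluates the two formulas for a non-square case. The function name `conv_transpose2d_output_hw` and the sample sizes are my own illustrative choices, not something from PyTorch. It is plain Python and only predicts the shape.

```python
def conv_transpose2d_output_hw(in_hw, kernel_size, stride=(1, 1), padding=(0, 0),
                               output_padding=(0, 0), dilation=(1, 1)):
    """Predict (H_out, W_out) of a 2-D transposed convolution (hypothetical helper).

    Every argument is a 2-tuple: index 0 is the height axis, index 1 the width axis.
    """
    return tuple(
        (in_hw[i] - 1) * stride[i]
        - 2 * padding[i]
        + dilation[i] * (kernel_size[i] - 1)
        + output_padding[i]
        + 1
        for i in range(2)
    )


# Height: (7-1)*2 - 2*1 + 1*(3-1) + 0 + 1 = 13
# Width:  (5-1)*1 - 2*2 + 1*(5-1) + 0 + 1 = 5
print(conv_transpose2d_output_hw((7, 5), kernel_size=(3, 5),
                                 stride=(2, 1), padding=(1, 2)))  # (13, 5)
```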
As a special case, if the last two dimensions are equal, i.e. $H_{in} = W_{in}$, then

$$out = (H - 1) \times s - 2 \times p + d \times (k - 1) + op + 1$$

so the output has shape $(N, C_{out}, out, out)$, where $H$ is the width or height of the input image (they are equal here), $k$ is kernel_size, i.e. the size of the filter, $p$ is the padding size, $s$ is the stride, $op$ is the output_padding size, and $d$ (i.e. dilation) controls the spacing between the kernel points.
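One reason the $op$ term exists: with a stride greater than 1, several input sizes collapse to the same convolution output size, so the transposed convolution cannot know which one to restore. The sketch below illustrates this with two hypothetical helpers (`conv_out`, `conv_transpose_out`) that implement the square-input formulas from this post, assuming dilation=1 for the forward convolution.

```python
def conv_out(h, k, s, p):
    """Square-input convolution output size (dilation assumed to be 1)."""
    return (h - k + 2 * p) // s + 1


def conv_transpose_out(h, k, s, p, op=0, d=1):
    """Square-input transposed-convolution output size."""
    return (h - 1) * s - 2 * p + d * (k - 1) + op + 1


k, s, p = 3, 2, 1
# Both a 13x13 and a 14x14 input shrink to 7x7 under this convolution...
print(conv_out(13, k, s, p), conv_out(14, k, s, p))  # 7 7
# ...so op selects which of the two sizes the transposed convolution produces.
print(conv_transpose_out(7, k, s, p, op=0))  # 13
print(conv_transpose_out(7, k, s, p, op=1))  # 14
```

As the PyTorch `nn.ConvTranspose2d` documentation notes, output_padding is only used to determine the output shape; it does not add zero padding to the output.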
Small examples
Example 1
Computing the input and output sizes by hand with the formula
import torch
import torch.nn as nn

# For demonstration only
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=8, stride=4)

    def forward(self, x):
        return self.conv(x)

if __name__ == '__main__':
    # Simulate a color image with height 84 and width 84
    # (batch_size, channel, height, width)
    x = torch.randn(1, 3, 84, 84)
    net = Net()
    y = net(x)
    print(y.shape)  # torch.Size([1, 16, 20, 20])
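Let's define a function:
def cal_ouput_feature_map(width, kernel_size, stride, padding=0):
    # Floor division: the result is rounded down when the division is not exact
    return (width - kernel_size + 2 * padding) // stride + 1
In the example above, the image width is 84 (to compute the height, pass it in place of width), kernel_size is 8, stride is 4, and padding is not specified, so it defaults to 0. The shape of y can therefore be computed by calling cal_ouput_feature_map.
(If the division is not exact, the result is rounded down.)
y_w = cal_ouput_feature_map(84, 8, 4)
print(y_w)  # 20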
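When the division is not exact, rounding down means the last few columns (or rows) of the padded input are never covered by any kernel window. The sketch below counts them using a hypothetical helper, `unused_border`, which assumes dilation=1 like the formula used in this post. The widths 85 to 88 are made-up values for illustration.

```python
def unused_border(width, kernel_size, stride, padding=0):
    """Number of trailing padded-input columns that no kernel window covers.

    Hypothetical helper; assumes dilation=1.
    """
    return (width - kernel_size + 2 * padding) % stride


# kernel_size=8, stride=4, padding=0, as in Example 1
for width in (84, 85, 86, 87, 88):
    out = (width - 8 + 2 * 0) // 4 + 1
    print(width, out, unused_border(width, 8, 4))
# 84 20 0
# 85 20 1
# 86 20 2
# 87 20 3
# 88 21 0
```

Widths 84 through 87 all yield an output of 20; only at 88 does one more full stride fit.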
Example 2
Source of the example: "How to calculate the output size of convolution and transposed convolution"
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 3, 3, padding=1)  # in, out, kernel
        self.conv2 = nn.Conv2d(3, 3, 3, padding=1)
        self.maxpooling = nn.MaxPool2d(2, 2)
        self.trans_conv = nn.ConvTranspose2d(3, 32, 3, stride=2, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        print("after conv1: ", x.size())  # [1, 3, 12, 12]
        x = self.conv2(x)
        print("after conv2: ", x.size())  # [1, 3, 12, 12]
        x = self.maxpooling(x)
        print("after maxpooling: ", x.size())  # [1, 3, 6, 6]
        x = self.trans_conv(x)
        print("after trans_conv: ", x.size())  # [1, 32, 11, 11]
        return x

model = Net()
x = torch.randn(1, 3, 12, 12)
print("input: ", x.size())  # [1, 3, 12, 12]
out = model(x)
print(out.size())
# torch.Size([1, 32, 11, 11])
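A function for computing the feature-map size of a transposed convolution is included here as well:
def cal_feature_map_transposed2d(width, kernel_size, stride, padding=0):
    dilation = 1  # default dilation; output_padding is assumed to be 0
    return (width - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + 1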
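As a cross-check of the sizes printed in Example 2, the formulas can be chained layer by layer without running the network. This is a minimal sketch in plain Python. The helper names are hypothetical, dilation is assumed to be 1 throughout, and `nn.MaxPool2d` is assumed to use its default floor rounding (ceil_mode=False), in which case it follows the same size formula as a convolution.

```python
def conv_or_pool_out(width, kernel_size, stride=1, padding=0):
    """Output size of Conv2d / MaxPool2d along one axis (dilation=1, floor mode)."""
    return (width - kernel_size + 2 * padding) // stride + 1


def transposed_conv_out(width, kernel_size, stride=1, padding=0, output_padding=0):
    """Output size of ConvTranspose2d along one axis (dilation=1)."""
    return (width - 1) * stride - 2 * padding + (kernel_size - 1) + output_padding + 1


sizes = [12]                                                           # input
sizes.append(conv_or_pool_out(sizes[-1], 3, padding=1))                # conv1
sizes.append(conv_or_pool_out(sizes[-1], 3, padding=1))                # conv2
sizes.append(conv_or_pool_out(sizes[-1], 2, stride=2))                 # maxpooling
sizes.append(transposed_conv_out(sizes[-1], 3, stride=2, padding=1))   # trans_conv
print(sizes)  # [12, 12, 12, 6, 11]
```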
On a side note, I really love the highlighting of two VS Code extensions, One Dark Pro and Bracket Pair Colorizer.