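1 Research idea
The goal is an architecture-agnostic CNN building block that improves the performance of existing CNNs. The paper proposes the Asymmetric Convolution Block (ACB), which replaces each original $d\times d$ kernel with three parallel kernels of size $d\times d$, $1\times d$ and $d\times 1$ for feature extraction.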
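2 Structure
2.1 Asymmetric convolution
Asymmetric convolution is defined in contrast to symmetric (square) convolution. A symmetric kernel is typically $3\times 3$, while an asymmetric kernel is typically $1\times 3$ or $3\times 1$. Inception v3 showed that combining a vertical and a horizontal asymmetric convolution can, to some extent, stand in for a single symmetric convolution while using fewer parameters. VGG showed that a stack of small kernels covers the same receptive field as a single large kernel, again with fewer parameters. By the same reasoning, stacking several asymmetric convolutions in series yields a larger receptive field.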
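To make the parameter savings concrete, here is a small sketch that counts weights for a hypothetical layer with 64 input and 64 output channels. The channel width and the helper `conv_params` are illustrative assumptions, and biases are ignored.

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of weights in a bias-free 2D convolution with a kh x kw kernel."""
    return kh * kw * c_in * c_out


c = 64  # hypothetical channel width, chosen only for illustration

# One 3x3 kernel versus a 1x3 kernel plus a 3x1 kernel.
square_3x3 = conv_params(3, 3, c, c)
asym_pair = conv_params(1, 3, c, c) + conv_params(3, 1, c, c)
print(square_3x3, asym_pair)  # 36864 24576

# One 5x5 kernel versus two stacked 3x3 kernels (same 5x5 receptive field).
single_5x5 = conv_params(5, 5, c, c)
stacked_3x3 = 2 * conv_params(3, 3, c, c)
print(single_5x5, stacked_3x3)  # 102400 73728
```

In both comparisons the factorized variant uses fewer weights, which is the sense in which the text above says "fewer parameters".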
2.2 Formula derivation
For a standard convolution layer, the $j$-th output channel is:
$$O_{:,:,j}=\sum_{k=1}^{C}M_{:,:,k}\ast F^{(j)}_{:,:,k}$$
- $M\in\mathbb{R}^{U\times V\times C}$ is the input feature map;
- $F^{(j)}\in\mathbb{R}^{H\times W\times C}$ is the $j$-th convolution kernel (there are $D$ kernels in total);
- $O\in\mathbb{R}^{R\times T\times D}$ is the output feature map;
- $\ast$ denotes the 2D convolution operator.
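The channel-wise sum in the formula above can be sketched in plain Python. This is a minimal illustration, not an efficient implementation: it assumes no padding and stride 1, stores $M$ and $F^{(j)}$ as lists of 2D channel slices, and uses cross-correlation (no kernel flip), which is what deep-learning frameworks call convolution. The helper names are hypothetical.

```python
def correlate2d_valid(x, k):
    """Slide kernel k over the 2D map x with no padding and stride 1."""
    H, W = len(k), len(k[0])
    R, T = len(x) - H + 1, len(x[0]) - W + 1
    return [[sum(x[r + a][t + b] * k[a][b] for a in range(H) for b in range(W))
             for t in range(T)] for r in range(R)]


def conv_output_channel(M, F_j):
    """O[:, :, j] = sum over k of M[:, :, k] * F^(j)[:, :, k]."""
    partial = [correlate2d_valid(m_k, f_k) for m_k, f_k in zip(M, F_j)]
    R, T = len(partial[0]), len(partial[0][0])
    return [[sum(p[r][t] for p in partial) for t in range(T)] for r in range(R)]


# Toy input with C = 2 channels of size 3 x 3, and one filter with 2 x 2 slices.
M = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
F_j = [[[1, 0], [0, 1]],
       [[1, 1], [1, 1]]]

O_j = conv_output_channel(M, F_j)
print(O_j)  # [[8, 10], [14, 16]]
```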
Applying batch normalization (BN) to this output gives:
$$O_{:,:,j}=\left(\sum_{k=1}^{C}M_{:,:,k}\ast F^{(j)}_{:,:,k}-\mu_j\right)\frac{\gamma_j}{\sigma_j}+\beta_j$$
- $\mu_j$ is the channel-wise mean;
- $\sigma_j$ is the channel-wise standard deviation;
- $\gamma_j$ is the learned scale factor;
- $\beta_j$ is the learned shift (bias).
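Because convolution is linear, the BN expression above can be folded into a single convolution plus a bias. The short derivation below uses the same symbols; the names $F'^{(j)}$ and $b_j$ are introduced here only for illustration.

```latex
\begin{aligned}
O_{:,:,j}
  &= \left(\sum_{k=1}^{C} M_{:,:,k}\ast F^{(j)}_{:,:,k}-\mu_j\right)\frac{\gamma_j}{\sigma_j}+\beta_j \\
  &= \sum_{k=1}^{C} M_{:,:,k}\ast\left(\frac{\gamma_j}{\sigma_j}F^{(j)}_{:,:,k}\right)
     +\left(\beta_j-\frac{\mu_j\gamma_j}{\sigma_j}\right) \\
  &= \sum_{k=1}^{C} M_{:,:,k}\ast F'^{(j)}_{:,:,k}+b_j,
\qquad
F'^{(j)}=\frac{\gamma_j}{\sigma_j}F^{(j)},\quad
b_j=\beta_j-\frac{\mu_j\gamma_j}{\sigma_j}.
\end{aligned}
```

This only holds at inference time, when $\mu_j$ and $\sigma_j$ are fixed running statistics rather than per-batch values.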
Additivity of convolution:
$$I\ast K^{(1)}+I\ast K^{(2)}=I\ast\left(K^{(1)}\oplus K^{(2)}\right)$$
- $K^{(1)},K^{(2)}$ are two 2D kernels of compatible sizes;
- $I$ is the input matrix;
- $\oplus$ denotes element-wise addition of the kernels at corresponding positions.
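The additivity property can be checked numerically. The sketch below assumes stride 1 and no padding, and aligns the $1\times 3$ and $3\times 1$ branches with the $3\times 3$ branch by cropping the input. All values and helper names are made up for illustration, and integers are used so the comparison is exact.

```python
def correlate2d_valid(x, k):
    """Slide kernel k over the 2D map x with no padding and stride 1."""
    H, W = len(k), len(k[0])
    R, T = len(x) - H + 1, len(x[0]) - W + 1
    return [[sum(x[r + a][t + b] * k[a][b] for a in range(H) for b in range(W))
             for t in range(T)] for r in range(R)]


def add_maps(*maps):
    """Element-wise sum of equally sized 2D maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


I = [[(3 * r + 5 * c) % 7 for c in range(5)] for r in range(5)]  # toy 5x5 input
K_sq = [[1, 0, -1], [2, 1, 0], [0, -2, 3]]   # 3x3 kernel
K_hor = [[1, -1, 2]]                         # 1x3 kernel
K_ver = [[2], [0], [-1]]                     # 3x1 kernel

# Three separate branches; the crops keep every output at 3x3.
out_sq = correlate2d_valid(I, K_sq)
out_hor = correlate2d_valid(I[1:-1], K_hor)                   # drop first/last rows
out_ver = correlate2d_valid([row[1:-1] for row in I], K_ver)  # drop first/last columns
branch_sum = add_maps(out_sq, out_hor, out_ver)

# One fused kernel: add K_hor onto the centre row and K_ver onto the centre column.
K_fused = [row[:] for row in K_sq]
for b in range(3):
    K_fused[1][b] += K_hor[0][b]
for a in range(3):
    K_fused[a][1] += K_ver[a][0]
out_fused = correlate2d_valid(I, K_fused)

print(branch_sum == out_fused)  # True
```

"Compatible sizes" here means the smaller kernels can be placed on the corresponding positions of the larger one, so that each kernel sees the same input pixels for any given output location.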
2.3 Code
```python
import torch.nn as nn


class CropLayer(nn.Module):
    # E.g., (-1, 0) means this layer crops the first and last rows of the
    # feature map, and (0, -1) crops the first and last columns.
    def __init__(self, crop_set):
        super(CropLayer, self).__init__()
        self.rows_to_crop = -crop_set[0]
        self.cols_to_crop = -crop_set[1]
        assert self.rows_to_crop >= 0
        assert self.cols_to_crop >= 0

    def forward(self, input):
        # Compute the end indices explicitly: a slice such as 0:-0 is empty,
        # so a crop of 0 must keep the whole dimension.
        row_end = input.size(2) - self.rows_to_crop
        col_end = input.size(3) - self.cols_to_crop
        return input[:, :, self.rows_to_crop:row_end, self.cols_to_crop:col_end]


class ACBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1,
                 groups=1, padding_mode='zeros', deploy=False):
        super(ACBlock, self).__init__()
        self.deploy = deploy
        if deploy:
            # Inference: a single square convolution that holds the fused weights and bias.
            self.fused_conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=(kernel_size, kernel_size), stride=stride,
                                        padding=padding, dilation=dilation, groups=groups, bias=True,
                                        padding_mode=padding_mode)
        else:
            # Training: square branch.
            self.square_conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                         kernel_size=(kernel_size, kernel_size), stride=stride,
                                         padding=padding, dilation=dilation, groups=groups, bias=False,
                                         padding_mode=padding_mode)
            self.square_bn = nn.BatchNorm2d(num_features=out_channels)

            # Pad (offset >= 0) or crop (offset < 0) the input of the 3x1 and 1x3
            # branches so that their outputs align with the square branch.
            center_offset_from_origin_border = padding - kernel_size // 2
            ver_pad_or_crop = (center_offset_from_origin_border + 1, center_offset_from_origin_border)
            hor_pad_or_crop = (center_offset_from_origin_border, center_offset_from_origin_border + 1)
            if center_offset_from_origin_border >= 0:
                self.ver_conv_crop_layer = nn.Identity()
                ver_conv_padding = ver_pad_or_crop
                self.hor_conv_crop_layer = nn.Identity()
                hor_conv_padding = hor_pad_or_crop
            else:
                self.ver_conv_crop_layer = CropLayer(crop_set=ver_pad_or_crop)
                ver_conv_padding = (0, 0)
                self.hor_conv_crop_layer = CropLayer(crop_set=hor_pad_or_crop)
                hor_conv_padding = (0, 0)

            # Vertical (3x1) and horizontal (1x3) branches, each with its own BN.
            self.ver_conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=(3, 1),
                                      stride=stride,
                                      padding=ver_conv_padding, dilation=dilation, groups=groups, bias=False,
                                      padding_mode=padding_mode)
            self.hor_conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=(1, 3),
                                      stride=stride,
                                      padding=hor_conv_padding, dilation=dilation, groups=groups, bias=False,
                                      padding_mode=padding_mode)
            self.ver_bn = nn.BatchNorm2d(num_features=out_channels)
            self.hor_bn = nn.BatchNorm2d(num_features=out_channels)

    def forward(self, input):
        if self.deploy:
            return self.fused_conv(input)
        else:
            square_outputs = self.square_conv(input)
            square_outputs = self.square_bn(square_outputs)
            vertical_outputs = self.ver_conv_crop_layer(input)
            vertical_outputs = self.ver_conv(vertical_outputs)
            vertical_outputs = self.ver_bn(vertical_outputs)
            horizontal_outputs = self.hor_conv_crop_layer(input)
            horizontal_outputs = self.hor_conv(horizontal_outputs)
            horizontal_outputs = self.hor_bn(horizontal_outputs)
            # Sum the three branch outputs element-wise.
            return square_outputs + vertical_outputs + horizontal_outputs
```
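The `deploy=True` path above expects a single convolution that already holds fused weights, but the listing does not show how those weights are obtained. The following is a minimal sketch of one way to do it, combining BN folding with the additivity property from section 2.2. It assumes $3\times 3$ kernels, stride 1, inference-mode BN and PyTorch's default `eps`. The helpers `fuse_bn` and `add_to_centre` and the random "trained" parameters are hypothetical, not the author's conversion script.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
EPS = 1e-5  # assumed BN epsilon (the nn.BatchNorm2d default)


def fuse_bn(kernel, mean, var, gamma, beta):
    """Fold inference-mode BN statistics into a conv kernel and a bias."""
    scale = gamma / torch.sqrt(var + EPS)
    return kernel * scale.reshape(-1, 1, 1, 1), beta - mean * scale


def add_to_centre(square, asym):
    """Add a kernel onto the centre rows/columns of a square kernel."""
    h, w = asym.shape[2:]
    top = (square.shape[2] - h) // 2
    left = (square.shape[3] - w) // 2
    fused = square.clone()
    fused[:, :, top:top + h, left:left + w] += asym
    return fused


c_in, c_out = 4, 6
x = torch.randn(2, c_in, 8, 8)

# Stand-ins for trained branch kernels, each with the padding that keeps 8x8 outputs.
branches = [
    (torch.randn(c_out, c_in, 3, 3), (1, 1)),  # square branch
    (torch.randn(c_out, c_in, 3, 1), (1, 0)),  # vertical branch
    (torch.randn(c_out, c_in, 1, 3), (0, 1)),  # horizontal branch
]

reference = torch.zeros(2, c_out, 8, 8)
fused_kernel = torch.zeros(c_out, c_in, 3, 3)
fused_bias = torch.zeros(c_out)
for kernel, padding in branches:
    # Stand-ins for each branch's BN running statistics and affine parameters.
    mean, var = torch.randn(c_out), torch.rand(c_out) + 0.5
    gamma, beta = torch.randn(c_out), torch.randn(c_out)
    # Training-time structure: conv followed by BN (evaluated in inference mode).
    y = F.conv2d(x, kernel, padding=padding)
    reference = reference + F.batch_norm(y, mean, var, gamma, beta, training=False, eps=EPS)
    # Deploy-time structure: accumulate the BN-folded kernel and bias.
    k, b = fuse_bn(kernel, mean, var, gamma, beta)
    fused_kernel = add_to_centre(fused_kernel, k)
    fused_bias = fused_bias + b

fused_out = F.conv2d(x, fused_kernel, fused_bias, padding=1)
max_err = (reference - fused_out).abs().max().item()
print(max_err)  # small floating-point difference
```

With real models, the fused kernel and bias would be copied into `fused_conv.weight` and `fused_conv.bias` of an `ACBlock` built with `deploy=True`, so inference costs the same as the original square convolution.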
3 Results
Ablation study: