CBAM
Abstract
Contribution
1. Proposed CBAM, a module that can be widely applied to improve the representation power of CNNs;
2. Validated the effectiveness of CBAM through extensive ablation studies;
3. Showed that adding CBAM substantially improves the performance of various networks on multiple benchmark datasets (ImageNet-1K, MS COCO, and VOC 2007).
Convolutional Block Attention Module
Intermediate feature map: $F \in R^{C\times H\times W}$
1D channel attention map: $M_c \in R^{C\times 1\times 1}$
2D spatial attention map: $M_s \in R^{1\times H\times W}$
Overall attention process:

$F' = M_c(F) \otimes F,$

$F'' = M_s(F') \otimes F',$

where $\otimes$ denotes element-wise multiplication. During the multiplication, the attention values are broadcast (copied) accordingly: channel attention values are broadcast along the spatial dimensions, and vice versa, as shown in Fig. 1:
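To make the broadcasting concrete, here is a minimal NumPy sketch. The shapes and attention values are made up for the demo; in CBAM the two maps would come from the channel and spatial modules described below.

```python
import numpy as np

C, H, W = 4, 5, 5
F = np.random.rand(C, H, W)    # intermediate feature map F ∈ R^{C×H×W}
M_c = np.random.rand(C, 1, 1)  # channel attention map M_c ∈ R^{C×1×1}
M_s = np.random.rand(1, H, W)  # spatial attention map M_s ∈ R^{1×H×W}

# F' = M_c(F) ⊗ F : the C×1×1 map broadcasts along H and W
F1 = M_c * F
# F'' = M_s(F') ⊗ F' : the 1×H×W map broadcasts along C
F2 = M_s * F1

print(F1.shape, F2.shape)  # (4, 5, 5) (4, 5, 5)
```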
Channel attention module
Two different spatial context descriptors:
$F_{avg}^c \in R^{C\times 1\times 1}$: average-pooled features;
$F_{max}^c \in R^{C\times 1\times 1}$: max-pooled features.
Activation size of the shared MLP hidden layer: $R^{C/r\times 1\times 1}$, where $r$ is the reduction ratio.
Channel attention map: $M_c \in R^{C\times 1\times 1}$:
$M_c(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))) = \sigma(W_1(W_0(F_{avg}^c)) + W_1(W_0(F_{max}^c)))$

where $\sigma$ denotes the sigmoid function, $W_0 \in R^{C/r\times C}$, and $W_1 \in R^{C\times C/r}$. $W_0$ and $W_1$ are the MLP weights, shared by both inputs, and a ReLU activation follows $W_0$.
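The channel attention equation can be sketched in NumPy as follows. The weights are random toy stand-ins for the learned $W_0$ and $W_1$, and `channel_attention` is a hypothetical helper written just for this illustration, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(W1(ReLU(W0(F_avg))) + W1(ReLU(W0(F_max)))).
    F: (C, H, W); W0: (C/r, C); W1: (C, C/r); weights shared by both inputs."""
    F_avg = F.mean(axis=(1, 2))          # F^c_avg ∈ R^C (global average pooling)
    F_max = F.max(axis=(1, 2))           # F^c_max ∈ R^C (global max pooling)
    relu = lambda z: np.maximum(z, 0.0)  # ReLU follows W0
    M_c = sigmoid(W1 @ relu(W0 @ F_avg) + W1 @ relu(W0 @ F_max))
    return M_c.reshape(-1, 1, 1)         # (C, 1, 1), ready to broadcast over H, W

C, H, W, r = 8, 4, 4, 2
rng = np.random.default_rng(0)
F = rng.standard_normal((C, H, W))
W0 = rng.standard_normal((C // r, C))
W1 = rng.standard_normal((C, C // r))
print(channel_attention(F, W0, W1).shape)  # (8, 1, 1)
```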
Spatial attention module
Spatial attention map: $M_s \in R^{1\times H\times W}$
$F_{avg}^s \in R^{1\times H\times W}$: average-pooled features;
$F_{max}^s \in R^{1\times H\times W}$: max-pooled features.
$M_s(F) = \sigma(f^{7\times 7}([AvgPool(F); MaxPool(F)])) = \sigma(f^{7\times 7}([F_{avg}^s; F_{max}^s]))$

where $\sigma$ denotes the sigmoid function and $f^{7\times 7}$ denotes a convolution with a 7×7 kernel.
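A minimal NumPy sketch of this equation, assuming a toy random 7×7 kernel and a naive loop in place of a real convolution layer (`spatial_attention` is a hypothetical helper for illustration only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(F, kernel):
    """M_s(F) = sigmoid(f^{7x7}([F^s_avg; F^s_max])).
    F: (C, H, W); kernel: (2, 7, 7) toy convolution weights."""
    F_avg = F.mean(axis=0)              # pool along the channel axis -> (H, W)
    F_max = F.max(axis=0)
    stacked = np.stack([F_avg, F_max])  # concatenation: (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2                        # 'same' padding for a k×k kernel
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = F.shape[1], F.shape[2]
    out = np.zeros((H, W))
    for i in range(H):                  # naive same-padded 2-D convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]           # (1, H, W)

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 10, 10))
kernel = rng.standard_normal((2, 7, 7)) * 0.1
print(spatial_attention(F, kernel).shape)  # (1, 10, 10)
```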
Arrangement of attention modules
A sequential arrangement gives better results than a parallel one. For the sequential arrangement, experiments show that channel-first is slightly better than spatial-first.
Experiment
Schematic of integrating CBAM into the ResBlock module of ResNet:
Ablation studies
- In the channel attention module, using both max pooling and average pooling gives the best results.
- In the spatial attention module, using both max pooling and average pooling outperforms a single 1×1 convolution, and a 7×7 convolution kernel outperforms a 3×3 kernel.
- For combining channel and spatial attention: a sequential, channel-first arrangement works best.
Image Classification on ImageNet-1K
Integrating CBAM into every network in the ResNet family lowers the final classification error, demonstrating CBAM's generality and great potential.
Network Visualization with Grad-CAM
Visualizing the different networks with Grad-CAM shows that, after introducing CBAM, the features cover more parts of the target object and the final classification probability is higher, indicating that the attention mechanism does teach the network to focus on the important information.
MS COCO Object Detection、VOC 2007 Object Detection
In object detection, introducing CBAM likewise yields a very clear performance improvement.
Keras implementation of the model:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dense, Reshape,
                                     BatchNormalization, Activation,
                                     GlobalAveragePooling2D, GlobalMaxPool2D,
                                     Concatenate)


# Bottleneck convolution block for ResNet50/101/152, with CBAM inserted
# after the last BN and before the residual addition
def conv_block(inputs, filter_num, reduction_ratio, stride=1, name=None):
    x = inputs
    x = Conv2D(filter_num[0], (1, 1), strides=stride, padding='same', name=name + '_conv1')(x)
    x = BatchNormalization(axis=3, name=name + '_bn1')(x)
    x = Activation('relu', name=name + '_relu1')(x)
    x = Conv2D(filter_num[1], (3, 3), strides=1, padding='same', name=name + '_conv2')(x)
    x = BatchNormalization(axis=3, name=name + '_bn2')(x)
    x = Activation('relu', name=name + '_relu2')(x)
    x = Conv2D(filter_num[2], (1, 1), strides=1, padding='same', name=name + '_conv3')(x)
    x = BatchNormalization(axis=3, name=name + '_bn3')(x)

    # Channel attention: global average- and max-pooled descriptors
    avgpool = GlobalAveragePooling2D(name=name + '_channel_avgpool')(x)
    maxpool = GlobalMaxPool2D(name=name + '_channel_maxpool')(x)
    # Shared MLP: ReLU follows W0; W1 has no activation, since the
    # sigmoid is applied after the two branches are summed
    Dense_layer1 = Dense(filter_num[2] // reduction_ratio, activation='relu', name=name + '_channel_fc1')
    Dense_layer2 = Dense(filter_num[2], name=name + '_channel_fc2')
    avg_out = Dense_layer2(Dense_layer1(avgpool))
    max_out = Dense_layer2(Dense_layer1(maxpool))
    channel = layers.add([avg_out, max_out])
    channel = Activation('sigmoid', name=name + '_channel_sigmoid')(channel)
    channel = Reshape((1, 1, filter_num[2]), name=name + '_channel_reshape')(channel)
    channel_out = tf.multiply(x, channel)

    # Spatial attention: average- and max-pool along the channel axis,
    # concatenate, then a 7x7 convolution followed by a sigmoid
    avgpool = tf.reduce_mean(channel_out, axis=3, keepdims=True)
    maxpool = tf.reduce_max(channel_out, axis=3, keepdims=True)
    spatial = Concatenate(axis=3)([avgpool, maxpool])
    spatial = Conv2D(1, (7, 7), strides=1, padding='same', name=name + '_spatial_conv2d')(spatial)
    spatial_out = Activation('sigmoid', name=name + '_spatial_sigmoid')(spatial)
    CBAM_out = tf.multiply(channel_out, spatial_out)

    # Residual connection (1x1 projection shortcut)
    r = Conv2D(filter_num[2], (1, 1), strides=stride, padding='same', name=name + '_residual')(inputs)
    x = layers.add([CBAM_out, r])
    x = Activation('relu', name=name + '_relu3')(x)
    return x


def build_block(x, filter_num, blocks, reduction_ratio=16, stride=1, name=None):
    x = conv_block(x, filter_num, reduction_ratio, stride, name=name)
    for i in range(1, blocks):
        x = conv_block(x, filter_num, reduction_ratio, stride=1, name=name + '_block' + str(i))
    return x


# Build CBAM-ResNet50/101/152
def CBAM_ResNet(Netname, nb_classes):
    ResNet_Config = {'ResNet50': [3, 4, 6, 3],
                     'ResNet101': [3, 4, 23, 3],
                     'ResNet152': [3, 8, 36, 3]}
    layers_dims = ResNet_Config[Netname]
    filter_block1 = [64, 64, 256]
    filter_block2 = [128, 128, 512]
    filter_block3 = [256, 256, 1024]
    filter_block4 = [512, 512, 2048]
    # Reduction ratios for the four stages
    reduction = [16, 16, 16, 16]

    img_input = Input(shape=(224, 224, 3))
    # stem block
    x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', name='stem_conv')(img_input)
    x = BatchNormalization(axis=3, name='stem_bn')(x)
    x = Activation('relu', name='stem_relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='stem_pool')(x)
    # convolution stages
    x = build_block(x, filter_block1, layers_dims[0], reduction[0], name='conv1')
    x = build_block(x, filter_block2, layers_dims[1], reduction[1], stride=2, name='conv2')
    x = build_block(x, filter_block3, layers_dims[2], reduction[2], stride=2, name='conv3')
    x = build_block(x, filter_block4, layers_dims[3], reduction[3], stride=2, name='conv4')
    # classification head
    x = GlobalAveragePooling2D(name='top_layer_pool')(x)
    x = Dense(nb_classes, activation='softmax', name='fc')(x)

    model = models.Model(img_input, x, name=Netname)
    return model


if __name__ == '__main__':
    model = CBAM_ResNet('ResNet50', 1000)
    model.summary()
Reference: https://blog.csdn.net/Forrest97/article/details/106708658