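Theoretical derivation
Overall model framework
Experiments show that arranging the two modules sequentially works better than arranging them in parallel, and that placing channel attention first works better than placing spatial attention first.
The final design therefore chains one channel attention module and one spatial attention module in sequence.
The expressions are as follows: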
$$F' = M_{c}(F) \otimes F$$
$$F'' = M_{s}(F') \otimes F'$$
where
$F \in \mathbb{R}^{C \times H \times W}$ is the input feature map of the module;
$M_{c} \in \mathbb{R}^{C \times 1 \times 1}$ is the 1D channel attention map;
$M_{s} \in \mathbb{R}^{1 \times H \times W}$ is the 2D spatial attention map;
$\otimes$ denotes element-wise multiplication, with each attention map broadcast along its singleton dimensions.
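The two-step composition above can be sketched with NumPy broadcasting. This is only an illustration under stated assumptions: NumPy stands in for TensorFlow, the layout is channels-first to match the formulas, and `toy_channel_attn` and `toy_spatial_attn` are hypothetical placeholders that return constant maps rather than learned attention.

```python
import numpy as np

def apply_cbam(feature, channel_attn, spatial_attn):
    """Compute F' = M_c(F) * F, then F'' = M_s(F') * F'."""
    # (C, 1, 1) * (C, H, W): one weight per channel, broadcast over H and W
    refined_channel = channel_attn(feature) * feature
    # (1, H, W) * (C, H, W): one weight per position, broadcast over C
    return spatial_attn(refined_channel) * refined_channel

# Hypothetical stand-ins for the learned attention maps
def toy_channel_attn(f):
    return np.full((f.shape[0], 1, 1), 0.5)

def toy_spatial_attn(f):
    return np.full((1, f.shape[1], f.shape[2]), 0.5)

F = np.ones((4, 3, 3))                      # C=4, H=3, W=3
F_refined = apply_cbam(F, toy_channel_attn, toy_spatial_attn)
print(F_refined.shape)
```

The output keeps the input shape, so the module can be dropped between existing layers without changing downstream dimensions.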
Channel attention module
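1. Apply global max pooling and global average pooling in parallel;
2. Pass both pooled descriptors through a shared-weight two-layer fully connected network with a bottleneck;
3. Add the two outputs element-wise and apply a sigmoid activation to obtain the output.
The corresponding expression for $M_{c}(F)$ is: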
$$\begin{aligned} M_{c}(F) &= \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) \\ &= \sigma(W_{1}(W_{0}(F_{avg}^{c})) + W_{1}(W_{0}(F_{max}^{c}))) \end{aligned}$$
where
$\sigma$ denotes the sigmoid activation;
$F_{avg}^{c}$ and $F_{max}^{c}$ are the channel descriptors produced by global average pooling and global max pooling, respectively;
$W_{0} \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_{1} \in \mathbb{R}^{C \times \frac{C}{r}}$ are the fully connected weights shared by both branches, where $r$ is the reduction ratio.
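As a minimal sketch of the expression for $M_{c}(F)$, the snippet below computes channel attention with NumPy. The assumptions are mine: channels-first layout, no bias terms (the formula has none), a ReLU after $W_{0}$, and random weights in place of trained ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature, w0, w1):
    """feature: (C, H, W); w0: (C//r, C); w1: (C, C//r). Returns M_c of shape (C, 1, 1)."""
    avg_desc = feature.mean(axis=(1, 2))    # F_avg^c, shape (C,)
    max_desc = feature.max(axis=(1, 2))     # F_max^c, shape (C,)

    def shared_mlp(v):
        hidden = np.maximum(w0 @ v, 0.0)    # bottleneck layer W0 followed by ReLU
        return w1 @ hidden                  # W1 restores C channels, no activation

    # The same w0 and w1 are applied to both descriptors before the sum
    attn = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))
    return attn.reshape(-1, 1, 1)

rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
feature = rng.standard_normal((C, H, W))
w0 = rng.standard_normal((C // r, C)) * 0.1   # random placeholder weights
w1 = rng.standard_normal((C, C // r)) * 0.1
m_c = channel_attention(feature, w0, w1)
refined = m_c * feature                        # F' = M_c(F) * F
print(m_c.shape, refined.shape)
```

Note that the sigmoid is applied once, after the two MLP outputs are summed, which is why the second layer carries no activation of its own.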
Spatial attention module
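- Apply max pooling and average pooling to the input feature map along the channel dimension, then concatenate the two results;
- Apply a convolution with a 7x7 kernel;
- Apply a sigmoid activation to obtain the output.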
$$\begin{aligned} M_{s}(F) &= \sigma(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) \\ &= \sigma(f^{7 \times 7}([F_{avg}^{s}; F_{max}^{s}])) \end{aligned}$$
where
$\sigma$ denotes the sigmoid activation;
$F_{avg}^{s} \in \mathbb{R}^{1 \times H \times W}$ and $F_{max}^{s} \in \mathbb{R}^{1 \times H \times W}$ are the results of average pooling and max pooling along the channel dimension, respectively;
$f^{7 \times 7}$ denotes a convolution with a 7x7 kernel.
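The spatial branch can be sketched the same way. This is an illustrative NumPy version, not the post's TensorFlow code: it assumes channels-first layout, zero padding so the map keeps its H x W size, stride 1, and a random 7x7 kernel. `conv2d_same` is a hypothetical helper written with plain loops for clarity rather than speed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, kernel, bias=0.0):
    """x: (C_in, H, W); kernel: (C_in, k, k). Zero-padded, stride 1, one output channel."""
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))   # zero padding on H and W only
    height, width = x.shape[1:]
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel) + bias
    return out

def spatial_attention(feature, kernel):
    """feature: (C, H, W). Returns M_s of shape (1, H, W)."""
    avg_map = feature.mean(axis=0, keepdims=True)           # F_avg^s, (1, H, W)
    max_map = feature.max(axis=0, keepdims=True)            # F_max^s, (1, H, W)
    stacked = np.concatenate([avg_map, max_map], axis=0)    # (2, H, W)
    return sigmoid(conv2d_same(stacked, kernel))[np.newaxis]

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 6, 6))
kernel = rng.standard_normal((2, 7, 7)) * 0.05              # random placeholder weights
m_s = spatial_attention(feature, kernel)
refined = m_s * feature                                      # M_s(F') * F' in the full module
print(m_s.shape, refined.shape)
```

Because pooling collapses the channel axis to two maps, the 7x7 convolution always has two input channels, whatever the width of the layer it is attached to.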
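Comparison with SENet
Compared with SENet, the change CBAM makes to channel attention is that the global pooling step uses both max pooling and average pooling.
From the original paper: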
we show that those are suboptimal features in order to infer fine channel attention, and we suggest to use max-pooled features as well;
max-pooling gathers another important clue about distinctive object features to infer finer channel-wise attention;
max-pooled features are as meaningful as average-pooled features, comparing the accuracy improvement from the baseline;
channel pooling produces better accuracy, indicating that explicitly modeled pooling leads to finer attention inference rather than learnable weighted channel pooling
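A toy example may help show why the max-pooled descriptor adds information. The two channels below are constructed by hand (they are not from the paper): one has a uniformly weak response, the other a single strong peak, and both have the same mean.

```python
import numpy as np

flat = np.full((4, 4), 0.25)        # weak response everywhere, mean 0.25
peaked = np.zeros((4, 4))
peaked[1, 2] = 4.0                  # one strong activation, mean 4/16 = 0.25

feature = np.stack([flat, peaked])  # (C=2, H=4, W=4)
avg_desc = feature.mean(axis=(1, 2))
max_desc = feature.max(axis=(1, 2))

print("avg descriptor:", avg_desc)  # the two channels are indistinguishable
print("max descriptor:", max_desc)  # the peaked channel stands out
```

With average pooling alone, as in SENet, the shared MLP receives identical inputs for the two channels; the max-pooled descriptor is what separates them.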
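Code reproduction
The implementation builds on the previously written TensorFlow 2.0 Keras ResNet 50/101/152 code. For how the model is constructed, see
"Tensorflow 2.0 keras.models.Sequential() Model(): several ways to create a network, and the shared-weights issue"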
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dense, Reshape,
                                     BatchNormalization, Activation,
                                     GlobalAveragePooling2D, GlobalMaxPool2D, Concatenate)

# Bottleneck block for ResNet50/101/152 with a CBAM module on the residual branch
def conv_block(inputs, filter_num, reduction_ratio, stride=1, name=None):
    x = inputs
    x = Conv2D(filter_num[0], (1,1), strides=stride, padding='same', name=name+'_conv1')(x)
    x = BatchNormalization(axis=3, name=name+'_bn1')(x)
    x = Activation('relu', name=name+'_relu1')(x)
    x = Conv2D(filter_num[1], (3,3), strides=1, padding='same', name=name+'_conv2')(x)
    x = BatchNormalization(axis=3, name=name+'_bn2')(x)
    x = Activation('relu', name=name+'_relu2')(x)
    x = Conv2D(filter_num[2], (1,1), strides=1, padding='same', name=name+'_conv3')(x)
    x = BatchNormalization(axis=3, name=name+'_bn3')(x)

    # Channel attention: global average and max pooling over H x W
    avgpool = GlobalAveragePooling2D(name=name+'_channel_avgpool')(x)
    maxpool = GlobalMaxPool2D(name=name+'_channel_maxpool')(x)
    # Shared MLP: the same two Dense layers process both descriptors
    Dense_layer1 = Dense(filter_num[2]//reduction_ratio, activation='relu', name=name+'_channel_fc1')
    # No activation on the second layer: the sigmoid is applied after the sum
    Dense_layer2 = Dense(filter_num[2], name=name+'_channel_fc2')
    avg_out = Dense_layer2(Dense_layer1(avgpool))
    max_out = Dense_layer2(Dense_layer1(maxpool))
    channel = layers.add([avg_out, max_out])
    channel = Activation('sigmoid', name=name+'_channel_sigmoid')(channel)
    channel = Reshape((1,1,filter_num[2]), name=name+'_channel_reshape')(channel)
    # (N,H,W,C) * (N,1,1,C), broadcast over H and W
    channel_out = layers.Multiply(name=name+'_channel_mul')([x, channel])

    # Spatial attention: average and max pooling along the channel axis
    avgpool = layers.Lambda(lambda t: tf.reduce_mean(t, axis=3, keepdims=True),
                            name=name+'_spatial_avgpool')(channel_out)
    maxpool = layers.Lambda(lambda t: tf.reduce_max(t, axis=3, keepdims=True),
                            name=name+'_spatial_maxpool')(channel_out)
    spatial = Concatenate(axis=3)([avgpool, maxpool])
    spatial = Conv2D(1, (7,7), strides=1, padding='same', name=name+'_spatial_conv2d')(spatial)
    spatial_out = Activation('sigmoid', name=name+'_spatial_sigmoid')(spatial)
    # (N,H,W,C) * (N,H,W,1), broadcast over C
    CBAM_out = layers.Multiply(name=name+'_spatial_mul')([channel_out, spatial_out])

    # Residual connection: 1x1 projection so shapes match the refined features
    r = Conv2D(filter_num[2], (1,1), strides=stride, padding='same', name=name+'_residual')(inputs)
    x = layers.add([CBAM_out, r])
    x = Activation('relu', name=name+'_relu3')(x)
    return x

def build_block(x, filter_num, blocks, reduction_ratio=16, stride=1, name=None):
    # Only the first block of a stage uses the given stride; the rest use stride 1
    x = conv_block(x, filter_num, reduction_ratio, stride, name=name)
    for i in range(1, blocks):
        x = conv_block(x, filter_num, reduction_ratio, stride=1, name=name+'_block'+str(i))
    return x

# Build ResNet50/101/152 with CBAM in every bottleneck block
def CBAM_ResNet(Netname, nb_classes):
    ResNet_Config = {'ResNet50': [3,4,6,3],
                     'ResNet101': [3,4,23,3],
                     'ResNet152': [3,8,36,3]}
    layers_dims = ResNet_Config[Netname]

    filter_block1 = [64, 64, 256]
    filter_block2 = [128, 128, 512]
    filter_block3 = [256, 256, 1024]
    filter_block4 = [512, 512, 2048]

    # Reduction ratio of the channel-attention MLP in each of the four stages
    cbam_reduction = [16, 16, 16, 16]

    img_input = Input(shape=(224,224,3))
    # stem block
    x = Conv2D(64, (7,7), strides=(2,2), padding='same', name='stem_conv')(img_input)
    x = BatchNormalization(axis=3, name='stem_bn')(x)
    x = Activation('relu', name='stem_relu')(x)
    x = MaxPooling2D((3,3), strides=(2,2), padding='same', name='stem_pool')(x)
    # convolution blocks
    x = build_block(x, filter_block1, layers_dims[0], cbam_reduction[0], name='conv1')
    x = build_block(x, filter_block2, layers_dims[1], cbam_reduction[1], stride=2, name='conv2')
    x = build_block(x, filter_block3, layers_dims[2], cbam_reduction[2], stride=2, name='conv3')
    x = build_block(x, filter_block4, layers_dims[3], cbam_reduction[3], stride=2, name='conv4')
    # top layer
    x = GlobalAveragePooling2D(name='top_layer_pool')(x)
    x = Dense(nb_classes, activation='softmax', name='fc')(x)

    model = models.Model(img_input, x, name=Netname)
    return model


if __name__ == '__main__':
    model = CBAM_ResNet('ResNet50', 1000)
    model.summary()
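To get a feel for how much the attention module adds to the network above, the following pure-Python sketch counts the extra trainable parameters per block. It assumes the layer settings used in the code: two `Dense` layers and one 7x7 `Conv2D` with two input channels and one output channel, all with Keras's default bias terms. `cbam_extra_params` is a hypothetical helper name, not part of the original code.

```python
def cbam_extra_params(channels, reduction_ratio=16, kernel_size=7):
    """Trainable parameters added by one CBAM module on a feature map with `channels` channels."""
    hidden = channels // reduction_ratio
    fc1 = channels * hidden + hidden          # bottleneck Dense: weights + biases
    fc2 = hidden * channels + channels        # expanding Dense: weights + biases
    spatial_conv = kernel_size * kernel_size * 2 * 1 + 1   # 2 input maps, 1 output map, 1 bias
    return fc1 + fc2 + spatial_conv

# ResNet50 configuration from the code above: blocks per stage and output channels per stage
blocks_per_stage = [3, 4, 6, 3]
channels_per_stage = [256, 512, 1024, 2048]

for n_blocks, channels in zip(blocks_per_stage, channels_per_stage):
    print(channels, n_blocks, cbam_extra_params(channels))

total = sum(n * cbam_extra_params(c) for n, c in zip(blocks_per_stage, channels_per_stage))
print("extra parameters for ResNet50:", total)
```

The channel MLP dominates the count, and its size grows with the square of the channel width divided by the reduction ratio; the spatial convolution costs the same small amount in every block.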