CBAM
Abstract
Contribution
1. Proposed CBAM, a module that can be widely applied to improve the representation power of CNNs;
2. Validated the effectiveness of CBAM through extensive ablation studies;
3. Showed that adding CBAM substantially improves the performance of various networks on multiple benchmark datasets (ImageNet-1K, MS COCO, and VOC 2007).
Convolutional Block Attention Module
Intermediate feature map: $F \in R^{C\times H\times W}$
1D channel attention map: $M_c \in R^{C\times 1\times 1}$
2D spatial attention map: $M_s \in R^{1\times H\times W}$
Overall attention process:

$F' = M_c(F) \otimes F,$

$F'' = M_s(F') \otimes F',$

where $\otimes$ denotes element-wise multiplication. During the multiplication, the attention values are broadcast (copied) accordingly: channel attention values are broadcast along the spatial dimensions, and vice versa, as shown in Fig. 1:
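To make the broadcasting concrete, here is a minimal NumPy sketch. The shapes and attention values are made up for the demo; in CBAM the two maps would come from the channel and spatial modules described below.

```python
import numpy as np

C, H, W = 4, 5, 5
F = np.random.rand(C, H, W)    # intermediate feature map F ∈ R^{C×H×W}
M_c = np.random.rand(C, 1, 1)  # channel attention map M_c ∈ R^{C×1×1}
M_s = np.random.rand(1, H, W)  # spatial attention map M_s ∈ R^{1×H×W}

# F' = M_c(F) ⊗ F : the C×1×1 map broadcasts along H and W
F1 = M_c * F
# F'' = M_s(F') ⊗ F' : the 1×H×W map broadcasts along C
F2 = M_s * F1

print(F1.shape, F2.shape)  # (4, 5, 5) (4, 5, 5)
```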
Channel attention module
Two different spatial context descriptors:
$F_{avg}^c \in R^{C\times 1\times 1}$: average-pooled features;
$F_{max}^c \in R^{C\times 1\times 1}$: max-pooled features.
Activation size of the shared MLP hidden layer: $R^{C/r\times 1\times 1}$, where $r$ is the reduction ratio.
Channel attention map: $M_c \in R^{C\times 1\times 1}$:
$M_c(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))) = \sigma(W_1(W_0(F_{avg}^c)) + W_1(W_0(F_{max}^c)))$

where $\sigma$ denotes the sigmoid function, $W_0 \in R^{C/r\times C}$, and $W_1 \in R^{C\times C/r}$. $W_0$ and $W_1$ are the MLP weights, shared by both inputs, and a ReLU activation follows $W_0$.
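The channel attention equation can be sketched in NumPy as follows. The weights are random toy stand-ins for the learned $W_0$ and $W_1$, and `channel_attention` is a hypothetical helper written just for this illustration, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(W1(ReLU(W0(F_avg))) + W1(ReLU(W0(F_max)))).
    F: (C, H, W); W0: (C/r, C); W1: (C, C/r); weights shared by both inputs."""
    F_avg = F.mean(axis=(1, 2))          # F^c_avg ∈ R^C (global average pooling)
    F_max = F.max(axis=(1, 2))           # F^c_max ∈ R^C (global max pooling)
    relu = lambda z: np.maximum(z, 0.0)  # ReLU follows W0
    M_c = sigmoid(W1 @ relu(W0 @ F_avg) + W1 @ relu(W0 @ F_max))
    return M_c.reshape(-1, 1, 1)         # (C, 1, 1), ready to broadcast over H, W

C, H, W, r = 8, 4, 4, 2
rng = np.random.default_rng(0)
F = rng.standard_normal((C, H, W))
W0 = rng.standard_normal((C // r, C))
W1 = rng.standard_normal((C, C // r))
print(channel_attention(F, W0, W1).shape)  # (8, 1, 1)
```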
Spatial attention module
Spatial attention map: $M_s \in R^{1\times H\times W}$
$F_{avg}^s \in R^{1\times H\times W}$: average-pooled features;
$F_{max}^s \in R^{1\times H\times W}$: max-pooled features.
$M_s(F) = \sigma(f^{7\times 7}([AvgPool(F); MaxPool(F)])) = \sigma(f^{7\times 7}([F_{avg}^s; F_{max}^s]))$

where $\sigma$ denotes the sigmoid function and $f^{7\times 7}$ denotes a convolution with a 7×7 kernel.
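A minimal NumPy sketch of this equation, assuming a toy random 7×7 kernel and a naive loop in place of a real convolution layer (`spatial_attention` is a hypothetical helper for illustration only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(F, kernel):
    """M_s(F) = sigmoid(f^{7x7}([F^s_avg; F^s_max])).
    F: (C, H, W); kernel: (2, 7, 7) toy convolution weights."""
    F_avg = F.mean(axis=0)              # pool along the channel axis -> (H, W)
    F_max = F.max(axis=0)
    stacked = np.stack([F_avg, F_max])  # concatenation: (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2                        # 'same' padding for a k×k kernel
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = F.shape[1], F.shape[2]
    out = np.zeros((H, W))
    for i in range(H):                  # naive same-padded 2-D convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]           # (1, H, W)

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 10, 10))
kernel = rng.standard_normal((2, 7, 7)) * 0.1
print(spatial_attention(F, kernel).shape)  # (1, 10, 10)
```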
Arrangement of attention modules
A sequential arrangement gives better results than a parallel one. For the sequential arrangement, experiments show that channel-first is slightly better than spatial-first.
Experiment
Schematic of integrating CBAM into the ResBlock module of ResNet:
Ablation studies
- In the channel attention module, using both max pooling and average pooling gives the best results.
- In the spatial attention module, using both max pooling and average pooling outperforms a single 1×1 convolution, and a 7×7 convolution kernel outperforms a 3×3 kernel.
- For combining channel and spatial attention: a sequential, channel-first arrangement works best.
Image Classification on ImageNet-1K
Integrating CBAM into every network in the ResNet family lowers the final classification error, demonstrating CBAM's generality and great potential.
Network Visualization with Grad-CAM
Visualizing the different networks with Grad-CAM shows that, after introducing CBAM, the features cover more parts of the target object and the final classification probability is higher, indicating that the attention mechanism does teach the network to focus on the important information.
MS COCO Object Detection、VOC 2007 Object Detection
In object detection, introducing CBAM likewise yields a very clear performance improvement.
Keras implementation of the model:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dense, Reshape,
                                     BatchNormalization, Activation,
                                     GlobalAveragePooling2D, GlobalMaxPool2D,
                                     Concatenate)


# Bottleneck convolution block for ResNet50/101/152, with CBAM inserted
# after the last BN and before the residual addition
def conv_block(inputs, filter_num, reduction_ratio, stride=1, name=None):
    x = inputs
    x = Conv2D(filter_num[0], (1, 1), strides=stride, padding='same', name=name + '_conv1')(x)
    x = BatchNormalization(axis=3, name=name + '_bn1')(x)
    x = Activation('relu', name=name + '_relu1')(x)
    x = Conv2D(filter_num[1], (3, 3), strides=1, padding='same', name=name + '_conv2')(x)
    x = BatchNormalization(axis=3, name=name + '_bn2')(x)
    x = Activation('relu', name=name + '_relu2')(x)
    x = Conv2D(filter_num[2], (1, 1), strides=1, padding='same', name=name + '_conv3')(x)
    x = BatchNormalization(axis=3, name=name + '_bn3')(x)

    # Channel attention: global average- and max-pooled descriptors
    avgpool = GlobalAveragePooling2D(name=name + '_channel_avgpool')(x)
    maxpool = GlobalMaxPool2D(name=name + '_channel_maxpool')(x)
    # Shared MLP: ReLU follows W0; W1 has no activation, since the
    # sigmoid is applied after the two branches are summed
    Dense_layer1 = Dense(filter_num[2] // reduction_ratio, activation='relu', name=name + '_channel_fc1')
    Dense_layer2 = Dense(filter_num[2], name=name + '_channel_fc2')
    avg_out = Dense_layer2(Dense_layer1(avgpool))
    max_out = Dense_layer2(Dense_layer1(maxpool))
    channel = layers.add([avg_out, max_out])
    channel = Activation('sigmoid', name=name + '_channel_sigmoid')(channel)
    channel = Reshape((1, 1, filter_num[2]), name=name + '_channel_reshape')(channel)
    channel_out = tf.multiply(x, channel)

    # Spatial attention: average- and max-pool along the channel axis,
    # concatenate, then a 7x7 convolution followed by a sigmoid
    avgpool = tf.reduce_mean(channel_out, axis=3, keepdims=True)
    maxpool = tf.reduce_max(channel_out, axis=3, keepdims=True)
    spatial = Concatenate(axis=3)([avgpool, maxpool])
    spatial = Conv2D(1, (7, 7), strides=1, padding='same', name=name + '_spatial_conv2d')(spatial)
    spatial_out = Activation('sigmoid', name=name + '_spatial_sigmoid')(spatial)
    CBAM_out = tf.multiply(channel_out, spatial_out)

    # Residual connection (1x1 projection shortcut)
    r = Conv2D(filter_num[2], (1, 1), strides=stride, padding='same', name=name + '_residual')(inputs)
    x = layers.add([CBAM_out, r])
    x = Activation('relu', name=name + '_relu3')(x)
    return x


def build_block(x, filter_num, blocks, reduction_ratio=16, stride=1, name=None):
    x = conv_block(x, filter_num, reduction_ratio, stride, name=name)
    for i in range(1, blocks):
        x = conv_block(x, filter_num, reduction_ratio, stride=1, name=name + '_block' + str(i))
    return x


# Build CBAM-ResNet50/101/152
def CBAM_ResNet(Netname, nb_classes):
    ResNet_Config = {'ResNet50': [3, 4, 6, 3],
                     'ResNet101': [3, 4, 23, 3],
                     'ResNet152': [3, 8, 36, 3]}
    layers_dims = ResNet_Config[Netname]
    filter_block1 = [64, 64, 256]
    filter_block2 = [128, 128, 512]
    filter_block3 = [256, 256, 1024]
    filter_block4 = [512, 512, 2048]
    # Reduction ratios for the four stages
    reduction = [16, 16, 16, 16]

    img_input = Input(shape=(224, 224, 3))
    # stem block
    x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', name='stem_conv')(img_input)
    x = BatchNormalization(axis=3, name='stem_bn')(x)
    x = Activation('relu', name='stem_relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='stem_pool')(x)
    # convolution stages
    x = build_block(x, filter_block1, layers_dims[0], reduction[0], name='conv1')
    x = build_block(x, filter_block2, layers_dims[1], reduction[1], stride=2, name='conv2')
    x = build_block(x, filter_block3, layers_dims[2], reduction[2], stride=2, name='conv3')
    x = build_block(x, filter_block4, layers_dims[3], reduction[3], stride=2, name='conv4')
    # classification head
    x = GlobalAveragePooling2D(name='top_layer_pool')(x)
    x = Dense(nb_classes, activation='softmax', name='fc')(x)

    model = models.Model(img_input, x, name=Netname)
    return model


if __name__ == '__main__':
    model = CBAM_ResNet('ResNet50', 1000)
    model.summary()
Reference: https://blog.csdn.net/Forrest97/article/details/106708658