[Paper Reproduction] CBAM (2018)

满船清梦压星河HK

Article link: https://blog.csdn.net/qq_38253797/article/details/117292848

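Preface: This paper (2018) proposes two attention modules. One operates on channels (CAM) and the other on spatial positions (SAM). The two modules can also be combined into CBAM. CBAM is lightweight: both modules are simple to implement, and they can be plugged into existing network architectures. YOLOv4 uses a modified version of SAM.
Paper: https://arxiv.org/abs/1807.06521
Code: https://github.com/luuuyi/CBAM.PyTorch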

1. Background

To improve the performance of CNNs, recent research has focused on three important factors of network design: depth, width, and cardinality.

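Depth: VGG, ResNet
Width: GoogLeNet
Cardinality: Xception, ResNeXt. Experiments show that cardinality saves parameters and also gives stronger representational power than depth or width.
In general, a deeper network extracts more abstract features, and a wider network extracts richer features. A larger cardinality lets each convolution kernel play a more distinct role.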

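Beyond these factors, the authors study a different aspect of architecture design: attention. Attention strengthens important information and suppresses unimportant information.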

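Attention should tell the network where to focus, and it should also improve the representation of the regions of interest. The goal is to increase representational power with an attention mechanism that focuses on important features and suppresses unnecessary ones.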

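To emphasize meaningful features along both the channel and the spatial dimensions, the authors apply two modules in sequence: a channel attention module (CAM), then a spatial attention module (SAM). The network thus learns what to attend to along the channel axis and where to attend along the spatial axes.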

Channel attention was already proposed in SENet in 2017 (if interested, see my other post: SENet). Compared with SENet, CAM only adds a parallel max-pooling branch. Section 2.1 explains the reason for this change.

Mainstream attention mechanisms fall into three categories: channel attention, spatial attention, and self-attention. This post covers the first two.

2. Module Structure

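Spatial attention treats the features in every channel equally, so it ignores interactions between channels. Channel attention processes all the information within a channel globally, so it tends to ignore interactions within a channel, that is, across spatial positions.
The authors therefore combine a channel attention module with a spatial attention module. The combination performs better, adds only a few parameters and little computation, and can still be integrated into existing architectures as a plug-and-play module.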

2.1 Channel Attention Module (CAM)

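Channel attention explicitly models the dependencies between channels (feature maps). The network learns how important each feature channel is and assigns each channel a weight, which strengthens important features and suppresses unimportant ones.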

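The paper produces a channel attention map from the inter-channel relationships of features. Each channel of a feature map can be regarded as a feature detector, so channel attention focuses on "what" is meaningful in an input image. To compute channel attention efficiently, the spatial dimensions of the input feature map are squeezed. To aggregate spatial information, both average pooling and max pooling are used.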
[Figure: Channel attention module (CAM)]
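Procedure:
1. The input feature map $F$ ($H \times W \times C$) goes through global max pooling and global average pooling over width and height. This gives two $1 \times 1 \times C$ descriptors, $F^c_{max}$ and $F^c_{avg}$.
2. Both descriptors are fed into a shared two-layer perceptron (MLP). The first layer has $C/r$ neurons ($r$ is the reduction ratio) followed by ReLU, and the second layer has $C$ neurons.
3. The two MLP outputs are added element-wise and passed through a sigmoid, producing the channel attention map $M_c(F)$ ($1 \times 1 \times C$).
4. $M_c(F)$ is multiplied element-wise with the input feature map $F$ ($H \times W \times C$), broadcast over $H$ and $W$. The result is the input feature for the spatial attention module.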

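Formula:
$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max})))$
Here $\sigma$ is the sigmoid function. $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the MLP weights, shared by both inputs, and $W_0$ is followed by a ReLU activation.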
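The sketch below connects the formula to code. It evaluates $M_c(F)$ directly with explicit weight matrices $W_0$ and $W_1$ and checks the result against a Conv-based shared MLP like the one in the implementation in Section 3. It assumes PyTorch and an NCHW tensor layout, and the function name channel_attention_explicit is hypothetical.

```python
import torch
import torch.nn as nn

def channel_attention_explicit(feat, W0, W1):
    """M_c(F) = sigmoid(W1(ReLU(W0 F_avg)) + W1(ReLU(W0 F_max))) for feat of shape [B, C, H, W]."""
    f_avg = feat.mean(dim=(2, 3))                  # F^c_avg: [B, C]
    f_max = feat.amax(dim=(2, 3))                  # F^c_max: [B, C]

    def mlp(v):                                    # shared MLP: C -> C/r -> C
        return torch.relu(v @ W0.T) @ W1.T

    return torch.sigmoid(mlp(f_avg) + mlp(f_max))[:, :, None, None]  # [B, C, 1, 1]

torch.manual_seed(0)
B, C, H, W, r = 2, 32, 8, 8, 16
x = torch.randn(B, C, H, W)

# Conv-based shared MLP, as in the ChannelAttention class
fc = nn.Sequential(nn.Conv2d(C, C // r, 1, bias=False), nn.ReLU(),
                   nn.Conv2d(C // r, C, 1, bias=False))
W0 = fc[0].weight[:, :, 0, 0]                      # [C/r, C]
W1 = fc[2].weight[:, :, 0, 0]                      # [C, C/r]

with torch.no_grad():
    mc = channel_attention_explicit(x, W0, W1)
    ref = torch.sigmoid(fc(nn.AdaptiveAvgPool2d(1)(x)) + fc(nn.AdaptiveMaxPool2d(1)(x)))

print(mc.shape)                                    # torch.Size([2, 32, 1, 1])
print(torch.allclose(mc, ref, atol=1e-5))
```

Both versions compute the same map. The 1×1-convolution form just saves reshaping between the pooled descriptor and the MLP.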

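Note 1: CAM differs from SENet in using MaxPool as well as AvgPool. The authors found experimentally that using AvgPool and MaxPool together improves the network's representational power compared with using either one alone.
Explanation: AvgPool gives feedback to every pixel of the feature map. In backpropagation, MaxPool only sends gradient to the position with the largest response, so it complements AvgPool. Another possible reason is that pooling discards a lot of information. Two parallel pooling branches may lose less information than a single one, which would make the result somewhat better.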
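The gradient argument above can be checked with a small experiment. This sketch assumes PyTorch. It backpropagates through global average pooling and global max pooling on a single 2×2 channel and compares where the gradient lands.

```python
import torch
import torch.nn.functional as F

# one image, one channel, 2x2 feature map
x = torch.tensor([[[[1.0, 3.0],
                    [2.0, 0.5]]]], requires_grad=True)

F.adaptive_avg_pool2d(x, 1).sum().backward()
avg_grad = x.grad.clone()   # every position receives 1/(H*W) = 0.25
x.grad = None

F.adaptive_max_pool2d(x, 1).sum().backward()
max_grad = x.grad.clone()   # only the position of the maximum (3.0) receives 1.0

print(avg_grad.squeeze())
print(max_grad.squeeze())
```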
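The authors' experimental results are shown below:
[Figure: the authors' comparison of pooling choices for channel attention]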

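Note 2: The shared MLP in the middle is usually implemented as Conv+ReLU+Conv with 1×1 convolutions. The first Conv reduces the channel dimension; the reduction ratio r is usually 16, so the channels drop to 1/16 of the input feature map's channels. The second Conv expands the reduced feature back to the number of input channels, again by a factor of 16.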
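The pooled descriptor is 1×1 spatially, so a 1×1 convolution on it is simply a fully connected layer applied to the channel vector. This sketch assumes PyTorch, and the variable names are illustrative. It copies the weights of a Conv-based shared MLP into an nn.Linear-based one and checks that both give the same output. With C = 64 and r = 16 the MLP has 2·C·C/r = 512 weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, r = 64, 16

# Shared MLP built from 1x1 convolutions (as in the implementation in Section 3)
conv_mlp = nn.Sequential(nn.Conv2d(C, C // r, 1, bias=False), nn.ReLU(),
                         nn.Conv2d(C // r, C, 1, bias=False))
# The same MLP built from fully connected layers
lin_mlp = nn.Sequential(nn.Linear(C, C // r, bias=False), nn.ReLU(),
                        nn.Linear(C // r, C, bias=False))

# Conv weights have shape [out, in, 1, 1]; Linear weights have shape [out, in]
with torch.no_grad():
    lin_mlp[0].weight.copy_(conv_mlp[0].weight[:, :, 0, 0])
    lin_mlp[2].weight.copy_(conv_mlp[2].weight[:, :, 0, 0])

pooled = torch.randn(8, C, 1, 1)                # e.g. the output of AdaptiveAvgPool2d(1)
with torch.no_grad():
    out_conv = conv_mlp(pooled).flatten(1)      # [8, C]
    out_lin = lin_mlp(pooled.flatten(1))        # [8, C]

n_weights = sum(p.numel() for p in conv_mlp.parameters())  # 2 * C * C // r = 512
print(torch.allclose(out_conv, out_lin, atol=1e-5), n_weights)
```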

2.2 Spatial Attention Module (SAM)

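Spatial attention aims to enhance the feature representation of key regions. It transforms the spatial information of the original image into another space through a spatial transformation while keeping the key information. It then generates a weight mask for every position and weights the output with it. This enhances the target regions of interest and weakens irrelevant background regions.
The paper builds the spatial attention map from the inter-spatial relationships of features. Unlike channel attention, spatial attention focuses on "where" the informative part is, which complements channel attention.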

[Figure: Spatial attention module (SAM)]

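Procedure:
1. The input of this module is the CAM output $F' = M_c(F) \otimes F$, that is, the input feature map reweighted by the channel attention map ($\otimes$ is element-wise multiplication). It is not $M_c(F)$ itself.
2. Global max pooling and global average pooling are applied along the channel axis, giving two $H \times W \times 1$ maps, $F^s_{max}$ and $F^s_{avg}$.
3. The two maps are concatenated along the channel axis.
4. A 7×7 convolution reduces the result to a single channel ($H \times W \times 1$). In the paper's experiments, 7×7 works better than 3×3.
5. A sigmoid produces the spatial attention map $M_s(F)$.
6. This map is multiplied with the module's input feature to produce the final output.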

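Formula: $M_s(F) = \sigma(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) = \sigma(f^{7 \times 7}([F^s_{avg}; F^s_{max}]))$
Here $\sigma$ is the sigmoid function and $f^{7 \times 7}$ is an ordinary convolution with a 7×7 kernel. $[\cdot;\cdot]$ denotes concatenation along the channel axis, and $F$ stands for the input of SAM.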
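The sketch below is a minimal functional version of the formula. It assumes PyTorch, and spatial_attention_map is a hypothetical helper name. It pools along the channel axis, concatenates the two maps, convolves them down to one channel, and applies the sigmoid. It also compares kernel sizes. With padding = kernel_size // 2, both a 3×3 and a 7×7 kernel preserve H × W. The 7×7 kernel has 2·7·7 = 98 weights, against 18 for 3×3.

```python
import torch
import torch.nn as nn

def spatial_attention_map(feat, conv):
    """M_s = sigmoid(conv([AvgPool_c(feat); MaxPool_c(feat)])): [B, C, H, W] -> [B, 1, H, W]."""
    f_avg = feat.mean(dim=1, keepdim=True)      # F^s_avg: [B, 1, H, W]
    f_max, _ = feat.max(dim=1, keepdim=True)    # F^s_max: [B, 1, H, W]
    return torch.sigmoid(conv(torch.cat([f_avg, f_max], dim=1)))  # concat -> [B, 2, H, W]

feat = torch.randn(2, 64, 14, 14)               # stands in for F' = M_c(F) * F
results = {}
for k in (3, 7):
    conv = nn.Conv2d(2, 1, kernel_size=k, padding=k // 2, bias=False)  # k x k conv: 2 -> 1 channel
    ms = spatial_attention_map(feat, conv)
    results[k] = (tuple(ms.shape), conv.weight.numel())

print(results)  # {3: ((2, 1, 14, 14), 18), 7: ((2, 1, 14, 14), 98)}
```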

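Note 1: In the code, AvgPool(F) is implemented with torch.mean(x, dim=1, keepdim=True). For each pixel, this computes the mean of the values at that position across all channels.
Note 2: MaxPool(F) is implemented with torch.max(x, dim=1, keepdim=True). For each pixel, this computes the maximum across all channels. With a dim argument, torch.max returns a (values, indices) pair, and only the values are kept.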
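A tiny worked example of the two reductions, assuming PyTorch. The input is a 1×3×1×2 tensor, that is, three channels and two pixels. It also shows the (values, indices) pair returned by torch.max, which is why the implementation unpacks it as max_out, _.

```python
import torch

x = torch.tensor([[[[1.0, 4.0]],
                   [[3.0, 2.0]],
                   [[5.0, 0.0]]]])                 # [B=1, C=3, H=1, W=2]
avg = torch.mean(x, dim=1, keepdim=True)          # per-pixel mean over the 3 channels
mx, idx = torch.max(x, dim=1, keepdim=True)       # per-pixel max and the channel it came from

print(avg.shape, avg.flatten().tolist())          # torch.Size([1, 1, 1, 2]) [3.0, 2.0]
print(mx.flatten().tolist(), idx.flatten().tolist())  # [5.0, 4.0] [2, 0]
```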

2.3 Combining the Modules

[Figure: comparison of ways to arrange CAM and SAM]

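As shown in the figure above, the authors' extensive experiments found that a sequential arrangement with CAM first and SAM second works best. This gives the structure below: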
[Figure: CBAM structure, CAM followed by SAM]
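The implementation in Section 3 applies the two modules inline inside each block. Below is a compact sketch of the sequential arrangement as a standalone module. It is a hypothetical CBAM wrapper that assumes PyTorch; the class name is illustrative, and the defaults ratio=16 and kernel_size=7 follow the values used in this post. It computes $F' = M_c(F) \otimes F$ and then $F'' = M_s(F') \otimes F'$, relying on broadcasting for the element-wise products.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Hypothetical standalone CBAM: channel attention first, then spatial attention."""

    def __init__(self, channels, ratio=16, kernel_size=7):
        super().__init__()
        # shared MLP of the channel attention module (1x1 convs: C -> C/r -> C)
        self.mlp = nn.Sequential(nn.Conv2d(channels, channels // ratio, 1, bias=False),
                                 nn.ReLU(),
                                 nn.Conv2d(channels // ratio, channels, 1, bias=False))
        # 2 -> 1 channel conv of the spatial attention module; padding keeps H x W
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # M_c: [B, C, 1, 1], broadcast over H and W
        mc = torch.sigmoid(self.mlp(x.mean(dim=(2, 3), keepdim=True)) +
                           self.mlp(x.amax(dim=(2, 3), keepdim=True)))
        x = mc * x                                           # F' = M_c(F) * F
        # M_s: [B, 1, H, W], broadcast over C
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        ms = torch.sigmoid(self.conv(pooled))
        return ms * x                                        # F'' = M_s(F') * F'

cbam = CBAM(64)
x = torch.randn(2, 64, 32, 32)
y = cbam(x)
print(y.shape)                                     # torch.Size([2, 64, 32, 32])
print(sum(p.numel() for p in cbam.parameters()))   # 2*64*4 + 2*7*7 = 610
```

The output has the same shape as the input, so a module like this can be inserted after a convolutional stage without changing the surrounding layers.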

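Going further, the first way to combine CBAM with ResNet (ResNet1: a CBAM in every block) is shown below:
[Figure: CBAM integrated into a ResNet block]
The second way (ResNet2) places one CBAM before the first block and one after the last block.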

3. PyTorch Implementation

3.1 CBAM + ResNet1

The following implements CBAM + ResNet1 (a CBAM in every block):

import torch
import torch.nn as nn
from torchsummary import summary

# This model adds a CBAM inside every block of ResNet

__all__ = ['resnet18_cbam', 'resnet34_cbam', 'resnet50_cbam', 'resnet101_cbam', 'resnet152_cbam']

def conv1x1(in_channel, out_channel, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False)

def conv3x3(in_channel, out_channel, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride,
                     padding=1, bias=False)

class ChannelAttention(nn.Module):
    def __init__(self, in_channel, ratio=16):
        """
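        : params: in_channel: number of channels of the input feature map
        : params: ratio: reduction/expansion ratio of the shared MLP
        Channel attention processes the information within each channel globally, so it tends to ignore interactions within a channel
        """
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # average pooling: mean of all elements of each channel, [3,5,5] => [3,1,1]
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # max pooling: max of all elements of each channel, [3,5,5] => [3,1,1]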

        # fc = shared MLP
        self.fc = nn.Sequential(nn.Conv2d(in_channel, in_channel // ratio, 1, bias=False),
                                nn.ReLU(),
                                nn.Conv2d(in_channel // ratio, in_channel, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        """Spatial attention treats the features in every channel equally, so it tends to ignore interactions between channels"""
        super(SpatialAttention, self).__init__()

        # padding=kernel_size//2 is needed to keep the spatial size unchanged after the convolution
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # input x = [b, c, 56, 56]
        avg_out = torch.mean(x, dim=1, keepdim=True)    # avg_out = [b, 1, 56, 56]  per-pixel mean across all channels
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # max_out = [b, 1, 56, 56]  per-pixel max across all channels
        x = torch.cat([avg_out, max_out], dim=1)        # x = [b, 2, 56, 56]  concatenation
        x = self.sigmoid(self.conv1(x))                 # x = [b, 1, 56, 56]  the conv fuses the avg and max maps
        return x

class CBAM_BasicBlock(nn.Module):
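    # resnet18 + resnet34 (residual block 1): identity (solid-line) and projection (dashed-line) shortcuts
    expansion = 1  # multiplier on the number of kernels of the main branch's last conv (the 2nd conv): unchanged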

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
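        : params: in_channel: input channels of the first conv
        : params: out_channel: output channels of the first conv
        : params: stride: stride of the first 3x3 conv
        : params: downsample: None for the identity (solid-line) shortcut / not None for the projection (dashed-line) shortcut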
        """
        super(CBAM_BasicBlock, self).__init__()
        self.conv1 = conv3x3(in_channel=in_channel, out_channel=out_channel, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

        # Add CBAM
        self.ca = ChannelAttention(out_channel * self.expansion)
        self.sa = SpatialAttention()

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Add CBAM
        out = self.ca(out) * out
        out = self.sa(out) * out

        out += identity
        out = self.relu(out)
        return out

class CBAM_Bottleneck(nn.Module):
    # resnet50 + resnet101 + resnet152 (residual block 2): identity (solid-line) and projection (dashed-line) shortcuts
    expansion = 4  # multiplier on the number of kernels of the main branch's last conv (the 3rd conv): 4 x out_channel

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel: input channels of the first conv
        : params: out_channel: output channels of the first conv
        : params: stride: stride of the middle (3x3) conv
                  resnet50/101/152: s=1 for all blocks of conv2_x; s=2 for the first block of conv3_x/conv4_x/conv5_x, s=1 for the others
        : params: downsample: None for the identity (solid-line) shortcut / not None for the projection (dashed-line) shortcut
        """
        super(CBAM_Bottleneck, self).__init__()
        # 1x1 conv, usually s=1 p=0 => w, h unchanged; conv output sizes are rounded down by default
        self.conv1 = conv1x1(in_channel=in_channel, out_channel=out_channel, stride=1)
        self.bn1 = nn.BatchNorm2d(out_channel)
        # ----------------------------------------------------------------------------------
        # 3x3 conv: s=2 p=1 => w, h halved (downsampling); s=1 p=1 => w, h unchanged
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel, stride=stride)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # ---------------------------------------------------------------------------------
        self.conv3 = conv1x1(in_channel=out_channel, out_channel=out_channel * self.expansion, stride=1)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        # ----------------------------------------------------------------------------------
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

        # Add CBAM
        self.ca = ChannelAttention(out_channel * self.expansion)
        self.sa = SpatialAttention()

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        # Add CBAM
        out = self.ca(out) * out
        out = self.sa(out) * out

        out += identity
        out = self.relu(out)
        return out

class CBAM_ResNet(nn.Module):

    def __init__(self, block, blocks_num, num_classes=1000):
        """
        : params:  block: BasicBlock/Bottleneck
        : params:  blocks_num: number of residual blocks in each layer
        : params:  num_classes: number of classes in the dataset
        """
        super(CBAM_ResNet, self).__init__()
        self.in_channel = 64  # input channels of the next layer; starts at the 64 output channels of the stem conv

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # pooling output sizes are rounded down by default

        # layer1's projection shortcut (Bottleneck only) changes only the channels, not h and w, so stride=1
        self.layer1 = self._make_layer(block, blocks_num[0], channel=64, stride=1)
        # the projection shortcuts of layers 2/3/4 change the channels and also halve h and w, so stride=2
        self.layer2 = self._make_layer(block, blocks_num[1], channel=128, stride=2)
        self.layer3 = self._make_layer(block, blocks_num[2], channel=256, stride=2)
        self.layer4 = self._make_layer(block, blocks_num[3], channel=512, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling, output_size=(1, 1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # Kaiming initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, block_num, channel, stride=1):
        """
        : params: block: BasicBlock/Bottleneck   BasicBlock for 18/34, Bottleneck for 50/101/152
        : params: block_num: number of residual blocks in the current layer
        : params: channel: number of kernels of the first conv in each convX_x (fixed per layer: 64/128/256/512)
        : params: stride: stride of the 3x3 conv in the first block of each convX_x, equal to the stride of its downsample (shortcut)
                  resnet50/101/152   conv2_x=>s=1  conv3_x/conv4_x/conv5_x=>s=2
        """
        downsample = None

        # in_channel: input channels of the first block in each convX_x
        # channel * block.expansion: output channels of the last conv in each layer
        # for res50/101/152, in_channel != channel * block.expansion always holds for the first block of conv2/3/4/5_x, so that block always gets a downsample (projection shortcut)
        # the first block of conv2_x only changes the channels (s=1); those of conv3_x/conv4_x/conv5_x also halve w/h (s=2 downsampling)
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion)
            )

        layers = []
        # add the first block (with the projection shortcut)
        layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
        # after the first block the number of channels has changed
        self.in_channel = channel * block.expansion

        # for res50/101/152, every block of conv2/3/4/5_x except the first uses the identity (solid-line) shortcut
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))  # channels inside a Bottleneck, e.g. in layer2: 512->128->512
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.maxpool(out)

        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)

        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out

def resnet18_cbam(**kwargs):
    """ResNet-18 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [2, 2, 2, 2], **kwargs)
    return model

def resnet34_cbam(**kwargs):
    """ResNet-34 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [3, 4, 6, 3], **kwargs)
    return model

def resnet50_cbam(**kwargs):
    """ResNet-50 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 6, 3], **kwargs)
    return model

def resnet101_cbam(**kwargs):
    """ResNet-101 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 23, 3], **kwargs)
    return model

def resnet152_cbam(**kwargs):
    """ResNet-152 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 8, 36, 3], **kwargs)
    return model

if __name__ == '__main__':
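    # quick sanity check
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)

    model = resnet50_cbam(num_classes=5).to(device)
    print(model)
    # torchsummary's summary() defaults to device="cuda", so pass the device actually in use
    summary(model, (3, 224, 224), device=device.type)
    # torchsummary reports params: 28,549,733 (Total Size (MB): 397.07), but it counts the shared MLP of every
    # ChannelAttention twice because it runs twice per forward pass; sum(p.numel() for p in model.parameters()) gives 26,034,789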
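The torchsummary figure is larger than the real parameter count, and the cause can be reproduced in a few lines. As far as we know, torchsummary's summary() registers a forward hook on each layer and adds that layer's weights every time the hook fires. A layer that runs twice per forward pass, such as the shared MLP in ChannelAttention, is therefore counted twice. The sketch below compares the true count with a per-call count for C = 256. It assumes PyTorch; SharedMLPAttention and per_call_param_count are hypothetical names, and the counter only mimics that per-call behavior for Conv2d weights.

```python
import torch
import torch.nn as nn

class SharedMLPAttention(nn.Module):
    """Minimal stand-in for ChannelAttention: one MLP applied to two pooled descriptors."""

    def __init__(self, c, ratio=16):
        super().__init__()
        self.fc = nn.Sequential(nn.Conv2d(c, c // ratio, 1, bias=False), nn.ReLU(),
                                nn.Conv2d(c // ratio, c, 1, bias=False))

    def forward(self, x):
        return torch.sigmoid(self.fc(x.mean(dim=(2, 3), keepdim=True)) +
                             self.fc(x.amax(dim=(2, 3), keepdim=True)))

def per_call_param_count(model, x):
    """Adds each Conv2d's weight count every time that Conv2d runs (mimics a per-call hook summary)."""
    total = 0

    def hook(module, inputs, output):
        nonlocal total
        total += module.weight.numel()

    handles = [m.register_forward_hook(hook) for m in model.modules() if isinstance(m, nn.Conv2d)]
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return total

m = SharedMLPAttention(256)
true_count = sum(p.numel() for p in m.parameters())              # 2 * 256 * 16 = 8192
hook_count = per_call_param_count(m, torch.randn(1, 256, 7, 7))  # fc runs twice, so 16384
print(true_count, hook_count)                                     # 8192 16384
```

For the real parameter count of a model with shared submodules, sum(p.numel() for p in model.parameters()) is the safer check.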

3.2 CBAM + ResNet2

The following implements CBAM + ResNet2 (one CBAM before the first block and one after the last block):

import torch
import torch.nn as nn
from torchsummary import summary

# This model adds one CBAM before the first block and one after the last block of ResNet

__all__ = ['resnet18_cbam', 'resnet34_cbam', 'resnet50_cbam', 'resnet101_cbam', 'resnet152_cbam']


def conv1x1(in_channel, out_channel, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False)

def conv3x3(in_channel, out_channel, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False)


class ChannelAttention(nn.Module):
    def __init__(self, in_channel, ratio=16):
        """
        : params: in_channel: number of channels of the input feature map
        : params: ratio: reduction/expansion ratio of the shared MLP
        """
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # fc = shared MLP
        self.fc = nn.Sequential(nn.Conv2d(in_channel, in_channel // ratio, 1, bias=False),
                                nn.ReLU(),
                                nn.Conv2d(in_channel // ratio, in_channel, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        """Spatial attention treats the features in every channel equally, so it tends to ignore interactions between channels"""
        super(SpatialAttention, self).__init__()

        # padding=kernel_size//2 is needed to keep the spatial size unchanged after the convolution
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # input x = [b, c, 56, 56]
        avg_out = torch.mean(x, dim=1, keepdim=True)    # avg_out = [b, 1, 56, 56]  per-pixel mean across all channels
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # max_out = [b, 1, 56, 56]  per-pixel max across all channels
        x = torch.cat([avg_out, max_out], dim=1)        # x = [b, 2, 56, 56]  concatenation
        x = self.sigmoid(self.conv1(x))                 # x = [b, 1, 56, 56]  the conv fuses the avg and max maps
        return x


class CBAM_BasicBlock(nn.Module):
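    # resnet18 + resnet34 (residual block 1): identity (solid-line) and projection (dashed-line) shortcuts
    expansion = 1  # multiplier on the number of kernels of the main branch's last conv (the 2nd conv): unchanged

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel: input channels of the first conv
        : params: out_channel: output channels of the first conv
        : params: stride: stride of the first 3x3 conv
        : params: downsample: None for the identity (solid-line) shortcut / not None for the projection (dashed-line) shortcut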
        """
        super(CBAM_BasicBlock, self).__init__()
        self.conv1 = conv3x3(in_channel=in_channel, out_channel=out_channel, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)
        return out

class CBAM_Bottleneck(nn.Module):
    # resnet50 + resnet101 + resnet152 (residual block 2): identity (solid-line) and projection (dashed-line) shortcuts
    expansion = 4  # multiplier on the number of kernels of the main branch's last conv (the 3rd conv): 4 x out_channel

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel: input channels of the first conv
        : params: out_channel: output channels of the first conv
        : params: stride: stride of the middle (3x3) conv
                  resnet50/101/152: s=1 for all blocks of conv2_x; s=2 for the first block of conv3_x/conv4_x/conv5_x, s=1 for the others
        : params: downsample: None for the identity (solid-line) shortcut / not None for the projection (dashed-line) shortcut
        """
        super(CBAM_Bottleneck, self).__init__()
        # 1x1 conv, usually s=1 p=0 => w, h unchanged; conv output sizes are rounded down by default
        self.conv1 = conv1x1(in_channel=in_channel, out_channel=out_channel, stride=1)
        self.bn1 = nn.BatchNorm2d(out_channel)
        # ----------------------------------------------------------------------------------
        # 3x3 conv: s=2 p=1 => w, h halved (downsampling); s=1 p=1 => w, h unchanged
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel, stride=stride)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # ---------------------------------------------------------------------------------
        self.conv3 = conv1x1(in_channel=out_channel, out_channel=out_channel * self.expansion, stride=1)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        # ----------------------------------------------------------------------------------
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)
        return out


class CBAM_ResNet(nn.Module):

    def __init__(self, block, blocks_num, num_classes=1000):
        """
        : params:  block: BasicBlock/Bottleneck
        : params:  blocks_num: number of residual blocks in each layer
        : params:  num_classes: number of classes in the dataset
        """
        super(CBAM_ResNet, self).__init__()
        self.in_channel = 64  # input channels of the next layer; starts at the 64 output channels of the stem conv

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # pooling output sizes are rounded down by default

        # Add CBAM before the first block
        self.ca1 = ChannelAttention(self.in_channel)
        self.sa1 = SpatialAttention()

        # layer1's projection shortcut (Bottleneck only) changes only the channels, not h and w, so stride=1
        self.layer1 = self._make_layer(block, blocks_num[0], channel=64, stride=1)
        # the projection shortcuts of layers 2/3/4 change the channels and also halve h and w, so stride=2
        self.layer2 = self._make_layer(block, blocks_num[1], channel=128, stride=2)
        self.layer3 = self._make_layer(block, blocks_num[2], channel=256, stride=2)
        self.layer4 = self._make_layer(block, blocks_num[3], channel=512, stride=2)

        # Add CBAM after the last block
        self.ca2 = ChannelAttention(512 * block.expansion)  # the last block outputs [512 * expansion, 7, 7], i.e. [2048, 7, 7] for ResNet-50
        self.sa2 = SpatialAttention()

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling, output_size=(1, 1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # Kaiming initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, block_num, channel, stride=1):
        """
        : params: block: BasicBlock/Bottleneck   BasicBlock for 18/34, Bottleneck for 50/101/152
        : params: block_num: number of residual blocks in the current layer
        : params: channel: number of kernels of the first conv in each convX_x (fixed per layer: 64/128/256/512)
        : params: stride: stride of the 3x3 conv in the first block of each convX_x, equal to the stride of its downsample (shortcut)
                  resnet50/101/152   conv2_x=>s=1  conv3_x/conv4_x/conv5_x=>s=2
        """
        downsample = None

        # in_channel: input channels of the first block in each convX_x
        # channel * block.expansion: output channels of the last conv in each layer
        # for res50/101/152, in_channel != channel * block.expansion always holds for the first block of conv2/3/4/5_x, so that block always gets a downsample (projection shortcut)
        # the first block of conv2_x only changes the channels (s=1); those of conv3_x/conv4_x/conv5_x also halve w/h (s=2 downsampling)
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion)
            )

        layers = []
        # add the first block (with the projection shortcut)
        layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
        # after the first block the number of channels has changed
        self.in_channel = channel * block.expansion

        # for res50/101/152, every block of conv2/3/4/5_x except the first uses the identity (solid-line) shortcut
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))  # channels inside a Bottleneck, e.g. in layer2: 512->128->512
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.maxpool(out)

        # Add CBAM before the first block
        out = self.ca1(out) * out
        out = self.sa1(out) * out

        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)

        # Add CBAM after the last block
        out = self.ca2(out) * out
        out = self.sa2(out) * out

        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out


def resnet18_cbam(**kwargs):
    """ResNet-18 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [2, 2, 2, 2], **kwargs)
    return model


def resnet34_cbam(**kwargs):
    """ResNet-34 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [3, 4, 6, 3], **kwargs)
    return model


def resnet50_cbam(**kwargs):
    """ResNet-50 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 6, 3], **kwargs)
    return model


def resnet101_cbam(**kwargs):
    """ResNet-101 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 23, 3], **kwargs)
    return model


def resnet152_cbam(**kwargs):
    """ResNet-152 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 8, 36, 3], **kwargs)
    return model


if __name__ == '__main__':
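    # quick sanity check
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)

    model = resnet50_cbam(num_classes=5).to(device)
    print(model)
    # torchsummary's summary() defaults to device="cuda", so pass the device actually in use
    summary(model, (3, 224, 224), device=device.type)
    # torchsummary reports params: 24,568,073 (Total Size (MB): 381.02), but it counts the shared MLPs of ca1/ca2
    # twice because each runs twice per forward pass; sum(p.numel() for p in model.parameters()) gives 24,043,273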

4. Experimental Results

[Figure: the author's results for ResNet, cbam_resnet1 and cbam_resnet2]

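The results show that cbam_resnet1 outperforms the plain ResNet, but cbam_resnet2 performs worse. CBAM is sound in principle, but whether it actually helps has to be determined by experiments on the specific dataset.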

References

SE: link.
An expert's write-up: link.

https://zhuanlan.zhihu.com/p/106084464
https://zhuanlan.zhihu.com/p/98958111
https://blog.csdn.net/u013738531/article/details/82731257
https://blog.csdn.net/Roaddd/article/details/114646354
https://www.cnblogs.com/ansang/p/9371764.html