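Preface: This 2018 paper proposes two attention modules, one channel-based (CAM) and one spatial (SAM). The two can also be combined into CBAM (Convolutional Block Attention Module). CBAM is lightweight, both sub-modules are simple to implement, and it can be plugged into existing network architectures. YOLOv4, for example, uses a modified SAM.
Paper: https://arxiv.org/abs/1807.06521
Code: https://github.com/luuuyi/CBAM.PyTorch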
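1. Background
To improve CNN performance, recent research has focused mainly on three factors: depth, width, and cardinality.
Depth: VGG, ResNet
Width: GoogLeNet
Cardinality: Xception, ResNeXt. Empirically, increasing cardinality not only saves parameters but also yields stronger representational power than increasing depth or width.
Generally speaking, the deeper the network, the more abstract the features it extracts; the wider the network, the richer its features; the larger the cardinality, the better each convolution kernel can play its own distinct role.
Beyond these factors, the authors study a different direction in architecture design: attention. Attention is a way to emphasize important information and suppress unimportant information.
Attention should not only tell us where to focus but also improve the representation of interests. The goal is to increase representational power through attention, focusing on important features and suppressing unnecessary ones.
To emphasize meaningful features along both the channel and spatial dimensions, the authors apply the channel attention module (CAM) and the spatial attention module (SAM) in sequence, so that the network learns "what" to attend to along the channel axis and "where" to attend along the spatial axes.
Channel attention was proposed as early as SENet in 2017 (see my other post on SENet if interested). In fact, compared with SENet, CAM only adds a parallel max-pooling branch. Section 2.1 below explains why.
Mainstream attention mechanisms currently fall into three types: channel attention, spatial attention, and self-attention. This post discusses the first two.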
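2. Module Structure
Spatial attention treats the features in every channel identically, so it ignores the interactions between channels. Channel attention pools each channel globally, so it tends to ignore the spatial information within a channel.
The authors therefore combine the channel attention module with the spatial attention module. The combination works better than either alone, adds only a small parameter and compute overhead, and can still be integrated into existing architectures as a plug-and-play module.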
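2.1 Channel Attention Module (CAM)
Channel attention explicitly models the dependencies between channels (feature maps). The network learns the importance of each feature channel and assigns each channel a weight accordingly, strengthening important features and suppressing unimportant ones.
The paper exploits the inter-channel relationships of features to generate a channel attention map. Since each channel of a feature map can be regarded as a feature detector, channel attention focuses on "what" is meaningful in the input image. To compute channel attention efficiently, the spatial dimensions of the input feature map are squeezed. To aggregate the spatial information, both average pooling and max pooling are used.
Procedure: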
The input feature map $F$ ($H \times W \times C$) goes through global max pooling and global average pooling over the width and height dimensions, giving two $1 \times 1 \times C$ descriptors, $F^c_{max}$ and $F^c_{avg}$. Each descriptor is then fed into a shared two-layer perceptron (MLP): the first layer has $C/r$ neurons ($r$ is the reduction ratio) with a ReLU activation, and the second layer has $C$ neurons. The two MLP outputs are summed element-wise and passed through a sigmoid to produce the final channel attention map, $M_c(F)$. Finally, $M_c(F)$ is multiplied element-wise with the input feature map $F$ ($H \times W \times C$), broadcast over the spatial dimensions, to produce the input features for the spatial attention module.
Formula:
$$M_c(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))) = \sigma(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max})))$$
where $\sigma$ is the sigmoid function, and $W_0$ and $W_1$ are the weights of the MLP, shared between the two branches. A ReLU activation follows the first MLP layer.
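The formula for $M_c(F)$ can be sketched functionally as follows. This is a minimal illustration rather than the authors' code: the function name `channel_attention`, the random weights, and the tensor sizes are assumptions made for the example, and the MLP is written as two plain matrix products with $W_0$ of shape $(C/r) \times C$ and $W_1$ of shape $C \times (C/r)$.

```python
import torch
import torch.nn.functional as F


def channel_attention(feat, w0, w1):
    """feat: (B, C, H, W); w0: (C//r, C); w1: (C, C//r). Returns M_c of shape (B, C, 1, 1)."""
    avg = feat.mean(dim=(2, 3))   # F_avg^c, shape (B, C)
    mx = feat.amax(dim=(2, 3))    # F_max^c, shape (B, C)

    def mlp(v):
        # W_1(ReLU(W_0 v)); the same weights serve both branches
        return F.relu(v @ w0.t()) @ w1.t()

    return torch.sigmoid(mlp(avg) + mlp(mx))[:, :, None, None]


torch.manual_seed(0)
B, C, H, W, r = 2, 32, 8, 8, 16           # illustrative sizes
feat = torch.randn(B, C, H, W)
w0 = torch.randn(C // r, C) * 0.1          # hypothetical weights
w1 = torch.randn(C, C // r) * 0.1
m_c = channel_attention(feat, w0, w1)
refined = m_c * feat                       # broadcast over H and W
print(m_c.shape, refined.shape)
```

Because `m_c` has shape `(B, C, 1, 1)`, the multiplication rescales every channel by a single weight between 0 and 1 and leaves the spatial layout untouched.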
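Note 1: CAM differs from SENet in that it uses MaxPool in addition to AvgPool, because the authors found experimentally that the AvgPool + MaxPool combination improves the network's representational power over either pooling alone.
Explanation: AvgPool passes a gradient back to every pixel of the feature map, whereas MaxPool passes a gradient back only to the location with the strongest response, so it can complement AvgPool. Another possible reason is that pooling discards a great deal of information, and running AvgPool and MaxPool in parallel loses less than a single pooling operation, so the result is somewhat better.
[Figure: the authors' experimental results]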
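The gradient argument in the explanation above can be checked on a tiny example. The sketch below uses a single hypothetical 2x2 channel with distinct values and compares the gradients that flow back through a global mean and a global max.

```python
import torch

# One 2x2 "channel"; the values are arbitrary but distinct
x = torch.tensor([[1.0, 2.0], [3.0, 9.0]], requires_grad=True)

x.mean().backward()            # global average pooling
grad_avg = x.grad.clone()
x.grad.zero_()

x.max().backward()             # global max pooling
grad_max = x.grad.clone()

print(grad_avg)  # every position receives 1/4
print(grad_max)  # only the position holding 9 receives a gradient
```

Average pooling spreads the gradient evenly over all four positions, while max pooling routes it entirely to the strongest response.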
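Note 2: The shared MLP in the middle is usually implemented as Conv + ReLU + Conv with 1x1 kernels. The first Conv reduces the channel dimension (the reduction ratio is typically 16, i.e. the width drops to 1/16 of the input channel count), and the second Conv restores it to the input channel count.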
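To see what the reduction ratio buys, the parameter count of the shared MLP can be worked out directly. The helper name `shared_mlp_params` is made up for this sketch, and it assumes two bias-free 1x1 convolutions as in Note 2.

```python
def shared_mlp_params(channels, ratio=16):
    """Weights in the two bias-free 1x1 convs of the shared MLP."""
    hidden = channels // ratio
    return channels * hidden + hidden * channels


for c in (64, 256, 2048):
    reduced = shared_mlp_params(c)   # with the bottleneck, ratio 16
    full = 2 * c * c                 # two C x C layers, no bottleneck
    print(c, reduced, full)
```

With `ratio=16` the MLP needs 16 times fewer weights than two full C x C layers. Note that integer division makes the hidden width 0 when the channel count is smaller than the ratio, so very narrow layers need a smaller ratio.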
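2.2 Spatial Attention Module (SAM)
Spatial attention aims to strengthen the feature representation of key regions. In essence, it transforms the spatial information of the input into another space while keeping the key information, generates a weight mask for every position, and outputs the weighted features. This enhances the target regions of interest and weakens irrelevant background regions.
The paper exploits the spatial relationships between features to generate a spatial attention map. Unlike channel attention, spatial attention focuses on "where" the informative parts are, which complements channel attention.
Procedure: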
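The feature map output by CAM, i.e. the input reweighted by the channel attention map, $F' = M_c(F) \otimes F$, is the input of this module. First, global max pooling and global average pooling along the channel axis give two $H \times W \times 1$ maps, $F^s_{avg}$ and $F^s_{max}$, which are concatenated along the channel axis. A 7×7 convolution (the authors found 7×7 to work better than 3×3) then reduces the result to one channel, i.e. $H \times W \times 1$. A sigmoid produces the spatial attention map, $M_s(F)$. Finally, this map is multiplied with the module's input to obtain the final features.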
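Two small calculations make the convolution step concrete: the padding needed to keep the $H \times W$ size, and the number of weights in the kernel. The helper names below are made up for this sketch; it assumes stride 1, padding of `kernel // 2`, and no bias.

```python
def conv_out_size(size, kernel, padding, stride=1):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - kernel) // stride + 1


def spatial_attention_params(kernel, in_ch=2, out_ch=1):
    """Weights of the bias-free conv that maps [avg; max] to one channel."""
    return in_ch * out_ch * kernel * kernel


for k in (3, 7):
    print(k, conv_out_size(56, k, k // 2), spatial_attention_params(k))
```

For both kernel sizes a 56x56 map stays 56x56, and the 7x7 kernel costs only 98 weights against 18 for 3x3, so the larger receptive field comes almost for free.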
Formula:
$$M_s(F) = \sigma(f^{7 \times 7}([AvgPool(F); MaxPool(F)])) = \sigma(f^{7 \times 7}([F^s_{avg}; F^s_{max}]))$$
where $\sigma$ is the sigmoid function, $f^{7 \times 7}$ is an ordinary convolution with a 7x7 kernel, and $[\,\cdot\,;\,\cdot\,]$ is channel-wise concatenation. Here $F$ denotes this module's input, i.e. the feature map already refined by CAM.
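Note 1: In the actual code, AvgPool(F) is implemented with torch.mean(x, dim=1, keepdim=True), which computes, for each pixel, the mean of the values at that position across all channels.
Note 2: MaxPool(F) is implemented with torch.max(x, dim=1, keepdim=True), which computes, for each pixel, the maximum of the values at that position across all channels. When dim is given, torch.max returns a (values, indices) pair, so only the first element is used.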
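The two calls in the notes can be tried on a tiny tensor whose values are easy to check by hand. The tensor below is made up for illustration: one sample, three channels, and a 1x2 spatial grid.

```python
import torch

# Shape (1, 3, 1, 2): channel values are (1, 3, 5) at pixel 0 and (4, 2, 0) at pixel 1
x = torch.tensor([[[[1.0, 4.0]], [[3.0, 2.0]], [[5.0, 0.0]]]])

avg_out = torch.mean(x, dim=1, keepdim=True)          # (1, 1, 1, 2)
max_out, max_idx = torch.max(x, dim=1, keepdim=True)  # values and channel indices
stacked = torch.cat([avg_out, max_out], dim=1)        # (1, 2, 1, 2), input to the 7x7 conv

print(avg_out.flatten().tolist())   # [3.0, 2.0]
print(max_out.flatten().tolist())   # [5.0, 4.0]
print(max_idx.flatten().tolist())   # [2, 0]
print(stacked.shape)
```

`keepdim=True` keeps the channel axis with size 1, which is what lets the two maps be concatenated into a 2-channel tensor.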
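2.3 Combined Module
As the figures illustrate, the authors found through extensive experiments that the sequential arrangement, CAM first and then SAM, works best.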
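The sequential arrangement can be written as one small module. This is a compact sketch under assumptions, not the reference implementation: the class name `CBAM` and its argument names are chosen for the example, and it folds channel and spatial attention into a single `forward` that computes $F' = M_c(F) \otimes F$ followed by $F'' = M_s(F') \otimes F'$.

```python
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Channel attention followed by spatial attention (illustrative sketch)."""

    def __init__(self, channels, ratio=16, kernel_size=7):
        super().__init__()
        # Shared MLP as two bias-free 1x1 convolutions
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // ratio, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: F' = M_c(F) * F
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = torch.sigmoid(avg + mx) * x
        # Spatial attention: F'' = M_s(F') * F'
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.spatial(pooled)) * x


cbam = CBAM(channels=64)
inp = torch.randn(2, 64, 56, 56)
out = cbam(inp)
n_params = sum(p.numel() for p in cbam.parameters())
print(out.shape, n_params)
```

The output has the same shape as the input, which is what makes the module a drop-in addition after any convolutional block. For 64 channels it adds 610 weights: 512 in the shared MLP and 98 in the 7x7 convolution.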
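Integration 1 with ResNet: use CBAM inside every block (see figure).
Integration 2 places one CBAM before the first block and one after the last block.
3. PyTorch Implementation
3.1 CBAM + ResNet1
The code below implements CBAM + ResNet1 (CBAM inside every block):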
import torch
import torch.nn as nn
from torchsummary import summary

# This model adds CBAM inside every ResNet block
__all__ = ['resnet18_cbam', 'resnet34_cbam', 'resnet50_cbam', 'resnet101_cbam', 'resnet152_cbam']


def conv1x1(in_channel, out_channel, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False)


def conv3x3(in_channel, out_channel, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride,
                     padding=1, bias=False)
class ChannelAttention(nn.Module):
    def __init__(self, in_channel, ratio=16):
        """
        : params: in_channel  number of channels of the input feature map
        : params: ratio       reduction/expansion ratio of the shared MLP
        Channel attention pools each channel globally, so it tends to ignore
        the spatial information within a channel.
        """
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # mean over all elements of each channel, [3,5,5] => [3,1,1]
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # max over all elements of each channel, [3,5,5] => [3,1,1]
        # fc = shared MLP
        self.fc = nn.Sequential(nn.Conv2d(in_channel, in_channel // ratio, 1, bias=False),
                                nn.ReLU(),
                                nn.Conv2d(in_channel // ratio, in_channel, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        """Spatial attention treats every channel identically, so it tends to ignore interactions between channels."""
        super(SpatialAttention, self).__init__()
        # padding=kernel_size//2 keeps the feature map size unchanged after the convolution
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):  # input x = [b, c, 56, 56]
        avg_out = torch.mean(x, dim=1, keepdim=True)  # [b, 1, 56, 56] per-pixel mean across channels
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # [b, 1, 56, 56] per-pixel max across channels
        x = torch.cat([avg_out, max_out], dim=1)  # [b, 2, 56, 56] concat
        x = self.sigmoid(self.conv1(x))  # [b, 1, 56, 56] convolution fuses the avg and max information
        return x
class CBAM_BasicBlock(nn.Module):
    # resnet18 + resnet34 (residual type 1): solid-line (identity) and dashed-line (projection) shortcuts
    expansion = 1  # channel multiplier of the block's last conv relative to out_channel (1 = unchanged)

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel   input channels of the first conv
        : params: out_channel  output channels of the first conv
        : params: stride       stride of the first 3x3 conv
        : params: downsample   None: solid-line (identity) shortcut / not None: dashed-line (projection) shortcut
        """
        super(CBAM_BasicBlock, self).__init__()
        self.conv1 = conv3x3(in_channel=in_channel, out_channel=out_channel, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample
        # add CBAM
        self.ca = ChannelAttention(out_channel * self.expansion)
        self.sa = SpatialAttention()

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # apply CBAM to the residual branch before the addition
        out = self.ca(out) * out
        out = self.sa(out) * out
        out += identity
        out = self.relu(out)
        return out
class CBAM_Bottleneck(nn.Module):
    # resnet50 + resnet101 + resnet152 (residual type 2): solid-line (identity) and dashed-line (projection) shortcuts
    expansion = 4  # channel multiplier of the block's last (third) conv relative to out_channel

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel   input channels of the first conv
        : params: out_channel  output channels of the first conv
        : params: stride       stride of the middle conv
                  resnet50/101/152: all blocks of conv2_x use s=1; the first block of
                  conv3_x/conv4_x/conv5_x uses s=2, the other blocks s=1
        : params: downsample   None: solid-line (identity) shortcut / not None: dashed-line (projection) shortcut
        """
        super(CBAM_Bottleneck, self).__init__()
        # 1x1 conv, usually s=1 p=0 => w, h unchanged (conv output size rounds down)
        self.conv1 = conv1x1(in_channel=in_channel, out_channel=out_channel, stride=1)
        self.bn1 = nn.BatchNorm2d(out_channel)
        # ----------------------------------------------------------------------------------
        # 3x3 conv with s=2 p=1 => w, h halved (downsampling); with s=1 p=1 => w, h unchanged
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel, stride=stride)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # ---------------------------------------------------------------------------------
        self.conv3 = conv1x1(in_channel=out_channel, out_channel=out_channel * self.expansion, stride=1)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        # ----------------------------------------------------------------------------------
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        # add CBAM
        self.ca = ChannelAttention(out_channel * self.expansion)
        self.sa = SpatialAttention()

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # apply CBAM to the residual branch before the addition
        out = self.ca(out) * out
        out = self.sa(out) * out
        out += identity
        out = self.relu(out)
        return out
class CBAM_ResNet(nn.Module):
    def __init__(self, block, blocks_num, num_classes=1000):
        """
        : params: block        BasicBlock/Bottleneck
        : params: blocks_num   number of residual blocks in each layer
        : params: num_classes  number of classes in the dataset
        """
        super(CBAM_ResNet, self).__init__()
        self.in_channel = 64  # input channels of the next layer (64 after the stem)
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # pooling output size rounds down
        # The projection shortcut of layer 1 only changes the channel count, not h/w, so stride=1
        self.layer1 = self._make_layer(block, blocks_num[0], channel=64, stride=1)
        # The projection shortcuts of layers 2/3/4 change the channel count and halve h/w, so stride=2
        self.layer2 = self._make_layer(block, blocks_num[1], channel=128, stride=2)
        self.layer3 = self._make_layer(block, blocks_num[2], channel=256, stride=2)
        self.layer4 = self._make_layer(block, blocks_num[3], channel=512, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive pooling, output_size=(1, 1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # Kaiming initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, block_num, channel, stride=1):
        """
        : params: block      BasicBlock/Bottleneck; 18/34 use BasicBlock, 50/101/152 use Bottleneck
        : params: block_num  number of residual blocks in this layer
        : params: channel    number of kernels of the first conv in each convx_x; fixed per layer
        : params: stride     stride of the 3x3 conv in the first block of each convx_x,
                             which is also the stride of that block's downsample shortcut
                  resnet50/101/152: conv2_x => s=1, conv3_x/conv4_x/conv5_x => s=2
        """
        downsample = None
        # in_channel: input channels of the first block of this convx_x
        # channel * block.expansion: output channels of the last conv in each block of this layer
        # For res50/101/152, in_channel != channel * block.expansion always holds for the first block
        # of conv2/3/4/5_x, so the first block always has a downsample (projection) shortcut.
        # The first block of conv2_x only changes the channel count (s=1), while the first block of
        # conv3_x/conv4_x/conv5_x also changes w/h (s=2, downsampling).
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion)
            )
        layers = []
        # first block (the one that may carry the projection shortcut)
        layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
        # the channel count has changed after the first block
        self.in_channel = channel * block.expansion
        # apart from the first block, all blocks use the solid-line shortcut (identity mapping)
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))  # channels inside a Bottleneck, e.g. 512 -> 128 -> 512
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.maxpool(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out
def resnet18_cbam(**kwargs):
    """ResNet-18 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [2, 2, 2, 2], **kwargs)
    return model


def resnet34_cbam(**kwargs):
    """ResNet-34 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [3, 4, 6, 3], **kwargs)
    return model


def resnet50_cbam(**kwargs):
    """ResNet-50 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 6, 3], **kwargs)
    return model


def resnet101_cbam(**kwargs):
    """ResNet-101 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 23, 3], **kwargs)
    return model


def resnet152_cbam(**kwargs):
    """ResNet-152 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 8, 36, 3], **kwargs)
    return model


if __name__ == '__main__':
    # quick test of the model and its parameter count
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    # move the model to the same device that torchsummary will use for its dummy input
    model = resnet50_cbam(num_classes=5).to(device)
    print(model)
    summary(model, (3, 224, 224), device=device.type)  # reported params: 28,549,733  Total Size (MB): 397.07
3.2 CBAM + ResNet2
The code below implements CBAM + ResNet2 (one CBAM before the first block and one after the last block):
import torch
import torch.nn as nn
from torchsummary import summary

# This model adds CBAM at two places: before the first ResNet block and after the last one
__all__ = ['resnet18_cbam', 'resnet34_cbam', 'resnet50_cbam', 'resnet101_cbam', 'resnet152_cbam']


def conv1x1(in_channel, out_channel, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False)


def conv3x3(in_channel, out_channel, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False)
class ChannelAttention(nn.Module):
    def __init__(self, in_channel, ratio=16):
        """
        : params: in_channel  number of channels of the input feature map
        : params: ratio       reduction/expansion ratio of the shared MLP
        """
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # fc = shared MLP
        self.fc = nn.Sequential(nn.Conv2d(in_channel, in_channel // ratio, 1, bias=False),
                                nn.ReLU(),
                                nn.Conv2d(in_channel // ratio, in_channel, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        """Spatial attention treats every channel identically, so it tends to ignore interactions between channels."""
        super(SpatialAttention, self).__init__()
        # padding=kernel_size//2 keeps the feature map size unchanged after the convolution
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):  # input x = [b, c, 56, 56]
        avg_out = torch.mean(x, dim=1, keepdim=True)  # [b, 1, 56, 56] per-pixel mean across channels
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # [b, 1, 56, 56] per-pixel max across channels
        x = torch.cat([avg_out, max_out], dim=1)  # [b, 2, 56, 56] concat
        x = self.sigmoid(self.conv1(x))  # [b, 1, 56, 56] convolution fuses the avg and max information
        return x
class CBAM_BasicBlock(nn.Module):
    # resnet18 + resnet34 (residual type 1): solid-line (identity) and dashed-line (projection) shortcuts
    expansion = 1  # channel multiplier of the block's last conv relative to out_channel (1 = unchanged)

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel   input channels of the first conv
        : params: out_channel  output channels of the first conv
        : params: stride       stride of the first 3x3 conv
        : params: downsample   None: solid-line (identity) shortcut / not None: dashed-line (projection) shortcut
        """
        super(CBAM_BasicBlock, self).__init__()
        self.conv1 = conv3x3(in_channel=in_channel, out_channel=out_channel, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity
        out = self.relu(out)
        return out
class CBAM_Bottleneck(nn.Module):
    # resnet50 + resnet101 + resnet152 (residual type 2): solid-line (identity) and dashed-line (projection) shortcuts
    expansion = 4  # channel multiplier of the block's last (third) conv relative to out_channel

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        """
        : params: in_channel   input channels of the first conv
        : params: out_channel  output channels of the first conv
        : params: stride       stride of the middle conv
                  resnet50/101/152: all blocks of conv2_x use s=1; the first block of
                  conv3_x/conv4_x/conv5_x uses s=2, the other blocks s=1
        : params: downsample   None: solid-line (identity) shortcut / not None: dashed-line (projection) shortcut
        """
        super(CBAM_Bottleneck, self).__init__()
        # 1x1 conv, usually s=1 p=0 => w, h unchanged (conv output size rounds down)
        self.conv1 = conv1x1(in_channel=in_channel, out_channel=out_channel, stride=1)
        self.bn1 = nn.BatchNorm2d(out_channel)
        # ----------------------------------------------------------------------------------
        # 3x3 conv with s=2 p=1 => w, h halved (downsampling); with s=1 p=1 => w, h unchanged
        self.conv2 = conv3x3(in_channel=out_channel, out_channel=out_channel, stride=stride)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # ---------------------------------------------------------------------------------
        self.conv3 = conv1x1(in_channel=out_channel, out_channel=out_channel * self.expansion, stride=1)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        # ----------------------------------------------------------------------------------
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity
        out = self.relu(out)
        return out
class CBAM_ResNet(nn.Module):
    def __init__(self, block, blocks_num, num_classes=1000):
        """
        : params: block        BasicBlock/Bottleneck
        : params: blocks_num   number of residual blocks in each layer
        : params: num_classes  number of classes in the dataset
        """
        super(CBAM_ResNet, self).__init__()
        self.in_channel = 64  # input channels of the next layer (64 after the stem)
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # pooling output size rounds down
        # CBAM before the first block
        self.ca1 = ChannelAttention(self.in_channel)
        self.sa1 = SpatialAttention()
        # The projection shortcut of layer 1 only changes the channel count, not h/w, so stride=1
        self.layer1 = self._make_layer(block, blocks_num[0], channel=64, stride=1)
        # The projection shortcuts of layers 2/3/4 change the channel count and halve h/w, so stride=2
        self.layer2 = self._make_layer(block, blocks_num[1], channel=128, stride=2)
        self.layer3 = self._make_layer(block, blocks_num[2], channel=256, stride=2)
        self.layer4 = self._make_layer(block, blocks_num[3], channel=512, stride=2)
        # CBAM after the last block; derive the width from the block type instead of hard-coding 2048,
        # so BasicBlock models (512 channels) work too. Bottleneck output is [2048, 7, 7].
        self.ca2 = ChannelAttention(512 * block.expansion)
        self.sa2 = SpatialAttention()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive pooling, output_size=(1, 1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # Kaiming initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, block_num, channel, stride=1):
        """
        : params: block      BasicBlock/Bottleneck; 18/34 use BasicBlock, 50/101/152 use Bottleneck
        : params: block_num  number of residual blocks in this layer
        : params: channel    number of kernels of the first conv in each convx_x; fixed per layer
        : params: stride     stride of the 3x3 conv in the first block of each convx_x,
                             which is also the stride of that block's downsample shortcut
                  resnet50/101/152: conv2_x => s=1, conv3_x/conv4_x/conv5_x => s=2
        """
        downsample = None
        # in_channel: input channels of the first block of this convx_x
        # channel * block.expansion: output channels of the last conv in each block of this layer
        # For res50/101/152, in_channel != channel * block.expansion always holds for the first block
        # of conv2/3/4/5_x, so the first block always has a downsample (projection) shortcut.
        # The first block of conv2_x only changes the channel count (s=1), while the first block of
        # conv3_x/conv4_x/conv5_x also changes w/h (s=2, downsampling).
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion)
            )
        layers = []
        # first block (the one that may carry the projection shortcut)
        layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
        # the channel count has changed after the first block
        self.in_channel = channel * block.expansion
        # apart from the first block, all blocks use the solid-line shortcut (identity mapping)
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))  # channels inside a Bottleneck, e.g. 512 -> 128 -> 512
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.maxpool(out)
        # CBAM before the first block
        out = self.ca1(out) * out
        out = self.sa1(out) * out
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        # CBAM after the last block
        out = self.ca2(out) * out
        out = self.sa2(out) * out
        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out
def resnet18_cbam(**kwargs):
    """ResNet-18 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [2, 2, 2, 2], **kwargs)
    return model


def resnet34_cbam(**kwargs):
    """ResNet-34 + CBAM."""
    model = CBAM_ResNet(CBAM_BasicBlock, [3, 4, 6, 3], **kwargs)
    return model


def resnet50_cbam(**kwargs):
    """ResNet-50 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 6, 3], **kwargs)
    return model


def resnet101_cbam(**kwargs):
    """ResNet-101 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 4, 23, 3], **kwargs)
    return model


def resnet152_cbam(**kwargs):
    """ResNet-152 + CBAM."""
    model = CBAM_ResNet(CBAM_Bottleneck, [3, 8, 36, 3], **kwargs)
    return model


if __name__ == '__main__':
    # quick test of the model and its parameter count
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    # move the model to the same device that torchsummary will use for its dummy input
    model = resnet50_cbam(num_classes=5).to(device)
    print(model)
    summary(model, (3, 224, 224), device=device.type)  # reported params: 24,568,073  Total Size (MB): 381.02
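The parameter totals quoted in the two `summary` comments (28,549,733 and 24,568,073) can be cross-checked by hand. The sketch below is plain arithmetic under stated assumptions: the helper names are made up, CBAM is taken to be bias-free with ratio 16 and a 7x7 kernel as in the code above, and the baseline of 25,557,032 parameters for a standard 1000-class ResNet-50 is the commonly reported figure rather than something stated in this post.

```python
def cbam_params(channels, ratio=16, kernel=7):
    """Return (shared-MLP weights, spatial-conv weights) for one bias-free CBAM."""
    mlp = 2 * channels * (channels // ratio)
    conv = 2 * 1 * kernel * kernel
    return mlp, conv


def overhead(sites, count_mlp_twice=False):
    """Total CBAM weights over a list of channel widths."""
    total = 0
    for channels in sites:
        mlp, conv = cbam_params(channels)
        total += (2 * mlp if count_mlp_twice else mlp) + conv
    return total


# ResNet-50 stages: (block output channels, number of blocks)
STAGES = [(256, 3), (512, 4), (1024, 6), (2048, 3)]
every_block = [c for c, n in STAGES for _ in range(n)]   # variant 3.1
both_ends = [64, 2048]                                   # variant 3.2

# Assumption: a standard 1000-class ResNet-50 has 25,557,032 parameters.
RESNET50_1000 = 25_557_032
backbone_5cls = RESNET50_1000 - (2048 * 1000 + 1000) + (2048 * 5 + 5)

results = {}
for name, sites in (("every block", every_block), ("both ends", both_ends)):
    unique = backbone_5cls + overhead(sites)
    doubled = backbone_5cls + overhead(sites, count_mlp_twice=True)
    results[name] = (unique, doubled)
    print(f"{name}: unique={unique:,}  mlp-counted-twice={doubled:,}")
```

Under these assumptions the quoted totals are reproduced exactly only when the shared MLP is counted twice, which suggests that the summary tool counts it once per call (it is called for both the average-pooled and the max-pooled branch). Counting each weight once gives 26,034,789 parameters for the every-block variant and 24,043,273 for the two-ends variant.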
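4. Experimental Results
The results show that cbam_resnet1 outperforms the plain ResNet, whereas cbam_resnet2 performs worse. CBAM is sound in theory, but whether it actually helps has to be verified on the dataset at hand.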