First, download the YOLOv7 source code:
GitHub - bubbliiiing/yolov7-pytorch: a YOLOv7 repository that can be used to train on your own dataset.
I recommend using bubbliiiing's code here; the comments are very thorough!
To start, here is the source code of the CA (Coordinate Attention) block:
import torch
import torch.nn as nn
class CA_Block(nn.Module):
    def __init__(self, channel, reduction=16):
        super(CA_Block, self).__init__()

        # Shared 1x1 conv that compresses the concatenated H- and W-pooled features
        self.conv_1x1 = nn.Conv2d(in_channels=channel, out_channels=channel // reduction,
                                  kernel_size=1, stride=1, bias=False)
        self.relu = nn.ReLU()
        self.bn = nn.BatchNorm2d(channel // reduction)

        # Separate 1x1 convs that restore the channel count for each direction
        self.F_h = nn.Conv2d(in_channels=channel // reduction, out_channels=channel,
                             kernel_size=1, stride=1, bias=False)
        self.F_w = nn.Conv2d(in_channels=channel // reduction, out_channels=channel,
                             kernel_size=1, stride=1, bias=False)

        self.sigmoid_h = nn.Sigmoid()
        self.sigmoid_w = nn.Sigmoid()

    def forward(self, x):
        _, _, h, w = x.size()

        # Pool along W to get a (b, c, 1, h) descriptor, and along H to get (b, c, 1, w)
        x_h = torch.mean(x, dim=3, keepdim=True).permute(0, 1, 3, 2)
        x_w = torch.mean(x, dim=2, keepdim=True)

        # Concatenate along the spatial axis, compress, then split back into the two directions
        x_cat_conv_relu = self.relu(self.bn(self.conv_1x1(torch.cat((x_h, x_w), 3))))
        x_cat_conv_split_h, x_cat_conv_split_w = x_cat_conv_relu.split([h, w], 3)

        # Per-direction attention weights in (0, 1)
        s_h = self.sigmoid_h(self.F_h(x_cat_conv_split_h.permute(0, 1, 3, 2)))  # (b, c, h, 1)
        s_w = self.sigmoid_w(self.F_w(x_cat_conv_split_w))                      # (b, c, 1, w)

        # Reweight the input by both attention maps
        out = x * s_h.expand_as(x) * s_w.expand_as(x)
        return out
Here `channel` is the number of input channels; be careful that it matches the output channels of the preceding layer!
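Before wiring it into the network, it is worth a quick sanity check that the block preserves the input shape (CA only reweights features). A minimal sketch, with an example tensor size chosen arbitrarily:

if __name__ == "__main__":
    block = CA_Block(channel=512)     # channel must match the incoming feature map
    x = torch.randn(2, 512, 20, 20)   # (batch, channels, height, width)
    y = block(x)
    print(y.shape)                    # torch.Size([2, 512, 20, 20]) -- shape unchanged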
Next, open the yolo.py file under nets and find the YoloBody class.
In its __init__ we define two attention blocks, with 512 and 1024 channels respectively:
self.CA_Blocks_512 = CA_Block(512)
self.CA_Blocks_1024 = CA_Block(1024)
P.S.: this works because we insert attention at three points: the backbone outputs feat1 (512 channels) and feat2 (1024 channels), and the SPPCSPC output P5 (512 channels). Only two distinct channel counts appear, so two module definitions are enough; note that the 512-channel block is reused (same weights) for both feat1 and P5.
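For orientation, here is a sketch of where those two lines sit inside YoloBody.__init__. The surrounding lines are abbreviated, and the exact constructor signature may differ between versions of the repository:

class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes, phi, pretrained=False):
        super(YoloBody, self).__init__()
        # ... existing backbone / neck / head definitions stay untouched ...

        # CA attention blocks, one per distinct channel count we attend over
        self.CA_Blocks_512  = CA_Block(512)
        self.CA_Blocks_1024 = CA_Block(1024)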
Then insert the modules into the forward pass:
def forward(self, x):
    # backbone
    feat1, feat2, feat3 = self.backbone.forward(x)
    feat1 = self.CA_Blocks_512(feat1)    # CA attention block added here
    feat2 = self.CA_Blocks_1024(feat2)   # CA attention block added here

    #------------------------ Enhanced feature-extraction network ------------------------#
    # 20, 20, 1024 => 20, 20, 512
    P5 = self.sppcspc(feat3)
    P5 = self.CA_Blocks_512(P5)          # CA attention block added here
    # 20, 20, 512 => 20, 20, 256
    P5_conv = self.conv_for_P5(P5)
    # 20, 20, 256 => 40, 40, 256
    P5_upsample = self.upsample(P5_conv)
    # 40, 40, 256 cat 40, 40, 256 => 40, 40, 512
    P4 = torch.cat([self.conv_for_feat2(feat2), P5_upsample], 1)
    # 40, 40, 512 => 40, 40, 256
    P4 = self.conv3_for_upsample1(P4)
(The rest of the forward is unchanged and not pasted here; compare against your own copy.) With that, training can proceed as usual.
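To double-check that the inserted blocks do not disturb the shapes the rest of the network expects, the three insertion points can be mimicked with dummy tensors (sizes follow the shape comments above, assuming a 640x640 input; this is a standalone sketch, not the actual YoloBody forward):

ca_512, ca_1024 = CA_Block(512), CA_Block(1024)
feat1 = torch.randn(1, 512, 80, 80)     # backbone output feat1
feat2 = torch.randn(1, 1024, 40, 40)    # backbone output feat2
p5    = torch.randn(1, 512, 20, 20)     # SPPCSPC output P5
for name, block, t in [("feat1", ca_512, feat1), ("feat2", ca_1024, feat2), ("P5", ca_512, p5)]:
    assert block(t).shape == t.shape    # attention only reweights; shapes are unchanged
    print(name, "ok")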
The results are as follows:
After 2000 training epochs, the mAP grew by 3% with the CA attention mechanism added, which is a decent improvement.
Of course, interested readers can also try adding CA inside the backbone or at other points; it may yield different results.