Adding the Multi-Scale Feature Extraction Module Scale-Aware RFE Model to YOLOv5
Scale-Aware RFE Model
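This module comes from the paper YOLO-FaceV2.
RFEM has a simple structure. It extracts features with three dilated convolutions of different dilation rates (1, 2 and 3) to capture multi-scale information. The branches share weights, which reduces the parameter count. A residual connection is added to prevent exploding gradients. Finally, four terms are summed to produce the output feature map: the three branch outputs plus the shortcut input.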
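The three dilation rates give different receptive fields, yet the branch outputs can still be summed. The sketch below shows why, using the standard convolution size formulas: effective kernel size d·(k−1)+1, and output size ⌊(H + 2p − d·(k−1) − 1)/s⌋ + 1. When the padding p equals the dilation d, as in the TridentBlock code later in this post, all three branches keep the input resolution. The helper names are illustrative only and are not taken from YOLO-FaceV2.

```python
def effective_kernel(k: int, d: int) -> int:
    """Spatial extent covered by a k x k kernel with dilation d."""
    return d * (k - 1) + 1


def conv_out_size(h: int, k: int, s: int = 1, p: int = 0, d: int = 1) -> int:
    """Output height/width of a 2D convolution (the formula documented for nn.Conv2d)."""
    return (h + 2 * p - d * (k - 1) - 1) // s + 1


k = 3   # every RFEM branch uses a 3x3 kernel
h = 20  # e.g. the 20x20 P5 feature map of a 640x640 input
for d in (1, 2, 3):
    ek = effective_kernel(k, d)
    out = conv_out_size(h, k, p=d, d=d)  # padding == dilation keeps the size
    print(f"dilation={d}: covers {ek}x{ek}, output {out}x{out}")

# Weight sharing: one c_out x c_in x 3 x 3 kernel is reused by all three branches
c_in, c_out = 128, 256
shared = c_out * c_in * k * k
separate = 3 * shared
print(f"shared weights: {shared}, three separate kernels: {separate}")
```

All three outputs are 20x20, so they can be added element-wise. Sharing the kernel also cuts the 3×3 weights to one third of what three independent branches would need.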
RFEM code
YOLO-FaceV2
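The C3RFEM module can be found in yolov5s_v2_RFEM_MultiSEAM.yaml. In YOLO-FaceV2, the input feature passes through two branches:

- The left branch is a single 1*1 convolution that adjusts the channels, reducing them to half the number of output channels.
- The right branch consists of a 1*1 convolution followed by an RFEM module. This 1*1 convolution likewise reduces the input channels to half the number of output channels, and the RFEM module keeps its input and output channel counts equal.

The two branches are concatenated along the channel dimension with torch.cat, and a final 1*1 convolution adjusts the channel count.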
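The channel bookkeeping above can be traced with a few lines of plain Python. `c3rfem_channels` is a hypothetical helper. It assumes RFEM is built as in the C3RFEM code that follows, which passes the same expansion ratio e on to RFEM. As a result, the 3*3 dilated convolutions inside the TridentBlock work on a further-halved width.

```python
def c3rfem_channels(c1: int, c2: int, e: float = 0.5) -> dict:
    """Hypothetical helper: channel counts through C3RFEM as described above."""
    c_ = int(c2 * e)           # hidden width: half of c2 with the default e=0.5
    left = c_                  # left branch: cv2, a 1x1 conv c1 -> c_
    right = c_                 # right branch: cv1 (1x1, c1 -> c_), then RFEM (c_ -> c_)
    inner = int(c_ * e)        # TridentBlock's 1x1 bottleneck inside RFEM (e applied again)
    concat = left + right      # torch.cat along dim=1
    return {"in": c1, "hidden": c_, "trident_bottleneck": inner,
            "concat": concat, "out": c2}  # cv3: 1x1 conv, concat -> c2


print(c3rfem_channels(512, 512))
```

With c1 = c2 = 512 (the C3RFEM layer in the yolov5s model built later in this post), the hidden width is 256. The TridentBlock bottleneck is 128, and the concatenation restores 512 channels before cv3.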
class C3RFEM(nn.Module):
    # C3 block whose Bottleneck stack is replaced by RFEM modules
    def __init__(self, c1, c2, n=1, shortcut=True, e=0.5):  # ch_in, ch_out, number of RFEMs, shortcut (unused), expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)  # RFEM branch: 1x1 conv, c1 -> c_
        self.cv2 = Conv(c1, c_, 1, 1)  # plain branch: 1x1 conv, c1 -> c_
        self.cv3 = Conv(2 * c_, c2, 1)  # fuses the concatenated branches, 2*c_ -> c2  # act=FReLU(c2)
        # self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
        # self.rfem = RFEM(c_, c_, n)
        self.m = nn.Sequential(*[RFEM(c_, c_, n=1, e=e) for _ in range(n)])
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
Next, the RFEM code. Most of its structure is encapsulated in the TridentBlock class:
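class RFEM(nn.Module):
    # TridentBlock(s) + sum of the three branch outputs + shortcut, then BN and SiLU
    def __init__(self, c1, c2, n=1, e=0.5, stride=1):
        super(RFEM, self).__init__()
        c = True  # the first TridentBlock receives a single tensor
        layers = []
        layers.append(TridentBlock(c1, c2, stride=stride, c=c, e=e))
        c1 = c2
        for i in range(1, n):
            layers.append(TridentBlock(c1, c2))  # later blocks receive the list of three branch outputs
        self.layer = nn.Sequential(*layers)
        # self.cv = Conv(c2, c2)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        out = self.layer(x)  # list of three tensors (dilation 1, 2, 3)
        out = out[0] + out[1] + out[2] + x  # three branches + shortcut (requires c1 == c2 and stride 1)
        out = self.act(self.bn(out))
        return out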
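Next, TridentBlock. The YOLO-FaceV2 paper says that the RFEM structure was inspired by Trident Networks (TridentNet), and comparing the two shows that RFEM is indeed very similar to the trident block.
The TridentBlock code is also straightforward:

- One 1*1 weight tensor and one 3*3 weight tensor are shared by three branches that differ only in padding and dilation (1, 2, 3).
- Each branch ends with a residual addition.
- The flag c selects whether the input is a single tensor (first block) or the list of three branch outputs from a previous block.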
class TridentBlock(nn.Module):
    def __init__(self, c1, c2, stride=1, c=False, e=0.5, padding=[1, 2, 3], dilate=[1, 2, 3], bias=False):
        super(TridentBlock, self).__init__()
        self.stride = stride
        self.c = c  # True: input is one tensor; False: input is a list of three tensors
        c_ = int(c2 * e)
        self.padding = padding
        self.dilate = dilate
        # Weights shared by all three branches: 1x1 conv (c1 -> c_) and 3x3 conv (c_ -> c2)
        self.share_weightconv1 = nn.Parameter(torch.Tensor(c_, c1, 1, 1))
        self.share_weightconv2 = nn.Parameter(torch.Tensor(c2, c_, 3, 3))
        self.bn1 = nn.BatchNorm2d(c_)
        self.bn2 = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()
        nn.init.kaiming_uniform_(self.share_weightconv1, nonlinearity="relu")
        nn.init.kaiming_uniform_(self.share_weightconv2, nonlinearity="relu")
        if bias:
            self.bias = nn.Parameter(torch.Tensor(c2))  # c2 entries: only valid for the 3x3 conv
        else:
            self.bias = None
        if self.bias is not None:
            nn.init.constant_(self.bias, 0)

    def forward_for_small(self, x):  # dilation 1
        residual = x
        # No bias on the 1x1 conv: self.bias has c2 entries, but this conv outputs c_ channels
        out = nn.functional.conv2d(x, self.share_weightconv1)
        out = self.bn1(out)
        out = self.act(out)
        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[0],
                                   dilation=self.dilate[0])
        out = self.bn2(out)
        out += residual
        out = self.act(out)
        return out

    def forward_for_middle(self, x):  # dilation 2
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1)
        out = self.bn1(out)
        out = self.act(out)
        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[1],
                                   dilation=self.dilate[1])
        out = self.bn2(out)
        out += residual
        out = self.act(out)
        return out

    def forward_for_big(self, x):  # dilation 3
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1)
        out = self.bn1(out)
        out = self.act(out)
        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[2],
                                   dilation=self.dilate[2])
        out = self.bn2(out)
        out += residual
        out = self.act(out)
        return out

    def forward(self, x):
        xm = x
        base_feat = []
        if self.c is not False:  # first block: all three branches see the same input tensor
            x1 = self.forward_for_small(x)
            x2 = self.forward_for_middle(x)
            x3 = self.forward_for_big(x)
        else:  # stacked block: branch i continues from branch i of the previous block
            x1 = self.forward_for_small(xm[0])
            x2 = self.forward_for_middle(xm[1])
            x3 = self.forward_for_big(xm[2])
        base_feat.append(x1)
        base_feat.append(x2)
        base_feat.append(x3)
        return base_feat
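The core trick in TridentBlock is running one weight tensor at several dilations. The minimal PyTorch sketch below isolates that idea by dropping the 1*1 bottleneck, BatchNorm and SiLU. It assumes PyTorch is installed. `SharedDilatedConv` is a hypothetical name and is not part of YOLO-FaceV2 or YOLOv5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedDilatedConv(nn.Module):
    """Hypothetical sketch: one 3x3 kernel applied at several dilation rates."""

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        # A single weight tensor, reused by every branch (like share_weightconv2)
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_uniform_(self.weight, nonlinearity="relu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # padding == dilation keeps H and W unchanged, so the branches can be summed
        branches = [F.conv2d(x, self.weight, padding=d, dilation=d) for d in self.dilations]
        return sum(branches) + x  # three branches plus the identity shortcut, as in RFEM


m = SharedDilatedConv(16)
x = torch.randn(2, 16, 20, 20)
y = m(x)
print(y.shape)                                 # torch.Size([2, 16, 20, 20])
print(sum(p.numel() for p in m.parameters()))  # 16 * 16 * 3 * 3 = 2304
```

The module owns the parameters of a single 3*3 convolution even though it computes three, which is where RFEM's parameter savings come from.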
Using RFEM in YOLOv5
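- Add class TridentBlock, class RFEM and class C3RFEM to common.py in the YOLOv5 code. common.py already imports torch and torch.nn as nn and defines Conv.
- In parse_model in yolo.py, add C3RFEM to the two module sets below. yolo.py imports models.common with `from models.common import *`, so no extra import is needed.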
if m in {
        Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
        BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, C3RFEM}:  # C3RFEM added
    c1, c2 = ch[f], args[0]
    if c2 != no:  # if not output
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]
    if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x, C3RFEM}:  # C3RFEM added
        args.insert(2, n)  # number of repeats, handled inside the module
        n = 1
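- In a copy of yolov5s.yaml saved as yolov5sRFEM.yaml (the name used by the training command below), append C3RFEM to the end of the backbone. Because the backbone gains a layer, every head index that refers to a head layer must also grow by one: the Concat sources change from 14 to 15 and from 10 to 11, and Detect's inputs change from 17, 20, 23 to 18, 21, 24. The model summary below confirms this. Backbone references such as 4 and 6 are unchanged.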
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
   [-1, 1, C3RFEM, [1024]]  # 10, new layer
  ]
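For reference, here is a sketch of the matching head section: the standard yolov5s head with every index that points at a head layer moved up by one. It is reconstructed from the `from` and `arguments` columns of the model summary below, so compare it against the yolov5s.yaml of your YOLOv5 version before use.

```yaml
head:
  [[-1, 1, Conv, [512, 1, 1]],                 # 11
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  # 12
   [[-1, 6], 1, Concat, [1]],                   # 13 cat backbone P4
   [-1, 3, C3, [512, False]],                   # 14

   [-1, 1, Conv, [256, 1, 1]],                  # 15
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  # 16
   [[-1, 4], 1, Concat, [1]],                   # 17 cat backbone P3
   [-1, 3, C3, [256, False]],                   # 18 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],                  # 19
   [[-1, 15], 1, Concat, [1]],                  # 20 cat head P4 (was 14)
   [-1, 3, C3, [512, False]],                   # 21 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],                  # 22
   [[-1, 11], 1, Concat, [1]],                  # 23 cat head P5 (was 10)
   [-1, 3, C3, [1024, False]],                  # 24 (P5/32-large)

   [[18, 21, 24], 1, Detect, [nc, anchors]],    # 25 Detect(P3, P4, P5) (was 17, 20, 23)
  ]
```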
Running yolo.py with this config (for example `python models/yolo.py --cfg yolov5sRFEM.yaml`) prints:
from n params module arguments
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 855296 models.common.C3RFEM [512, 512, 1]
11 -1 1 131584 models.common.Conv [512, 256, 1, 1]
12 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
13 [-1, 6] 1 0 models.common.Concat [1]
14 -1 1 361984 models.common.C3 [512, 256, 1, False]
15 -1 1 33024 models.common.Conv [256, 128, 1, 1]
16 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
17 [-1, 4] 1 0 models.common.Concat [1]
18 -1 1 90880 models.common.C3 [256, 128, 1, False]
19 -1 1 147712 models.common.Conv [128, 128, 3, 2]
20 [-1, 15] 1 0 models.common.Concat [1]
21 -1 1 296448 models.common.C3 [256, 256, 1, False]
22 -1 1 590336 models.common.Conv [256, 256, 3, 2]
23 [-1, 11] 1 0 models.common.Concat [1]
24 -1 1 1182720 models.common.C3 [512, 512, 1, False]
25 [18, 21, 24] 1 229245 Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5sRFEM summary: 233 layers, 8090685 parameters, 8090685 gradients, 17.1 GFLOPs
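The `n` and `arguments` columns above come from parse_model. It first scales each repeat count by depth_multiple (0.33 for yolov5s), then rounds each output width up to a multiple of 8 after scaling by width_multiple (0.50). For C3-style modules it also inserts the repeat count into the argument list. The function below is a simplified, hypothetical re-implementation of only that branch. It is enough to reproduce the rows for layers 4, 10 and 14.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def build_args(c1: int, n: int, args: list, gd: float = 0.33, gw: float = 0.50) -> list:
    """Hypothetical, simplified version of parse_model's C3/C3RFEM branch (yolov5s gains)."""
    n = max(round(n * gd), 1) if n > 1 else n  # depth gain applied to the repeat count
    c2 = make_divisible(args[0] * gw, 8)       # width gain applied to the output channels
    return [c1, c2, n, *args[1:]]              # repeats inserted at position 2


print(build_args(512, 1, [1024]))        # layer 10, C3RFEM: [512, 512, 1]
print(build_args(128, 6, [256]))         # layer 4, C3: [128, 128, 2]
print(build_args(512, 3, [512, False]))  # layer 14, C3: [512, 256, 1, False]
```

This is why the C3RFEM layer, declared as `[1024]` in the yaml, is built as `C3RFEM(512, 512, 1)`.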
For comparison, the stock YOLOv5s:
YOLOv5s summary: 214 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs
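The model gains 855,296 parameters (about 11.8% more), exactly the size of the new C3RFEM layer. Computation grows only slightly, from 16.6 to 17.1 GFLOPs.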
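The 855,296 figure can be checked by hand. The sketch below counts the trainable parameters of C3RFEM from the code above. It assumes YOLOv5's Conv is a bias-free Conv2d followed by BatchNorm2d. Each BatchNorm2d contributes a weight and a bias per channel; its running statistics are buffers, not parameters. The three functions are hypothetical helpers written for this check.

```python
def conv_bn_params(c_in: int, c_out: int, k: int = 1) -> int:
    """Bias-free Conv2d weights + BatchNorm2d weight and bias."""
    return c_out * c_in * k * k + 2 * c_out


def trident_params(c1: int, c2: int, e: float = 0.5) -> int:
    c_ = int(c2 * e)
    # Shared 1x1 (c1 -> c_) and 3x3 (c_ -> c2) weights, counted once for all three branches,
    # plus bn1 (c_ channels) and bn2 (c2 channels)
    return c_ * c1 + c2 * c_ * 9 + 2 * c_ + 2 * c2


def c3rfem_params(c1: int, c2: int, n: int = 1, e: float = 0.5) -> int:
    c_ = int(c2 * e)
    convs = 2 * conv_bn_params(c1, c_) + conv_bn_params(2 * c_, c2)  # cv1, cv2, cv3
    rfem = trident_params(c_, c_, e) + 2 * c_                      # TridentBlock + RFEM's own BN
    return convs + n * rfem


extra = c3rfem_params(512, 512)
print(extra)            # 855296, the C3RFEM row of the summary
print(7235389 + extra)  # 8090685, the YOLOv5sRFEM total
print(f"{100 * extra / 7235389:.1f}% more parameters")
```

Most of the cost sits in the three 1*1 convolutions (526,336 parameters). The RFEM itself adds 328,960 parameters, because its three dilated branches share a single set of weights.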
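Experiments
In principle, adding RFEM should strengthen multi-scale feature extraction and thereby improve detection accuracy, so the module was tested on the VisDrone dataset.
Training command
nohup python train.py --data VisDrone.yaml --cfg yolov5sRFEM.yaml --weights yolov5s.pt --epochs 300 --device 0 > yolov5sRFEM.out &
Results
Training started from the pretrained yolov5s.pt weights.
On VisDrone, YOLOv5s reaches an mAP of 32.9%, and with RFEM added the test mAP is still 32.9%, so the module brings no gain on this dataset. It may perform better on other datasets.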