YOLO改进：添加多尺度特征提取模块Scale-Aware RFE Model

jhunmk29

已于 2023-10-11 13:52:03 修改

阅读量4.4k

点赞数 4

分类专栏： YOLO 文章标签： YOLO pytorch deep learning

于 2023-10-08 11:16:54 首次发布

本文链接：https://blog.csdn.net/qq_36070656/article/details/133670794

版权

YOLO 专栏收录该内容

8 篇文章

订阅专栏

本文介绍在YOLOv5中添加多尺度特征提取模块Scale - Aware RFE Model。该模块来自YOLO - FaceV2，使用空洞卷积提取多尺度信息，有减少参数量、防止梯度爆炸等特点。文中给出RFEM代码及在YOLOv5中的使用方法，最后在VisDrone数据集测试，添加模块后mAP未提升。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

YOLOv5中添加多尺度特征提取模块Scale-Aware RFE Model

Scale-Aware RFE Model

该模块来自于论文：YOLO-FaceV2

RFEM结构
RFEM的结构很简单，该结构使用了三个不同空洞率（1、2、3）的空洞卷积提取特征以提取多尺度信息；此外，不同分支共享权重减少参数量；另外，使用残差连接防止梯度爆炸的问题；最后，将四个分支的特征相加得到输出的特征层。

RFEM代码

YOLO-Facev2
在yolov5s_v2_RFEM_MultiSEAM.yaml中可以找到C3RFEM模块，在YOLO-Facev2中，输入特征将经过两个分支，左侧分支由一个1*1的卷积组成，该卷积用于调整输入通道，将输入特征的通道数调整为输出通道数的一半。右侧分支由一个1*1的卷积和RFEM模块组成，1*1的卷积的作用也是将原有输入特征的通道数调整为输出通道数的一半，RFEM模块的输入通道和输出通道一致。两个分支的进行通道拼接torch.cat，最后使用一个1*1的卷积调整通道。

class C3RFEM(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=True, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        # self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
        # self.rfem = RFEM(c_, c_, n)
        self.m = nn.Sequential(*[RFEM(c_, c_, n=1, e=e) for _ in range(n)])
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

接下来看一下RFEM的代码，结构大部分由TridentBlock函数封装好了

class RFEM(nn.Module):
    def __init__(self, c1, c2, n=1, e=0.5, stride=1):
        super(RFEM, self).__init__()
        c = True
        layers = []
        layers.append(TridentBlock(c1, c2, stride=stride, c=c, e=e))
        c1 = c2
        for i in range(1, n):
            layers.append(TridentBlock(c1, c2))
        self.layer = nn.Sequential(*layers)
        # self.cv = Conv(c2, c2)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        out = self.layer(x)
        out = out[0] + out[1] + out[2] + x
        out = self.act(self.bn(out))
        return out

继续看一下TridentBlock，YOLO-Facev2中说RFEM结构受到Trident Networks的启发，Trident Networks，可以看出RFEM确实和TridentBlock十分相似。

trident block的代码也非常清晰，不做过多解释。

class TridentBlock(nn.Module):
    def __init__(self, c1, c2, stride=1, c=False, e=0.5, padding=[1, 2, 3], dilate=[1, 2, 3], bias=False):
        super(TridentBlock, self).__init__()
        self.stride = stride
        self.c = c
        c_ = int(c2 * e)
        self.padding = padding
        self.dilate = dilate
        self.share_weightconv1 = nn.Parameter(torch.Tensor(c_, c1, 1, 1))
        self.share_weightconv2 = nn.Parameter(torch.Tensor(c2, c_, 3, 3))

        self.bn1 = nn.BatchNorm2d(c_)
        self.bn2 = nn.BatchNorm2d(c2)

        self.act = nn.SiLU()

        nn.init.kaiming_uniform_(self.share_weightconv1, nonlinearity="relu")
        nn.init.kaiming_uniform_(self.share_weightconv2, nonlinearity="relu")

        if bias:
            self.bias = nn.Parameter(torch.Tensor(c2))
        else:
            self.bias = None

        if self.bias is not None:
            nn.init.constant_(self.bias, 0)

    def forward_for_small(self, x):
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
        out = self.bn1(out)
        out = self.act(out)

        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[0],
                                   dilation=self.dilate[0])
        out = self.bn2(out)
        out += residual
        out = self.act(out)

        return out

    def forward_for_middle(self, x):
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
        out = self.bn1(out)
        out = self.act(out)

        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[1],
                                   dilation=self.dilate[1])
        out = self.bn2(out)
        out += residual
        out = self.act(out)

        return out

    def forward_for_big(self, x):
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
        out = self.bn1(out)
        out = self.act(out)

        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[2],
                                   dilation=self.dilate[2])
        out = self.bn2(out)
        out += residual
        out = self.act(out)

        return out

    def forward(self, x):
        xm = x
        base_feat = []
        if self.c is not False:
            x1 = self.forward_for_small(x)
            x2 = self.forward_for_middle(x)
            x3 = self.forward_for_big(x)
        else:
            x1 = self.forward_for_small(xm[0])
            x2 = self.forward_for_middle(xm[1])
            x3 = self.forward_for_big(xm[2])

        base_feat.append(x1)
        base_feat.append(x2)
        base_feat.append(x3)

        return base_feat

RFEM在YOLOv5中的使用

在YOLOv5代码的common.py中添加class TridentBlock、class RFEM和class C3RFEM即可
在yolo.py中的parse_model添加C3RFEM

if m in {
        Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
        BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, C3RFEM}:
    c1, c2 = ch[f], args[0]
    if c2 != no:  # if not output
        c2 = make_divisible(c2 * gw, 8)

    args = [c1, c2, *args[1:]]
    if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x, C3RFEM}:
        args.insert(2, n)  # number of repeats
        n = 1

在yolov5s.yaml中添加

backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
   [-1, 1, C3RFEM, [1024]] # 10
  ]

运行yolo.py输出

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    855296  models.common.C3RFEM                    [512, 512, 1]                 
 11                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 12                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 13           [-1, 6]  1         0  models.common.Concat                    [1]                           
 14                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 15                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 16                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 17           [-1, 4]  1         0  models.common.Concat                    [1]                           
 18                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 19                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 20          [-1, 15]  1         0  models.common.Concat                    [1]                           
 21                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 22                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 23          [-1, 11]  1         0  models.common.Concat                    [1]                           
 24                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 25      [18, 21, 24]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5sRFEM summary: 233 layers, 8090685 parameters, 8090685 gradients, 17.1 GFLOPs

YOLOv5s的参数量

YOLOv5s summary: 214 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

可以看出，参数量的提高比较小。

实验

理论上RFEM模块的引入可以提高多尺度特征的提取，有助于检测精度的提高。因此，在VisDrone数据集进行测试。

训练指令

nohup python train.py --data VisDrone.yaml --cfg yolov5sRFEM.yaml --weights yolov5s.pt --epochs 300 --device 0 > yolov5sRFEM.out &

测试结果
实验中使用预训练模型。
在VisDrone数据集中，YOLOv5s的mAP是32.9%，添加RFEM后，测试得到仍为是32.9%。也许在其他数据集上该模块会有较好的反馈。
YOLOvs+RFEM