YOLO改进:添加多尺度特征提取模块Scale-Aware RFE Model

YOLOv5中添加多尺度特征提取模块Scale-Aware RFE Model

Scale-Aware RFE Model

该模块来自于论文:YOLO-FaceV2

RFEM结构
RFEM的结构很简单,该结构使用了三个不同空洞率(1、2、3)的空洞卷积提取特征以提取多尺度信息;此外,不同分支共享权重减少参数量;另外,使用残差连接防止梯度爆炸的问题;最后,将四个分支的特征相加得到输出的特征层。

RFEM代码

YOLO-Facev2
yolov5s_v2_RFEM_MultiSEAM.yaml中可以找到C3RFEM模块,在YOLO-Facev2中,输入特征将经过两个分支,左侧分支由一个1*1的卷积组成,该卷积用于调整输入通道,将输入特征的通道数调整为输出通道数的一半。右侧分支由一个1*1的卷积和RFEM模块组成,1*1的卷积的作用也是将原有输入特征的通道数调整为输出通道数的一半,RFEM模块的输入通道和输出通道一致。两个分支的进行通道拼接torch.cat,最后使用一个1*1的卷积调整通道。

class C3RFEM(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=True, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        # self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
        # self.rfem = RFEM(c_, c_, n)
        self.m = nn.Sequential(*[RFEM(c_, c_, n=1, e=e) for _ in range(n)])
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

接下来看一下RFEM的代码,结构大部分由TridentBlock函数封装好了

class RFEM(nn.Module):
    def __init__(self, c1, c2, n=1, e=0.5, stride=1):
        super(RFEM, self).__init__()
        c = True
        layers = []
        layers.append(TridentBlock(c1, c2, stride=stride, c=c, e=e))
        c1 = c2
        for i in range(1, n):
            layers.append(TridentBlock(c1, c2))
        self.layer = nn.Sequential(*layers)
        # self.cv = Conv(c2, c2)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        out = self.layer(x)
        out = out[0] + out[1] + out[2] + x
        out = self.act(self.bn(out))
        return out

继续看一下TridentBlock,YOLO-Facev2中说RFEM结构受到Trident Networks的启发,Trident Networks,可以看出RFEM确实和TridentBlock十分相似。
trident block
trident block的代码也非常清晰,不做过多解释。

class TridentBlock(nn.Module):
    def __init__(self, c1, c2, stride=1, c=False, e=0.5, padding=[1, 2, 3], dilate=[1, 2, 3], bias=False):
        super(TridentBlock, self).__init__()
        self.stride = stride
        self.c = c
        c_ = int(c2 * e)
        self.padding = padding
        self.dilate = dilate
        self.share_weightconv1 = nn.Parameter(torch.Tensor(c_, c1, 1, 1))
        self.share_weightconv2 = nn.Parameter(torch.Tensor(c2, c_, 3, 3))

        self.bn1 = nn.BatchNorm2d(c_)
        self.bn2 = nn.BatchNorm2d(c2)

        self.act = nn.SiLU()

        nn.init.kaiming_uniform_(self.share_weightconv1, nonlinearity="relu")
        nn.init.kaiming_uniform_(self.share_weightconv2, nonlinearity="relu")

        if bias:
            self.bias = nn.Parameter(torch.Tensor(c2))
        else:
            self.bias = None

        if self.bias is not None:
            nn.init.constant_(self.bias, 0)

    def forward_for_small(self, x):
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
        out = self.bn1(out)
        out = self.act(out)

        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[0],
                                   dilation=self.dilate[0])
        out = self.bn2(out)
        out += residual
        out = self.act(out)

        return out

    def forward_for_middle(self, x):
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
        out = self.bn1(out)
        out = self.act(out)

        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[1],
                                   dilation=self.dilate[1])
        out = self.bn2(out)
        out += residual
        out = self.act(out)

        return out

    def forward_for_big(self, x):
        residual = x
        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
        out = self.bn1(out)
        out = self.act(out)

        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[2],
                                   dilation=self.dilate[2])
        out = self.bn2(out)
        out += residual
        out = self.act(out)

        return out

    def forward(self, x):
        xm = x
        base_feat = []
        if self.c is not False:
            x1 = self.forward_for_small(x)
            x2 = self.forward_for_middle(x)
            x3 = self.forward_for_big(x)
        else:
            x1 = self.forward_for_small(xm[0])
            x2 = self.forward_for_middle(xm[1])
            x3 = self.forward_for_big(xm[2])

        base_feat.append(x1)
        base_feat.append(x2)
        base_feat.append(x3)

        return base_feat

RFEM在YOLOv5中的使用

  • 在YOLOv5代码的common.py中添加class TridentBlockclass RFEMclass C3RFEM即可
  • yolo.py中的parse_model添加C3RFEM
if m in {
        Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
        BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, C3RFEM}:
    c1, c2 = ch[f], args[0]
    if c2 != no:  # if not output
        c2 = make_divisible(c2 * gw, 8)

    args = [c1, c2, *args[1:]]
    if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x, C3RFEM}:
        args.insert(2, n)  # number of repeats
        n = 1
  • yolov5s.yaml中添加
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
   [-1, 1, C3RFEM, [1024]] # 10
  ]

运行yolo.py输出

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    855296  models.common.C3RFEM                    [512, 512, 1]                 
 11                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 12                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 13           [-1, 6]  1         0  models.common.Concat                    [1]                           
 14                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 15                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 16                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 17           [-1, 4]  1         0  models.common.Concat                    [1]                           
 18                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 19                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 20          [-1, 15]  1         0  models.common.Concat                    [1]                           
 21                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 22                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 23          [-1, 11]  1         0  models.common.Concat                    [1]                           
 24                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 25      [18, 21, 24]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5sRFEM summary: 233 layers, 8090685 parameters, 8090685 gradients, 17.1 GFLOPs

YOLOv5s的参数量

YOLOv5s summary: 214 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

可以看出,参数量的提高比较小。

实验

理论上RFEM模块的引入可以提高多尺度特征的提取,有助于检测精度的提高。因此,在VisDrone数据集进行测试。

训练指令

nohup python train.py --data VisDrone.yaml --cfg yolov5sRFEM.yaml --weights yolov5s.pt --epochs 300 --device 0 > yolov5sRFEM.out &

测试结果
实验中使用预训练模型。
在VisDrone数据集中,YOLOv5s的mAP是32.9%,添加RFEM后,测试得到仍为是32.9%。也许在其他数据集上该模块会有较好的反馈。
YOLOvs+RFEM

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值