Improving the YOLOv5 Network: CBAM + Small-Object Detection Head

This article examines how adding a CBAM attention module and a small-object detection head affects the performance of the YOLOv5 object detector. Experiments show that the CBAM module slightly improves precision and recall but lowers mAP, while inserting C3CBAM into the backbone and adding a dedicated small-object detection head brings a clear improvement on the TinyPerson dataset. Code examples show how to implement the changes, together with the configuration files used to train each model variant.

Experimental results show that after adding the CBAM module, precision (P) and recall (R) improve slightly, but mAP drops somewhat.

In the network structure diagram, the red box marks the added small-object detection head, which fuses high-level semantic information with low-level positional information.

Three models are compared: yolov5s, yolov5s_CBAM (with the attention mechanism added), and yolov5s_add (with the extra detection head).

After training all three models on the TinyPerson dataset, the model with the added detection head improves significantly, while the attention-mechanism model improves only slightly.

Code examples:

1. Add the CBAM-related code below to models/common.py; the rest of the framework only needs to call the C3CBAM class.

```python
# CBAM -- add to models/common.py (torch, nn, Conv and C3 are already
# imported/defined there)
class ChannelAttention(nn.Module):
    # Channel attention: global avg + max pooling fed through a shared MLP
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.f1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu = nn.ReLU()
        self.f2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.f2(self.relu(self.f1(self.avg_pool(x))))
        max_out = self.f2(self.relu(self.f1(self.max_pool(x))))
        out = self.sigmoid(avg_out + max_out)
        return out


class SpatialAttention(nn.Module):
    # Spatial attention: channel-wise mean/max maps fused by a k*k convolution
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        # output size = (H - kernel_size + 2*padding) / stride + 1, i.e. unchanged
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)    # 1*h*w
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # 1*h*w
        x = torch.cat([avg_out, max_out], dim=1)        # 2*h*w
        x = self.conv(x)                                # 1*h*w
        return self.sigmoid(x)


class CBAM(nn.Module):
    # Convolutional Block Attention Module: channel attention, then spatial attention
    def __init__(self, c1, c2, ratio=16, kernel_size=7):  # ch_in, ch_out, reduction ratio, spatial kernel
        super(CBAM, self).__init__()
        self.channel_attention = ChannelAttention(c1, ratio)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        out = self.channel_attention(x) * x      # c*h*w  *  c*1*1
        out = self.spatial_attention(out) * out  # c*h*w  *  1*h*w
        return out


class CBAMBottleneck(nn.Module):
    # Standard bottleneck with CBAM applied to its output
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5, ratio=16, kernel_size=7):  # ch_in, ch_out, shortcut, groups, expansion
        super(CBAMBottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2
        self.channel_attention = ChannelAttention(c2, ratio)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        x1 = self.cv2(self.cv1(x))
        out = self.channel_attention(x1) * x1
        out = self.spatial_attention(out) * out
        return x + out if self.add else out


class C3CBAM(C3):
    # C3 module with CBAMBottleneck()
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        self.m = nn.Sequential(*(CBAMBottleneck(c_, c_, shortcut) for _ in range(n)))
```
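A quick sanity check of the new module (a minimal sketch, assuming the classes above have been added to models/common.py):

```python
import torch
from models.common import C3CBAM  # assumes the classes above live in common.py

m = C3CBAM(64, 64, n=1)           # ch_in, ch_out, number of CBAM bottlenecks
x = torch.randn(1, 64, 80, 80)
print(m(x).shape)                 # torch.Size([1, 64, 80, 80]) -- attention only reweights features, shapes are unchanged
```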

2. Register the new module in yolo.py so the YAML parser recognizes it:
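The change goes in parse_model() in models/yolo.py: add C3CBAM to the module lists so the parser can build it from the YAML. A minimal sketch for YOLOv5 v6.x (the exact module list varies between releases):

```python
# models/yolo.py, inside parse_model() -- add C3CBAM to both lists:
if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv,
         MixConv2d, Focus, CrossConv, BottleneckCSP, C3, C3TR, C3SPP, C3Ghost,
         C3CBAM]:  # <-- C3CBAM added
    c1, c2 = ch[f], args[0]
    if c2 != no:  # if not output
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]
    if m in [BottleneckCSP, C3, C3TR, C3Ghost, C3CBAM]:  # <-- C3CBAM added
        args.insert(2, n)  # number of repeats
        n = 1
```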

3-1. Create the configuration file yolov5m_C3CBAM.yaml (C3CBAM inserted into the backbone; based on the original yolov5m.yaml). The backbone is identical to the one shown in section 3-3 below, with every C3 block replaced by C3CBAM; the head stays as in yolov5m.yaml.

Note: nc is the number of target classes; my task has two classes, so nc should be 2. The value 80 in the original file is the COCO class count used by the stock code.

3-2. Create the configuration file yolov5m_addlayer.yaml (backbone unchanged; the head gains a small-object detection head plus a set of small-object anchors; based on the original yolov5m.yaml):

```yaml
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 2  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [4,5, 8,10, 22,18]  # P2/4  added anchor set for the new P2 feature level
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3

   # added feature extraction layers ########
   [-1, 3, C3, [256, False]],  # 17
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],  # cat backbone P2

   # added detection layer
   [-1, 3, C3, [128, False]],  # 21 (P2/4-xsmall)

   [-1, 1, Conv, [128, 3, 2]],
   [[-1, 18], 1, Concat, [1]],  # cat head P3
   # end of additions

   [-1, 3, C3, [256, False]],  # 24 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 27 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 30 (P5/32-large)

   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5); indices count layers from 0, top to bottom (31 layers before Detect)
  ]
```
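To check that the layer indices in the Detect line are correct, you can build the model standalone; in recent YOLOv5 releases, models/yolo.py accepts a --cfg flag and prints the per-layer table while parsing:

```bash
python models/yolo.py --cfg models/yolov5m_addlayer.yaml
```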

3-3. Create the configuration file yolov5m_addlayer_C3CBAM.yaml (C3CBAM inserted into the backbone, small-object detection head and small-object anchors added to the head; based on the original yolov5m.yaml):

```yaml
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 2  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [4,5, 8,10, 22,18]  # P2/4  added anchor set for the new P2 feature level
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3CBAM, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3CBAM, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3CBAM, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3CBAM, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3

   # added feature extraction layers ########
   [-1, 3, C3, [256, False]],  # 17
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],  # cat backbone P2

   # added detection layer
   [-1, 3, C3, [128, False]],  # 21 (P2/4-xsmall)

   [-1, 1, Conv, [128, 3, 2]],
   [[-1, 18], 1, Concat, [1]],  # cat head P3
   # end of additions

   [-1, 3, C3, [256, False]],  # 24 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 27 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 30 (P5/32-large)

   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5); indices count layers from 0, top to bottom (31 layers before Detect)
  ]
```

4. Training: train with the usual command from the official repo, making sure to point --cfg at the new configuration file.
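A minimal example (a sketch: tinyperson.yaml is a placeholder for your own dataset config; the flags are the standard YOLOv5 train.py ones):

```bash
# --weights yolov5s.pt transfers whatever pretrained layers still match
# the modified architecture; use --weights '' to train from scratch
python train.py --img 640 --batch 16 --epochs 300 \
    --data tinyperson.yaml --cfg models/yolov5m_addlayer_C3CBAM.yaml \
    --weights yolov5s.pt --name yolov5s_addlayer_cbam
```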

The three configuration files correspond to three different experiments:

1. yolov5m_C3CBAM.yaml: improves YOLOv5 with spatial and channel attention.

2. yolov5m_addlayer.yaml: adds a small-object detection head, aimed at dense or small-object detection.

3. yolov5m_addlayer_C3CBAM.yaml: combines both improvements. CBAM is a plug-and-play 2D convolutional module, so its placement is flexible: it works not only in the backbone but also inside the feature pyramid of the head.

5. Comparing the training results of the three improved networks:

Use the results.csv file generated by each training run for the comparison.

For details, see the CSDN post "YOLO training results.csv visualization (comparing the original and improved models)".
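A minimal plotting sketch (assuming YOLOv5's default results.csv layout, whose column names carry leading spaces; the run directory names are placeholders for your own):

```python
import pandas as pd
import matplotlib.pyplot as plt

runs = {  # hypothetical run directories -- adjust to your own
    'yolov5s': 'runs/train/yolov5s/results.csv',
    'yolov5s_CBAM': 'runs/train/yolov5s_CBAM/results.csv',
    'yolov5s_add': 'runs/train/yolov5s_add/results.csv',
}
for name, path in runs.items():
    df = pd.read_csv(path)
    df.columns = df.columns.str.strip()  # YOLOv5 pads column names with spaces
    plt.plot(df['epoch'], df['metrics/mAP_0.5'], label=name)
plt.xlabel('epoch')
plt.ylabel('mAP@0.5')
plt.legend()
plt.savefig('map_comparison.png')
```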
