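YOLOv8 Improvement 003: Adding a Bidirectional Feature Pyramid Network (BiFPN) to the Neck + Adding a Small-Object Detection Head (large gains on small-object detection)

Paper title: "EfficientDet: Scalable and Efficient Object Detection"

Paper: https://arxiv.org/pdf/1911.09070

Official code: https://github.com/google/automl/tree/master/efficientdet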

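1. BiFPN Overview

BiFPN, short for "Bidirectional Feature Pyramid Network", is a neural network architecture widely used in computer vision tasks, particularly object detection and instance segmentation. It extends the Feature Pyramid Network (FPN) by introducing bidirectional connections between pyramid levels, so that information flows through the network both bottom-up and top-down.

How BiFPN works:

(1) Feature pyramid generation: the network first builds a feature pyramid by extracting features from several layers of the backbone (typically a convolutional network such as ResNet or, in EfficientDet, EfficientNet).

(2) Bidirectional connections: unlike the original FPN, which only has a top-down pathway, BiFPN connects adjacent pyramid levels in both directions. Information can flow from higher-level features to lower-level ones (top-down path) and from lower-level features to higher-level ones (bottom-up path).

(3) Feature integration: the bidirectional connections allow information from different pyramid levels to be merged in both directions, which helps capture multi-scale features effectively.

(4) Weighted feature fusion: BiFPN combines features from different levels with a weighted fusion mechanism. The fusion weights are learned during training, so the network itself decides how much each input contributes.

The bidirectional connections in BiFPN help capture feature representations across scales and improve the network's ability to handle objects of different sizes and complexity. This matters a great deal in object detection, where object sizes within an image can vary significantly.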
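
The weighted fusion in step (4) can be illustrated with a minimal, framework-free sketch. It follows the "fast normalized fusion" rule described in the EfficientDet paper, where each learnable weight is passed through ReLU and divided by the sum of all weights plus a small epsilon. The function names and the use of plain Python lists in place of feature tensors are assumptions made only for illustration.

```python
def fast_normalized_weights(raw_weights, eps=1e-4):
    """Clip weights at zero (ReLU) and normalize them so they sum to roughly 1."""
    clipped = [max(w, 0.0) for w in raw_weights]
    total = sum(clipped) + eps  # eps keeps the division numerically stable
    return [w / total for w in clipped]


def fuse(features, raw_weights, eps=1e-4):
    """Element-wise weighted sum of equally sized feature vectors."""
    weights = fast_normalized_weights(raw_weights, eps)
    return [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(len(features[0]))
    ]


if __name__ == "__main__":
    top_down = [1.0, 2.0, 3.0]  # stand-in for an upsampled higher-level feature
    lateral = [3.0, 2.0, 1.0]   # stand-in for a same-level backbone feature
    print(fuse([top_down, lateral], raw_weights=[1.0, 1.0]))
```

With equal raw weights each input contributes about one half, and a weight that training pushes below zero is clipped so that input is simply ignored.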

2. Project Environment

  • Interpreter: Python 3.9.19
  • Framework: PyTorch 2.0.0 + CUDA 11.8
  • OS: Win10 / Ubuntu 20.04

3. Core Code

import torch
import torch.nn as nn

__all__ = ['BiFPN_Concat']


def autopad(k, p=None, d=1):
    """
    Pads kernel to 'same' output shape, adjusting for optional dilation; returns padding size.

    `k`: kernel, `p`: padding, `d`: dilation.
    """
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initializes a standard convolution layer with optional batch normalization and activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Applies a convolution followed by batch normalization and an activation function to the input tensor `x`."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Applies a fused convolution and activation function to the input tensor `x`."""
        return self.act(self.conv(x))


class BiFPN_Concat(nn.Module):
    # Weighted fusion (element-wise sum, not channel concatenation) of 2 or 3 same-shape feature maps
    def __init__(self, c1, c2):
        super().__init__()
        # Learnable fusion weights: one set for two inputs, one set for three inputs
        self.w1_weight = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True)
        self.w2_weight = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True)
        self.epsilon = 0.0001  # avoids division by zero when normalizing the weights
        self.conv = Conv(c1, c2, 1, 1, 0)
        self.act = nn.ReLU()  # applied to the fused feature map, not to the weights

    def forward(self, x):  # x: list of 2 or 3 tensors with identical shape
        if len(x) == 2:
            w = self.w1_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)  # normalize weights to sum to ~1
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1]))
        elif len(x) == 3:
            w = self.w2_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)  # normalize weights to sum to ~1
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1] + weight[2] * x[2]))
        else:
            raise ValueError(f"BiFPN_Concat expects 2 or 3 inputs, got {len(x)}")
        return x
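
As a quick check of what autopad computes, the sketch below repeats the function (restricted to integer kernel sizes for brevity) and pairs it with the standard convolution output-size formula. conv_out_size is a hypothetical helper introduced here for illustration; it is not part of the original code.

```python
def autopad(k, p=None, d=1):
    """Same logic as the autopad above, for integer kernel sizes only."""
    if d > 1:
        k = d * (k - 1) + 1  # effective kernel size under dilation
    if p is None:
        p = k // 2  # 'same' padding for odd kernel sizes
    return p


def conv_out_size(n, k, s=1, p=None, d=1):
    """Output length of a convolution along one spatial axis (hypothetical helper)."""
    p = autopad(k, p, d)
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


if __name__ == "__main__":
    print(conv_out_size(640, k=3, s=2))      # 3x3 conv, stride 2: 640 -> 320
    print(conv_out_size(160, k=1, s=1))      # 1x1 conv, stride 1: size unchanged
    print(conv_out_size(80, k=3, s=1, d=2))  # dilated 3x3 conv: size unchanged
```

This is why the 1x1 convolution inside BiFPN_Concat leaves the spatial size of the fused feature map untouched, while the stride-2 3x3 convolutions in the network halve it.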

4. How to Add It

 Step 1: Create a new Python source file BiFPN.py under ultralytics/nn/add_modules/ and paste the BiFPN core code above into it.

 Step 2: Open __init__.py under ultralytics/nn/add_modules/ and register BiFPN_Concat.

from .BiFPN import BiFPN_Concat

 Step 3: Open tasks.py under ultralytics/nn/, find the parse_model function, and add the following branch to its chain of module-type checks.

# ============== BiFPN ==============
elif m is BiFPN_Concat:
    c2 = max([ch[x] for x in f])
# ===================================
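
The added branch tells parse_model how many channels a BiFPN_Concat layer outputs. Because the module sums its inputs element-wise rather than stacking them, the output width equals the input width instead of the sum of the input widths. A minimal sketch, assuming ch is the list of output channels of the layers built so far and f is the layer's "from" index list, with illustrative channel values for the YOLOv8s scale:

```python
# Output channels of layers 0-11 for the 's' scale (illustrative values, width = 0.50)
ch = [32, 64, 64, 128, 128, 256, 256, 512, 512, 512, 256, 256]
f = [-1, 6]  # 'from' field of layer 12: previous layer and backbone layer 6

c2_bifpn = max(ch[x] for x in f)   # a weighted sum keeps the channel count
c2_concat = sum(ch[x] for x in f)  # a plain channel concatenation would stack the inputs

print(c2_bifpn, c2_concat)
```

The layers that follow a BiFPN_Concat therefore see half as many input channels as they would after an ordinary concatenation of two equally wide inputs.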

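After adding this branch, you also need to import the BiFPN_Concat module in tasks.py (the original post illustrates this with a screenshot).

 Step 4: Create a new YAML file yolov8-BiFPN-P2-TODHead.yaml under ultralytics\cfg\models\add\ and paste the contents of yolov8.yaml into it. First add the small-object detection head (see the tutorial "YOLOv8 Improvement 002: Adding a Small-Object Detection Head (large gains on small-object detection)"), then modify the Neck of the network, i.e. add the bidirectional feature pyramid network BiFPN.

Below I provide three variants (mainly aimed at the three-head version). Swap in your own dataset; which variant actually improves accuracy is something you will need to verify through your own experiments.

 Variant 1 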

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)  Note: the BiFPN_Concat args are written for YOLOv8s, whose channel counts are half the default values shown here!
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128

  - [4, 1, Conv, [128, 1, 1]] # 22                              · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23                             · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3     · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small)                       · 80 × 80 × 256

  - [6, 1, Conv, [256, 1, 1]] # 26                              · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27                             · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4   · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium)                     · 40 × 40 × 512

  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)
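
The BiFPN_Concat arguments in the YAML above (256, 128, 64) match the YOLOv8s widths, as the note on layer 12 indicates, so they need to be changed when another scale is used. The sketch below estimates the values from the scales table. It assumes channel counts are scaled as min(c, max_channels) * width and rounded up to a multiple of 8, which is my understanding of how parse_model scales its other layers. scaled_channels is a hypothetical helper, not an Ultralytics function.

```python
import math

# (depth, width, max_channels) per scale, copied from the YAML above
SCALES = {
    "n": (0.33, 0.25, 1024),
    "s": (0.33, 0.50, 1024),
    "m": (0.67, 0.75, 768),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.25, 512),
}


def scaled_channels(c, scale, divisor=8):
    """Estimate the real channel count for a nominal width c (hypothetical helper)."""
    _, width, max_channels = SCALES[scale]
    return math.ceil(min(c, max_channels) * width / divisor) * divisor


if __name__ == "__main__":
    for scale in SCALES:
        # Nominal widths at the P4, P3 and P2 fusion points
        print(scale, [scaled_channels(c, scale) for c in (512, 256, 128)])
```

Under these assumptions the YOLOv8n variant would use 128, 64 and 32 in place of 256, 128 and 64.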

 Variant 2 

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128

  - [2, 1, Conv, [128, 3, 2]] # 22                              · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23                             · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3     · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small)                       · 80 × 80 × 256

  - [4, 1, Conv, [256, 3, 2]] # 26                              · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27                             · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4   · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium)                     · 40 × 40 × 512

  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)

 Variant 3 

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128

  - [-1, 1, Conv, [128, 3, 2]] # 22                             · 80 × 80 × 128
  - [[-1, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3         · 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 24 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [256, 3, 2]] # 25                             · 40 × 40 × 256
  - [[-1, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4       · 40 × 40 × 256(25) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 27 (P4/16-medium)                     · 40 × 40 × 512

  - [[21, 24, 27], 1, Detect, [nc]] # Detect(P2, P3, P4)

5. Training Code

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    
    # model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-01.yaml')
    # model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-02.yaml')
    model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-03.yaml')

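    # model.load('yolov8n.pt')  # Optionally load pretrained weights; not recommended for research work, since it makes further accuracy gains hard to obtain

    model.train(
        data=r'The YAML file address of your own dataset.',
        cache=False,
        imgsz=640,
        epochs=300,
        single_cls=False,  # whether to train as single-class detection
        batch=2,
        close_mosaic=0,
        workers=0,
        device='0',
        optimizer='SGD',   # using SGD
        # resume='runs/train/exp/weights/last.pt',   # to resume training, set this to the path of last.pt
        amp=False,                                   # disabling AMP can help if the training loss becomes NaN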
        project='runs/train',
        name='exp',
    )
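
The training script points at file names beginning with yolov8s, and the BiFPN_Concat arguments are written for that scale. As far as I understand, Ultralytics picks the scale from the letter that follows "yolov8" in the YAML file name. The sketch below illustrates that idea with a hypothetical guess_scale function and regular expression; it approximates the library's behavior and is not its actual code.

```python
import re


def guess_scale(filename):
    """Return the scale letter following 'yolov<digits>' in a file name, or '' if absent."""
    match = re.search(r"yolov\d+([nslmx])", filename)
    return match.group(1) if match else ""


if __name__ == "__main__":
    print(guess_scale("yolov8s-BiFPN-P2-TODHead-03.yaml"))  # s
    print(guess_scale("yolov8-BiFPN-P2-TODHead.yaml"))      # empty string: no scale letter
```

If the scale letter is changed, the BiFPN_Concat channel arguments in the YAML have to be changed to match, otherwise the 1x1 convolution inside the module will receive an unexpected number of channels.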

  Feel free to subscribe to my column and learn YOLO together! (o^^o)
