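YOLOv8 Improvement 003: Adding a Bidirectional Feature Pyramid Network (BiFPN) to the Neck + Adding a Small-Object Detection Head (large gains on small-object detection)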

Keqi19

Last modified: 2024-08-08 12:46:15

Category: YOLO Object Detection Algorithm Improvements. Tags: YOLO, yolov8

First published: 2024-08-06 16:32:49

Article link: https://blog.csdn.net/m0_46496775/article/details/140940645

Paper: "EfficientDet: Scalable and Efficient Object Detection"

Paper URL: https://arxiv.org/pdf/1911.09070

Official code: https://github.com/google/automl/tree/master/efficientdet

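1. Introduction to BiFPN

BiFPN, the Bidirectional Feature Pyramid Network, is a neural network architecture widely used in computer vision tasks, particularly object detection and instance segmentation. It extends the Feature Pyramid Network (FPN) by introducing bidirectional connections between pyramid levels, so that information flows through the network both top-down and bottom-up.

How BiFPN works:

(1) Feature pyramid generation: the network first builds a feature pyramid by extracting features from several layers of the backbone (usually a convolutional neural network such as ResNet).

(2) Bidirectional connections: unlike a traditional FPN, BiFPN introduces bidirectional connections between adjacent levels of the feature pyramid. Information can flow from higher-level features to lower-level features (the top-down path) and from lower-level features to higher-level features (the bottom-up path).

(3) Feature integration: the bidirectional connections allow information from different pyramid levels to be integrated in both directions, which helps capture multi-scale features effectively.

(4) Weighted feature fusion: BiFPN combines features from different levels with a weighted fusion mechanism. The fusion weights are learned during training, so the network itself decides how much each input contributes.

The bidirectional connections in BiFPN help capture feature representations at different scales and improve the network's ability to handle objects of varying size and complexity. This matters especially in object detection, where object sizes within an image can differ significantly.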
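
The weighted fusion in step (4) can be illustrated with a minimal sketch. It assumes the "fast normalized fusion" form described in the EfficientDet paper: each learnable weight is clamped to be non-negative with ReLU and then divided by the sum of all weights plus a small epsilon. The function names and the flat lists standing in for feature maps are illustrative only and do not come from any library.

```python
def fast_normalized_weights(raw_weights, eps=1e-4):
    """Clamp weights at zero (ReLU), then normalize so they sum to roughly 1."""
    clamped = [max(w, 0.0) for w in raw_weights]
    total = sum(clamped) + eps
    return [w / total for w in clamped]


def fuse(features, raw_weights, eps=1e-4):
    """Weighted sum of same-shaped features (flat lists stand in for tensors)."""
    weights = fast_normalized_weights(raw_weights, eps)
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(len(features[0]))]


# Two toy "feature maps" with the same shape.
p4_top_down = [1.0, 2.0, 3.0]
p4_backbone = [3.0, 2.0, 1.0]

weights = fast_normalized_weights([1.0, 1.0])
fused = fuse([p4_top_down, p4_backbone], [1.0, 1.0])
print(weights)  # two nearly equal weights, each slightly below 0.5
print(fused)    # every element is close to 2.0
```

With equal initial weights the fusion is close to a plain average; training then shifts the weights toward whichever input is more useful at that level.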

2. Project environment

Interpreter: Python 3.9.19
Framework: PyTorch 2.0.0 + CUDA 11.8
OS: Win10 / Ubuntu 20.04

3. Core code

import torch
import torch.nn as nn

__all__ = ['BiFPN_Concat']


def autopad(k, p=None, d=1):
    """
    Pads kernel to 'same' output shape, adjusting for optional dilation; returns padding size.

    `k`: kernel, `p`: padding, `d`: dilation.
    """
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initializes a standard convolution layer with optional batch normalization and activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Applies a convolution followed by batch normalization and an activation function to the input tensor `x`."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Applies a fused convolution and activation function to the input tensor `x`."""
        return self.act(self.conv(x))


class BiFPN_Concat(nn.Module):
    # Weighted fusion of 2 or 3 feature maps: an element-wise weighted sum, not a channel concatenation
    def __init__(self, c1, c2):
        super().__init__()
        self.w1_weight = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True)  # weights for 2 inputs
        self.w2_weight = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True)  # weights for 3 inputs
        self.epsilon = 0.0001  # avoids division by zero when normalizing the weights
        self.conv = Conv(c1, c2, 1, 1, 0)
        self.act = nn.ReLU()

    def forward(self, x):  # x: list of 2 or 3 tensors that share the same shape
        if len(x) == 2:
            w = self.w1_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)  # normalize weights to sum to ~1
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1]))
        elif len(x) == 3:
            w = self.w2_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)  # normalize weights to sum to ~1
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1] + weight[2] * x[2]))
        else:
            raise ValueError(f"BiFPN_Concat expects 2 or 3 inputs, got {len(x)}")
        return x

4. How to add the module

Step 1: In the ultralytics/nn/add_modules/ directory, create a new Python source file BiFPN.py and paste the BiFPN core code above into it.

Step 2: Open __init__.py in the ultralytics/nn/add_modules/ directory and add BiFPN_Concat.

from .BiFPN import BiFPN_Concat

Step 3: Open tasks.py in the ultralytics/nn/ directory, find the parse_model function, and add the following code.

# ============== BiFPN ==============
elif m is BiFPN_Concat:
    c2 = max([ch[x] for x in f])
# ===================================

Once this is added, the BiFPN_Concat module must also be imported in tasks.py (the original post showed this step in a screenshot).
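
To make the branch in step 3 concrete, here is a simplified sketch of the bookkeeping it performs. It assumes that `ch` is the list of output channels of the layers built so far and that `f` is the layer's `from` field, as in Ultralytics' `parse_model`. The channel values are illustrative, chosen to resemble a YOLOv8s-sized model.

```python
# Output channels of layers 0..11 built so far (illustrative values).
ch = [32, 64, 64, 128, 128, 256, 256, 512, 512, 512, 256, 256]

f = [-1, 6]  # 'from' field: the previous layer (11) and backbone layer 6

# Same expression as the parse_model branch for BiFPN_Concat.
c2 = max([ch[x] for x in f])
print(c2)

# Unlike a plain Concat, the inputs are summed, so channel counts are not
# added together; every input must already have the same channel count.
same_channels = len({ch[x] for x in f}) == 1

ch.append(c2)  # layer 12 now reports c2 channels to the layers after it
```

A standard Concat layer would instead record the sum of the input channels, which is why BiFPN_Concat needs its own branch.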

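Step 4: In the ultralytics\cfg\models\add\ directory, create a new YAML file yolov8-BiFPN-P2-TODHead.yaml and paste in the contents of yolov8.yaml. First add the small-object detection head (see the tutorial "YOLOv8 Improvement 002: Adding a Small-Object Detection Head (large gains on small-object detection)"), then modify the Neck of the network, that is, add the bidirectional feature pyramid network BiFPN.

I provide three variants here (mainly for the three-head version). Swap in your own dataset; which variant actually improves accuracy is something you will need to test yourself.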

Variant 1

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6); note: YOLOv8s uses half the default channel counts, hence [256, 256]
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128

  - [4, 1, Conv, [128, 1, 1]] # 22                              · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23                             · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3     · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small)                       · 80 × 80 × 256

  - [6, 1, Conv, [256, 1, 1]] # 26                              · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27                             · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4   · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium)                     · 40 × 40 × 512

  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)
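
The note on the first BiFPN_Concat layer says that YOLOv8s uses half the default channel counts. The sketch below shows where that halving comes from, assuming the width-scaling rule Ultralytics applies to Conv and C2f layers (cap at max_channels, multiply by width, round up to a multiple of 8). It also assumes, as the branch added in step 3 suggests, that the BiFPN_Concat args are passed through unscaled, which is why they are written as already-scaled values. The helper name scaled_channels is hypothetical.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(c2, width, max_channels):
    """Channel count after width scaling (hypothetical helper name)."""
    return make_divisible(min(c2, max_channels) * width, 8)


# [depth, width, max_channels] copied from the `scales` section of the YAML.
scales = {"n": (0.33, 0.25, 1024), "s": (0.33, 0.50, 1024), "m": (0.67, 0.75, 768)}

for name, (_, width, max_ch) in scales.items():
    p4, p3, p2 = (scaled_channels(c, width, max_ch) for c in (512, 256, 128))
    print(name, p4, p3, p2)
```

For scale s this gives 256, 128, and 64, matching the [256, 256], [128, 128], and [64, 64] args in the YAML. Under the same assumption, training another scale (for example n) would require changing those args to the corresponding values.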

Variant 2

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128

  - [2, 1, Conv, [128, 3, 2]] # 22                              · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23                             · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3     · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small)                       · 80 × 80 × 256

  - [4, 1, Conv, [256, 3, 2]] # 26                              · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27                             · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4   · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium)                     · 40 × 40 × 512

  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)

Variant 3

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128

  - [-1, 1, Conv, [128, 3, 2]] # 22                             · 80 × 80 × 128
  - [[-1, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3         · 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 24 (P3/8-small)                       · 80 × 80 × 256

  - [-1, 1, Conv, [256, 3, 2]] # 25                             · 40 × 40 × 256
  - [[-1, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4       · 40 × 40 × 256(25) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 27 (P4/16-medium)                     · 40 × 40 × 512

  - [[21, 24, 27], 1, Detect, [nc]] # Detect(P2, P3, P4)
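
Because BiFPN_Concat sums its inputs, every fused pair must share the same resolution and channel count. As a sanity check, the sketch below traces Variant 3 by hand. It is a simplified model, not the Ultralytics parser: each layer is reduced to "down" (stride 2), "up" (2x upsample), "keep", or "fuse", the channel values are the unscaled ones from the YAML comments, and a 640 × 640 input is assumed.

```python
# Each layer: (from, kind, out_channels), in the same order as the YAML.
LAYERS = [
    (-1, "down", 64), (-1, "down", 128), (-1, "keep", 128),          # 0-2
    (-1, "down", 256), (-1, "keep", 256),                            # 3-4
    (-1, "down", 512), (-1, "keep", 512),                            # 5-6
    (-1, "down", 1024), (-1, "keep", 1024), (-1, "keep", 1024),      # 7-9
    (-1, "keep", 512), (-1, "up", 512),                              # 10-11
    ([-1, 6], "fuse", 512), (-1, "keep", 512),                       # 12-13
    (-1, "keep", 256), (-1, "up", 256),                              # 14-15
    ([-1, 4], "fuse", 256), (-1, "keep", 256),                       # 16-17
    (-1, "keep", 128), (-1, "up", 128),                              # 18-19
    ([-1, 2], "fuse", 128), (-1, "keep", 128),                       # 20-21
    (-1, "down", 128), ([-1, 18], "fuse", 128), (-1, "keep", 256),   # 22-24
    (-1, "down", 256), ([-1, 14], "fuse", 256), (-1, "keep", 512),   # 25-27
]


def trace(layers, size=640):
    """Return a list of (spatial_size, channels), one entry per layer."""
    shapes = []
    for i, (src, kind, c_out) in enumerate(layers):
        sources = src if isinstance(src, list) else [src]
        if i == 0:
            inputs = [(size, 3)]  # the RGB input image
        else:
            # Negative indices are relative to the current layer, as in the YAML.
            inputs = [shapes[s] if s >= 0 else shapes[i + s] for s in sources]
        if kind == "fuse":
            # A weighted sum needs identical resolution and channel count.
            assert len(set(inputs)) == 1, f"layer {i}: mismatched inputs {inputs}"
        hw = inputs[0][0]
        if kind == "down":
            hw //= 2
        elif kind == "up":
            hw *= 2
        shapes.append((hw, c_out))
    return shapes


shapes = trace(LAYERS)
for idx in (21, 24, 27):  # the three layers fed to Detect
    print(idx, shapes[idx])
```

The same table-driven check can be adapted to Variants 1 and 2 by editing the last rows of LAYERS to match their `from` fields.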

5. Training code

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':

    # Pick one of the three variants.
    # model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-01.yaml')
    # model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-02.yaml')
    model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-03.yaml')

    # model.load('yolov8s.pt')  # optionally load pretrained weights (match the model scale); the author advises against it for research, saying it makes further accuracy gains hard to obtain

    model.train(
        data=r'path/to/your_dataset.yaml',  # YAML file of your own dataset
        cache=False,
        imgsz=640,
        epochs=300,
        single_cls=False,  # whether to treat the task as single-class detection
        batch=2,
        close_mosaic=0,
        workers=0,
        device='0',
        optimizer='SGD',   # using SGD
        # resume='runs/train/exp/weights/last.pt',   # to resume training, point this at last.pt
        amp=False,                                   # disabling AMP can help if the training loss becomes NaN
        project='runs/train',
        name='exp',
    )

You are welcome to subscribe to my column and learn YOLO together! (o^^o)
