【YOLOv8创新升级1】：特征金字塔网络 BiFPN与YOLOv8相结合（代码实现）

qq_42322398

已于 2025-03-21 16:34:13 修改

阅读量2.4k

点赞数 16

文章标签：深度学习计算机视觉 python

于 2024-11-01 16:06:57 首次发布

本文链接：https://blog.csdn.net/qq_42322398/article/details/143433952

版权

摘要

Bi-directional Feature Pyramid Network（BIFPN）是一种用于多尺度目标检测的高效特征金字塔网络架构。BIFPN通过引入双向特征融合机制，增强了不同层次特征之间的信息交流，显著提高了网络对多尺度物体的识别能力。该架构利用底向上的路径传递高分辨率特征，同时通过顶向下的路径增强低分辨率特征，从而实现高效的信息融合。BIFPN的设计不仅提高了特征的表达能力，还减少了计算成本和模型复杂度。在多项目标检测任务中，BIFPN表现出优越的性能，相比于传统特征金字塔网络，其在准确性和推理速度上均有显著提升。
在这里插入图片描述

创新点

1、双向特征融合：BIFPN引入了双向路径的特征融合机制，允许高分辨率和低分辨率特征之间的双向信息流动，增强了不同层次特征的互补性。

2、高效的信息整合：通过底向上的路径和顶向下的路径，BIFPN有效整合了来自不同尺度的特征，提升了对多尺度目标的检测能力。

3、 改进的计算效率：BIFPN在保留高性能的同时，通过设计优化减少了计算复杂度，允许在保持准确性的基础上加快推理速度。

4、自适应特征重用：BIFPN支持特征的自适应重用，使得网络能够更加灵活地适应不同的输入和任务，提高了模型的通用性。

5、模块化设计：该架构采用模块化设计，使其易于集成到现有的目标检测框架中，促进了模型的可扩展性和实用性。

代码实现

核心代码

import torch
import torch.nn as nn
 
__all__ = ['BiFPN_Concat']
 
 def autopad(k, p=None, d=1):
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
 
 class Conv(nn.Module):
    default_act = nn.SiLU()  # default activation
     def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
 
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
 
    def forward_fuse(self, x):
        return self.act(self.conv(x))
 
 class BiFPN_Concat(nn.Module):
    def __init__(self, c1, c2):
        super(BiFPN_Concat, self).__init__()
        self.w1_weight = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True)
        self.w2_weight = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True)
        self.epsilon = 0.0001
        self.conv = Conv(c1, c2, 1, 1, 0)
        self.act = nn.ReLU()
 
    def forward(self, x):  
        if len(x) == 2:
            w = self.w1_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1]))
        elif len(x) == 3:
            w = self.w2_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1] + weight[2] * x[2]))
        return x

实现

在ultralytics/nn/modules/下新建SPDConv.py，并将代码写入。
在这里插入图片描述
在ultralytics\nn\modules_init_.py中加入

from .BiFPN import *

ultralytics/nn/ 目录下的 tasks.py 文件，找到 parse_model 函数添加以下代码

elif m is BiFPN_Concat:
            c2 = max([ch[x] for x in f])

在这里插入图片描述
添加头文件

在 ultralytics\cfg\models\Yolov11\ 目录下新建 YAML 文件 yolov11-BIFPN.yaml，复制下列代码粘贴至此处,添加BIFPN方式一:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
 
# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
 
# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024
 
# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6) 注：YOLOv8s通道数是默认参数的一半！
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512
 
  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256
 
  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128
 
  - [4, 1, Conv, [128, 1, 1]] # 22                              · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23                             · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3     · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small)                       · 80 × 80 × 256
 
  - [6, 1, Conv, [256, 1, 1]] # 26                              · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27                             · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4   · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium)                     · 40 × 40 × 512
 
  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)

添加BIFPN方式二:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
 
# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
 
# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024
 
# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512
 
  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256
 
  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128
 
  - [2, 1, Conv, [128, 3, 2]] # 22                              · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23                             · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3     · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small)                       · 80 × 80 × 256
 
  - [4, 1, Conv, [256, 3, 2]] # 26                              · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27                             · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4   · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium)                     · 40 × 40 × 512
 
  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)

添加BIFPN方式三:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
 
# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
 
# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]   # 0-P1/2     · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4     · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]]   # 2          · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8     · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]]   # 4          · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16    · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]]   # 6          · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32    · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]]  # 8          · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]]    # 9          · 20 × 20 × 1024
 
# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10                             · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11             · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4    · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13                                    · 40 × 40 × 512
 
  - [-1, 1, Conv, [256, 1, 1]] # 14                             · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15             · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3    · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)                       · 80 × 80 × 256
 
  - [-1, 1, Conv, [128, 1, 1]] # 18                             · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19             · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2      · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny)                        · 160 × 160 × 128
 
  - [-1, 1, Conv, [128, 3, 2]] # 22                             · 80 × 80 × 128
  - [[-1, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3         · 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 24 (P3/8-small)                       · 80 × 80 × 256
 
  - [-1, 1, Conv, [256, 3, 2]] # 25                             · 40 × 40 × 256
  - [[-1, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4       · 40 × 40 × 256(25) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 27 (P4/16-medium)                     · 40 × 40 × 512
 
  - [[21, 24, 27], 1, Detect, [nc]] # Detect(P2, P3, P4)

训练代码

在 YOLOv8项目的主目录下新建 Python 源文件 train.py，将以下代码复制粘贴至该文件中。

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO
 
if __name__ == '__main__':
 
    model = YOLO(r'ultralytics\cfg\models\v8\yolov8-BIFPN.yaml') #SPDCOnv
    model.train(
        data=r'myway.yaml',
        cache=False,
        imgsz=960,
        epochs=300,
        single_cls=False,  # 是否是单类别检测
        batch=4,
        close_mosaic=0,
        workers=0,
        device='0',
        optimizer='SGD',   # using SGD
        # resume='runs/train/exp/weights/last.pt',   # 如过想续训就设置 last.pt 的地址
        amp=False,                                   # 如果出现训练损失为 Nan 可以关闭 amp
        project='runs/train',
        name='exp',
    )