Paper: EfficientDet: Scalable and Efficient Object Detection
Paper URL: https://arxiv.org/pdf/1911.09070
Official code: https://github.com/google/automl/tree/master/efficientdet
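1. Introduction to BiFPN
BiFPN, short for Bidirectional Feature Pyramid Network, is a neural network architecture commonly used in computer vision tasks, especially object detection and instance segmentation. It extends the Feature Pyramid Network (FPN) by introducing bidirectional connections between pyramid levels, so that information flows through the network both bottom-up and top-down.
How BiFPN works:
(1) Feature pyramid generation: the network first builds a feature pyramid by extracting features from several layers of a backbone network (a convolutional network such as ResNet, or EfficientNet in the EfficientDet paper).
(2) Bidirectional connections: unlike a conventional FPN, BiFPN adds bidirectional connections between adjacent pyramid levels. Information can flow from higher-level features to lower-level features (the top-down path) and from lower-level features to higher-level features (the bottom-up path).
(3) Feature integration: the bidirectional connections let information from different pyramid levels be merged in both directions, which helps capture multi-scale features effectively.
(4) Weighted feature fusion: BiFPN combines features from different levels with a weighted fusion mechanism. The fusion weights are learned during training, so the network itself decides how much each input contributes.
The bidirectional connections in BiFPN help capture feature representations at different scales and improve the network's ability to handle objects of varying size and complexity. This matters especially in object detection, where object sizes within a single image can differ significantly.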
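The weighted fusion in step (4) can be sketched in a few lines. The EfficientDet paper calls it fast normalized fusion: each learnable weight is kept non-negative with a ReLU and then divided by the sum of all weights plus a small ε = 0.0001. The snippet below is a minimal, framework-free illustration on plain Python lists; the function name `fast_normalized_fusion` and the toy inputs are assumptions made for this example and do not come from any library.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with non-negative, normalized weights.

    features: list of lists of floats, all with the same length.
    weights:  one raw (learnable) scalar per input feature.
    """
    # ReLU on the raw weights keeps every normalized weight in [0, 1].
    w = [max(0.0, wi) for wi in weights]
    total = sum(w) + eps  # eps avoids division by zero
    norm = [wi / total for wi in w]
    # Element-wise weighted sum across the inputs.
    size = len(features[0])
    fused = [sum(n * f[i] for n, f in zip(norm, features)) for i in range(size)]
    return fused, norm


# Toy example: two "feature maps", each flattened to two values.
fused, norm = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], weights=[1.0, 1.0])
# With equal weights, each normalized weight is slightly below 0.5 because of eps.
print(norm, fused)
```

Compared with a softmax over the weights, this normalization needs only a division, which is the efficiency argument the paper makes for it.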
2. Project environment
- Interpreter: Python 3.9.19
- Framework: PyTorch 2.0.0 + CUDA 11.8
- OS: Windows 10 / Ubuntu 20.04
3. Core code
import torch
import torch.nn as nn

__all__ = ['BiFPN_Concat']


def autopad(k, p=None, d=1):
    """
    Pads kernel to 'same' output shape, adjusting for optional dilation; returns padding size.

    `k`: kernel, `p`: padding, `d`: dilation.
    """
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initializes a standard convolution layer with optional batch normalization and activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Applies a convolution followed by batch normalization and an activation function to the input tensor `x`."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Applies a fused convolution and activation function to the input tensor `x`."""
        return self.act(self.conv(x))


class BiFPN_Concat(nn.Module):
    # Weighted element-wise fusion of 2 or 3 same-shape feature maps, followed by a 1x1 Conv.
    # Despite the name, the inputs are added, not concatenated.
    def __init__(self, c1, c2):
        super(BiFPN_Concat, self).__init__()
        # One learnable weight per input, for the 2-input and 3-input cases
        self.w1_weight = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True)
        self.w2_weight = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True)
        self.epsilon = 0.0001  # avoids division by zero when normalizing the weights
        self.conv = Conv(c1, c2, 1, 1, 0)
        self.act = nn.ReLU()

    def forward(self, x):  # x is a list of 2 or 3 feature maps with identical shape
        if len(x) == 2:
            w = self.w1_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)  # normalize the weights
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1]))
        elif len(x) == 3:
            w = self.w2_weight
            weight = w / (torch.sum(w, dim=0) + self.epsilon)  # normalize the weights
            x = self.conv(self.act(weight[0] * x[0] + weight[1] * x[1] + weight[2] * x[2]))
        else:
            raise ValueError(f'BiFPN_Concat expects 2 or 3 inputs, got {len(x)}')
        return x
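As a quick check of what `autopad` computes, the sketch below reproduces its integer-kernel case and plugs the result into the standard convolution output-size formula. `same_padding` and `conv_out_size` are hypothetical helper names used only for this illustration; they are not part of the module above.

```python
def same_padding(k, d=1):
    """Padding that preserves spatial size at stride 1 for an odd kernel k and dilation d."""
    k_eff = d * (k - 1) + 1  # effective kernel size once dilation is applied
    return k_eff // 2


def conv_out_size(n, k, s=1, p=0, d=1):
    """Output length along one axis: floor((n + 2p - d(k-1) - 1) / s) + 1."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


# Stride 1: the spatial size is preserved, with or without dilation.
print(conv_out_size(40, k=3, s=1, p=same_padding(3)))
print(conv_out_size(40, k=3, s=1, p=same_padding(3, d=2), d=2))
# Stride 2 with a 3x3 kernel halves the size, as in the downsampling Conv layers.
print(conv_out_size(640, k=3, s=2, p=same_padding(3)))
```

This is why the 1×1 `Conv` inside `BiFPN_Concat` can pass padding 0 explicitly: for k = 1 the "same" padding is already zero.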
4. How to add the module
Step 1: In the ultralytics/nn/add_modules/ directory, create a new Python source file BiFPN.py and paste the BiFPN core code above into it.
Step 2: Open the __init__.py file in ultralytics/nn/add_modules/ and add BiFPN_Concat:
from .BiFPN import BiFPN_Concat
Step 3: Open tasks.py in the ultralytics/nn/ directory, find the parse_model function, and add the following branch:
# ============== BiFPN ==============
elif m is BiFPN_Concat:
    c2 = max([ch[x] for x in f])
# ===================================
After adding this branch, BiFPN_Concat must also be imported in tasks.py.
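The branch added to parse_model takes the maximum of the input channel counts rather than their sum. A plausible reading is that BiFPN_Concat adds its inputs element-wise, so all inputs share one channel count, whereas a true concatenation stacks channels. The sketch below contrasts the two rules. It assumes, as in parse_model, that `ch` lists the output channels of the layers built so far and that `f` holds the "from" indices (negative values count back from the current layer). The channel values are assumed for the 's' scale and are for illustration only.

```python
def concat_out_channels(ch, f):
    """Channel-wise concatenation: the input channel counts add up."""
    return sum(ch[x] for x in f)


def bifpn_out_channels(ch, f):
    """Element-wise weighted addition: the inputs share one channel count."""
    return max(ch[x] for x in f)


# Assumed output channels of layers 0..11 for the 's' scale (illustrative values).
ch = [32, 64, 64, 128, 128, 256, 256, 512, 512, 512, 256, 256]
f = [-1, 6]  # layer 12 reads the previous layer (11) and backbone layer 6

print(concat_out_channels(ch, f))
print(bifpn_out_channels(ch, f))
```

Under these assumptions the fused map keeps 256 channels, which matches the `[256, 256]` arguments given to the first BiFPN_Concat layer in the YAML files.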
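Step 4: In the ultralytics\cfg\models\add\ directory, create a new YAML file yolov8-BiFPN-P2-TODHead.yaml and paste in the contents of yolov8.yaml. First add the small-object detection head (see the tutorial "YOLOv8 Improvement 002: Adding a Small-Object Detection Head"), then modify the Neck of the network by adding the BiFPN.
Three variants are provided here (mainly for the three-head version). Substitute your own dataset; which variant actually improves accuracy has to be determined by your own experiments.
Variant 1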
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]] # 2 · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]] # 4 · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]] # 6 · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]] # 8 · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]] # 9 · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10 · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11 · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4 · 40 × 40 × 512(11) + 40 × 40 × 512(6) Note: for YOLOv8s the channel counts are half the defaults!
  - [-1, 3, C2f, [512]] # 13 · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14 · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15 · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3 · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small) · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18 · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19 · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2 · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny) · 160 × 160 × 128

  - [4, 1, Conv, [128, 1, 1]] # 22 · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23 · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3 · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small) · 80 × 80 × 256

  - [6, 1, Conv, [256, 1, 1]] # 26 · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27 · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4 · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium) · 40 × 40 × 512

  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)
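The note on the first BiFPN_Concat layer says that YOLOv8s uses half the default channel counts. The sketch below shows where that factor comes from. It assumes the width-scaling rule Ultralytics applies when parsing a model YAML: cap the nominal channels at max_channels, multiply by the width factor, and round up to a multiple of 8. Both helpers are re-implemented here for illustration and should be read as an approximation of that rule, not as the library's code.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(c, width, max_channels):
    """Actual channel count for a nominal count c under a given scale (assumed rule)."""
    return make_divisible(min(c, max_channels) * width, 8)


# 's' scale from the YAML: width = 0.50, max_channels = 1024
for nominal in (128, 256, 512, 1024):
    print(nominal, '->', scaled_channels(nominal, 0.50, 1024))
```

Because the BiFPN_Concat arguments in the YAML are passed through as written, the values `[256, 256]`, `[128, 128]` and `[64, 64]` appear to be tied to the 's' scale; for another scale they would likely need to be recomputed with the same rule.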
Variant 2
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]] # 2 · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]] # 4 · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]] # 6 · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]] # 8 · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]] # 9 · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10 · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11 · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4 · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13 · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14 · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15 · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3 · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small) · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18 · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19 · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2 · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny) · 160 × 160 × 128

  - [2, 1, Conv, [128, 3, 2]] # 22 · 80 × 80 × 128
  - [-2, 1, Conv, [128, 3, 2]] # 23 · 80 × 80 × 128
  - [[-1, -2, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3 · 80 × 80 × 128(23) + 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 25 (P3/8-small) · 80 × 80 × 256

  - [4, 1, Conv, [256, 3, 2]] # 26 · 40 × 40 × 256
  - [-2, 1, Conv, [256, 3, 2]] # 27 · 40 × 40 × 256
  - [[-1, -2, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4 · 40 × 40 × 256(27) + 40 × 40 × 256(26) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 29 (P4/16-medium) · 40 × 40 × 512

  - [[21, 25, 29], 1, Detect, [nc]] # Detect(P2, P3, P4)
Variant 3
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P2-P4 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2 · 320 × 320 × 64
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4 · 160 × 160 × 128
  - [-1, 3, C2f, [128, True]] # 2 · 160 × 160 × 128
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8 · 80 × 80 × 256
  - [-1, 6, C2f, [256, True]] # 4 · 80 × 80 × 256
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16 · 40 × 40 × 512
  - [-1, 6, C2f, [512, True]] # 6 · 40 × 40 × 512
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32 · 20 × 20 × 1024
  - [-1, 3, C2f, [1024, True]] # 8 · 20 × 20 × 1024
  - [-1, 1, SPPF, [1024, 5]] # 9 · 20 × 20 × 1024

# YOLOv8.0-P2 head
head:
  - [-1, 1, Conv, [512, 1, 1]] # 10 · 20 × 20 × 512
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11 · 40 × 40 × 512
  - [[-1, 6], 1, BiFPN_Concat, [256, 256]] # cat backbone P4 · 40 × 40 × 512(11) + 40 × 40 × 512(6)
  - [-1, 3, C2f, [512]] # 13 · 40 × 40 × 512

  - [-1, 1, Conv, [256, 1, 1]] # 14 · 40 × 40 × 256
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15 · 80 × 80 × 256
  - [[-1, 4], 1, BiFPN_Concat, [128, 128]] # cat backbone P3 · 80 × 80 × 256(15) + 80 × 80 × 256(4)
  - [-1, 3, C2f, [256]] # 17 (P3/8-small) · 80 × 80 × 256

  - [-1, 1, Conv, [128, 1, 1]] # 18 · 80 × 80 × 128
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19 · 160 × 160 × 128
  - [[-1, 2], 1, BiFPN_Concat, [64, 64]] # cat backbone P2 · 160 × 160 × 128(19) + 160 × 160 × 128(2)
  - [-1, 3, C2f, [128]] # 21 (P2/4-tiny) · 160 × 160 × 128

  - [-1, 1, Conv, [128, 3, 2]] # 22 · 80 × 80 × 128
  - [[-1, 18], 1, BiFPN_Concat, [64, 64]] # cat head P3 · 80 × 80 × 128(22) + 80 × 80 × 128(18)
  - [-1, 3, C2f, [256]] # 24 (P3/8-small) · 80 × 80 × 256

  - [-1, 1, Conv, [256, 3, 2]] # 25 · 40 × 40 × 256
  - [[-1, 14], 1, BiFPN_Concat, [128, 128]] # cat head P4 · 40 × 40 × 256(25) + 40 × 40 × 256(14)
  - [-1, 3, C2f, [512]] # 27 (P4/16-medium) · 40 × 40 × 512

  - [[21, 24, 27], 1, Detect, [nc]] # Detect(P2, P3, P4)
5. Training code
import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    # Pick one of the three variants
    # model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-01.yaml')
    # model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-02.yaml')
    model = YOLO(r'D:\Lab\YOLOv8.2\ultralytics\cfg\add\yolov8s-BiFPN-P2-TODHead-03.yaml')
    # model.load('yolov8n.pt')  # whether to load pretrained weights; not recommended for research experiments, since it makes further accuracy gains hard to obtain
    model.train(
        data=r'The YAML file address of your own dataset.',
        cache=False,
        imgsz=640,
        epochs=300,
        single_cls=False,  # whether to train as single-class detection
        batch=2,
        close_mosaic=0,
        workers=0,
        device='0',
        optimizer='SGD',  # using SGD
        # resume='runs/train/exp/weights/last.pt',  # to resume training, set this to the path of last.pt
        amp=False,  # disable AMP if the training loss becomes NaN
        project='runs/train',
        name='exp',
    )