YOLOv8 Object Detection: Innovations, Improvements, and Hands-On Case Studies
Column directory: a catalog of effective YOLOv8 improvements and project walkthroughs, covering innovations in convolutions, backbones, attention, detection heads, and more, plus a range of detection and segmentation case studies
Column link: YOLOv8 fundamentals + improvements + case studies
Introduction
Abstract
In conventional object-detection frameworks, a backbone network inherited from image-recognition models extracts deep latent features, which a neck module then fuses to capture information at different scales. Because detection operates at much higher resolutions than image recognition, the backbone usually dominates the total inference cost. This heavy-backbone design paradigm is largely a legacy of transplanting image-recognition models into detection, rather than an end-to-end design optimized for the detection task itself. In this work, we show that this paradigm indeed yields suboptimal detectors. We therefore propose a novel heavy-neck paradigm, GiraffeDet, a giraffe-like network for efficient object detection. GiraffeDet pairs an extremely lightweight backbone with a very deep and wide neck, encouraging dense information exchange across spatial scales and latent semantic levels. This design lets the detector process high-level semantic and low-level spatial information with equal priority even in early stages of the network, making it more effective for detection. Numerical evaluations on several popular detection benchmarks show that GiraffeDet consistently outperforms previous SOTA models under a range of resource constraints. Source code is available at https://github.com/jyqi/GiraffeDet.
Links
Paper: paper link
Code: code link
How It Works
GiraffeDet: Core Ideas and Components
GiraffeDet is a novel object-detection framework designed to improve detection performance through efficient multi-scale information exchange, pairing a lightweight backbone with a deep, wide neck. Its core ideas are a lightweight Space-to-Depth chain (S2D-chain) and a Generalized Feature Pyramid Network (GFPN), which together form a "giraffe-shaped" network.
1. Core Ideas
- Lightweight backbone:
  - GiraffeDet replaces the conventional CNN backbone with a lightweight space-to-depth chain (S2D-chain), reducing both computational cost and the domain-shift problem.
  - The S2D-chain consists of two 3x3 convolution layers plus a stack of S2D blocks; each S2D block pairs an S2D layer with a 1x1 convolution and downsamples features by moving information from the spatial dimensions into the depth (channel) dimension.
- Generalized Feature Pyramid Network (Generalized FPN, GFPN):
  - GFPN provides cross-level and cross-scale feature fusion; its "Queen-Fusion" exchanges information efficiently along paths like a queen's moves in chess.
  - The GFPN design includes skip-layer connections (log2n-link) that carry information effectively from early nodes to later ones while reducing redundancy.
2. Components
- S2D-chain:
  - An initial downsampling 3x3 convolution followed by several S2D blocks. An S2D block samples the feature map uniformly at a fixed interval and reorganizes it, converting spatial resolution into channel depth.
- GFPN:
  - Composed of layers with adjustable depth and width. Each layer fuses features across multiple scales and levels, using skip-layer and cross-scale connections.
  - Queen-Fusion merges features from the current level and its neighbors, providing efficient exchange between high- and low-level information.
- Prediction network:
  - Generates bounding boxes and class labels, performing accurate detection on the rich features supplied by GFPN.
3. The GiraffeDet Family
- Model variants:
  - By varying the depth and width of GFPN, GiraffeDet provides a family of models for different compute budgets, including Giraffe-D7, D11, D14, D16, D25, and D29.
  - Experiments show that GiraffeDet achieves high accuracy and efficiency at every FLOPs level.
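To make the S2D idea concrete, here is a minimal NumPy sketch of the space-to-depth rearrangement (illustrative only; the function name is made up, and it mirrors the unfold-based PyTorch version in the Core Code section): each b×b spatial patch is folded into the channel dimension, so resolution drops by b in each direction while no information is lost.

```python
import numpy as np

def space_to_depth_np(x, block_size):
    """(N, C, H, W) -> (N, C*b*b, H/b, W/b): move each b x b spatial
    patch into the channel dimension; nothing is discarded."""
    n, c, h, w = x.shape
    b = block_size
    assert h % b == 0 and w % b == 0
    x = x.reshape(n, c, h // b, b, w // b, b)
    x = x.transpose(0, 1, 3, 5, 2, 4)  # bring the in-patch offsets next to C
    return x.reshape(n, c * b * b, h // b, w // b)

x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
y = space_to_depth_np(x, 2)
print(y.shape)  # (2, 12, 2, 2): spatial halved, channels x4
```

This is why the S2D-chain can downsample without pooling or strided convolutions: the 1x1 convolution that follows each S2D layer simply mixes the rearranged channels.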
GFPN (Generalized Feature Pyramid Network) in Detail
GFPN is a key component of GiraffeDet, designed to fuse multi-scale features efficiently and improve detection performance. It combines skip-layer connections and cross-scale connections to address the limitations of classic feature pyramid network (FPN) designs and to strengthen information exchange between feature levels.
GFPN Design Highlights
- Multi-scale feature fusion:
  - GFPN aggregates features of different resolutions extracted by the backbone.
  - It builds on the classic FPN concept but introduces richer connectivity to strengthen information flow.
- Evolution of feature pyramid networks:
  - FPN: introduces a top-down pathway to fuse multi-scale features.
  - PANet: adds a bottom-up pathway on top of the FPN structure, enabling bidirectional information flow.
  - BiFPN: removes nodes that have only one input edge and adds an extra edge from the original input at the same level, further improving connectivity.
  - GFPN: combines skip-layer and cross-scale connections, optimizing information flow in both the horizontal (layer) and vertical (scale) directions.
- Skip-layer connections:
  - Designed to mitigate the vanishing-gradient problem in deep networks.
  - Two concrete variants: the dense-link and the log2n-link.
- Cross-scale connections:
  - Designed to cope with large scale variation across objects.
  - Previous work only connects features between adjacent levels; GFPN proposes a new fusion scheme, Queen-fusion, that considers features from both the same level and neighboring levels.
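The difference between the two skip-layer patterns can be sketched as index sets. This is a hypothetical indexing that follows the description above (layer l pulls from layers l - 2^i); the paper's exact wiring may differ in detail:

```python
import math

def dense_link_inputs(l):
    # dense-link: layer l receives every preceding layer -> l inputs
    return list(range(l))

def log2n_link_inputs(l):
    # log2n-link: layer l receives layers l - 2**i for i = 0..floor(log2(l)),
    # i.e. at most log2(l) + 1 inputs
    if l == 0:
        return []
    return [l - 2 ** i for i in range(int(math.log2(l)) + 1) if l - 2 ** i >= 0]

for l in (1, 2, 8):
    print(l, dense_link_inputs(l), log2n_link_inputs(l))
# layer 8: dense needs 8 incoming edges, log2n only 4 ([7, 6, 4, 0])
```

The logarithmic fan-in is what keeps a very deep neck affordable: edge count grows as O(l·log l) instead of O(l²) across the whole chain.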
GFPN Design in Detail
- Skip-layer connections:
  - dense-link: each layer receives the feature maps of all preceding layers and applies a convolution.
  - log2n-link: each layer l receives feature maps from at most log2(l) + 1 preceding layers, which lowers computational complexity while still propagating information effectively.
- Queen-fusion:
  - Named after the queen's movement paths in chess; it fuses features from the current level and its adjacent levels.
  - For example, a P5 node fuses the down-sampled P4 and the up-sampled P6 from the previous stage, the previous-stage P5, and the current-stage P4.
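At the shape level, that P5 fusion can be sketched in NumPy as follows. Sum-fusion with naive resampling stands in for the strided-conv downsampling, interpolation, and concat+conv used in the real network, so treat every name here as illustrative:

```python
import numpy as np

def downsample(x):
    # stride-2 subsampling as a stand-in for a strided conv
    return x[..., ::2, ::2]

def upsample(x):
    # nearest-neighbour 2x upsampling
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def queen_fusion_p5(p4_prev, p5_prev, p6_prev, p4_cur):
    # all four queen-path inputs are brought to P5 resolution before fusing
    return downsample(p4_prev) + upsample(p6_prev) + p5_prev + downsample(p4_cur)

# toy single-channel features: P4 at 8x8, P5 at 4x4, P6 at 2x2
p4_prev = np.ones((1, 1, 8, 8)); p4_cur = np.ones((1, 1, 8, 8))
p5_prev = np.ones((1, 1, 4, 4)); p6_prev = np.ones((1, 1, 2, 2))
out = queen_fusion_p5(p4_prev, p5_prev, p6_prev, p4_cur)
print(out.shape)  # (1, 1, 4, 4) -- every input contributes at P5 scale
```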
Experimental Results and Performance
- Experiments show that GFPN handles large scale variation well, achieving higher accuracy and efficiency at every FLOPs level.
- Connection analysis shows that the log2n-link transmits information more effectively than the dense-link, while Queen-fusion enables thorough exchange between high- and low-level features.
GFPN's innovative design lets GiraffeDet deliver excellent detection performance and handle objects of varying scales well. Through skip-layer and cross-scale connections, GFPN fuses information efficiently, improving both detection accuracy and efficiency.
Core Code
import logging
import warnings
import torch
import torch.nn as nn
from mmcv.cnn import ConvModule, constant_init, kaiming_init
from mmcv.cnn.bricks.activation import build_activation_layer
from mmcv.cnn.bricks.norm import build_norm_layer
from mmcv.runner import BaseModule, load_checkpoint
from mmcv.runner.fp16_utils import auto_fp16
from torch.nn.modules.batchnorm import _BatchNorm
from ..builder import BACKBONES
def space_to_depth(x, block_size):
n, c, h, w = x.size()
unfolded_x = torch.nn.functional.unfold(x, block_size, stride=block_size)
return unfolded_x.view(n, c * block_size ** 2, h // block_size, w // block_size)
class Conv(ConvModule):
# Standard convolution
def __init__(self,
in_channels,
out_channels,
kernel_size=1,
stride=1,
padding=None,
groups=1,
norm_cfg=dict(type='BN'),
act_cfg=dict(type='Mish'),
**kwargs):
super(Conv, self).__init__(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2 if padding is None else padding,
groups=groups,
norm_cfg=norm_cfg,
act_cfg=act_cfg)
class SimpleFocus(nn.Module):
# Focus wh information into c-space
def __init__(self,
in_channels,
out_channels,
b,
kernel_size=1,
stride=1,
groups=1,
init_cfg=None,
**kwargs):
super(SimpleFocus, self).__init__()
padding = kernel_size // 2
self.b = b
self.conv = Conv(in_channels, out_channels, kernel_size, stride, padding, groups, **kwargs)
def forward(self, x): # x(b,c,w,h) -> y(b,4c,w/2,h/2)
x = space_to_depth(x, self.b)
return self.conv(x)
class Bottleneck(BaseModule):
def __init__(self,
in_channels,
out_channels,
shortcut=True,
groups=1,
expansion=0.5,
init_cfg=None,
**kwargs):
super(Bottleneck, self).__init__(init_cfg)
hidden_channels = int(out_channels * expansion) # hidden channels
self.conv1 = Conv(
in_channels, hidden_channels, kernel_size=1, **kwargs)
self.conv2 = Conv(
hidden_channels,
out_channels,
kernel_size=3,
groups=groups,
**kwargs)
self.shortcut = shortcut and in_channels == out_channels
def forward(self, x):
if self.shortcut:
return x + self.conv2(self.conv1(x))
else:
return self.conv2(self.conv1(x))
class BottleneckCSP(BaseModule):
# CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
def __init__(self,
in_channels,
out_channels,
repetition=1,
shortcut=True,
groups=1,
expansion=0.5,
csp_act_cfg=dict(type='Mish'),
init_cfg=None,
**kwargs):
super(BottleneckCSP, self).__init__(init_cfg)
hidden_channels = int(out_channels * expansion) # hidden channels
self.conv1 = Conv(
in_channels, hidden_channels, kernel_size=1, **kwargs)
self.conv2 = nn.Conv2d(in_channels, hidden_channels, 1, 1, bias=False)
self.conv3 = nn.Conv2d(
hidden_channels, hidden_channels, 1, 1, bias=False)
self.conv4 = Conv(
2 * hidden_channels, out_channels, kernel_size=1, **kwargs)
csp_norm_cfg = kwargs.get('norm_cfg', dict(type='BN')).copy()
self.bn = build_norm_layer(csp_norm_cfg, 2 * hidden_channels)[-1]
csp_act_cfg_ = csp_act_cfg.copy()
if csp_act_cfg_['type'] not in [
'Tanh', 'PReLU', 'Sigmoid', 'HSigmoid', 'Swish'
]:
csp_act_cfg_.setdefault('inplace', True)
self.csp_act = build_activation_layer(csp_act_cfg_)
self.bottlenecks = nn.Sequential(*[
Bottleneck(
hidden_channels,
hidden_channels,
shortcut,
groups,
expansion=1.0,
**kwargs) for _ in range(repetition)
])
def forward(self, x):
y1 = self.conv3(self.bottlenecks(self.conv1(x)))
y2 = self.conv2(x)
return self.conv4(self.csp_act(self.bn(torch.cat((y1, y2), dim=1))))
class BottleneckCSP2(BaseModule):
# CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
def __init__(self,
in_channels,
out_channels,
repetition=1,
shortcut=False,
groups=1,
csp_act_cfg=dict(type='Mish'),
init_cfg=None,
**kwargs):
super(BottleneckCSP2, self).__init__(init_cfg)
hidden_channels = int(out_channels) # hidden channels
self.conv1 = Conv(
in_channels, hidden_channels, kernel_size=1, **kwargs)
self.conv2 = nn.Conv2d(
hidden_channels, hidden_channels, 1, 1, bias=False)
self.conv3 = Conv(
2 * hidden_channels, out_channels, kernel_size=1, **kwargs)
csp_norm_cfg = kwargs.get('norm_cfg', dict(type='BN')).copy()
self.bn = build_norm_layer(csp_norm_cfg, 2 * hidden_channels)[-1]
csp_act_cfg_ = csp_act_cfg.copy()
if csp_act_cfg_['type'] not in [
'Tanh', 'PReLU', 'Sigmoid', 'HSigmoid', 'Swish'
]:
csp_act_cfg_.setdefault('inplace', True)
self.csp_act = build_activation_layer(csp_act_cfg_)
self.bottlenecks = nn.Sequential(*[
Bottleneck(
hidden_channels,
hidden_channels,
shortcut,
groups,
expansion=1.0,
**kwargs) for _ in range(repetition)
])
def forward(self, x):
x1 = self.conv1(x)
y1 = self.bottlenecks(x1)
y2 = self.conv2(x1)
return self.conv3(self.csp_act(self.bn(torch.cat((y1, y2), dim=1))))
class SPPV5(BaseModule):
# Spatial pyramid pooling layer used in YOLOv3-SPP
def __init__(self,
in_channels,
out_channels,
pooling_kernel_size=(5, 9, 13),
init_cfg=None,
**kwargs):
super(SPPV5, self).__init__(init_cfg)
hidden_channels = in_channels // 2 # hidden channels
self.conv1 = Conv(
in_channels, hidden_channels, kernel_size=1, **kwargs)
self.conv2 = Conv(
hidden_channels * (len(pooling_kernel_size) + 1),
out_channels,
kernel_size=1,
**kwargs)
self.maxpools = nn.ModuleList([
nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2)
for x in pooling_kernel_size
])
def forward(self, x):
x = self.conv1(x)
return self.conv2(
torch.cat([x] + [maxpool(x) for maxpool in self.maxpools], 1))
class SPPV4(BaseModule):
# CSP SPP https://github.com/WongKinYiu/CrossStagePartialNetworks
def __init__(self,
in_channels,
out_channels,
expansion=0.5,
pooling_kernel_size=(5, 9, 13),
csp_act_cfg=dict(type='Mish'),
init_cfg=None,
**kwargs):
super(SPPV4, self).__init__(init_cfg)
hidden_channels = int(2 * out_channels * expansion) # hidden channels
self.conv1 = Conv(
in_channels, hidden_channels, kernel_size=1, **kwargs)
self.conv2 = nn.Conv2d(in_channels, hidden_channels, 1, 1, bias=False)
self.conv3 = Conv(
hidden_channels, hidden_channels, kernel_size=3, **kwargs)
self.conv4 = Conv(
hidden_channels, hidden_channels, kernel_size=1, **kwargs)
self.maxpools = nn.ModuleList([
nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2)
for x in pooling_kernel_size
])
self.conv5 = Conv(
4 * hidden_channels, hidden_channels, kernel_size=1, **kwargs)
self.conv6 = Conv(
hidden_channels, hidden_channels, kernel_size=3, **kwargs)
csp_norm_cfg = kwargs.get('norm_cfg', dict(type='BN')).copy()
self.bn = build_norm_layer(csp_norm_cfg, 2 * hidden_channels)[-1]
csp_act_cfg_ = csp_act_cfg.copy()
if csp_act_cfg_['type'] not in [
'Tanh', 'PReLU', 'Sigmoid', 'HSigmoid', 'Swish'
]:
csp_act_cfg_.setdefault('inplace', True)
self.csp_act = build_activation_layer(csp_act_cfg_)
self.conv7 = Conv(
2 * hidden_channels, out_channels, kernel_size=1, **kwargs)
def forward(self, x):
x1 = self.conv4(self.conv3(self.conv1(x)))
y1 = self.conv6(
self.conv5(
torch.cat([x1] + [maxpool(x1) for maxpool in self.maxpools],
1)))
y2 = self.conv2(x)
return self.conv7(self.csp_act(self.bn(torch.cat((y1, y2), dim=1))))
class Focus(BaseModule):
# Focus wh information into c-space
# Implement with ordinary Conv2d with
# doubled kernel/padding size & stride 2
def __init__(self,
in_channels,
out_channels,
kernel_size=1,
stride=1,
groups=1,
init_cfg=None,
**kwargs):
super(Focus, self).__init__(init_cfg)
padding = kernel_size // 2
kernel_size *= 2
padding *= 2
stride *= 2
self.conv = Conv(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
**kwargs)
def forward(self, x):
return self.conv(x)
class CSPStage(BaseModule):
def __init__(self,
in_channels,
out_channels,
repetition,
init_cfg=None,
**kwargs):
super(CSPStage, self).__init__(init_cfg)
self.conv_downscale = Conv(
in_channels, out_channels, kernel_size=3, stride=2, **kwargs)
self.conv_csp = BottleneckCSP(out_channels, out_channels, repetition,
**kwargs)
def forward(self, x):
return self.conv_csp(self.conv_downscale(x))
class SPPV5Stage(BaseModule):
def __init__(self,
in_channels,
out_channels,
repetition,
init_cfg=None,
**kwargs):
super(SPPV5Stage, self).__init__(init_cfg)
self.conv_downscale = Conv(
in_channels, out_channels, kernel_size=3, stride=2, **kwargs)
self.spp = SPPV5(
out_channels, out_channels, pooling_kernel_size=(5, 9, 13))
# self.conv_csp = BottleneckCSP(out_channels, out_channels, repetition,
# **kwargs)
def forward(self, x):
# return self.conv_csp(self.spp(self.conv_downscale(x)))
return self.spp(self.conv_downscale(x))
class SPPV4Stage(BaseModule):
def __init__(self,
in_channels,
out_channels,
repetition,
init_cfg=None,
**kwargs):
super(SPPV4Stage, self).__init__(init_cfg)
self.conv_downscale = Conv(
in_channels, out_channels * 2, kernel_size=3, stride=2, **kwargs)
self.conv_csp = BottleneckCSP(out_channels * 2, out_channels * 2,
repetition, **kwargs)
self.spp = SPPV4(
out_channels * 2, out_channels, pooling_kernel_size=(5, 9, 13))
def forward(self, x):
return self.spp(self.conv_csp(self.conv_downscale(x)))
class BottleneckStage(BaseModule):
def __init__(self,
in_channels,
out_channels,
repetition,
init_cfg=None,
**kwargs):
super(BottleneckStage, self).__init__(init_cfg)
self.conv_downscale = Conv(
in_channels, out_channels, kernel_size=3, stride=2, **kwargs)
self.conv_bottleneck = Bottleneck(out_channels, out_channels,
repetition, **kwargs)
def forward(self, x):
return self.conv_bottleneck(self.conv_downscale(x))
@BACKBONES.register_module()
class DarknetCSP(BaseModule):
"""Darknet backbone.
Args:
scale (int): scale of DarknetCSP. 's'|'x'|'m'|'l'|
out_indices (Sequence[int]): Output from which stages.
frozen_stages (int): Stages to be frozen (stop grad and set eval mode).
-1 means not freezing any parameters. Default: -1.
conv_cfg (dict): Config dict for convolution layer. Default: None.
norm_cfg (dict): Dictionary to construct and config norm layer.
Default: dict(type='BN', requires_grad=True)
act_cfg (dict): Config dict for activation layer.
Default: dict(type='Mish').
norm_eval (bool): Whether to set norm layers to eval mode, namely,
freeze running stats (mean and var). Note: Effect on Batch Norm
and its variants only.
"""
arch_settings = {
'v4s5p': [['conv', 'bottleneck', 'csp', 'csp', 'csp', 'sppv4'],
[None, 1, 1, 3, 3, 1], [16, 32, 64, 128, 256, 256]],
'v4m5p': [['conv', 'bottleneck', 'csp', 'csp', 'csp', 'sppv4'],
[None, 1, 1, 5, 5, 3], [24, 48, 96, 192, 384, 384]],
'v4l5p': [['conv', 'bottleneck', 'csp', 'csp', 'csp', 'sppv4'],
[None, 1, 2, 8, 8, 4], [32, 64, 128, 256, 512, 512]],
'v4x5p': [['conv', 'bottleneck', 'csp', 'csp', 'csp', 'sppv4'],
[None, 1, 3, 11, 11, 5], [40, 80, 160, 320, 640, 640]],
'v4l6p': [['conv', 'csp', 'csp', 'csp', 'csp', 'csp', 'sppv4'],
[None, 1, 3, 15, 15, 7, 7],
[32, 64, 128, 256, 512, 1024, 512]],
'v4x7p': [['conv', 'csp', 'csp', 'csp', 'csp', 'csp', 'csp', 'sppv4'],
[None, 1, 3, 15, 15, 7, 7, 7],
[40, 80, 160, 320, 640, 1280, 1280, 640]],
'v5s5p': [['focus', 'csp', 'csp', 'csp', 'sppv5'], [None, 1, 3, 3, 1],
[32, 64, 128, 256, 512]],
's2dcsp': [['focus', 'csp', 'csp', 'csp', 'sppv5'], [None, 1, 1, 1, 1],
[32, 64, 128, 256, 512]],
'v5m5p': [['focus', 'csp', 'csp', 'csp', 'sppv5'], [None, 2, 6, 6, 2],
[48, 96, 192, 384, 768]],
'v5l5p': [['focus', 'csp', 'csp', 'csp', 'sppv5'], [None, 3, 9, 9, 3],
[64, 128, 256, 512, 1024]],
'v5x5p': [['focus', 'csp', 'csp', 'csp', 'sppv5'],
[None, 4, 12, 12, 4], [80, 160, 320, 640, 1280]],
}
def __init__(self,
scale='x5p',
out_indices=(3, 4, 5),
frozen_stages=-1,
norm_cfg=dict(
type='BN', requires_grad=True, eps=0.001, momentum=0.03),
act_cfg=dict(type='Mish'),
csp_act_cfg=dict(type='Mish'),
norm_eval=False,
pretrained=None,
init_cfg=None):
super(DarknetCSP, self).__init__(init_cfg)
if isinstance(scale, str):
if scale not in self.arch_settings:
raise KeyError(f'invalid scale {scale} for DarknetCSP')
stage, repetition, channels = self.arch_settings[scale]
else:
stage, repetition, channels = scale
self.out_indices = out_indices
self.frozen_stages = frozen_stages
cfg = dict(
norm_cfg=norm_cfg,
act_cfg=act_cfg,
csp_act_cfg=csp_act_cfg,
init_cfg=init_cfg)
self.layers = []
cin = 3
for i, (stg, rep, cout) in enumerate(zip(stage, repetition, channels)):
layer_name = f'{stg}{i}'
self.layers.append(layer_name)
if stg == 'conv':
self.add_module(layer_name, Conv(cin, cout, 3, **cfg))
elif stg == 'bottleneck':
self.add_module(layer_name,
BottleneckStage(cin, cout, rep, **cfg))
elif stg == 'csp':
self.add_module(layer_name, CSPStage(cin, cout, rep, **cfg))
elif stg == 'focus':
self.add_module(layer_name, Focus(cin, cout, 3, **cfg))
elif stg == 'sppv4':
self.add_module(layer_name, SPPV4Stage(cin, cout, rep, **cfg))
elif stg == 'sppv5':
self.add_module(layer_name, SPPV5Stage(cin, cout, rep, **cfg))
else:
raise NotImplementedError
cin = cout
self.norm_eval = norm_eval
self.fp16_enabled = False
        assert not (init_cfg and pretrained), \
            'init_cfg and pretrained cannot be set at the same time'
        if isinstance(pretrained, str):
            warnings.warn('DeprecationWarning: "pretrained" is deprecated, '
                          'please use "init_cfg" instead')
            self.init_cfg = dict(type='Pretrained', checkpoint=pretrained)
elif pretrained is None:
if init_cfg is None:
self.init_cfg = [
dict(type='Kaiming', layer='Conv2d'),
dict(
type='Constant',
val=1,
layer=['_BatchNorm', 'GroupNorm'])
]
else:
raise TypeError('pretrained must be a str or None')
@auto_fp16()
def forward(self, x):
outs = []
for i, layer_name in enumerate(self.layers):
layer = getattr(self, layer_name)
x = layer(x)
if i in self.out_indices:
outs.append(x)
# return tuple(outs)
return outs
def init_weights(self, pretrained=None):
if isinstance(pretrained, str):
logger = logging.getLogger()
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
for m in self.modules():
if isinstance(m, nn.Conv2d):
kaiming_init(m)
elif isinstance(m, (_BatchNorm, nn.GroupNorm)):
constant_init(m, 1)
def _freeze_stages(self):
if self.frozen_stages >= 0:
for i in range(0, self.frozen_stages):
m = getattr(self, self.layers[i])
m.eval()
for param in m.parameters():
param.requires_grad = False
def train(self, mode=True):
super(DarknetCSP, self).train(mode)
self._freeze_stages()
if mode and self.norm_eval:
for m in self.modules():
if isinstance(m, _BatchNorm):
m.eval()
Download the YOLOv8 Code
Direct Download
Git Clone
git clone https://github.com/ultralytics/ultralytics
Set Up the Environment
Enter the repository root and install the dependencies.
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
In the latest releases, the official repository has deprecated the requirements.txt file and folded all necessary code and dependencies into the ultralytics package, so installing that single library provides all the required functionality and environment dependencies.
pip install ultralytics
Add the Code
Under the ultralytics/nn/ directory in the repository root, create a featureFusion directory, then create a Python file named GFPN.py inside it and paste in the code.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
'''Basic cell for rep-style block, including conv and bn'''
result = nn.Sequential()
result.add_module(
'conv',
nn.Conv2d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias=False))
result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
return result
class RepConv(nn.Module):
'''RepConv is a basic rep-style block, including training and deploy status
Code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
'''
def __init__(self,
in_channels,
out_channels,
kernel_size=3,
stride=1,
padding=1,
dilation=1,
groups=1,
padding_mode='zeros',
deploy=False,
act='relu',
norm=None):
super(RepConv, self).__init__()
self.deploy = deploy
self.groups = groups
self.in_channels = in_channels
self.out_channels = out_channels
assert kernel_size == 3
assert padding == 1
padding_11 = padding - kernel_size // 2
if isinstance(act, str):
self.nonlinearity = get_activation(act)
else:
self.nonlinearity = act
if deploy:
self.rbr_reparam = nn.Conv2d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=True,
padding_mode=padding_mode)
else:
self.rbr_identity = None
self.rbr_dense = conv_bn(in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups)
self.rbr_1x1 = conv_bn(in_channels=in_channels,
out_channels=out_channels,
kernel_size=1,
stride=stride,
padding=padding_11,
groups=groups)
def forward(self, inputs):
'''Forward process'''
if hasattr(self, 'rbr_reparam'):
return self.nonlinearity(self.rbr_reparam(inputs))
if self.rbr_identity is None:
id_out = 0
else:
id_out = self.rbr_identity(inputs)
return self.nonlinearity(
self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
def get_equivalent_kernel_bias(self):
kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
return kernel3x3 + self._pad_1x1_to_3x3_tensor(
kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid
def _pad_1x1_to_3x3_tensor(self, kernel1x1):
if kernel1x1 is None:
return 0
else:
return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])
def _fuse_bn_tensor(self, branch):
if branch is None:
return 0, 0
if isinstance(branch, nn.Sequential):
kernel = branch.conv.weight
running_mean = branch.bn.running_mean
running_var = branch.bn.running_var
gamma = branch.bn.weight
beta = branch.bn.bias
eps = branch.bn.eps
else:
assert isinstance(branch, nn.BatchNorm2d)
if not hasattr(self, 'id_tensor'):
input_dim = self.in_channels // self.groups
kernel_value = np.zeros((self.in_channels, input_dim, 3, 3),
dtype=np.float32)
for i in range(self.in_channels):
kernel_value[i, i % input_dim, 1, 1] = 1
self.id_tensor = torch.from_numpy(kernel_value).to(
branch.weight.device)
kernel = self.id_tensor
running_mean = branch.running_mean
running_var = branch.running_var
gamma = branch.weight
beta = branch.bias
eps = branch.eps
std = (running_var + eps).sqrt()
t = (gamma / std).reshape(-1, 1, 1, 1)
return kernel * t, beta - running_mean * gamma / std
def switch_to_deploy(self):
if hasattr(self, 'rbr_reparam'):
return
kernel, bias = self.get_equivalent_kernel_bias()
self.rbr_reparam = nn.Conv2d(
in_channels=self.rbr_dense.conv.in_channels,
out_channels=self.rbr_dense.conv.out_channels,
kernel_size=self.rbr_dense.conv.kernel_size,
stride=self.rbr_dense.conv.stride,
padding=self.rbr_dense.conv.padding,
dilation=self.rbr_dense.conv.dilation,
groups=self.rbr_dense.conv.groups,
bias=True)
self.rbr_reparam.weight.data = kernel
self.rbr_reparam.bias.data = bias
for para in self.parameters():
para.detach_()
self.__delattr__('rbr_dense')
self.__delattr__('rbr_1x1')
if hasattr(self, 'rbr_identity'):
self.__delattr__('rbr_identity')
if hasattr(self, 'id_tensor'):
self.__delattr__('id_tensor')
self.deploy = True
class Swish(nn.Module):
def __init__(self, inplace=True):
super(Swish, self).__init__()
self.inplace = inplace
    def forward(self, x):
        if self.inplace:
            # torch.sigmoid replaces the deprecated F.sigmoid;
            # the sigmoid is evaluated before the in-place multiply
            x.mul_(torch.sigmoid(x))
            return x
        else:
            return x * torch.sigmoid(x)
def get_activation(name='silu', inplace=True):
if name is None:
return nn.Identity()
if isinstance(name, str):
if name == 'silu':
module = nn.SiLU(inplace=inplace)
elif name == 'relu':
module = nn.ReLU(inplace=inplace)
elif name == 'lrelu':
module = nn.LeakyReLU(0.1, inplace=inplace)
elif name == 'swish':
module = Swish(inplace=inplace)
elif name == 'hardsigmoid':
module = nn.Hardsigmoid(inplace=inplace)
elif name == 'identity':
module = nn.Identity()
else:
raise AttributeError('Unsupported act type: {}'.format(name))
return module
elif isinstance(name, nn.Module):
return name
else:
raise AttributeError('Unsupported act type: {}'.format(name))
def get_norm(name, out_channels, inplace=True):
if name == 'bn':
module = nn.BatchNorm2d(out_channels)
else:
raise NotImplementedError
return module
class ConvBNAct(nn.Module):
"""A Conv2d -> Batchnorm -> silu/leaky relu block"""
def __init__(
self,
in_channels,
out_channels,
ksize,
stride=1,
groups=1,
bias=False,
act='silu',
norm='bn',
reparam=False,
):
super().__init__()
# same padding
pad = (ksize - 1) // 2
self.conv = nn.Conv2d(
in_channels,
out_channels,
kernel_size=ksize,
stride=stride,
padding=pad,
groups=groups,
bias=bias,
)
if norm is not None:
self.bn = get_norm(norm, out_channels, inplace=True)
if act is not None:
self.act = get_activation(act, inplace=True)
self.with_norm = norm is not None
self.with_act = act is not None
def forward(self, x):
x = self.conv(x)
if self.with_norm:
x = self.bn(x)
if self.with_act:
x = self.act(x)
return x
def fuseforward(self, x):
return self.act(self.conv(x))
class BasicBlock_3x3_Reverse(nn.Module):
def __init__(self,
ch_in,
ch_hidden_ratio,
ch_out,
act='relu',
shortcut=True):
super(BasicBlock_3x3_Reverse, self).__init__()
assert ch_in == ch_out
ch_hidden = int(ch_in * ch_hidden_ratio)
self.conv1 = ConvBNAct(ch_hidden, ch_out, 3, stride=1, act=act)
self.conv2 = RepConv(ch_in, ch_hidden, 3, stride=1, act=act)
self.shortcut = shortcut
def forward(self, x):
y = self.conv2(x)
y = self.conv1(y)
if self.shortcut:
return x + y
else:
return y
class SPP(nn.Module):
def __init__(
self,
ch_in,
ch_out,
k,
pool_size,
act='swish',
):
super(SPP, self).__init__()
self.pool = []
for i, size in enumerate(pool_size):
pool = nn.MaxPool2d(kernel_size=size,
stride=1,
padding=size // 2,
ceil_mode=False)
self.add_module('pool{}'.format(i), pool)
self.pool.append(pool)
self.conv = ConvBNAct(ch_in, ch_out, k, act=act)
def forward(self, x):
outs = [x]
for pool in self.pool:
outs.append(pool(x))
y = torch.cat(outs, axis=1)
y = self.conv(y)
return y
class CSPStage(nn.Module):
def __init__(self,
ch_in,
ch_out,
n,
block_fn='BasicBlock_3x3_Reverse',
ch_hidden_ratio=1.0,
act='silu',
spp=False):
super(CSPStage, self).__init__()
split_ratio = 2
ch_first = int(ch_out // split_ratio)
ch_mid = int(ch_out - ch_first)
self.conv1 = ConvBNAct(ch_in, ch_first, 1, act=act)
self.conv2 = ConvBNAct(ch_in, ch_mid, 1, act=act)
self.convs = nn.Sequential()
next_ch_in = ch_mid
for i in range(n):
if block_fn == 'BasicBlock_3x3_Reverse':
self.convs.add_module(
str(i),
BasicBlock_3x3_Reverse(next_ch_in,
ch_hidden_ratio,
ch_mid,
act=act,
shortcut=True))
else:
raise NotImplementedError
if i == (n - 1) // 2 and spp:
self.convs.add_module(
'spp', SPP(ch_mid * 4, ch_mid, 1, [5, 9, 13], act=act))
next_ch_in = ch_mid
self.conv3 = ConvBNAct(ch_mid * n + ch_first, ch_out, 1, act=act)
def forward(self, x):
y1 = self.conv1(x)
y2 = self.conv2(x)
mid_out = [y1]
for conv in self.convs:
y2 = conv(y2)
mid_out.append(y2)
y = torch.cat(mid_out, axis=1)
y = self.conv3(y)
return y
Registration
In ultralytics/nn/tasks.py, make the following changes:
Step 1:
from ultralytics.nn.featureFusion.GFPN import CSPStage
Step 2:
Modify def parse_model(d, ch, verbose=True):
if m in (
Classify,
Conv,
ConvTranspose,
GhostConv,
Bottleneck,
GhostBottleneck,
SPP,
SPPF,
DWConv,
Focus,
BottleneckCSP,
C1,
C2,
C2f,
C3,
C3TR,
C3Ghost,
nn.ConvTranspose2d,
DWConvTranspose2d,
C3x,
RepC3,
EVCBlock,
CloFormerAttnConv,
C2f_iAFF,
CSPStage,
):
c1, c2 = ch[f], args[0]
if c2 != nc: # if c2 not equal to number of classes (i.e. for Classify() output)
c2 = make_divisible(min(c2, max_channels) * width, 8)
args = [c1, c2, *args[1:]]
if m in (BottleneckCSP, C1, C2, C2f, C3, C3TR, C3Ghost, C3x, RepC3, C2f_deformable_LKA,Sea_AttentionBlock, C2f_iAFF,CSPStage):
args.insert(2, n) # number of repeats
n = 1
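Note that in this branch, parse_model rescales the CSPStage output channels by the width multiplier of the chosen scale and rounds up to a multiple of 8. A sketch of that arithmetic (this make_divisible mirrors the usual Ultralytics helper; verify against your version):

```python
import math

def make_divisible(x, divisor=8):
    # round up to the nearest multiple of `divisor`
    return math.ceil(x / divisor) * divisor

width, max_channels = 0.25, 1024  # the 'n' scale from the yaml configs
for c2 in (256, 512, 1024):
    scaled = make_divisible(min(c2, max_channels) * width, 8)
    print(c2, '->', scaled)  # 256 -> 64, 512 -> 128, 1024 -> 256
```

So the channel counts written in the yaml are maximums; the actual layer widths depend on the scale you train with.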
Configure yolov8_GFPN.yaml
ultralytics/ultralytics/cfg/models/v8/yolov8_GFPN.yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 2 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f, [256, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 3, C2f, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 3, CSPStage, [512]] # 12
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 3, CSPStage, [256]] # 15 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 3, CSPStage, [512]] # 18 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 3, CSPStage, [1024]] # 21 (P5/32-large)
- [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
Configure yolov8_GFPN.yaml (variant with a DAMO-YOLO-style GFPN head)
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 2 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f, [256, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 3, C2f, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
# DAMO-YOLO GFPN Head
head:
- [-1, 1, Conv, [512, 1, 1]] # 10
- [6, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]]
- [-1, 3, CSPStage, [512]] # 13
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] #14
- [4, 1, Conv, [256, 3, 2]] # 15
- [[14, -1, 6], 1, Concat, [1]]
- [-1, 3, CSPStage, [512]] # 17
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]]
- [-1, 3, CSPStage, [256]] # 20
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 17], 1, Concat, [1]]
- [-1, 3, CSPStage, [512]] # 23
- [17, 1, Conv, [256, 3, 2]] # 24
- [23, 1, Conv, [256, 3, 2]] # 25
- [[13, 24, -1], 1, Concat, [1]]
- [-1, 3, CSPStage, [1024]] # 27
- [[20, 23, 27], 1, Detect, [nc]] # Detect(P3, P4, P5)
Configure yolov8_GFPN.yaml (DAMO-YOLO-style head, 80 classes, four Detect inputs)
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f, [256, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 3, C2f, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
# DAMO-YOLO GFPN Head
head:
- [-1, 1, Conv, [512, 1, 1]] # 10
- [6, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]]
- [-1, 3, CSPStage, [512]] # 13
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] #14
- [4, 1, Conv, [256, 3, 2]] # 15
- [[14, -1, 6], 1, Concat, [1]]
- [-1, 3, CSPStage, [512]] # 17
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]]
- [-1, 3, CSPStage, [256]] # 20
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 17], 1, Concat, [1]]
- [-1, 3, CSPStage, [512]] # 23
- [17, 1, Conv, [256, 3, 2]] # 24
- [23, 1, Conv, [256, 3, 2]] # 25
- [[13, 24, -1], 1, Concat, [1]]
- [-1, 3, CSPStage, [1024]] # 27
- [[17, 20, 23, 27], 1, Detect, [nc]] # Detect over four fused stages (layers 17, 20, 23, 27)
Experiment
Script
from ultralytics import YOLO

# Path to the model configuration defined above
yaml = 'ultralytics/cfg/models/v8/yolov8_GFPN.yaml'

if __name__ == "__main__":
    # Initialize the YOLO model with the specified YAML file
    # (guarded so dataloader workers can spawn safely, e.g. on Windows)
    model = YOLO(yaml)
    # Print model information
    model.info()
    # Train the model with the specified parameters
    results = model.train(data='ultralytics/datasets/original-license-plates.yaml',
                          name='GFPN',
                          epochs=10,
                          workers=8,
                          batch=1)
Results
![Results](https://img-blog.csdnimg.cn/direct/cba8f22bcc7f4c5bb2ba32caaf61e709.png#pic_center)