💡💡💡本文解决的问题点:创新点为什么在自己数据集不涨点,甚至出现降点的现象???
💡💡💡原因分析:不同数据集加入创新点存在表现不一致是正常现象,甚至放在网络不同位置也存在有的位置能够涨点,有的位置降点现象!!!
💡💡💡如何解决: 将创新点放入不同网络位置并提供对应的yaml文件,总有一种能够在你数据集下高效涨点
YOLOv9框架图
《YOLOv9魔术师专栏》将从以下各个方向进行创新:
【原创自研模块】【多组合点优化】【主干篇】【neck优化】【注意力机制】【卷积魔改】【block&多尺度融合结合】【损失&IOU优化】【上下采样优化 】【SPPELAN & RepNCSPELAN4优化】【小目标性能提升】【前沿论文分享】【训练实战篇】
订阅者通过添加WX: AI_CV_0624,入群沟通,提供改进结构图等一系列定制化服务。
订阅者可以申请发票,便于报销
YOLOv9魔术师专栏
💡💡💡为本专栏订阅者提供创新点改进代码,改进网络结构图,方便paper写作!!!
💡💡💡适用场景:红外、小目标检测、工业缺陷检测、医学影像、遥感目标检测、低对比度场景
💡💡💡适用任务:所有改进点适用【检测】、【分割】、【pose】、【分类】等
💡💡💡全网独家首发创新,【自研多个自研模块】,【多创新点组合适合paper 】!!!
☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️ ☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️
包含注意力机制魔改、卷积魔改、检测头创新、损失&IOU优化、block优化&多层特征融合、 轻量级网络设计、24年最新顶会改进思路、原创自研paper级创新等
🚀🚀🚀 本项目持续更新 | 更新完结保底≥80+ ,冲刺100+ 🚀🚀🚀
🍉🍉🍉 联系WX: AI_CV_0624 欢迎交流!🍉🍉🍉
⭐⭐⭐专栏涨价趋势 199->259->299,越早订阅越划算⭐⭐⭐
💡💡💡 2024年计算机视觉顶会创新点适用于Yolov5、Yolov7、Yolov8等各个Yolo系列,专栏文章提供每一步步骤和源码,轻松带你上手魔改网络 !!!
💡💡💡重点:通过本专栏的阅读,后续你也可以设计魔改网络,在网络不同位置(Backbone、head、detect、loss等)进行魔改,实现创新!!!
☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️ ☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️☁️
1.YOLOv9原理介绍
论文: 2402.13616.pdf (arxiv.org)
代码:GitHub - WongKinYiu/yolov9: Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information摘要: 如今的深度学习方法重点关注如何设计最合适的目标函数,从而使得模型的预测结果能够最接近真实情况。同时,必须设计一个适当的架构,可以帮助获取足够的信息进行预测。然而,现有方法忽略了一个事实,即当输入数据经过逐层特征提取和空间变换时,大量信息将会丢失。因此,YOLOv9 深入研究了数据通过深度网络传输时数据丢失的重要问题,即信息瓶颈和可逆函数。作者提出了可编程梯度信息(programmable gradient information,PGI)的概念,来应对深度网络实现多个目标所需要的各种变化。PGI 可以为目标任务计算目标函数提供完整的输入信息,从而获得可靠的梯度信息来更新网络权值。此外,研究者基于梯度路径规划设计了一种新的轻量级网络架构,即通用高效层聚合网络(Generalized Efficient Layer Aggregation Network,GELAN)。该架构证实了 PGI 可以在轻量级模型上取得优异的结果。研究者在基于 MS COCO 数据集的目标检测任务上验证所提出的 GELAN 和 PGI。结果表明,与其他 SOTA 方法相比,GELAN 仅使用传统卷积算子即可实现更好的参数利用率。对于 PGI 而言,它的适用性很强,可用于从轻型到大型的各种模型。我们可以用它来获取完整的信息,从而使从头开始训练的模型能够比使用大型数据集预训练的 SOTA 模型获得更好的结果。对比结果如图1所示。
2.EMA注意力介绍
论文:https://arxiv.org/abs/2305.13563v1
录用:ICASSP2023
通过通道降维来建模跨通道关系可能会给提取深度视觉表示带来副作用。本文提出了一种新的高效的多尺度注意力(EMA)模块。以保留每个通道上的信息和降低计算开销为目标,将部分通道重塑为批量维度,并将通道维度分组为多个子特征,使空间语义特征在每个特征组中均匀分布。
提出了一种新的无需降维的高效多尺度注意力(efficient multi-scale attention, EMA)。请注意,这里只有两个卷积核将分别放置在并行子网络中。其中一个并行子网络是一个1x1卷积核,以与CA相同的方式处理,另一个是一个3x3卷积核。为了证明所提出的EMA的通用性,详细的实验在第4节中给出,包括在CIFAR-100、ImageNet-1k、COCO和VisDrone2019基准上的结果。图1给出了图像分类和目标检测任务的实验结果。我们的主要贡献如下:
本文提出了一种新的跨空间学习方法,并设计了一个多尺度并行子网络来建立短和长依赖关系。
1)我们考虑一种通用方法,将部分通道维度重塑为批量维度,以避免通过通用卷积进行某种形式的降维。
2)除了在不进行通道降维的情况下在每个并行子网络中构建局部的跨通道交互外,我们还通过跨空间学习方法融合两个并行子网络的输出特征图。
3)与CBAM、NAM[16]、SA、ECA和CA相比,EMA不仅取得了更好的结果,而且在所需参数方面效率更高。
CA块首先可以被视为与SE注意力模块类似的方法,其中利用全局平均池化操作对跨通道信息进行建模。通常,可以通过使用全局平均池化来生成信道统计信息,其中全局空间位置信息被压缩到信道描述符中。与SE微妙不同的是,CA将空间位置信息嵌入通道注意图以增强特征聚合。
并行子结构帮助网络避免更多的顺序处理和大深度。给定上述并行处理策略,我们在EMA模块中采用它。EMA的整体结构如图3 (b)所示。在本节中,我们将讨论EMA如何在卷积操作中不进行通道降维的情况下学习有效的通道描述,并为高级特征图产生更好的像素级注意力。具体来说,我们只从CA模块中挑选出1x1卷积的共享组件,在我们的EMA中将其命名为1x1分支。为了聚合多尺度空间结构信息,将3x3内核与1x1分支并行放置以实现快速响应,我们将其命名为3x3分支。考虑到特征分组和多尺度结构,有效地建立短期和长程依赖有利于获得更好的性能。
3.如何将EMA放在不同网络位置
3.1 yolov9-c-EMA1.yaml
最经典的一种用法,将EMA放在backbone最后面
# YOLOv9
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
#activation: nn.LeakyReLU(0.1)
#activation: nn.ReLU()
# anchors
anchors: 3
# YOLOv9 backbone
backbone:
[
[-1, 1, Silence, []],
# conv down
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 3
# avg-conv down
[-1, 1, ADown, [256]], # 4-P3/8
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 5
# avg-conv down
[-1, 1, ADown, [512]], # 6-P4/16
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 7
# avg-conv down
[-1, 1, ADown, [512]], # 8-P5/32
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 9
[-1, 1, EMA_attention, [512]], # 10
]
# YOLOv9 head
head:
[
# elan-spp block
[-1, 1, SPPELAN, [512, 256]], # 11
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 7], 1, Concat, [1]], # cat backbone P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 14
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 5], 1, Concat, [1]], # cat backbone P3
# elan-2 block
[-1, 1, RepNCSPELAN4, [256, 256, 128, 1]], # 17 (P3/8-small)
# avg-conv-down merge
[-1, 1, ADown, [256]],
[[-1, 14], 1, Concat, [1]], # cat head P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 20 (P4/16-medium)
# avg-conv-down merge
[-1, 1, ADown, [512]],
[[-1, 11], 1, Concat, [1]], # cat head P5
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 23 (P5/32-large)
# multi-level reversible auxiliary branch
# routing
[5, 1, CBLinear, [[256]]], # 24
[7, 1, CBLinear, [[256, 512]]], # 25
[9, 1, CBLinear, [[256, 512, 512]]], # 26
# conv down
[0, 1, Conv, [64, 3, 2]], # 27-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 28-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 29
# avg-conv down fuse
[-1, 1, ADown, [256]], # 30-P3/8
[[24, 25, 26, -1], 1, CBFuse, [[0, 0, 0]]], # 31
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 32
# avg-conv down fuse
[-1, 1, ADown, [512]], # 33-P4/16
[[25, 26, -1], 1, CBFuse, [[1, 1]]], # 34
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 35
# avg-conv down fuse
[-1, 1, ADown, [512]], # 36-P5/32
[[26, -1], 1, CBFuse, [[2]]], # 37
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 38
# detection head
# detect
[[32, 35, 38, 17, 20, 23], 1, DualDDetect, [nc]], # DualDDetect(A3, A4, A5, P3, P4, P5)
]
3.2 yolov9-c-EMA2.yaml
放在backbone RepNCSPELAN4后面
# YOLOv9
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
#activation: nn.LeakyReLU(0.1)
#activation: nn.ReLU()
# anchors
anchors: 3
# YOLOv9 backbone
backbone:
[
[-1, 1, Silence, []],
# conv down
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 3
[-1, 1, EMA_attention, [256]], # 4
# avg-conv down
[-1, 1, ADown, [256]], # 5-P3/8
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 6
[-1, 1, EMA_attention, [512]], # 7
# avg-conv down
[-1, 1, ADown, [512]], # 8-P4/16
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 9
[-1, 1, EMA_attention, [512]], # 10
# avg-conv down
[-1, 1, ADown, [512]], # 11-P5/32
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 12
[-1, 1, EMA_attention, [512]], # 13
]
# YOLOv9 head
head:
[
# elan-spp block
[-1, 1, SPPELAN, [512, 256]], # 14
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 10], 1, Concat, [1]], # cat backbone P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 17
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 7], 1, Concat, [1]], # cat backbone P3
# elan-2 block
[-1, 1, RepNCSPELAN4, [256, 256, 128, 1]], # 20 (P3/8-small)
# avg-conv-down merge
[-1, 1, ADown, [256]],
[[-1, 17], 1, Concat, [1]], # cat head P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 23 (P4/16-medium)
# avg-conv-down merge
[-1, 1, ADown, [512]],
[[-1, 14], 1, Concat, [1]], # cat head P5
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 26 (P5/32-large)
# multi-level reversible auxiliary branch
# routing
[7, 1, CBLinear, [[256]]], # 27
[10, 1, CBLinear, [[256, 512]]], # 28
[13, 1, CBLinear, [[256, 512, 512]]], # 29
# conv down
[0, 1, Conv, [64, 3, 2]], # 30-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 31-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 32
# avg-conv down fuse
[-1, 1, ADown, [256]], # 33-P3/8
[[27, 28, 29, -1], 1, CBFuse, [[0, 0, 0]]], # 34
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 35
# avg-conv down fuse
[-1, 1, ADown, [512]], # 36-P4/16
[[28, 29, -1], 1, CBFuse, [[1, 1]]], # 37
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 38
# avg-conv down fuse
[-1, 1, ADown, [512]], # 39-P5/32
[[29, -1], 1, CBFuse, [[2]]], # 40
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 41
# detection head
# detect
[[35, 38, 41, 20, 23, 26], 1, DualDDetect, [nc]], # DualDDetect(A3, A4, A5, P3, P4, P5)
]
3.3 yolov9-c-EMA3.yaml
放在neck RepNCSPELAN4后面
# YOLOv9
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
#activation: nn.LeakyReLU(0.1)
#activation: nn.ReLU()
# anchors
anchors: 3
# YOLOv9 backbone
backbone:
[
[-1, 1, Silence, []],
# conv down
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 3
# avg-conv down
[-1, 1, ADown, [256]], # 4-P3/8
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 5
# avg-conv down
[-1, 1, ADown, [512]], # 6-P4/16
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 7
# avg-conv down
[-1, 1, ADown, [512]], # 8-P5/32
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 9
]
# YOLOv9 head
head:
[
# elan-spp block
[-1, 1, SPPELAN, [512, 256]], # 10
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 7], 1, Concat, [1]], # cat backbone P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 13
[-1, 1, EMA_attention, [512]], # 14
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 5], 1, Concat, [1]], # cat backbone P3
# elan-2 block
[-1, 1, RepNCSPELAN4, [256, 256, 128, 1]], # 17 (P3/8-small)
[-1, 1, EMA_attention, [256]], # 18
# avg-conv-down merge
[-1, 1, ADown, [256]],
[[-1, 14], 1, Concat, [1]], # cat head P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 21 (P4/16-medium)
[-1, 1, EMA_attention, [512]], # 22
# avg-conv-down merge
[-1, 1, ADown, [512]],
[[-1, 10], 1, Concat, [1]], # cat head P5
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 25 (P5/32-large)
[-1, 1, EMA_attention, [512]], # 26
# multi-level reversible auxiliary branch
# routing
[5, 1, CBLinear, [[256]]], # 27
[7, 1, CBLinear, [[256, 512]]], # 28
[9, 1, CBLinear, [[256, 512, 512]]], # 29
# conv down
[0, 1, Conv, [64, 3, 2]], # 30-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 31-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 32
[-1, 1, EMA_attention, [256]], # 33
# avg-conv down fuse
[-1, 1, ADown, [256]], # 34-P3/8
[[27, 28, 29, -1], 1, CBFuse, [[0, 0, 0]]], # 35
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 36
[-1, 1, EMA_attention, [512]], # 37
# avg-conv down fuse
[-1, 1, ADown, [512]], # 38-P4/16
[[28, 29, -1], 1, CBFuse, [[1, 1]]], # 39
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 40
[-1, 1, EMA_attention, [512]], # 41
# avg-conv down fuse
[-1, 1, ADown, [512]], # 42-P5/32
[[29, -1], 1, CBFuse, [[2]]], # 43
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 44
[-1, 1, EMA_attention, [512]], # 45
# detection head
# detect
[[37, 41, 45, 18, 22, 26], 1, DualDDetect, [nc]], # DualDDetect(A3, A4, A5, P3, P4, P5)
]
3.4 yolov9-c-EMA4.yaml
放在detect前面,直接和detect结合
# YOLOv9
# parameters
nc: 80 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
#activation: nn.LeakyReLU(0.1)
#activation: nn.ReLU()
# anchors
anchors: 3
# YOLOv9 backbone
backbone:
[
[-1, 1, Silence, []],
# conv down
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 2-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 3
# avg-conv down
[-1, 1, ADown, [256]], # 4-P3/8
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 5
# avg-conv down
[-1, 1, ADown, [512]], # 6-P4/16
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 7
# avg-conv down
[-1, 1, ADown, [512]], # 8-P5/32
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 9
]
# YOLOv9 head
head:
[
# elan-spp block
[-1, 1, SPPELAN, [512, 256]], # 10
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 7], 1, Concat, [1]], # cat backbone P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 13
# up-concat merge
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 5], 1, Concat, [1]], # cat backbone P3
# elan-2 block
[-1, 1, RepNCSPELAN4, [256, 256, 128, 1]], # 16 (P3/8-small)
# avg-conv-down merge
[-1, 1, ADown, [256]],
[[-1, 13], 1, Concat, [1]], # cat head P4
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 19 (P4/16-medium)
# avg-conv-down merge
[-1, 1, ADown, [512]],
[[-1, 10], 1, Concat, [1]], # cat head P5
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 22 (P5/32-large)
# multi-level reversible auxiliary branch
# routing
[5, 1, CBLinear, [[256]]], # 23
[7, 1, CBLinear, [[256, 512]]], # 24
[9, 1, CBLinear, [[256, 512, 512]]], # 25
# conv down
[0, 1, Conv, [64, 3, 2]], # 26-P1/2
# conv down
[-1, 1, Conv, [128, 3, 2]], # 27-P2/4
# elan-1 block
[-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 28
# avg-conv down fuse
[-1, 1, ADown, [256]], # 29-P3/8
[[23, 24, 25, -1], 1, CBFuse, [[0, 0, 0]]], # 30
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 31
# avg-conv down fuse
[-1, 1, ADown, [512]], # 32-P4/16
[[24, 25, -1], 1, CBFuse, [[1, 1]]], # 33
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 34
# avg-conv down fuse
[-1, 1, ADown, [512]], # 35-P5/32
[[25, -1], 1, CBFuse, [[2]]], # 36
# elan-2 block
[-1, 1, RepNCSPELAN4, [512, 512, 256, 1]], # 37
[31, 1, EMA_attention, [512]],
[34, 1, EMA_attention, [512]],
[37, 1, EMA_attention, [512]],
[16, 1, EMA_attention, [256]],
[19, 1, EMA_attention, [512]],
[22, 1, EMA_attention, [512]],
# detection head
# detect
[[38, 39, 40, 41, 42, 43], 1, DualDDetect, [nc]], # DualDDetect(A3, A4, A5, P3, P4, P5)
]