【YOLOv7】《YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors》

bryant_meng

Published 2024-08-01 20:57:12

Article link: https://blog.csdn.net/bryant_meng/article/details/131415482

CVPR-2023

https://github.com/WongKinYiu/yolov7

1 Background and Motivation

YOLOv1: 2015, Joseph Redmon, Ali Farhadi, et al. (University of Washington)

YOLOv2: 2016, Joseph Redmon, Ali Farhadi, et al. (University of Washington)

YOLOv3: 2018, Joseph Redmon, Ali Farhadi, et al. (University of Washington)

Interlude: Joseph Redmon, the author of YOLOv1-v3, announced that he was leaving the CV field and would no longer release official new YOLO work.

YOLOv4: 2020, Alexey Bochkovskiy, Chien-Yao Wang, et al.

YOLOv5: 2020, Ultralytics

YOLOv6: 2022, Meituan

YOLOv7: 2022, Alexey Bochkovskiy, Chien-Yao Wang, et al.

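Real-time object detection is one of the most important research topics in computer vision.

Building on YOLOv5 and YOLOR, the authors propose YOLOv7, which improves both speed and accuracy.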

hopes that it can support both mobile GPU and GPU devices from the edge to the cloud

coarse-to-fine lead guided label assignment

2 Related Work

Real-time object detectors
Model re-parameterization
- module-level ensemble
- model-level ensemble
Model scaling
- resolution, depth, width, and stage

3 Advantages / Contributions

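Proposes the anchor-based YOLOv7.

It incorporates the ELAN design idea, the MP downsampling component, a rethinking of the Rep (re-parameterization) structure ("extend" and "compound scaling" methods), the positive/negative sample matching strategy (simOTA), and an auxiliary training head.

The proposed methods reduce about 40% of the parameters and 50% of the computation of the state-of-the-art real-time object detector.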

4 Method

It is basically the YOLOv5 framework. Most of the early-version code is also the same, since it was written on top of YOLOv5.

4.1 Architecture

（1） Extended efficient layer aggregation networks

Building on ELAN, the authors propose Extended-ELAN (E-ELAN). E-ELAN uses expand, shuffle, and merge cardinality to continuously enhance the learning ability of the network without destroying the original gradient path.

The ELAN module is an efficient network structure: by controlling the shortest and longest gradient paths, it lets the network learn more features and makes it more robust.

E-ELAN

（2）Model scaling for concatenation-based models

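Earlier model-scaling methods mostly consider a single factor, such as depth, width, or input resolution. The authors find that for a concatenation-based structure like E-ELAN, the factors cannot be scaled independently.

In the paper's example of a concatenation-based block, increasing the depth also changes the output width of the block.

The authors therefore propose a compound scaling method.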

compound scaling method can maintain the properties that the model had at the initial design and maintains the optimal structure.
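The coupling between depth and width can be seen with simple channel arithmetic. The sketch below is only an illustration under stated assumptions: it models a toy concatenation-based block whose output is the input concatenated with the output of every stacked layer, and the function names `concat_block_out_channels` and `compound_scale` are hypothetical, not taken from the YOLOv7 code. Scaling the depth changes the block's output channels, and the ratio of new to old output channels is the width factor that the following transition layer would need.

```python
def concat_block_out_channels(in_channels, branch_channels, depth):
    """Channels leaving a toy concatenation-based block.

    The block concatenates its input with the output of each of the
    `depth` stacked layers, so the channel count grows with depth.
    """
    return in_channels + branch_channels * depth


def compound_scale(in_channels, branch_channels, depth, depth_factor):
    """Scale depth first, then derive the matching width factor."""
    old_out = concat_block_out_channels(in_channels, branch_channels, depth)
    new_depth = max(1, round(depth * depth_factor))
    new_out = concat_block_out_channels(in_channels, branch_channels, new_depth)
    # The transition layer after the block must be widened by this ratio.
    width_factor = new_out / old_out
    return new_depth, new_out, width_factor


new_depth, new_out, width_factor = compound_scale(64, 32, depth=2, depth_factor=1.5)
print(new_depth, new_out, width_factor)
```

In this toy setting, scaling depth alone from 2 to 3 layers silently widens the block output, which is why depth and width are treated together.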

（3）MP structure

A pooling branch and a stride-2 conv branch run in parallel, and their outputs are concatenated at the end.
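A quick shape check makes the two-branch downsampling concrete. This is a minimal sketch, assuming a 2x2 stride-2 max pool followed by a 1x1 conv on one branch, a 1x1 conv followed by a 3x3 stride-2 conv (padding 1) on the other, and each branch emitting half of the input channels. Those layer settings and the helper names are assumptions for illustration, not read from the official code.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def mp_block_shape(channels, height, width):
    """Output (C, H, W) of a toy MP block; each branch emits channels // 2."""
    half = channels // 2
    # Branch A: 2x2 max pool with stride 2, then a 1x1 conv to `half` channels.
    pool_hw = (conv_out_size(height, 2, 2, 0), conv_out_size(width, 2, 2, 0))
    # Branch B: 1x1 conv to `half` channels, then a 3x3 conv with stride 2.
    conv_hw = (conv_out_size(height, 3, 2, 1), conv_out_size(width, 3, 2, 1))
    if pool_hw != conv_hw:
        raise ValueError("branches disagree on spatial size; use even H and W")
    # Concatenating along the channel axis adds the branch channels.
    return (half + half, pool_hw[0], pool_hw[1])


print(mp_block_shape(256, 80, 80))
```

Under these assumptions the block halves the spatial resolution while keeping the channel count, and both branches only line up for even input sizes.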

（4）re-parameters

4.2 Trainable bag-of-freebies

（1）Planned re-parameterized convolution

The authors find that RepVGG-style re-parameterization works well on VGG-like networks but breaks down on structures such as ResNet or DenseNet.

Reason: the identity connection in RepConv destroys the residual in ResNet and the concatenation in DenseNet.

The authors use RepConv without identity connection (RepConvN) to design the architecture of planned re-parameterized convolution.
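The branch merging behind RepConv can be checked numerically on a tiny case. The sketch below is a simplified single-channel illustration that leaves out batch normalization and bias: a 3x3 branch, a 1x1 branch, and an optional identity branch are folded into one 3x3 kernel by adding the 1x1 weight (and 1.0 for the identity) to the kernel center. With `with_identity=False` it corresponds to the RepConvN variant described above. All function names are hypothetical.

```python
def conv3x3_same(image, kernel):
    """Single-channel 3x3 cross-correlation, stride 1, zero padding 1."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch(image, k3, w1, with_identity):
    """Training-time form: 3x3 branch + 1x1 branch (+ identity branch)."""
    out = conv3x3_same(image, k3)
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            out[i][j] += w1 * value
            if with_identity:
                out[i][j] += value
    return out


def merge_to_single_3x3(k3, w1, with_identity):
    """Inference-time form: fold every branch into one 3x3 kernel."""
    merged = [list(row) for row in k3]
    # A 1x1 conv (and the identity) only touches the kernel center.
    merged[1][1] += w1 + (1.0 if with_identity else 0.0)
    return merged


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 1.5, -0.5], [2.0, 0.0, -0.25]]
same = multi_branch(image, k3, 2.0, False) == conv3x3_same(
    image, merge_to_single_3x3(k3, 2.0, False)
)
print(same)
```

The merged kernel gives the same output as the multi-branch form, so the extra branches cost nothing at inference time. Dropping the identity branch simply leaves out the `+ 1.0` term.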

（2）Coarse for auxiliary and fine for lead loss

we use lead head prediction as guidance to generate coarse-to-fine hierarchical labels

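The aux head is assigned more positive samples (coarse labels), while the lead head is assigned fewer positive samples (fine labels).

The lead head uses 3 grid cells (the cell containing the GT center plus two neighboring cells) to regress the GT. For the auxiliary head the offset is at most 1, so it uses the cell itself plus its 4 neighbors, 5 cells in total.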
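The difference between the 3-cell and 5-cell assignment comes down to a single offset threshold. The following is a simplified sketch of the YOLOv5-style neighbor rule as understood here, not a copy of the official code: a neighbor is added when the GT center lies within `offset` of the shared cell border. The function name `positive_cells` is hypothetical, and coordinates are in grid units.

```python
def positive_cells(cx, cy, grid_w, grid_h, offset):
    """Grid cells responsible for a GT whose center is (cx, cy).

    offset=0.5 mimics the lead head (up to 3 cells),
    offset=1.0 mimics the auxiliary head (up to 5 cells).
    """
    gi, gj = int(cx), int(cy)
    fx, fy = cx - gi, cy - gj  # position of the center inside its cell
    cells = [(gi, gj)]
    if fx < offset and cx > 1:
        cells.append((gi - 1, gj))  # left neighbor
    if fy < offset and cy > 1:
        cells.append((gi, gj - 1))  # top neighbor
    if (1 - fx) < offset and (grid_w - cx) > 1:
        cells.append((gi + 1, gj))  # right neighbor
    if (1 - fy) < offset and (grid_h - cy) > 1:
        cells.append((gi, gj + 1))  # bottom neighbor
    return cells


lead = positive_cells(10.3, 10.7, 20, 20, offset=0.5)
aux = positive_cells(10.3, 10.7, 20, 20, offset=1.0)
print(len(lead), len(aux))
```

With an offset of 0.5 only the two nearest neighbors qualify. With an offset of 1.0 all four neighbors qualify for a center that is not on a cell border, which gives the coarser label set for the auxiliary head.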

The paper regards such learning as a kind of generalized residual learning, somewhat in the spirit of ResNet.

By letting the shallower auxiliary head directly learn the information that lead head has learned, lead head will be more able to focus on learning residual information that has not yet been learned.

fine label and coarse label to be dynamically adjusted during the learning process

（3）Other trainable bag-of-freebies

Batch normalization in conv-bn-activation topology: this makes it easy to merge the conv and BN layers at inference.
EMA model: we use EMA model purely as the final inference model
Implicit knowledge in YOLOR: in the head code, the output feature map is added to and multiplied by a learnable coefficient of shape (1, channel, 1, 1).
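The first item in the list above, merging conv and BN, follows from the fact that batch normalization at inference is an affine map per output channel. A minimal sketch for a single output channel is shown below, with the conv written as a plain dot product. The names `fuse_conv_bn` and `conv_then_bn` are hypothetical, and `eps` is an assumed default rather than a value taken from the YOLOv7 code.

```python
import math


def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: linear 'conv' response followed by inference-mode BN."""
    y = sum(w * v for w, v in zip(weight, x)) + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN statistics into the conv weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_weight = [w * scale for w in weight]
    fused_bias = (bias - mean) * scale + beta
    return fused_weight, fused_bias


x = [1.0, 2.0, 3.0]
weight, bias = [0.5, -1.0, 2.0], 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0

fused_weight, fused_bias = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
fused_out = sum(w * v for w, v in zip(fused_weight, x)) + fused_bias
print(abs(fused_out - conv_then_bn(x, weight, bias, gamma, beta, mean, var)) < 1e-9)
```

After fusion the BN layer disappears from the inference graph, so the conv-bn-activation ordering costs nothing at deployment.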

Further reading: "With 2021 behind us, which model is the strongest in the YOLO series? How YOLOR, the most accurate YOLO, was built"

Wang C Y, Yeh I H, Liao H Y M. You only learn one representation: Unified network for multiple tasks[J]. arXiv preprint arXiv:2105.04206, 2021.

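Shallow-layer features are defined as explicit knowledge, and deep-layer features as implicit knowledge.

The paper defines directly observable knowledge as explicit knowledge, and knowledge that is hidden in the neural network and cannot be observed as implicit knowledge.

In deep learning, a manifold space usually refers to a low-dimensional structure or surface that the data forms within a high-dimensional space.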

4.3 Label Assignment

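Positive/negative sample assignment uses simOTA.

Reference: "YOLOv7 positive and negative sample assignment explained in detail"

It combines the approaches of YOLOv5 and YOLOX.

The decoded h and w scale factors range from 0 to 4, which matches the allowed ratio range between a GT box and an anchor.

The initial screening selects only the cell containing the GT center plus two neighbors (to the left/top or right/bottom), 3 grid cells in total, provided the ratio of the GT's width and height to a preset anchor falls within the allowed range.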
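The 0-4 range and the anchor ratio test can be written out in a few lines. This sketch assumes the YOLOv5-style decoding `(2 * sigmoid(t)) ** 2 * anchor` and a ratio threshold of 4.0 (the `anchor_t` hyperparameter in YOLOv5-style configs). Both are assumptions about the implementation rather than statements from the paper, and the function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_wh(raw, anchor):
    """Decode a raw width/height output; the scale factor lies in (0, 4)."""
    return (2.0 * sigmoid(raw)) ** 2 * anchor


def anchor_matches(gt_w, gt_h, anchor_w, anchor_h, anchor_t=4.0):
    """True if the GT is within a factor `anchor_t` of the anchor in w and h."""
    ratio_w = gt_w / anchor_w
    ratio_h = gt_h / anchor_h
    worst = max(ratio_w, 1.0 / ratio_w, ratio_h, 1.0 / ratio_h)
    return worst < anchor_t


print(decode_wh(0.0, 10.0))
print(anchor_matches(30, 30, 10, 10), anchor_matches(50, 10, 10, 10))
```

Because the decoded size can never exceed 4 times the anchor, a GT more than 4 times larger or smaller than an anchor could not be regressed from it, so such anchor-GT pairs are filtered out in the initial screening.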

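The second screening is dynamic: it compares the predictions against the GT during training (IoU and class) rather than comparing anchors against the GT. This handles cases where a grid cell passes the initial screening but its prediction is poor and regresses far from the GT.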
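The dynamic step can be sketched for a single GT as follows. This is a simplified illustration of simOTA-style selection in the YOLOX manner, under assumed settings: cost = classification cost + 3.0 x IoU loss, IoU loss = -log(IoU), and dynamic k = the integer part of the sum of the top-10 IoUs, with a minimum of 1. It omits the step that resolves candidates claimed by several GTs, and the function name `simota_select` is hypothetical.

```python
import math


def simota_select(ious, cls_costs, iou_weight=3.0, topk=10):
    """Pick positive candidates for one GT.

    ious[i]      IoU between candidate i's predicted box and the GT
    cls_costs[i] classification cost of candidate i for the GT class
    Returns the sorted indices of the selected candidates.
    """
    costs = [
        cls + iou_weight * -math.log(iou + 1e-8)
        for iou, cls in zip(ious, cls_costs)
    ]
    # Dynamic k: the better the candidates overlap, the more positives.
    top_ious = sorted(ious, reverse=True)[:topk]
    dynamic_k = max(int(sum(top_ious)), 1)
    # Keep the dynamic_k candidates with the lowest cost.
    by_cost = sorted(range(len(costs)), key=lambda i: costs[i])
    return sorted(by_cost[:dynamic_k])


ious = [0.9, 0.8, 0.7, 0.1, 0.05]
cls_costs = [0.2, 0.1, 0.3, 0.1, 0.1]
print(simota_select(ious, cls_costs))
```

A candidate that passed the grid and anchor screening but predicts a box with low IoU receives a high cost and is dropped, which is the situation the second screening is meant to catch.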

5 Experiments

did not use pre-trained models

5.1 Datasets and Metrics

（1）Baselines

（2）Comparison with state-of-the-arts

the best speed-accuracy trade-off comprehensively

（3）Ablation study

Proposed compound scaling method

Proposed planned re-parameterized model

all higher AP values are present on our proposed planned re-parameterized model.

Proposed assistant loss for auxiliary head

Basically no improvement here, haha; this one was much ado about nothing.

A modification is then tried:

the method of constraining the upper bound of objectness by the distance from the center of the object can achieve better performance.

The improvement is still not very noticeable, so a further refinement follows.

This change has some effect.

More comparison

6 Conclusion（own） / Future work

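References and study material:
- "YOLOv7 (object detection) beginner tutorial explained: detection, inference, training"
- "Object detection algorithms: YOLO (YOLOv7)" (❤❤❤❤❤)
- "The Yolo series in plain terms: Yolov7 basic network structure explained" (❤❤❤❤❤)
- "YOLOv7 explained, from basics to practice: an engineer with five years of experience helps you master the core of object detection in one hour!" (video, ❤❤❤❤❤)
On a first read, the paper seems to give relatively few details.
Positive samples are still assigned as in YOLOv5: 3 grid cells are responsible for prediction. Picture shifting the GT center by 0.5 up, down, left, and right; whichever cell it lands in becomes responsible for the prediction.
"[yolov7 series] Network framework details broken down"
YOLOv7 is still anchor-based. It adds E-ELAN layers to the network architecture and also brings in REP layers to ease later deployment. During training, an Aux_detect head is added to the head for auxiliary detection.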
