[YOLOv6] YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

bryant_meng

Published 2024-07-16 10:18:40

Article link: https://blog.csdn.net/bryant_meng/article/details/131514316

arXiv-2022

https://github.com/meituan/YOLOv6

Li C, Li L, Jiang H, et al. YOLOv6: A single-stage object detection framework for industrial applications[J]. arXiv preprint arXiv:2209.02976, 2022.

1 Background and Motivation

The authors observe the following about existing YOLO-series object detectors:

（1）Reparameterization from RepVGG is not yet well exploited in detection

（2）Quantization of reparameterization-based detectors also requires meticulous treatment

（3）Previous works pay less attention to deployment

（4）Advanced domain-specific strategies like label assignment and loss function design need further verification considering the architectural variance

（5）Training-time bag-of-freebies, such as knowledge distillation, improve accuracy without adding inference cost

Across network design, training strategies, testing techniques, quantization and optimization methods, the authors strive to push the limits of the YOLO series to the next level

With the generous permission of YOLO authors, we name it YOLOv6

2 Advantages / Contributions

Proposes YOLOv6:

Refashions a line of networks of different sizes tailored for industrial applications in diverse scenarios
Introduces a self-distillation strategy
Improves label assignment, loss function and data augmentation techniques to boost YOLOv6 performance
Reforms the quantization scheme

Both fast and accurate.

3 Method

3.1 Network Design

1）Backbone

A combination of CSP and RepVGG: the CSPStackRep Block

【CSPNet】《CSPNet：A New Backbone that can Enhance Learning Capability of CNN》

【RepVGG】《RepVGG：Making VGG-style ConvNets Great Again》
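
The re-parameterization idea behind RepVGG-style blocks can be sketched in a few lines. The example below is a deliberately reduced illustration, not the YOLOv6 implementation: it assumes a single channel, leaves out BatchNorm, and uses hypothetical helper names (`conv2d_same`, `fuse_branches`). It shows that a 3x3 branch, a 1x1 branch and an identity branch summed at training time are equivalent to one 3x3 convolution at inference time.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps the input size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    return out


def fuse_branches(k3, k1):
    """Fold a 3x3 kernel, a 1x1 kernel and an identity branch into one 3x3 kernel."""
    fused = [row[:] for row in k3]
    # The 1x1 kernel and the identity mapping both act only on the kernel centre.
    fused[1][1] += k1[0][0] + 1.0
    return fused


image = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [2.0, 1.0, 4.0]]
k3 = [[0.1, 0.0, -0.2], [0.3, 0.5, 0.0], [0.0, -0.1, 0.2]]
k1 = [[0.7]]

# Training-time view: three parallel branches summed element-wise.
b3 = conv2d_same(image, k3)
b1 = conv2d_same(image, k1)
multi_branch = [[b3[y][x] + b1[y][x] + image[y][x] for x in range(3)] for y in range(3)]

# Inference-time view: a single 3x3 convolution with the fused kernel.
single_path = conv2d_same(image, fuse_branches(k3, k1))

max_diff = max(abs(multi_branch[y][x] - single_path[y][x]) for y in range(3) for x in range(3))
print(max_diff)
```

In a real block each branch also carries its own BatchNorm, which has to be folded into the kernel and bias before the branches are merged.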

2）Neck

Continues to use PANet

【PANet】《Path Aggregation Network for Instance Segmentation》

3）Head

Efficient Decoupled Head

Anchor-free

Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection[J]. arXiv preprint arXiv:1904.01355, 2019.

YOLOv6 follows the point-based, rather than keypoint-based, anchor-free approach
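
As a rough illustration of what "point-based" means, the sketch below decodes a box from the four distances (left, top, right, bottom) predicted at one anchor point, in the FCOS style. The function names are hypothetical, and it is an assumption here that the distances are expressed in units of the feature-map stride.

```python
def grid_points(feat_h, feat_w, stride):
    """Centres of the feature-map cells, expressed in input-image pixels."""
    return [((x + 0.5) * stride, (y + 0.5) * stride)
            for y in range(feat_h) for x in range(feat_w)]


def decode_ltrb(anchor_point, distances, stride):
    """Turn (left, top, right, bottom) distances at a point into an (x1, y1, x2, y2) box."""
    cx, cy = anchor_point
    l, t, r, b = (d * stride for d in distances)
    return (cx - l, cy - t, cx + r, cy + b)


points = grid_points(2, 2, stride=8)
box = decode_ltrb(points[3], (1.0, 2.0, 3.0, 0.5), stride=8)
print(points[3], box)
```

Each location predicts one box directly, so no anchor shapes have to be tuned per dataset.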

3.2 Label Assignment

Task Alignment Learning (TAL), the assignment method from TOOD, is adopted

Feng C, Zhong Y, Gao Y, et al. Tood: Task-aligned one-stage object detection[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 2021: 3490-3499.
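
TOOD ranks candidates with an alignment metric t = s^α · u^β, where s is the classification score and u is the IoU with the ground-truth box, and keeps the top-k candidates as positives. The sketch below illustrates that ranking only; the values α = 1 and β = 6 and the candidate data are assumptions made for the example, not settings taken from this post.

```python
def alignment_metric(cls_score, iou, alpha=1.0, beta=6.0):
    """Task-alignment metric t = s^alpha * u^beta (alpha and beta are assumed values)."""
    return (cls_score ** alpha) * (iou ** beta)


def select_topk(candidates, k):
    """Return the names of the k candidates with the highest alignment metric."""
    ranked = sorted(candidates,
                    key=lambda c: alignment_metric(c["score"], c["iou"]),
                    reverse=True)
    return [c["name"] for c in ranked[:k]]


# Hypothetical candidates for one ground-truth box.
candidates = [
    {"name": "a", "score": 0.9, "iou": 0.5},
    {"name": "b", "score": 0.6, "iou": 0.9},
    {"name": "c", "score": 0.8, "iou": 0.8},
    {"name": "d", "score": 0.3, "iou": 0.95},
]
positives = select_topk(candidates, k=2)
print(positives)
```

Candidate "a" has the highest classification score but a poor box, so it ranks last: the metric rewards predictions that are good at both tasks at once.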

3.3 Loss Function

（1）regression loss

a）IoU-series Loss

SIoU (for YOLOv6-N and YOLOv6-T) and GIoU (for the other YOLOv6 models)

Gevorgyan Z. SIoU loss: More powerful learning for bounding box regression[J]. arXiv preprint arXiv:2205.12740, 2022.

Rezatofighi H, Tsoi N, Gwak J Y, et al. Generalized intersection over union: A metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 658-666.
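
For reference, GIoU subtracts from the IoU the fraction of the smallest enclosing box that is not covered by the union, and the loss is 1 - GIoU. The following is a minimal sketch for two axis-aligned boxes in (x1, y1, x2, y2) form; it assumes both boxes have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Smallest axis-aligned box that encloses both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (hull - union) / hull


def giou_loss(box_a, box_b):
    return 1.0 - giou(box_a, box_b)


overlap = giou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = giou((0, 0, 1, 1), (2, 0, 3, 1))
print(overlap, disjoint)
```

Unlike plain IoU, the value for disjoint boxes still changes with their distance, so the loss keeps a useful gradient when the prediction does not overlap the target.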

b）Probability Loss

Distribution Focal Loss (DFL) in YOLOv6-M/L

Li X, Wang W, Wu L, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 21002-21012.

Quality focal loss & distribution focal loss explained (paper, code)
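
DFL treats each box side as a discrete distribution over integer bins and applies cross-entropy to the two bins that bracket the continuous target, weighted by how close the target is to each. The sketch below is a plain-Python illustration of that formula for a single side; the bin count and the example logits are arbitrary assumptions.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distribution_focal_loss(logits, target):
    """Cross-entropy on the two bins bracketing a target in [0, len(logits) - 1]."""
    probs = softmax(logits)
    left = min(int(math.floor(target)), len(logits) - 2)
    right = left + 1
    w_left = right - target   # the closer bin receives the larger weight
    w_right = target - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


def expected_distance(logits):
    """Decode the predicted distance as the expectation over the bins."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


uniform_loss = distribution_focal_loss([0.0, 0.0, 0.0, 0.0], target=1.5)
peaked_loss = distribution_focal_loss([0.0, 5.0, 5.0, 0.0], target=1.5)
print(uniform_loss, peaked_loss, expected_distance([0.0, 5.0, 5.0, 0.0]))
```

The extra softmax and expectation at decode time add work per box side on top of direct regression.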

（2）classification loss

VariFocal Loss

Zhang H, Wang Y, Dayoub F, et al. Varifocalnet: An iou-aware dense object detector[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 8514-8523.
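
VariFocal Loss treats positives and negatives asymmetrically: positives are trained towards an IoU-aware target q and weighted by q, while negatives are down-weighted by p^γ. The sketch below evaluates the per-sample formula in plain Python; α = 0.75 and γ = 2 are assumed values for the example, not settings confirmed by this post.

```python
import math


def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Per-sample VFL: p is the predicted score, q the IoU-aware target (0 for negatives)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    if q > 0:
        # Positive sample: binary cross-entropy against q, weighted by q itself.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: focal-style down-weighting of easy negatives.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


easy_negative = varifocal_loss(0.1, 0.0)
hard_negative = varifocal_loss(0.9, 0.0)
aligned_positive = varifocal_loss(0.8, 0.8)
misaligned_positive = varifocal_loss(0.3, 0.8)
print(easy_negative, hard_negative, aligned_positive, misaligned_positive)
```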

（3）objectness loss

It lowered accuracy, so the authors dropped it

3.4 Industry-handy Improvements

（1）more training epochs

From 300 to 400 epochs: simple and blunt

（2）self-distillation

We limit the teacher to be the student itself but pretrained, hence we call it self-distillation

Early in training the soft labels from the teacher dominate; as training progresses, the weight shifts to the hard labels
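
The shift from soft to hard labels can be pictured as a distillation weight α that decays over training in L_total = L_det + α · L_KD. The sketch below assumes a cosine decay from 1 to 0 and a KL-divergence distillation term over one probability vector; the exact schedule, the function names and the numbers are illustrative assumptions.

```python
import math


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student); student probabilities must be strictly positive."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)


def distill_weight(epoch, total_epochs, alpha_max=1.0):
    """Cosine decay from alpha_max at epoch 0 down to 0 at the final epoch."""
    return alpha_max * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


def total_loss(det_loss, teacher_probs, student_probs, epoch, total_epochs):
    """Detection loss on hard labels plus a decaying distillation term."""
    alpha = distill_weight(epoch, total_epochs)
    return det_loss + alpha * kl_divergence(teacher_probs, student_probs)


teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
start = total_loss(1.0, teacher, student, epoch=0, total_epochs=300)
end = total_loss(1.0, teacher, student, epoch=300, total_epochs=300)
print(start, end)
```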

（3）Gray border of images

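Because training uses mosaic augmentation, testing with a gray border around the image gives better results than testing without one

The authors' remedies:

turning mosaic augmentations off during last epochs
change the area of gray border and resize the image with gray borders directly to the target image size (the image is shrunk so that shrunken image + gray padding = test size, instead of padding the full-size image beyond the test size)

Test size 640, original image 640, gray border = 3: the image is resized to 634, and 634 + 3 * 2 = 640

Test size 640, original image 640, gray border = 6: the image is resized to 628, and 628 + 6 * 2 = 640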
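
The arithmetic above amounts to resizing the image to the target size minus twice the border and then padding the border back on every side. A small sketch with hypothetical helper names:

```python
def content_size(target_size, border):
    """Side length the image is resized to so that image + 2 * border == target_size."""
    return target_size - 2 * border


def letterbox_plan(target_size, border):
    inner = content_size(target_size, border)
    return {"resize_to": inner, "pad_each_side": border, "final": inner + 2 * border}


for border in (3, 6):
    print(border, letterbox_plan(640, border))
```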

3.5 Quantization and Deployment

Because the network uses RepVGG-style re-parameterized blocks, it is hard to incorporate QAT when it comes to matching fake quantizers during training and inference.

The authors make the following optimizations:

（1）RepOptimizer to obtain PTQ-friendly weights

（2）Sensitivity Analysis

（3）QAT with channel-wise distillation and graph optimization to pursue extreme performance.
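
Both PTQ and QAT build on fake quantization: values are rounded to an integer grid and mapped back to floats so the rounding error becomes visible. The sketch below uses symmetric per-tensor int8 quantization, which is an assumption for illustration and not necessarily the scheme YOLOv6 deploys. It also shows why a narrow weight distribution is "PTQ-friendly": a single outlier stretches the scale and increases the error on every other value.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor quantize-dequantize; returns (dequantized values, scale)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized], scale


def max_abs_error(values, num_bits=8):
    dequantized, _ = fake_quantize(values, num_bits)
    return max(abs(v - d) for v, d in zip(values, dequantized))


weights = [0.023, -0.507, 0.314, 1.27, -0.9]       # hypothetical narrow distribution
narrow_error = max_abs_error(weights)
# Error on the same five weights when one large outlier stretches the quantization range.
dequantized_wide, wide_scale = fake_quantize(weights + [12.7])
outlier_error = max(abs(v - d) for v, d in zip(weights, dequantized_wide))
print(narrow_error, outlier_error)
```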

Quantization and deployment of the open-source general object detection framework YOLOv6 at Meituan

5 Experiments

5.1 Datasets and Metrics

COCO and mAP

5.2 Ablation Study

5.2.1 Network

（1）Backbone and neck

VFL as the classification loss, and GIoU with DFL as the regression loss.

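channel coefficient (denoted as CC) of CSPStackRep Block

Small models do better with the single-path structure, while larger models benefit from the multi-branch structure (the largest model, L, does not seem very sensitive to it)

Deeper, narrower networks outperform shallower, wider ones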

（2）Combinations of convolutional layers and activation functions

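SiLU is considerably slower at deployment-time inference, so ReLU is the suggested drop-in replacement for small networks

For large networks RepConv + SiLU gives the best accuracy (though plain Conv is not weak either)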
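
For reference, the two activations differ in cost per element: ReLU is a single comparison, while SiLU multiplies the input by a sigmoid and therefore needs an exponential. Whether this alone explains the deployment gap is an assumption here; operator fusion in the inference engine may matter as much. A scalar sketch:

```python
import math


def relu(x):
    return max(0.0, x)


def silu(x):
    """SiLU (also called swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


for x in (-1.0, 0.0, 1.0, 10.0):
    print(x, relu(x), silu(x))
```

SiLU is smooth and slightly negative for small negative inputs, and it approaches ReLU for large positive inputs.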

（3）Miscellaneous design

decoupled head (denoted as DH)

anchor-free paradigm（AF）

(EfficientRep Backbone) and the neck (Rep-PAN neck), denoted as EB+RN

decoupled head (hybrid channels, HC)

5.2.2 Label Assignment

TAL, i.e. the assignment from TOOD, works best

TAL also works well without a warmup strategy; the original method uses an ATSS warmup

5.2.3 Loss functions

（1）Classification Loss

（2）Regression Loss

Adding DFL brings a slight accuracy gain, but the inference speed is greatly affected for small models

（3）Object Loss

Adding an objectness loss actually lowers accuracy. The authors' analysis:

The negative gain may come from the conflict between the object branch and the other two branches in TAL.

YOLOv6 therefore drops the objectness loss

5.3 Industry-handy improvements

（1）More training epochs
Longer training yields better results

（2）Self-distillation

Distillation combined with weight decay performs best

（3）Gray border of images

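The final image size is reduced from 672 to 640

With the test size fixed, the image content and the gray border trade off against each other: the larger the gray border, the smaller the image content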

5.4 Quantization Results

（1）PTQ

（2）QAT

v1.0
v2.0

（3）Quantization Details

Feature Distribution Comparison

Sensitivity Analysis Results

MSE is the closest to direct AP evaluation
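
A sensitivity analysis of this kind can be sketched as follows: quantize one layer at a time, compare the output with the float output using MSE, and rank the layers by that error so the most sensitive ones can be kept in higher precision. The layer names and numbers below are hypothetical.

```python
def mse(reference, quantized):
    """Mean squared error between two equally long output vectors."""
    return sum((r - q) ** 2 for r, q in zip(reference, quantized)) / len(reference)


def rank_sensitive_layers(outputs):
    """outputs maps layer name -> (float output, output with only that layer quantized)."""
    scores = {name: mse(ref, quant) for name, (ref, quant) in outputs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-layer measurements against the same float reference output.
outputs = {
    "stem": ([1.0, 2.0, 3.0], [1.0, 2.1, 2.9]),
    "neck.block": ([1.0, 2.0, 3.0], [0.5, 2.6, 3.4]),
    "head.cls": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.05]),
}
ranking = rank_sensitive_layers(outputs)
print(ranking)
```

A proxy metric like this is much cheaper than running a full AP evaluation for every layer.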

5.6 Detailed Latency and Throughput Benchmark

（1）T4 GPU Latency Table with TensorRT 8

（2）V100 GPU Latency Table

（3）CPU Latency

6 Conclusion（own） / Future work

DFL paired with GIoU
Hybrid channels (HC) in the decoupled head
A plain single-path architecture is a better choice for small networks

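Excerpts from other bloggers' write-ups:

YOLOv6: a fast and accurate object detection framework is now open source

Loss

VariFocal Loss (VFL)

Small models: SIoU loss

Large models: GIoU loss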

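YOLOv6 makes many improvements to the Backbone, Neck, Head and training strategy:

We designed a more efficient Backbone and Neck in a unified way: inspired by hardware-aware neural network design, we built the re-parameterizable and more efficient EfficientRep Backbone and Rep-PAN Neck on the RepVGG style [4].

We designed a simpler and more effective Efficient Decoupled Head, which keeps accuracy while further reducing the extra latency that a typical decoupled head introduces.

For the training strategy, we adopt the anchor-free paradigm, together with the SimOTA [2] label assignment strategy and the SIoU [9] bounding box regression loss, to further improve detection accuracy.

YOLOv6 introduces the SimOTA [4] algorithm to assign positive samples dynamically, further improving detection accuracy. The label assignment in YOLOv5 is based on shape matching and increases the number of positives through cross-grid matching, which makes the network converge quickly; however, it is a static assignment method and does not adjust as training progresses.

Like YOLOX, YOLOv6 decouples the detection head, separating box regression from classification. Coupling the two hurts performance, because it both slows convergence and increases the complexity of the head. (From "Introduction to the YOLOv6 network structure")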

From "Quantization and deployment of the open-source general object detection framework YOLOv6 at Meituan"

How the signal-to-noise ratio is computed:
https://github.com/openppl-public/ppq/blob/8a849c9b14bacf2a5d0f42a481dfa865d2b75e66/ppq/quantization/measure/norm.py

import torch


def torch_snr_error(y_pred: torch.Tensor, y_real: torch.Tensor, reduction: str = 'mean') -> torch.Tensor:
    """Noise-to-signal power ratio between y_pred and y_real (lower means closer)."""
    if y_pred.shape != y_real.shape:
        raise ValueError(f'Can not compute snr loss for tensors with different shape. '
            f'({y_pred.shape} and {y_real.shape})')
    reduction = str(reduction).lower()

    # Treat a 1-D tensor as a batch containing a single sample.
    if y_pred.ndim == 1:
        y_pred = y_pred.unsqueeze(0)
        y_real = y_real.unsqueeze(0)

    # Flatten everything except the batch dimension.
    y_pred = y_pred.flatten(start_dim=1)
    y_real = y_real.flatten(start_dim=1)

    # Despite the name, the value is noise power over signal power, i.e. an error measure.
    noise_power  = torch.pow(y_pred - y_real, 2).sum(dim=-1)
    signal_power = torch.pow(y_real, 2).sum(dim=-1)
    snr = (noise_power) / (signal_power + 1e-7)

    if reduction == 'mean':
        return torch.mean(snr)
    elif reduction == 'sum':
        return torch.sum(snr)
    elif reduction == 'none':
        return snr
    else:
        raise ValueError(f'Unsupported reduction method: {reduction!r}.')

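YOLO_v6 explained

A gt_box is assigned to whichever anchor_point's range its center falls in. If the range of one anchor_point contains several gt centers, that point is responsible only for the gt with the largest IoU. In other words, a gt_box can be matched one-to-many, but an anchor_point is responsible for at most one gt.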
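
The "at most one gt per anchor point" rule can be sketched as below. This is a simplified illustration with hypothetical names: the range of an anchor point is taken to be a single grid cell of size `stride`, and the IoU of each gt with the prediction at its cell is assumed to be precomputed. The one-to-many side of the rule (one gt matched to several points) is not modelled.

```python
def assign_points(gt_centers, ious, stride):
    """Map each grid cell (col, row) to at most one ground-truth index.

    gt_centers: list of (cx, cy) in pixels.
    ious[i]: IoU between gt i and the prediction at the cell containing its centre.
    """
    owner = {}
    for gt_idx, (cx, cy) in enumerate(gt_centers):
        cell = (int(cx // stride), int(cy // stride))
        # When several centres share a cell, keep the gt with the largest IoU.
        if cell not in owner or ious[gt_idx] > ious[owner[cell]]:
            owner[cell] = gt_idx
    return owner


assignment = assign_points(
    gt_centers=[(10.0, 12.0), (14.0, 9.0), (40.0, 40.0)],
    ious=[0.4, 0.7, 0.5],
    stride=16,
)
print(assignment)
```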

Comparison of YOLOv5, YOLOv6 and YOLOv7 inference speed on TensorRT

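Appendix: positive/negative sample matching

A summary of mainstream label assignment methods for single-stage detectors

Current label assignment methods fall into two broad classes, hard label assignment (Hard LA) and soft label assignment (Soft LA), depending on whether every label is strictly negative or positive

label assignment

Hard LA

Static label assignment methods set fixed thresholds from priors such as distance or IoU to separate positives from negatives, e.g. FCOS, two-stage detectors and RFLA;
Dynamic label assignment methods set the thresholds dynamically according to some strategy when selecting positives and negatives, e.g. ATSS, PAA, OTA, DSL and SimOTA

Soft label assignment methods compute soft labels and positive/negative weights from the predictions and the ground-truth boxes. Starting from the candidate positives (usually the points that fall inside a GT box), they implicitly assign positives and negatives and compute the loss according to those weights, and they adjust the soft labels and weights dynamically during training, e.g. GFL, VFL, TOOD and DW.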
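
The difference between the two families can be shown on a toy example. In this sketch the hard assignment thresholds the IoU at an assumed value of 0.5, while the soft assignment uses the IoU itself as the label for every candidate whose point lies inside the GT box. Real soft methods such as TOOD or VFL use more elaborate targets and weights, so this is only an illustration.

```python
def hard_labels(ious, threshold=0.5):
    """Hard LA: every sample is either positive (1) or negative (0)."""
    return [1 if iou >= threshold else 0 for iou in ious]


def soft_labels(ious, inside_gt):
    """Soft LA (simplified): candidates inside the GT get a continuous, IoU-based label."""
    return [iou if inside else 0.0 for iou, inside in zip(ious, inside_gt)]


ious = [0.9, 0.55, 0.45, 0.1]
inside_gt = [True, True, True, False]
print(hard_labels(ious))
print(soft_labels(ious, inside_gt))
```

The third sample (IoU 0.45) is a plain negative under the hard rule but still contributes as a weak positive under the soft rule.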

在这里插入图片描述

ATSS: Dynamic Hard LA

From: "mmdetection minimal reimplementation (7): analysis of the differences between anchor-based and anchor-free"
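
The core of ATSS is an adaptive IoU threshold per ground-truth box: the mean plus the standard deviation of the IoUs of its candidate anchors. The sketch below covers only that step and assumes the candidates (the anchors closest to the GT centre on each pyramid level) have already been collected. The population standard deviation is used here, which is an implementation choice made for the example.

```python
import statistics


def atss_threshold(candidate_ious):
    """Adaptive threshold: mean + standard deviation of the candidates' IoUs."""
    return statistics.mean(candidate_ious) + statistics.pstdev(candidate_ious)


def atss_positives(candidate_ious, center_inside):
    """Indices of candidates above the threshold whose centre lies inside the GT box."""
    threshold = atss_threshold(candidate_ious)
    return [i for i, (iou, inside) in enumerate(zip(candidate_ious, center_inside))
            if iou >= threshold and inside]


candidate_ious = [0.1, 0.2, 0.3, 0.6, 0.8]
threshold = atss_threshold(candidate_ious)
positives = atss_positives(candidate_ious, [True] * 5)
print(threshold, positives)
```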

51.1 AP! A new record for single-stage detectors. TOOD: a plug-and-play detector head replacement that significantly improves performance

RFLA: Static Hard LA

SimOTA & DSL: Dynamic Hard LA

A simplified version of OTA

Drawback: it slows down the training process, and it is not rare to fall into unstable training
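
The dynamic-k step of SimOTA can be sketched as follows: for each ground-truth box, k is the integer part of the sum of its largest IoUs (at least 1), and the k candidates with the lowest cost become positives. The cost values here are placeholders and `top_q=10` is an assumed default. Conflict resolution between different GT boxes is left out.

```python
def simota_select(ious, costs, top_q=10):
    """Return (dynamic_k, sorted indices of the selected candidates) for one GT box."""
    top_ious = sorted(ious, reverse=True)[:top_q]
    dynamic_k = max(1, int(sum(top_ious)))
    ranked = sorted(range(len(costs)), key=lambda i: costs[i])
    return dynamic_k, sorted(ranked[:dynamic_k])


k, selected = simota_select(
    ious=[0.9, 0.8, 0.7, 0.2, 0.1],
    costs=[1.2, 0.5, 0.9, 3.0, 4.0],
)
print(k, selected)
```

Because the selection depends on the current predictions, it has to be recomputed for every batch.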

AutoAssign

TOOD: Soft LA

Feng C, Zhong Y, Gao Y, et al. Tood: Task-aligned one-stage object detection[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 2021: 3490-3499.