arXiv-2022
https://github.com/meituan/YOLOv6
Li C, Li L, Jiang H, et al. YOLOv6: A single-stage object detection framework for industrial applications[J]. arXiv preprint arXiv:2209.02976, 2022.
1 Background and Motivation
The authors observe the following about existing YOLO-series object detectors:
(1)Reparameterization from RepVGG is not yet well exploited in detection
(2)Quantization of reparameterization-based detectors also requires meticulous treatment
(3)Previous works pay less attention to deployment
(4)Advanced domain-specific strategies like label assignment and loss function design need further verification
(5)Bag-of-freebies such as knowledge distillation, which improve performance without extra inference cost, are worth exploiting
The authors strive to push the limits of the YOLO series to the next level in network design, training strategies, testing techniques, quantization, and optimization methods.
With the generous permission of YOLO authors, we name it YOLOv6
2 Advantages / Contributions
Proposes YOLOv6:
- refashions a line of networks of different sizes tailored for industrial applications in diverse scenarios
- introduces a self-distillation strategy
- improves label assignment, the loss function, and data augmentation to boost YOLOv6's performance
- reforms the quantization scheme
Both fast and accurate.
3 Method
3.1 Network Design
1) Backbone
Combines CSP and RepVGG into the CSPStackRep Block (small models keep a plain single-path RepVGG-style design instead)
[CSPNet] CSPNet: A New Backbone that can Enhance Learning Capability of CNN
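The RepVGG-style blocks inside the backbone are trained with several branches and merged into one 3x3 conv for inference. A minimal NumPy sketch of that structural re-parameterization (3x3 conv + BN, 1x1 conv + BN, and identity BN folded into a single 3x3 kernel and bias) follows; the naive convolution, the random BN statistics, and all function names are illustrative assumptions, not YOLOv6 code.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive stride-1 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def bn(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm over the channel axis of x: (C, H, W)."""
    scale = gamma / np.sqrt(var + eps)
    return (x - mean[:, None, None]) * scale[:, None, None] + beta[:, None, None]


def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into a bias-free conv: BN(conv(x, w)) == conv(x, w_f) + b_f."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta - mean * scale


def multi_branch(x, w3, bn3, w1, bn1, bn_id):
    """Training-time form (before the activation): 3x3+BN + 1x1+BN + identity BN."""
    zero_b = np.zeros(w3.shape[0])
    return bn(conv2d(x, w3, zero_b), **bn3) + bn(conv2d(x, w1, zero_b), **bn1) + bn(x, **bn_id)


def reparameterize(w3, bn3, w1, bn1, bn_id):
    """Merge the three branches into one 3x3 kernel and one bias."""
    c = w3.shape[0]
    w3f, b3f = fuse_conv_bn(w3, **bn3)
    # A 1x1 kernel equals a 3x3 kernel that is zero everywhere except the center.
    w1f, b1f = fuse_conv_bn(np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1))), **bn1)
    # The identity branch is a 3x3 kernel with a 1 at the center for channel i -> i.
    w_id = np.zeros((c, c, 3, 3))
    w_id[np.arange(c), np.arange(c), 1, 1] = 1.0
    widf, bidf = fuse_conv_bn(w_id, **bn_id)
    return w3f + w1f + widf, b3f + b1f + bidf


def random_bn(c, rng):
    """Hypothetical BN parameters for the demo."""
    return dict(gamma=rng.uniform(0.5, 1.5, c), beta=rng.normal(size=c),
                mean=rng.normal(size=c), var=rng.uniform(0.5, 1.5, c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 3
    x = rng.normal(size=(c, 6, 6))
    w3, w1 = rng.normal(size=(c, c, 3, 3)), rng.normal(size=(c, c, 1, 1))
    bn3, bn1, bn_id = random_bn(c, rng), random_bn(c, rng), random_bn(c, rng)
    w, b = reparameterize(w3, bn3, w1, bn1, bn_id)
    print(np.allclose(multi_branch(x, w3, bn3, w1, bn1, bn_id), conv2d(x, w, b)))  # True
```

Because BN is affine at inference time and convolution is linear, the three branches collapse exactly; only the single 3x3 conv is deployed, which is what makes these blocks fast but also what complicates quantization later.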
2) Neck
Follows PANet (as the Rep-PAN neck)
3)Head
Efficient Decoupled Head
Anchor-free
Tian Z, Shen C, Chen H, et al. FCOS: Fully convolutional one-stage object detection[J]. arXiv preprint arXiv:1904.01355, 2019.
A point-based (not keypoint-based) anchor-free method
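In a point-based anchor-free head, every feature-map location is an anchor point that predicts its distances (left, top, right, bottom) to the box edges. The sketch below generates the points for one stride and decodes distances into boxes; the 0.5 grid offset and expressing distances in units of the stride are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def make_anchor_points(feat_h, feat_w, stride, offset=0.5):
    """Cell centers in input-image pixels, shape (feat_h * feat_w, 2) as (x, y), row-major."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    points = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(float)
    return (points + offset) * stride


def decode_ltrb(points, ltrb, stride):
    """Turn per-point (l, t, r, b) distances, in stride units (assumed), into (x1, y1, x2, y2)."""
    x1y1 = points - ltrb[:, :2] * stride
    x2y2 = points + ltrb[:, 2:] * stride
    return np.concatenate([x1y1, x2y2], axis=1)


if __name__ == "__main__":
    # A 640x640 input with strides 8, 16, 32 gives 80*80 + 40*40 + 20*20 points.
    all_points = [make_anchor_points(640 // s, 640 // s, s) for s in (8, 16, 32)]
    print(sum(len(p) for p in all_points))  # 8400
    pts = make_anchor_points(2, 2, 8)
    print(decode_ltrb(pts[:1], np.array([[1.0, 1.0, 2.0, 2.0]]), 8))  # one box: (-4, -4, 20, 20)
```

There is exactly one prediction per point rather than several anchor shapes per location, which is what keeps the head and the post-processing light.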
3.2 Label Assignment
Label assignment uses Task Alignment Learning (TAL), i.e., the assignment method from TOOD
Feng C, Zhong Y, Gao Y, et al. Tood: Task-aligned one-stage object detection[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 2021: 3490-3499.
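TAL scores each candidate point with the alignment metric t = s^α · u^β (s: predicted score for the GT's class, u: IoU between the predicted and GT boxes), keeps the top-k candidates per GT, and uses a normalized t as the soft classification target. A NumPy sketch for one image follows; α = 1 and β = 6 are the values commonly used with TOOD, while the topk default, the inputs being precomputed, and all names are illustrative assumptions.

```python
import numpy as np


def task_aligned_assign(scores, ious, in_gt, alpha=1.0, beta=6.0, topk=13):
    """
    scores: (G, N) predicted score of each GT's class at each anchor point
    ious:   (G, N) IoU between each GT box and the box predicted at each point
    in_gt:  (G, N) bool, point lies inside the GT box
    Returns assigned_gt (N,) with -1 for background, and soft targets (N,).
    """
    align = (scores ** alpha) * (ious ** beta) * in_gt  # alignment metric t
    g, n = align.shape
    cand = np.zeros((g, n), dtype=bool)
    top_idx = np.argsort(-align, axis=1)[:, :min(topk, n)]  # top-k points per GT
    np.put_along_axis(cand, top_idx, True, axis=1)
    cand &= in_gt & (align > 0)
    # A point claimed by several GTs keeps only the GT with the highest IoU.
    masked_iou = np.where(cand, ious, -1.0)
    assigned_gt = np.where(cand.any(axis=0), masked_iou.argmax(axis=0), -1)
    pos = np.zeros_like(cand)
    pos_idx = np.nonzero(assigned_gt >= 0)[0]
    pos[assigned_gt[pos_idx], pos_idx] = True
    # Normalize t per GT so its maximum equals the best IoU among that GT's positives.
    t = np.where(pos, align, 0.0)
    max_t = t.max(axis=1, keepdims=True)
    max_iou = np.where(pos, ious, 0.0).max(axis=1, keepdims=True)
    norm = np.divide(t * max_iou, max_t, out=np.zeros_like(t), where=max_t > 0)
    return assigned_gt, norm.max(axis=0)


if __name__ == "__main__":
    scores = np.array([[0.9, 0.5, 0.1, 0.8], [0.2, 0.3, 0.9, 0.1]])
    ious = np.array([[0.8, 0.6, 0.1, 0.0], [0.0, 0.1, 0.7, 0.2]])
    in_gt = np.array([[True, True, True, False], [False, True, True, True]])
    print(task_aligned_assign(scores, ious, in_gt, topk=2))
```

The assignment uses the network's own predictions, so it changes as training progresses (unlike a fixed IoU threshold), and the normalized t gives the classification branch an IoU-aware target.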
3.3 Loss Function
(1)regression loss
a)IoU-series Loss
SIoU for YOLOv6-N and YOLOv6-T, GIoU for the other YOLOv6 models
Gevorgyan Z. SIoU loss: More powerful learning for bounding box regression[J]. arXiv preprint arXiv:2205.12740, 2022.
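GIoU adds a penalty based on the smallest enclosing box C to plain IoU: GIoU = IoU - (|C| - |A ∪ B|) / |C|, and the loss is 1 - GIoU, so non-overlapping boxes still get a useful signal. A plain-Python sketch for two (x1, y1, x2, y2) boxes; the eps guard and the function name are assumptions.

```python
def giou_loss(pred, target, eps=1e-7):
    """1 - GIoU for two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter + eps
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    enc_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enc_h = max(pred[3], target[3]) - min(pred[1], target[1])
    enc_area = enc_w * enc_h + eps
    giou = iou - (enc_area - union) / enc_area
    return 1.0 - giou


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # ~0 for identical boxes
    print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 - (1/7 - 2/9) = 1 + 5/63
    print(giou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # 1 + 7/9: no overlap, loss still varies
```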
b)Probability Loss
Distribution Focal Loss (DFL) in YOLOv6-M/L
Li X, Wang W, Wu L, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 21002-21012.
Further reading: Quality Focal Loss & Distribution Focal Loss explained (paper, code)
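DFL treats each box-edge distance as a discrete distribution over integer bins 0..n-1. For a continuous target y between y_i = floor(y) and y_{i+1} = y_i + 1, DFL = -((y_{i+1} - y) log S_i + (y - y_i) log S_{i+1}), and the decoded distance is the expectation Σ i · S_i. A NumPy sketch for a single edge; the bin count in the example and the function names are assumptions.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def dfl(logits, target):
    """DFL for one edge. target must lie in [0, len(logits) - 1); callers usually clip it."""
    probs = softmax(logits)
    left = int(np.floor(target))
    right = left + 1
    w_left, w_right = right - target, target - left
    return -(w_left * np.log(probs[left]) + w_right * np.log(probs[right]))


def decode_distance(logits):
    """Expected distance under the predicted distribution."""
    probs = softmax(logits)
    return float((probs * np.arange(len(logits))).sum())


if __name__ == "__main__":
    flat = np.zeros(5)  # uniform distribution over bins 0..4
    print(dfl(flat, 2.3))         # equals log(5) for a uniform prediction
    print(decode_distance(flat))  # ~2.0, the mean of bins 0..4
```

Only the two bins adjacent to the target receive gradient, which pushes probability mass toward the true location; the extra per-edge softmax and expectation is also why DFL costs speed on small models.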
(2)classification loss
VariFocal Loss
Zhang H, Wang Y, Dayoub F, et al. Varifocalnet: An iou-aware dense object detector[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 8514-8523.
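VFL weights positives by their IoU-aware target q and down-weights only the negatives: VFL(p, q) = -q(q log p + (1 - q) log(1 - p)) for q > 0, and -α p^γ log(1 - p) for q = 0. With TAL, q can be the normalized alignment target. α = 0.75 and γ = 2.0 follow the VarifocalNet paper's defaults; the function name is illustrative.

```python
import numpy as np


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Element-wise VFL. p: predicted IoU-aware score in (0, 1); q: target (0 for negatives)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    q = np.asarray(q, dtype=float)
    # Positives: BCE against the soft target q, weighted by q itself.
    pos = -q * (q * np.log(p) + (1.0 - q) * np.log(1.0 - p))
    # Negatives: focal-style down-weighting by alpha * p^gamma.
    neg = -alpha * p ** gamma * np.log(1.0 - p)
    return np.where(q > 0, pos, neg)


if __name__ == "__main__":
    print(varifocal_loss([0.5, 0.5], [1.0, 0.0]))  # [log 2, 0.75 * 0.25 * log 2]
```

The asymmetry is the point: the many easy negatives are suppressed as in focal loss, while high-quality positives (large q) keep full weight.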
(3)objectness loss
It lowered performance, so the authors removed it
3.4 Industry-handy Improvements
(1)more training epochs
From 300 to 400 epochs: simple and brute-force
(2)self-distillation
We limit the teacher to be the student itself but pretrained, hence we call it self-distillation
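Early in training the student learns mostly from the teacher's soft labels; as training goes on, it relies increasingly on the hard labels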
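One way to realize "soft labels first, hard labels later" is to add a distillation term whose weight decays over training: L = L_det + w(epoch) · L_KD. The sketch below uses a cosine-decayed weight and a temperature-scaled KL divergence between teacher and student class distributions; the decay endpoints (1.0 to 0.01), the temperature, and all names are assumptions, not the official YOLOv6 settings.

```python
import math

import numpy as np


def kd_weight(epoch, max_epoch, w_start=1.0, w_end=0.01):
    """Cosine decay from w_start (epoch 0) to w_end (last epoch); endpoints are assumed."""
    cos = 0.5 * (1.0 + math.cos(math.pi * epoch / max_epoch))
    return w_end + (w_start - w_end) * cos


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kd_kl(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over rows, scaled by T^2 as is customary."""
    log_p_s = log_softmax(student_logits / temperature)
    log_p_t = log_softmax(teacher_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean()) * temperature ** 2


def total_loss(det_loss, student_logits, teacher_logits, epoch, max_epoch):
    """Detection loss on hard labels plus the decaying distillation term."""
    return det_loss + kd_weight(epoch, max_epoch) * kd_kl(student_logits, teacher_logits)


if __name__ == "__main__":
    for e in (0, 200, 400):
        print(e, round(kd_weight(e, 400), 4))
```

Since the teacher is the same architecture pretrained, the KD term mainly supplies smoother targets early on; by the end the hard-label detection loss dominates.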
(3)Gray border of images
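Because training uses mosaic augmentation, test images padded with a gray border score better than images without one
The authors' solution:
- turning mosaic augmentations off during the last epochs
- change the area of the gray border and resize the image with gray borders directly to the target image size (i.e., shrink the original image so that image + gray padding = test size, instead of padding the full-size image so that the input ends up larger than the test size)
Test size 640, original image 640, gray border 3: the image is resized to 634, and 634 + 3 * 2 = 640
Test size 640, original image 640, gray border 6: the image is resized to 628, and 628 + 6 * 2 = 640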
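The arithmetic above can be sketched as a letterbox that reserves `border` gray pixels per side inside a fixed test size. The gray value 114 (a YOLOv5-style convention), nearest-neighbour resizing (to stay dependency-free), and the function name are assumptions.

```python
import numpy as np


def resize_with_gray_border(img, target=640, border=3, gray=114):
    """Fit img (H, W, C) into a target x target canvas, keeping at least `border` gray pixels per side."""
    inner = target - 2 * border  # e.g. 640 - 2 * 3 = 634
    h, w = img.shape[:2]
    r = inner / max(h, w)
    nh, nw = int(round(h * r)), int(round(w * r))
    # Nearest-neighbour sampling indices for the resized image.
    rows = (np.arange(nh) * h / nh).astype(int)
    cols = (np.arange(nw) * w / nw).astype(int)
    resized = img[rows][:, cols]
    canvas = np.full((target, target, img.shape[2]), gray, dtype=img.dtype)
    top, left = (target - nh) // 2, (target - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas, r, (top, left)


if __name__ == "__main__":
    img = np.zeros((640, 640, 3), dtype=np.uint8)
    for b in (3, 6):
        out, r, (top, left) = resize_with_gray_border(img, target=640, border=b)
        print(b, out.shape, top, left)  # image 640 - 2 * b wide, centered
```

The input tensor stays at the test size; only the share of it given to the gray border changes.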
3.5 Quantization and Deployment
Because YOLOv6 uses the RepVGG structure (re-parameterization), it is hard to incorporate QAT when it comes to matching fake quantizers during training and inference.
The authors made the following optimizations:
(1)RepOptimizer to obtain PTQ-friendly weights
(2)Sensitivity Analysis
(3)QAT with channel-wise distillation and graph optimization to pursue extreme performance.
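Both PTQ and QAT revolve around a quantize-dequantize ("fake quantization") step. The minimal symmetric INT8 sketch below compares one scale per tensor with one scale per channel on a weight tensor whose channels have very different ranges; it only illustrates why channel-wise treatment matters and is not YOLOv6's RepOptimizer or QAT pipeline. Names are hypothetical.

```python
import numpy as np


def fake_quant(x, num_bits=8, axis=None):
    """Symmetric quantize-dequantize. axis=None: one scale per tensor; else one per slice along `axis`."""
    qmax = 2 ** (num_bits - 1) - 1
    if axis is None:
        amax = np.abs(x).max()
    else:
        other_axes = tuple(i for i in range(x.ndim) if i != axis)
        amax = np.abs(x).max(axis=other_axes, keepdims=True)
    scale = np.maximum(amax, 1e-12) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Output channel 0 spans +-100, channel 1 only +-0.01.
    w = np.stack([rng.uniform(-100, 100, 64), rng.uniform(-0.01, 0.01, 64)])
    for name, axis in (("per-tensor", None), ("per-channel", 0)):
        print(name, np.mean((fake_quant(w, axis=axis) - w) ** 2))
```

With a single scale, the small channel rounds entirely to zero; merged RepVGG kernels can show exactly this kind of range mismatch, which is one motivation for PTQ-friendly weights.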
5 Experiments
5.1 Datasets and Metrics
COCO and mAP
5.2 Ablation Study
5.2.1 Network
(1)Backbone and neck
VFL as the classification loss, and GIoU with DFL as the regression loss.
channel coefficient (denoted as CC) of CSPStackRep Block
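Small models do better with a single-path structure, while larger models benefit more from the multi-branch structure (the largest model, L, does not seem very sensitive either way)
Tall-and-thin networks outperform short-and-fat ones (height here means network depth)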
(2)Combinations of convolutional layers and activation functions
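SiLU slows deployment inference considerably; for small networks, ReLU is recommended as a drop-in replacement
For large networks, RepConv + SiLU works best (ha, plain Conv is not weak either)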
(3)Miscellaneous design
decoupled head (denoted as DH)
anchor-free paradigm(AF)
the backbone (EfficientRep Backbone) and the neck (Rep-PAN neck), denoted as EB+RN
decoupled head (hybrid channels, HC)
5.2.2 Label Assignment
TAL, i.e., TOOD's assignment, performs best
TAL also works well without the warmup strategy; the original method uses ATSS for warmup
5.2.3 Loss functions
(1)Classification Loss
(2)Regression Loss
Adding DFL improves accuracy slightly, but the inference speed is greatly affected for small models
(3)Object Loss
Adding the object loss actually lowers performance; the authors' analysis:
The negative gain may come from the conflict between the object branch and the other two branches in TAL.
YOLOv6 therefore drops the objectness loss
5.3 Industry-handy improvements
(1)More training epochs
The longer the training, the better
(2)Self-distillation
Distillation combined with weight decay works best
(3)Gray border of images
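The input size is reduced from 672 to 640
With the test size fixed, the original image and the gray border trade off against each other: the wider the gray border, the smaller the resized original image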
5.4 Quantization Results
(1)PTQ
(2)QAT
v1.0
v2.0
(3)Quantization Details
Feature Distribution Comparison
Sensitivity Analysis Results
MSE is the closest to direct AP evaluation
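Sensitivity analysis quantizes one layer at a time, keeps the rest in float, and ranks layers by how much the output degrades; per the note above, MSE tracks direct AP evaluation most closely. A toy NumPy sketch on a hypothetical MLP (layer names, sizes, and the per-tensor INT8 quantizer are assumptions); the most sensitive layers would typically be the candidates to keep in higher precision.

```python
import numpy as np


def fake_quant(w, num_bits=8):
    """Symmetric per-tensor quantize-dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(float(np.abs(w).max()), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def forward(x, weights):
    """Hypothetical MLP: matmuls with ReLU between layers (not after the last one)."""
    h = x
    names = list(weights)
    for i, name in enumerate(names):
        h = h @ weights[name]
        if i < len(names) - 1:
            h = np.maximum(h, 0.0)
    return h


def layer_sensitivity(x, weights):
    """Quantize one layer at a time; return {layer: output MSE}, most sensitive first."""
    ref = forward(x, weights)
    scores = {}
    for name in weights:
        trial = dict(weights)
        trial[name] = fake_quant(weights[name])
        scores[name] = float(np.mean((forward(x, trial) - ref) ** 2))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = {
        "fc1": rng.normal(scale=0.1, size=(16, 32)),
        "fc2": rng.normal(scale=0.1, size=(32, 32)),
        "fc3": rng.normal(scale=0.1, size=(32, 4)),
    }
    x = rng.normal(size=(128, 16))
    for name, mse in layer_sensitivity(x, weights).items():
        print(f"{name}: {mse:.3e}")
```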
5.6 Detailed Latency and Throughput Benchmark
(1)T4 GPU Latency Table with TensorRT 8
(2)V100 GPU Latency Table
(3)CPU Latency
6 Conclusion (own notes) / Future work
- DFL paired with GIoU
- hybrid channels in the decoupled head
- a plain single-path architecture is a better choice for small networks
Excerpts from other bloggers' interpretations:
Loss
VariFocal Loss(VFL)
Small models: SIoU loss
Large models: GIoU loss
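YOLOv6 makes many improvements to the backbone, neck, head, and training strategy:
We designed a unified, more efficient backbone and neck: inspired by hardware-aware neural network design, we built the re-parameterizable, more efficient EfficientRep Backbone and Rep-PAN Neck on the RepVGG style [4].
We designed a simpler and more effective Efficient Decoupled Head, which maintains accuracy while further reducing the extra latency that ordinary decoupled heads introduce.
In training, we adopt the anchor-free paradigm, together with the SimOTA [2] label assignment strategy and the SIoU [9] bounding-box regression loss, to further improve detection accuracy.
YOLOv6 introduces the SimOTA [4] algorithm to assign positive samples dynamically, further improving detection accuracy. YOLOv5's label assignment is based on shape matching and increases the number of positives through cross-grid matching, which lets the network converge quickly; however, it is a static assignment method that does not adapt as training progresses.
Like YOLOX, YOLOv6 decouples the detection head, separating box regression from classification. Coupling box regression and classification hurts performance, because it both slows convergence and increases the complexity of the detection head. (Source: YOLOV6 network structure introduction)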
SNR (signal-to-noise ratio) computation:
https://github.com/openppl-public/ppq/blob/8a849c9b14bacf2a5d0f42a481dfa865d2b75e66/ppq/quantization/measure/norm.py
```python
import torch


def torch_snr_error(y_pred: torch.Tensor, y_real: torch.Tensor, reduction: str = 'mean') -> torch.Tensor:
    """Noise-to-signal power ratio between y_pred (e.g. quantized output) and y_real (float reference).

    0 means identical tensors; lower is better. Computed per sample over all non-batch dims.
    """
    if y_pred.shape != y_real.shape:
        raise ValueError(f'Can not compute snr loss for tensors with different shape. '
                         f'({y_pred.shape} and {y_real.shape})')
    reduction = str(reduction).lower()

    # Treat a 1-D tensor as a single sample.
    if y_pred.ndim == 1:
        y_pred = y_pred.unsqueeze(0)
        y_real = y_real.unsqueeze(0)

    # Flatten each sample to a vector: shape (batch, -1).
    y_pred = y_pred.flatten(start_dim=1)
    y_real = y_real.flatten(start_dim=1)

    noise_power = torch.pow(y_pred - y_real, 2).sum(dim=-1)
    signal_power = torch.pow(y_real, 2).sum(dim=-1)
    snr = noise_power / (signal_power + 1e-7)  # eps avoids division by zero

    if reduction == 'mean':
        return torch.mean(snr)
    elif reduction == 'sum':
        return torch.sum(snr)
    elif reduction == 'none':
        return snr
    else:
        raise ValueError(f'Unsupported reduction method: {reduction}.')
```
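From "YOLO_v6 explained":
If the center of a gt_box falls within the range of an anchor_point, that point is responsible for the gt_box. If one anchor_point's range contains several gt centers, the point is responsible only for the gt with the largest IoU. In other words, a gt_box can be matched one-to-many, but each anchor_point is responsible for at most one gt.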
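The rule above can be sketched in NumPy as follows. It assumes the "range" of an anchor point is a square of half-width radius × stride around it (an interpretation, not taken from the source), and takes the IoU matrix as a precomputed input; all names are hypothetical.

```python
import numpy as np


def center_in_range(points, gt_boxes, stride, radius=1.5):
    """(G, N) mask: GT center lies within +-radius*stride of each anchor point (radius is assumed)."""
    centers = (gt_boxes[:, :2] + gt_boxes[:, 2:]) / 2.0      # (G, 2)
    diff = np.abs(centers[:, None, :] - points[None, :, :])  # (G, N, 2)
    return (diff <= radius * stride).all(axis=-1)


def assign_by_max_iou(mask, ious):
    """Each point keeps at most one GT: the largest IoU among the GTs it covers; -1 means none."""
    masked = np.where(mask, ious, -1.0)
    return np.where(mask.any(axis=0), masked.argmax(axis=0), -1)


if __name__ == "__main__":
    points = np.array([[4.0, 4.0], [12.0, 4.0], [20.0, 4.0]])
    gts = np.array([[0.0, 0.0, 16.0, 8.0], [10.0, 0.0, 14.0, 8.0]])
    mask = center_in_range(points, gts, stride=8, radius=0.5)
    ious = np.array([[0.5, 0.4, 0.0], [0.0, 0.9, 0.0]])
    print(assign_by_max_iou(mask, ious))  # GT 0 covers points 0 and 1, but point 1 goes to GT 1
```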
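Appendix: positive/negative sample matching mechanisms
Current label assignment (LA) methods fall into two categories depending on whether each label is strictly either positive or negative: hard label assignment (Hard LA) and soft label assignment (Soft LA)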
label assignment
Hard LA
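- Static LA methods set fixed thresholds from priors such as distance or IoU to separate positives from negatives, e.g. FCOS, two-stage object detectors, RFLA;
- Dynamic LA methods set the thresholds dynamically according to different strategies, e.g. ATSS, PAA, OTA, DSL, SimOTA
Soft LA methods compute soft labels and positive/negative weights from the predictions and the ground-truth boxes. Starting from candidate positives (usually points inside the GT), they assign samples as positive or negative according to these weights and compute the loss, adjusting the soft labels and weights dynamically during training, e.g. GFL, VFL, TOOD, DW.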
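ATSS: Dynamic Hard LA
Source: mmdetection minimal reimplementation (7): analysis of the differences between anchor-based and anchor-free
51.1 AP! A new record for single-stage detectors. TOOD: a plug-and-play detector head swap that significantly improves performance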
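ATSS picks, for each GT, the k anchors whose centers are closest to the GT center, sets the IoU threshold to the mean plus the standard deviation of those candidates' IoUs, and keeps candidates above the threshold whose centers lie inside the GT. A simplified single-level NumPy sketch (the ATSS paper selects k = 9 per FPN level); using np.std (population standard deviation) and all names are assumptions.

```python
import numpy as np


def box_iou(boxes, gt):
    """IoU of each (x1, y1, x2, y2) box in `boxes` (N, 4) with one GT box (4,)."""
    lt = np.maximum(boxes[:, :2], gt[:2])
    rb = np.minimum(boxes[:, 2:], gt[2:])
    inter = np.prod(np.clip(rb - lt, 0, None), axis=1)
    area_a = np.prod(boxes[:, 2:] - boxes[:, :2], axis=1)
    area_g = np.prod(gt[2:] - gt[:2])
    return inter / (area_a + area_g - inter)


def atss_positives(anchors, gt, k=9):
    """Indices of anchors assigned as positives for a single GT (single feature level)."""
    a_centers = (anchors[:, :2] + anchors[:, 2:]) / 2.0
    g_center = (gt[:2] + gt[2:]) / 2.0
    dist = np.linalg.norm(a_centers - g_center, axis=1)
    cand = np.argsort(dist)[:min(k, len(anchors))]  # k closest anchor centers
    ious = box_iou(anchors[cand], gt)
    thr = ious.mean() + ious.std()  # adaptive IoU threshold (population std assumed)
    c = a_centers[cand]
    inside = (c[:, 0] > gt[0]) & (c[:, 0] < gt[2]) & (c[:, 1] > gt[1]) & (c[:, 1] < gt[3])
    return np.sort(cand[(ious >= thr) & inside])


if __name__ == "__main__":
    gt = np.array([0.0, 0.0, 10.0, 10.0])
    anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 11], [20, 20, 30, 30],
                        [-5, -5, 5, 5], [40, 40, 50, 50]], dtype=float)
    print(atss_positives(anchors, gt))  # the two well-overlapping anchors
```

The threshold adapts to each GT's candidate statistics, yet the final decision is still hard (positive or negative), which is why ATSS counts as dynamic Hard LA.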
RFLA: Static Hard LA
SimOTA & DSL: Dynamic Hard LA
SimOTA is a simplified version of OTA
Drawback: it slows down the training process, and it is not rare to fall into unstable training
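SimOTA replaces OTA's optimal-transport solver with a dynamic top-k rule: for each GT, k is the truncated sum of its top IoUs (at least 1), the k lowest-cost candidates are matched, and a candidate claimed by several GTs keeps the cheapest one. A NumPy sketch with the cost matrix given as input (in YOLOX the cost combines a classification term and an IoU term weighted by 3, and non-candidates get a very large cost); the candidate count of 10 follows YOLOX, and the function name is hypothetical.

```python
import numpy as np


def simota_match(cost, ious, n_candidate_k=10):
    """cost, ious: (G, N). Returns a (G, N) bool matching with at most one GT per point."""
    g, n = cost.shape
    matching = np.zeros((g, n), dtype=bool)
    # Dynamic k per GT: truncated sum of its top IoUs, at least 1.
    topk_ious = -np.sort(-ious, axis=1)[:, :min(n_candidate_k, n)]
    dynamic_ks = np.maximum(topk_ious.sum(axis=1).astype(int), 1)
    for gi in range(g):
        matching[gi, np.argsort(cost[gi])[:dynamic_ks[gi]]] = True
    # Points matched to several GTs keep only the lowest-cost GT.
    cols = np.nonzero(matching.sum(axis=0) > 1)[0]
    if cols.size:
        best = cost[:, cols].argmin(axis=0)
        matching[:, cols] = False
        matching[best, cols] = True
    return matching


if __name__ == "__main__":
    ious = np.array([[0.9, 0.8, 0.4, 0.0], [0.0, 0.7, 0.6, 0.5]])
    cost = np.array([[0.1, 0.2, 0.9, 5.0], [5.0, 0.15, 0.4, 0.6]])
    print(simota_match(cost, ious).astype(int))
```

The per-GT loop and sort run on every iteration of every batch, which is consistent with the training slowdown noted above.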
AutoAssign
TOOD: Soft LA
Feng C, Zhong Y, Gao Y, et al. Tood: Task-aligned one-stage object detection[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 2021: 3490-3499.