-
You only look once (YOLO) V4
-
Goal: improve the operating speed of neural networks in production systems and optimize them for parallel computation;
-
Backbone selection: CSPResNeXt50 classifies better than CSPDarknet53 on ImageNet, whereas CSPDarknet53 detects better than CSPResNeXt50 on COCO;
-
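Selecting additional blocks to enlarge the receptive field, and a parameter aggregation method:
- Compared with classification, a detector requires: (1) higher input resolution, to detect multiple small objects; (2) more layers, for a larger receptive field that covers the higher-resolution input; (3) more parameters, for greater capacity to detect multiple objects of different sizes in a single image;
- Influence of receptive fields of different sizes:
  - Up to the object size: allows viewing the entire object;
  - Up to the network size: allows viewing the context around the object;
  - Exceeding the network size: increases the number of connections between the image point and the final activation;
- Adding an SPP block on top of CSPDarknet53 enlarges the receptive field and separates out the most significant context features, with almost no reduction in speed;
- PANet is used to aggregate parameters from different backbone levels, whereas YOLOv3 used FPN;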
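The SPP idea above can be sketched in a few lines: the feature map is max-pooled with several window sizes at stride 1 and "same" padding, and the results are stacked with the input so the spatial size is unchanged. This is a minimal pure-Python sketch on a single-channel 2D list, not the Darknet implementation; the function names are hypothetical, and the default kernel sizes (5, 9, 13) are an assumption taken from the YOLOv3-SPP design rather than from these notes.

```python
def max_pool_same(fmap, k):
    """Max-pool a 2D list with a k x k window, stride 1, 'same' padding.

    Out-of-bounds positions are simply ignored, which is equivalent to
    padding with negative infinity.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp(fmap, kernels=(5, 9, 13)):
    """Return the input plus one pooled copy per kernel (a channel concat)."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernels]
```

Because every pooled copy has the same height and width as the input, the output can be concatenated along the channel axis, which is why the block adds context at little cost.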
-
Finally, YOLOv4 is composed of CSPDarknet53 as the backbone, SPP as the additional block, PANet as the path-aggregation neck, and the YOLOv3 head;
-
Selection of BoF and BoS:
- Activations: ReLU, leaky-ReLU, Swish, or Mish;
- Bounding box regression loss: MSE, IoU, GIoU, CIoU, DIoU;
- Data augmentation: CutOut, MixUp, CutMix;
- Regularization method: DropBlock is chosen as the regularization method;
- Normalization of the network activations by their mean and variance: Batch Normalization (BN), Cross-GPU Batch Normalization (CGBN or SyncBN), Filter Response Normalization (FRN), or Cross-Iteration Batch Normalization (CBN);
- Skip-connections: Residual connections, Weighted residual connections, Multi-input weighted residual connections, or Cross stage partial connections (CSP);
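To make the IoU-based regression losses in the list above concrete, here is a minimal pure-Python sketch of IoU, DIoU, and the CIoU loss for two axis-aligned boxes. It follows the usual published definitions (DIoU subtracts a normalized center-distance term, CIoU additionally subtracts an aspect-ratio term); the `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration, and boxes are assumed to have positive width and height.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """IoU minus squared center distance over squared enclosing-box diagonal."""
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    return iou(a, b) - (rho2 / c2 if c2 > 0 else 0.0)


def ciou_loss(pred, gt):
    """1 - CIoU, where CIoU = DIoU - alpha * v (aspect-ratio penalty)."""
    i = iou(pred, gt)
    # v measures how different the two aspect ratios are.
    v = (4 / math.pi ** 2) * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))
    ) ** 2
    denom = 1 - i + v
    alpha = v / denom if denom > 0 else 0.0
    return 1 - (diou(pred, gt) - alpha * v)
```

Unlike plain IoU, the loss still changes when the boxes do not overlap, because the center-distance term keeps growing as the boxes move apart.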
-
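Additional improvements:
- New data augmentation methods: Mosaic and Self-Adversarial Training (SAT);
  - Mosaic: stitches four training images into one, mixing four different contexts;
  - CutMix, for comparison: uses only two images, pasting a patch of one onto the other;
  - SAT: in the first stage the network alters the original image instead of its weights, performing an adversarial attack on itself so that the image appears to contain no desired object; in the second stage, normal detection training is run on the modified image;
- Selecting optimal hyperparameters with genetic algorithms;
- Modified SAM, modified PAN, and Cross mini-Batch Normalization (CmBN);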
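As a rough illustration of Mosaic, the sketch below stitches four equally sized images into one canvas split at a point `(cy, cx)`, so each quadrant shows a different image. It is a simplified pure-Python sketch, not the Darknet implementation: the function name, the range used for the random split point, and the choice to copy each quadrant from the same coordinates of its source image are all assumptions. A real pipeline would also rescale the images and shift and clip the bounding boxes.

```python
import random


def mosaic(images, split=None, rng=random):
    """Stitch four equally sized 2D images into one canvas of the same size.

    images: list of four 2D lists (rows of pixels) with identical shape.
    split:  optional (cy, cx) point where the four quadrants meet.
    """
    assert len(images) == 4
    h, w = len(images[0]), len(images[0][0])
    if split is None:
        # Random split point kept away from the borders (range is an assumption).
        split = (rng.randint(h // 4, 3 * h // 4), rng.randint(w // 4, 3 * w // 4))
    cy, cx = split
    canvas = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
            idx = (0 if y < cy else 2) + (0 if x < cx else 1)
            canvas[y][x] = images[idx][y][x]
    return canvas
```

Calling `mosaic(imgs, split=(2, 2))` on four 4x4 images gives a canvas whose four 2x2 quadrants each come from a different image.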
-
In summary:
YOLOv4 consists of:
- Backbone: CSPDarknet53; Neck: SPP, PAN; Head: YOLOv3;
YOLOv4 uses:
-
Bag of Freebies (BoF) for backbone: CutMix and Mosaic data augmentation, DropBlock regularization, Class label smoothing;
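Class label smoothing, listed above, replaces a hard one-hot target with a slightly softened distribution. A minimal sketch of the common uniform variant follows; the function name and the smoothing factor `eps=0.1` are assumptions, since these notes do not state the value used.

```python
def smooth_labels(one_hot, eps=0.1):
    """Uniform label smoothing: keep (1 - eps) of the target, spread eps over all classes."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]
```

With four classes and `eps=0.1`, the true class gets a target of 0.925 and every other class gets 0.025, so the targets still sum to 1.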
-
Bag of Specials (BoS) for backbone: Mish activation, Cross-stage partial connections (CSP), Multi-input weighted residual connections (MiWRC);
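For reference, Mish is defined as x · tanh(softplus(x)). The scalar sketch below uses a numerically stable softplus so that large positive or negative inputs do not overflow; the helper names are chosen here for illustration.

```python
import math


def softplus(x):
    """log(1 + exp(x)), written so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

Mish is smooth and, unlike ReLU, lets small negative values through, while behaving almost like the identity for large positive inputs.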
-
Bag of Freebies (BoF) for detector: CIoU-loss, CmBN, DropBlock regularization, Mosaic data augmentation, Self-Adversarial Training, Eliminate grid sensitivity, Using multiple anchors for a single ground truth, Cosine annealing scheduler, Optimal hyperparameters, Random training shapes;
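The cosine annealing scheduler mentioned above lowers the learning rate along half a cosine wave. Below is a minimal sketch of the standard single-cycle formula, without warm-up or restarts; the function and parameter names are assumptions, and the actual schedule settings used for YOLOv4 are not given in these notes.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step`, decaying from lr_max to lr_min over total_steps."""
    t = min(max(step, 0), total_steps)  # clamp to the schedule range
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / total_steps))
```

The rate starts at `lr_max`, passes the midpoint of the two bounds halfway through training, and reaches `lr_min` at the final step.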
-
Bag of Specials (BoS) for detector: Mish activation, SPP-block, SAM-block, PAN path-aggregation block, DIoU-NMS;
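DIoU-NMS works like greedy NMS but uses DIoU instead of IoU as the suppression criterion, so two boxes whose centers are far apart are less likely to suppress each other. The sketch below is a minimal pure-Python version under assumed conventions: boxes are `(x1, y1, x2, y2)` tuples, the function names are hypothetical, and the `0.5` threshold is an illustrative default, not a value from these notes.

```python
def diou(a, b):
    """IoU minus squared center distance over squared enclosing-box diagonal."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - (rho2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that drops a box when its DIoU with a kept box exceeds threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep
```

The function returns the indices of the kept boxes in descending score order.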
-
-
-
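Experimental summary
- Introducing CutMix and Mosaic data augmentation, class label smoothing, and Mish activation improves classification accuracy;
- YOLOv3 confines each predicted center to the grid cell that contains the ground truth. Because the ground truth itself may be slightly off, the prediction should be allowed to reach into a small neighborhood of that cell, i.e. $b_x$ should be able to equal $c_x$ or $c_x + 1$. The sigmoid is therefore multiplied by a factor greater than 1.0, which eliminates grid sensitivity;
- Use multiple anchors for a single ground truth, selected by a threshold;
- Use genetic algorithms to select the optimal hyperparameters during the first 10% of training;
- Train with random input shapes, automatically increasing the mini-batch size when training on small images;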
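The grid-sensitivity fix can be illustrated with the center decoding. In YOLOv3 the center is $b_x = \sigma(t_x) + c_x$, so $b_x$ only approaches $c_x$ or $c_x + 1$ for very large $|t_x|$. The sketch below scales the sigmoid by a factor above 1.0 and re-centers it; the re-centering offset `0.5 * (scale - 1)` and the example factor `1.2` follow the Darknet-style `scale_x_y` convention and are assumptions here, since these notes only state that the sigmoid is multiplied by a factor greater than 1.0.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode_center(t, cell, scale=1.0):
    """Decode a box-center coordinate from raw output t for grid cell index `cell`.

    scale = 1.0 reproduces the YOLOv3 decoding; scale > 1.0 lets the
    result reach (and slightly pass) the cell borders for moderate t.
    """
    return scale * sigmoid(t) - 0.5 * (scale - 1.0) + cell
```

With `scale=1.0`, `decode_center(4, 3)` stays strictly below 4, whereas with `scale=1.2` the same raw output already crosses the cell border at 4.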