Paper Reading: YOLOv4

paper: YOLOv4
github: Darknet
Personally, this paper reads more like a survey that pulls together many existing methods. I need to prepare a talk for work soon, so here are my reading notes.

A traditional object detector is roughly composed of the following parts (borrowing the author's wording):

[figure]

For YOLOv4:

[figure]

  • head: predicts classes and bounding boxes; heads fall into two types (one-stage and two-stage)
  • neck: layers inserted between the backbone and the head, usually used to collect feature maps from different stages; in the paper's words:

Usually, a neck is composed of several bottom-up paths and several top-down paths

  • The backbone's job is to extract image features. After comparing CSPResNext50 and CSPDarknet53, the author chose the latter; in the paper's words:

our numerous studies demonstrate that the CSPResNext50 is considerably better compared to CSPDarknet53 in terms of object classification on the ILSVRC2012 (ImageNet) dataset. However, conversely, the CSPDarknet53 is better compared to CSPResNext50 in terms of detecting objects on the MS COCO dataset.

Put simply, of these two networks the former performs better on the classification dataset, while the latter performs better on the detection dataset. Beyond that, there are further reasons: compared with classification, detection needs a larger input resolution, a deeper network (larger receptive field), and more parameters. Specifically:
[figures]

The CSPResNext50 contains only 16 convolutional layers 3 × 3, a 425 × 425 receptive field and 20.6M parameters, while CSPDarknet53 contains 29 convolutional layers 3 × 3, a 725 × 725 receptive field and 27.6M parameters. This theoretical justification, together with our numerous experiments, show that CSPDarknet53 neural network is the optimal model of the two as the backbone for a detector.
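The receptive-field numbers quoted above (425 × 425 vs. 725 × 725) come from how the theoretical receptive field accumulates layer by layer. A minimal sketch of that calculation (my own helper, not code from the paper):

```python
# Sketch: how the theoretical receptive field grows layer by layer.
# rf: receptive field size; j: cumulative stride ("jump") of the feature map.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, applied in order."""
    rf, j = 1, 1
    for k, s in layers:
        rf += (k - 1) * j   # each layer widens the RF by (k-1) * current jump
        j *= s              # stride multiplies the jump for later layers
    return rf

# e.g. three 3x3 convs, each with stride 2
print(receptive_field([(3, 2), (3, 2), (3, 2)]))  # -> 15
```

This is why more 3 × 3 layers (29 vs. 16) translate directly into the larger receptive field the quote attributes to CSPDarknet53.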

  • The author also gives a short summary of receptive fields here:

[figure]

Different receptive field sizes serve different purposes. To enlarge the receptive field, YOLOv4 uses SPP and PANet; in the paper's words:

We add the SPP block over the CSPDarknet53, since it significantly increases the receptive field, separates out the most significant context features and causes almost no reduction of the network operation speed. We use PANet as the method of parameter aggregation from different backbone levels for different detector levels, instead of the FPN used in YOLOv3.

  • The SPP structure in YOLOv4 is:

[figure]
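To make the SPP structure concrete, here is a pure-Python sketch of the idea (the kernel sizes 5/9/13 follow the common YOLOv4 configuration; the real implementation operates on multi-channel tensors):

```python
# Sketch of the SPP block used in YOLOv4: parallel max-pools (k = 5, 9, 13,
# stride 1, "same" padding) whose outputs are concatenated with the input
# along the channel axis, so spatial size is preserved and channels go 1 -> 4.
# Pure-Python version on a single-channel 2D feature map (list of lists).
NEG_INF = float("-inf")

def maxpool_same(fm, k):
    """k x k max pool, stride 1, padding k//2 (output size == input size)."""
    h, w, p = len(fm), len(fm[0]), k // 2
    out = [[NEG_INF] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-p, p + 1):
                for dj in range(-p, p + 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        out[i][j] = max(out[i][j], fm[i + di][j + dj])
    return out

def spp(fm):
    # channel-wise concatenation: [identity, pool5, pool9, pool13]
    return [fm] + [maxpool_same(fm, k) for k in (5, 9, 13)]
```

Because the pools use stride 1 with "same" padding, the spatial resolution never changes; only the effective receptive field grows, which matches the quote above about increasing the receptive field at almost no speed cost.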

As mentioned earlier, this paper summarizes many methods, and they can be grouped under two headings: Bag of freebies and Bag of specials.

  • Regarding Bag of freebies, the paper explains:

Therefore, researchers always like to take this advantage and develop better training methods which can make the object detector receive better accuracy without increasing the inference cost. We call these methods that only change the training strategy or only increase the training cost as “bag of freebies.”

Put simply, methods that improve the model's accuracy without adding inference time are Bag of freebies.

There is also a dedicated paper on Bag of freebies.

Bag of freebies:
(1) data augmentation: gives the model higher robustness
(2) semantic distribution bias (e.g. class imbalance in the data)
(3) expressing the degree of association between different categories, which the one-hot hard representation cannot do
(4) Bounding Box (BBox) regression
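For item (3), a common freebie-style remedy is label smoothing: soften the hard one-hot target so the model is not pushed toward extreme over-confidence. A minimal sketch (my own illustration of the technique, not code from the paper):

```python
# Sketch: label smoothing, a "bag of freebies" style fix for the one-hot
# hard representation -- spread a small probability mass over all classes.
def smooth_one_hot(label, num_classes, eps=0.1):
    """Return a smoothed target distribution instead of a hard one-hot vector."""
    off = eps / num_classes      # mass spread uniformly over all classes
    on = 1.0 - eps + off         # remaining mass stays on the true class
    return [on if c == label else off for c in range(num_classes)]

# true class gets 1 - eps + eps/K, every other class gets eps/K; the
# distribution still sums to 1
target = smooth_one_hot(2, 4)
```

Since this only changes the training targets, inference cost is untouched, which is exactly what qualifies it as a freebie.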
  • Bag of specials, by contrast, adds a small amount of inference time but can markedly improve accuracy:

For those plugin modules and post-processing methods that only increase the inference cost by a small amount but can significantly improve the accuracy of object detection, we call them “bag of specials”.

Bag of specials:
(1) enhance receptive field
(2) attention module
(3) feature integration
(4) activation function
(5) post-processing: NMS
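The post-processing step in (5) is the classic greedy NMS. A self-contained sketch (box format `(x1, y1, x2, y2)` is my assumption for the illustration):

```python
# Sketch of greedy non-maximum suppression (NMS), the classic
# post-processing step: keep the highest-scoring box, drop every box that
# overlaps it too much, and repeat on the survivors.
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Return the indices of the boxes kept after suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```

YOLOv4 actually adopts DIoU-NMS, which additionally accounts for center-point distance, but the greedy skeleton is the same.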

Additional Improvements:

  • Beyond collecting these methods, the author's further improvements are:

[figure]

  • Regarding SAT (self-adversarial training), the paper says:

operate in 2 forward backward stages.
In the 1st stage the neural network alters the original image instead of the network weights. In this way the neural network executes an adversarial attack on itself, altering the original image to create the deception that there is no desired object on the image.
In the 2nd stage, the neural network is trained to detect an object on this modified image in the normal way.
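A toy illustration of that first stage (my own stand-in, not the Darknet implementation): treat the detector as a scoring function and take a signed-gradient step on the *input image* to push the detection score down, the way an FGSM-style attack would.

```python
# Toy sketch of SAT stage 1: the network perturbs the image, not its
# weights, to "hide" the object from itself. `detect_score` is a stand-in
# detector (an assumption for illustration); the gradient is numeric.
def detect_score(img):
    # stand-in detector: responds to bright pixels in the "object" region
    return sum(img[4:8]) / 4.0

def adversarial_step(img, eps=0.5, h=1e-4):
    out = list(img)
    for i in range(len(img)):
        bumped = list(img)
        bumped[i] += h
        grad = (detect_score(bumped) - detect_score(img)) / h  # numeric gradient
        # move each pixel against the gradient sign to lower the score
        out[i] -= eps * (1 if grad > 0 else (-1 if grad < 0 else 0))
    return out

img = [0.0] * 4 + [1.0] * 4 + [0.0] * 4   # "object" occupies positions 4..7
attacked = adversarial_step(img)
# stage 2 would then train the detector on `attacked` in the normal way
```

The key point the quote makes is that only the image is altered in stage 1; the weights are updated only in stage 2, on the perturbed image.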

  • modified SAM 和 modified PAN:

[figure]
SAM is modified by changing spatial-wise attention to point-wise attention, i.e. from attention over spatial regions to attention at each individual point.
[figure]
The change to PANet is replacing addition with concatenation. My personal feeling is that this increases the number of channels, so more features are collected and less information is lost.
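The difference between the two fusion ops can be sketched in a couple of lines (toy single-pixel-per-channel lists standing in for feature maps):

```python
# Sketch of the modified PAN step: replace element-wise addition with
# channel concatenation, so the fused map keeps both inputs' channels.
def fuse_add(a, b):
    # element-wise addition: channel count stays the same
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]

def fuse_concat(a, b):
    # concatenation along the channel axis: channel count is the sum
    return a + b

a = [[1, 2], [3, 4]]   # 2 channels, 2 "pixels" each
b = [[5, 6], [7, 8]]
print(len(fuse_add(a, b)))     # 2 channels out
print(len(fuse_concat(a, b)))  # 4 channels out
```

Addition forces the two levels to share one set of channels; concatenation preserves both, at the cost of a wider feature map for the next layer, which fits the information-preservation reading above.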

Finally, a figure from the web that I found quite good:

[figure]
