[Classic CV Models - Detection] YOLO v3

Iris__HU

Article link: https://blog.csdn.net/huxiyan450/article/details/106639777

YOLO v3

Bounding box Regression

Same as v2:
For bounding box regression, each bounding box has 4 outputs: $t_x, t_y, t_w, t_h$.
The final bounding box center $(b_x, b_y)$ and size $(b_w, b_h)$ are computed with the following formulas:

$b_x=\sigma(t_x)+c_x$
$b_y=\sigma(t_y)+c_y$
$b_w=p_w \cdot e^{t_w}$
$b_h=p_h \cdot e^{t_h}$

$(c_x, c_y)$: the offset of this cell from the top-left corner of the feature map.
$(p_w, p_h)$: the prior width and height of this anchor box.
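The four formulas above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or from Darknet: the function name `decode_box` and its argument layout are hypothetical, and the result is left in the same units as the inputs (grid cells for the center, whatever units the prior uses for the size).

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_box(t, cell, prior):
    """Turn raw outputs (t_x, t_y, t_w, t_h) into (b_x, b_y, b_w, b_h).

    cell  = (c_x, c_y): offset of the cell from the feature map's top-left corner
    prior = (p_w, p_h): prior width and height of the anchor box
    (hypothetical helper; names are illustrative only)
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = prior
    b_x = sigmoid(t_x) + c_x      # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)     # prior is scaled, never negative
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# All-zero outputs place the center in the middle of cell (3, 4)
# and leave the prior size unchanged.
print(decode_box((0.0, 0.0, 0.0, 0.0), (3, 4), (2.0, 1.5)))
# (3.5, 4.5, 2.0, 1.5)
```

Because the sigmoid output lies strictly between 0 and 1, the decoded center can never leave the cell that predicts it.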
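Notes:

- Grid cell sizes are normalized so that each cell is 1×1. Because $\sigma(\cdot)$ maps into $(0, 1)$, the offset between the center $(b_x, b_y)$ of any bounding box belonging to a grid cell and the cell origin $(c_x, c_y)$ is always greater than 0 and less than 1.
- As before, the anchor box sizes and aspect ratios are obtained by k-means clustering on the training set.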
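As a rough sketch of how anchor priors can be clustered from training boxes, the snippet below runs k-means on (width, height) pairs. It assumes the distance $d = 1 - \mathrm{IoU}$ described in the YOLOv2 paper, which amounts to assigning each box to the center it overlaps most. The names `wh_iou` and `kmeans_anchors`, the random initialization, and the toy boxes are all illustrative assumptions.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given only (w, h), as if they shared a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors using highest IoU as 'nearest'."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)          # assumed init: k random boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            clusters[best].append(box)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:                     # mean width and height of the cluster
                new_centers.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:                           # keep an empty cluster's old center
                new_centers.append(centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])   # small to large


# Toy data: three small boxes and three large ones.
toy_boxes = [(10, 10), (12, 12), (11, 11), (100, 100), (110, 90), (90, 110)]
print(kmeans_anchors(toy_boxes, k=2))
```

Sorting the result by area makes it easy to hand the smallest anchors to the finest scale later on.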

Objectness Prediction

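As in earlier YOLO versions, each bounding box also outputs a confidence score that represents the probability that the box contains an object.
During backpropagation, each ground truth updates the weights only through the bounding box prediction that corresponds to the anchor box with the highest IOU with it.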
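The matching rule above can be sketched as follows. This is an illustration under assumptions: IoU is computed from width and height only (boxes treated as sharing a corner), which is a common simplification rather than something stated in the text, and `best_anchor` is a hypothetical helper. The nine anchor sizes are the COCO priors listed in the YOLOv3 paper.

```python
# COCO anchor priors (width, height) in pixels, as listed in the YOLOv3 paper.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def wh_iou(a, b):
    """IoU of two boxes given only (w, h), as if they shared a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def best_anchor(gt_wh, anchors):
    """Index of the anchor with the highest IoU for one ground-truth box."""
    ious = [wh_iou(gt_wh, a) for a in anchors]
    return max(range(len(anchors)), key=ious.__getitem__)


def objectness_targets(gt_wh, anchors):
    """1 for the responsible anchor, 0 for every other anchor."""
    responsible = best_anchor(gt_wh, anchors)
    return [1 if i == responsible else 0 for i in range(len(anchors))]


# A 60x120 ground truth is closest in shape to the (59, 119) prior.
print(best_anchor((60, 120), ANCHORS))         # 5
print(objectness_targets((60, 120), ANCHORS))  # [0, 0, 0, 0, 0, 1, 0, 0, 0]
```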

Classification

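Unlike v2, v3 performs classification with n independent logistic regression classifiers, one binary classifier per class (n is the total number of classes). The error is computed with cross-entropy for each label.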
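To make the per-class formulation concrete, here is a small sketch of n independent sigmoid classifiers scored with binary cross-entropy. The function name `multilabel_bce` and the clamping constant `eps` are assumptions added for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_bce(logits, targets, eps=1e-12):
    """Sum of binary cross-entropy terms, one per class.

    logits  : n raw scores, one per class
    targets : n labels in {0, 1}; several may be 1 at once
    """
    loss = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)   # clamp to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


# Each class is judged on its own, so two labels can be positive together.
print(multilabel_bce([4.0, 3.0, -5.0], [1, 1, 0]))
```

Because each class has its own sigmoid, the class probabilities are not forced to sum to 1 as they would be with a softmax.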

Prediction across scales (*)

To detect objects of different sizes more reliably, v3 outputs feature maps at 3 different sizes as predictions. Each feature map size uses three different anchor boxes, for 9 anchor boxes in total.
The network structure is shown in the figure from:
Mao, Qi-Chao & Sun, Hong-Mei & Liu, Yan-Bo & Jia, Rui-Sheng. (2019). Mini-YOLOv3: Real-Time Object Detector for Embedded Applications. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2941547.

Darknet-53

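Three scales: stride = 32, stride = 16, and stride = 8.
Taking stride = 16 as an example:
- Apply a 1×1 filter to the earlier stride-32 feature map for channel reduction.
- Upsample so that the spatial size of the feature map doubles; after this enlargement the stride changes from 32 to 16.
- Concatenate the last stride-16 output of the earlier layers with the upsampled output from the previous step. (The new feature map still has stride 16, but it now contains both the detailed features learned closer to the original image and the more global features obtained as the growing stride enlarges the receptive field.)
- Pass the result through a convolutional set to further learn and fuse these features from different scales.
- Finally, obtain the prediction on a denser grid.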
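The upsample and concatenate steps in the list above can be sketched on plain nested lists of shape `[H][W][C]`. This is a toy illustration only: nearest-neighbor upsampling is assumed, the 1×1 channel reduction and the convolutional set are omitted, the channel order after concatenation is an arbitrary choice, and the helper names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a feature map stored as [H][W][C]."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(2)]   # repeat each column
        out.append(wide)
        out.append([list(px) for px in wide])               # repeat each row
    return out


def concat_channels(a, b):
    """Concatenate two feature maps of equal H and W along the channel axis."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# A 1x1 'stride-32' map with 1 channel and a 2x2 'stride-16' map with 2 channels.
coarse = [[[9.0]]]
fine = [[[0.1, 0.2], [0.3, 0.4]],
        [[0.5, 0.6], [0.7, 0.8]]]

merged = concat_channels(upsample2x(coarse), fine)
print(len(merged), len(merged[0]), len(merged[0][0]))   # 2 2 3
```

The merged map keeps the finer 2×2 grid while every position also carries the coarse map's channel.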
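There are 9 anchor boxes in total, and the prediction at each scale uses three of them:
- The feature map with the largest stride contains more global information and is mainly used to detect large objects, so it uses the 3 largest anchor boxes.
- The feature map with the smallest stride contains many local, detailed features and is mainly used to detect small objects, so it uses the 3 smallest anchor boxes.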
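The bookkeeping for the three scales can be written down as a short sketch. The anchor sizes are the COCO priors from the YOLOv3 paper, and the output depth of 3 × (4 + 1 + n) per cell follows the paper's description. Giving the middle three anchors to stride 16 is an assumption that follows the common configuration, and both helper names are hypothetical.

```python
# COCO anchor priors (width, height) in pixels, as listed in the YOLOv3 paper.
COCO_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                (59, 119), (116, 90), (156, 198), (373, 326)]


def anchors_by_stride(anchors, strides=(8, 16, 32), per_scale=3):
    """Smallest anchors go to the smallest stride, largest to the largest."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return {s: ordered[i * per_scale:(i + 1) * per_scale]
            for i, s in enumerate(strides)}


def head_shapes(input_size, num_classes, strides=(32, 16, 8), per_scale=3):
    """Grid size and depth of each prediction map: 4 box terms, 1 objectness, n classes."""
    depth = per_scale * (4 + 1 + num_classes)
    return {s: (input_size // s, input_size // s, depth) for s in strides}


print(anchors_by_stride(COCO_ANCHORS)[32])   # [(116, 90), (156, 198), (373, 326)]
print(head_shapes(416, 80))
# {32: (13, 13, 255), 16: (26, 26, 255), 8: (52, 52, 255)}
```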

Training

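- Multi-scale training: a robust detector should not depend on the size or resolution of the input image. To make the model less sensitive to image size and resolution, the input image size is changed once every n batches during training, so that the model learns to predict at different resolutions.
- Data augmentation
- Batch normalization; see "DL Details - Batch Normalization"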
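The multi-scale schedule can be sketched as below. The defaults are assumptions borrowed from the YOLOv2 paper (a new size every 10 batches, drawn from multiples of 32 between 320 and 608), not values stated in this post, and `input_size_schedule` is a hypothetical helper.

```python
import random


def input_size_schedule(num_batches, every=10,
                        sizes=tuple(range(320, 609, 32)), seed=0):
    """Return the input resolution used for each batch.

    A new size is drawn at batch 0 and then once every `every` batches.
    All candidate sizes are multiples of 32, the largest stride.
    """
    rng = random.Random(seed)
    schedule = []
    size = sizes[0]
    for batch in range(num_batches):
        if batch % every == 0:
            size = rng.choice(sizes)
        schedule.append(size)
    return schedule


schedule = input_size_schedule(30)
print(schedule[0], schedule[10], schedule[20])
```

Keeping every candidate size divisible by 32 means the coarsest feature map always has a whole number of grid cells.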

Reference

https://arxiv.org/abs/1804.02767
https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/