# YOLO Series Theory Explained: v1, v2, v3


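## YOLO Series: Theory Explained

### YOLO v1 (You Only Look Once: Unified, Real-Time Object Detection)

#### How YOLO v1 works

1. The image is divided into an SxS grid of cells (grid cells). If the center of an object falls inside a cell, that cell is responsible for predicting the object.

2. Each grid cell predicts B bounding boxes. Besides its position, each bounding box also carries a predicted confidence value. Each grid cell additionally predicts scores for C classes.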

For evaluating YOLO on PASCAL VOC, we use S = 7, B = 2. PASCAL VOC has 20 labelled classes so C = 20. Our final prediction is a 7 × 7 × 30 tensor.
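The two steps above can be sketched in a few lines: find the cell responsible for an object, and compute the depth of the prediction tensor. The function names and the convention that the object center is given as fractions of the image size are assumptions made for this illustration, not something the paper prescribes.

```python
def responsible_cell(cx, cy, S=7):
    """Return (row, col) of the grid cell that contains the object center.

    cx and cy are assumed to be normalized to [0, 1] relative to the image size.
    """
    col = min(int(cx * S), S - 1)  # clamp so that cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col


def prediction_depth(B=2, C=20):
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class scores."""
    return B * 5 + C


row, col = responsible_cell(0.52, 0.31)
print(row, col)                      # 2 3
print((7, 7, prediction_depth()))    # (7, 7, 30)
```

With S = 7, B = 2 and C = 20 this reproduces the 7 × 7 × 30 tensor quoted above.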

Each bounding box consists of 5 predictions: x, y, w, h,
and confidence. The (x, y) coordinates represent the center
of the box relative to the bounds of the grid cell. The width
and height are predicted relative to the whole image. Finally
the confidence prediction represents the IOU between the
predicted box and any ground truth box.

We define confidence as $\Pr(\text{Object}) \ast \text{IOU}_{\text{pred}}^{\text{truth}}$. If no object exists in that cell, the confidence scores should be zero. Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth.
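To make the confidence definition concrete, here is a minimal sketch of IOU and the resulting confidence target. The `(cx, cy, w, h)` box layout follows the description above; the helper names `iou` and `confidence_target` are made up for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence_target(has_object, pred_box, truth_box):
    """Pr(Object) * IOU: zero for an empty cell, otherwise the IOU itself."""
    return iou(pred_box, truth_box) if has_object else 0.0


print(confidence_target(True, (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)))
print(confidence_target(False, (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)))  # 0.0
```

In the first call the two boxes overlap in half of their area, so the intersection is 2, the union is 6, and the target is 1/3.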

#### Loss function

In my view the YOLO v1 loss function is extremely complex. The paper gives it in the following form:

$$
\begin{array}{l}
\lambda_{\text{coord}} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}\left[\left(x_{i}-\hat{x}_{i}\right)^{2}+\left(y_{i}-\hat{y}_{i}\right)^{2}\right] \\
+\lambda_{\text{coord}} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right] \\
+\sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}\left(C_{i}-\hat{C}_{i}\right)^{2} \\
+\lambda_{\text{noobj}} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}}\left(C_{i}-\hat{C}_{i}\right)^{2} \\
+\sum_{i=0}^{S^{2}} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}
\end{array}
$$
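The square roots in the width and height terms are easy to overlook. The sketch below evaluates only the two coordinate terms for a single responsible box, to show that the same absolute size error costs more on a small box than on a large one. The value `LAMBDA_COORD = 5` is the one reported in the YOLO v1 paper and is not stated in this article; the function name is hypothetical.

```python
import math

LAMBDA_COORD = 5.0  # value reported in the YOLO v1 paper; not stated in this article


def coord_loss(pred, truth, lambda_coord=LAMBDA_COORD):
    """Coordinate terms of the loss for one responsible box.

    pred and truth are (x, y, w, h) tuples; w and h must be non-negative.
    """
    px, py, pw, ph = pred
    tx, ty, tw, th = truth
    center = (tx - px) ** 2 + (ty - py) ** 2
    size = (math.sqrt(tw) - math.sqrt(pw)) ** 2 + (math.sqrt(th) - math.sqrt(ph)) ** 2
    return lambda_coord * (center + size)


# The same 0.05 error in width and height, once on a small box and once on a large one.
small = coord_loss((0.5, 0.5, 0.15, 0.15), (0.5, 0.5, 0.10, 0.10))
large = coord_loss((0.5, 0.5, 0.85, 0.85), (0.5, 0.5, 0.80, 0.80))
print(small > large)  # True
```

Without the square roots both cases would produce exactly the same penalty.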

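#### Limitations

YOLO v1 performs poorly on small objects that appear in groups. The paper itself mentions, for example, that flocks of small birds in an image are predicted badly.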

### YOLO v2 (YOLO9000)

#### Improvements in v2

1. Batch Normalization (BN layers are introduced)

Batch normalization leads to significant improvements in convergence while eliminating the need for other forms of regularization [7]. By adding batch normalization on all of the convolutional layers in YOLO we get more than 2% improvement in mAP. Batch normalization also helps regularize the model. With batch normalization we can remove dropout from the model without overfitting

2. High Resolution Classifier (the classifier is trained at a higher resolution)
3. Convolutional With Anchor Boxes (anchor boxes are used for prediction)

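4. Fine-Grained Features (lower-level feature information is merged in)

The passthrough layer combines the high-level 13x13 feature map with the earlier, higher-resolution 26x26x512 feature map.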
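The YOLOv2 paper describes the passthrough layer as stacking adjacent spatial features into channels, turning the 26 × 26 × 512 map into 13 × 13 × 2048 before concatenating it with the 13 × 13 map. Below is a minimal pure-Python sketch of that idea on toy sizes. The nested-list layout `fmap[row][col][channel]` and both function names are assumptions for illustration, and the channel ordering inside Darknet's own layer may differ.

```python
def space_to_depth(fmap, block=2):
    """Stack each block x block spatial neighborhood into the channel axis.

    fmap is a nested list indexed as fmap[row][col][channel].
    """
    height, width = len(fmap), len(fmap[0])
    out = []
    for r in range(0, height, block):
        out_row = []
        for c in range(0, width, block):
            cell = []
            for dr in range(block):
                for dc in range(block):
                    cell.extend(fmap[r + dr][c + dc])
            out_row.append(cell)
        out.append(out_row)
    return out


def concat_channels(a, b):
    """Concatenate two maps of equal spatial size along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy stand-ins: a 4x4x1 "fine" map and a 2x2x2 "coarse" map.
fine = [[[r * 4 + c] for c in range(4)] for r in range(4)]
coarse = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
merged = concat_channels(space_to_depth(fine), coarse)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 2 2 6
```

The spatial size halves and the channel count quadruples, which is the same bookkeeping as 26 × 26 × 512 becoming 13 × 13 × 2048.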

5. Multi-Scale Training (the network is trained at multiple input scales)

following multiples of 32: {320, 352, …, 608}. Thus the
smallest option is 320 × 320 and the largest is 608 × 608.
We resize the network to that dimension and continue training.
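The set of input sizes in the quote is easy to generate. The sketch below lists them and picks one at random; how often a new size is drawn is left to the training loop, and the names `SIZES` and `pick_input_size` are made up for this example.

```python
import random

STRIDE = 32  # the candidate sizes are multiples of 32
SIZES = list(range(320, 608 + 1, STRIDE))


def pick_input_size(rng=random):
    """Choose a square input resolution for the next group of batches."""
    return rng.choice(SIZES)


print(SIZES)
size = pick_input_size(random.Random(0))
print(size, size // STRIDE)  # input side length and the matching output grid side
```

There are ten candidate sizes, and the output grid ranges from 10 × 10 (for 320) to 19 × 19 (for 608).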

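#### Backbone network

YOLO v2 uses Darknet-19 as its backbone network (19 convolutional layers in total, with a 224x224 input).

[Figure: Darknet-19 model architecture]

The detection output has 125 channels: 125 = (20 + 5) × 5, that is, 5 anchor boxes, each predicting 20 class scores plus 5 box values (4 coordinates and 1 confidence).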
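The channel arithmetic can be written as a one-line helper, which makes it easy to see how the output depth changes with the number of classes or anchors. The function name is hypothetical.

```python
def head_channels(num_classes, num_anchors):
    """Each anchor predicts 4 box values, 1 confidence and num_classes class scores."""
    return (num_classes + 5) * num_anchors


print(head_channels(num_classes=20, num_anchors=5))  # 125
```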

### YOLO v3 (An Incremental Improvement)

#### Backbone network

Darknet-53: the distinguishing feature of this 53-layer network is that the earlier downsampling (pooling) layers are replaced with convolutional layers, which improves detection performance.

$$
\begin{array}{l}
2+ \\
(1 \times 2)+1+ \\
(2 \times 2)+1+ \\
(8 \times 2)+1+ \\
(8 \times 2)+1+ \\
(4 \times 2)+1=53
\end{array}
$$
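The layer count can be checked with a short script. The grouping below is my reading of the Darknet-53 table in the YOLOv3 paper: one initial convolution, one stride-2 convolution in front of each of the five stages, two convolutions per residual block, and one final fully connected layer. Treat the constant and function names as assumptions.

```python
RESIDUAL_REPEATS = [1, 2, 8, 8, 4]  # residual blocks per stage
CONVS_PER_BLOCK = 2                 # a 1x1 and a 3x3 convolution in each block


def darknet53_layer_count(repeats=RESIDUAL_REPEATS):
    stem = 1                                   # first 3x3 convolution
    downsampling = len(repeats)                # one stride-2 convolution per stage
    residual = CONVS_PER_BLOCK * sum(repeats)  # convolutions inside residual blocks
    classifier = 1                             # final fully connected layer
    return stem + downsampling + residual + classifier


print(darknet53_layer_count())  # 53
```

In the sum above, the leading 2 corresponds to the first convolution plus the first stride-2 convolution, and the trailing 1 to the fully connected layer.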

On the COCO dataset the 9 clusters were:
(10×13),(16×30),(33×23),(30×61),(62×45),(59×119),(116 × 90),(156 × 198),(373 × 326).

N × N × [3 ∗ (4 + 1 + 80)] for the 4 bounding box offsets,
1 objectness prediction, and 80 class predictions.
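A small sketch ties the two quotes together: nine anchor clusters shared across three prediction scales, and the depth of each output tensor. Assigning the smallest anchors to the finest grid is the usual convention but is an assumption here, as are all names in the code.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
BOXES_PER_SCALE = 3
NUM_CLASSES = 80


def output_depth(boxes_per_scale=BOXES_PER_SCALE, num_classes=NUM_CLASSES):
    """Depth of each N x N output: 4 offsets + 1 objectness + class scores per box."""
    return boxes_per_scale * (4 + 1 + num_classes)


def anchors_by_scale(anchors=ANCHORS, per_scale=BOXES_PER_SCALE):
    """Sort anchors by area and split them into groups, smallest group first."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return [ordered[i:i + per_scale] for i in range(0, len(ordered), per_scale)]


print(output_depth())  # 255
for group in anchors_by_scale():
    print(group)
```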

#### Bounding box prediction

$$
\sigma(x)=\operatorname{Sigmoid}(x)
$$

$$
\begin{array}{l}
b_{x}=\sigma\left(t_{x}\right)+c_{x} \\
b_{y}=\sigma\left(t_{y}\right)+c_{y} \\
b_{w}=p_{w} e^{t_{w}} \\
b_{h}=p_{h} e^{t_{h}}
\end{array}
$$
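These four equations translate directly into code. In the sketch below I assume that $(c_x, c_y)$ is the top-left offset of the grid cell, that $(p_w, p_h)$ is the size of the bounding box prior, and that everything is measured in grid-cell units; `decode_box` is a name chosen for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, prior):
    """Turn raw network outputs (tx, ty, tw, th) into a box (bx, by, bw, bh)."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx    # the sigmoid keeps the center inside its cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)   # the exponential keeps width and height positive
    bh = ph * math.exp(th)
    return bx, by, bw, bh


print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(6, 4), prior=(3.5, 2.0)))
# (6.5, 4.5, 3.5, 2.0)
```

All-zero outputs place the center in the middle of the cell and leave the prior's size unchanged.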

#### Loss function

$$
L(o, c, O, C, t, g)=\lambda_{1} L_{\text{conf}}(o, c)+\lambda_{2} L_{cla}(O, C)+\lambda_{3} L_{\text{loc}}(t, g)
$$

where $\lambda_{1}, \lambda_{2}, \lambda_{3}$ are balancing coefficients.

• The confidence loss uses binary cross-entropy:

YOLOv3 predicts an objectness score for each bounding
box using logistic regression. This should be 1 if the bounding
box prior overlaps a ground truth object by more than
any other bounding box prior. If the bounding box prior

Binary Cross Entropy

$$
L_{\text{conf}}(o, c)=-\frac{\sum_{i}\left(o_{i} \ln \left(\hat{c}_{i}\right)+\left(1-o_{i}\right) \ln \left(1-\hat{c}_{i}\right)\right)}{N}
$$

$$
\hat{c}_{i}=\operatorname{Sigmoid}\left(c_{i}\right)
$$

Here $c_{i}$ is the raw predicted value, $\hat{c}_{i}$ is the predicted confidence obtained by passing $c_{i}$ through the sigmoid function, and $N$ is the total number of positive and negative samples.
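A minimal sketch of this confidence loss in plain Python follows. It takes raw values $c_i$ and labels $o_i$ and has no numerical safeguards, so very large magnitudes would overflow or hit `log(0)`; the function name is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def confidence_loss(raw_values, labels):
    """Binary cross-entropy averaged over all N samples, positive and negative."""
    total = 0.0
    for c, o in zip(raw_values, labels):
        c_hat = sigmoid(c)
        total += o * math.log(c_hat) + (1 - o) * math.log(1 - c_hat)
    return -total / len(raw_values)


print(confidence_loss([0.0, 0.0], [1, 0]))   # ln 2, about 0.693
print(confidence_loss([4.0, -4.0], [1, 0]))  # confident and correct, so much smaller
```

With a raw value of 0 the sigmoid gives 0.5, so every sample contributes $\ln 2$ regardless of its label.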

• The class loss also uses binary cross-entropy:

$$
\begin{array}{c}
L_{cla}(O, C)=-\frac{\sum_{i \in \text{pos}} \sum_{j \in \text{cla}}\left(O_{ij} \ln \left(\hat{C}_{ij}\right)+\left(1-O_{ij}\right) \ln \left(1-\hat{C}_{ij}\right)\right)}{N_{\text{pos}}} \\
\hat{C}_{ij}=\operatorname{Sigmoid}\left(C_{ij}\right)
\end{array}
$$

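$C_{ij}$ is the raw predicted value, and $\hat{C}_{ij}$ is the class probability obtained by passing $C_{ij}$ through the sigmoid function.

$N_{\text{pos}}$ is the number of positive samples.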
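The class loss differs from the confidence loss in two ways: it runs only over positive samples, and it is divided by $N_{\text{pos}}$ rather than by the number of summed terms. A rough sketch, with `class_loss` and the nested-list layout `raw[i][j]` chosen purely for illustration:

```python
import math


def class_loss(raw, labels):
    """Per-class binary cross-entropy summed over classes, averaged over positives.

    raw[i][j] and labels[i][j] refer to positive sample i and class j.
    """
    n_pos = len(raw)
    total = 0.0
    for raw_row, label_row in zip(raw, labels):
        for value, label in zip(raw_row, label_row):
            prob = 1.0 / (1.0 + math.exp(-value))  # independent sigmoid per class
            total += label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return -total / n_pos


# One positive sample, three classes, all raw values at 0.
print(class_loss([[0.0, 0.0, 0.0]], [[1, 0, 0]]))  # 3 * ln 2, about 2.079
```

Because each class has its own sigmoid, a label row may contain more than one 1, which a softmax would not allow.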

• Localization loss:

$$
L_{\text{loc}}(t, g)=\frac{\sum_{i \in \text{pos}}\left(\sigma\left(t_{x}^{i}\right)-\hat{g}_{x}^{i}\right)^{2}+\left(\sigma\left(t_{y}^{i}\right)-\hat{g}_{y}^{i}\right)^{2}+\left(t_{w}^{i}-\hat{g}_{w}^{i}\right)^{2}+\left(t_{h}^{i}-\hat{g}_{h}^{i}\right)^{2}}{N_{\text{pos}}}
$$

$$
\begin{array}{l}
\hat{g}_{x}^{i}=g_{x}^{i}-c_{x}^{i} \\
\hat{g}_{y}^{i}=g_{y}^{i}-c_{y}^{i} \\
\hat{g}_{w}^{i}=\ln \left(g_{w}^{i} / p_{w}^{i}\right) \\
\hat{g}_{h}^{i}=\ln \left(g_{h}^{i} / p_{h}^{i}\right)
\end{array}
$$
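The target encoding is simply the inverse of the box decoding equations, which the following sketch makes explicit. I again assume grid-cell units for the ground-truth box, the cell offset and the prior; `encode_target` and `localization_loss` are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode_target(g, cell, prior):
    """Map a ground-truth box (gx, gy, gw, gh) to regression targets."""
    gx, gy, gw, gh = g
    cx, cy = cell
    pw, ph = prior
    return gx - cx, gy - cy, math.log(gw / pw), math.log(gh / ph)


def localization_loss(raw_preds, targets):
    """Sum of squared errors over positive samples, divided by their count."""
    total = 0.0
    for (tx, ty, tw, th), (gx, gy, gw, gh) in zip(raw_preds, targets):
        total += (sigmoid(tx) - gx) ** 2 + (sigmoid(ty) - gy) ** 2
        total += (tw - gw) ** 2 + (th - gh) ** 2
    return total / len(raw_preds)


target = encode_target((6.5, 4.5, 3.5, 2.0), cell=(6, 4), prior=(3.5, 2.0))
print(target)                                               # (0.5, 0.5, 0.0, 0.0)
print(localization_loss([(0.0, 0.0, 0.0, 0.0)], [target]))  # 0.0
```

A ground-truth box centered in its cell with exactly the prior's size is matched by all-zero raw outputs, so the loss vanishes.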
