The YOLOv1 loss function

玄云飘风

Article link: https://blog.csdn.net/tfcy694/article/details/82961395

1. Network output

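The YOLOv1 network consists of a base model with 20 convolutional layers plus 4 newly added convolutional layers (followed, in the paper, by 2 fully connected layers). With a $7\times7$ grid and 2 bounding boxes per grid cell, the output is a $7\times7\times30$ tensor. Each 30-d vector contains: a 5-d prediction for bbox1, a 5-d prediction for bbox2, and the probabilities that the grid cell belongs to each of the 20 classes. The paper explains the bbox prediction 5-tuple $(x, y, w, h, \text{confidence})$ as follows: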

The $(x, y)$ coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image. Finally the confidence prediction represents the IOU between the predicted box and any ground truth box.

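To unpack this: $(x, y)$ is the position of the box center relative to the grid cell (so $x,y\in[0,1]$).
$w$ and $h$ are the width and height of the box as fractions of the input width and height (448), so $w,h\in[0,1]$.
These four elements are enough to recover a bbox on the original image:
Taking the top-left corner of the image as (0,0) and the top-left corner of the grid cell as $(x_0,y_0)$, the bbox has width and height $448\times w$ and $448\times h$,
and its top-left corner is at $x_0+\dfrac{448}{7}\times x-\dfrac{1}{2}\times448\times w,\ y_0+\dfrac{448}{7}\times y-\dfrac{1}{2}\times448\times h$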
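The recovery of a bbox from $(x, y, w, h)$ can be sketched in a few lines of Python. This is only an illustration of the formulas above: the function name `decode_box` and its `row`/`col` arguments (the 0-based index of the grid cell, from which $x_0, y_0$ are computed) are my own assumptions, not something from the paper.

```python
def decode_box(row, col, x, y, w, h, img_size=448, grid=7):
    """Map one predicted (x, y, w, h) back to pixel corners on the input image.

    row, col : 0-based index of the grid cell (hypothetical convention).
    x, y     : box center relative to the grid cell, each in [0, 1].
    w, h     : box width/height as a fraction of the whole image.
    """
    cell = img_size / grid               # side of one grid cell: 448 / 7 = 64 px
    x0, y0 = col * cell, row * cell      # top-left corner of the grid cell
    box_w, box_h = img_size * w, img_size * h
    left = x0 + cell * x - box_w / 2     # same expression as in the text
    top = y0 + cell * y - box_h / 2
    return left, top, left + box_w, top + box_h


# A box centered in the middle cell, a quarter of the image wide and half as tall.
print(decode_box(row=3, col=3, x=0.5, y=0.5, w=0.25, h=0.5))
# (168.0, 112.0, 280.0, 336.0)
```

Returning both corners instead of only the top-left one is just a convenience for drawing or for computing an IOU afterwards.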
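In $\text{confidence}=Pr(\text{Object})\times \text{IOU}$, as the authors intend it, $Pr(\text{Object})$ can only be 0 or 1, so the confidence is either 0, meaning the bbox contains no object, or nonzero, in which case it equals the IOU between the bbox and the ground truth.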
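To make the "either 0 or the IOU" reading concrete, here is a minimal sketch of how such a confidence value could be computed. The helper names `iou` and `confidence_target`, and the `(left, top, right, bottom)` box format, are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confidence_target(has_object, pred_box, gt_box):
    """Pr(Object) * IOU, with Pr(Object) restricted to 0 or 1."""
    return iou(pred_box, gt_box) if has_object else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IOU = 1 / (4 + 4 - 1) = 1/7, about 0.143.
print(confidence_target(True, (0, 0, 2, 2), (1, 1, 3, 3)))
# No object, so the confidence is 0 regardless of the boxes.
print(confidence_target(False, (0, 0, 2, 2), (1, 1, 3, 3)))
```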
The 20-d class probabilities work much like the base model's classification output, so I won't go over them again:

Each grid cell also predicts C conditional class probabilities, $Pr(Class_i|Object)$
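Putting this section together, the 30-d vector of one grid cell can be split as sketched below. The ordering used here (bbox1, then bbox2, then the 20 class probabilities) simply follows the description in this post; real implementations may lay out the vector differently, and `split_cell` is a hypothetical helper name.

```python
B, C = 2, 20  # boxes per grid cell and number of classes, as in the post


def split_cell(vec):
    """Split one cell's 30-d output into B box tuples and C class probabilities.

    Assumed layout (from the post, not verified against any implementation):
    [x, y, w, h, confidence] * B, followed by C conditional class probabilities.
    """
    if len(vec) != 5 * B + C:
        raise ValueError("expected a vector of length %d" % (5 * B + C))
    boxes = [tuple(vec[5 * j:5 * j + 5]) for j in range(B)]
    class_probs = list(vec[5 * B:])
    return boxes, class_probs


boxes, class_probs = split_cell(list(range(30)))
print(boxes)             # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
print(len(class_probs))  # 20
```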

2. Loss function

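$\begin{aligned} L&=\lambda_{coord}\sum_{i=1}^{S^2}\sum_{j=1}^{B}1_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\ &+\lambda_{coord}\sum_{i=1}^{S^2}\sum_{j=1}^{B}1_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right]\\ &+\sum_{i=1}^{S^2}\sum_{j=1}^{B}1_{ij}^{obj}(C_i-\hat{C}_i)^2\\ &+\lambda_{noobj}\sum_{i=1}^{S^2}\sum_{j=1}^{B}1_{ij}^{noobj}(C_i-\hat{C}_i)^2\\ &+\sum_{i=1}^{S^2}1_{i}^{obj}\sum_{c\in classes}(p_i(c)-\hat{p}_i(c))^2 \end{aligned}$
The last time I saw a formula this ugly was in an electromagnetics class, but luckily this one can be understood with a bit of effort.
First, let's pin down the notation: $\sum_{i=1}^{S^2}$ iterates over the grid cells, and $\sum_{j=1}^{B}$ iterates over the bboxes of each grid cell. $1_{ij}^{obj}$ selects, out of the $S^2\times B$ bboxes, the ones that are responsible for a ground-truth object: per the paper, in a grid cell that contains an object, this is the predictor whose box has the highest current IOU with the ground truth. The number of selected bboxes is therefore on the same order of magnitude as the number of objects in the input (not necessarily equal, because each grid cell is assigned at most one object, so objects whose centers fall in the same cell are not all matched). Everything that is not selected gets $1_{ij}^{noobj}$. $1_{i}^{obj}$ in the last term marks the grid cells that contain an object. $\lambda_{noobj}$ and $\lambda_{coord}$ are hyperparameters used to compensate for the imbalance. With this in mind, we can regroup the expression above:
$\begin{aligned} L&=\sum_{i=1}^{S^2}\sum_{j=1}^{B}1_{ij}^{obj}\left[\lambda_{coord}(x_i-\hat{x}_i)^2+\lambda_{coord}(y_i-\hat{y}_i)^2+\lambda_{coord}(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+\lambda_{coord}(\sqrt{h_i}-\sqrt{\hat{h}_i})^2+(C_i-\hat{C}_i)^2\right]\\ &+\sum_{i=1}^{S^2}\sum_{j=1}^{B}1_{ij}^{noobj}\left[\lambda_{noobj}(C_i-\hat{C}_i)^2\right]\\ &+\sum_{i=1}^{S^2}1_{i}^{obj}\sum_{c\in classes}(p_i(c)-\hat{p}_i(c))^2 \end{aligned}$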
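The authors introduce the two $\lambda$ hyperparameters because an input image contains very few objects, which leaves a large number of bboxes with no object. To offset this heavy imbalance, the paper sets the two values a factor of 10 apart ($\lambda_{coord}=5$, $\lambda_{noobj}=0.5$).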
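For readers who prefer code to summation signs, the regrouped loss can be written as a plain double loop. This is a minimal pure-Python sketch under my own assumptions, not the authors' implementation: the data layout (`pred`, `target`, `responsible`) and the function name `yolo_v1_loss` are hypothetical, the responsible predictor is taken as given rather than chosen by IOU, and predicted $w, h$ are assumed non-negative so that the square root is defined.

```python
import math

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5  # values from the paper


def yolo_v1_loss(pred, target, responsible):
    """Sum-of-squares YOLOv1 loss over a list of grid cells.

    pred[i]        : (boxes, class_probs), boxes = list of B (x, y, w, h, C) tuples.
    target[i]      : None if cell i has no object, else (box, class_probs)
                     with box = (x, y, w, h, C).
    responsible[i] : index j of the responsible predictor in cell i
                     (ignored for cells without an object).
    """
    loss = 0.0
    for i, (boxes, probs) in enumerate(pred):
        tgt = target[i]
        for j, (x, y, w, h, c) in enumerate(boxes):
            if tgt is not None and j == responsible[i]:      # 1_ij^obj
                (tx, ty, tw, th, tc), _ = tgt
                coord = (x - tx) ** 2 + (y - ty) ** 2
                coord += (math.sqrt(w) - math.sqrt(tw)) ** 2
                coord += (math.sqrt(h) - math.sqrt(th)) ** 2
                loss += LAMBDA_COORD * coord + (c - tc) ** 2
            else:                                             # 1_ij^noobj
                loss += LAMBDA_NOOBJ * c ** 2                 # target confidence is 0
        if tgt is not None:                                   # 1_i^obj
            _, tprobs = tgt
            loss += sum((p - tp) ** 2 for p, tp in zip(probs, tprobs))
    return loss


# Toy example: 2 cells, B = 2, and only 2 classes to keep it short.
pred = [
    ([(0.5, 0.5, 0.25, 0.25, 0.5), (0.0, 0.0, 0.0, 0.0, 0.2)], [1.0, 0.0]),
    ([(0.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0, 0.0)], [0.5, 0.5]),
]
target = [((0.5, 0.5, 0.25, 0.25, 1.0), [1.0, 0.0]), None]
responsible = [0, None]
print(round(yolo_v1_loss(pred, target, responsible), 6))  # 0.27
```

In the toy example the only contributions are the confidence error of the responsible box, $(0.5-1)^2=0.25$, and the down-weighted confidence of the other box in the same cell, $0.5\times0.2^2=0.02$. Class probabilities of the empty cell are ignored, which is exactly what $1_{i}^{obj}$ does in the formula.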