Some Notes on Understanding YOLOv3

玄云飘风

Categories: Paper Reading, CV. Tags: YOLOv3

Article link: https://blog.csdn.net/tfcy694/article/details/91049323

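I recently used YOLOv3 during an internship. The paper is written in a very free-wheeling style, and there are many different interpretations and implementations online, but most of them feel superficial to me. Here I record, one by one, the problems I ran into and the answers I found. Since my code is mainly based on Caffe, I will not focus on the Keras and PyTorch implementations; code from those two is used for reference only.
This post was not in its final form when published; I will keep refining it as my understanding improves.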

About mask

Quoted from https://blog.csdn.net/Julialove102123/article/details/79836975:
Every layer has to know about all of the anchor boxes but is only predicting some subset of them. This could probably be named something better but the mask tells the layer which of the bounding boxes it is responsible for predicting. The first yolo layer predicts 6,7,8 because those are the largest boxes and it’s at the coarsest scale. The 2nd yolo layer predicts some smallers ones, etc.

The layer assumes if it isn’t passed a mask that it is responsible for all the bounding boxes, hence the if statement thing.
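
The mask mechanism described in the quote can be sketched in a few lines of Python. The anchor values and mask indices are the ones listed in the stock `yolov3.cfg`. The helper name `anchors_for_layer` is hypothetical, and treating a missing mask as "all anchors" mirrors the behavior the quote describes rather than porting darknet's code verbatim.

```python
# All nine (width, height) anchors, as listed in the stock yolov3.cfg.
ANCHORS = [(10, 13), (16, 30), (33, 23),
           (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]


def anchors_for_layer(anchors, mask=None):
    """Return the anchors one yolo layer is responsible for (hypothetical helper)."""
    if mask is None:
        # No mask given: the layer is responsible for every anchor.
        return list(anchors)
    return [anchors[i] for i in mask]


coarse = anchors_for_layer(ANCHORS, mask=[6, 7, 8])  # first yolo layer, coarsest grid
medium = anchors_for_layer(ANCHORS, mask=[3, 4, 5])  # second yolo layer
fine = anchors_for_layer(ANCHORS, mask=[0, 1, 2])    # third yolo layer, finest grid
```

Every layer still receives the full anchor list; only the mask differs per layer, which is why the coarsest grid ends up with the three largest boxes.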

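About roi

Neither the paper nor most blogs say anything about roi. However, the code specifies an roi of size 200 for each pyramid level. I still need to ask someone more experienced about this.

Update: roi is the maximum number of objects each image may contain; it is mainly used for array allocation and memory management in C/C++.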
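
To illustrate why a fixed cap is convenient, here is a minimal Python sketch that packs a variable number of ground-truth boxes into a fixed-capacity buffer, roughly the way a C/C++ layer might preallocate an array. `MAX_BOXES = 200` follows the roi value mentioned above. The function name, the five-value box layout, and the choice to drop boxes beyond the cap are assumptions for illustration and are not taken from the Caffe code.

```python
MAX_BOXES = 200      # roi: assumed upper bound on objects per image
VALUES_PER_BOX = 5   # assumed layout: class_id, x, y, w, h


def pack_labels(boxes, max_boxes=MAX_BOXES):
    """Copy boxes into a flat, zero-padded buffer of fixed length."""
    buf = [0.0] * (max_boxes * VALUES_PER_BOX)
    kept = boxes[:max_boxes]  # boxes beyond the cap are dropped in this sketch
    for i, box in enumerate(kept):
        start = i * VALUES_PER_BOX
        buf[start:start + VALUES_PER_BOX] = [float(v) for v in box]
    return buf, len(kept)


labels, count = pack_labels([(0, 0.5, 0.5, 0.2, 0.3), (2, 0.1, 0.8, 0.05, 0.1)])
```

Because the buffer length never changes, it can be allocated once and reused for every image, at the cost of wasted space when an image has few objects.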

About the pyramid structure

(Figure missing from the source.)

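About the loss

There is some disagreement online about whether the loss is based on BCE loss or MSE loss; see:
https://stackoverflow.com/questions/55395205/what-is-the-loss-function-of-yolov3
Judging from the darknet code and similar Caffe code, the gradient is propagated in the form sigmoid - 1, that is, the sigmoid output minus the target, where the target is 1 for a positive sample. This is the form the gradient of BCE takes with respect to the logit.
https://github.com/AlexeyAB/darknet/issues/1695#issuecomment-426016524
https://github.com/AlexeyAB/darknet/issues/1845#issuecomment-434079752
For how the objness loss is computed, see:
https://stats.stackexchange.com/questions/373266/yolo-v3-loss-function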
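
As a sanity check on the "sigmoid - 1" reading, the sketch below compares the analytic gradient of BCE with respect to the logit, `sigmoid(x) - t`, against a finite-difference estimate. This is plain calculus, not code from darknet or Caffe, and all function names are hypothetical. As far as I can tell, darknet stores the opposite sign (target minus prediction) in its delta buffer, so sign conventions may differ between implementations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logit(x, t):
    """Binary cross-entropy of sigmoid(x) against target t in {0, 1}."""
    p = sigmoid(x)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def bce_grad_wrt_logit(x, t):
    """Analytic gradient d(BCE)/dx, which simplifies to sigmoid(x) - t."""
    return sigmoid(x) - t


def numeric_grad(x, t, eps=1e-6):
    """Central finite difference, used only to check the analytic form."""
    return (bce_with_logit(x + eps, t) - bce_with_logit(x - eps, t)) / (2 * eps)


x = 0.7
positive_grad = bce_grad_wrt_logit(x, 1.0)   # sigmoid(x) - 1 for a positive sample
negative_grad = bce_grad_wrt_logit(x, 0.0)   # sigmoid(x) for a negative sample
```

The same expression would look like an MSE-style residual if read without the sigmoid in mind, which may be one source of the BCE versus MSE confusion.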

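About the prediction output

The output for each anchor mainly consists of x, y, h, w, objness, and classes; for details, see:
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/
The author of that post implemented YOLOv3 in PyTorch and wrote a blog series explaining the inference process in detail. Unfortunately it does not cover training (in particular the loss).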
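
The per-anchor output can be decoded with the box equations from the YOLOv3 paper: b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h). The sketch below is a simplified illustration in plain Python. The function name, the ordering of the raw vector (tx, ty, tw, th, objness, classes, which follows darknet's layout as I understand it), and the example numbers are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_anchor(raw, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one anchor's raw vector [tx, ty, tw, th, obj, cls...] (assumed order).

    Returns the box center and size in input-image pixels, the objectness
    score, and the independent per-class sigmoid scores.
    """
    tx, ty, tw, th, tobj = raw[:5]
    bx = (sigmoid(tx) + cell_x) * stride   # center x: offset within the cell
    by = (sigmoid(ty) + cell_y) * stride   # center y
    bw = anchor_w * math.exp(tw)           # width scales the anchor prior
    bh = anchor_h * math.exp(th)           # height scales the anchor prior
    objness = sigmoid(tobj)
    class_scores = [sigmoid(c) for c in raw[5:]]
    return (bx, by, bw, bh), objness, class_scores


# Example: cell (6, 6) on the stride-32 grid, anchor 116x90, two classes.
box, objness, scores = decode_anchor(
    [0.0, 0.0, 0.0, 0.0, 2.0, 1.0, -1.0],
    cell_x=6, cell_y=6, anchor_w=116, anchor_h=90, stride=32)
```

With 3 anchors per layer and the 80 COCO classes, each grid cell carries 3 * (5 + 80) = 255 values, which matches the channel count of the final convolution before each yolo layer in the stock configuration.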

some refs

https://blog.csdn.net/weixin_42078618/article/details/85005428
https://blog.csdn.net/weixin_42078618/article/details/87787919
