[Deep Learning] Implementing YOLOv3 from Scratch in PyTorch

Hanawh

Article link: https://blog.csdn.net/qq_36530992/article/details/102805677

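This tutorial explains in detail how to implement the YOLOv3 object detection model from scratch in PyTorch. It covers the YOLOv3 network architecture, the forward pass, score thresholding, non-maximum suppression, the design of the input/output pipeline, and training. It describes how YOLOv3 works, including how predicted boxes are computed, the objectness probability and the class confidences, and discusses predictions at different scales and how they are applied on the feature maps.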

Tutorial

About YOLOv3

Bounding boxes and cells

You expect each cell of the feature map to predict an object through one of its bounding boxes if the center of the object falls in the receptive field of that cell. We divide the input image into a grid just to determine which cell of the prediction feature map is responsible for prediction.

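First, the input image is divided into a grid of the same size as the feature map; each grid square is a cell.
The cell that contains the center of an object's ground-truth (gt) box is responsible for predicting that object.
The point on the feature map that corresponds to this cell is assigned as the unit responsible for detecting the object.
A cell can predict three boxes, but only one of them is assigned the gt label: the one with the highest IoU with the gt box.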
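
The assignment rule above can be sketched in a few lines of plain Python. This is only an illustration, not the tutorial's code: the function names `responsible_cell`, `wh_iou` and `best_anchor` are hypothetical, boxes are given by their center and size in pixels, and the three anchor sizes are sample values taken from the standard yolov3.cfg. The sketch also assumes that the IoU is computed on widths and heights only, as if the anchor and the gt box shared the same center.

```python
def responsible_cell(cx, cy, stride):
    """Return (col, row) of the grid cell that contains the box center (cx, cy)."""
    return int(cx // stride), int(cy // stride)


def wh_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center, so only their sizes matter."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def best_anchor(gt_w, gt_h, anchors):
    """Index of the anchor whose shape has the highest IoU with the gt box."""
    ious = [wh_iou(gt_w, gt_h, aw, ah) for aw, ah in anchors]
    return max(range(len(anchors)), key=lambda i: ious[i])


# Sample values (assumptions): 416x416 input, stride 32, i.e. the 13x13 grid.
anchors = [(116, 90), (156, 198), (373, 326)]
cell = responsible_cell(cx=200.0, cy=120.0, stride=32)
anchor_idx = best_anchor(gt_w=150.0, gt_h=180.0, anchors=anchors)
```

With these sample numbers the object center falls in column 6, row 3 of the grid, and the second anchor (156 x 198) is the closest match for a 150 x 180 box.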

Network output and bounding boxes

It might make sense to predict the width and the height of the bounding box, but in practice, that leads to unstable gradients during training. Instead, most of the modern object detectors predict log-space transforms, or simply offsets to pre-defined default bounding boxes called anchors.

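The following formulas convert the network outputs into the predicted box. The values YOLO predicts are offsets relative to the top-left corner of the cell responsible for the object, normalized by the feature map dimensions.

$$b_x = \sigma(t_x) + c_x,\qquad b_y = \sigma(t_y) + c_y,\qquad b_w = p_w e^{t_w},\qquad b_h = p_h e^{t_h}$$

Here $t_x, t_y, t_w, t_h$ are the network outputs, $(c_x, c_y)$ is the top-left corner of the cell, and $p_w, p_h$ are the width and height of the anchor.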
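
A minimal sketch of this conversion for a single box, assuming the usual YOLOv3 parameterisation (a sigmoid on the center offsets and an exponential on the size relative to the anchor). The function name `decode_box`, its argument names and the sample numbers are assumptions made for illustration; the result is scaled by the stride so that it is expressed in input-image pixels.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Convert raw network outputs into (center x, center y, width, height) in pixels.

    cell_x, cell_y     : column and row of the responsible cell on the feature map
    anchor_w, anchor_h : anchor size in input-image pixels
    stride             : input size divided by feature-map size (e.g. 416 / 13 = 32)
    """
    # The sigmoid keeps the predicted center inside the responsible cell.
    bx = (sigmoid(tx) + cell_x) * stride
    by = (sigmoid(ty) + cell_y) * stride
    # Width and height rescale the anchor in log space.
    bw = anchor_w * math.exp(tw)
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh


# All-zero outputs put the center in the middle of the cell and keep the anchor size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cell_x=6, cell_y=3,
                 anchor_w=156, anchor_h=198, stride=32)
```

A full implementation would apply the same arithmetic to whole tensors at once rather than to one box at a time.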

The objectness score represents the probability that an object lies inside the predicted box; it is passed through a sigmoid function.

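Class Confidences
Before v3, the class was predicted with a softmax function. v3 applies a sigmoid function to each class score instead, so the classes are no longer assumed to be mutually exclusive.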
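
The difference between the two choices can be seen on a toy example. The logits and class names below are made up for illustration: softmax forces the scores to sum to 1, so two overlapping labels compete with each other, while per-class sigmoids score each label independently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw class scores for "person", "woman", "car".
logits = [2.0, 1.5, -3.0]
softmax_scores = softmax(logits)               # sums to 1, labels compete
sigmoid_scores = [sigmoid(v) for v in logits]  # each label scored on its own
```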

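Predictions at different scales
Detection is performed on feature maps at three scales: $13\times 13$, $26\times 26$ and $52\times 52$. The feature maps are upsampled step by step between the detection layers, which helps detect small objects.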

The network downsamples the input image until the first detection layer, where a detection is made using feature maps of a layer with stride 32. Further, layers are upsampled by a factor of 2 and concatenated with feature maps of previous layers having identical feature map sizes. Another detection is now made at the layer with stride 16. The same upsampling procedure is repeated, and a final detection is made at the layer of stride 8.
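
The relation between stride, grid size and the number of predicted boxes can be checked with a short calculation. The helper names `grid_sizes` and `total_boxes` are hypothetical; the sketch assumes a square input whose side is divisible by 32 and three boxes per cell at every scale.

```python
def grid_sizes(input_size, strides=(32, 16, 8)):
    """Side length of the feature map at each detection scale."""
    return [input_size // s for s in strides]


def total_boxes(input_size, boxes_per_cell=3, strides=(32, 16, 8)):
    """Number of boxes predicted over all scales."""
    return sum(g * g for g in grid_sizes(input_size, strides)) * boxes_per_cell


sizes = grid_sizes(416)     # grid side for strides 32, 16 and 8
n_boxes = total_boxes(416)  # boxes over all three scales
```

For a 416 x 416 input this gives grids of 13, 26 and 52 cells per side and 10647 boxes in total.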

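Output processing
For a $416\times416$ input image there are $((13\times 13)+(26\times 26)+(52\times 52))\times3 = 10647$ predicted boxes in total, yet an image usually contains only a few objects. The unnecessary boxes are discarded in two steps:

First, boxes whose objectness score does not pass a threshold are discarded.
Then non-maximum suppression (NMS) removes further boxes.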
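
The two filtering steps can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: boxes are given as corner coordinates `(x1, y1, x2, y2)` with an objectness score, the threshold values 0.5 and 0.4 are arbitrary, the names `iou` and `filter_detections` are hypothetical, and NMS is run over all boxes together, whereas detectors usually run it separately for each class.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def filter_detections(dets, conf_thresh=0.5, nms_thresh=0.4):
    """dets: list of (x1, y1, x2, y2, objectness). Returns the detections kept."""
    # Step 1: drop boxes whose objectness score is below the threshold.
    candidates = [d for d in dets if d[4] >= conf_thresh]
    # Step 2: greedy NMS, visiting boxes from the highest score down and
    # keeping a box only if it does not overlap too much with a kept one.
    candidates.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for d in candidates:
        if all(iou(d[:4], k[:4]) < nms_thresh for k in kept):
            kept.append(d)
    return kept


dets = [
    (10, 10, 110, 110, 0.9),
    (12, 12, 112, 112, 0.8),     # overlaps heavily with the first box
    (200, 200, 300, 300, 0.7),
    (50, 50, 60, 60, 0.1),       # below the objectness threshold
]
kept = filter_detections(dets)
```

Here the fourth box is removed by the threshold and the second by NMS, leaving the first and third boxes.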

Building the network architecture

Download the cfg file:

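The figure above shows the module diagram of YOLOv3. The first convolutional block shown in the figure has filters set to 255 because $3\times(80+5)=255$: 3 boxes per cell, each with 80 class scores plus 4 box coordinates and 1 objectness score. These modules do not make up the plain Darknet-53 backbone; they form the whole detection framework, which combines feature maps at three scales from Darknet-53 with an FPN-style structure.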
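
The same arithmetic gives the filters value for other datasets. The helper name `detection_filters` is hypothetical; it assumes every box carries 4 coordinates, 1 objectness score and one score per class.

```python
def detection_filters(num_classes, boxes_per_cell=3):
    """Output channels of the convolution that feeds a detection layer."""
    # Per box: 4 coordinates + 1 objectness score + one score per class.
    return boxes_per_cell * (num_classes + 5)


coco_filters = detection_filters(80)  # 80 classes, as in the cfg discussed here
```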

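route
When layers has a single value, for example -4, the layer outputs the feature map of the layer 4 positions before the route layer. When it has two values, the outputs of the two referenced layers are concatenated along the channel dimension.
yolo
There are 9 anchors in total; mask specifies which of them are used on this feature map.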
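
How these two cfg entries might be interpreted is sketched below. The function names `route_sources` and `select_anchors` are hypothetical, the cfg values are assumed to arrive as raw comma-separated strings, and the layer indices in the example are sample values only.

```python
def route_sources(layers_value, index):
    """Resolve the `layers` entry of a route block to absolute layer indices.

    layers_value : raw string from the cfg, e.g. "-4" or "-1, 61"
    index        : position of the route layer itself in the network
    """
    refs = [int(v) for v in layers_value.split(",")]
    # Negative values count back from the route layer; others are absolute.
    return [index + r if r < 0 else r for r in refs]


def select_anchors(anchors_value, mask_value):
    """Return the (width, height) anchors that a yolo block uses."""
    nums = [int(v) for v in anchors_value.split(",")]
    pairs = list(zip(nums[0::2], nums[1::2]))
    return [pairs[int(i)] for i in mask_value.split(",")]


anchors_value = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
single = route_sources("-4", index=83)      # one source layer
double = route_sources("-1, 61", index=86)  # two layers to concatenate
used = select_anchors(anchors_value, "6,7,8")
```

With two sources, the forward pass would concatenate the two stored feature maps along the channel dimension, for example with `torch.cat(..., dim=1)`.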

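Building the create_modules function

When building the route and shortcut layers, an empty module can be used as a placeholder; its behavior is filled in during the forward pass.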
```
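import torch.nn as nn

class EmptyLayer(nn.Module):
    """Placeholder for route/shortcut layers; the network's forward pass does the work."""
    def __init__(self):
        super(EmptyLayer, self).__init__()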
```
Where appropriate, try-except can be used in place of if-else.
The padding of a convolutional block is (kernel_size-1)//2.
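
Both tips can be seen together in a sketch of how one convolutional block might be built. This is not the tutorial's create_modules code: the function name `make_conv_block` is hypothetical, and `block` is assumed to be a dict of strings as a simple cfg parser would produce, with the keys `batch_normalize`, `filters`, `size`, `stride`, `pad` and `activation`.

```python
import torch
import torch.nn as nn


def make_conv_block(block, in_channels):
    """Build Conv2d (+ BatchNorm2d) (+ LeakyReLU) from one parsed cfg block."""
    # try-except instead of if-else: the key is simply absent in blocks
    # that do not use batch normalization.
    try:
        use_bn = int(block["batch_normalize"]) == 1
    except KeyError:
        use_bn = False

    filters = int(block["filters"])
    kernel_size = int(block["size"])
    stride = int(block["stride"])
    # Padding that preserves the spatial size for odd kernels at stride 1.
    padding = (kernel_size - 1) // 2 if int(block["pad"]) else 0

    # The convolution needs no bias when batch norm follows it.
    layers = [nn.Conv2d(in_channels, filters, kernel_size, stride, padding,
                        bias=not use_bn)]
    if use_bn:
        layers.append(nn.BatchNorm2d(filters))
    if block["activation"] == "leaky":
        layers.append(nn.LeakyReLU(0.1, inplace=True))
    return nn.Sequential(*layers)


block = {"batch_normalize": "1", "filters": "32", "size": "3",
         "stride": "1", "pad": "1", "activation": "leaky"}
conv = make_conv_block(block, in_channels=3)
out = conv(torch.zeros(1, 3, 64, 64))  # spatial size is preserved
```

A block without the `batch_normalize` key and with a linear activation, such as the one feeding a detection layer, would come out of the same function as a single Conv2d with a bias term.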