YOLO v3 Implementation, Part 2

Latest recommended article published on 2024-01-28 23:54:43

不二青衣

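This is Part 2 of the tutorial on implementing a YOLO v3 detector from scratch. In the last part, I explained how YOLO works, and in this part, we are going to implement the layers used by YOLO in PyTorch. In other words, this is the part where we create the building blocks of our model.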

The code for this tutorial is designed to run on Python 3.5 and PyTorch 0.4. It can be found in its entirety at this GitHub repo.

This tutorial is broken into 5 parts:

Part 1 : Understanding How YOLO works
Part 2 (This one): Creating the layers of the network architecture
Part 3 : Implementing the forward pass of the network
Part 4 : Objectness Confidence Thresholding and Non-maximum Suppression
Part 5 : Designing the input and the output pipelines

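Prerequisites

Part 1 of the tutorial, or knowledge of how YOLO works.
Basic working knowledge of PyTorch, including how to create custom architectures with the nn.Module, nn.Sequential and torch.nn.parameter classes.

I assume you have had some experience with PyTorch before. If you're just starting out, I'd recommend playing around with the framework a bit before returning to this post.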

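Getting Started

First, create a directory where the code for the detector will live.

Then, create a file darknet.py. Darknet is the name of the underlying architecture of YOLO. This file will contain the code that creates the YOLO network. We will supplement it with a file called util.py, which will contain the code for various helper functions. Save both of these files in your detector folder. You can use git to keep track of the changes.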

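Configuration File

The official code (authored in C) uses a configuration file to build the network. The cfg file describes the layout of the network, block by block. If you're coming from a Caffe background, it's equivalent to the .prototxt file used to describe the network.

We will use the official cfg file, released by the author, to build our network. Download it from the URL in the command below and place it in a folder called cfg inside your detector directory. If you're on Linux, cd into your network directory and type: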

mkdir cfg
cd cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg

If you open the configuration file, you will see something like this:

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

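We see 4 blocks above. Of these, 3 describe convolutional layers, followed by a shortcut layer. A shortcut layer is a skip connection, like the one used in ResNet. There are 5 types of layers used in YOLO: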

Convolutional

[convolutional]
batch_normalize=1  
filters=64  
size=3  
stride=1  
pad=1  
activation=leaky

Shortcut

[shortcut]
from=-3  
activation=linear

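The shortcut layer is a skip connection, akin to the one used in ResNet. The from parameter is -3, which means the output of the shortcut layer is obtained by adding the feature maps of the previous layer and of the 3rd layer backwards from the shortcut layer.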
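The index arithmetic behind from=-3 can be sketched without PyTorch. The snippet below is only an illustration under assumptions: outputs is a hypothetical list holding one toy feature map per layer (plain lists of numbers rather than tensors), and apply_shortcut is a made-up helper name, not part of the tutorial's code.

```python
def apply_shortcut(outputs, index, from_offset):
    """Add the previous layer's map to the map `from_offset` layers back.

    `outputs[i]` is the toy feature map produced by layer i (an assumption
    for this sketch); `index` is the position of the shortcut layer itself.
    """
    previous = outputs[index - 1]
    skipped = outputs[index + from_offset]
    return [a + b for a, b in zip(previous, skipped)]


# Toy outputs of layers 0..2; the shortcut sits at index 3 with from=-3.
outputs = [[1, 2], [10, 20], [100, 200]]
result = apply_shortcut(outputs, index=3, from_offset=-3)
print(result)  # [101, 202]
```

Both maps must have the same shape for the addition to be valid, which is why a shortcut never changes the depth of the feature map.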

Upsample

[upsample]
stride=2

Upsamples the feature map of the previous layer by a factor of stride (here 2) using bilinear upsampling.

Route

[route]
layers = -4

[route]
layers = -1, 61

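The route layer deserves a bit of explanation. It has an attribute layers, which can have either one or two values.

When the layers attribute has only one value, the layer outputs the feature maps of the layer indexed by that value. In our example it is -4, so the layer will output the feature map from the 4th layer backwards from the route layer.

When layers has two values, the layer returns the concatenated feature maps of the layers indexed by those values. In our example it is -1, 61, so the layer will output the feature maps from the previous layer (-1) and the 61st layer, concatenated along the depth dimension.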
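To make the mix of relative and absolute indices concrete, here is a small sketch of how a layers value could be resolved into absolute layer indices. The helper name resolve_route is hypothetical, and the route positions 83 and 86 are assumed purely for illustration.

```python
def resolve_route(layers_value, index):
    """Turn a cfg `layers` string into absolute layer indices.

    Negative entries are relative to the route layer at `index`;
    non-negative entries are already absolute.
    """
    resolved = []
    for token in layers_value.split(","):
        offset = int(token)  # int() tolerates surrounding spaces
        resolved.append(index + offset if offset < 0 else offset)
    return resolved


print(resolve_route("-4", 83))      # [79]
print(resolve_route("-1, 61", 86))  # [85, 61]
```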

YOLO

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

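The YOLO layer corresponds to the detection layer described in Part 1. The anchors attribute describes 9 anchors, but only the anchors indexed by the mask attribute are used. Here, the value of mask is 0,1,2, which means the first, second and third anchors are used. This makes sense, since each cell of the detection layer predicts 3 boxes. In total, we have detection layers at 3 scales, making up a total of 9 anchors.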
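As a quick illustration of how mask picks anchors, the sketch below pairs up the flat anchors string and keeps only the masked entries. The helper name select_anchors is hypothetical; the anchor string is copied from the block above.

```python
def select_anchors(anchors_value, mask_value):
    """Pair up the flat anchor list and keep only the masked pairs."""
    flat = [int(v) for v in anchors_value.split(",")]
    pairs = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    return [pairs[int(m)] for m in mask_value.split(",")]


anchors_value = ("10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  "
                 "116,90,  156,198,  373,326")
print(select_anchors(anchors_value, "0,1,2"))
# [(10, 13), (16, 30), (33, 23)]
```

A detection layer with mask = 6,7,8 would pick the three largest anchors from the same list in the same way.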

Net

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width= 320
height = 320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

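There's another type of block called net in the cfg, but I wouldn't call it a layer, as it only describes information about the network input and training parameters. It isn't used in the forward pass of YOLO. However, it does provide us with information such as the network input size, which we use to adjust the anchors in the forward pass.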

Parsing the configuration file

Before we begin, add the necessary imports at the top of darknet.py.

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np

We define a function called parse_cfg, which takes the path of the configuration file as input.

def parse_cfg(cfgfile):
    """
    Takes a configuration file
    
    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list
    
    """

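The idea here is to parse the cfg and store every block as a dict. The attributes of a block and their values are stored as key-value pairs in the dictionary. As we parse through the cfg, we keep appending these dicts, denoted by the variable block in our code, to a list blocks. Our function will return this list, blocks.

We begin by saving the content of the cfg file in a list of strings. The following code performs some preprocessing on this list.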

with open(cfgfile, 'r') as file:                       # close the file when done
    lines = file.read().split('\n')                    # store the lines in a list
lines = [x.strip() for x in lines]                     # get rid of fringe whitespace first
lines = [x for x in lines if len(x) > 0]               # get rid of the empty lines
lines = [x for x in lines if x[0] != '#']              # get rid of comments

Then, we loop over the resulting list to get the blocks.

block = {}
blocks = []

for line in lines:
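    if line[0] == "[":               # This marks the start of a new block
        if len(block) != 0:          # If block is not empty, it holds the values of the previous block
            blocks.append(block)     # Add it to the blocks list
            block = {}               # Re-init the block
        block["type"] = line[1:-1].rstrip()     
    else:
        key,value = line.split("=", 1)          # Split on the first "=" only
        block[key.rstrip()] = value.lstrip()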
blocks.append(block)

return blocks
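To try the parsing logic without downloading anything, the pieces above can be assembled into one function that works on a string instead of a file path. This is a minimal sketch under that assumption: parse_cfg_text and the sample text are hypothetical stand-ins, not the tutorial's parse_cfg itself.

```python
def parse_cfg_text(text):
    """Parse cfg text into a list of dicts, one per [block]."""
    blocks = []
    block = {}
    for raw in text.split("\n"):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            if block:  # a previous block is complete, store it
                blocks.append(block)
                block = {}
            block["type"] = line[1:-1].strip()
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)
    return blocks


sample = """
[net]
# Testing
width= 320

[convolutional]
filters=64
activation=leaky

[shortcut]
from=-3
"""
blocks = parse_cfg_text(sample)
print(blocks[0])    # {'type': 'net', 'width': '320'}
print(len(blocks))  # 3
```

Note that every value stays a string at this stage; the code that builds the modules is responsible for converting them with int() where needed.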

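Creating the building blocks

Now we are going to use the list returned by parse_cfg above to construct PyTorch modules for the blocks present in the config file.

We have 5 types of layers in the list (mentioned above). PyTorch provides pre-built layers for the types convolutional and upsample. We will have to write our own modules for the rest of the layers by extending the nn.Module class.

The create_modules function takes the list blocks returned by the parse_cfg function.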

def create_modules(blocks):
    net_info = blocks[0]     #Captures the information about the input and pre-processing
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

Before we iterate over the list of blocks, we define a variable net_info to store information about the network.

nn.ModuleList

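Our function will return an nn.ModuleList. This class is almost like a normal list containing nn.Module objects. However, when we add an nn.ModuleList as a member of an nn.Module object (i.e. when we add modules to our network), all the parameters of the nn.Module objects inside the nn.ModuleList are added as parameters of that nn.Module object as well (i.e. of our network, to which we are adding the nn.ModuleList as a member).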

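When we define a new convolutional layer, we must define the dimension of its kernel. While the height and width of the kernel are provided by the cfg file, the depth of the kernel is precisely the number of filters (or the depth of the feature map) present in the previous layer. This means we need to keep track of the number of filters in the layer on which the convolutional layer is being applied. We use the variable prev_filters to do this. We initialise it to 3, as the image has 3 channels corresponding to the RGB channels.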
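The relationship between prev_filters and the kernel depth can be checked with plain arithmetic. The sketch below assumes the usual (out channels, in channels, height, width) layout for 2-D convolution weights; the helper names and the filter counts are illustrative, not taken from the cfg.

```python
def conv_weight_shape(prev_filters, filters, kernel_size):
    """Shape of a 2-D convolution's weights: (out, in, height, width)."""
    return (filters, prev_filters, kernel_size, kernel_size)


def count_weights(shape):
    """Multiply the dimensions to get the number of weights."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# First layer: RGB input (3 channels), 32 filters of size 3x3.
first = conv_weight_shape(prev_filters=3, filters=32, kernel_size=3)
print(first, count_weights(first))    # (32, 3, 3, 3) 864

# The next layer sees 32 input channels, so its kernels are 32 deep.
second = conv_weight_shape(prev_filters=32, filters=64, kernel_size=3)
print(second, count_weights(second))  # (64, 32, 3, 3) 18432
```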

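The route layer brings (possibly concatenated) feature maps from previous layers. If a convolutional layer comes right after a route layer, its kernel is applied to the feature maps of previous layers, precisely the ones the route layer brings. Therefore, we need to keep track of the number of filters not only in the previous layer, but in each of the preceding layers. As we iterate, we append the number of output filters of each block to the list output_filters.

Now, the idea is to iterate over the list of blocks, and create a PyTorch module for each block as we go.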

    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

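The nn.Sequential class is used to sequentially execute a number of nn.Module objects. If you look at the cfg, you will realize that a block may contain more than one layer. For example, a block of type convolutional has a batch norm layer as well as a leaky ReLU activation layer in addition to a convolutional layer. We string these layers together using nn.Sequential and its add_module function. For example, this is how we create the convolutional and the upsample layers.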

        if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except KeyError:         #no batch_normalize key in this block
                batch_normalize = 0
                bias = True

            filters= int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)

        #If it's an upsampling layer
        #We use Bilinear2dUpsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = stride, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)
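The padding rule used for the convolutional block can be sanity-checked with the standard output-size formula for convolutions. This is a small arithmetic sketch; the helper names are hypothetical, and the input size of 320 is taken from the net block shown earlier.

```python
def conv_padding(kernel_size, pad_flag):
    """Padding used by the tutorial: (k - 1) // 2 when pad is set, else 0."""
    return (kernel_size - 1) // 2 if pad_flag else 0


def conv_output_size(size, kernel_size, stride, padding):
    """Standard convolution output length along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


for kernel_size, stride in [(3, 1), (1, 1), (3, 2)]:
    padding = conv_padding(kernel_size, pad_flag=1)
    out = conv_output_size(320, kernel_size, stride, padding)
    print(kernel_size, stride, padding, out)
# 3 1 1 320
# 1 1 0 320
# 3 2 1 160
```

With this rule, pad=1 on a 1x1 kernel yields zero padding, so stride-1 layers preserve the spatial size and stride-2 layers halve it.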

Route Layer / Shortcut Layers

Next, we write the code for creating the Route and Shortcut layers.

        #If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start  of a route
            start = int(x["layers"][0])
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except IndexError:
                end = 0
            #Positive annotation: convert absolute indices to relative offsets
            if start > 0: 
                start = start - index
            if end > 0:
                end = end - index
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters= output_filters[index + start]

        #shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

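The code for creating the route layer deserves a fair bit of explanation. First, we extract the value of the layers attribute, split it into a list and cast its entries to integers.

Then we have a new layer called EmptyLayer which, as the name suggests, is just an empty layer.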

route = EmptyLayer()

It's defined as:

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

Wait, an empty layer?

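Now, an empty layer might seem weird given that it does nothing. The route layer, just like any other layer, performs an operation (bringing forward a previous layer / concatenation). In PyTorch, when we define a new layer, we subclass nn.Module and write the operation the layer performs in the forward function of the nn.Module object.

To design a layer for the route block, we would have to build an nn.Module object that is initialized with the values of the layers attribute as its member(s). Then, we could write the code to concatenate/bring forward the feature maps in its forward function. Finally, we would execute this layer in the forward function of our network.

But given that the code for concatenation is fairly short and simple (calling torch.cat on feature maps), designing a layer as above would lead to unnecessary abstraction. Instead, what we can do is put a dummy layer in place of the proposed route layer, and then perform the concatenation directly in the forward function of the nn.Module object representing darknet. (If the last line doesn't make a lot of sense to you, I suggest you read up on how the nn.Module class is used in PyTorch.)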
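The design choice, a placeholder in the module list plus the real work in the network's forward loop, can be sketched without any framework. Everything below is a toy under stated assumptions: feature maps are plain lists with one number per channel, list concatenation stands in for torch.cat along the depth dimension, and the "scale" block type and the run_blocks helper are invented for this illustration.

```python
def run_blocks(blocks, x):
    """Run toy blocks in order, caching every output for later lookups."""
    outputs = []
    for index, block in enumerate(blocks):
        kind = block["type"]
        if kind == "scale":
            # Stand-in for a real layer such as a convolution.
            x = [v * block["factor"] for v in x]
        elif kind == "route":
            # The placeholder layer does nothing; the concatenation lives here.
            picked = [outputs[index + offset] for offset in block["layers"]]
            x = [v for fmap in picked for v in fmap]
        outputs.append(x)
    return x


blocks = [
    {"type": "scale", "factor": 2},       # layer 0 -> [2, 4]
    {"type": "scale", "factor": 3},       # layer 1 -> [6, 12]
    {"type": "route", "layers": [-1, -2]},  # layer 2 -> layer 1 ++ layer 0
]
print(run_blocks(blocks, [1, 2]))  # [6, 12, 2, 4]
```

The cache of per-layer outputs is what makes the relative offsets resolvable, so the loop has to store every output, not just the latest one.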

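The convolutional layer just after a route layer applies its kernel to (possibly concatenated) feature maps from previous layers. The following code updates the filters variable to hold the number of filters output by the route layer.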

if end < 0:
    #If we are concatenating maps
    filters = output_filters[index + start] + output_filters[index + end]
else:
    filters= output_filters[index + start]

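The shortcut layer also makes use of an empty layer, for it also performs a very simple operation (addition). There is no need to update the filters variable, as it merely adds the feature maps of an earlier layer to those of the layer just before it.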
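The filter bookkeeping for convolutional, shortcut and route blocks can be traced on a toy block list. The sketch below is a simplified stand-in for the loop in create_modules, not a copy of it: track_filters is a hypothetical name, and the blocks and their filter counts are made up for the example.

```python
def track_filters(blocks, input_channels=3):
    """Return the output channel count of every block, in order."""
    output_filters = []
    prev_filters = input_channels
    for index, block in enumerate(blocks):
        filters = prev_filters  # shortcut and similar blocks keep the depth
        if block["type"] == "convolutional":
            filters = int(block["filters"])
        elif block["type"] == "route":
            layers = [int(v) for v in block["layers"].split(",")]
            # Negative values are relative to this block, others absolute.
            absolute = [index + v if v < 0 else v for v in layers]
            filters = sum(output_filters[a] for a in absolute)
        output_filters.append(filters)
        prev_filters = filters
    return output_filters


blocks = [
    {"type": "convolutional", "filters": "32"},  # 0 -> 32
    {"type": "convolutional", "filters": "64"},  # 1 -> 64
    {"type": "convolutional", "filters": "32"},  # 2 -> 32
    {"type": "convolutional", "filters": "64"},  # 3 -> 64
    {"type": "shortcut", "from": "-3"},          # 4 -> 64 (unchanged)
    {"type": "route", "layers": "-1, 0"},        # 5 -> 64 + 32 = 96
    {"type": "route", "layers": "-4"},           # 6 -> block 2 -> 32
]
print(track_filters(blocks))  # [32, 64, 32, 64, 64, 96, 32]
```

A convolutional block placed after block 5 would therefore need kernels that are 96 channels deep.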

YOLO Layer

Finally, we write the code for creating the YOLO layer.

        #Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(m) for m in mask]    #m, so the block variable x is not reused

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

We define a new layer DetectionLayer that holds the anchors used to detect bounding boxes.

The detection layer is defined as:

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

At the end of the loop, we do some bookkeeping.

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

That concludes the body of the loop. At the end of the function create_modules, we return a tuple containing net_info and module_list.

return (net_info, module_list)

Testing the code

You can test your code by typing the following lines at the end of darknet.py and running the file.

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))

You will see a long list (107 entries, indexed 0 through 106, for the official yolov3.cfg), the elements of which will look like this:

.
.

  (9): Sequential(
     (conv_9): Conv2d (128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
     (batch_norm_9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
     (leaky_9): LeakyReLU(0.1, inplace)
   )
   (10): Sequential(
     (conv_10): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (batch_norm_10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
     (leaky_10): LeakyReLU(0.1, inplace)
   )
   (11): Sequential(
     (shortcut_11): EmptyLayer(
     )
   )
.
.
.

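That's it for this part. In the next part, we will assemble the building blocks we have created to produce output from an image.

Further Reading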