Implementing YOLOv3 from Scratch: Part 3

ManiacLook

Tags: deep learning, computer vision, object detection

Article link: https://blog.csdn.net/ManiacLook/article/details/121473707

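This article is adapted from an English original, translated and partially modified (link to the original article).
Code: GitHub repository, ACgit repository
Related posts:
YOLOv3 paper translation
A brief overview of how YOLOv3 works
Implementing YOLOv3 from Scratch: Part 1
Implementing YOLOv3 from Scratch: Part 2
Implementing YOLOv3 from Scratch: Part 4
Implementing YOLOv3 from Scratch: Part 5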

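Part 3: Implementing the forward pass of the network

This is Part 3 of the tutorial on implementing a YOLO v3 detector from scratch. In the previous part, we implemented the layers used in the YOLO architecture. In this part, we implement the network architecture of YOLO in PyTorch so that it can produce an output for a given image.

Our objective is to design the forward pass of the network.

Defining the network

As pointed out earlier, we use the nn.Module class to build custom architectures in PyTorch. Let us define a network for the detector. In the darknet.py file, we add the following class: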

class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)

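Here, we subclass nn.Module and name our class Darknet. We initialize the network with three members: blocks, net_info, and module_list.

Implementing the forward pass of the network

The forward pass of the network is implemented by overriding the forward method of the nn.Module class.

forward serves two purposes. First, it computes the output. Second, it transforms the output detection feature maps so that they are easier to process (for example, reshaping them so that detection maps from multiple scales can be concatenated, which is otherwise impossible because their dimensions differ).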

    def forward(self, x, CUDA):  # x is the input to the current layer, i.e. the image features
        modules = self.blocks[1:]
        outputs = {}  # We cache the outputs for the route layer

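forward takes three arguments: self, the input x, and CUDA. If CUDA is true, the GPU is used to accelerate the forward pass.

Here, we iterate over self.blocks[1:] instead of self.blocks, because the first element of self.blocks is a net block, which is not part of the forward pass.

Since the route and shortcut layers need the output feature maps of earlier layers, we cache the output feature map of every layer in the dict outputs. The keys are the layer indices and the values are the feature maps.

As with the create_modules function, we now walk through module_list, which contains the modules of the network. The thing to notice is that the modules were appended in the same order as they appear in the configuration file. This means we can obtain the output simply by running the input through each module in turn.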

        write = 0  # This is explained a bit later
        for i, module in enumerate(modules):
            module_type = module["type"]

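Convolutional and upsample layers

If the module is a convolutional or upsample module, the forward pass works as follows:

            if module_type == "convolutional" or module_type == "upsample":
                x = self.module_list[i](x)  # apply the convolution or the upsampling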

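Route / shortcut layers

The code for the route layer has to handle two cases (as described in Part 2). When two feature maps must be concatenated, we use the torch.cat function with the second argument set to 1, because we want to concatenate the feature maps along the depth. (In PyTorch, the input and output of a convolutional layer have the format B x C x H x W. The depth corresponds to the channel dimension C.)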

            elif module_type == "route":
                layers = module["layers"]
                layers = [int(a) for a in layers]

                if layers[0] > 0:
                    layers[0] = layers[0] - i

                if len(layers) == 1:
                    x = outputs[i + layers[0]]
                else:
                    if layers[1] > 0:
                        layers[1] = layers[1] - i
                    map1 = outputs[i + layers[0]]
                    map2 = outputs[i + layers[1]]
                    # Concatenate the feature maps along the channel dimension
                    x = torch.cat((map1, map2), 1)

            elif module_type == "shortcut":
                from_ = int(module["from"])
                # Add the feature maps of the two layers element-wise
                x = outputs[i - 1] + outputs[i + from_]
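
To see how the outputs cache, the route index handling, and the shortcut addition fit together, here is a toy sketch in plain Python. It is not the implementation from this tutorial: feature maps are replaced by flat lists of numbers, the "scale" key and the name toy_forward are made up for illustration, and list concatenation stands in for torch.cat along the channel dimension.

```python
def toy_forward(blocks, x):
    """Run a toy 'network' whose feature maps are plain lists of channel values."""
    outputs = {}  # layer index -> cached output, as in Darknet.forward
    for i, block in enumerate(blocks):
        kind = block["type"]
        if kind == "convolutional":
            # Stand-in for a real convolution: scale every channel (hypothetical).
            x = [v * block["scale"] for v in x]
        elif kind == "route":
            refs = [int(a) for a in block["layers"]]
            # A positive reference is an absolute layer index, any other value
            # is relative to the current layer i. This mirrors the
            # "layers[k] - i" followed by "i + layers[k]" arithmetic above.
            absolute = [ref if ref > 0 else i + ref for ref in refs]
            # Concatenate the referenced maps along the channel axis.
            x = [c for j in absolute for c in outputs[j]]
        elif kind == "shortcut":
            from_ = int(block["from"])
            # Element-wise sum of the previous output and an earlier one.
            x = [a + b for a, b in zip(outputs[i - 1], outputs[i + from_])]
        outputs[i] = x
    return x


blocks = [
    {"type": "convolutional", "scale": 2},     # layer 0 -> [2, 4]
    {"type": "convolutional", "scale": 3},     # layer 1 -> [6, 12]
    {"type": "shortcut", "from": "-2"},        # layer 2 = layer 1 + layer 0
    {"type": "route", "layers": ["-1", "1"]},  # layer 3 = concat(layer 2, layer 1)
]
print(toy_forward(blocks, [1, 2]))  # [8, 16, 6, 12]
```

The route block mixes a relative reference (-1) with an absolute one (1), which is the situation the index arithmetic in the real code is written to handle.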

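YOLO (detection layer)

The output of YOLO is a convolutional feature map that contains the bounding box attributes along the depth of the feature map. The attributes of the bounding boxes predicted by a cell are stacked one after another. So, to access the second bounding box of the cell at (5,6), you have to index it with map[5, 6, (5 + C) : 2 * (5 + C)]. In other words, the convolutional feature map has dimensions 13 x 13 x (3 x (5 + C)), and the attributes of the 3 bounding boxes of the cell at (5,6) are in map[5, 6, :(5 + C)], map[5, 6, (5 + C) : 2 * (5 + C)], and map[5, 6, 2 * (5 + C) : 3 * (5 + C)], respectively. This form is very inconvenient for output processing such as thresholding by object confidence, adding grid offsets to the centers, and applying the anchors.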
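
The slicing arithmetic can be sketched in a few lines of plain Python. The helper name box_slice is hypothetical and exists only for this illustration; C is the number of classes, so each box occupies 5 + C consecutive channels.

```python
def box_slice(box_index, num_classes):
    """Channel slice holding the attributes of one box of a cell (0-based index)."""
    attrs = 5 + num_classes  # 4 coordinates + objectness + class scores
    return slice(box_index * attrs, (box_index + 1) * attrs)


# With C = 80 classes, a cell carries 3 * 85 = 255 channels.
channels = list(range(3 * 85))
second_box = channels[box_slice(1, 80)]
print(second_box[0], second_box[-1], len(second_box))  # 85 169 85
```

Having to compute such slices for every cell and every box is what makes the raw layout awkward to threshold or post-process in bulk.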

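Another problem is that, since detection happens at three scales, the dimensions of the detection maps differ. Although the three feature maps have different dimensions, the output processing operations applied to them are similar. We therefore want to perform these operations on a single tensor rather than on three separate tensors.

To solve these problems, we introduce the function predict_transform.

Transforming the output

The function predict_transform lives in the file util.py. At the top of util.py, add the following imports: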

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np
import cv2

predict_transform takes 5 parameters: prediction (the output), inp_dim (the input image dimension), anchors, num_classes, and an optional CUDA flag.

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):

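The predict_transform function takes a detection feature map and turns it into a 2-D table for each image, in which each row holds the attributes of one bounding box, in the following order: center x, center y, width, height, objectness score, then one score per class.

Here is the code that performs this transformation: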

    batch_size = prediction.size(0)  # batch size
    stride = inp_dim // prediction.size(2)  # stride of this detection map
    grid_size = inp_dim // stride  # side length of the feature map
    bbox_attrs = 5 + num_classes  # 4 coordinates + objectness + class scores
    num_anchors = len(anchors)  # number of anchors used at this scale

    prediction = prediction.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
    prediction = prediction.transpose(1, 2).contiguous()
    # Stack every bounding box of every cell along dimension 1, one box per row
    prediction = prediction.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)
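
After the view, transpose, view sequence, the rows are ordered cell by cell (row-major over the grid), with the boxes of one cell next to each other. The following plain-Python sketch, using a hypothetical helper named row_to_location, shows how a row index maps back to a grid cell and an anchor under that ordering.

```python
def row_to_location(row, grid_size, num_anchors):
    """Map a row of the transformed tensor back to (cell_x, cell_y, anchor)."""
    # Rows are grouped per cell, with num_anchors consecutive rows per cell.
    cell, anchor = divmod(row, num_anchors)
    # Cells are flattened row-major: cell = cell_y * grid_size + cell_x.
    cell_y, cell_x = divmod(cell, grid_size)
    return cell_x, cell_y, anchor


# 13 x 13 grid with 3 anchors per cell gives 507 rows.
print(row_to_location(0, 13, 3))    # (0, 0, 0)
print(row_to_location(4, 13, 3))    # (1, 0, 1)
print(row_to_location(506, 13, 3))  # (12, 12, 2)
```

This ordering matters later, because the grid offsets and the anchors have to be laid out in exactly the same row order before they are added to or multiplied with the predictions.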

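The dimensions of the anchors are expressed relative to the height and width attributes of the net block. These attributes describe the dimensions of the input image, which is larger than the detection map by a factor of the stride. Therefore, the anchors must be divided by the stride of the detection feature map.

    anchors = [(a[0] / stride, a[1] / stride) for a in anchors]

Now we need to transform the output according to the equations discussed in Part 1.

Apply the sigmoid to the x and y coordinates and to the objectness score:

    # Sigmoid the center_x, center_y, and object confidence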
    prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0])
    prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1])
    prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4])

Add the grid offsets to the predicted center coordinates:

    # Add the center offsets
    grid = np.arange(grid_size)
    a, b = np.meshgrid(grid, grid)

    x_offset = torch.FloatTensor(a).view(-1, 1)
    y_offset = torch.FloatTensor(b).view(-1, 1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0)

    prediction[:, :, :2] += x_y_offset
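
The meshgrid, cat, repeat, view chain is compact but hard to read. As a rough illustration, the same ordering of offsets can be written as a plain list comprehension: one (x, y) pair per cell in row-major order, repeated once per anchor. The function name grid_offsets is an assumption made for this sketch only.

```python
def grid_offsets(grid_size, num_anchors):
    """Return the (x, y) cell offset for every row of the transformed tensor."""
    return [
        (x, y)
        for y in range(grid_size)    # rows of the grid (slowest index)
        for x in range(grid_size)    # columns of the grid
        for _ in range(num_anchors)  # one copy per anchor of the cell
    ]


print(grid_offsets(2, 2))
# [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
```

Each pair lines up with one row of prediction, so adding the offsets to columns 0 and 1 shifts every sigmoid-squashed center into its own cell.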

Apply the anchors to the dimensions of the bounding box:

    # Log-space transform of the height and the width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size * grid_size, 1).unsqueeze(0)
    prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4]) * anchors

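Apply the sigmoid activation to the class scores:

    prediction[:, :, 5:5 + num_classes] = torch.sigmoid(prediction[:, :, 5:5 + num_classes])

The last thing to do is to resize the coordinates and dimensions of the detection map to the size of the input image. At this point the bounding box attributes are expressed in units of the feature map (say, 13 x 13). If the input image is 416 x 416, we multiply the attributes by 32, the value of the stride variable.

    prediction[:, :, :4] *= stride

That completes the body of the function. Return prediction at the end of the function:

    return prediction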
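
To make the chain of operations concrete, here is a scalar sketch that decodes a single box with the same steps as predict_transform: sigmoid plus cell offset for the center, exponential times the stride-scaled anchor for the size, then multiplication by the stride. The function decode_box and its argument layout are assumptions made for this illustration; the anchor (116, 90) is the first anchor of the coarsest scale in the standard yolov3.cfg.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(raw, cell, anchor, stride):
    """Decode raw outputs (tx, ty, tw, th) of one box into input-image pixels."""
    tx, ty, tw, th = raw
    cell_x, cell_y = cell
    anchor_w, anchor_h = anchor  # anchor size in input-image pixels
    # Center: squash into [0, 1], shift into the cell, scale to pixels.
    bx = (sigmoid(tx) + cell_x) * stride
    by = (sigmoid(ty) + cell_y) * stride
    # Size: the anchor is first expressed in feature-map units, then scaled back.
    bw = math.exp(tw) * (anchor_w / stride) * stride
    bh = math.exp(th) * (anchor_h / stride) * stride
    return bx, by, bw, bh


# All-zero raw outputs land in the middle of the cell with the anchor's size.
print(decode_box((0, 0, 0, 0), cell=(5, 6), anchor=(116, 90), stride=32))
# (176.0, 208.0, 116.0, 90.0)
```

Dividing the anchor by the stride and multiplying back later looks redundant for one box, but it lets the real code scale all four coordinates with a single multiplication.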

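Detection layer revisited

Now that the output tensors have been transformed, the detection maps at the three scales can be concatenated into one big tensor. Note that this was not possible before the transformation, because feature maps with different spatial dimensions cannot be concatenated. Now, however, each output tensor is simply a table with bounding boxes as its rows, so concatenation is possible.

Notice the line write = 0 just before the loop in the forward function. The write flag indicates whether we have encountered the first detection. If write is 0, detections has not been initialized yet. If it is 1, detections has been initialized and we can concatenate the new detection map to it.

Now that predict_transform reshapes the feature maps, we write the code that handles the detection feature maps in the forward function.

At the top of the darknet.py file, add the following import:

from util import *

Then, in the forward function: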

            elif module_type == "yolo":
                # Anchors used at the current scale
                anchors = self.module_list[i][0].anchors
                # Dimension of the input image
                inp_dim = int(self.net_info["height"])
                # Number of classes
                num_classes = int(module["classes"])

                # Transform the shape of the feature map
                x = x.data
                x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
                if not write:  # detections is not initialized yet
                    detections = x
                    write = 1
                else:
                    detections = torch.cat((detections, x), 1)

            outputs[i] = x

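Now, simply return detections:

        return detections

Testing the forward pass

Here is a function that creates a dummy input, which we will pass to our network. Before writing this function, save the test image to your working directory; the code below expects it to be named dog-cycle-car.png.

Now, define the function at the top of the darknet.py file as follows: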

def get_test_input():
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (416, 416))  # Resize to the input dimension
    img_ = img[:, :, ::-1].transpose((2, 0, 1))  # BGR -> RGB, H x W x C -> C x H x W
    img_ = img_[np.newaxis, :, :, :] / 255.0  # Add a batch dimension at axis 0 and normalize
    img_ = torch.from_numpy(img_).float()  # Convert to a float tensor
    img_ = Variable(img_)  # Wrap in a Variable (a no-op since PyTorch 0.4)
    return img_

Then, run the following code:

model = Darknet("cfg/yolov3.cfg")
inp = get_test_input()
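# The model and the input both live on the CPU here, so pass CUDA=False.
# Passing True would move the grid offsets to the GPU and cause a device mismatch.
pred = model(inp, False)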
print(pred)

You will see output similar to the following:

tensor([[[1.5323e+01, 1.4370e+01, 1.5629e+02,  ..., 4.0777e-01,
          3.9062e-01, 5.0815e-01],
         [1.5169e+01, 1.7085e+01, 1.3182e+02,  ..., 4.7767e-01,
          5.5244e-01, 4.5476e-01],
         [2.0304e+01, 1.7789e+01, 2.7705e+02,  ..., 5.3866e-01,
          5.4552e-01, 4.3233e-01],
         ...,
         [4.1158e+02, 4.1178e+02, 8.6914e+00,  ..., 5.9036e-01,
          4.4941e-01, 4.1893e-01],
         [4.1313e+02, 4.1173e+02, 1.9307e+01,  ..., 4.7306e-01,
          5.2066e-01, 5.4644e-01],
         [4.1162e+02, 4.1154e+02, 2.6846e+01,  ..., 4.5669e-01,
          4.6721e-01, 4.9838e-01]]])


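The shape of this tensor is 1 x 10647 x 85. The first dimension is the batch size, which is simply 1 because we used a single image. For each image in the batch, we have a 10647 x 85 table. Each row of this table represents one bounding box (4 bbox coordinate attributes, 1 objectness score, and 80 class scores).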
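
The figure of 10647 rows can be checked with a little arithmetic. The sketch below assumes the three detection scales of the standard yolov3.cfg, with strides 32, 16, and 8 and 3 anchors per scale; only the stride of 32 is stated explicitly in this part, so treat the other values as assumptions. The helper name num_predictions is made up for this example.

```python
def num_predictions(inp_dim, strides=(32, 16, 8), anchors_per_scale=3):
    """Total number of bounding boxes predicted for a square input image."""
    return sum((inp_dim // s) ** 2 * anchors_per_scale for s in strides)


# 416 x 416 input: grids of 13, 26, and 52 cells per side.
print(num_predictions(416))  # 10647 = 3 * (169 + 676 + 2704)
```

A larger input such as 608 x 608 yields proportionally more rows, which is why the row count depends on the height value in the net block of the configuration file.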

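At this point, the network has random weights and will not produce correct output. We need to load a weights file into the network. We will use the official weights file for this purpose.

Downloading the pretrained weights

Download the weights file into your detector directory. It can be fetched from the URL below; on Linux, for example: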

wget https://pjreddie.com/media/files/yolov3.weights

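Understanding the weights file

The official weights file is a binary file that contains the weights stored serially.

Extreme care must be taken when reading the weights. The weights are stored simply as floats, with nothing to tell us which layer they belong to. If you slip up, nothing stops you from, say, loading the weights of a batch norm layer into those of a convolutional layer. Since we read only floats, there is no way to tell which weight belongs to which layer, so we must understand how the weights are stored.

First, only two types of layers have weights: batch norm layers and convolutional layers.

The weights of these layers are stored in exactly the order in which the layers appear in the configuration file. So, if a convolutional block is followed by a shortcut block, and the shortcut block by another convolutional block, the file contains the weights of the first convolutional block followed by the weights of the second.

When a batch norm layer appears in a convolutional block, the convolutional layer has no biases. When there is no batch norm layer, however, the bias "weights" must be read from the file.

The following figure summarizes how the weights are stored:

[Figure wts-1: storage layout of the weights in the file]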
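
As a rough illustration of this layout, the sketch below counts how many float32 values one convolutional block occupies in the file. It assumes the order used by the loading code in this part: four per-channel batch norm vectors (biases, weights, running mean, running variance) or one vector of convolution biases, followed by the convolution kernel. The function name conv_block_floats is hypothetical.

```python
def conv_block_floats(in_channels, out_channels, kernel_size, batch_normalize):
    """Number of float32 values stored in the weights file for one conv block."""
    kernel = out_channels * in_channels * kernel_size * kernel_size
    if batch_normalize:
        # bn biases, bn weights, running mean, running variance
        per_channel = 4 * out_channels
    else:
        # no batch norm: only the convolution biases
        per_channel = out_channels
    return per_channel + kernel


# A 3x3 convolution from 3 to 32 channels with batch norm.
print(conv_block_floats(3, 32, 3, True))       # 992
# A 1x1 detection convolution from 1024 to 255 channels without batch norm.
print(conv_block_floats(1024, 255, 1, False))  # 261375
```

Summing this count over all convolutional blocks should give the number of floats that follow the header, which is a useful consistency check between a configuration file and a weights file.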

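Loading the weights

Let us write a function that loads the weights. It is a member function of the Darknet class and takes one argument besides self, the path of the weights file.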

def load_weights(self, weightfile):

The first 20 bytes (160 bits) of the weights file store 5 int32 values, which constitute the header of the file.

    # Open the weights file
    fp = open(weightfile, "rb")

    # The first 5 values are header information
    # 1. Major version number
    # 2. Minor version number
    # 3. Subversion number
    # 4,5. Images seen by the network (during training)
    header = np.fromfile(fp, dtype=np.int32, count=5)
    self.header = torch.from_numpy(header)
    self.seen = self.header[3]
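
The header arithmetic (5 values of 4 bytes each, 20 bytes in total) can be illustrated with the standard struct module on a made-up byte string. This is only a sketch: the header values below are invented, and the little-endian format "<5i" is an assumption that matches what np.fromfile reads on common x86 and ARM machines.

```python
import struct

HEADER_FORMAT = "<5i"  # five little-endian int32 values (assumed byte order)
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 bytes


def parse_header(raw):
    """Return the five int32 header values from the start of a weights blob."""
    return struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])


# A fake 'weights file': an invented header followed by two float32 weights.
fake_file = struct.pack("<5i", 0, 2, 0, 32013312, 0) + struct.pack("<2f", 0.5, -1.25)
header = parse_header(fake_file)
num_floats = (len(fake_file) - HEADER_SIZE) // 4

print(HEADER_SIZE)  # 20
print(header)       # (0, 2, 0, 32013312, 0)
print(num_floats)   # 2
```

Everything after the first 20 bytes is then interpreted as float32 weights, four bytes per value.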

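The rest of the file holds the weights, in the order described above. The weights are stored as float32, that is, 32-bit floats. Let us load the rest of the weights into an np.ndarray.

    weights = np.fromfile(fp, dtype=np.float32)
    fp.close()  # everything has been read, so the file can be closed

Now, we iterate over the weights array and load the weights into the modules of the network.

    ptr = 0
    for i in range(len(self.module_list)):
        model_type = self.blocks[i + 1]["type"]
        # If the block is convolutional, load its weights.
        # Otherwise ignore it.

Inside the loop, we first check whether the convolutional block has batch_normalize set to true. Based on that, we load the weights.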

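        if model_type == "convolutional":
            model = self.module_list[i]
            try:  # the batch_normalize key is absent from blocks without batch norm
                batch_normalize = int(self.blocks[i + 1]["batch_normalize"])
            except KeyError:
                batch_normalize = 0
            conv = model[0]  # the convolutional layer

We keep a variable called ptr to keep track of where we are in the weights array. Now, if batch_normalize is true, we load the weights as follows.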

            if batch_normalize:
                bn = model[1]
                # Get the number of weights of Batch Norm Layer
                num_bn_biases = bn.bias.numel()

                # Load the weights
                bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                ptr += num_bn_biases
                bn_weights = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                ptr += num_bn_biases
                bn_running_mean = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                ptr += num_bn_biases
                bn_running_var = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                ptr += num_bn_biases

                # Cast the loaded weights into dims of model weights.
                bn_biases = bn_biases.view_as(bn.bias.data)
                bn_weights = bn_weights.view_as(bn.weight.data)
                bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                bn_running_var = bn_running_var.view_as(bn.running_var)

                # Copy the data to model
                bn.bias.data.copy_(bn_biases)
                bn.weight.data.copy_(bn_weights)
                bn.running_mean.copy_(bn_running_mean)
                bn.running_var.copy_(bn_running_var)

If batch_normalize is not true, simply load the biases of the convolutional layer.

            else:
                # Number of biases
                num_biases = conv.bias.numel()
                # Load the weights
                conv_biases = torch.from_numpy(weights[ptr:ptr + num_biases])
                ptr += num_biases
                # reshape the loaded weights according to the dims of the model weights
                conv_biases = conv_biases.view_as(conv.bias.data)
                # Finally copy the data
                conv.bias.data.copy_(conv_biases)

Finally, load the weights of the convolutional layer.

            # Let us load the weights for the convolutional layer
            num_weights = conv.weight.numel()

            # Do the same as above for the weights
            conv_weights = torch.from_numpy(weights[ptr:ptr + num_weights])
            ptr += num_weights

            conv_weights = conv_weights.view_as(conv.weight.data)
            conv.weight.data.copy_(conv_weights)
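
The loading loop is a cursor walk over one flat array: read a chunk, advance ptr, repeat. The sketch below isolates that pattern in plain Python with a hypothetical helper named take, which also raises an error when the array runs out. It is one possible sanity check and not part of the original function; comparing the final cursor with the array length is a cheap way to notice a configuration file and a weights file that do not belong together.

```python
def take(weights, ptr, count):
    """Return the next `count` values and the advanced cursor position."""
    if ptr + count > len(weights):
        raise ValueError("weights array is shorter than the network expects")
    return weights[ptr:ptr + count], ptr + count


# Toy block without batch norm: 2 biases followed by 8 kernel values.
weights = list(range(10))
ptr = 0
biases, ptr = take(weights, ptr, 2)
kernel, ptr = take(weights, ptr, 8)

print(biases)               # [0, 1]
print(kernel)               # [2, 3, 4, 5, 6, 7, 8, 9]
print(ptr == len(weights))  # True: every value was consumed exactly once
```

In the real function the same check would be a comparison of ptr with len(weights) after the loop over module_list has finished.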

We are done with this function. You can now load weights into a Darknet object by calling its load_weights function.

model = Darknet("cfg/yolov3.cfg")
model.load_weights("yolov3.weights")

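That is all for this part. With the model built and the weights loaded, we can finally start detecting objects. The next part covers using objectness confidence thresholding and non-maximum suppression to produce the final set of detections.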
