Building the YOLOv3 Network and Detection Code from Scratch in PyTorch, and Loading the Official Weights

Preface

This post is only about writing the code; it does not explain much of the theory behind YOLOv3. If you are not yet familiar with how YOLOv3 works, please read my YOLOv3 theory write-up first. The official code is fairly complicated, so I reorganized it myself: this post rewrites the network and detection parts of YOLOv3 in a much cleaner form, and loads the official weights file.

Environment

  • Windows 10
  • PyTorch 1.1.0
  • CUDA 10
  • PyCharm

1. Building the network model from the cfg file

First, download the cfg file and the matching weights file from the official YOLO website; I used the YOLOv3-416 versions.
Open the cfg file after downloading and you will see a configuration like this:

[net]
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
 
learning_rate=0.001
burn_in=1000
max_batches = 50000
policy=steps
steps=4000,45000
scales=.1,.1
 
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
 
# Downsample
 
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
 
......
 
[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=2   # your own number of classes
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
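
The official PyTorch code first parses this file into a list of blocks and builds the network from them. For reference, a minimal sketch of that parsing step (the function name here is my own):

def parse_cfg(path):
    """Parse a Darknet cfg file into a list of dicts, one per [section]."""
    blocks = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):   # skip blank lines and comments
                continue
            if line.startswith('['):               # a new section begins
                blocks.append({'type': line[1:-1].strip()})
            else:
                key, value = line.split('=', 1)
                blocks[-1][key.strip()] = value.strip()
    return blocks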

I won't follow that cfg-driven approach in detail here; if the cfg format itself is unfamiliar, see my YOLOv3 cfg write-up. Instead, I only introduce the modules the network actually uses and then assemble it from them by hand:

  • Convolutional (convolution block)
  • Upsample (upsampling)
  • Residual (residual block)
  • Downsampling (the stride-2 convolution blocks between residual stages)
  • ConvolutionalSet (the detection/classification head)

These correspond to the blocks in the architecture diagram below; next, we implement each of them in turn.
(figure: the YOLOv3 network architecture diagram)
First the Convolutional block. As the cfg file shows, it always has the same fixed layout: every convolution block consists of a convolution layer, a BatchNorm layer, and a LeakyReLU activation.

Define a ConvolutionalLayer class; the layer's parameters are supplied at call time according to the cfg file:

import torch
import torch.nn as nn
import numpy as np

class ConvolutionalLayer(torch.nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, bias=False):
        super(ConvolutionalLayer, self).__init__()

        self.sub_module = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=bias),
            torch.nn.BatchNorm2d(out_channels),
            torch.nn.LeakyReLU(0.1)
        )


    def forward(self, x):

        return self.sub_module(x)

Next the residual block: define a ResidualLayer class containing a 1×1 and a 3×3 convolution layer:

class ResidualLayer(torch.nn.Module):

    def __init__(self, in_channels):
        super(ResidualLayer, self).__init__()

        self.sub_module = torch.nn.Sequential(
            ConvolutionalLayer(in_channels, in_channels // 2, 1, 1, 0),
            ConvolutionalLayer(in_channels // 2, in_channels, 3, 1, 1),
        )

    def forward(self, x):
        return x + self.sub_module(x)
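
A quick sanity check: the residual connection requires the input and output channel counts to match, which is why the 1×1 convolution halves the channels and the 3×3 restores them:

block = ResidualLayer(64)
x = torch.randn(1, 64, 52, 52)
print(block(x).shape)  # torch.Size([1, 64, 52, 52]), same shape, so x + F(x) is valid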

In the same way, create the remaining modules: UpsampleLayer, DownsamplingLayer, and ConvolutionalSet:

class UpsampleLayer(torch.nn.Module):

    def __init__(self):
        super(UpsampleLayer, self).__init__()

    def forward(self, x):
        return torch.nn.functional.interpolate(x, scale_factor=2, mode='nearest')

class DownsamplingLayer(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(DownsamplingLayer, self).__init__()

        self.sub_module = torch.nn.Sequential(
            ConvolutionalLayer(in_channels, out_channels, 3, 2, 1)
        )

    def forward(self, x):
        return self.sub_module(x)

class ConvolutionalSet(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ConvolutionalSet, self).__init__()

        time_channel = out_channels * 2

        self.sub_module = torch.nn.Sequential(
            ConvolutionalLayer(in_channels, out_channels, 1, 1, 0),
            ConvolutionalLayer(out_channels, time_channel, 3, 1, 1),

            ConvolutionalLayer(time_channel, out_channels, 1, 1, 0),
            ConvolutionalLayer(out_channels, time_channel, 3, 1, 1),

            ConvolutionalLayer(time_channel, out_channels, 1, 1, 0),
        )

    def forward(self, x):
        return self.sub_module(x)

With all the modules in place, we can assemble the network. Following the cfg file and the architecture diagram, create the main network and name it Darknet:

class Darknet(nn.Module):
    def __init__(self, cls=80):
        super(Darknet, self).__init__()

        output_channel = 3 * (5 + cls)

        self.trunk52 = nn.Sequential(
            ConvolutionalLayer(3, 32, 3, 1, 1),
            DownsamplingLayer(32, 64),

            ResidualLayer(64),

            DownsamplingLayer(64, 128),

            ResidualLayer(128),
            ResidualLayer(128),

            DownsamplingLayer(128, 256),

            ResidualLayer(256),
            ResidualLayer(256),
            ResidualLayer(256),
            ResidualLayer(256),
            ResidualLayer(256),
            ResidualLayer(256),
            ResidualLayer(256),
            ResidualLayer(256),
        )

        self.trunk26 = nn.Sequential(
            DownsamplingLayer(256, 512),

            ResidualLayer(512),
            ResidualLayer(512),
            ResidualLayer(512),
            ResidualLayer(512),
            ResidualLayer(512),
            ResidualLayer(512),
            ResidualLayer(512),
            ResidualLayer(512),
        )

        self.trunk13 = nn.Sequential(
            DownsamplingLayer(512, 1024),

            ResidualLayer(1024),
            ResidualLayer(1024),
            ResidualLayer(1024),
            ResidualLayer(1024),
        )

        self.con_set13 = nn.Sequential(
            ConvolutionalSet(1024, 512)
        )

        self.predict_one = nn.Sequential(
            ConvolutionalLayer(512, 1024, 3, 1, 1),
            nn.Conv2d(1024, output_channel, 1, 1, 0)
        )

        self.up_to_26 = nn.Sequential(
            ConvolutionalLayer(512, 256, 1, 1, 0),
            UpsampleLayer()
        )

        self.con_set26 = nn.Sequential(
            ConvolutionalSet(768, 256)
        )

        self.predict_two = nn.Sequential(
            ConvolutionalLayer(256, 512, 3, 1, 1),
            nn.Conv2d(512, output_channel, 1, 1, 0)
        )

        self.up_to_52 = nn.Sequential(
            ConvolutionalLayer(256, 128, 1, 1, 0),
            UpsampleLayer()
        )

        self.con_set52 = nn.Sequential(
            ConvolutionalSet(384, 128)
        )

        self.predict_three = nn.Sequential(
            ConvolutionalLayer(128, 256, 3, 1, 1),
            nn.Conv2d(256, output_channel, 1, 1, 0)
        )


    def forward(self, x):
        feature_52 = self.trunk52(x)
        feature_26 = self.trunk26(feature_52)
        feature_13 = self.trunk13(feature_26)

        con_set_13_out = self.con_set13(feature_13)
        detection_13_out = self.predict_one(con_set_13_out)

        up_26_out = self.up_to_26(con_set_13_out)
        route_26_out = torch.cat((up_26_out, feature_26), dim=1)
        con_set_26_out = self.con_set26(route_26_out)
        detection_26_out = self.predict_two(con_set_26_out)

        up_52_out = self.up_to_52(con_set_26_out)
        route_52_out = torch.cat((up_52_out, feature_52), dim=1)
        con_set_52_out = self.con_set52(route_52_out)
        detection_52_out = self.predict_three(con_set_52_out)

        return detection_13_out, detection_26_out, detection_52_out
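
A quick shape check confirms the three heads, assuming a 416×416 input and the default 80 classes (so each head outputs 3 × (5 + 80) = 255 channels):

net = Darknet(cls=80)
x = torch.randn(1, 3, 416, 416)
out_13, out_26, out_52 = net(x)
print(out_13.shape)  # torch.Size([1, 255, 13, 13]), stride 32
print(out_26.shape)  # torch.Size([1, 255, 26, 26]), stride 16
print(out_52.shape)  # torch.Size([1, 255, 52, 52]), stride 8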

The network is now complete, but the official weights are not a .pt file that PyTorch can load directly, so we need a load_weights function that parses the binary weights file and copies the values into the model (after which we can save a regular PyTorch-readable .pt file):

    def load_weights(self, weightfile):
        # Open the weights file
        fp = open(weightfile, "rb")

        # The first 5 values are header information
        # 1. Major version number
        # 2. Minor Version Number
        # 3. Subversion number
        # 4,5. Images seen by the network (during training)

        weights = np.fromfile(fp, dtype=np.float32)  # read the rest of the file into an np.ndarray; the weights are stored as float32
        weights = weights[5:]                        # skip the 5 header values (ints, but 4 bytes each, so the offset works out)

        model_list = []
        for model in self.modules():     # modules() walks the network in definition order, keeping each conv next to its BatchNorm
            if isinstance(model, nn.BatchNorm2d):
                model_list.append(model)
            if isinstance(model, nn.Conv2d):
                model_list.append(model)

        ptr = 0
        is_continue = False
        for i in range(0, len(model_list)):
            if is_continue:              # this entry is the BatchNorm2d we already consumed in the previous iteration
                is_continue = False
                continue

            conv = model_list[i]
            # print(i // 2, conv)

            if i < len(model_list) - 1 and isinstance(model_list[i + 1], nn.BatchNorm2d):
                is_continue = True

                bn = model_list[i + 1]
                # print(bn)
                num_bn_biases = bn.bias.numel()
                # print(num_bn_biases, weights[ptr:ptr + 4 * num_bn_biases])

                bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_biases = bn_biases.view_as(bn.bias.data)
                bn_weights = bn_weights.view_as(bn.weight.data)
                bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                bn_running_var = bn_running_var.view_as(bn.running_var)

                bn.bias.data.copy_(bn_biases)
                bn.weight.data.copy_(bn_weights)
                bn.running_mean.copy_(bn_running_mean)
                bn.running_var.copy_(bn_running_var)

                # print(bn.bias)
                # print(bn.weight)
                # print(bn.running_mean)
                # print(bn.running_var)
            else:
                is_continue = False

                num_biases = conv.bias.numel()
                # print(weights[ptr:ptr + num_biases])

                conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases])
                ptr = ptr + num_biases

                conv_biases = conv_biases.view_as(conv.bias.data)

                conv.bias.data.copy_(conv_biases)

            num_weights = conv.weight.numel()
            # print(num_weights, weights[ptr:ptr + num_weights])

            conv_weights = torch.from_numpy(weights[ptr:ptr + num_weights])
            ptr = ptr + num_weights

            conv_weights = conv_weights.view_as(conv.weight.data)
            conv.weight.data.copy_(conv_weights)

        fp.close()
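
With load_weights in place, you can parse the binary file straight into the model and, optionally, save a normal PyTorch state dict so later runs can skip the parsing (the paths here are examples):

net = Darknet(cls=80)
net.load_weights(r"model\yolov3.weights")         # parse the Darknet binary format into the model
torch.save(net.state_dict(), r"model\yolov3.pt")  # optional: save a regular .pt state dict
# afterwards: net.load_state_dict(torch.load(r"model\yolov3.pt"))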

2. Running detection on the network outputs

The YOLOv3 output format:
(figure: the YOLOv3 output format)
In YOLOv3, every point on a feature map outputs 3 × ((tx, ty, tw, th) + obj_score + class_scores): three anchor boxes are preset at each point, (tx, ty) locate the box center, (tw, th) regress the box width and height, obj_score is the objectness confidence, and class_scores carry the classification information.
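
Concretely, with 80 classes each head has 3 × (5 + 80) = 255 output channels, and the first step of decoding is simply a reshape so the last dimension holds one anchor's 85 values (this is exactly the reshape _filter performs below):

output = torch.randn(1, 255, 13, 13)       # a raw 13x13 head output: [N, 3*(5+80), H, W]
output = output.permute(0, 2, 3, 1)        # -> [N, 13, 13, 255]
output = output.reshape(1, 13, 13, 3, 85)  # -> [N, 13, 13, 3 anchors, (tx,ty,tw,th,obj) + 80 class scores]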

Knowing the output format, we can move on to the detection code. I named the file detector.py; the complete code is below:

import anchors_cfg as cfg
from darknet import Darknet
import torchvision
from utils import *
from PIL import Image

class Detector(torch.nn.Module):

    def __init__(self,save_net):
        super(Detector, self).__init__()
        self.net = Darknet()

        # self.net.load_state_dict(torch.load(save_net))
        self.net.load_weights(save_net)

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # self.net.to(self.device)
        self.net.eval()          # switch the network to evaluation mode

    def forward(self, input, thresh, anchors):        # takes the image, a confidence threshold, and the anchor boxes
        input_ = input

        output_13, output_26, output_52 = self.net(input_)        # run the image through the network, getting three feature maps

        # output_13 = output_13.cpu()
        # output_26 = output_26.cpu()
        # output_52 = output_52.cpu()

        idxs_13, vecs_13 = self._filter(output_13, thresh)     # indices and raw vectors whose confidence exceeds the threshold
        boxes_13 = self._parse(idxs_13, vecs_13, 32, anchors[13])

        idxs_26, vecs_26 = self._filter(output_26, thresh)
        boxes_26 = self._parse(idxs_26, vecs_26, 16, anchors[26])

        idxs_52, vecs_52 = self._filter(output_52, thresh)
        boxes_52 = self._parse(idxs_52, vecs_52, 8, anchors[52])

        box = torch.cat([boxes_13, boxes_26, boxes_52], dim=0)
        box = nms(box)

        return box

    def _filter(self, output, thresh):

        output = output.permute(0, 2, 3, 1)            # [N, 3*(5+cls), H, W] -> [N, H, W, 3*(5+cls)]
        output = output.reshape(output.size(0), output.size(1), output.size(2), 3, -1)   # -> [N, H, W, 3, 5+cls]

        mask = torch.sigmoid(output[..., 4]) > thresh      # mask of cells whose confidence exceeds the threshold

        idxs = mask.nonzero()               # indices of the masked entries
        vecs = output[mask]                 # the raw output vectors above the threshold

        return idxs, vecs

    def _parse(self, idxs, vecs, t, anchors):    # takes indices, the thresholded vectors, the stride, and the anchors
        anchors = torch.Tensor(anchors)          # convert the anchors to a tensor

        n = idxs[:, 0]  # image index within the batch   [N, H, W, 3, 5+cls]
        a = idxs[:, 3]  # anchor index

        cy = (idxs[:, 1].float() + torch.sigmoid(vecs[:, 1])) * t  # (grid row + sigmoid(ty)) * stride = center y in the input image
        cx = (idxs[:, 2].float() + torch.sigmoid(vecs[:, 0])) * t  # (grid col + sigmoid(tx)) * stride = center x in the input image
        w = anchors[a, 0] * torch.exp(vecs[:, 2])   # anchor width scaled by exp(tw)
        h = anchors[a, 1] * torch.exp(vecs[:, 3])   # anchor height scaled by exp(th)

        cls = torch.sigmoid(vecs[:, 4])             # objectness confidence

        if len(vecs[:, 5:85]) > 0:
            _, pred = torch.max(vecs[:, 5:85], dim=1)              # index of the highest class score
            box = torch.stack([n.float(), cx, cy, w, h, pred.float(), cls], dim=1)
        else:
            # no class scores to pick from, so fill the class column with a zero placeholder
            box = torch.stack([n.float(), cx, cy, w, h, torch.zeros_like(h), cls], dim=1)

        return box


if __name__ == '__main__':

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    transforms = torchvision.transforms.Compose([  # convert the PIL image to a tensor in [0, 1]
        torchvision.transforms.ToTensor()
    ])
    detector = Detector(r"model\yolov3.weights")
    for i in range(1, 10):    # loop over the images in the test folder
        image = Image.open(r"test_image\timg{0}.jpg".format(i))
        image_c = image.copy()                            # keep a copy of the original for drawing
        W, H, scale, img = narrow_image(image)            # returns W, H, the scale factor, and the 416x416 resized image
        img_data = transforms(img).unsqueeze(0)           # to tensor, with an added batch dimension
        box = detector(img_data, 0.25, cfg.ANCHORS_GROUP)          # run detection on the 416x416 image to get the boxes
        box = enlarge_box(W, H, scale, box)              # map the boxes back onto the original image
        image_out = draw(box, image_c)               # draw the boxes on the copied original
        image_out.show()
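
The script imports an anchors_cfg module. A minimal version consistent with the anchors line in the cfg file, grouping the nine anchors by feature-map size (largest anchors on the coarsest 13×13 grid), might look like this:

# anchors_cfg.py: the nine anchors from the cfg's anchors= line, keyed by feature-map size
ANCHORS_GROUP = {
    13: [(116, 90), (156, 198), (373, 326)],  # stride 32, large objects
    26: [(30, 61), (62, 45), (59, 119)],      # stride 16, medium objects
    52: [(10, 13), (16, 30), (33, 23)],       # stride 8, small objects
}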


Now let me explain parts of this code. It relies on a small utils module I wrote (pieces of it are shown below), which provides a few helpers:

  • NMS
  • IoU
  • scaling the original image to 416×416
  • mapping predicted boxes back onto the original image
  • drawing the predicted boxes on the original image

First we instantiate the network and load the weights. Here an image is read with PIL and passed through the network, producing outputs at three different scales:

output_13, output_26, output_52 = self.net(input_)

Each of the three outputs is passed to _filter(), which extracts the indices and data whose confidence exceeds the threshold:

    def _filter(self, output, thresh):
        output = output.permute(0, 2, 3, 1)            # [N, 3*(5+cls), H, W] -> [N, H, W, 3*(5+cls)]
        output = output.reshape(output.size(0), output.size(1), output.size(2), 3, -1)   # -> [N, H, W, 3, 5+cls]
        mask = torch.sigmoid(output[..., 4]) > thresh      # mask of cells whose confidence exceeds the threshold
        idxs = mask.nonzero()               # indices of the masked entries
        vecs = output[mask]                 # the raw output vectors above the threshold

        return idxs, vecs

With the high-confidence indices in hand, the data at those positions is pulled out and passed to _parse(), which decodes the predicted boxes:

    def _parse(self, idxs, vecs, t, anchors):
        anchors = torch.Tensor(anchors)

        n = idxs[:, 0]  # image index within the batch; only matters for batched input, so it can be ignored here
        a = idxs[:, 3]  # anchor index    [N, H, W, 3, 5+cls]

        cy = (idxs[:, 1].float() + torch.sigmoid(vecs[:, 1])) * t  # (grid row + sigmoid(ty)) * stride = center y in the input image
        cx = (idxs[:, 2].float() + torch.sigmoid(vecs[:, 0])) * t  # (grid col + sigmoid(tx)) * stride = center x in the input image
        w = anchors[a, 0] * torch.exp(vecs[:, 2])   # anchor width scaled by exp(tw)
        h = anchors[a, 1] * torch.exp(vecs[:, 3])   # anchor height scaled by exp(th)

        cls = torch.sigmoid(vecs[:, 4])             # objectness confidence

        if len(vecs[:, 5:85]) > 0:
            _, pred = torch.max(vecs[:, 5:85], dim=1)              # index of the highest class score
            box = torch.stack([n.float(), cx, cy, w, h, pred.float(), cls], dim=1)
        else:
            # no class scores to pick from, so fill the class column with a zero placeholder
            box = torch.stack([n.float(), cx, cy, w, h, torch.zeros_like(h), cls], dim=1)

        return box

This gives us every predicted box above the confidence threshold; this batch of boxes then goes through NMS. Each box is laid out as [image index, cx, cy, w, h, predicted class, confidence], so nms sorts on column 6 and measures overlap on columns 1:5. Note that the version shown here is class-agnostic; to run NMS per class, you would first group the boxes by their predicted class and apply the same loop within each group.

def nms(box, thresh=0.2):
    if box.shape[0] == 0:            # no boxes passed in: return empty
        return torch.Tensor([])
    _boxes = box[box[:, 6].argsort(descending=True)]        # sort all boxes by confidence, descending
    r_boxes = []     # the boxes we keep
    while _boxes.shape[0] > 1:
        a_box = _boxes[0]   # the highest-confidence box
        b_box = _boxes[1:]  # all the remaining boxes

        r_boxes.append(a_box.unsqueeze(0))  # keep the highest-confidence box

        index_ = np.where(iou(a_box[1:5], b_box[:, 1:5]) < thresh)        # keep only boxes that overlap it by less than thresh
        _boxes = b_box[index_]          # the surviving boxes

    if _boxes.shape[0] > 0:          # only one box left: keep it directly
        r_boxes.append(_boxes[0].unsqueeze(0))

    r_boxes = torch.cat(r_boxes, dim=0)    # finally concatenate everything we kept

    return r_boxes
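
nms calls an iou helper from utils. A minimal sketch consistent with how it is used here, one box against many, with boxes in (cx, cy, w, h) format (the repo's version may differ in details):

def iou(box, boxes):
    """IoU of one (cx, cy, w, h) box against a [M, 4] batch of them."""
    # convert centers and sizes to corner coordinates
    x1, y1 = box[0] - box[2] / 2, box[1] - box[3] / 2
    x2, y2 = box[0] + box[2] / 2, box[1] + box[3] / 2
    xs1, ys1 = boxes[:, 0] - boxes[:, 2] / 2, boxes[:, 1] - boxes[:, 3] / 2
    xs2, ys2 = boxes[:, 0] + boxes[:, 2] / 2, boxes[:, 1] + boxes[:, 3] / 2

    # intersection rectangle, clamped to zero when there is no overlap
    inter_w = torch.clamp(torch.min(x2, xs2) - torch.max(x1, xs1), min=0)
    inter_h = torch.clamp(torch.min(y2, ys2) - torch.max(y1, ys1), min=0)
    inter = inter_w * inter_h

    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / union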

One thing to note about the boxes NMS returns: they are in the coordinates of the 416×416 input, while the image we fed in can be any size (it was scaled before entering the network), so the boxes still have to be mapped back onto the original image. That is done by the enlarge_box function in utils.
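
enlarge_box undoes the scaling that narrow_image applied. A minimal sketch of the pair, assuming the longer side is scaled to 416 and the image is pasted onto a black 416×416 canvas (my assumption; the repo's version may pad differently):

from PIL import Image

def narrow_image(image):
    """Scale the longer side to 416 and paste onto a black 416x416 canvas."""
    W, H = image.size
    scale = 416 / max(W, H)
    resized = image.resize((int(W * scale), int(H * scale)))
    canvas = Image.new('RGB', (416, 416))
    canvas.paste(resized, (0, 0))
    return W, H, scale, canvas

def enlarge_box(W, H, scale, box):
    """Map boxes from 416x416 coordinates back onto the original image."""
    if box.shape[0] > 0:
        box[:, 1:5] = box[:, 1:5] / scale   # cx, cy, w, h all scale uniformly
    return box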

Finally we have the complete set of predicted boxes; the remaining visualization is handled by the draw function in utils. It needs the file coco.names, which lists the names of the 80 classes in the COCO dataset.
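
draw is a straightforward PIL routine. A minimal sketch, assuming coco.names holds one class name per line (the styling in the repo's version differs):

from PIL import ImageDraw

with open('coco.names') as f:
    CLASS_NAMES = [line.strip() for line in f]

def draw(box, image):
    """Draw each [n, cx, cy, w, h, class, conf] row onto the image."""
    drawer = ImageDraw.Draw(image)
    for b in box:
        cx, cy, w, h = b[1].item(), b[2].item(), b[3].item(), b[4].item()
        x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
        drawer.rectangle([x1, y1, x2, y2], outline='red')
        drawer.text((x1, y1), CLASS_NAMES[int(b[5])], fill='red')
    return image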

That's all the work done; let's look at the results:

(figures: detection results on three test images)
Overall the predictions are about the same as the official ones; I did not benchmark the speed.

Note: I wrote this on a laptop whose GPU couldn't keep up, so the model runs on the CPU. To run on the GPU, only minor changes are needed (move the model and the input tensors to the device, as the commented-out self.net.to(self.device) line suggests).

3. Complete code

The full project layout; it is quite compact:
(figure: the project file layout)
I have put the complete code and the weights file on Baidu Netdisk; take them if you need them:

GitHub: YOLOv3-Detector-pytorch

Link: https://pan.baidu.com/s/1twjsVjjSzr9_v4YMWmls3A
Extraction code: azaa
