YOLOv4 Project Notes 7: How the Darknet Model Is Constructed

Swayzzu

Article link: https://blog.csdn.net/Swayzzu/article/details/122261189

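I. Overview

Besides defining a model in an object-oriented way, a model can also be built procedurally, which is the Darknet approach: the information each layer needs (channel count, stride, kernel size, and so on) is written out directly, and the network is parsed from that description. These notes only cover how the model is constructed for this project and do not go into further detail.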

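II. Construction Method

1. Configuration file

In the configuration file, every layer the model needs is recorded as a section that starts with a type name in square brackets, followed by that section's parameters. Under [net] are the parameters of the model as a whole. [convolutional] is a convolutional layer, followed by its parameters. [shortcut] is the shortcut connection used in residual modules. [route] configures other kinds of layer connections: with one value it is the skip connection inside a CSPX module, with two values it is the concatenation of two outputs, and with four values it is the SPP module.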
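To make the format concrete, here is a minimal sketch of a fragment in this style together with a small parser for it. The values in the fragment are illustrative and not copied from the project's cfg file, and `parse_cfg_text` is a hypothetical helper written for this example; it is not the project's `parse_cfg`.

```python
# Illustrative cfg fragment; the values are made up for this example.
SAMPLE_CFG = """
[net]
width=608
height=608
channels=3

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish

# a route with two values concatenates two earlier outputs
[route]
layers = -1,-3
"""


def parse_cfg_text(text):
    """Hypothetical helper: turn cfg text into a list of dicts, one per section."""
    blocks = []
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # a bracketed name opens a new section
            block = {'type': line[1:-1].strip()}
            blocks.append(block)
        else:
            key, value = line.split('=', 1)
            # values stay strings; callers convert them with int() where needed
            block[key.strip()] = value.strip()
    return blocks


blocks = parse_cfg_text(SAMPLE_CFG)
print([b['type'] for b in blocks])  # ['net', 'convolutional', 'route']
```

Each dict keeps its section name under the `type` key, which is the key the layer-building code dispatches on.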

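2. Parsing the layers

For the model as a whole, a ModuleList is created first. A layer is then built for each block in the configuration and appended to the ModuleList.

For every layer, we record its output channel count and its downsampling factor relative to the original image size. Later, when modules are joined together, the output channel count and downsampling factor of any layer can be looked up directly by index.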
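The bookkeeping can be sketched in isolation as follows. This is a simplified illustration that only handles plain convolutional layers; `track_outputs` and the layer list are hypothetical and exist only for this example.

```python
def track_outputs(layer_specs):
    """Hypothetical helper: layer_specs is a list of (filters, stride) pairs."""
    out_filters, out_strides = [], []
    prev_stride = 1  # the input image itself is not downsampled
    for filters, stride in layer_specs:
        prev_stride = prev_stride * stride  # strides multiply along the chain
        out_filters.append(filters)
        out_strides.append(prev_stride)
    return out_filters, out_strides


specs = [(32, 1), (64, 2), (64, 1), (128, 2)]  # made-up layers
out_filters, out_strides = track_outputs(specs)
print(out_filters)             # [32, 64, 64, 128]
print(out_strides)             # [1, 2, 2, 4]
print(608 // out_strides[-1])  # 152, the feature-map side for a 608 input
```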

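① Convolutional layer

The convolutional block bundles the BN layer and the activation function together with the convolution. A Sequential is created, and the convolution, the optional BN layer, and the activation are added to it according to the parameters set in the configuration.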

elif block['type'] == 'convolutional':
    conv_id = conv_id + 1
    batch_normalize = int(block['batch_normalize'])
    filters = int(block['filters'])  # number of kernels, i.e. the output channel count
    kernel_size = int(block['size'])
    stride = int(block['stride'])
    is_pad = int(block['pad'])
    pad = (kernel_size - 1) // 2 if is_pad else 0
    activation = block['activation']
    model = nn.Sequential()
    if batch_normalize:  # with BN: add the conv (no bias) and the BN layer; the activation added below completes the block
        model.add_module('conv{0}'.format(conv_id),
                         nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=False))
        model.add_module('bn{0}'.format(conv_id), nn.BatchNorm2d(filters))
        # model.add_module('bn{0}'.format(conv_id), BN2d(filters))
    else:
        model.add_module('conv{0}'.format(conv_id),
                         nn.Conv2d(prev_filters, filters, kernel_size, stride, pad))
    if activation == 'leaky':
        model.add_module('leaky{0}'.format(conv_id), nn.LeakyReLU(0.1, inplace=True))
    elif activation == 'relu':
        model.add_module('relu{0}'.format(conv_id), nn.ReLU(inplace=True))
    elif activation == 'mish':
        model.add_module('mish{0}'.format(conv_id), Mish())
    else:
        print("no activation added for convolutional layer: {}".format(activation))

    prev_filters = filters
    # record the output channel count of every layer
    out_filters.append(prev_filters)
    prev_stride = stride * prev_stride
    # record how many times every layer is downsampled relative to the 608 input
    out_strides.append(prev_stride)
    models.append(model)
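The padding rule `(kernel_size - 1) // 2` keeps the spatial size unchanged for odd kernels at stride 1. A small sketch with the standard convolution output-size formula (dilation 1) shows this; `conv_out_size` is a hypothetical helper used only for illustration.

```python
def conv_out_size(in_size, kernel_size, stride, is_pad=1):
    """Output side length of a convolution, using the same padding rule as above."""
    pad = (kernel_size - 1) // 2 if is_pad else 0
    return (in_size + 2 * pad - kernel_size) // stride + 1


print(conv_out_size(608, 3, 1))            # 608: 3x3, stride 1 keeps the size
print(conv_out_size(608, 1, 1))            # 608: 1x1 needs no padding
print(conv_out_size(608, 3, 2))            # 304: stride 2 halves the size
print(conv_out_size(608, 3, 1, is_pad=0))  # 606: without padding the map shrinks
```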

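② route layer

This handles layers with special operations. The number of values in a route can only be 1, 2, or 4, corresponding to the skip connection in the CSP module, the concatenation after a branch, and the concatenation in SPP.

A value may be positive or negative. A positive value is an absolute position; a negative value is relative to the current position. For example, if 7 modules already exist when the route is reached (so the route itself will get index 7) and the value is -3, the source is the module at index 7 - 3 = 4. Depending on what the route does, the output channel count and downsampling factor of the current route layer are then recorded.

This part mainly records the output channel count and downsampling factor of the current route and appends an empty module to the ModuleList; the actual behavior has to be implemented in forward.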

elif block['type'] == 'route':
    layers = block['layers'].split(',')
    ind = len(models)
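    # Non-negative values are absolute indices; negative values are relative to the current position.
    layers = [int(i) if int(i) >= 0 else int(i) + ind for i in layers]
    # Update the channel count and downsampling factor seen by the next layer, then add an empty module.
    # len(layers) == 1  ---->  CSPNet skip connection, or a skip from the stage 3/4/5 downsampling outputs
    # len(layers) == 2  ---->  concatenation of two outputs
    # len(layers) == 4  ---->  SPP concatenation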
    if len(layers) == 1:
        if 'groups' not in block.keys() or int(block['groups']) == 1:
            prev_filters = out_filters[layers[0]]
            prev_stride = out_strides[layers[0]]
        else:
            prev_filters = out_filters[layers[0]] // int(block['groups'])
            # groups only splits the channels, so the downsampling factor is unchanged
            prev_stride = out_strides[layers[0]]
    elif len(layers) == 2: # Concat
        assert (layers[0] == ind - 1 or layers[1] == ind - 1)
        prev_filters = out_filters[layers[0]] + out_filters[layers[1]]  # concatenation: the output channel count is the sum of the two
        prev_stride = out_strides[layers[0]]
    elif len(layers) == 4:
        assert (layers[0] == ind - 1)
        prev_filters = out_filters[layers[0]] + out_filters[layers[1]] + out_filters[layers[2]] + \
                       out_filters[layers[3]]
        prev_stride = out_strides[layers[0]]
    else:
        print("route error!!!")

    out_filters.append(prev_filters)
    out_strides.append(prev_stride)
    models.append(EmptyModule())  # the route itself is carried out in the forward pass
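The index arithmetic can be checked on its own. The sketch below uses a hypothetical helper, `resolve_route_layers`, and made-up channel counts; it treats non-negative values as absolute indices and negative values as relative to the route's own index.

```python
def resolve_route_layers(layers_str, ind):
    """Hypothetical helper: ind is the index the route layer itself will get."""
    resolved = []
    for item in layers_str.split(','):
        i = int(item)
        resolved.append(i if i >= 0 else i + ind)
    return resolved


print(resolve_route_layers('-3', 7))     # [4]
print(resolve_route_layers('-1,-3', 7))  # [6, 4]
print(resolve_route_layers('85', 120))   # [85]

# Channel count of a two-way concatenation, with made-up per-layer channel counts
out_filters = [32, 64, 64, 32, 64, 64, 128]
layers = resolve_route_layers('-1,-3', 7)
print(sum(out_filters[i] for i in layers))  # 128 + 64 = 192
```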

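③ shortcut layer

This is the shortcut layer of the residual module. Only the output channel count and the downsampling factor need to be recorded. An empty module is appended to the ModuleList here as well, and the actual behavior is implemented in forward.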

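elif block['type'] == 'shortcut':  # the `from` value locates the source layer
    ind = len(models)
    prev_filters = out_filters[ind - 1]  # output channel count of the previous layer
    out_filters.append(prev_filters)  # a residual connection keeps the channel count of the previous layer
    prev_stride = out_strides[ind - 1]
    out_strides.append(prev_stride)  # the downsampling factor is also the same as the previous layer's
    models.append(EmptyModule())  # only a placeholder; the addition itself is done in forward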

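④ upsample layer

The model upsamples in two places, both during feature fusion. As before, the output channel count and the downsampling factor are recorded. The module itself is a small new class that stretches the last two dimensions of the data.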

elif block['type'] == 'upsample':
    stride = int(block['stride'])
    out_filters.append(prev_filters)
    prev_stride = prev_stride // stride
    out_strides.append(prev_stride)

    models.append(Upsample_expand(stride))

class Upsample_expand(nn.Module):
    def __init__(self, stride=2):
        super(Upsample_expand, self).__init__()
        self.stride = stride

    def forward(self, x):
        assert (x.data.dim() == 4)
        x = x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1).\
            expand(x.size(0), x.size(1), x.size(2), self.stride, x.size(3), self.stride).contiguous().\
            view(x.size(0), x.size(1), x.size(2) * self.stride, x.size(3) * self.stride)

        return x
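The view/expand/view sequence amounts to nearest-neighbor upsampling: every value is repeated `stride` times along the height and the width. A pure-Python sketch on a 2-D list shows the effect on a single channel; `upsample_nearest` is a hypothetical stand-in and does not use tensors.

```python
def upsample_nearest(grid, stride=2):
    """Repeat every element of a 2-D list `stride` times along both axes."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(stride)]  # stretch the width
        for _ in range(stride):                         # stretch the height
            out.append(list(wide))
    return out


small = [[1, 2],
         [3, 4]]
for row in upsample_nearest(small):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```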

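3. The forward pass

Here the parsed model is used to take the input x, step by step, to the final three outputs. The route and shortcut layers from above are resolved here. No loss is computed inside the model, because the loss is implemented outside it, so the loss layers are simply empty modules. Once the outputs are obtained, the external loss function computes the loss from them.

① Flow

forward iterates over all the blocks parsed earlier. It first creates an output dictionary that records the output of every module. For each block, the module at the matching position is looked up with self.models[ind] and x is passed through it, which gives the output at that position. Every step works the same way: fetch the module, feed it x, and store the result in the dictionary under the current index.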

ind = -2  # the [net] block brings this to -1, so the first real layer gets index 0
outputs = dict()
out_boxes = []
for block in self.blocks:
    ind = ind + 1
    # if ind > 0:
    #    return x

    if block['type'] == 'net':
        continue
    elif block['type'] in ['convolutional', 'maxpool', 'reorg', 'upsample', 'avgpool', 'softmax', 'connected']:
        x = self.models[ind](x)
        outputs[ind] = x
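The indexing in this loop can be tried without any real layers. In the toy sketch below, plain functions stand in for modules; `run_blocks` and the sample data are hypothetical.

```python
def run_blocks(blocks, layers, x):
    """Toy version of the loop: `layers` holds plain functions instead of modules."""
    ind = -2  # the [net] block moves this to -1, so the first layer gets index 0
    outputs = {}
    for block in blocks:
        ind += 1
        if block['type'] == 'net':
            continue
        x = layers[ind](x)  # run the layer at the current position
        outputs[ind] = x    # remember its output for later route/shortcut lookups
    return outputs


blocks = [{'type': 'net'}, {'type': 'convolutional'}, {'type': 'convolutional'}]
layers = [lambda v: v + 1, lambda v: v * 10]  # stand-ins for real modules
result = run_blocks(blocks, layers, 1)
print(result)  # {0: 2, 1: 20}
```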

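② route part

By this point the earlier outputs have already been recorded in outputs and can be fetched by index.

Suppose the current index is 7 and the route value is -3. A single value means a skip connection, so the output at index 4 is looked up, and that output becomes the output at index 7. It is simply stored in outputs.

Suppose the current index is 7 and the route values are -3,-5. Two values mean a concatenation, so the outputs at indices 4 and 2 are looked up and concatenated, and the result is the output at index 7.

Likewise, with four values the outputs at the four resolved indices are looked up and concatenated.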

elif block['type'] == 'route':
    layers = block['layers'].split(',')
    layers = [int(i) if int(i) >= 0 else int(i) + ind for i in layers]  # non-negative: absolute index; negative: relative to ind
    if len(layers) == 1:
        if 'groups' not in block.keys() or int(block['groups']) == 1:
            x = outputs[layers[0]]
            outputs[ind] = x
        else:
            groups = int(block['groups'])
            group_id = int(block['group_id'])
            _, channels, _, _ = outputs[layers[0]].shape  # dimension 1 is the channel dimension
            x = outputs[layers[0]][:, channels // groups * group_id:channels // groups * (group_id + 1)]
            outputs[ind] = x
    elif len(layers) == 2:
        x1 = outputs[layers[0]]
        x2 = outputs[layers[1]]
        x = torch.cat((x1, x2), 1)
        outputs[ind] = x
    elif len(layers) == 4:
        x1 = outputs[layers[0]]
        x2 = outputs[layers[1]]
        x3 = outputs[layers[2]]
        x4 = outputs[layers[3]]
        x = torch.cat((x1, x2, x3, x4), 1)
        outputs[ind] = x
    else:
        print("unsupported number of route layers: {}".format(len(layers)))
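For the grouped case, the slice bounds are easier to see with concrete numbers. The sketch below computes the channel range that a grouped route keeps and uses plain lists to mimic channel-wise concatenation; `group_slice` and the sample values are hypothetical.

```python
def group_slice(channels, groups, group_id):
    """Return the (start, stop) channel range that a grouped route keeps."""
    per_group = channels // groups
    return per_group * group_id, per_group * (group_id + 1)


print(group_slice(64, 2, 0))  # (0, 32): first half of the channels
print(group_slice(64, 2, 1))  # (32, 64): second half of the channels

# Concatenation along the channel axis, with lists of channel names standing in for tensors
x1 = ['a0', 'a1']
x2 = ['b0', 'b1', 'b2']
merged = x1 + x2
print(len(merged))  # 5 channels: 2 + 3
```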

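③ shortcut part

This is the shortcut operation of the residual module. For example, if the current index is 7 and the shortcut comes from layer 5, the output of layer 5 is fetched and added element-wise to the previous output, that of layer 6; the sum is the output at index 7. If an activation function is configured, it is applied, and the result is stored in the outputs dictionary.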

elif block['type'] == 'shortcut':
    from_layer = int(block['from'])
    activation = block['activation']
    from_layer = from_layer if from_layer >= 0 else from_layer + ind  # negative values are relative to ind
    x1 = outputs[from_layer]
    x2 = outputs[ind - 1]
    x = x1 + x2
    if activation == 'leaky':
        x = F.leaky_relu(x, 0.1, inplace=True)
    elif activation == 'relu':
        x = F.relu(x, inplace=True)
    outputs[ind] = x

III. Model construction code

Only the model part is listed here; the methods for loading pretrained weights and for saving are not included.

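import torch
import torch.nn as nn
import torch.nn.functional as F


# parse_cfg is a project helper and create_network is a method of this class;
# neither is listed in full in this post.
class Darknet(nn.Module):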
    def __init__(self, cfgfile, inference=False):
        super(Darknet, self).__init__()
        self.inference = inference
        self.training = not self.inference

        self.blocks = parse_cfg(cfgfile)
        self.width = int(self.blocks[0]['width'])
        self.height = int(self.blocks[0]['height'])

        self.models = self.create_network(self.blocks)  # merge conv, bn,leaky
        #self.loss = self.models[len(self.models) - 1]

        # if self.blocks[(len(self.blocks) - 1)]['type'] == 'region':
        #     self.anchors = self.loss.anchors
        #     self.num_anchors = self.loss.num_anchors
        #     self.anchor_step = self.loss.anchor_step
        #     self.num_classes = self.loss.num_classes

        self.header = torch.IntTensor([0, 0, 0, 0])
        self.seen = 0

    def forward(self, x):
        ind = -2
        self.loss = None
        outputs = dict()
        out_boxes = []
        for block in self.blocks:
            ind = ind + 1
            # if ind > 0:
            #    return x

            if block['type'] == 'net':
                continue
            elif block['type'] in ['convolutional', 'maxpool', 'reorg', 'upsample', 'avgpool', 'softmax', 'connected']:
                x = self.models[ind](x)
                outputs[ind] = x
            elif block['type'] == 'route':
                layers = block['layers'].split(',')
                layers = [int(i) if int(i) >= 0 else int(i) + ind for i in layers]  # non-negative: absolute index; negative: relative to ind
                if len(layers) == 1:
                    if 'groups' not in block.keys() or int(block['groups']) == 1:
                        x = outputs[layers[0]]
                        outputs[ind] = x
                    else:
                        groups = int(block['groups'])
                        group_id = int(block['group_id'])
                        _, channels, _, _ = outputs[layers[0]].shape  # dimension 1 is the channel dimension
                        x = outputs[layers[0]][:, channels // groups * group_id:channels // groups * (group_id + 1)]
                        outputs[ind] = x
                elif len(layers) == 2:
                    x1 = outputs[layers[0]]
                    x2 = outputs[layers[1]]
                    x = torch.cat((x1, x2), 1)
                    outputs[ind] = x
                elif len(layers) == 4:
                    x1 = outputs[layers[0]]
                    x2 = outputs[layers[1]]
                    x3 = outputs[layers[2]]
                    x4 = outputs[layers[3]]
                    x = torch.cat((x1, x2, x3, x4), 1)
                    outputs[ind] = x
                else:
                    print("unsupported number of route layers: {}".format(len(layers)))

            elif block['type'] == 'shortcut':
                from_layer = int(block['from'])
                activation = block['activation']
                from_layer = from_layer if from_layer >= 0 else from_layer + ind  # negative values are relative to ind
                x1 = outputs[from_layer]
                x2 = outputs[ind - 1]
                x = x1 + x2
                if activation == 'leaky':
                    x = F.leaky_relu(x, 0.1, inplace=True)
                elif activation == 'relu':
                    x = F.relu(x, inplace=True)
                outputs[ind] = x
            elif block['type'] == 'region':
                continue
            elif block['type'] == 'yolo':
                boxes = self.models[ind](x)
                out_boxes.append(boxes)
            elif block['type'] == 'cost':
                continue
            else:
                print('unknown type %s' % (block['type']))

        # if self.training:
        #     return out_boxes
        # else:
        #     return get_region_boxes(out_boxes)
        return out_boxes