Building the YOLOv4 Network Structure

I. Basic convolution block: Conv + BN + Mish
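
The BasicConv block below relies on a Mish() activation and on torch / nn / math imports that do not appear in this snippet. A minimal sketch of the missing pieces, using the standard Mish formula x * tanh(softplus(x)), might look like this:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

#---------------------------------------------------#
#   Mish activation: x * tanh(softplus(x))
#   (a sketch of the activation the blocks below rely on)
#---------------------------------------------------#
class Mish(nn.Module):
    def __init__(self):
        super(Mish, self).__init__()

    def forward(self, x):
        return x * torch.tanh(F.softplus(x))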

#---------------------------------------------------#
#   Convolution block -> convolution + normalization + activation
#   Conv2d + BatchNormalization + Mish
#---------------------------------------------------#
class BasicConv(nn.Module): # a basic convolution block
    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super(BasicConv, self).__init__()

        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, kernel_size//2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.activation = Mish()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.activation(x)
        return x

II. Residual block

#---------------------------------------------------#
#   Component of the CSPDarknet structural block:
#   the residual blocks stacked inside it
#---------------------------------------------------#
class Resblock(nn.Module):
    def __init__(self, channels, hidden_channels=None):
        super(Resblock, self).__init__()

        '''hidden_channels exists to handle the residual block in DownSample1, whose two
        convolutions have different channel counts; in the other DownSample modules the
        two convolutions of a residual block use the same channel count.'''
        if hidden_channels is None:
            hidden_channels = channels

        self.block = nn.Sequential(
            BasicConv(channels, hidden_channels, kernel_size=1),
            BasicConv(hidden_channels, channels, kernel_size=3)
        )

    def forward(self, x):
        return x + self.block(x)
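
Since the block output is added element-wise to its input, a Resblock never changes the tensor shape. A quick sanity check (assuming the classes above are defined):

res = Resblock(64)
x = torch.randn(1, 64, 104, 104)
print(res(x).shape)   # torch.Size([1, 64, 104, 104]), same shape as the input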

III. CSPDarknet53 network structure

k denotes the kernel size, s the stride, and c the number of output channels of the feature map produced by the module.

1. The DownSample module

#--------------------------------------------------------------------#
#   The CSPDarknet structural block (DownSample module)
#   First, a convolution block with stride 2 halves the height and width
#   Then a large residual branch (split_conv0 below) is created; it bypasses all of the stacked residual blocks
#   The trunk loops num_blocks times, and each iteration is a residual block
#   So the whole CSPDarknet structural block is one large residual block wrapped around several small ones
#   The argument first tells whether this is the first DownSample module,
#   which differs slightly from the later ones
#--------------------------------------------------------------------#
class Resblock_body(nn.Module):
    def __init__(self, in_channels, out_channels, num_blocks, first):
        super(Resblock_body, self).__init__()
        #----------------------------------------------------------------#
        #   A convolution block with stride 2 halves the height and width
        #----------------------------------------------------------------#
        self.downsample_conv = BasicConv(in_channels, out_channels, 3, stride=2)

        if first: # first tells whether this is the first DownSample module, which differs slightly from the later ones
            #   Build the large residual branch self.split_conv0, which bypasses all of the stacked residual blocks
            #   Its output channels equal its input channels
            self.split_conv0 = BasicConv(out_channels, out_channels, 1)

            #   The trunk loops over num_blocks residual blocks
            self.split_conv1 = BasicConv(out_channels, out_channels, 1)
            self.blocks_conv = nn.Sequential(
                Resblock(channels=out_channels, hidden_channels=out_channels//2), # this Resblock first reduces then restores the channel count
                BasicConv(out_channels, out_channels, 1)
            )

            # After the concat the channel count is out_channels*2; the following Conv+BN+Mish brings it back to out_channels
            self.concat_conv = BasicConv(out_channels*2, out_channels, 1)

        else: # corresponds to DownSample2, DownSample3, DownSample4 and DownSample5
            #   Build the large residual branch self.split_conv0, which bypasses all of the stacked residual blocks
            #   In DownSample2-5 its output channels are half of its input channels
            self.split_conv0 = BasicConv(out_channels, out_channels//2, 1)

            #   The trunk loops over num_blocks residual blocks
            self.split_conv1 = BasicConv(out_channels, out_channels//2, 1)
            self.blocks_conv = nn.Sequential(
                *[Resblock(out_channels//2) for _ in range(num_blocks)],
                BasicConv(out_channels//2, out_channels//2, 1)
            )
            # After the concat the channel count is out_channels; the following Conv+BN+Mish keeps it at out_channels
            self.concat_conv = BasicConv(out_channels, out_channels, 1)

    def forward(self, x): # forward pass
        x = self.downsample_conv(x)

        x0 = self.split_conv0(x)
        x1 = self.split_conv1(x)
        x1 = self.blocks_conv(x1)

        #------------------------------------#
        #   Stack the large residual branch back on
        #------------------------------------#
        x = torch.cat([x1, x0], dim=1)
        #------------------------------------#
        #   Finally, fuse the channels
        #------------------------------------#
        x = self.concat_conv(x)

        return x
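
A quick shape check of the first DownSample block, assuming the BasicConv and Resblock classes above are defined:

# 416,416,32 -> 208,208,64: the stride-2 convolution halves the spatial size and raises the channels to out_channels
block = Resblock_body(32, 64, num_blocks=1, first=True)
x = torch.randn(1, 32, 416, 416)
print(block(x).shape)   # torch.Size([1, 64, 208, 208])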

2. The CSPDarknet53 backbone

#---------------------------------------------------#
#   The main body of CSPDarknet53
#   The input is a 416x416x3 image
#   The output is three effective feature maps
#   Since model = CSPDarkNet([1, 2, 8, 8, 4]) is created at the end,
#   the layers argument of CSPDarkNet corresponds to the list [1, 2, 8, 8, 4]:
#   layers[0]=1 layers[1]=2 layers[2]=8 layers[3]=8 layers[4]=4
#---------------------------------------------------#
class CSPDarkNet(nn.Module):
    def __init__(self, layers):
        super(CSPDarkNet, self).__init__()
        self.inplanes = 32
        # 416,416,3 -> 416,416,32
        self.conv1 = BasicConv(3, self.inplanes, kernel_size=3, stride=1) # the first convolution block applied to the input image
        self.feature_channels = [64, 128, 256, 512, 1024] # output channels of each DownSample module

        self.stages = nn.ModuleList([
            # Resblock_body takes four arguments: in_channels, out_channels, num_blocks, first
            # 416,416,32 -> 208,208,64
            Resblock_body(self.inplanes, self.feature_channels[0], layers[0], first=True),
            # 208,208,64 -> 104,104,128
            Resblock_body(self.feature_channels[0], self.feature_channels[1], layers[1], first=False),
            # 104,104,128 -> 52,52,256
            Resblock_body(self.feature_channels[1], self.feature_channels[2], layers[2], first=False),
            # 52,52,256 -> 26,26,512
            Resblock_body(self.feature_channels[2], self.feature_channels[3], layers[3], first=False),
            # 26,26,512 -> 13,13,1024
            Resblock_body(self.feature_channels[3], self.feature_channels[4], layers[4], first=False)
        ])

        self.num_features = 1
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()


    def forward(self, x):
        x = self.conv1(x)

        x = self.stages[0](x)
        x = self.stages[1](x)
        out3 = self.stages[2](x)
        out4 = self.stages[3](out3)
        out5 = self.stages[4](out4)

        return out3, out4, out5 # the three prediction feature maps

def darknet53(pretrained, **kwargs):
    model = CSPDarkNet([1, 2, 8, 8, 4]) # the number of Resblocks stacked inside each DownSample module
    if pretrained:
        if isinstance(pretrained, str):
            model.load_state_dict(torch.load(pretrained))
        else:
            raise Exception("darknet requires a pretrained path, got [{}]".format(pretrained))
    return model
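
A sanity check of the backbone output shapes (passing None for pretrained skips weight loading):

model = darknet53(None)
x = torch.randn(1, 3, 416, 416)
out3, out4, out5 = model(x)
print(out3.shape, out4.shape, out5.shape)
# torch.Size([1, 256, 52, 52]) torch.Size([1, 512, 26, 26]) torch.Size([1, 1024, 13, 13])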

IV. Building the YOLOv4 network

1. Convolution block: Conv + BN + LeakyReLU

#   Convolution block -> convolution + normalization + activation
#   Conv2d + BatchNormalization + LeakyReLU
class ConvBNLeaky(nn.Module): # a basic convolution block
    def __init__(self, filter_in, filter_out, kernel_size, stride=1):
        super(ConvBNLeaky, self).__init__()

        pad = (kernel_size - 1) // 2 if kernel_size else 0
        self.conv = nn.Conv2d(filter_in, filter_out, kernel_size, stride, padding=pad, bias=False)
        self.bn = nn.BatchNorm2d(filter_out)
        self.activation = nn.LeakyReLU(0.1)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.activation(x)
        return x

2. Three-convolution module

 

#   Three consecutive convolutions; filters_list is a list that conveniently defines the input/output channels of each convolution
#   ConvSet1 takes 1024 input channels, ConvSet2 takes 2048 input channels
#   ConvSet1([512,1024],1024) ConvSet2([512,1024],2048)
def make_three_conv(filters_list, in_filters):
    m = nn.Sequential(
        ConvBNLeaky(in_filters, filters_list[0], kernel_size=1),
        ConvBNLeaky(filters_list[0], filters_list[1], kernel_size=3),
        ConvBNLeaky(filters_list[1], filters_list[0], kernel_size=1),
    )
    return m

3. Five-convolution module

#   Five consecutive convolutions; filters_list is a list that conveniently defines the input/output channels of each convolution
#   1x1 and 3x3 convolutions alternate, which keeps the parameter count down while still extracting useful features
#   ConvSet3 takes 512 input channels, ConvSet4 takes 256 input channels
#   ConvSet5 takes 512 input channels, ConvSet6 takes 1024 input channels
#   ConvSet3([256, 512],512) ConvSet4([128, 256],256)
#   ConvSet5([256, 512],512) ConvSet6([512, 1024],1024)
def make_five_conv(filters_list, in_filters):
    m = nn.Sequential(
        ConvBNLeaky(in_filters, filters_list[0], kernel_size=1),
        ConvBNLeaky(filters_list[0], filters_list[1], kernel_size=3),
        ConvBNLeaky(filters_list[1], filters_list[0], kernel_size=1),
        ConvBNLeaky(filters_list[0], filters_list[1], kernel_size=3),
        ConvBNLeaky(filters_list[1], filters_list[0], kernel_size=1),
    )
    return m
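
Both helpers end with a 1x1 convolution down to filters_list[0], so the output channel count is always filters_list[0] while the spatial size is preserved. A quick check using the ConvSet1 and ConvSet3 configurations from the comments above:

conv_set1 = make_three_conv([512, 1024], 1024)   # 13,13,1024 -> 13,13,512
conv_set3 = make_five_conv([256, 512], 512)      # 26,26,512  -> 26,26,256
print(conv_set1(torch.randn(1, 1024, 13, 13)).shape)  # torch.Size([1, 512, 13, 13])
print(conv_set3(torch.randn(1, 512, 26, 26)).shape)   # torch.Size([1, 256, 26, 26])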

4. SPP module

#   SPP structure: max pooling with kernels of different sizes, 5x5, 9x9 and 13x13
#   First build a max-pooling layer with kernel_size=5, stride=1, padding=2
#   Then one with kernel_size=9, stride=1, padding=4
#   Then one with kernel_size=13, stride=1, padding=6
#   The pooled results are stacked together
#---------------------------------------------------#
class SpatialPyramidPooling(nn.Module):
    def __init__(self, pool_sizes=[5, 9, 13]):
        super(SpatialPyramidPooling, self).__init__()

        self.maxpools = nn.ModuleList([nn.MaxPool2d(kernel_size=pool_size, stride=1, padding=pool_size//2) for pool_size in pool_sizes])

    def forward(self, x):
        features = [maxpool(x) for maxpool in self.maxpools[::-1]]
        features = torch.cat(features + [x], dim=1) # x is the un-pooled input feature map

        return features
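
Because the stride is 1 and the padding is pool_size//2, every branch keeps the spatial size, so concatenating the three pooled maps with the input quadruples the channel count (512 -> 2048 here). A quick check:

spp = SpatialPyramidPooling()
x = torch.randn(1, 512, 13, 13)
print(spp(x).shape)   # torch.Size([1, 2048, 13, 13])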

5. Convolution + upsampling module (used in two places)

#   Convolution + upsampling
#---------------------------------------------------#
class ConvUpsample(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ConvUpsample, self).__init__()

        self.upsample = nn.Sequential(
            ConvBNLeaky(in_channels, out_channels, kernel_size=1),
            nn.Upsample(scale_factor=2, mode='nearest')
        )

    def forward(self, x):
        x = self.upsample(x)
        return x
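
In both places where this module is used, the 1x1 convolution halves the channel count and the nearest-neighbour upsampling doubles the spatial size, for example:

up = ConvUpsample(512, 256)
x = torch.randn(1, 512, 13, 13)
print(up(x).shape)   # torch.Size([1, 256, 26, 26])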

6. yolo_head (there are three)

 

 

#   Produces the YOLOv4 outputs; filters_list is a list that conveniently defines the input/output channels of each convolution
#   yolo_head1, 13x13, ([1024, len(anchors_mask[2]) * (5 + num_classes)], 512)
#   yolo_head2, 26x26, ([512, len(anchors_mask[1]) * (5 + num_classes)], 256)
#   yolo_head3, 52x52, ([256, len(anchors_mask[0]) * (5 + num_classes)], 128)
#---------------------------------------------------#
def yolo_head(filters_list, in_filters):
    m = nn.Sequential(
        ConvBNLeaky(in_filters, filters_list[0], 3),
        nn.Conv2d(filters_list[0], filters_list[1], 1),
    )
    return m
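
Each head is a 3x3 ConvBNLeaky followed by a plain 1x1 convolution that maps to len(anchors_mask[i]) * (5 + num_classes) channels; for a 20-class dataset with 3 anchors per scale that is 75. For example:

head = yolo_head([1024, 75], 512)   # 75 = 3 * (5 + 20)
print(head(torch.randn(1, 512, 13, 13)).shape)   # torch.Size([1, 75, 13, 13])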

7. Assembling the full YOLOv4 network

#   yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes):
        super(YoloBody, self).__init__()
        #---------------------------------------------------#
        #   Build the CSPDarknet53 backbone,
        #   which provides three effective feature maps with shapes:
        #   52,52,256
        #   26,26,512
        #   13,13,1024
        #---------------------------------------------------#
        self.backbone = darknet53(None)

        self.conv1      = make_three_conv([512,1024],1024)
        self.SPP        = SpatialPyramidPooling()
        self.conv2      = make_three_conv([512,1024],2048)

        self.upsample1          = ConvUpsample(512,256)

        # 26, 26, 512 -> 26, 26, 256
        self.conv_for_P4        = ConvBNLeaky(512,256,1) # the output of DownSample4 goes through a ConvBNLeaky
        self.make_five_conv1    = make_five_conv([256, 512],512)

        # 26, 26, 256 -> 52, 52, 128
        self.upsample2          = ConvUpsample(256,128)

        self.conv_for_P3        = ConvBNLeaky(256,128,1) # the output of DownSample3 goes through a ConvBNLeaky
        self.make_five_conv2    = make_five_conv([128, 256],256)

        # 3*(5+num_classes) = 3*(5+20) = 3*(4+1+20) = 75
        self.yolo_head3         = yolo_head([256, len(anchors_mask[0]) * (5 + num_classes)],128)

        self.down_sample1       = ConvBNLeaky(128,256,3,stride=2)
        self.make_five_conv3    = make_five_conv([256, 512],512)

        # 3*(5+num_classes) = 3*(5+20) = 3*(4+1+20) = 75
        self.yolo_head2         = yolo_head([512, len(anchors_mask[1]) * (5 + num_classes)],256)

        self.down_sample2       = ConvBNLeaky(256,512,3,stride=2)
        self.make_five_conv4    = make_five_conv([512, 1024],1024)

        # 3*(5+num_classes) = 3*(5+20) = 3*(4+1+20) = 75
        self.yolo_head1         = yolo_head([1024, len(anchors_mask[2]) * (5 + num_classes)],512)


    def forward(self, x):
        #  backbone: x2, x1, x0 correspond to the three prediction feature maps out3, out4, out5
        x2, x1, x0 = self.backbone(x)

        # 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512 -> 13,13,2048
        P5 = self.conv1(x0) # feature map x0 goes through ConvSet1
        P5 = self.SPP(P5) # then through SPP
        # 13,13,2048 -> 13,13,512 -> 13,13,1024 -> 13,13,512
        P5 = self.conv2(P5) # then through ConvSet2

        # 13,13,512 -> 13,13,256 -> 26,26,256
        P5_upsample = self.upsample1(P5) # then through ConvUpsample
        # 26,26,512 -> 26,26,256
        P4 = self.conv_for_P4(x1) # feature map x1 goes through a ConvBNLeaky
        # 26,26,256 + 26,26,256 -> 26,26,512
        P4 = torch.cat([P4,P5_upsample],dim=1) # concatenate P4 and P5_upsample
        # 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256
        P4 = self.make_five_conv1(P4) # then through ConvSet3

        # 26,26,256 -> 26,26,128 -> 52,52,128
        P4_upsample = self.upsample2(P4) # then through ConvUpsample
        # 52,52,256 -> 52,52,128
        P3 = self.conv_for_P3(x2) # feature map x2 goes through a ConvBNLeaky
        # 52,52,128 + 52,52,128 -> 52,52,256
        P3 = torch.cat([P3,P4_upsample],dim=1) # concatenate P3 and P4_upsample
        # 52,52,256 -> 52,52,128 -> 52,52,256 -> 52,52,128 -> 52,52,256 -> 52,52,128
        P3 = self.make_five_conv2(P3) # then through ConvSet4, giving the 52x52 feature map for small objects

        # 52,52,128 -> 26,26,256
        P3_downsample = self.down_sample1(P3) # downsample P3 with a stride-2 ConvBNLeaky
        # 26,26,256 + 26,26,256 -> 26,26,512
        P4 = torch.cat([P3_downsample,P4],dim=1) # concatenate P3_downsample and P4
        # 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256
        P4 = self.make_five_conv3(P4) # then through ConvSet5, giving the 26x26 feature map for medium objects

        # 26,26,256 -> 13,13,512
        P4_downsample = self.down_sample2(P4) # downsample P4 with a stride-2 ConvBNLeaky
        # 13,13,512 + 13,13,512 -> 13,13,1024
        P5 = torch.cat([P4_downsample,P5],dim=1) # concatenate P4_downsample and P5
        # 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512
        P5 = self.make_five_conv4(P5) # then through ConvSet6, giving the 13x13 feature map for large objects

        #---------------------------------------------------#
        #   Third output feature map
        #   y3 = (batch_size, 75, 52, 52)
        #---------------------------------------------------#
        out2 = self.yolo_head3(P3)
        #---------------------------------------------------#
        #   Second output feature map
        #   y2 = (batch_size, 75, 26, 26)
        #---------------------------------------------------#
        out1 = self.yolo_head2(P4)
        #---------------------------------------------------#
        #   First output feature map
        #   y1 = (batch_size, 75, 13, 13)
        #---------------------------------------------------#
        out0 = self.yolo_head1(P5)

        return out0, out1, out2
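
An end-to-end shape check of the assembled network, assuming a VOC-style setup with 20 classes and the usual three anchors per scale (the anchors_mask layout below is an assumption; only its lengths matter here):

anchors_mask = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   # assumed mask layout, three anchors per scale
net = YoloBody(anchors_mask, num_classes=20)
x = torch.randn(1, 3, 416, 416)
out0, out1, out2 = net(x)
print(out0.shape, out1.shape, out2.shape)
# torch.Size([1, 75, 13, 13]) torch.Size([1, 75, 26, 26]) torch.Size([1, 75, 52, 52])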

References

YOLOv4网络详解, 太阳花的小绿豆 (CSDN blog)

Pytorch 搭建自己的YoloV4目标检测平台 (Bubbliiiing 深度学习 教程), bilibili
