YOLOv4: From Understanding to Code Implementation
In this article I will explain the principles of YOLO and its code implementation in detail.
1. What is YOLO?
YOLO (You Only Look Once) is an object-detection algorithm based on convolutional neural networks: a single forward pass of one network predicts both bounding boxes and class probabilities.
2. YOLOv4 Network Structure
YOLOv4 consists of two main parts:
- Backbone: CSPDarknet53
- Neck: SPP + PAN
Detailed structure diagram (for a 608x608 input):
Detailed structure diagram (for a 416x416 input):
The computation details of each layer can be verified from the code below (see also: the computation details of convolutional layers in a CNN).
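As a quick reference for checking those per-layer shapes by hand, the standard convolution output-size formula can be written as a small helper (my own illustrative sketch, not part of the repository's code):

```python
def conv_out_size(in_size, kernel, stride, pad):
    # Standard convolution output-size formula:
    # out = floor((in + 2*pad - kernel) / stride) + 1
    return (in_size + 2 * pad - kernel) // stride + 1

# 'same' padding with a 3x3 stride-1 conv keeps the size (pad = 1)
print(conv_out_size(416, 3, 1, 1))       # 416
# ZeroPadding2D(((1,0),(1,0))) then a 'valid' 3x3 stride-2 conv halves it: 416 -> 417 -> 208
print(conv_out_size(416 + 1, 3, 2, 0))   # 208
```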
2.1 The CSPDarknet53 Network
Explanation of each module:
- DarknetConv2D_BN_Mish: DarknetConv2D + BatchNormalization + Mish (activation function).
DarknetConv2D: the specific parameter settings are as follows:
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)
Mish is the activation function used in YOLOv4. Its formula is:

Mish(x) = x × tanh(ln(1 + e^x))
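As a quick sanity check of the formula, Mish can be evaluated with plain NumPy (a standalone sketch, independent of the Keras layer shown below):

```python
import numpy as np

def mish(x):
    # Mish(x) = x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

print(mish(0.0))  # 0.0 -- Mish passes through the origin
print(np.round(mish(np.array([-1.0, 1.0])), 4))
```

Note that for positive inputs Mish behaves almost like the identity, while for negative inputs it stays bounded and smooth, unlike ReLU's hard cutoff.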
The code is as follows:
class Mish(Layer):
    def __init__(self, **kwargs):
        super(Mish, self).__init__(**kwargs)
        self.supports_masking = True

    # the call method in Python 3:
    # https://blog.csdn.net/weixin_44207181/article/details/90648473
    def call(self, inputs):
        # softplus(x) = ln(1 + e^x)
        return inputs * K.tanh(K.softplus(inputs))

    def get_config(self):
        config = super(Mish, self).get_config()
        return config

    def compute_output_shape(self, input_shape):
        return input_shape
Combining the three sub-modules gives DarknetConv2D_BN_Mish:
def DarknetConv2D_BN_Mish(*args, **kwargs):
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        Mish())
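DarknetConv2D_BN_Mish relies on the `compose` helper imported from utils.utils, which is not shown in this article. A typical reduce-based implementation (my assumption about this repository's version) looks like:

```python
from functools import reduce

def compose(*funcs):
    # Left-to-right function composition:
    # compose(f, g, h)(x) == h(g(f(x)))
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    raise ValueError('Composition of empty sequence not supported.')

# Example: add 1, then double
print(compose(lambda x: x + 1, lambda x: x * 2)(3))  # 8
```

With Keras layers, `compose(conv, bn, act)(x)` therefore applies the convolution first, then batch normalization, then the activation, which matches the module's name.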
- resblock_body: a residual block with a large residual (shortcut) edge.
Code:
def resblock_body(x, num_filters, num_blocks, all_narrow=True):
    # compress height and width
    preconv1 = ZeroPadding2D(((1, 0), (1, 0)))(x)
    preconv1 = DarknetConv2D_BN_Mish(num_filters, (3, 3), strides=(2, 2))(preconv1)
    # create the large residual (shortcut) edge
    shortconv = DarknetConv2D_BN_Mish(num_filters // 2 if all_narrow else num_filters, (1, 1))(preconv1)
    # convolution of the main trunk
    mainconv = DarknetConv2D_BN_Mish(num_filters // 2 if all_narrow else num_filters, (1, 1))(preconv1)
    # 1x1 conv to integrate channels -> 3x3 conv to extract features, with residual connections
    for i in range(num_blocks):
        y = compose(
            DarknetConv2D_BN_Mish(num_filters // 2, (1, 1)),
            DarknetConv2D_BN_Mish(num_filters // 2 if all_narrow else num_filters, (3, 3))
        )(mainconv)
        mainconv = Add()([mainconv, y])
    # 1x1 conv, then stack with the residual edge
    postconv = DarknetConv2D_BN_Mish(num_filters // 2 if all_narrow else num_filters, (1, 1))(mainconv)
    route = Concatenate()([postconv, shortconv])
    # finally integrate the channel count
    return DarknetConv2D_BN_Mish(num_filters, (1, 1))(route)
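Each resblock_body begins with ZeroPadding2D plus a stride-2 convolution, so every block halves the spatial size. A small sketch of the resulting feature-map sizes (assuming a 608x608 or 416x416 input, matching the shapes quoted later in yolo_body):

```python
def feature_size(input_size, num_downsamples):
    # each resblock_body halves height/width via its stride-2 conv
    size = input_size
    for _ in range(num_downsamples):
        size //= 2
    return size

# CSPDarknet53 taps its three output features after the
# 3rd, 4th and 5th downsampling (strides 8, 16, 32)
print([feature_size(608, n) for n in (3, 4, 5)])  # [76, 38, 19]
print([feature_size(416, n) for n in (3, 4, 5)])  # [52, 26, 13]
```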
2.2 Neck: SPP + PAN
The Neck of an object-detection model is mainly used to fuse feature information from feature maps of different scales. Common examples include the FPN used in Mask R-CNN; a figure from the EfficientDet paper illustrates the variants well.
As people chase the mAP metric on the COCO dataset, many elaborate Neck structures have appeared.
The detailed structure diagrams of SPP and PANet are shown below.
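To see what SPP does to the tensor shape: three stride-1, 'same'-padded max pools (windows 13, 9 and 5) are concatenated with the input, so the spatial size is unchanged while the channel count quadruples. A NumPy sketch of that behavior (illustrative only, not the Keras implementation used below):

```python
import numpy as np

def maxpool_same(x, k):
    # stride-1 max pooling with 'same' padding over an (H, W, C) array
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    h, w, c = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].reshape(-1, c).max(axis=0)
    return out

def spp_concat(feat):
    # pool with windows 13, 9, 5 and stack with the identity branch
    return np.concatenate([maxpool_same(feat, k) for k in (13, 9, 5)] + [feat], axis=-1)

feat = np.random.rand(19, 19, 8)
print(spp_concat(feat).shape)  # (19, 19, 32): spatial size kept, channels x4
```

The different pooling windows give the head several effective receptive-field sizes at once, which is the point of the SPP block.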
Code:
from functools import wraps
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D, LeakyReLU, \
BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from nets.CSPdarknet53 import darknet_body
from utils.utils import compose
# ----------------------------------------------------#
#   Single convolution
#   @wraps(Conv2D) keeps Conv2D's __name__ and __doc__ attributes unchanged
# ----------------------------------------------------#
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)
# ------------------------------------------------------#
#   Convolution block
#   DarknetConv2D + BatchNormalization + LeakyReLU
# ------------------------------------------------------#
def DarknetConv2D_BN_Leaky(*args, **kwargs):
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1)
    )
# ---------------------------------------------------#
#   Five consecutive convolutions
# ---------------------------------------------------#
def make_five_convs(x, num_filters):
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    x = DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3))(x)
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    x = DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3))(x)
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    return x
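make_five_convs alternates 1x1 channel-compression convolutions with 3x3 feature-extraction ones. The kernel/filter schedule can be written out as data (a simple illustration of the pattern, not repository code):

```python
def five_convs_spec(num_filters):
    # (kernel_size, filters) for the five DarknetConv2D_BN_Leaky calls
    return [((1, 1), num_filters), ((3, 3), num_filters * 2),
            ((1, 1), num_filters), ((3, 3), num_filters * 2),
            ((1, 1), num_filters)]

print(five_convs_spec(256))
```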
#----------------------------------------------------------#
#   Feature layers -> final outputs
#   CSPdarknet53 + SPP + PANet
#----------------------------------------------------------#
def yolo_body(inputs, num_anchors, num_classes):
    # feed the image through CSPdarknet53 to obtain three feature layers
    feat1, feat2, feat3 = darknet_body(inputs)

    # build the first YOLO head
    # for a 608x608 input, feat3 = (batch_size, 19, 19, 1024)
    # for a 416x416 input, feat3 = (batch_size, 13, 13, 1024)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(feat3)
    P5 = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)
    # SPP structure: max pooling at different scales, then concatenation
    maxpool1 = MaxPooling2D(pool_size=(13, 13), strides=(1, 1), padding='same')(P5)
    maxpool2 = MaxPooling2D(pool_size=(9, 9), strides=(1, 1), padding='same')(P5)
    maxpool3 = MaxPooling2D(pool_size=(5, 5), strides=(1, 1), padding='same')(P5)
    P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)
    P5 = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)

    P5_upsample = compose(DarknetConv2D_BN_Leaky(256, (1, 1)), UpSampling2D(2))(P5)
    P4 = DarknetConv2D_BN_Leaky(256, (1, 1))(feat2)
    P4 = Concatenate()([P4, P5_upsample])
    P4 = make_five_convs(P4, 256)

    P4_upsample = compose(DarknetConv2D_BN_Leaky(128, (1, 1)), UpSampling2D(2))(P4)
    P3 = DarknetConv2D_BN_Leaky(128, (1, 1))(feat1)
    P3 = Concatenate()([P3, P4_upsample])
    P3 = make_five_convs(P3, 128)

    # for a 608x608 input: (76, 76, num_anchors*(num_classes+5))
    P3_output = DarknetConv2D_BN_Leaky(256, (3, 3))(P3)
    P3_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P3_output)

    P3_downsample = ZeroPadding2D(((1, 0), (1, 0)))(P3)
    P3_downsample = DarknetConv2D_BN_Leaky(256, (3, 3), strides=(2, 2))(P3_downsample)
    P4 = Concatenate()([P3_downsample, P4])
    P4 = make_five_convs(P4, 256)

    # (38, 38, num_anchors*(num_classes+5))
    P4_output = DarknetConv2D_BN_Leaky(512, (3, 3))(P4)
    P4_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P4_output)

    P4_downsample = ZeroPadding2D(((1, 0), (1, 0)))(P4)
    P4_downsample = DarknetConv2D_BN_Leaky(512, (3, 3), strides=(2, 2))(P4_downsample)
    P5 = Concatenate()([P4_downsample, P5])
    P5 = make_five_convs(P5, 512)

    # (19, 19, num_anchors*(num_classes+5))
    P5_output = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P5_output)

    return Model(inputs, [P5_output, P4_output, P3_output])
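To double-check the final head shapes: each output predicts num_anchors * (num_classes + 5) channels at strides 8, 16 and 32 relative to the input. A quick sketch (using 3 anchors and 80 COCO classes as example values):

```python
def yolo_head_shapes(input_size, num_anchors, num_classes):
    # 5 = 4 box coordinates + 1 objectness score
    channels = num_anchors * (num_classes + 5)
    return [(input_size // s, input_size // s, channels) for s in (8, 16, 32)]

print(yolo_head_shapes(416, 3, 80))  # [(52, 52, 255), (26, 26, 255), (13, 13, 255)]
print(yolo_head_shapes(608, 3, 80))  # [(76, 76, 255), (38, 38, 255), (19, 19, 255)]
```

Note that yolo_body returns the heads in the order [P5_output, P4_output, P3_output], i.e. coarsest (stride 32) first.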