YOLO V2 -- Study Notes

DIAJEY

Category: YOLO. Tags: computer vision

Article link: https://blog.csdn.net/DIAJEY/article/details/115534092

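YOLO V2 (YOLO 9000)
Reference video tutorial: Object Detection Basics: the YOLO Model Family (Theory and Code Reproduction) -- PULSE_

Reference article for the code walkthrough: https://zhuanlan.zhihu.com/p/89794762

Improvements (compared with V1)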

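1. Batch normalization (BN) is added after every convolutional layer. BN normalizes each layer's activations to a stable range, which stabilizes training and acts as a regularizer. mAP improves by about 2%, and Dropout is removed

2. The pretrained classification CNN is fine-tuned on high-resolution images before detection training, which raises mAP by about 4% (a higher-resolution image carries more information)

3. Instead of predicting (x, y, w, h) directly, the network predicts offsets relative to anchor boxes. Each grid cell is assigned n anchor boxes. During training, only the box closest to the ground truth (highest IOU) is responsible for the object; the other boxes produce no coordinate or class loss for it

4. The anchor box widths and heights are not set by hand. Instead, k-means clustering is run on the widths and heights of the ground-truth boxes in the training set to obtain the prior boxes. For example, with 5 anchor boxes the number of k-means cluster centers is set to 5 (roughly, the width/height combinations that occur most often become the anchor sizes)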

5. To avoid losing fine-grained features, the passthrough layer rearranges each 26x26x1 feature map into a 13x13x4 feature map

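6. Because there are no FC layers, YOLO V2 can accept inputs of different sizes, which makes multi-scale training possible

	·Every 10 batches, the images are resized to one of (320, 352, ..., 608); all are multiples of 32 because the network downsamples by a factor of 32
	·Suppose the number of anchor boxes is 5
	·With a 320*320 input, the output grid is 10*10, giving 500 predictions in total
	·With a 608*608 input, the output grid is 19*19, giving 1805 predictions in total
	·With multi-scale training, larger inputs are generally used to detect small objects and therefore have more grid cells, while smaller inputs are used to detect larger objects and have fewer grid cells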
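The grid-size arithmetic in the bullets above can be checked with a short sketch. The names DOWNSAMPLE, num_predictions and pick_size are illustrative and do not come from the original code, and drawing the new size uniformly at random is an assumption about how the resize is scheduled.

```python
import random

DOWNSAMPLE = 32        # total stride of the network (five 2x poolings)
BOXES_PER_CELL = 5     # number of anchor boxes per grid cell
SIZES = list(range(320, 608 + 1, DOWNSAMPLE))   # 320, 352, ..., 608


def num_predictions(input_size, boxes_per_cell=BOXES_PER_CELL):
    """Return (grid side length, total number of predicted boxes)."""
    grid = input_size // DOWNSAMPLE
    return grid, grid * grid * boxes_per_cell


def pick_size(batch_index, current_size, rng=random):
    """Every 10 batches draw a new input size; otherwise keep the current one."""
    if batch_index % 10 == 0:
        return rng.choice(SIZES)
    return current_size


for size in (320, 608):
    grid, total = num_predictions(size)
    print(size, grid, total)
```

Because every candidate size is a multiple of 32, the grid side length is always an integer.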

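[Figure: pyramid structure]
The multi-scale scheme in V2 resembles the structure in the figure above. This pyramid structure is simple: large feature maps detect small objects and small feature maps detect large objects (but some information may be lost, and features from different scales are not fused well)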

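YOLO V2 Training

YOLO v2 is also known as YOLO 9000 (strictly, YOLO 9000 is the variant trained jointly on detection and classification data). It can classify many times more categories than v1, so it is also trained differently.

Principle: the detection set contains the class "person". Once the network can draw a box around a person, the classification dataset is used to refine the class of the object inside the box, e.g. man, woman, or child

Training method: first train on the detection set for a number of epochs; once the box-prediction loss has largely stabilized, train on the classification set and the detection set alternately
·When the input comes from the detection set, the labels include both class and location, so the full loss function is computed and backpropagated (trains detection and classification)
·When the input comes from the classification set, only the classification loss is computed and the remaining loss terms are zero (trains classification only)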
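A minimal sketch of this alternating scheme follows. The function names are hypothetical, and the fixed warmup_steps threshold is an assumption that stands in for "the box loss has largely stabilized", which a real training loop would need to measure.

```python
def joint_loss(coord_loss, conf_loss, class_loss, from_detection_set):
    """Return the loss to backpropagate for one batch."""
    if from_detection_set:
        # Detection labels carry both box and class, so every term counts.
        return coord_loss + conf_loss + class_loss
    # Classification labels carry no box, so only the class term counts.
    return class_loss


def batch_source(step, warmup_steps):
    """Detection only during warm-up, then alternate between the two sets."""
    if step < warmup_steps:
        return "detection"
    return "detection" if (step - warmup_steps) % 2 == 0 else "classification"


print([batch_source(s, warmup_steps=3) for s in range(7)])
```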

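Tree structure

Purpose: handle classes that may contain one another (person and man) or overlap (athlete and man)
For coarse classes, the loss is computed only at the parent nodes
For fine classes, the loss is computed only at the child nodes (classify into a coarse class first, then refine into a fine class)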
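To make the coarse-then-fine idea concrete, here is a sketch of how a class tree can turn per-node scores into probabilities: a softmax over each group of siblings gives P(node | parent), and multiplying along the path to the root gives the absolute probability. The toy hierarchy, the logits and the function names are all hypothetical, and the root is simply given probability 1 here.

```python
import math

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {"object": None, "person": "object", "animal": "object",
          "man": "person", "woman": "person", "child": "person"}


def sibling_softmax(logits, parent_of=PARENT):
    """Softmax over each group of siblings, giving P(node | parent)."""
    groups = {}
    for node, parent in parent_of.items():
        if parent is not None:
            groups.setdefault(parent, []).append(node)
    cond = {}
    for siblings in groups.values():
        exps = [math.exp(logits[n]) for n in siblings]
        total = sum(exps)
        for n, e in zip(siblings, exps):
            cond[n] = e / total
    return cond


def absolute_prob(node, cond, parent_of=PARENT):
    """Multiply conditional probabilities along the path from node to the root."""
    p = 1.0
    while parent_of[node] is not None:
        p *= cond[node]
        node = parent_of[node]
    return p


logits = {"person": 2.0, "animal": 0.0, "man": 1.0, "woman": 1.0, "child": 1.0}
cond = sibling_softmax(logits)
print(absolute_prob("person", cond), absolute_prob("man", cond))
```

With this layout a sample labeled only "person" can be scored at the parent level, while a sample labeled "man" is scored at both levels.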

Results

Fewer parameters (higher efficiency) and higher accuracy, but detection of small objects is still weak

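Code walkthrough

First summarize V2's improvements over V1, then read the code that corresponds to them

·Batch Normalization is used.
·Training differs: the convolutional base (darknet-19) is first trained on 224*224 images and then on 448*448 images.
·Anchors are used, though not in quite the same way as in Faster R-CNN.
·Anchor extraction: a clustering algorithm finds the 5 width/height combinations that best fit the object sizes (5 already works well)
·A passthrough layer captures fine-grained features, along with some other changes to the network layers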
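The anchor-extraction step listed above can be sketched as k-means over (width, height) pairs. This sketch uses the distance d = 1 - IOU so that large boxes do not dominate the clustering. The function names, the seeded random initialization and the toy data are assumptions for illustration, not the code used by the referenced implementation.

```python
import random


def iou_wh(box, centroid):
    """IOU of two boxes given only (w, h), both placed at the same corner."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    return inter / (w * h + cw * ch - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IOU; return k sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the centroid with the highest IOU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        # Move each centroid to the mean width/height of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster as is
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


# Hypothetical toy data: two obvious size groups, in grid-cell units.
toy_boxes = [(1.0, 1.2), (1.1, 1.0), (0.9, 1.1),
             (5.0, 6.0), (5.5, 6.2), (4.8, 5.9)]
anchors = kmeans_anchors(toy_boxes, k=2)
print(anchors)
```

For the 5-anchor setting described in these notes, the same function would be called with k=5 on the real training-set boxes.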

Network structure

def build_networks(self, inputs):
    net = self.conv_layer(inputs, [3, 3, 3, 32], name = '0_conv')
    net = self.pooling_layer(net, name = '1_pool')

    net = self.conv_layer(net, [3, 3, 32, 64], name = '2_conv')
    net = self.pooling_layer(net, name = '3_pool')

    net = self.conv_layer(net, [3, 3, 64, 128], name = '4_conv')
    net = self.conv_layer(net, [1, 1, 128, 64], name = '5_conv')
    net = self.conv_layer(net, [3, 3, 64, 128], name = '6_conv')
    net = self.pooling_layer(net, name = '7_pool')

    net = self.conv_layer(net, [3, 3, 128, 256], name = '8_conv')
    net = self.conv_layer(net, [1, 1, 256, 128], name = '9_conv')
    net = self.conv_layer(net, [3, 3, 128, 256], name = '10_conv')
    net = self.pooling_layer(net, name = '11_pool')

    net = self.conv_layer(net, [3, 3, 256, 512], name = '12_conv')
    net = self.conv_layer(net, [1, 1, 512, 256], name = '13_conv')
    net = self.conv_layer(net, [3, 3, 256, 512], name = '14_conv')
    net = self.conv_layer(net, [1, 1, 512, 256], name = '15_conv')
    net16 = self.conv_layer(net, [3, 3, 256, 512], name = '16_conv')
    net = self.pooling_layer(net16, name = '17_pool')

    net = self.conv_layer(net, [3, 3, 512, 1024], name = '18_conv')
    net = self.conv_layer(net, [1, 1, 1024, 512], name = '19_conv')
    net = self.conv_layer(net, [3, 3, 512, 1024], name = '20_conv')
    net = self.conv_layer(net, [1, 1, 1024, 512], name = '21_conv')
    net = self.conv_layer(net, [3, 3, 512, 1024], name = '22_conv')

    net = self.conv_layer(net, [3, 3, 1024, 1024], name = '23_conv')
    net24 = self.conv_layer(net, [3, 3, 1024, 1024], name = '24_conv') # output shape: (h/32, w/32, 1024)

    net = self.conv_layer(net16, [1, 1, 512, 64], name = '26_conv')
    net = self.reorg(net) # passthrough; output shape: (h/32, w/32, 64*4)

    net = tf.concat([net, net24], 3) # concatenate along channels; shape: (bz, h/32, w/32, 256+1024)

    net = self.conv_layer(net, [3, 3, int(net.get_shape()[3]), 1024], name = '29_conv') # to (bz, h/32, w/32, 1024)
    net = self.conv_layer(net, [1, 1, 1024, self.box_per_cell * (self.num_class + 5)], batch_norm=False, name = '30_conv') # to (bz, h/32, w/32, box_per_cell*(num_class+5))

    return net
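The body of self.reorg is not shown in this excerpt. As a rough sketch of what a passthrough (space-to-depth) step does, the following rearranges one single-channel 26x26 map into a 13x13 map with 4 values per position, using plain Python lists for clarity. The function name and the ordering of the 4 values inside each position are assumptions; a real implementation would use tensor reshapes and may order the channels differently.

```python
def passthrough(feature_map, stride=2):
    """Space-to-depth on one channel: (H, W) -> (H/stride, W/stride, stride*stride)."""
    height, width = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, height, stride):
        row = []
        for j in range(0, width, stride):
            # Stack the stride x stride neighbourhood into the channel axis.
            row.append([feature_map[i + a][j + b]
                        for a in range(stride) for b in range(stride)])
        out.append(row)
    return out


# A 26x26 map whose value encodes its own position (row * 26 + col).
fm = [[r * 26 + c for c in range(26)] for r in range(26)]
out = passthrough(fm)
print(len(out), len(out[0]), len(out[0][0]), out[0][0])
```

No value is discarded: the spatial resolution is halved, but every original activation survives in the channel dimension, which is why 64 input channels become 64*4 = 256 in the network above.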

Batch Normalization

def conv_layer(self, inputs, shape, batch_norm = True, name = '0_conv'):
    # shape = [kernel_h, kernel_w, in_channels, out_channels]
    weight = tf.Variable(tf.truncated_normal(shape, stddev=0.1), name='weight')
    biases = tf.Variable(tf.constant(0.1, shape=[shape[3]]), name='biases')

    conv = tf.nn.conv2d(inputs, weight, strides=[1, 1, 1, 1], padding='SAME', name=name)

    if batch_norm: # Batch Normalization followed by leaky ReLU
        depth = shape[3]
        # per-channel BN parameters and running statistics
        scale = tf.Variable(tf.ones([depth, ], dtype='float32'), name='scale')
        shift = tf.Variable(tf.zeros([depth, ], dtype='float32'), name='shift')
        mean = tf.Variable(tf.ones([depth, ], dtype='float32'), name='rolling_mean')
        variance = tf.Variable(tf.ones([depth, ], dtype='float32'), name='rolling_variance')

        conv_bn = tf.nn.batch_normalization(conv, mean, variance, shift, scale, 1e-05)
        conv = tf.add(conv_bn, biases)
        conv = tf.maximum(self.alpha * conv, conv) # leaky ReLU with slope self.alpha
    else:
        conv = tf.add(conv, biases) # linear output: no BN, no activation

    return conv
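For a single activation, the BN branch above reduces to the following scalar arithmetic: normalize with the stored mean and variance, apply scale and shift, add the bias, then apply leaky ReLU. This is a sketch for intuition only. The function name is hypothetical, and alpha=0.1 is an assumed value because self.alpha is not shown in this excerpt.

```python
import math


def bn_leaky(x, mean, variance, shift, scale, bias=0.0, eps=1e-5, alpha=0.1):
    """Scalar version of batch norm + bias + leaky ReLU."""
    normalized = (x - mean) / math.sqrt(variance + eps)
    y = scale * normalized + shift + bias
    return max(alpha * y, y)   # leaky ReLU: negative values are scaled by alpha


print(bn_leaky(3.0, mean=1.0, variance=4.0, shift=0.0, scale=1.0))
print(bn_leaky(-1.0, mean=1.0, variance=4.0, shift=0.0, scale=1.0))
```

A positive normalized value passes through unchanged, while a negative one is scaled down by alpha instead of being clipped to zero.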

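Loss function

The loss function takes two inputs: label, which is prepared in advance, and predict, the model output. Location and class information is extracted from predict and then combined with label in the loss formulas to obtain the loss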

# extract information from the network output
# reshape to (batch, 13, 13, box_per_cell, num_class + 5)
predict = tf.reshape(predict, [self.batch_size, self.cell_size, self.cell_size, self.box_per_cell, self.num_class + 5])
# the four box values x, y, w, h
box_coordinate = tf.reshape(predict[:, :, :, :, :4], [self.batch_size, self.cell_size, self.cell_size, self.box_per_cell, 4])
# box confidence
box_confidence = tf.reshape(predict[:, :, :, :, 4], [self.batch_size, self.cell_size, self.cell_size, self.box_per_cell, 1])
# class scores
box_classes = tf.reshape(predict[:, :, :, :, 5:], [self.batch_size, self.cell_size, self.cell_size, self.box_per_cell, self.num_class])
# full box position relative to the grid:
# x, y are positions in the 13*13 feature map and w, h are relative sizes, normalized to (0, 1), i.e. fractions of the whole image
# bx = sigmoid(tx) + cx: sigmoid limits the range to 0..1, tx is the raw output ([:, :, :, :, 0]), cx is the cell index (offset)
boxes1 = tf.stack([(1.0 / (1.0 + tf.exp(-1.0 * box_coordinate[:, :, :, :, 0])) + self.offset) / self.cell_size,
                   # by = sigmoid(ty) + cy
                   (1.0 / (1.0 + tf.exp(-1.0 * box_coordinate[:, :, :, :, 1])) + tf.transpose(self.offset, (0, 2, 1, 3))) / self.cell_size,
                   # bw = pw * e^tw: tw is the raw width output, pw is the anchor width; normalized to 0..1, and the square root is stored
                   tf.sqrt(tf.exp(box_coordinate[:, :, :, :, 2]) * np.reshape(self.anchor[:5], [1, 1, 1, 5]) / self.cell_size),
                   # bh = ph * e^th: same as above, with the anchor height ph
                   tf.sqrt(tf.exp(box_coordinate[:, :, :, :, 3]) * np.reshape(self.anchor[5:], [1, 1, 1, 5]) / self.cell_size)])

box_coor_trans = tf.transpose(boxes1, (1, 2, 3, 4, 0))
box_confidence = 1.0 / (1.0 + tf.exp(-1.0 * box_confidence)) # sigmoid
box_classes = tf.nn.softmax(box_classes)
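The decoding formulas in the comments above can be tried on a single box with plain Python. This sketch returns the normalized width and height directly, whereas the snippet above stores their square roots. The function names and the example anchor size are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, cell_size=13):
    """Turn raw outputs (tx, ty, tw, th) into a box normalized to the image."""
    bx = (sigmoid(tx) + cx) / cell_size   # center x: offset inside cell cx
    by = (sigmoid(ty) + cy) / cell_size   # center y: offset inside cell cy
    bw = pw * math.exp(tw) / cell_size    # width: anchor width scaled by e^tw
    bh = ph * math.exp(th) / cell_size    # height: anchor height scaled by e^th
    return bx, by, bw, bh


# All-zero raw outputs in the center cell (6, 6) with a 2.6 x 3.9 anchor.
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=2.6, ph=3.9))
```

With all raw outputs at zero, the box sits at the center of its cell and has exactly the anchor's size, which is why a well-chosen anchor gives the network a good starting point.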

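Compute the IOU between the predicted boxes and the ground-truth boxes, and find the box with the largest IOU

# boxes, response and classes are the ground-truth tensors, presumably sliced from label (not shown in this excerpt)
iou = self.calc_iou(box_coor_trans, boxes) # compute IOU
best_box = tf.to_float(tf.equal(iou, tf.reduce_max(iou, axis=-1, keep_dims=True))) # 1 for the box with the largest IOU in each cell, 0 otherwise
confs = tf.expand_dims(best_box * response, axis = 4) # mask of the boxes responsible for an object
# expand_dims adds a trailing dimension of size 1 so that confs broadcasts against the per-box tensors

# build the loss weights conid, cooid and proid
conid = self.noobject_scale * (1.0 - confs) + self.object_scale * confs # confidence
cooid = self.coordinate_scale * confs # coordinates
proid = self.class_scale * confs # classes
# compute the loss terms
coo_loss = cooid * tf.square(box_coor_trans - boxes)
con_loss = conid * tf.square(box_confidence - confs)
pro_loss = proid * tf.square(box_classes - classes)
# combine into the total loss
loss = tf.concat([coo_loss, con_loss, pro_loss], axis = 4)
loss = tf.reduce_mean(tf.reduce_sum(loss, axis = [1, 2, 3, 4]), name = 'loss')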
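The body of self.calc_iou is not shown here. As a sketch of the quantity it computes, the following calculates the IOU of two boxes given as plain (x_center, y_center, w, h). This is an assumption about the input format: the real method works on batched tensors and has to account for the square-rooted width and height stored above.

```python
def iou_xywh(box_a, box_b):
    """IOU of two boxes in (x_center, y_center, w, h) format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes do not touch.
    inter_w = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    inter_h = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


truth = (0.5, 0.5, 0.2, 0.2)
candidates = [(0.6, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2), (0.9, 0.9, 0.1, 0.1)]
ious = [iou_xywh(c, truth) for c in candidates]
best = max(range(len(ious)), key=lambda i: ious[i])   # index of the responsible box
print(ious, best)
```

Picking the candidate with the largest IOU mirrors what best_box does per grid cell in the snippet above.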
