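These notes were written up after reading the source code at
https://github.com/YunYang1994/TensorFlow2.0-Examples/tree/master/4-Object_Detection/YOLOV3
I still do not fully understand some of the variables (marked with a question mark below), and would welcome corrections from anyone who knows this code better.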
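Configuration parameters
train_input_size (input image size): 416
strides: [8, 16, 32]
Classes: 80
batch_size: 4
train_output_sizes (?), presumably the side length of the output grid at each scale:
Computed as: train_input_size // self.strides
Values: 52 / 26 / 13
max_bbox_per_scale (maximum number of bboxes stored per scale): 150
anchor_per_scale (number of anchor boxes per scale): 3
anchors: 3 x 3 = 9 pairs, each pair holding 2 values (width, height)
[ [1.25,1.625, 2.0,3.75, 4.125,2.875],
[1.875,3.8125, 3.875,2.8125, 3.6875,7.4375],
[3.625,2.8125, 4.875,6.1875, 11.65625,10.1875] ]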
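The configuration values above can be reproduced with a few lines of NumPy. This is only a sketch; the name `anchors_in_pixels` is introduced here for illustration and does not come from the repository. It rests on the assumption that the anchors are expressed in grid-cell units, which is consistent with the numbers: multiplying the first row by its stride of 8 gives 10x13, 16x30 and 33x23.

```python
import numpy as np

train_input_size = 416
strides = np.array([8, 16, 32])

# Side length of the output grid at each scale (floor division is exact here).
train_output_sizes = train_input_size // strides

# Nine (width, height) pairs, three per scale.
anchors = np.array([
    1.25, 1.625, 2.0, 3.75, 4.125, 2.875,
    1.875, 3.8125, 3.875, 2.8125, 3.6875, 7.4375,
    3.625, 2.8125, 4.875, 6.1875, 11.65625, 10.1875,
]).reshape(3, 3, 2)

# Assumption: anchors are in grid-cell units, so scaling by the stride
# converts them to input-image pixels.
anchors_in_pixels = anchors * strides[:, None, None]

print(train_output_sizes)
print(anchors_in_pixels[0])
```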
Input data
Image data per batch (batch_image):
Shape: (4, 416, 416, 3)
(batch_size, train_input_size, train_input_size, 3)
Output labels:
batch_label_sbbox (labels for the small-box scale):
Shape: (4, 52, 52, 3, 85)
(batch_size, train_output_sizes[0], train_output_sizes[0], anchor_per_scale, 5 + num_classes)
batch_label_mbbox (labels for the medium-box scale):
Shape: (4, 26, 26, 3, 85)
(batch_size, train_output_sizes[1], train_output_sizes[1], anchor_per_scale, 5 + num_classes)
batch_label_lbbox (labels for the large-box scale):
Shape: (4, 13, 13, 3, 85)
(batch_size, train_output_sizes[2], train_output_sizes[2], anchor_per_scale, 5 + num_classes)
batch_sbboxes / batch_mbboxes / batch_lbboxes:
Shape: (4, 150, 4)
(batch_size, max_bbox_per_scale, 4)
Labels for a single image:
Shapes: [(52, 52, 3, 85), (26, 26, 3, 85), (13, 13, 3, 85)]
label = [
(train_output_sizes[0], train_output_sizes[0], anchor_per_scale, 5 + self.num_classes),
(train_output_sizes[1], train_output_sizes[1], anchor_per_scale, 5 + self.num_classes),
(train_output_sizes[2], train_output_sizes[2], anchor_per_scale, 5 + self.num_classes),
]
bboxes_xywh (the bbox coordinates for each scale, stored in order):
Shape: [3, 150, 4]
[number of scales, max_bbox_per_scale, 4 (xywh)]
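Label for a single bbox:
Example: bbox = [33, 294, 55, 316, 6] (xmin, ymin, xmax, ymax, class index)
The class label is encoded with smooth_onehot (a label-smoothed one-hot vector)
bbox_xywh (the bbox converted to xywh, where xy is the center point):
Values: [44, 305, 22, 22]
bbox_xywh_scaled (bbox_xywh divided by each stride):
Formula: bbox_xywh / strides (true division, not floor division)
Values: [[5.5, 38.125, 2.75, 2.75], [2.75, 19.0625, 1.375, 1.375], [1.375, 9.53125, 0.6875, 0.6875]]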
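The conversion from corner format to xywh and the per-stride scaling can be sketched as follows. The smoothing factor `delta = 0.01` and the exact smoothing formula are assumptions made for illustration; the notes only say that the class label goes through smooth_onehot.

```python
import numpy as np

strides = np.array([8, 16, 32])
num_classes = 80
bbox = np.array([33, 294, 55, 316, 6])  # xmin, ymin, xmax, ymax, class index

bbox_coor, bbox_class_ind = bbox[:4], bbox[4]

# Corner format -> (center x, center y, width, height).
bbox_xywh = np.concatenate([
    (bbox_coor[2:] + bbox_coor[:2]) * 0.5,
    bbox_coor[2:] - bbox_coor[:2],
])

# True division, broadcast to one row per scale: shape (3, 4).
bbox_xywh_scaled = bbox_xywh[np.newaxis, :] / strides[:, np.newaxis]

# Label smoothing (assumed form): mix the one-hot vector with a uniform one.
delta = 0.01
onehot = np.zeros(num_classes)
onehot[bbox_class_ind] = 1.0
smooth_onehot = onehot * (1 - delta) + delta / num_classes

print(bbox_xywh)
print(bbox_xywh_scaled)
```

Floor division would turn 5.5 into 5.0 and lose the sub-cell offset, which is why the fractional values listed above imply true division.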
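Anchor box labels:
anchors_xywh:
Shape: (3, 4)
(anchor_per_scale, 4)
Values:
[[5.5, 38.5, 1.25, 1.625], [5.5, 38.5, 2.0, 3.75], [5.5, 38.5, 4.125, 2.875]]
How it is computed: xy is the center of the grid cell containing the bbox center, i.e. floor(xy of bbox_xywh_scaled) + 0.5; wh comes from each anchor pair
iou_scale (IoU between the scaled bbox and each anchor box):
Pseudocode:
iou_scale = bbox_iou(bbox_xywh_scaled[i], anchors_xywh)
Values: [0.26859504, 0.5751634, 0.52702703]
iou_mask = iou_scale > 0.3
Values: [False, True, True]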
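To check the IoU numbers above, here is a small NumPy sketch for the stride-8 scale. The `bbox_iou` below is a re-implementation written for these notes, not a copy of the repository function; the variable names ending in `_s` are also introduced here.

```python
import numpy as np

def bbox_iou(boxes1, boxes2):
    """IoU of boxes given as (cx, cy, w, h); broadcasts over leading axes."""
    boxes1 = np.asarray(boxes1, dtype=float)
    boxes2 = np.asarray(boxes2, dtype=float)
    area1 = boxes1[..., 2] * boxes1[..., 3]
    area2 = boxes2[..., 2] * boxes2[..., 3]
    # Convert to (xmin, ymin, xmax, ymax).
    b1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                         boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    b2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                         boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    left_up = np.maximum(b1[..., :2], b2[..., :2])
    right_down = np.minimum(b1[..., 2:], b2[..., 2:])
    inter_wh = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]
    return inter_area / (area1 + area2 - inter_area)

bbox_xywh_scaled_s = np.array([5.5, 38.125, 2.75, 2.75])  # stride-8 row
anchors_s = np.array([[1.25, 1.625], [2.0, 3.75], [4.125, 2.875]])

# Anchor boxes are centered on the grid cell that contains the bbox center.
anchors_xywh = np.zeros((3, 4))
anchors_xywh[:, 0:2] = np.floor(bbox_xywh_scaled_s[0:2]) + 0.5
anchors_xywh[:, 2:4] = anchors_s

iou_scale = bbox_iou(bbox_xywh_scaled_s[np.newaxis, :], anchors_xywh)
iou_mask = iou_scale > 0.3
print(iou_scale, iou_mask)
```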
If any entry of iou_mask is True (if all are False, the anchor with the largest iou_scale is used instead):
label[i][yind, xind, iou_mask, :] = data, where:
i: index of the scale
yind, xind: the grid cell that the center point falls in
iou_mask: data is written only for anchors where the mask is True
data: 85 values = [bbox_xywh (4), 1 (confidence), smooth_onehot (80)]
Code:
xind, yind = np.floor(bbox_xywh_scaled[i, 0:2]).astype(np.int32)
label[i][yind, xind, iou_mask, :] = 0
label[i][yind, xind, iou_mask, 0:4] = bbox_xywh
label[i][yind, xind, iou_mask, 4:5] = 1.0
label[i][yind, xind, iou_mask, 5:] = smooth_onehot
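The write step can be tried in isolation for the stride-8 scale. In this sketch the IoU values and the smoothed class vector are hard-coded from the example, `label_s`, `sbboxes` and `bbox_count` are names introduced here, and the fallback is simplified to a single scale (an assumption; the notes do not spell out how the best anchor is chosen when no mask entry is True).

```python
import numpy as np

num_classes, max_bbox_per_scale = 80, 150
label_s = np.zeros((52, 52, 3, 5 + num_classes), dtype=np.float32)
sbboxes = np.zeros((max_bbox_per_scale, 4), dtype=np.float32)
bbox_count = 0

bbox_xywh = np.array([44.0, 305.0, 22.0, 22.0])
bbox_xywh_scaled_s = bbox_xywh / 8
iou_scale = np.array([0.26859504, 0.5751634, 0.52702703])
iou_mask = iou_scale > 0.3

# Smoothed one-hot for class 6 with an assumed smoothing factor of 0.01.
smooth_onehot = np.full(num_classes, 0.01 / num_classes)
smooth_onehot[6] += 0.99

if not np.any(iou_mask):
    # Simplified single-scale fallback: keep only the best-matching anchor.
    iou_mask = np.zeros_like(iou_mask)
    iou_mask[np.argmax(iou_scale)] = True

# Grid cell containing the box center (x is the column, y is the row).
xind, yind = np.floor(bbox_xywh_scaled_s[0:2]).astype(np.int32)
label_s[yind, xind, iou_mask, :] = 0
label_s[yind, xind, iou_mask, 0:4] = bbox_xywh
label_s[yind, xind, iou_mask, 4:5] = 1.0
label_s[yind, xind, iou_mask, 5:] = smooth_onehot

# The raw box is also appended to the flat per-scale list.
sbboxes[bbox_count % max_bbox_per_scale, :4] = bbox_xywh
bbox_count += 1
```

Note that the stored box keeps its original pixel coordinates (44, 305, 22, 22); only the cell index is derived from the scaled values.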
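label_Xbbox holds the labels for one scale (X = s, m or l):
label_sbbox, label_mbbox, label_lbbox = label
Shape:
Taking label_sbbox as an example: (52, 52, 3, 85)
52, 52 index the grid cells
3 is the number of anchor boxes per grid cell; the 85 values are filled in only for anchors that match a bbox, and are 0 otherwise
Xbboxes simply store all bboxes in order:
sbboxes, mbboxes, lbboxes = bboxes_xywh
Shape:
Taking sbboxes as an example: (150, 4)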
Labels finally returned:
(batch_smaller_target, batch_medium_target, batch_larger_target)
Shape: (((4, 52, 52, 3, 85), (4, 150, 4)), ((4, 26, 26, 3, 85), (4, 150, 4)), ((4, 13, 13, 3, 85), (4, 150, 4)))
where, for each scale:
batch_smaller_target = batch_label_sbbox, batch_sbboxes (and likewise for medium / mbbox and larger / lbbox)
Shape: ((4, 52, 52, 3, 85), (4, 150, 4))
batch_label_sbbox[num] = label_sbbox (num is the index of the image within the batch)
Shape: (4, 52, 52, 3, 85)
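The batching step can be sketched like this. `make_image_target` is a hypothetical stand-in for the per-image preprocessing and just returns empty arrays of the right shapes; only the stacking logic and the final nesting are the point.

```python
import numpy as np

batch_size, anchor_per_scale, num_classes, max_bbox_per_scale = 4, 3, 80, 150
train_output_sizes = [52, 26, 13]

def make_image_target():
    """Hypothetical stand-in: per-image labels and box lists, all zeros."""
    label = [np.zeros((s, s, anchor_per_scale, 5 + num_classes), dtype=np.float32)
             for s in train_output_sizes]
    bboxes_xywh = [np.zeros((max_bbox_per_scale, 4), dtype=np.float32)
                   for _ in train_output_sizes]
    return label, bboxes_xywh

# Pre-allocate one label tensor and one box list per scale for the batch.
batch_labels = [np.zeros((batch_size, s, s, anchor_per_scale, 5 + num_classes),
                         dtype=np.float32) for s in train_output_sizes]
batch_bboxes = [np.zeros((batch_size, max_bbox_per_scale, 4), dtype=np.float32)
                for _ in train_output_sizes]

for num in range(batch_size):
    label, bboxes_xywh = make_image_target()
    for i in range(len(train_output_sizes)):
        batch_labels[i][num] = label[i]
        batch_bboxes[i][num] = bboxes_xywh[i]

# Pair each scale's labels with its box list.
batch_smaller_target, batch_medium_target, batch_larger_target = zip(
    batch_labels, batch_bboxes)
print([t[0].shape for t in (batch_smaller_target, batch_medium_target, batch_larger_target)])
```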
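Model output:
Shapes: [
(4, 52, 52, 255),
(4, 52, 52, 3, 85),
(4, 26, 26, 255),
(4, 26, 26, 3, 85),
(4, 13, 13, 255),
(4, 13, 13, 3, 85),
]
Each scale produces 2 outputs: conv (?) and pred (?). Judging by the shapes, conv looks like the raw convolutional feature map (255 = 3 x 85) and pred like the decoded prediction, already split per anchor.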
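One way to see how a (4, 13, 13, 255) tensor could become a (4, 13, 13, 3, 85) one is the usual YOLOv3 decoding: sigmoid offsets plus the cell index for xy, exponentiated values times the anchor for wh, and sigmoid for confidence and class scores. The sketch below assumes that standard scheme and that the anchors are in grid-cell units; it is not taken from the repository, and the `decode` signature here is made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(conv, anchors, stride, num_classes=80):
    """Assumed standard YOLOv3 decoding of a raw feature map."""
    batch_size, output_size = conv.shape[0], conv.shape[1]
    conv = conv.reshape(batch_size, output_size, output_size, 3, 5 + num_classes)

    # xy_grid[0, y, x, 0] == (x, y): the top-left corner of each cell.
    xs, ys = np.meshgrid(np.arange(output_size), np.arange(output_size))
    xy_grid = np.stack([xs, ys], axis=-1)[np.newaxis, :, :, np.newaxis, :]
    xy_grid = xy_grid.astype(np.float32)

    pred_xy = (sigmoid(conv[..., 0:2]) + xy_grid) * stride
    pred_wh = np.exp(conv[..., 2:4]) * anchors * stride
    pred_conf = sigmoid(conv[..., 4:5])
    pred_prob = sigmoid(conv[..., 5:])
    return np.concatenate([pred_xy, pred_wh, pred_conf, pred_prob], axis=-1)

conv = np.zeros((4, 13, 13, 255), dtype=np.float32)
anchors_l = np.array([[3.625, 2.8125], [4.875, 6.1875], [11.65625, 10.1875]],
                     dtype=np.float32)
pred = decode(conv, anchors_l, stride=32)
print(conv.shape, pred.shape)
```

With an all-zero conv, every predicted box sits at the center of its cell and has exactly the anchor's size in pixels, which makes the mapping easy to inspect.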
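Computing the loss:
(conv comes out of the model as a 4-D tensor, so it has to be reshaped to (batch, size, size, 3, 85) before the 5-D indexing below works)
conv_raw_conf = conv[:, :, :, :, 4:5]  # raw confidence
conv_raw_prob = conv[:, :, :, :, 5:]  # raw class probabilities
pred_xywh = pred[:, :, :, :, 0:4]  # predicted box xywh
pred_conf = pred[:, :, :, :, 4:5]  # predicted confidence
label_xywh = label[:, :, :, :, 0:4]  # ground-truth box xywh
respond_bbox = label[:, :, :, :, 4:5]  # ground-truth confidence (whether the grid cell contains an object)
label_prob = label[:, :, :, :, 5:]  # ground-truth class probabilities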
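GIoU loss:
Compute GIoU:
giou = bbox_giou(pred_xywh, label_xywh)
Compute the weight applied to the GIoU term:
bbox_loss_scale = 2 - (ground-truth w * h) / (input image area), so smaller boxes get a larger weight
bbox_loss_scale = 2.0 - 1.0 * label_xywh[:, :, :, :, 2:3] * label_xywh[:, :, :, :, 3:4] / (input_size ** 2)
The resulting giou_loss:
giou_loss = respond_bbox * bbox_loss_scale * (1 - giou)
Sum over all positions of each image, then take the mean over the batch:
giou_loss = tf.reduce_mean(tf.reduce_sum(giou_loss, axis=[1,2,3,4]))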
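The GIoU loss can be checked by hand on a tiny example. The sketch below uses NumPy instead of TensorFlow, and its `bbox_giou` is a re-implementation written for these notes (it assumes predicted boxes always have positive width and height, so no division by zero occurs). The grid is shrunk to 2 x 2 to keep the numbers readable.

```python
import numpy as np

def bbox_giou(boxes1, boxes2):
    """GIoU of boxes given as (cx, cy, w, h); returns shape (..., 1)."""
    b1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                         boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    b2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                         boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    area1 = boxes1[..., 2] * boxes1[..., 3]
    area2 = boxes2[..., 2] * boxes2[..., 3]
    inter_wh = np.maximum(np.minimum(b1[..., 2:], b2[..., 2:])
                          - np.maximum(b1[..., :2], b2[..., :2]), 0.0)
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]
    union_area = area1 + area2 - inter_area
    iou = inter_area / union_area
    # Smallest box enclosing both inputs.
    enclose_wh = (np.maximum(b1[..., 2:], b2[..., 2:])
                  - np.minimum(b1[..., :2], b2[..., :2]))
    enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1]
    giou = iou - (enclose_area - union_area) / enclose_area
    return giou[..., np.newaxis]

input_size = 416
label_xywh = np.zeros((1, 2, 2, 3, 4))
respond_bbox = np.zeros((1, 2, 2, 3, 1))
label_xywh[0, 0, 0, 1] = [44.0, 305.0, 22.0, 22.0]
respond_bbox[0, 0, 0, 1] = 1.0

# Every prediction is the ground-truth box shifted right by half its width.
pred_xywh = np.tile(np.array([55.0, 305.0, 22.0, 22.0]), (1, 2, 2, 3, 1))

giou = bbox_giou(pred_xywh, label_xywh)
bbox_loss_scale = 2.0 - label_xywh[..., 2:3] * label_xywh[..., 3:4] / (input_size ** 2)
giou_loss = respond_bbox * bbox_loss_scale * (1 - giou)
giou_loss = np.mean(np.sum(giou_loss, axis=(1, 2, 3, 4)))
print(giou_loss)
```

Only the single position where respond_bbox is 1 contributes: its GIoU is 1/3, so the loss is (2 - 484 / 416**2) * 2/3.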