Implementing YOLOv3 in TensorFlow 2

These are my notes from reading the source code at
https://github.com/YunYang1994/TensorFlow2.0-Examples/tree/master/4-Object_Detection/YOLOV3
A few variables I still do not fully understand are flagged with a question mark below; pointers from more experienced readers are very welcome!

Preparing the configuration parameters

train_input_size (input image size): 416
strides: [8, 16, 32]
number of classes: 80
batch_size: 4
train_output_sizes (?, presumably the output feature-map size at each scale):
computed as train_input_size // self.strides
values: 52 / 26 / 13
max_bbox_per_scale (maximum number of bboxes per scale): 150
anchor_per_scale (number of anchors per scale): 3
anchors: 3 x 3 = 9 anchors, 2 values (w, h) each
[[1.25, 1.625,  2.0, 3.75,  4.125, 2.875],
 [1.875, 3.8125,  3.875, 2.8125,  3.6875, 7.4375],
 [3.625, 2.8125,  4.875, 6.1875,  11.65625, 10.1875]]
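
To keep the shapes below easy to follow, here is a minimal sketch of these settings as plain Python (the repo reads them from its config; the variable names mirror the attributes listed above):

import numpy as np

# Hyper-parameters used by the data pipeline (values as listed above).
train_input_size   = 416
strides            = np.array([8, 16, 32])
num_classes        = 80
batch_size         = 4
max_bbox_per_scale = 150
anchor_per_scale   = 3

# Output feature-map size at each scale: 416 // [8, 16, 32] -> [52, 26, 13]
train_output_sizes = train_input_size // strides

# 3 scales x 3 anchors; each anchor is (w, h) already divided by that scale's
# stride, i.e. measured in grid cells of the corresponding feature map.
anchors = np.array([[[1.25, 1.625], [2.0, 3.75], [4.125, 2.875]],
                    [[1.875, 3.8125], [3.875, 2.8125], [3.6875, 7.4375]],
                    [[3.625, 2.8125], [4.875, 6.1875], [11.65625, 10.1875]]])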

Preparing the input data

Image data for one batch (batch_image):
shape: (4, 416, 416, 3)
(batch_size, train_input_size, train_input_size, 3)

Preparing the output labels:
batch_label_sbbox (labels for the small-object scale):

shape: (4, 52, 52, 3, 85)
(batch_size, train_output_sizes[0], train_output_sizes[0], anchor_per_scale, 5 + num_classes)

batch_label_mbbox (labels for the medium-object scale):

shape: (4, 26, 26, 3, 85)
(batch_size, train_output_sizes[1], train_output_sizes[1], anchor_per_scale, 5 + num_classes)

batch_label_lbbox (labels for the large-object scale):

shape: (4, 13, 13, 3, 85)
(batch_size, train_output_sizes[2], train_output_sizes[2], anchor_per_scale, 5 + num_classes)

batch_sbboxes / batch_mbboxes / batch_lbboxes (the raw xywh boxes per scale)

shape: (4, 150, 4)
(batch_size, max_bbox_per_scale, 4)
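
A sketch of how these batch-level arrays could be pre-allocated before they are filled, repeating the constants from the configuration above:

import numpy as np

batch_size, train_input_size = 4, 416
train_output_sizes = [52, 26, 13]
anchor_per_scale, num_classes, max_bbox_per_scale = 3, 80, 150

batch_image = np.zeros((batch_size, train_input_size, train_input_size, 3), dtype=np.float32)

# One label tensor per scale: (batch, grid, grid, anchors, 5 + num_classes)
batch_label_sbbox, batch_label_mbbox, batch_label_lbbox = [
    np.zeros((batch_size, s, s, anchor_per_scale, 5 + num_classes), dtype=np.float32)
    for s in train_output_sizes]

# Raw xywh boxes per scale, zero-padded to max_bbox_per_scale entries.
batch_sbboxes, batch_mbboxes, batch_lbboxes = [
    np.zeros((batch_size, max_bbox_per_scale, 4), dtype=np.float32) for _ in range(3)]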

Labels for a single image:

shape: [(52, 52, 3, 85), (26, 26, 3, 85), (13, 13, 3, 85)]
label = [
    (train_output_sizes[0], train_output_sizes[0], anchor_per_scale, 5 + self.num_classes),
    (train_output_sizes[1], train_output_sizes[1], anchor_per_scale, 5 + self.num_classes),
    (train_output_sizes[2], train_output_sizes[2], anchor_per_scale, 5 + self.num_classes),
]

bboxes_xywh (the xywh boxes of each scale, stored in order)

shape: [3, 150, 4]
[number of scales, max_bbox_per_scale, 4 (xywh)]
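
A sketch of how these per-image containers could be allocated before looping over the image's bboxes (names mirror the text above):

import numpy as np

train_output_sizes = [52, 26, 13]
anchor_per_scale, num_classes, max_bbox_per_scale = 3, 80, 150

# Per-image label: one zero tensor per scale, filled in as bboxes are assigned to anchors.
label = [np.zeros((train_output_sizes[i], train_output_sizes[i],
                   anchor_per_scale, 5 + num_classes), dtype=np.float32) for i in range(3)]

# Per-image xywh boxes, grouped by the scale they end up assigned to.
bboxes_xywh = [np.zeros((max_bbox_per_scale, 4), dtype=np.float32) for _ in range(3)]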

Label of each individual bbox:

example: bbox = [33, 294, 55, 316, 6], i.e. (xmin, ymin, xmax, ymax, class_id)

The class id is turned into a smoothed one-hot vector (smooth_onehot)

bbox_xywh (the bbox converted to xywh, where xy is the center point):

values: 44, 305, 22, 22

bbox_xywh_scaled (bbox_xywh divided by each stride):

formula: bbox_xywh / strides (true division, so the fractional part is kept)
values: [[5.5, 38.125, 2.75, 2.75], [2.75, 19.0625, 1.375, 1.375], [1.375, 9.53125, 0.6875, 0.6875]]
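
A small worked sketch of this conversion for the example box, reproducing the numbers above (the label-smoothing constant 0.01 is an assumption):

import numpy as np

strides = np.array([8, 16, 32])
bbox = np.array([33, 294, 55, 316, 6])          # xmin, ymin, xmax, ymax, class_id

# Corner coordinates -> center x, center y, width, height
bbox_coor = bbox[:4]
bbox_xywh = np.concatenate([(bbox_coor[2:] + bbox_coor[:2]) * 0.5,
                            bbox_coor[2:] - bbox_coor[:2]], axis=-1)
print(bbox_xywh)                                # [ 44. 305.  22.  22.]

# Scale the box onto each of the three feature maps (true division, not floor division)
bbox_xywh_scaled = 1.0 * bbox_xywh[np.newaxis, :] / strides[:, np.newaxis]
print(bbox_xywh_scaled)
# [[ 5.5     38.125    2.75     2.75   ]
#  [ 2.75    19.0625   1.375    1.375  ]
#  [ 1.375    9.53125  0.6875   0.6875 ]]

# Smoothed one-hot class vector (label smoothing with a small delta, assumed 0.01 here)
num_classes, delta = 80, 0.01
onehot = np.zeros(num_classes, dtype=np.float32)
onehot[bbox[4]] = 1.0
smooth_onehot = onehot * (1 - delta) + delta / num_classes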

Anchor labels:

anchors_xywh:
shape: (3, 4)
(anchor_per_scale, 4)
values:
[[5.5, 38.5, 1.25, 1.625], [5.5, 38.5, 2.0, 3.75], [5.5, 38.5, 4.125, 2.875]]
construction: the xy is the center of the grid cell that contains the scaled box center (floor of bbox_xywh_scaled's xy plus 0.5), and the wh is taken from the anchors of the current scale
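
A sketch of how anchors_xywh could be built for one scale, using the first scale's anchors and the scaled example box:

import numpy as np

anchor_per_scale = 3
scale_anchors = np.array([[1.25, 1.625], [2.0, 3.75], [4.125, 2.875]])  # anchors of scale i
scaled_box = np.array([5.5, 38.125, 2.75, 2.75])                        # bbox_xywh_scaled[i]

# Anchor boxes share the center of the grid cell containing the box center
# (floor(x) + 0.5, floor(y) + 0.5) and keep their own width and height.
anchors_xywh = np.zeros((anchor_per_scale, 4))
anchors_xywh[:, 0:2] = np.floor(scaled_box[0:2]).astype(np.int32) + 0.5
anchors_xywh[:, 2:4] = scale_anchors
print(anchors_xywh)
# [[ 5.5   38.5    1.25   1.625]
#  [ 5.5   38.5    2.     3.75 ]
#  [ 5.5   38.5    4.125  2.875]]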

iou_scale (IoU between the scaled ground-truth box and the three anchors):
pseudocode:
iou_scale = bbox_iou(bbox_xywh_scaled[i][np.newaxis, :], anchors_xywh)
values: [0.26859504, 0.5751634, 0.52702703]

iou_mask = iou_scale > 0.3
values: [False True True]
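
A sketch of the IoU computation behind these numbers, with both inputs given in (cx, cy, w, h) form:

import numpy as np

def bbox_iou(boxes1, boxes2):
    """IoU between boxes given as (cx, cy, w, h); a sketch of the dataset helper."""
    boxes1, boxes2 = np.asarray(boxes1, dtype=np.float64), np.asarray(boxes2, dtype=np.float64)
    area1 = boxes1[..., 2] * boxes1[..., 3]
    area2 = boxes2[..., 2] * boxes2[..., 3]
    # Convert to corner coordinates (xmin, ymin, xmax, ymax)
    boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                             boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                             boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    left_up    = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter      = np.maximum(right_down - left_up, 0.0)
    inter_area = inter[..., 0] * inter[..., 1]
    union_area = area1 + area2 - inter_area
    return inter_area / union_area

iou_scale = bbox_iou(np.array([5.5, 38.125, 2.75, 2.75])[np.newaxis, :],
                     np.array([[5.5, 38.5, 1.25,  1.625],
                               [5.5, 38.5, 2.0,   3.75 ],
                               [5.5, 38.5, 4.125, 2.875]]))
print(iou_scale)            # approx. [0.2686 0.5752 0.5270]
print(iou_scale > 0.3)      # [False  True  True]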

If at least one entry of iou_mask is True (if no anchor at any scale passes the 0.3 threshold, the anchor with the largest iou_scale across all scales is used instead; see the fallback sketch after the code below):
label[i][yind, xind, iou_mask, data], where
i: the scale index
yind, xind: the grid cell the box center falls into
iou_mask: only anchors whose mask is True receive the data
data: 85 values = [bbox_xywh, 1.0, smooth_onehot]

Code:
xind, yind = np.floor(bbox_xywh_scaled[i, 0:2]).astype(np.int32)
label[i][yind, xind, iou_mask, :] = 0               # reset the matched anchor slots
label[i][yind, xind, iou_mask, 0:4] = bbox_xywh     # box center/size in input-image coordinates
label[i][yind, xind, iou_mask, 4:5] = 1.0           # objectness = 1
label[i][yind, xind, iou_mask, 5:] = smooth_onehot  # smoothed class probabilities
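
When no anchor at any scale clears the 0.3 threshold, the box is assigned to the single best anchor over all nine. A sketch of that fallback branch, continuing with the same variables as the snippet above and assuming iou is the list of the three per-scale iou_scale arrays:

# Fallback: pick the globally best of the 3 scales x 3 anchors.
best_anchor_ind = np.argmax(np.array(iou).reshape(-1), axis=-1)
best_detect = best_anchor_ind // anchor_per_scale      # which scale (0, 1 or 2)
best_anchor = best_anchor_ind % anchor_per_scale       # which anchor within that scale

xind, yind = np.floor(bbox_xywh_scaled[best_detect, 0:2]).astype(np.int32)
label[best_detect][yind, xind, best_anchor, :]   = 0
label[best_detect][yind, xind, best_anchor, 0:4] = bbox_xywh
label[best_detect][yind, xind, best_anchor, 4:5] = 1.0
label[best_detect][yind, xind, best_anchor, 5:]  = smooth_onehot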

label_Xbbox holds the per-cell labels of one scale:
label_sbbox, label_mbbox, label_lbbox = label
shapes:
taking label_sbbox as an example: (52, 52, 3, 85)
where 52 x 52 is the grid of cells at that scale,
3 means each cell has 3 anchors, and the 85 values of an anchor are only filled in when a ground-truth box matched that anchor; otherwise they stay 0
The Xbboxes arrays simply store all of a scale's boxes in order:
sbboxes, mbboxes, lbboxes = bboxes_xywh
shape:
taking sbboxes as an example: (150, 4)

The targets that are finally returned:

(batch_smaller_target, batch_medium_target, batch_larger_target)
shapes: (((4, 52, 52, 3, 85), (4, 150, 4)), ((4, 26, 26, 3, 85), (4, 150, 4)), ((4, 13, 13, 3, 85), (4, 150, 4)))
where:
batch_Xmaller_target = batch_label_Xbbox, batch_Xbboxes
shape: ((4, 52, 52, 3, 85), (4, 150, 4))
batch_label_Xbbox[num] = label_Xbbox (num is the sample index within the batch)
shape: (4, 52, 52, 3, 85)
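
A sketch of how the per-image results could be packed into the batch arrays and grouped into these targets (num is the sample index within the batch, as above):

# Copy the per-image labels and boxes into slot `num` of the batch arrays.
batch_label_sbbox[num, :, :, :, :] = label_sbbox
batch_label_mbbox[num, :, :, :, :] = label_mbbox
batch_label_lbbox[num, :, :, :, :] = label_lbbox
batch_sbboxes[num, :, :] = sbboxes
batch_mbboxes[num, :, :] = mbboxes
batch_lbboxes[num, :, :] = lbboxes

# Group each scale's label tensor with its raw box list.
batch_smaller_target = batch_label_sbbox, batch_sbboxes
batch_medium_target  = batch_label_mbbox, batch_mbboxes
batch_larger_target  = batch_label_lbbox, batch_lbboxes
# The iterator then yields batch_image together with these three targets.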

Model outputs:

shapes: [
(4, 52, 52, 255),
(4, 52, 52, 3, 85),
(4, 26, 26, 255),
(4, 26, 26, 3, 85),
(4, 13, 13, 255),
(4, 13, 13, 3, 85),
]
At each scale the output comes in two parts: conv (?), which appears to be the raw convolution output with 255 = 3 x 85 channels, and pred (?), which appears to be the same values decoded into (3, 85) per cell: the box xywh mapped back to input-image coordinates, with sigmoid applied to the confidence and class scores (see the sketch below).
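
A sketch of what such a decode step could look like, under the usual YOLOv3 parameterisation (sigmoid offsets for the box center, exponential scaling of the anchors); the function name and signature here are mine, not the repo's exact API:

import tensorflow as tf

def decode_sketch(conv, scale_anchors, stride, num_classes=80):
    """Turn the raw output (batch, S, S, 3*(5+C)) into (batch, S, S, 3, 5+C)
    with xywh on the input-image scale. `scale_anchors` is the (3, 2) anchor
    set of this scale (in grid cells), `stride` is 8, 16 or 32."""
    scale_anchors = tf.cast(scale_anchors, tf.float32)
    batch_size  = tf.shape(conv)[0]
    output_size = tf.shape(conv)[1]
    conv = tf.reshape(conv, (batch_size, output_size, output_size, 3, 5 + num_classes))

    raw_dxdy, raw_dwdh, raw_conf, raw_prob = tf.split(conv, (2, 2, 1, num_classes), axis=-1)

    # Grid of (x, y) cell indices for every cell of the feature map.
    y = tf.tile(tf.range(output_size, dtype=tf.int32)[:, tf.newaxis], [1, output_size])
    x = tf.tile(tf.range(output_size, dtype=tf.int32)[tf.newaxis, :], [output_size, 1])
    xy_grid = tf.cast(tf.stack([x, y], axis=-1)[tf.newaxis, :, :, tf.newaxis, :], tf.float32)

    # Box center: sigmoid offset inside its cell; box size: anchor scaled by exp(dwdh);
    # both mapped back to input-image pixels via the stride.
    pred_xy   = (tf.sigmoid(raw_dxdy) + xy_grid) * stride
    pred_wh   = tf.exp(raw_dwdh) * scale_anchors * stride
    pred_conf = tf.sigmoid(raw_conf)
    pred_prob = tf.sigmoid(raw_prob)
    return tf.concat([pred_xy, pred_wh, pred_conf, pred_prob], axis=-1)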

Computing the loss:

conv_raw_conf = conv[:, :, :, :, 4:5]   # raw confidence (logits)
conv_raw_prob = conv[:, :, :, :, 5:]    # raw class probabilities (logits)

pred_xywh     = pred[:, :, :, :, 0:4]   # predicted box xywh
pred_conf     = pred[:, :, :, :, 4:5]   # predicted confidence

label_xywh    = label[:, :, :, :, 0:4]  # ground-truth box xywh
respond_bbox  = label[:, :, :, :, 4:5]  # ground-truth objectness (is there an object on this anchor?)

label_prob    = label[:, :, :, :, 5:]   # ground-truth class probabilities

The GIoU loss:

Compute the GIoU:
giou = bbox_giou(pred_xywh, label_xywh)
Compute the GIoU weight, which gives small boxes a larger weight:
bbox_loss_scale = 2 - (ground-truth w * h) / (image area)
bbox_loss_scale = 2.0 - 1.0 * label_xywh[:, :, :, :, 2:3] * label_xywh[:, :, :, :, 3:4] / (input_size ** 2)
Then the GIoU loss itself:
giou_loss = respond_bbox * bbox_loss_scale * (1 - giou)
Sum over each sample and average over the batch:
giou_loss = tf.reduce_mean(tf.reduce_sum(giou_loss, axis=[1,2,3,4]))
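
Putting the pieces together, here is a sketch of the whole GIoU-loss computation, with a bbox_giou helper written out the way the formulas above describe it:

import tensorflow as tf

def bbox_giou(boxes1, boxes2):
    """GIoU between boxes in (cx, cy, w, h) format; a sketch of the loss helper."""
    # xywh -> corner coordinates (xmin, ymin, xmax, ymax)
    boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                        boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                        boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)

    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    # Intersection, union, plain IoU
    left_up    = tf.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter      = tf.maximum(right_down - left_up, 0.0)
    inter_area = inter[..., 0] * inter[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    iou        = inter_area / union_area

    # Smallest enclosing box gives the GIoU penalty term
    enclose_left_up    = tf.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose      = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
    enclose_area = enclose[..., 0] * enclose[..., 1]
    return iou - (enclose_area - union_area) / enclose_area

def giou_loss_sketch(pred_xywh, label_xywh, respond_bbox, input_size):
    giou = tf.expand_dims(bbox_giou(pred_xywh, label_xywh), axis=-1)
    # Small boxes get a larger weight: 2 - (box area / image area)
    bbox_loss_scale = 2.0 - 1.0 * label_xywh[..., 2:3] * label_xywh[..., 3:4] / (input_size ** 2)
    giou_loss = respond_bbox * bbox_loss_scale * (1.0 - giou)
    # Sum over grid, anchors and the last channel, then average over the batch
    return tf.reduce_mean(tf.reduce_sum(giou_loss, axis=[1, 2, 3, 4]))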
