YOLOv2 Illustrated

Latest recommended article published on 2024-08-02 21:26:57

坏习惯的叛逆者


Article link: https://blog.csdn.net/wenxueliu/article/details/80871163


Training is split into two parts:

Pretraining DarkNet19
Training for object detection on top of DarkNet19

Original image size: input_shape

Labels: each raw box label is (class, xmin, ymin, xmax, ymax), with every coordinate normalized by the original image size to [0, 1]

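YOLOv2 classification

Darknet19 training

Preprocessing

The paper mentions random crops, rotation, hue, saturation, exposure shift, and so on, but the actual code uses the most common preprocessing scheme: four corner crops + center crop, flip, then four corners + center again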
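The corner-and-center cropping scheme described above can be sketched in NumPy as follows. This is only an illustration, not the original code: the function name `ten_crop` and the crop size are hypothetical, and the sketch assumes the second set of five crops is taken from the horizontally flipped image.

```python
import numpy as np

def ten_crop(image, crop_h, crop_w):
    """Return 10 crops: 4 corners + center of the image and of its horizontal flip.

    image: array of shape (height, width, channels). Hypothetical helper.
    """
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    # Top-left origins of the four corner crops and the center crop.
    origins = [
        (0, 0),
        (0, w - crop_w),
        (h - crop_h, 0),
        (h - crop_h, w - crop_w),
        ((h - crop_h) // 2, (w - crop_w) // 2),
    ]
    crops = []
    for img in (image, image[:, ::-1]):  # original, then horizontal flip
        for top, left in origins:
            crops.append(img[top:top + crop_h, left:left + crop_w])
    return np.stack(crops)

image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
crops = ten_crop(image, 4, 4)
```

Taking the crops as slices keeps the sketch cheap; `np.stack` copies them into one batch at the end.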

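Training

The ImageNet dataset is passed through DarkNet19 + avgpool + softmax to produce a 1000-class classification.

Training procedure

Train on 224x224x3 inputs for 160 epochs with a learning rate of 0.1, weight decay 0.0005, and momentum 0.9;
then continue training on 448x448x3 inputs for 10 epochs with a learning rate of 10^-3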

YOLOv2 detection

Object detection training

Image size: input_shape = [height, width]

Preprocessing

Image preprocessing

Resize to 416x416 and divide by 255 to normalize the pixel values
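A minimal NumPy sketch of this step. The original code most likely relies on an image library for resizing; the nearest-neighbour resize and the name `preprocess_image` are assumptions made here to keep the example self-contained.

```python
import numpy as np

def preprocess_image(image, target_hw=(416, 416)):
    """Nearest-neighbour resize to target_hw, then scale pixels to [0, 1].

    image: uint8 array of shape (height, width, channels). Hypothetical helper.
    """
    h, w = image.shape[:2]
    th, tw = target_hw
    rows = np.arange(th) * h // th  # source row for each target row
    cols = np.arange(tw) * w // tw  # source column for each target column
    resized = image[rows[:, None], cols[None, :]]
    return resized.astype(np.float32) / 255.0

img = np.full((100, 200, 3), 255, dtype=np.uint8)
net_input = preprocess_image(img)
```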

Label preprocessing

Convert boxes to (x, y, w, h, class) format and normalize x, y, w, h by dividing by the image input_shape
Multiply (x, y, w, h) by output_shape
This rescales the box coordinates to output_shape, giving box_output; the (x, y) center of each box in box_output now falls inside exactly one grid cell of the output
For each box in box_output, compute its IoU with all anchors and find the maximum IoU.
Using the maximum IoU, set detectors_mask[i, j, anchor_index] = 1, where i = floor(box_output[1]), j = floor(box_output[0]), and anchor_index is the index of the anchor with the maximum IoU
Using the maximum IoU, record true_box as (box_output[0] - j, box_output[1] - i, log(box_output[2:4] / anchor), box_class), where anchor is the anchor at anchor_index

Since every grid cell has B anchors, "all anchors" above means those B anchors.

detectors_mask records, for each grid cell that contains a box, which anchor has the largest IoU with that box.

true_box records each box's offset relative to its grid cell.
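The label preprocessing steps above can be sketched in NumPy roughly as follows. The function name, the array layouts, and the choice to compute the anchor IoU on width and height only (box and anchor both centered at the origin) are assumptions for illustration; the original code may differ in these details.

```python
import numpy as np

def preprocess_true_boxes(boxes, anchors, output_hw):
    """Build detectors_mask and the per-cell true boxes (hypothetical helper).

    boxes:     (N, 5) rows of (x, y, w, h, class), x/y/w/h normalized to [0, 1].
    anchors:   (B, 2) rows of (w, h) in grid-cell units.
    output_hw: (height, width) of the output grid.
    """
    out_h, out_w = output_hw
    anchors = np.asarray(anchors, dtype=np.float64)
    num_anchors = len(anchors)
    detectors_mask = np.zeros((out_h, out_w, num_anchors, 1), dtype=np.float32)
    true_boxes = np.zeros((out_h, out_w, num_anchors, 5), dtype=np.float32)
    for x, y, w, h, cls in boxes:
        # Scale the normalized box to output-grid units (box_output).
        bx, by, bw, bh = x * out_w, y * out_h, w * out_w, h * out_h
        i = min(int(np.floor(by)), out_h - 1)  # grid row
        j = min(int(np.floor(bx)), out_w - 1)  # grid column
        # IoU against each anchor, comparing width/height only (assumption).
        inter = np.minimum(bw, anchors[:, 0]) * np.minimum(bh, anchors[:, 1])
        union = bw * bh + anchors[:, 0] * anchors[:, 1] - inter
        best = int(np.argmax(inter / union))
        detectors_mask[i, j, best] = 1
        true_boxes[i, j, best] = [
            bx - j,                            # x offset inside the cell
            by - i,                            # y offset inside the cell
            np.log(bw / anchors[best, 0]),     # log width ratio to the anchor
            np.log(bh / anchors[best, 1]),     # log height ratio to the anchor
            cls,
        ]
    return detectors_mask, true_boxes

anchors = np.array([[1.0, 1.0], [3.0, 3.0]])
boxes = np.array([[0.5, 0.25, 0.25, 0.25, 2]])
mask, tb = preprocess_true_boxes(boxes, anchors, (13, 13))
```

In this sketch, if two boxes fall into the same cell with the same best anchor, the later one overwrites the earlier one.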

Base network

Passing the input through DarkNet19 gives output, whose dimensions are output_shape ([batch_size, height, width, num_anchors, 5 + num_classes]); the last dimension holds, in order, [x, y, w, h, confidence, class_one_hot]

Encoding

grid : [1, output_shape[1], output_shape[2], 1, 2], where the last dimension of size 2 holds the (row, column) index of each cell in output_shape

For example, when output_shape has height 4 and width 3,

grid is (0, 0), (0, 1), (0, 2), (1, 0) … (3, 2)
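One way to build such a grid in NumPy, shown as a small sketch; `make_grid` is a hypothetical helper name, not one from the original code.

```python
import numpy as np

def make_grid(height, width):
    """Return a (1, height, width, 1, 2) array of (row, column) cell indices."""
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    grid = np.stack([rows, cols], axis=-1)  # (height, width, 2)
    return grid.reshape(1, height, width, 1, 2).astype(np.float32)

grid = make_grid(4, 3)
cells = grid.reshape(-1, 2).tolist()  # cells in row-major order
```

The singleton axes let the grid broadcast over the batch and anchor dimensions of the network output.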

anchor : [1, 1, 1, num_anchor, 2]

pre_yx = (grid + sigmoid(output[…, :2])) / output_shape[1:3]
pre_hw = exp(output[…, 2:4]) * anchor / output_shape[1:3]
pre_box = [pre_yx, pre_hw]
pre_box_confidence = sigmoid(output_confidence)
pre_box_class = softmax(output_class_one_hot)
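The formulas above translate into NumPy roughly as follows. The sketch follows them literally, so it assumes that channels 0:2 of the output use the same (row, column) order as the grid and that the anchors use the same axis order as channels 2:4. If your tensors are laid out as (x, y, w, h), reverse the grid and the divisor accordingly. The function name `decode` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decode(output, anchors):
    """Turn raw network output into normalized boxes, confidences, class probabilities.

    output:  (batch, height, width, num_anchors, 5 + num_classes)
    anchors: (num_anchors, 2) in grid units, same axis order as channels 2:4.
    """
    _, height, width, num_anchors, _ = output.shape
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    grid = np.stack([rows, cols], axis=-1).reshape(1, height, width, 1, 2)
    dims = np.array([height, width], dtype=np.float64)  # output_shape[1:3]
    anchors = np.asarray(anchors, dtype=np.float64).reshape(1, 1, 1, num_anchors, 2)

    pre_yx = (grid + sigmoid(output[..., :2])) / dims
    pre_hw = np.exp(output[..., 2:4]) * anchors / dims
    pre_box = np.concatenate([pre_yx, pre_hw], axis=-1)
    pre_box_confidence = sigmoid(output[..., 4:5])
    pre_box_class = softmax(output[..., 5:])
    return pre_box, pre_box_confidence, pre_box_class

output = np.zeros((1, 4, 3, 2, 5 + 3))
pre_box, pre_conf, pre_cls = decode(output, [[1.0, 1.0], [2.0, 2.0]])
```

With an all-zero output, every center lands in the middle of its cell and every box takes the size of its anchor, which makes the result easy to check by hand.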

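At this point sigmoid(output[…, :2]) is the offset of each prediction within its grid cell, and pre_* expresses the box in coordinates normalized by output_shape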

Loss

Convert both pre_* and true_box to [ymin, xmin, ymax, xmax] format
Compute the IoU between pre_* and true_box; where the IoU is greater than 0.6, object_detections records that the grid cell contains an object
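The two steps above can be sketched in NumPy as follows. The helper names `to_corners` and `iou` are hypothetical, and the boxes are assumed to be given as center (y, x) plus size (h, w) in the same units.

```python
import numpy as np

def to_corners(box_yx, box_hw):
    """Convert center/size boxes to [ymin, xmin, ymax, xmax]."""
    mins = box_yx - box_hw / 2.0
    maxes = box_yx + box_hw / 2.0
    return np.concatenate([mins, maxes], axis=-1)

def iou(a, b):
    """Element-wise IoU of two arrays of corner boxes with broadcastable shapes."""
    inter_min = np.maximum(a[..., :2], b[..., :2])
    inter_max = np.minimum(a[..., 2:], b[..., 2:])
    inter_hw = np.maximum(inter_max - inter_min, 0.0)  # zero when boxes do not overlap
    inter = inter_hw[..., 0] * inter_hw[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    return inter / np.maximum(union, 1e-9)  # guard against empty boxes

pred = to_corners(np.array([[0.5, 0.5]]), np.array([[0.4, 0.4]]))
same = to_corners(np.array([[0.5, 0.5]]), np.array([[0.4, 0.4]]))
shifted = to_corners(np.array([[0.5, 0.7]]), np.array([[0.4, 0.4]]))

iou_same = iou(pred, same)
iou_shifted = iou(pred, shifted)
object_detections = iou_shifted > 0.6  # threshold from the text
```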

no_object_scale = 1
object_scale = 5
no_object_loss = no_object_scale * (1 - object_detections) * (1 - detectors_mask) * square(-pre_box_confidence)
objects_loss = object_scale * detectors_mask * square(1 - pre_box_confidence)
confidence_loss = no_object_loss + objects_loss
matching_classes = one_hot(matching_true_boxes[..., 4], num_classes)
classification_loss = class_scale * detectors_mask * square(matching_classes - pre_box_class)
matching_boxes = matching_true_boxes[..., 0:4]
coordinates_loss = coordinates_scale * detectors_mask * square(matching_boxes - pre_box)
total_loss = 0.5 * (sum(confidence_loss) + sum(classification_loss) + sum(coordinates_loss))
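A NumPy sketch of the loss terms listed above. Several things here are assumptions rather than facts from the text: the function name `yolo_loss`, the default values of `class_scale` and `coordinates_scale` (the text does not give them), and the requirement that the predicted boxes passed in use the same encoding as the matching true boxes (cell offset and log size ratio).

```python
import numpy as np

def yolo_loss(pre_box, pre_conf, pre_class, detectors_mask, matching_true_boxes,
              object_detections, num_classes,
              no_object_scale=1.0, object_scale=5.0,
              class_scale=1.0, coordinates_scale=1.0):  # last two values are assumed
    """All inputs share leading dims (batch, height, width, num_anchors).

    pre_box: (..., 4), pre_conf: (..., 1), pre_class: (..., num_classes)
    detectors_mask, object_detections: (..., 1); matching_true_boxes: (..., 5)
    """
    # Confidence: push toward 0 where nothing is matched, toward 1 where matched.
    no_object_loss = (no_object_scale * (1 - object_detections)
                      * (1 - detectors_mask) * np.square(-pre_conf))
    objects_loss = object_scale * detectors_mask * np.square(1 - pre_conf)
    confidence_loss = no_object_loss + objects_loss

    # Classification: squared error against the one-hot true class.
    class_ids = matching_true_boxes[..., 4].astype(np.int64)
    matching_classes = np.eye(num_classes)[class_ids]
    classification_loss = (class_scale * detectors_mask
                           * np.square(matching_classes - pre_class))

    # Coordinates: squared error on the matched boxes only.
    matching_boxes = matching_true_boxes[..., 0:4]
    coordinates_loss = (coordinates_scale * detectors_mask
                        * np.square(matching_boxes - pre_box))

    return 0.5 * (confidence_loss.sum() + classification_loss.sum()
                  + coordinates_loss.sum())

shape = (1, 2, 2, 1)
mask = np.zeros(shape + (1,))
true_boxes = np.zeros(shape + (5,))
mask[0, 0, 0, 0] = 1
true_boxes[0, 0, 0, 0] = [0.5, 0.5, 0.0, 0.0, 1]
loss = yolo_loss(
    pre_box=np.zeros(shape + (4,)),
    pre_conf=np.full(shape + (1,), 0.5),
    pre_class=np.full(shape + (2,), 0.5),
    detectors_mask=mask,
    matching_true_boxes=true_boxes,
    object_detections=np.zeros(shape + (1,)),
    num_classes=2,
)
```

The masks do the selection: detectors_mask switches on the object, class, and coordinate terms for matched anchors, and object_detections exempts high-IoU predictions from the no-object penalty.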

Training parameters

160 epochs

learning rate 10^-3, divided by 10 at epoch 60 (10^-4) and again at epoch 90 (10^-5)

weight decay 0.0005, momentum 0.9

data augmentation: same as YOLO
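The step learning-rate schedule listed above can be written as a small function. This is a sketch: the function name is hypothetical, and it assumes 0-indexed epochs with each drop taking effect at the start of epochs 60 and 90.

```python
def detection_learning_rate(epoch, base_lr=1e-3):
    """Step schedule: base_lr, then /10 from epoch 60, then /100 from epoch 90."""
    if epoch >= 90:
        return base_lr / 100
    if epoch >= 60:
        return base_lr / 10
    return base_lr

schedule = [detection_learning_rate(e) for e in range(160)]
```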

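When the network sees an image with a detection label, backpropagation includes both the detection loss and the classification loss; when it sees an image with a classification label, only the classification loss is backpropagated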

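Validation

YOLOv2 validation

Preprocessing

See the object detection training section

Base network

Passing the input through DarkNet19 gives box, box_confidence, and box_class_probs, which hold the predicted box coordinates, the box score, and the class the box belongs to. Their dimensions are, in order, [batch_size, height, width, num_anchors, 4], [batch_size, height, width, num_anchors, 1], and [batch_size, height, width, num_anchors, num_classes]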