Implementing YOLO V3 from Scratch, Day 3: Loss Function and Model Training

褪色的博客

Last modified: 2023-04-02 22:51:38

Tags: YOLO, deep learning, computer vision

First published: 2023-04-01 16:57:30

Article link: https://blog.csdn.net/qq_37055672/article/details/129898155

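This article walks through the YOLO V3 loss function, covering the MSE losses on position and size and the cross-entropy losses on confidence and class. Positive and negative samples are determined with an IoU threshold, and the article explains how non-maximum suppression removes overlapping predicted boxes. It also discusses model training and prediction, including the data format, the training code, and a visualization of the predictions.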

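Day 2 noted that an object detection algorithm actually has to solve three machine learning subtasks: 1) a binary classification task that decides whether a candidate box contains an object; 2) an image classification task that identifies the class of the object in the candidate box; 3) a regression task that predicts the position and size of the box. A typical object detection model therefore computes a separate loss for each of these three tasks.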

The YOLO V3 loss function

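The YOLO V3 paper does not explicitly state which loss function it uses. Strictly speaking, of all the YOLO papers only YOLO v1 gives an explicit formula for the loss.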

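For each predicted box, the YOLO V3 loss function has three main parts:

For positive samples, the difference between the bounding box and the ground truth in position (x, y) and size (w, h), using an MSE loss (rows 1 and 2 of the loss function figure). Note that the code later in this article uses binary cross-entropy for x and y and an absolute-error (L1) loss for w and h instead.
For positive and negative samples, the cross-entropy between the predicted confidence and its ground-truth label (rows 3 and 4 of the loss function figure).
For positive samples, the cross-entropy between the 80 class scores and the target's one-hot vector (row 5 of the loss function figure).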

Figure 1: The loss function of the YOLO algorithm
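The figure itself is not reproduced in this text. One common way to write a loss with the five rows described above is sketched here. The notation is an assumption borrowed from YOLO v1 (an S x S grid with B boxes per cell, indicators for positive and negative boxes, weights $\lambda_{coord}$ and $\lambda_{noobj}$); it is not taken from the YOLO V3 paper. Hatted symbols are labels, unhatted symbols are predictions, and boxes marked as ignored have both indicators equal to zero.

```latex
\begin{aligned}
L ={}& \lambda_{coord} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}
        \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 \right] \\
     &+ \lambda_{coord} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}
        \left[ (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right] \\
     &- \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}
        \left[ \hat{C}_{ij} \log C_{ij} + (1 - \hat{C}_{ij}) \log (1 - C_{ij}) \right] \\
     &- \lambda_{noobj} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{noobj}
        \left[ \hat{C}_{ij} \log C_{ij} + (1 - \hat{C}_{ij}) \log (1 - C_{ij}) \right] \\
     &- \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in \text{classes}}
        \left[ \hat{p}_{ij}(c) \log p_{ij}(c) + (1 - \hat{p}_{ij}(c)) \log (1 - p_{ij}(c)) \right]
\end{aligned}
```

Rows 1 and 2 are the position and size terms, rows 3 and 4 the confidence terms for positive and negative boxes, and row 5 the per-class term.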

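The loss on the position and size of the predicted box (loss 1) and the classification loss (loss 3) are both easy to compute: Day 2 showed how to generate the predicted boxes, and the class of each target is known in advance, so these parts of the loss can be computed directly.
One question remains, though: how do we decide whether a candidate box contains an object? The size and number of candidate boxes are not fixed, and whether a box is a positive or a negative sample is not known in advance. The objectness loss can only be computed once this is settled.
[Figure from Day 2: labeling the candidate boxes]
Remember this figure? Day 2 covered how the labels are generated. We compute the IoU between every candidate box and the ground-truth box. The most direct approach marks the candidate box with the largest IoU as positive (1) and every other box as negative (0). This has a problem: a candidate box whose IoU is large but not the largest gets labeled negative, which seems unreasonable. You can say it is not the best match, but you cannot say it is not a match at all. The usual practice is therefore to set boxes whose IoU exceeds the threshold but is not the maximum to -1, meaning they do not take part in the loss computation, and to label boxes below the threshold as negative.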

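For example, set the threshold to 0.7 and suppose one candidate box has an IoU of 0.9 with the ground-truth box, while three others have IoUs of 0.8, 0.5, and 0.1. The 0.9 box is clearly labeled positive (it contains the object). But does the 0.8 box contain no object? It obviously covers most of the object too; it is just not the best match, and labeling it negative would distort the loss. So the 0.8 box should be excluded from the loss computation, and the boxes below the threshold (0.5 and 0.1) are labeled negative.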
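The labeling rule in this example can be sketched in a few lines of plain Python. The helper name `assign_objectness_labels` is hypothetical and handles only a single ground-truth box; it illustrates the rule and is not the code used later in this article.

```python
def assign_objectness_labels(ious, iou_threshold):
    """Label candidate boxes against one ground-truth box.

    Returns 1 for the best-matching box, -1 (ignored) for other boxes
    whose IoU exceeds the threshold, and 0 (negative) for the rest.
    """
    # Index of the candidate with the largest IoU: the single positive sample
    best = max(range(len(ious)), key=lambda i: ious[i])
    labels = []
    for i, iou in enumerate(ious):
        if i == best:
            labels.append(1)
        elif iou > iou_threshold:
            labels.append(-1)  # good overlap, but not the best: ignore
        else:
            labels.append(0)   # poor overlap: negative sample
    return labels


print(assign_objectness_labels([0.9, 0.8, 0.5, 0.1], 0.7))  # [1, -1, 0, 0]
```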

Selecting the predicted boxes whose IoU with a ground-truth box exceeds the threshold

import numpy as np

# Select the predicted boxes whose IoU with a ground-truth box exceeds the threshold
def get_iou_above_thresh_inds(pred_box, gt_boxes, iou_threshold):
    # pred_box is indexed as [batch, row, col, anchor, (x1, y1, x2, y2)]
    batchsize = pred_box.shape[0]
    num_rows = pred_box.shape[1]
    num_cols = pred_box.shape[2]
    num_anchors = pred_box.shape[3]
    ret_inds = np.zeros([batchsize, num_rows, num_cols, num_anchors])
    for i in range(batchsize):
        pred_box_i = pred_box[i]
        gt_boxes_i = gt_boxes[i]
        for gt in gt_boxes_i:
            # Convert the ground-truth box from (center x, center y, w, h) to corners
            gtx_min = gt[0] - gt[2] / 2.
            gty_min = gt[1] - gt[3] / 2.
            gtx_max = gt[0] + gt[2] / 2.
            gty_max = gt[1] + gt[3] / 2.
            # Skip degenerate ground-truth boxes with near-zero width or height
            if (gtx_max - gtx_min < 1e-3) or (gty_max - gty_min < 1e-3):
                continue
            # Intersection of every predicted box with this ground-truth box
            x1 = np.maximum(pred_box_i[:, :, :, 0], gtx_min)
            y1 = np.maximum(pred_box_i[:, :, :, 1], gty_min)
            x2 = np.minimum(pred_box_i[:, :, :, 2], gtx_max)
            y2 = np.minimum(pred_box_i[:, :, :, 3], gty_max)
            intersection = np.maximum(x2 - x1, 0.) * np.maximum(y2 - y1, 0.)
            # s1: area of the ground-truth box, s2: area of each predicted box
            s1 = (gty_max - gty_min) * (gtx_max - gtx_min)
            s2 = (pred_box_i[:, :, :, 2] - pred_box_i[:, :, :, 0]) * (pred_box_i[:, :, :, 3] - pred_box_i[:, :, :, 1])
            union = s2 + s1 - intersection
            iou = intersection / union
            # Mark every predicted box whose IoU with this ground-truth box exceeds the threshold
            above_inds = np.where(iou > iou_threshold)
            ret_inds[i][above_inds] = 1
    # Reorder from [batch, row, col, anchor] to [batch, anchor, row, col]
    ret_inds = np.transpose(ret_inds, (0, 3, 1, 2))
    return ret_inds.astype('bool')
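For checking a single pair of boxes by hand, the same IoU arithmetic can be written in scalar form. This is a sketch under the box conventions the function above appears to use (predicted box as corner coordinates, ground-truth box as center plus width and height); the name `iou_xyxy_vs_xywh` is made up for this example.

```python
def iou_xyxy_vs_xywh(pred_xyxy, gt_xywh):
    """IoU between a predicted box (x1, y1, x2, y2) and a ground-truth
    box given as (center_x, center_y, width, height)."""
    px1, py1, px2, py2 = pred_xyxy
    cx, cy, w, h = gt_xywh
    # Convert the ground-truth box to corner coordinates
    gx1, gy1 = cx - w / 2.0, cy - h / 2.0
    gx2, gy2 = cx + w / 2.0, cy + h / 2.0
    # Width and height of the overlap, clipped at zero when the boxes are disjoint
    inter_w = max(min(px2, gx2) - max(px1, gx1), 0.0)
    inter_h = max(min(py2, gy2) - max(py1, gy1), 0.0)
    intersection = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + w * h - intersection
    return intersection / union if union > 0 else 0.0


# The ground-truth box covers the right half of the predicted box
print(iou_xyxy_vs_xywh((0.0, 0.0, 2.0, 2.0), (1.5, 1.0, 1.0, 2.0)))  # 0.5
```

With a threshold of 0.7, this pair would not be selected, since 0.5 is below the threshold.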

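The function above tells us which anchor boxes are candidates for an objectness label of -1. The function below then processes label_objectness, labeling as -1 the anchor boxes whose IoU exceeds the threshold but which are not positive samples.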

def label_objectness_ignore(label_objectness, iou_above_thresh_indices):
    # Note: we cannot simply write label_objectness[iou_above_thresh_indices] = -1,
    #       because that could overwrite entries already labeled 1 with -1.
    #       Only predicted boxes currently labeled 0 whose IoU with a ground-truth
    #       box exceeds the threshold are labeled -1.
    negative_indices = (label_objectness < 0.5)
    ignore_indices = np.logical_and(negative_indices, iou_above_thresh_indices)
    label_objectness[ignore_indices] = -1
    return label_objectness
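The warning in the comment above is easy to see on a tiny example. The sketch below uses plain Python lists instead of NumPy arrays, and both helper names are hypothetical; it only contrasts the masked update with the naive one.

```python
def mark_ignored(labels, above_thresh):
    """Set a label to -1 only where it is currently negative (0)
    and the IoU is above the threshold."""
    return [-1 if (label < 0.5 and above) else label
            for label, above in zip(labels, above_thresh)]


def mark_ignored_naive(labels, above_thresh):
    """Naive version: overwrites every above-threshold entry, positives included."""
    return [-1 if above else label
            for label, above in zip(labels, above_thresh)]


labels = [1, 0, 0, 0]                  # the first box is the positive sample
above = [True, True, False, False]     # the first two boxes exceed the IoU threshold

print(mark_ignored(labels, above))        # [1, -1, 0, 0]
print(mark_ignored_naive(labels, above))  # [-1, -1, 0, 0]
```

The naive version loses the only positive sample, which is exactly the case the negative-label mask guards against.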

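Calling these two functions together labels the box with the largest IoU as a positive sample, labels boxes with IoU below the threshold as negative samples, and sets label_objectness to -1 for predicted boxes whose IoU exceeds the threshold without being the largest.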

Defining the loss function

import paddle
import paddle.nn.functional as F

# Define the loss function
def get_loss(output, label_objectness, label_location, label_classification, scales, num_anchors=3, num_classes=7):
    # Reshape output from [N, C, H, W] to [N, NUM_ANCHORS, NUM_CLASSES + 5, H, W]
    reshaped_output = paddle.reshape(output, [-1, num_anchors, num_classes + 5, output.shape[2], output.shape[3]])
    # Take the objectness predictions from output
    pred_objectness = reshaped_output[:, :, 4, :, :]
    loss_objectness = F.binary_cross_entropy_with_logits(pred_objectness, label_objectness, reduction="none")
    # Boxes labeled -1 are meant to be ignored, so zero out their objectness loss
    ignore_mask = paddle.cast(label_objectness >= 0, 'float32')
    ignore_mask.stop_gradient = True
    loss_objectness = loss_objectness * ignore_mask
    # pos_samples is 1. at positive samples and 0. everywhere else
    pos_objectness = label_objectness > 0
    pos_samples = paddle.cast(pos_objectness, 'float32')
    pos_samples.stop_gradient = True
    # Take all position-related predictions from output
    tx = reshaped_output[:, :, 0, :, :]
    ty = reshaped_output[:, :, 1, :, :]
    tw = reshaped_output[:, :, 2, :, :]
    th = reshaped_output[:, :, 3, :, :]
    # Take the label for each position coordinate from label_location
    dx_label = label_location[:, :, 0, :, :]
    dy_label = label_location[:, :, 1, :, :]
    tw_label = label_location[:, :, 2, :, :]
    th_label = label_location[:, :, 3, :, :]
    # Build the loss on the position and size of the predicted box:
    # sigmoid cross-entropy for x and y, absolute error (L1) for w and h
    loss_location_x = F.binary_cross_entropy_with_logits(tx, dx_label, reduction="none")
    loss_location_y = F.binary_cross_entropy_with_logits(ty, dy_label, reduction="none")
    loss_location_w = paddle.abs(tw - tw_label)
    loss_location_h = paddle.abs(th - th_label)
    # Total location loss
    loss_location = loss_location_x + loss_location_y + loss_location_h + loss_location_w
    # Multiply by scales
    loss_location = loss_location * scales
    # Keep the location loss for positive samples only
    loss_location = loss_location * pos_samples
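To see what the location and objectness terms above compute for a single predicted box, here is a scalar sketch in plain Python. It uses the standard numerically stable form of sigmoid cross-entropy on logits; the helper names are hypothetical, and skipping boxes labeled -1 follows the ignore rule described earlier rather than any specific Paddle call.

```python
import math


def bce_with_logits(logit, target):
    """Stable form of -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))]."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def objectness_loss(logit, label):
    """Cross-entropy on the objectness logit; boxes labeled -1 contribute nothing."""
    if label < 0:
        return 0.0
    return bce_with_logits(logit, label)


def location_loss(pred, label, objectness_label, scale):
    """Location loss for one box: cross-entropy on x and y, L1 on w and h,
    weighted by scale and counted only for positive samples."""
    tx, ty, tw, th = pred
    dx, dy, tw_label, th_label = label
    loss = (bce_with_logits(tx, dx) + bce_with_logits(ty, dy)
            + abs(tw - tw_label) + abs(th - th_label))
    positive = 1.0 if objectness_label > 0 else 0.0
    return loss * scale * positive


pred = (0.0, 0.0, 1.0, 1.0)
label = (0.5, 0.5, 0.5, 2.0)
print(location_loss(pred, label, objectness_label=1, scale=2.0))
print(location_loss(pred, label, objectness_label=0, scale=2.0))  # 0.0
print(objectness_loss(0.0, -1))  # 0.0
```

For the positive box, the x and y terms each equal log 2 (a logit of 0 against a target of 0.5), and the w and h terms add 0.5 and 1.0 before the scale is applied.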
