The YOLOv3 Loss Function

Article link: https://blog.csdn.net/qq_32425195/article/details/102834927

Computing the loss function:

Read the image together with its bounding boxes and class labels:

box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
# Example result (one row per box: x_min, y_min, x_max, y_max, class_id):
# [[312  57 401 325  14]
#  [241  64 330 334  14]
#  [211  63 257 335  14]
#  [167  89 240 347  14]
#  [ 86  73 191 326  14]]
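The parsing step can be tried in isolation. The following is a minimal sketch that assumes each annotation line has the form "image_path box1 box2 ...", where every box is "x_min,y_min,x_max,y_max,class_id". The function name parse_annotation_line and the file path are hypothetical and only serve the illustration.

```python
import numpy as np

def parse_annotation_line(annotation_line):
    """Split one annotation line into an image path and an (N, 5) int array."""
    line = annotation_line.split()
    image_path = line[0]
    # Same expression as in the post: one row of five integers per box.
    boxes = np.array([list(map(int, b.split(','))) for b in line[1:]])
    return image_path, boxes

# Hypothetical annotation line (assumed format, made-up path).
example_line = "images/000001.jpg 245,256,342,333,8 384,143,440,291,14"
image_path, box = parse_annotation_line(example_line)
print(image_path)
print(box)
```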

box is:

[[245 256 342 333 8]   # 8 is chair
 [384 143 440 291 14]] # 14 is person

Resize the image to 416*416 by scaling it with its aspect ratio preserved and padding the remainder:

scale = min(w/iw, h/ih)
nw = int(iw*scale)
nh = int(ih*scale)
dx = (w-nw)//2
dy = (h-nh)//2
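As a quick check of these formulas, here is a small sketch that wraps them in a function. The name letterbox_params and the 832*624 source image are assumptions chosen so that the arithmetic is exact; they do not come from the original post.

```python
def letterbox_params(image_size, input_size):
    """Return (scale, nw, nh, dx, dy) for aspect-preserving resize plus padding."""
    iw, ih = image_size   # original image width and height
    w, h = input_size     # network input width and height
    scale = min(w / iw, h / ih)   # the tighter side decides the scale
    nw = int(iw * scale)          # resized width
    nh = int(ih * scale)          # resized height
    dx = (w - nw) // 2            # horizontal padding on the left
    dy = (h - nh) // 2            # vertical padding on the top
    return scale, nw, nh, dx, dy

# Hypothetical 832*624 image placed into a 416*416 input.
scale, nw, nh, dx, dy = letterbox_params((832, 624), (416, 416))
print(scale, nw, nh, dx, dy)  # 0.5 416 312 0 52
```

The width fills the input exactly, so all padding goes above and below the image.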

Update box according to the same scaling and padding:

box_data = np.zeros((max_boxes,5))  # max_boxes is the maximum number of bounding boxes per image
if len(box)>0:
    np.random.shuffle(box)
    if len(box)>max_boxes: box = box[:max_boxes]
    box[:, [0,2]] = box[:, [0,2]]*scale + dx  # columns 0 and 2 are the x coordinates
    box[:, [1,3]] = box[:, [1,3]]*scale + dy  # columns 1 and 3 are the y coordinates
    box_data[:len(box)] = box
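The box update above can be sketched as a standalone function. This version is an illustration under assumptions: the name letterbox_boxes and the default max_boxes=20 are hypothetical, the shuffle is left out so the result is deterministic, and the boxes are converted to float, so unlike an in-place update of an integer array the coordinates are not truncated.

```python
import numpy as np

def letterbox_boxes(box, scale, dx, dy, max_boxes=20):
    """Map boxes into the resized, padded image and zero-pad to max_boxes rows."""
    box_data = np.zeros((max_boxes, 5))
    if len(box) > 0:
        # astype returns a copy, so the caller's array is left untouched.
        box = box[:max_boxes].astype(np.float64)
        box[:, [0, 2]] = box[:, [0, 2]] * scale + dx  # x_min, x_max
        box[:, [1, 3]] = box[:, [1, 3]] * scale + dy  # y_min, y_max
        box_data[:len(box)] = box                     # class_id is unchanged
    return box_data

# Hypothetical box, using scale=0.5, dx=0, dy=52 as illustrative values.
box = np.array([[100, 200, 300, 400, 14]])
box_data = letterbox_boxes(box, scale=0.5, dx=0, dy=52)
print(box_data[0])
```

Rows beyond the real boxes stay all zero, which is how later steps can tell padding from annotations.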

After scaling, box is:

[[319 187 366 311 14]
[203 281 284 346 8]]

Continue processing:

y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)

Here true_boxes holds the absolute x_min, y_min, x_max, y_max and class_id of each box, relative to input_shape.

Compute the absolute center coordinates and the width and height of each box:

boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]

This gives boxes_xy[0]:

[210,299] [231,284]

and boxes_wh[0]:

[302,144] [208,67]

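Next, normalize the centers and the widths and heights by dividing them by the image size; the class id is left unchanged. With an image size of 416 this gives

true_boxes[0] = [210/416, 299/416, 302/416, 144/416, 17] [231/416, 284/416, 208/416, 67/416, 14]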
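The conversion from corners to a normalized center and size can be reproduced with a few lines of NumPy. In this sketch the corner values (59, 227, 361, 371) are hypothetical, picked only because they yield the center (210, 299) and size (302, 144) quoted above. input_shape is assumed to be stored as (height, width), hence the [::-1] when dividing x and y.

```python
import numpy as np

input_shape = np.array([416, 416])  # assumed order: (height, width)

# Shape (batch=1, boxes=1, 5): x_min, y_min, x_max, y_max, class_id.
# Hypothetical corners chosen to match the numbers in the text.
true_boxes = np.array([[[59.0, 227.0, 361.0, 371.0, 17.0]]])

boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2  # centers
boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]         # widths, heights

# Overwrite the corner columns with normalized center and size.
true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]

print(boxes_xy[0, 0], boxes_wh[0, 0])
print(true_boxes[0, 0])
```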

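For each input image we predict at 3 scales (52*52, 26*26 and 13*13). Each scale uses 3 anchors (B=3), and each anchor in each cell predicts 5+num_classes values: 4 box offsets, 1 objectness confidence, and one probability per class, which is 25 values for 20 classes. One input image therefore produces 13*13*3*25 + 26*26*3*25 + 52*52*3*25 = 266175 predicted values. Because the network outputs this many values, we need to build tensors of the same shape from the boxes annotated on each image in order to compute the loss.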
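The size of the output can be checked with a few lines of arithmetic. The sketch below assumes 20 classes, which is what the factor 25 = 5 + 20 in the text implies; the variable names are only illustrative.

```python
num_classes = 20                      # assumed, implied by 25 = 5 + 20
anchors_per_scale = 3                 # B = 3
values_per_anchor = 5 + num_classes   # 4 offsets + 1 confidence + class scores
grid_sizes = [13, 26, 52]

# Number of predicted values on each scale, then in total.
per_scale = [g * g * anchors_per_scale * values_per_anchor for g in grid_sizes]
total = sum(per_scale)
print(per_scale, total)  # [12675, 50700, 202800] 266175
```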

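Using all the anchor boxes (they are fixed when the network is configured and are the same for every image, 9 in total) and the ground-truth boxes of this image, compute the IOU between each ground-truth box and every anchor box and take the index of the anchor box with the largest IOU. Since each feature map has 3 anchor boxes, indices 0-2 belong to the 52*52 feature map, 3-5 to the 26*26 feature map, and 6-8 to the 13*13 feature map.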

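# From the anchor box and ground-truth box coordinates, compute the width, height and area of the overlap, and from that the IOU.
# Matrix operations give the IOU between every ground-truth box and every anchor box at once, and then the index of the
# best-matching anchor box (in this example the image has 2 ground-truth boxes, whose best-matching anchor indices are 7 and 4).
intersect_mins = np.maximum(box_mins, anchor_mins)
intersect_maxes = np.minimum(box_maxes, anchor_maxes)
intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)  # width and height of the intersection area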
intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
box_area = wh[..., 0] * wh[..., 1]
anchor_area = anchors[..., 0] * anchors[..., 1]
iou = intersect_area / (box_area + anchor_area - intersect_area)

# Find best anchor for each true box
best_anchor = np.argmax(iou, axis=-1)
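The snippet above relies on box_mins, box_maxes, anchor_mins and anchor_maxes being defined. A self-contained sketch follows, under two assumptions that are not stated in the post: boxes and anchors are both centered at the origin, so only their width and height matter, and the anchors are the nine default YOLOv3 anchors. With different anchors the best match may differ.

```python
import numpy as np

# Assumed: the default YOLOv3 anchors (width, height) in pixels.
anchors = np.array([[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                    [59, 119], [116, 90], [156, 198], [373, 326]], dtype=np.float64)

def best_anchor_indices(wh, anchors):
    """Return (best anchor index per box, IOU matrix of shape (N, num_anchors))."""
    wh = np.expand_dims(wh, -2)            # (N, 1, 2) so it broadcasts over anchors
    box_maxes = wh / 2.0                   # boxes centered at the origin
    box_mins = -box_maxes
    anchors = np.expand_dims(anchors, 0)   # (1, 9, 2)
    anchor_maxes = anchors / 2.0
    anchor_mins = -anchor_maxes

    intersect_mins = np.maximum(box_mins, anchor_mins)
    intersect_maxes = np.minimum(box_maxes, anchor_maxes)
    intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.0)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    box_area = wh[..., 0] * wh[..., 1]
    anchor_area = anchors[..., 0] * anchors[..., 1]
    iou = intersect_area / (box_area + anchor_area - intersect_area)
    return np.argmax(iou, axis=-1), iou

# The first box of the running example has width 302 and height 144.
wh = np.array([[302.0, 144.0]])
best_anchor, iou = best_anchor_indices(wh, anchors)
print(best_anchor)  # [7]
```

For this box the overlap with anchor 7 (156, 198) is 156*144, giving an IOU of about 0.43, which is higher than the IOU with the largest anchor (about 0.36).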

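The index of the best-matching anchor box tells us which feature map that anchor belongs to, and the center coordinates of the ground-truth box tell us which cell of that feature map the center falls in.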

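# b=0 is the first image and t=0 is the first bounding box. true_boxes[b,t,0] is 210/416 (the normalized x coordinate
# of the center) and grid_shapes[l][1] is the width of the feature map (13 here, for the 13*13 map),
# so the center's x index on the 13*13 feature map is 6.
i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
# Likewise, the center's y index on the 13*13 feature map is 9.
j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')

# k is the position of anchor box 7 among the anchors of the 13*13 feature map.
# Those anchors have indices (6, 7, 8): index 6 is at position 0 and index 7 at position 1.
k = anchor_mask[l].index(n)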
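The lookup of l, i, j and k can be sketched end to end for the running example. The layout below, with layer 0 being the 13*13 map and anchor_mask = [[6,7,8],[3,4,5],[0,1,2]], is an assumption consistent with the index ranges given above; the function name locate is hypothetical.

```python
import numpy as np

# Assumed layout: layer 0 is 13*13, layer 1 is 26*26, layer 2 is 52*52.
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
grid_shapes = [(13, 13), (26, 26), (52, 52)]  # (rows, columns) per layer

def locate(x_norm, y_norm, n):
    """Return (l, i, j, k) for a normalized center and best anchor index n."""
    for l, mask in enumerate(anchor_mask):
        if n in mask:
            i = int(np.floor(x_norm * grid_shapes[l][1]))  # column of the cell
            j = int(np.floor(y_norm * grid_shapes[l][0]))  # row of the cell
            k = mask.index(n)                              # anchor slot in this layer
            return l, i, j, k
    raise ValueError("anchor index %d is not in anchor_mask" % n)

l, i, j, k = locate(210 / 416, 299 / 416, 7)
print(l, i, j, k)  # 0 6 9 1
```

Here 210/416*13 is 6.5625 and 299/416*13 is 9.34375, so flooring gives column 6 and row 9.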

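Using l, i, j and k, fill in the entry for anchor box k in cell (i, j) of feature map l for this image: the box center and size are set to 210/416, ...; the confidence is set to 1; and the probability of the box's class is set to 1.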

y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
y_true[l][b, j, i, k, 4] = 1
y_true[l][b, j, i, k, 5+c] = 1

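Every other entry, that is, any entry that does not satisfy the condition (the object's center falls in that cell and that cell's anchor has the largest IOU with the ground-truth box), keeps a label value of 0.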
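Putting the assignment together, the label tensors for one image can be sketched as below. This is an illustration under assumptions rather than the original implementation: 20 classes, a batch of one image, layer 0 being the 13*13 map, and the indices l=0, i=6, j=9, k=1 and class 17 taken from the running example.

```python
import numpy as np

num_classes = 20                              # assumed
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
grid_shapes = [(13, 13), (26, 26), (52, 52)]  # (rows, columns) per layer
batch_size = 1

# One zero tensor per scale: (batch, rows, columns, anchors, 5 + num_classes).
y_true = [np.zeros((batch_size, gh, gw, len(anchor_mask[l]), 5 + num_classes),
                   dtype=np.float32)
          for l, (gh, gw) in enumerate(grid_shapes)]

# Indices and class id of the first box in the running example.
b, l, i, j, k, c = 0, 0, 6, 9, 1, 17
box = np.array([210 / 416, 299 / 416, 302 / 416, 144 / 416], dtype=np.float32)

y_true[l][b, j, i, k, 0:4] = box   # normalized center and size
y_true[l][b, j, i, k, 4] = 1       # objectness confidence
y_true[l][b, j, i, k, 5 + c] = 1   # one-hot class probability

nonzero = sum(int(np.count_nonzero(t)) for t in y_true)
print([t.shape for t in y_true], nonzero)
```

Only six entries are non-zero for this box (four coordinates, the confidence and one class slot); everything else stays at 0. Note that the row index j comes before the column index i when indexing.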