yolov5 mAP计算代码分析

最新推荐文章于 2024-08-18 22:25:54 发布

namelijink

最新推荐文章于 2024-08-18 22:25:54 发布

阅读量582

点赞数 22

文章标签： YOLO

本文链接：https://blog.csdn.net/namelijink/article/details/140959796

版权

前言

模型训练过程中每一轮都会计算P，R，mAP，mAP@0.5等数值，本篇分析这些数值的计算过程，分析最核心部分。我的感受是计算的过程比想象的复杂。
主要的流程在yolov5/val.py文件的process_batch处理函数中。

 if nl:
    tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
    scale_boxes(im[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
    labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
    correct = process_batch(predn, labelsn, iouv, im[si])
    if plots:
        confusion_matrix.process_batch(predn, labelsn)

计算map@0.5的计算基本原理是：

给定一个阈值数组，从0.5-0.95每间隔0.05，生成一共10个数据
获取预测结果和标注，计算预测框和gt框的IOU，判断类别是否一致。
用阈值数组中的每一个阈值去筛选IOU，大于阈值的预测为True, 小于IOU的预测为False
统计结果，输出结果

入参分析

detections：bbox，预测框信息，格式为：x1, y1, x2, y2, conf, class
labels：gt框信息，格式为：class, x1, y1, x2, y2
iouv：阈值数组，内容是：[0.50000, 0.55000, 0.60000, 0.65000, 0.70000, 0.75000, 0.80000, 0.85000, 0.90000, 0.95000]

def process_batch(detections, labels, iouv, img=None):
    """
    Return correct prediction matrix
    Arguments:
        detections (array[N, 6]), x1, y1, x2, y2, conf, class
        labels (array[M, 5]), class, x1, y1, x2, y2
    Returns:
        correct (array[N, 10]), for 10 IoU levels
    """

预测框的可视化：

构造返回结果数组

对于每一个预测框，在不同的阈值下会判断成True 或 False。构造一个所有预测框在所有阈值下的结果二维数组。

    # 预测正确的记录，所有iouv [0.5, 0.55, 0.6 ... 0.95]阈值下iou大于阈值且类型匹配的结果。
    # shape：(300, 10)  300个预测，10个阈值
    correct = np.zeros((detections.shape[0], iouv.shape[0])).astype(bool)

detections.shape[0]： 300
iouv.shape[0]：10

最终的结果可以理解成300个预测框在10个不同阈值下一共会有3000个结果，结果为布尔数组。

计算所有预测框和标注框的IOU结果

将gt框和预测框传入box_iou函数，得到结果。

"""
计算gt框和预测的iou，这个过程会出现广播机制，5个gt框分别和300个预测框计算iou,得到的结果为5,300
(Pdb) pp detections[:, :4].shape
torch.Size([300, 4])
(Pdb) pp labels[:, 1:].shape
torch.Size([5, 4])
(Pdb) pp iou.shape
torch.Size([5, 300])
"""
iou = box_iou(labels[:, 1:], detections[:, :4])

labels[:, 1:]：（5, 4）
detections[:, :4]：（300, 4）

5个标注框和300个预测框匹配，也就是拿每一个标注框都和300个预测框匹配，得到5*300个结果。相当于每一行代表这一个标注框和预测框的iou匹配结果。

tensor([[0.00000, 0.00000, 0.78682, 0.02270, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.02236, 0.00000, 0.00000, 0.00000, 0.00000, 0.43876, 0.70469, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000,
         0.00000, 0.00000, 0.00000, 0.04540, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.46606, 0.00000, 0.00000, 0.00000, 0.00000, 0.13432, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.53255,

匹配标签

上一步计算了预测框和标注框的iou，这一步计算预测类别和标注类别的匹配。

    
    """
    计算标签匹配。labels[:, 0:1]代表所有标注的标签。用每一个gt的标签和所有的预测的标签匹配。
    过程存在广播机制，结果为：torch.Size([5, 300])
    结果中 0维也就是5是gt的label  1维也就是300是预测label
    (Pdb) labels[:, 0:1].shape
    torch.Size([5, 1])
    (Pdb) detections[:, 5].shape
    torch.Size([300])
    correct_class.shape
    torch.Size([5, 300])
    """
    correct_class = labels[:, 0:1] == detections[:, 5]

labels[:, 0:1]：[5, 1]
detections[:, 5]: [300]
和上一步类似，每一个标注类表和预测类别匹配，5个标注类别和300个预测类别匹配，得到一个5*300的结果。得到的是一个布尔数组

(Pdb) correct_class
tensor([[ True,  True,  True,  True,  True,  True,  True, False,  True,  True, False, False,  True, False,  True,  True, False, False,  True, False,  True, False,  True,  True,  True,  True,  True,  True, False, False,  True,  True,  True, False,  True,  True, False,  True,  True, False,  True, False, False,  True,
         False,  True,  True,  True,  True,  True,  True,  True,  True, False, False, False,  True,  True, False,  True,  True, False, False,  True,  True,  True, False,  True,  True,  True,  True, False,  True,  True,  True,  True, False, False,  True,  True,  True,  True,  True,  True,  True,  True,  True, False

遍历所有阈值计算在不同的阈值下预测正确的个数

阈值数组是[0.50000, 0.55000, 0.60000, 0.65000, 0.70000, 0.75000, 0.80000, 0.85000, 0.90000, 0.95000]，计算在不同的阈值下预测框和标注框的iou匹配结果。以0.5为例，大于0.5为True，小于0.5为False

# 循环遍历所有的iou阈值，计算在不同的阈值下预测正确的个数
for i in range(len(iouv)):
    # (iou >= iouv[i]) & correct_class 得到一个布尔数组
    # torch.where((iou >= iouv[i]) & correct_class) 筛选出布尔数组中为True的元素，返回和布尔数组结构一致，返回为True元素在当前张量中的坐标。当前为二维数组。
    # 返回两个元素，元素1是数组，代表为True元素的x轴下标， 元素2也是数组，代表为True元素的y轴下标。
    # 类似：(tensor([0, 0, 1, 1]), tensor([0, 1, 1, 3]))
    x = torch.where((iou >= iouv[i]) & correct_class)  # IoU > threshold and classes match

(iou >= iouv[i]) & correct_class
iou 是 [5, 300]的iou结果
correct_class 是[5, 300]的类别结果
将两者的判断与操作，得到的就是iou大于阈值，且类别正确的结果。结果是5*300的布尔数组

torch.where((iou >= iouv[i]) & correct_class)
筛选出布尔数组中为True的元素，返回Ture在张量中的坐标。
x的结果如下，结果代表的含义是第一个张量代表标注标签的ID，第二个张量代表匹配的对应的预测标签的ID。

(tensor([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5], device='cuda:0'),
 tensor([ 2, 67, 68,  3, 79,  5, 31, 90,  4, 86,  1, 85, 92,  0], device='cuda:0'))

判断是否有正确匹配的结果

# 判断是否有结果
if x[0].shape[0]:

    """
    将结果汇总成[label, detect, iou]的格式。
    label：gt的标签的下标
    detect： 预测结果标签的下标
    iou: 预测和gt的iou值
    
    torch.stack(x, 1) 将x轴和y轴表示的坐标转换成[x, y]的格式
    iou[x[0], x[1]][:, None]) 获取iou中匹配上的元素值
    """
    matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detect, iou]

torch.stack(x, 1)将标注框的ID和预测框的ID组合在一次
iou[x[0], x[1]][:, None]筛选出对应的IOU值
将两者组合在一起，得到的就是[label_id, detect_id, iou]

去除重复匹配

一个gt框可能和多个预测框的iou都大于阈值，这是完全可能的。这里只保留最好的匹配，所以把那些多余的匹配都去掉。一个gt框只保留一个预测框，同时一个预测框只对应一个gt框。

# 匹配存在多个元素，可能是因为广播机制导致的重复，去除重复。
if x[0].shape[0] > 1:
    # 对置信度排序，倒序排列
    matches = matches[matches[:, 2].argsort()[::-1]]

    # 对预测detect进行去重，一个预测框只保留一个iou匹配。返回去重之后的matches
    matches = matches[np.unique(matches[:, 1], return_index=True)[1]]

    # 对label进行去重，一个gt框只保留一个最高iou匹配，返回去重之后的matches
    # matches = matches[matches[:, 2].argsort()[::-1]]
    matches = matches[np.unique(matches[:, 0], return_index=True)[1]]

保存结果

correct是300*10的二维数组，将对应行和列的元素设置为True。

# 将correct中相应的元素置为True
correct[matches[:, 1].astype(int), i] = True

matches[:, 1].astype(int)预测框的下标
i某一个阈值

将某一个阈值下对应的预测框设置成True，也就是代表着在该阈值下，预测框预测的类别正确，同时IOU也超过阈值。

完整代码

def process_batch(detections, labels, iouv, img=None):
    """
    Return correct prediction matrix
    Arguments:
        detections (array[N, 6]), x1, y1, x2, y2, conf, class
        labels (array[M, 5]), class, x1, y1, x2, y2
    Returns:
        correct (array[N, 10]), for 10 IoU levels
    """

    # 预测正确的记录，所有iouv [0.5, 0.55, 0.6 ... 0.95]阈值下iou大于阈值且类型匹配的结果。
    # shape：(300, 10)  300个预测，10个阈值
    correct = np.zeros((detections.shape[0], iouv.shape[0])).astype(bool)

    """
    计算gt框和预测的iou，这个过程会出现广播机制，5个gt框分别和300个预测框计算iou,得到的结果为5,300
    (Pdb) pp detections[:, :4].shape
    torch.Size([300, 4])
    (Pdb) pp labels[:, 1:].shape
    torch.Size([5, 4])
    (Pdb) pp iou.shape
    torch.Size([5, 300])
    """
    iou = box_iou(labels[:, 1:], detections[:, :4])

    breakpoint()

    # 计算标签匹配。labels[:, 0:1]代表所有标注的标签。用每一个gt的标签和所有的预测的标签匹配。
    # 过程存在广播机制，结果为：torch.Size([5, 300])
    # 结果中 0维也就是5是gt的label  1维也就是300是预测label
    """
    (Pdb) labels[:, 0:1].shape
    torch.Size([5, 1])
    (Pdb) detections[:, 5].shape
    torch.Size([300])
    correct_class.shape
    torch.Size([5, 300])
    """
    correct_class = labels[:, 0:1] == detections[:, 5]

    # 循环遍历所有的iou阈值，计算在不同的阈值下预测正确的个数
    for i in range(len(iouv)):
        # (iou >= iouv[i]) & correct_class 得到一个布尔数组
        # torch.where((iou >= iouv[i]) & correct_class) 筛选出布尔数组中为True的元素，返回和布尔数组结构一致，返回为True元素在当前张量中的坐标。当前为二维数组。
        # 返回两个元素，元素1是数组，代表为True元素的x轴下标， 元素2也是数组，代表为True元素的y轴下标。
        # 类似：(tensor([0, 0, 1, 1]), tensor([0, 1, 1, 3]))
        x = torch.where((iou >= iouv[i]) & correct_class)  # IoU > threshold and classes match

        # 判断是否有结果
        if x[0].shape[0]:

            """
            将结果汇总成[label, detect, iou]的格式。
            label：gt的标签的下标
            detect： 预测结果标签的下标
            iou: 预测和gt的iou值
            
            torch.stack(x, 1) 将x轴和y轴表示的坐标转换成[x, y]的格式
            iou[x[0], x[1]][:, None]) 获取iou中匹配上的元素值
            """
            matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detect, iou]

            # 匹配存在多个元素，可能是因为广播机制导致的重复，去除重复。
            if x[0].shape[0] > 1:
                # 对置信度排序，倒序排列
                matches = matches[matches[:, 2].argsort()[::-1]]

                # 对预测detect进行去重，一个预测框只保留一个iou匹配。返回去重之后的matches
                matches = matches[np.unique(matches[:, 1], return_index=True)[1]]

                # 对label进行去重，一个gt框只保留一个最高iou匹配，返回去重之后的matches
                matches = matches[np.unique(matches[:, 0], return_index=True)[1]]

            # 将correct中相应的元素置为True
            correct[matches[:, 1].astype(int), i] = True

            """
            (Pdb) pp correct
            array([[False, False, False, ..., False, False, False],
                   [False, False, False, ..., False, False, False],
                   [False, False, False, ..., False, False, False],
                   ...,
                   [False, False, False, ..., False, False, False],
                   [False, False, False, ..., False, False, False],
                   [False, False, False, ..., False, False, False]])
            (Pdb) pp correct.shape
            (300, 10)
            """

    return torch.tensor(correct, dtype=torch.bool, device=iouv.device)