Deep Learning Evaluation Metrics

陈昊-1

Article link: https://blog.csdn.net/fireflychh/article/details/83590716

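Unlike object recognition, object detection has to do more than decide whether an image contains a given object: it also has to locate that object. Judging how good a detection model is therefore needs its own standard metric, mAP.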

I. Mean Average Precision (mAP)

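(I) What is mAP

Mean average precision (mAP) is the performance measure for algorithms that predict both the location and the class of objects. It is very useful for evaluating object localization, object detection, and instance segmentation models.

At prediction time a model outputs many bounding boxes, but most of them have very low confidence. We only need to output the bounding boxes whose confidence exceeds some threshold.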
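The thresholding step can be sketched as follows. This is a minimal illustration with made-up boxes and scores; `SCORE_THRESHOLD` and the array names are assumptions for the example, not values from the post.

```python
import numpy as np

# Hypothetical detector output: one confidence score per predicted box.
scores = np.array([0.98, 0.12, 0.76, 0.05, 0.51])
boxes = np.array([
    [10, 20, 50, 80],
    [12, 22, 48, 79],
    [100, 40, 160, 90],
    [0, 0, 5, 5],
    [30, 30, 60, 60],
])

SCORE_THRESHOLD = 0.5  # assumed value; tune it per model and task

keep = scores > SCORE_THRESHOLD   # boolean mask over the detections
kept_boxes = boxes[keep]
kept_scores = scores[keep]
print(kept_scores)
```

Only the three boxes scoring above 0.5 survive. Moving the threshold changes which boxes survive, and that is what drives the precision/recall trade-off.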

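(II) How mAP is calculated

(1) Precision

Suppose the object to detect is a dog. Dogs that are correctly detected (predicted dog, actually dog) are true positives (TP). Cats correctly identified as cats (predicted cat, actually cat) are true negatives (TN). Cats wrongly identified as dogs are false positives (FP), and dogs wrongly identified as cats are false negatives (FN).

Precision is then computed as:

precision = TP / (TP + FP)

Precision reflects what fraction of the predictions made for a class are correct.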

(2) Recall

Recall is computed as:

recall = TP / (TP + FN)

Recall measures how many of the objects of a class were actually found.

(3) Accuracy

Accuracy is the fraction of all predicted samples that are identified correctly, that is, (TP + TN) / (TP + TN + FP + FN).
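The three definitions can be checked with a small helper. This is a sketch: the function name `precision_recall_accuracy` and the counts are made up for illustration.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Made-up counts for the "dog" class: 8 dogs found, 2 cats mistaken for
# dogs, 4 dogs missed, 6 cats correctly rejected.
p, r, a = precision_recall_accuracy(tp=8, fp=2, fn=4, tn=6)
print(p, r, a)
```

With these counts precision is 8/10, recall is 8/12, and accuracy is 14/20.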

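Precision and recall pull against each other. To raise precision one raises the confidence threshold so that only high-confidence predictions are kept, but some correct predictions (true positives) are then dropped because their confidence is too low, which lowers recall. In general, high precision comes with low recall and high recall with low precision; if both are low, something is wrong with the network. In practice one evaluates a range of thresholds and records the precision and recall at each one:

[Figure: precision and recall at different confidence thresholds]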
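A small threshold sweep makes the trade-off concrete. The scores, correctness flags, and `num_gt` below are invented for illustration, and `precision_recall_at` is a hypothetical helper.

```python
import numpy as np

# Hypothetical detections: confidence score and whether each one is correct.
scores = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30])
is_correct = np.array([True, True, False, True, False, True, False])
num_gt = 5  # assumed number of ground-truth objects


def precision_recall_at(threshold):
    """Precision and recall when only detections with score >= threshold are kept."""
    keep = scores >= threshold
    tp = int(np.sum(is_correct & keep))
    num_kept = int(np.sum(keep))
    # Convention chosen here: precision is 1.0 when nothing is kept.
    precision = tp / num_kept if num_kept else 1.0
    recall = tp / num_gt
    return precision, recall


for t in (0.9, 0.65, 0.25):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold from 0.9 to 0.25 raises recall from 0.4 to 0.8 while precision falls from 1.0 to about 0.57.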

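(4) AP

Average precision (AP)

Neither precision nor recall alone is a sound measure of a model's performance. So why not use the area under the PR curve as the yardstick? That is the idea behind AP. The "average" here means averaging precision over the recall levels.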

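(5) mAP

mAP is the mean of the AP values computed for the individual validation samples (here, one AP per image; benchmarks such as PASCAL VOC average per-class APs instead).

[Figure]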
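In symbols, under the VOC-style convention used by the code later in this post, the two definitions can be written as follows. The symbols $p$, $r_i$, and $N$ are notation introduced here, not the author's: $p(r)$ is precision as a function of recall, $r_i$ are the recall values where the curve changes, and $N$ is the number of AP values being averaged.

```latex
\mathrm{AP} = \int_0^1 p(r)\,dr
  \approx \sum_{i} (r_i - r_{i-1})\, p_{\text{interp}}(r_i),
\qquad
p_{\text{interp}}(r_i) = \max_{r' \ge r_i} p(r')

\mathrm{mAP} = \frac{1}{N} \sum_{k=1}^{N} \mathrm{AP}_k
```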

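II. Factors that affect mAP

Many factors affect mAP. The main ones are:

1. Poor-quality training data

2. Too little training data

3. Inaccurately annotated boxes

4. Variability in the data

Sometimes adding more training data barely raises mAP. A better-performing network will, of course, reach a higher mAP.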

III. A worked example

Consider a detection task whose ground-truth labels contain 9 objects in total:

gt_class_id = [2 2 2 2 1 1 3 3 3]

gt_bbox = [[616 2 634 32] [583 77 603 117] [564 137 579 174] [540 211 555 247] [583 0 731 332] [544 117 605 464] [570 91 586 127] [595 315 628 377] [539 451 555 488]]

gt_mask has shape (960, 960, 9).

The detector outputs 8 detections:

'class_ids': array([1, 1, 3, 2, 2, 2, 2, 3]

'rois': array([[543, 112, 607, 459], [580, 10, 731, 341], [598, 317, 631, 373], [584, 80, 601, 114], [563, 136, 580, 172], [541, 213, 554, 245], [617, 1, 635, 30], [539, 449, 553, 481]],

'scores': array([0.99981266, 0.9997358 , 0.9985984 , 0.99840206, 0.9980519 , 0.99351305, 0.9930728 , 0.9420509 ]

Next, compute the matching between the ground-truth objects and the detections:

gt_match = [ 6.  3.  4.  5.  1.  0. -1.  2. -1.]
pred_match = [ 5.  4.  7.  1.  2.  3.  0. -1.]
overlaps = [[0.0000000e+00 0.0000000e+00 1.5040052e-02 1.1429034e-02 0.0000000e+00
  9.3192905e-01 0.0000000e+00 2.4425616e-03 7.9968013e-05]
 [4.8068608e-03 1.0620207e-02 0.0000000e+00 0.0000000e+00 9.3075949e-01
  0.0000000e+00 0.0000000e+00 1.3462573e-04 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 6.4238455e-05
  0.0000000e+00 0.0000000e+00 6.1530399e-01 0.0000000e+00]
 [0.0000000e+00 5.7894737e-01 0.0000000e+00 0.0000000e+00 6.1281337e-03
  0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 8.3177572e-01 0.0000000e+00 0.0000000e+00
  1.4685095e-02 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 6.2641507e-01 0.0000000e+00
  1.1368091e-02 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [7.1212119e-01 0.0000000e+00 0.0000000e+00 0.0000000e+00 6.2919874e-03
  0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
  1.5520721e-04 0.0000000e+00 0.0000000e+00 4.9000001e-01]]
precisions = [1.    1.    1.    1.    1.    1.    1.    0.875]
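Each entry of the overlaps matrix above is an IoU (intersection over union) between one detection and one ground-truth object. A minimal box-based sketch is shown below. The helper `box_iou` is hypothetical, and the `[y1, x1, y2, x2]` coordinate order is an assumption. Since the example passes masks to the evaluation, the printed overlaps may be mask IoUs, so box IoUs need not equal them exactly.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as [y1, x1, y2, x2] (assumed order)."""
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    intersection = max(y2 - y1, 0) * max(x2 - x1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


gt_box = [544, 117, 605, 464]    # ground-truth object with index 5 in gt_bbox
pred_box = [543, 112, 607, 459]  # detection with index 0 in rois
iou = box_iou(gt_box, pred_box)
print(round(iou, 4))
```

Because the formula is symmetric in the two axes, swapping the assumed coordinate order to `[x1, y1, x2, y2]` gives the same value.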

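Precision is then computed from these matches as the number of correct detections divided by the number of detections. The predictions are first sorted by score in descending order, and the IoU between each predicted box and every ground-truth box is computed. A prediction counts as correct when its IoU with a ground-truth box exceeds 0.5 and the two have the same class. The index of the matched prediction is recorded for each ground truth (gt_match) and the index of the matched ground truth for each prediction (pred_match), with -1 meaning no match. Precision at each rank is the number of correctly detected objects so far divided by the number of detections so far: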

precisions = np.cumsum(pred_match > -1) / (np.arange(len(pred_match)) + 1)

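Precision above was computed along the detections sorted by descending score. The recall at each of those ranks is the number of correct detections so far divided by the number of ground-truth objects: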

recalls = np.cumsum(pred_match > -1).astype(np.float32) / len(gt_match)

With the formulas above, precision and recall come out as:

precisions = [1. 1. 1. 1. 1. 1. 1. 0.875]

recalls = [0.11111111 0.22222222 0.33333334 0.44444445 0.5555556 0.6666667 0.7777778 0.7777778 ]
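The numbers above can be reproduced from the printed data. The sketch below is an illustration, not the author's implementation: `match` is a hypothetical greedy matcher following the rule described in the text (IoU threshold 0.5, same class), and the overlaps matrix keeps only the largest entry of each row, rounded, with the near-zero entries dropped for brevity.

```python
import numpy as np

gt_class_id = np.array([2, 2, 2, 2, 1, 1, 3, 3, 3])
pred_class_id = np.array([1, 1, 3, 2, 2, 2, 2, 3])  # already sorted by score

# Largest overlap per detection, taken from the printed matrix (rounded).
overlaps = np.zeros((8, 9))
for det, gt, iou in [(0, 5, 0.9319), (1, 4, 0.9308), (2, 7, 0.6153),
                     (3, 1, 0.5789), (4, 2, 0.8318), (5, 3, 0.6264),
                     (6, 0, 0.7121), (7, 8, 0.4900)]:
    overlaps[det, gt] = iou


def match(overlaps, pred_class_id, gt_class_id, iou_threshold=0.5):
    """Greedy matching: each detection takes its best unmatched ground truth."""
    num_pred, num_gt = overlaps.shape
    pred_match = -np.ones(num_pred)
    gt_match = -np.ones(num_gt)
    for i in range(num_pred):
        for j in np.argsort(overlaps[i])[::-1]:  # best IoU first
            if overlaps[i, j] < iou_threshold:
                break       # nothing left above the threshold
            if gt_match[j] > -1:
                continue    # this ground truth is already taken
            if pred_class_id[i] == gt_class_id[j]:
                pred_match[i] = j
                gt_match[j] = i
                break
    return gt_match, pred_match


gt_match, pred_match = match(overlaps, pred_class_id, gt_class_id)
precisions = np.cumsum(pred_match > -1) / (np.arange(len(pred_match)) + 1)
recalls = np.cumsum(pred_match > -1) / len(gt_match)
print(gt_match, pred_match)
print(precisions)
print(recalls)
```

The last detection overlaps its ground truth by only 0.49, so it stays unmatched. That is why the final precision drops to 0.875 and recall stops at 7/9.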

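Plotting these values gives the precision-recall curve:

[Figure: precision-recall curve; x-axis: recall, y-axis: precision]

Next, compute the area under the curve.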

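# Pad with sentinel values so that the segment from recall 0 to the first
# recall value, and the tail up to recall 1, are both included in the area.
precisions = np.concatenate([[0], precisions, [0]])
recalls = np.concatenate([[0], recalls, [1]])

# Ensure precision values decrease but don't increase. This way, the
# precision value at each recall threshold is the maximum it can be
# for all following recall thresholds, as specified by the VOC paper.
for i in range(len(precisions) - 2, -1, -1):
    precisions[i] = np.maximum(precisions[i], precisions[i + 1])

# Compute mean AP over recall range (this is the AP of a single image)
indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
mAP = np.sum((recalls[indices] - recalls[indices - 1]) * precisions[indices])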
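Applied to the precision and recall values of this example, the area computation can be run end to end as below. This is a sketch: `average_precision` is a hypothetical wrapper. Padding the curve with a start point at recall 0 and an end point at recall 1 is an assumption made here so that the first segment is counted.

```python
import numpy as np


def average_precision(precisions, recalls):
    """VOC-style AP: area under the non-increasing precision-recall envelope."""
    # Pad so the curve starts at recall 0 and ends at recall 1.
    p = np.concatenate([[0.0], precisions, [0.0]])
    r = np.concatenate([[0.0], recalls, [1.0]])
    # Sweep right to left so precision never increases with recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(r[:-1] != r[1:])[0] + 1
    return float(np.sum((r[idx] - r[idx - 1]) * p[idx]))


precisions = np.array([1, 1, 1, 1, 1, 1, 1, 0.875])
recalls = np.array([1, 2, 3, 4, 5, 6, 7, 7]) / 9
ap = average_precision(precisions, recalls)
print(round(ap, 4))
```

For this image the area is 7/9, about 0.778. Precision is 1 over every recall step that is reached, and the two missed ground-truth objects contribute nothing beyond recall 7/9.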

Finally, compute these metrics over the whole validation set:

# Compute VOC-style Average Precision
# `modellib`, `utils`, `dataset`, `config` and `model` must already be defined.
# The calls look like the Matterport Mask R-CNN API (an assumption; the post
# does not name the library).
def compute_batch_ap(image_ids):
    APs = []
    precisions = []
    recalls = []
    overlaps = []
    for image_id in image_ids:
        # Load image and its ground truth
        image, image_meta, gt_class_id, gt_bbox, gt_mask = \
            modellib.load_image_gt(dataset, config,
                                   image_id, use_mini_mask=False)
        # Run object detection
        results = model.detect([image], verbose=0)
        # Compute AP for this image
        r = results[0]
        AP, precision, recall, overlap = \
            utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                             r['rois'], r['class_ids'], r['scores'], r['masks'])
        APs.append(AP)
        precisions.append(precision)
        recalls.append(recall)
        overlaps.append(overlap)
    return APs, precisions, recalls, overlaps

import numpy as np

# Evaluate on every image (uncomment the two lines below to use a random subset)
print("len(dataset.image_ids) =", len(dataset.image_ids))
#image_ids = np.random.choice(dataset.image_ids, NUM_IMAGES)
#APs, ps, rs, ovs = compute_batch_ap(image_ids)
# `ovs` rather than `os`, to avoid shadowing the standard-library module name
APs, ps, rs, ovs = compute_batch_ap(dataset.image_ids)

# Do the math: keep the per-image maximum of each quantity
precisions = []
recalls = []
overlaps = []
for r, o, p in zip(rs, ovs, ps):
    recalls.append(np.max(r))
    overlaps.append(np.max(o))
    precisions.append(np.max(p))

# Show Results
print("maxAP @ IoU=50: ", np.max(APs))
print("meanAP @ IoU=50: ", np.mean(APs))
print("meanRecall @ IoU=50: ", np.mean(recalls))
print("meanPrecision @ IoU=50: ", np.mean(precisions))
print("meanOverlaps @ IoU=50: ", np.mean(overlaps))

This gives:

maxAP @ IoU=50: 1.0

meanAP @ IoU=50: 0.7458983721724436

meanRecall @ IoU=50: 1.0

meanPrecision @ IoU=50: 0.9580298124915688

meanOverlaps @ IoU=50: 0.8675879