[Object Detection] Evaluation Metrics (Performance Measures) for Object Detection Algorithms: AP and mAP Explained in Detail

Latest recommended article published on 2025-03-24 10:23:47

Pinned post by __阿健__

Category: Computer Vision

Article link: https://blog.csdn.net/qq_24224067/article/details/108987897

Reference paper: "A Survey on Performance Metrics for Object-Detection Algorithms"

Accompanying GitHub repository: https://github.com/rafaelpadilla/Object-Detection-Metrics

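How do we evaluate the performance of an object detection algorithm?

Evaluating an object detector differs from evaluating a classifier: in object detection we need both to identify the correct object class and to localize the object's position accurately.

The most commonly used metrics for object detection are AP (average precision, for a single class) and mAP (mean AP, across multiple classes).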

AP

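1. Basic concepts (prerequisites)

IOU (Intersection Over Union)

IOU measures how close two bounding boxes are, taking both size and position into account. It equals the area of their intersection divided by the area of their union and lies in the range [0, 1].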

The larger the IOU, the closer the two bounding boxes: 1 means they coincide exactly and 0 means they do not overlap.
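The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the reference repository's implementation: the function name `iou` and the `(x1, y1, x2, y2)` corner format with continuous coordinates are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Assumes continuous coordinates with x1 <= x2 and y1 <= y2.
    """
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Identical boxes give 1, disjoint boxes give 0, and the partially overlapping pair gives 1/7 (intersection area 1, union area 7).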

True Positive, False Positive, False Negative and True Negative

A detection and the ground truth can correspond in the following ways:

True positive (TP): A correct detection of a ground-truth bounding box. Detection with IOU ≥ threshold.
False positive (FP): A wrong detection, i.e. an incorrect detection of a nonexistent object or a misplaced detection of an existing object. Detection with IOU < threshold.
False negative (FN): A ground truth not detected.

True negatives (TN) are not used in object detection, because there are infinitely many bounding boxes that should not be detected within any given image.

The threshold depends on the metric used for the task; it is usually set to 50%, 75% or 95%.

Precision, Recall

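Because TN is not used in object detection, we cannot use metrics that depend on TN, such as FPR and, by extension, the ROC curve (which plots TPR against FPR). Instead, the evaluation of object detection algorithms is based mainly on Precision and Recall.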

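$Precision = \frac{TP}{TP + FP} = \frac{TP}{\text{all detections}}$

$Recall = \frac{TP}{TP + FN} = \frac{TP}{\text{all ground truths}}$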

Precision: the fraction of positive predictions (all detections) that are correct, which reflects the model's ability to identify only relevant objects.
Recall: the fraction of all given ground truths that are correctly detected, which reflects the model's ability to find all relevant cases (all ground-truth bounding boxes).

A perfect, ideal object detector would find all ground-truth objects (FN = 0, i.e. high recall) while identifying only relevant objects (FP = 0, i.e. high precision).
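As a small sketch of the two definitions, the helper below computes both values from raw counts. The function name `precision_recall` and the counts in the usage line are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example only.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from counts of TP, FP and FN."""
    n_detections = tp + fp      # everything the detector reported
    n_ground_truths = tp + fn   # everything it should have found
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / n_detections if n_detections else 0.0
    recall = tp / n_ground_truths if n_ground_truths else 0.0
    return precision, recall


# Hypothetical counts: 7 correct detections, 3 wrong ones, 8 missed objects.
p, r = precision_recall(tp=7, fp=3, fn=8)
print(f"precision={p:.2f}, recall={r:.2f}")
```

With these counts, precision is 7/10 and recall is 7/15: most reported boxes are right, but more than half of the objects are missed.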

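2. P-R curve (Precision x Recall curve)

Precision and recall pull against each other. Specifically:

When the detector's confidence threshold is raised, the number of detections falls, both correct ones (TP) and wrong ones (FP). Precision generally rises, though not monotonically, while the number of undetected ground truths (FN) grows, so recall falls.
Conversely, lowering the threshold brings in more detections, so recall rises while precision generally falls.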
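The trade-off can be seen on a toy example. The sketch below uses entirely hypothetical data (eight scored detections, already labeled TP or FP, for five ground-truth boxes) and a hypothetical helper `precision_recall_at`; it only illustrates the direction of the effect.

```python
# Hypothetical detections for one class: (confidence, is_true_positive).
detections = [
    (0.95, True), (0.90, True), (0.80, False), (0.60, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False),
]
n_ground_truths = 5  # hypothetical number of ground-truth boxes


def precision_recall_at(threshold):
    """Precision and recall when only detections >= threshold are kept."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    # Assumption: precision is reported as 0.0 when nothing is kept.
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_ground_truths
    return precision, recall


for threshold in (0.05, 0.50, 0.85):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this data, raising the threshold from 0.05 to 0.85 moves precision from 0.50 to 1.00 while recall drops from 0.80 to 0.40.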

We can use the Precision x Recall curve to assess the trade-off between precision and recall of an object detector at different confidence thresholds.

An object detector of a particular class is considered good if its precision stays high as recall increases, which means that if you vary the confidence threshold, the precision and recall will still be high.
A poor object detector needs to increase the number of detected objects (increasing False Positives = lower precision) in order to retrieve all ground truth objects (high recall). That’s why the Precision x Recall curve usually starts with high precision values, decreasing as recall increases.

3. AP (Average Precision)

A high AP (the area under the curve (AUC) of the Precision x Recall curve) indicates that the detector has both high precision and high recall.

However, the P-R curve is often a zigzag that moves up and down, which makes it hard to measure its AUC accurately.

Before estimating the AUC, the P-R curve has to be processed to remove the zigzag.

There are two common ways to do this: 11-point interpolation and all-point interpolation.

11-point interpolation

11-point interpolation approximates the P-R curve by averaging the maximum precision values at a set of 11 equally spaced recall levels [0, 0.1, 0.2, … , 1]. As a formula:

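$AP_{11} = \frac{1}{11} \sum_{R \in \{0, 0.1, \ldots, 0.9, 1\}} P_{interp}(R)$
where

$P_{interp}(R) = \max_{\tilde{R} : \tilde{R} \geq R} P(\tilde{R})$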
Instead of using the precision observed at each point, the AP is obtained by interpolating the precision only at the 11 levels $R$, taking the maximum precision whose recall value is greater than or equal to $R$.
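A minimal sketch of this rule follows. The function name `ap_11_point` and the toy curve are hypothetical; the curve corresponds to five ranked detections (TP, TP, FP, TP, FP) against four ground truths. Treating the interpolated precision as 0 at recall levels the curve never reaches is the usual convention and is assumed here.

```python
def ap_11_point(recalls, precisions):
    """11-point interpolated AP from paired recall/precision values."""
    total = 0.0
    for i in range(11):
        level = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= level,
        # or 0 if the curve never reaches this recall level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates, default=0.0)
    return total / 11


# Hypothetical P-R points (TP, TP, FP, TP, FP with 4 ground truths).
recalls = [0.25, 0.5, 0.5, 0.75, 0.75]
precisions = [1.0, 1.0, 2 / 3, 0.75, 0.6]
print(round(ap_11_point(recalls, precisions), 4))
```

For this toy curve the interpolated precision is 1.0 at levels 0 to 0.5, 0.75 at levels 0.6 and 0.7, and 0 above that, so the result is 7.5 / 11.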

All-point interpolation

All-point interpolation interpolates through all points as follows:

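$AP_{all} = \sum_{n} (R_{n+1} - R_n) P_{interp}(R_{n+1})$
where

$P_{interp}(R_{n+1}) = \max_{\tilde{R} : \tilde{R} \geq R_{n+1}} P(\tilde{R})$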
In this case, instead of using the precision observed at only a few points, the AP is obtained by interpolating the precision at each level $R$, taking the maximum precision whose recall value is greater than or equal to $R_{n+1}$.
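One way to sketch all-point interpolation is to build the precision envelope with a right-to-left sweep and then sum the rectangles between consecutive recall values. The function name `ap_all_points`, the sentinel points at recall 0 and 1, and the toy curve (the same hypothetical five detections against four ground truths) are assumptions of this example rather than the reference code.

```python
def ap_all_points(recalls, precisions):
    """All-point interpolated AP; recalls must be in non-decreasing order."""
    # Sentinels so the curve starts at recall 0 and ends at recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]

    # Sweep right to left so p[i] becomes the maximum precision
    # observed at any recall >= r[i] (the interpolated precision).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum the rectangles (R_{n+1} - R_n) * P_interp(R_{n+1});
    # terms where recall does not change contribute 0.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Hypothetical P-R points (TP, TP, FP, TP, FP with 4 ground truths).
recalls = [0.25, 0.5, 0.5, 0.75, 0.75]
precisions = [1.0, 1.0, 2 / 3, 0.75, 0.6]
print(ap_all_points(recalls, precisions))
```

Here the envelope is 1.0 up to recall 0.5 and 0.75 up to recall 0.75, so the area is 0.5 * 1.0 + 0.25 * 0.75 = 0.6875.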

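Worked example

An example helps make this concrete.

As shown in the figure below, there are 7 images with 15 ground-truth boxes (drawn in green), and the model produces 24 detections (drawn in red and labeled with letters (A,B,…,Y)), each with a confidence score.

[Figure: the 7 example images, with ground-truth boxes in green and detections in red]
In this example we set the IOU threshold to 30%: a detection whose IOU with some ground truth is at least 30% is judged correct (TP); otherwise it is judged wrong (FP).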

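In addition, the detector may predict several duplicate detections for a single ground truth (such as D and E in image 2, or G, H and I in image 3). In that case the detection with the highest confidence is judged TP and the rest are judged FP.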
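The two rules above (the IOU threshold and the handling of duplicates) can be sketched for a single image and a single class as follows. This is an illustrative greedy matcher under stated assumptions, not the reference repository's code: the names `iou` and `label_detections`, the `(x1, y1, x2, y2)` box format, and the sample boxes are all hypothetical.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, ground_truths, iou_threshold=0.3):
    """Label each (confidence, box) detection 'TP' or 'FP', in input order."""
    matched = set()  # indices of ground truths already claimed
    labels = [None] * len(detections)
    # Visit detections from most to least confident.
    order = sorted(range(len(detections)), key=lambda i: -detections[i][0])
    for i in order:
        box = detections[i][1]
        # Find the ground truth that overlaps this detection the most.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou >= iou_threshold and best_j not in matched:
            matched.add(best_j)
            labels[i] = "TP"
        else:
            labels[i] = "FP"  # low overlap, or duplicate of a claimed object
    return labels


# Hypothetical image: one ground truth, two overlapping detections, one stray.
ground_truths = [(0, 0, 10, 10)]
detections = [(0.6, (0, 0, 10, 10)), (0.9, (1, 1, 10, 10)), (0.8, (20, 20, 30, 30))]
print(label_detections(detections, ground_truths))
```

The 0.9 detection claims the ground truth first, so the 0.6 detection becomes an FP even though its box matches perfectly, and the stray box is an FP because it overlaps nothing. Ground truths left unclaimed at the end would count as FN.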

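The verdict for each detection is listed in the table below.

[Table: TP/FP verdict for each detection]
To plot the Precision x Recall curve, we first sort the detections by confidence and then compute precision and recall from the accumulated TP and FP counts, as shown in the table below.

[Table: detections sorted by confidence, with Acc TP, Acc FP, Precision and Recall]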
The Acc TP and Acc FP columns accumulate all TP and FP detections at or above the corresponding confidence.
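The accumulation step can be sketched as below. The function name `precision_recall_curve` and the five sample detections are hypothetical and are not the 24 detections of the example; the sketch only shows how the Acc TP and Acc FP columns turn into precision and recall values.

```python
from itertools import accumulate


def precision_recall_curve(detections, n_ground_truths):
    """detections: list of (confidence, is_tp). Returns (recalls, precisions)."""
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    # Running totals correspond to the Acc TP and Acc FP columns.
    acc_tp = list(accumulate(1 if is_tp else 0 for _, is_tp in ranked))
    acc_fp = list(accumulate(0 if is_tp else 1 for _, is_tp in ranked))
    precisions = [tp / (tp + fp) for tp, fp in zip(acc_tp, acc_fp)]
    recalls = [tp / n_ground_truths for tp in acc_tp]
    return recalls, precisions


# Hypothetical labeled detections and 4 ground-truth boxes.
sample = [(0.6, True), (0.9, True), (0.7, False), (0.8, True), (0.5, False)]
recalls, precisions = precision_recall_curve(sample, n_ground_truths=4)
for r, p in zip(recalls, precisions):
    print(f"recall={r:.2f}  precision={p:.2f}")
```

Each row of the output is one point of the P-R curve, obtained by cutting the ranked list after that detection.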

Plotting the precision and recall values we have the following Precision x Recall curve:

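[Figure: Precision x Recall curve for the example]
As described above, there are two ways to measure AP:

Computing AP with 11-point interpolation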

The idea of the 11-point interpolated average precision is to average the precisions at a set of 11 recall levels (0,0.1,…,1). The interpolated precision values are obtained by taking the maximum precision whose recall value is greater than or equal to the current recall level, as follows:

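[Figure: interpolated precision at the 11 recall levels]
This gives:

[Equation: 11-point interpolated AP for the example]
Computing AP with all-point interpolation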

By interpolating all points, the Average Precision (AP) can be interpreted as an approximated AUC of the Precision x Recall curve. The intention is to reduce the impact of the wiggles in the curve. By applying the equations presented before, we can obtain the areas, as demonstrated here. We can also find the interpolated precision points visually by scanning the recalls from the highest (0.4666) down to 0 (reading the plot from right to left) and, as the recall decreases, collecting the highest precision values seen so far, as shown in the image below:

[Figure: interpolated precision points on the Precision x Recall curve]

This gives: