[Bookmark | Overview] Object Detection Series | (1) Object Detection Fundamentals, Basic Tasks, and a Walkthrough of mAP Computation Code

健0000

Published 2021-10-20 14:41:08


Article link: https://blog.csdn.net/qq_41094058/article/details/120750434


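This article covers the basic concepts of object detection, including IoU, detection outcomes (FP/FN/TP/TN), Precision, and Recall, together with common evaluation metrics such as F-Score, the PR curve, and AP (average precision). The hands-on part shows how to compute VOC mAP with mmdetection and pycocotools, and is suited to intermediate learners.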


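This article is aimed at readers who already have some deep learning background and a little object detection experience; some conceptual points are not explained in detail.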

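I. Overview of the Object Detection Task

(I) Basics

Reference: A Detailed Guide to the Evaluation Metrics FP, FN, TP, TN, Precision, Recall, and Accuracy
Reference: Object Detection Basics
Reference: rafaelpadilla/Object-Detection-Metrics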

1. IoU

IoU (intersection over union): for two boxes, the area of their intersection divided by the area of their union.
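The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming axis-aligned boxes in `(x1, y1, x2, y2)` format; the function name `box_iou` and the example boxes are hypothetical and not taken from any library.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped at 0 so disjoint boxes give zero intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7
iou_example = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Some codebases add 1 to widths and heights (a legacy pixel convention); this sketch does not.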

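2. FP, FN, TP, TN

(1) Binary classification

In binary classification, samples divide directly into positives and negatives, and every prediction falls into one of four cases: TP (a positive predicted as positive), FN (a positive predicted as negative), FP (a negative predicted as positive), and TN (a negative predicted as negative).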

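(2) Detection

A multi-class detection problem can be treated as several binary problems: FP, FN, TP, and TN are computed separately for each class.
First collect all predicted boxes and all ground-truth boxes of that class, then match them, pairing ground-truth boxes with predicted boxes.
1) Each predicted box is matched to the ground-truth box with which it has the largest IoU.
2) If the IoU between the predicted box and that ground-truth box exceeds a threshold, the match succeeds.
3) Recheck: if several predicted boxes successfully match the same ground-truth box, keep only one successful match and treat all the others as failed.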
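After matching, FP, FN, and TP follow directly: a successfully matched predicted box is a TP, a predicted box that failed to match is an FP, and a ground-truth box with no matched prediction is an FN.

[Figure: assigning FP, FN, TP, and TN after matching]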
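The matching steps above can be sketched as follows. This is a minimal illustration, not the code of any particular library. It makes one assumption beyond the text: when several predictions match the same ground-truth box, the one with the highest confidence score is kept. The names `iou` and `match_detections` and the example boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """preds: list of (box, score); gts: list of boxes; all of one class.

    Returns (is_tp, num_fn): is_tp[i] is True when preds[i] is a TP.
    """
    is_tp = [False] * len(preds)
    gt_taken = [False] * len(gts)
    # Assumption: visit predictions from highest to lowest score, so the most
    # confident prediction wins when several match the same ground truth.
    order = sorted(range(len(preds)), key=lambda i: -preds[i][1])
    for i in order:
        if not gts:
            continue
        box = preds[i][0]
        # Step 1: the ground-truth box with the largest IoU
        best = max(range(len(gts)), key=lambda j: iou(box, gts[j]))
        # Step 2: IoU threshold; step 3: each ground truth is matched only once
        if iou(box, gts[best]) > iou_thr and not gt_taken[best]:
            gt_taken[best] = True
            is_tp[i] = True
    num_fn = gt_taken.count(False)  # ground truths nobody matched
    return is_tp, num_fn


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9),    # exact hit on the first ground truth
         ((1, 1, 10, 10), 0.8),    # duplicate of the same object
         ((50, 50, 60, 60), 0.7)]  # background
is_tp, num_fn = match_detections(preds, gts)
num_tp = sum(is_tp)
num_fp = len(preds) - num_tp
```

Here the duplicate and the background box both count as FP, and the second ground-truth box, which nothing matched, counts as FN.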

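3. Precision, Recall, Accuracy

Precision: the fraction of all predicted boxes that were successfully matched
$Precision = \frac{TP}{TP+FP}$
Recall: the fraction of all ground-truth boxes that were successfully matched
$Recall = \frac{TP}{TP+FN}$
Accuracy: given by the formula below. Because it uses TN, it is not meaningful for detection
$Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$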

(II) Common Evaluation Metrics

1. F-Score and F1-Score

Precision and Recall pull against each other; the F-Score evaluates the two together.
$F_\beta = (1+\beta^2)\frac{P \times R}{\beta^2 P + R}$
The F1-Score is the special case $\beta=1$
$F_1 = \frac{2 \times P \times R}{P + R}$
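As a small worked example, the sketch below evaluates Precision, Recall, and the general F-Score formula from raw counts. The function name `precision_recall_fbeta` and the counts (6 TP, 2 FP, 4 FN) are made up for illustration.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, Recall and F-beta from TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, fbeta


# Hypothetical counts: 8 predicted boxes (6 matched), 10 ground-truth boxes
p, r, f1 = precision_recall_fbeta(tp=6, fp=2, fn=4)
# p = 6/8 = 0.75, r = 6/10 = 0.6, f1 = 2 * 0.75 * 0.6 / (0.75 + 0.6) = 2/3
```

With `beta=1` the general formula reduces to the harmonic mean of Precision and Recall.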

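2. PR curve (Precision-Recall curve)

The Precision and Recall described so far are single numbers, with no independent or dependent variable between them, so how can a curve be drawn?
Both Precision and Recall are functions of the number of predicted boxes kept: keeping many boxes gives high Recall and low Precision, while keeping few gives high Precision and low Recall. Varying the number of kept boxes yields different (Precision, Recall) values, which can be plotted as points on a curve.
First sort the predicted boxes by confidence score, then take the top N boxes in that order and compute Precision and Recall. Each new value of N gives a new (Precision, Recall) point, and with enough points we obtain the Precision-Recall curve.
For a given IoU threshold, the procedure is shown in the figure below (the interpolation details of the curve are covered later, in the mAP computation):

[Figure: computing (Precision, Recall) points at a fixed IoU threshold]

The figure contains only TP and FP, not FN; the number of FN is simply the number of ground-truth boxes minus the number of TP.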
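The procedure can be sketched with NumPy as follows, assuming the predicted boxes of one class have already been matched, so each one is known to be a TP or an FP. The scores, TP flags, and ground-truth count are made-up numbers for illustration.

```python
import numpy as np

# Hypothetical detections of one class, already matched to ground truth
scores = np.array([0.95, 0.60, 0.80, 0.30, 0.70])
is_tp = np.array([1, 1, 0, 0, 1])  # 1 = TP, 0 = FP
num_gts = 4                        # number of ground-truth boxes

# Sort by descending confidence score
order = np.argsort(-scores)
tp_sorted = is_tp[order]

# Entry N-1 of each cumulative sum is the TP/FP count among the top-N boxes
tp_cum = np.cumsum(tp_sorted)
fp_cum = np.cumsum(1 - tp_sorted)

recalls = tp_cum / num_gts
precisions = tp_cum / (tp_cum + fp_cum)

# FN when all boxes are kept: ground truths minus TP
fn_final = num_gts - tp_cum[-1]
```

Each pair `(recalls[k], precisions[k])` is one point of the PR curve; Precision is not monotonic, which is why the curve zigzags.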

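3. AP and mAP

Before VOC 2010, PASCAL VOC (as described in its paper) interpolated at 11 equally spaced recall points; from VOC 2010 onward, including VOC 2012, the interpolation is performed over all points. This article describes the all-point method.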

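(1) PR curve and AP

AP is the approximate area under the PR curve. But the PR curve is made of discrete points, so how can the area be computed fairly?
The explanation here follows rafaelpadilla/Object-Detection-Metrics, which is also the source of the figures and explains this very well.
The table in that reference lists the predicted boxes sorted by confidence.
Plotting those points directly gives a polyline.
With the all-point method, the interpolation is performed at all points (the interpolation performed in all points), and average precision (AP) can be interpreted as the approximate AUC of the Precision x Recall curve. The aim is to reduce the effect of the wiggles in the curve.
The interpolation rule is: "The interpolated precision values are obtained by taking the maximum precision whose recall value is greater than its current recall value as follows". In other words, over the part of the curve at or to the right of the current recall, take the largest P.
Intuitively: imagine light shining from the right toward the left; the boundary of the region the light can reach is the new curve.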
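In detail: for a P-R polyline, any given R yields a P, so the polyline can be written as a function $P(R)$ (the solid blue line). Let the interpolated polyline (the dashed red line) be $P_2(R)$. For any point $R_0$, $P_2(R_0)$ is given by:
$P_2(R_0) = \max_{R \in [R_0,1]}P(R)$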
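The interpolation and the area computation can be sketched with NumPy as below. This is a minimal illustration of the all-point method, not the code of any specific library; the function name `ap_all_points` and the sample PR points are made up.

```python
import numpy as np


def ap_all_points(recalls, precisions):
    """All-point interpolated AP for PR points ordered by increasing recall."""
    # Sentinels so the curve spans recall 0..1
    mrec = np.concatenate(([0.0], recalls, [1.0]))
    mpre = np.concatenate(([0.0], precisions, [0.0]))
    # P2(R0) = max of P(R) for R >= R0: a running maximum taken from the right
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    # Sum rectangle areas wherever recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


# Hypothetical PR points, ordered by descending confidence score
recalls = np.array([0.25, 0.25, 0.5, 0.75, 0.75])
precisions = np.array([1.0, 0.5, 2 / 3, 0.75, 0.6])
ap = ap_all_points(recalls, precisions)
# Interpolated precision: 1.0 up to recall 0.25, then 0.75 up to recall 0.75,
# so AP = 0.25 * 1.0 + 0.5 * 0.75 = 0.625
```

The reversed `np.maximum.accumulate` is the "light from the right" picture: every precision value is raised to the largest value found at equal or higher recall.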

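(2) AP@[.50 : .05 : .95] and mAP

AP@0.5 is the AP with the IoU threshold set to 0.5.
AP@[.50 : .05 : .95] is the mean of the APs obtained with the IoU threshold set to 0.5, 0.55, 0.6, …, 0.95 in turn.
mAP usually refers to mAP (IoU=.50): the mean over all classes of the AP at an IoU threshold of 0.5.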
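To make the two kinds of averaging concrete, here is a small sketch over a table of per-class AP values. The class names and AP numbers are invented for illustration; in practice each entry would come from an AP computation at that IoU threshold.

```python
from statistics import mean

# IoU thresholds 0.50, 0.55, ..., 0.95
iou_thrs = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical AP values: AP drops by 0.05 per threshold step (made-up numbers)
base_ap = {"cat": 0.80, "dog": 0.60}
ap_table = {
    cls: {thr: base - 0.05 * i for i, thr in enumerate(iou_thrs)}
    for cls, base in base_ap.items()
}

# mAP (IoU=.50): mean over classes of the AP at threshold 0.5
map_50 = mean(ap_table[cls][0.5] for cls in ap_table)

# AP@[.50:.05:.95] per class: mean over the ten IoU thresholds
ap_range = {cls: mean(ap_table[cls].values()) for cls in ap_table}

# Averaging those over classes gives the class-averaged AP@[.50:.05:.95]
map_range = mean(ap_range.values())
```

The first average runs over classes at one threshold; the second runs over thresholds for one class.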

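(III) Hands-on Practice

0. Before reading the code: some NumPy functions

NumPy official website
Zeal, an offline tool for browsing official package documentation: "Zeal is an offline documentation browser for software developers."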

(1) np.bincount

'''
np.bincount
    Count number of occurrences of each value in array of non-negative ints.
'''
import numpy as np

# Count how many times each non-negative integer occurs
np.bincount(np.array([0, 1, 2, 2, 7, 6]))
# array([1, 1, 2, 0, 0, 0, 1, 1], dtype=int64)

# Sum the weights associated with each non-negative integer
w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6])
x = np.array([0, 1, 1, 2, 2, 2])
np.bincount(x,  weights=w)
# array([0.3, 0.7, 1.1])

(2) np.argsort

x = np.array([[0, 3], [2, 2]])
np.argsort(x, axis=0)  # sort each column independently
# array([[0, 1],
#        [1, 0]])

(3) np.unique

"""
np.unique
	Find the unique elements
"""
np.unique([1, 1, 2, 2, 3, 3])
# array([1, 2, 3])

(4) np.linspace

"""
np.linspace
	Generate evenly spaced values (an arithmetic sequence)
"""
np.linspace(2.0, 3.0, num=5)
# array([2.  , 2.25, 2.5 , 2.75, 3.  ])
np.linspace(2.0, 3.0, num=5, endpoint=False)
# array([2. ,  2.2,  2.4,  2.6,  2.8])

(5) np.interp

"""
np.interp
	One-dimensional linear interpolation for monotonically increasing sample points.
	In other words, the scattered sample points are joined into a piecewise-linear curve
"""
xp = [1, 2, 3]
fp = [3, 2, 0]
np.interp(1.5, xp, fp)
# 2.5
np.interp([1.5, 2.5], xp, fp)
# array([2.5, 1. ])

(6) np.concatenate

As the name says, this joins arrays together

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b), axis=0) # axis=0 joins along dimension 0
# array([[1, 2],
#        [3, 4],
#        [5, 6]])
np.concatenate((a, b.T), axis=1)
# array([[1, 2, 5],
#        [3, 4, 6]])

(7) np.flip

"""
np.flip
	Reverse the order of elements in an array along the given axis.
"""
A = np.arange(8).reshape((2,2,2))
# array([[[0, 1],
#         [2, 3]],
#        [[4, 5],
#         [6, 7]]])
np.flip(A, 0)
# array([[[4, 5],
#         [6, 7]],
#        [[0, 1],
#         [2, 3]]])

(8) np.maximum

"""
np.maximum
	Element-wise maximum of array elements.
	The maximum is equivalent to np.where(x1 >= x2, x1, x2) when neither x1 nor x2 are nans, but it is faster and does proper broadcasting.
	In short: take the larger of each pair of corresponding elements
"""
np.maximum([2, 3, 4], [1, 5, 2])
# array([2, 5, 4])

(9) np.ufunc.accumulate

"""
numpy.ufunc.accumulate
	Accumulate the result of applying the operator to all elements.
	That is, the running result of applying the operator cumulatively, element by element
"""
# Example with add
np.add(2,3)
# 5
np.add(np.add(2,3),5)
# 10
np.add.accumulate([2, 3, 5])
# array([ 2,  5, 10], dtype=int32)

# Example with multiply
np.multiply.accumulate([2, 3, 5])
# array([ 2,  6, 30])

# Example with maximum
np.maximum(3,2)
# 3
np.maximum(np.maximum(3,2),5)
# 5
np.maximum.accumulate([3, 2, 5])
# array([3, 3, 5], dtype=int32)

(10) np.trapz

"""
numpy.trapz
	Integrate along the given axis using the composite trapezoidal rule.
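	Numerical integration with the composite trapezoidal rule. Which quadrature rule is used makes little practical difference here; the point is that this function computes an integral.
	Note: NumPy 2.0 deprecates np.trapz in favor of np.trapezoid.
"""
# case 1
np.trapz([1,2,3], x=[4,6,8])
# 8.0
(1+2)*(6-4)/2 + (2+3)*(8-6)/2 # area computed by hand
# 8.0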

# case 2
theta = np.linspace(0, 2 * np.pi, num=1000, endpoint=True)
np.trapz(np.cos(theta), x=np.sin(theta))
# 3.141571941375841

# The formula below computes the same area by hand

$\int_{\theta=0}^{\theta=2\pi}\cos(\theta)\,d\sin(\theta) = \int_0^{2\pi}\cos^2(\theta)\,d\theta = \int_0^{2\pi} \frac{1+\cos(2\theta)}{2}\,d\theta = \int_{2\theta=0}^{2\theta=4\pi} \frac{1+\cos(2\theta)}{4}\,d(2\theta) = \pi+0=\pi$

(11) np.where

# Return elements chosen from x or y depending on condition.
# Best understood from an example
a = np.arange(10)
np.where(a < 5, a, 10*a)
# array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])
# Similar to b = a if a < 5 else 10*a, applied to every element of the array

(12) np.cumsum

"""
numpy.cumsum
	Return the cumulative sum of the elements along a given axis.
"""
a = np.array([[1,2,3], [4,5,6]])
np.cumsum(a,axis=1)
# array([[ 1,  3,  6],
#        [ 4,  9, 15]])

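1. Walking through mmdetection's VOC mAP source code

(1) Using multiprocessing to speed things up

The key idea is shown in the example below, which adds the corresponding elements of two lists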

from multiprocessing import Pool
def func(a,b):
    return a+b
if __name__ =="__main__":
    list1 = [i for i in range(10)]
    list2 = [i*2 for i in range(10)]
    with Pool(4) as pool:
        res = pool.starmap(func, zip(list1, list2))
    print(res)
    # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

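(2) mmdetection's VOC mAP source code

The following is a simplified version, meant only as pseudocode for reference; see the official repository for the full details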

import numpy as np
from multiprocessing import Pool

# tpfp_default, get_cls_results and average_precision are helper functions
# defined next to eval_map in mmdetection's mean_ap.py; they are not reproduced here.


def eval_map(det_results,
             annotations,
             iou_thr=0.5):
    assert len(det_results) == len(annotations)

    use_legacy_coordinate, extra_length = True, 1. # whether to add 1 to widths and heights; can be ignored here
    tpfp_fn = tpfp_default

    num_imgs = len(det_results)
    num_scales = 1
    num_classes = len(det_results[0])  # positive class num
    area_ranges = None

    pool = Pool(4)
    eval_results = []
    for i in range(num_classes):
        # Get the predicted boxes and ground-truth boxes of class i
        cls_dets, cls_gts, cls_gts_ignore = get_cls_results(
            det_results, annotations, i)

        # Core step: match predictions to ground truth per image, in parallel,
        # and compute the tp and fp flags
        tpfp = pool.starmap(
            tpfp_fn,
            zip(cls_dets, cls_gts, cls_gts_ignore,
                [iou_thr for _ in range(num_imgs)],
                [area_ranges for _ in range(num_imgs)],
                [use_legacy_coordinate for _ in range(num_imgs)]))
        tp, fp = tuple(zip(*tpfp))

        # Count the total number of ground-truth boxes
        num_gts = np.zeros(num_scales, dtype=int)
        for j, bbox in enumerate(cls_gts):
            num_gts[0] += bbox.shape[0]

        # Sort the detections by confidence score (descending)
        cls_dets = np.vstack(cls_dets)
        num_dets = cls_dets.shape[0]
        sort_inds = np.argsort(-cls_dets[:, -1])
        tp = np.hstack(tp)[:, sort_inds]
        fp = np.hstack(fp)[:, sort_inds]

        # Compute recall and precision from the cumulative tp/fp counts
        tp = np.cumsum(tp, axis=1)
        fp = np.cumsum(fp, axis=1)
        eps = np.finfo(np.float32).eps
        recalls = tp / np.maximum(num_gts[:, np.newaxis], eps)
        precisions = tp / np.maximum((tp + fp), eps)

        # Compute AP; two interpolation modes exist: 'area' and '11points'
        recalls = recalls[0, :]
        precisions = precisions[0, :]
        num_gts = num_gts.item()
        mode = 'area' # if dataset == 'voc07': mode = '11points'
        ap = average_precision(recalls, precisions, mode)

        # Store the results for this class
        eval_results.append({
            'num_gts': num_gts,
            'num_dets': num_dets,
            'recall': recalls,
            'precision': precisions,
            'ap': ap
        })
    pool.close()

    # mAP is the mean of the APs of all classes that have ground-truth boxes
    aps = []
    for cls_result in eval_results:
        if cls_result['num_gts'] > 0:
            aps.append(cls_result['ap'])
    mean_ap = np.array(aps).mean().item() if aps else 0.0

    return mean_ap, eval_results
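The `mode = '11points'` branch mentioned in the comment corresponds to the older 11-point interpolation. Below is a minimal standalone sketch of that rule, for comparison with the area mode. It is not mmdetection's implementation; the function name `ap_11_points` and the sample PR points are made up.

```python
import numpy as np


def ap_11_points(recalls, precisions):
    """11-point interpolated AP: average the interpolated precision at
    recall thresholds 0.0, 0.1, ..., 1.0."""
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    ap = 0.0
    for thr in np.linspace(0.0, 1.0, num=11):
        mask = recalls >= thr
        # Largest precision among points with recall >= thr (0 if none)
        p_max = precisions[mask].max() if mask.any() else 0.0
        ap += p_max / 11
    return float(ap)


# Hypothetical PR points, ordered by descending confidence score
recalls = [0.25, 0.25, 0.5, 0.75, 0.75]
precisions = [1.0, 0.5, 2 / 3, 0.75, 0.6]
ap11 = ap_11_points(recalls, precisions)
# Thresholds 0.0-0.2 give 1.0, 0.3-0.7 give 0.75, 0.8-1.0 give 0.0,
# so AP = (3 * 1.0 + 5 * 0.75) / 11
```

On the same points the 11-point value generally differs slightly from the area-based value, since it samples the interpolated curve at only 11 recall levels.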

2. Computing mAP with pycocotools

Load the COCO-format files and call the API directly

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import numpy as np
if __name__ == "__main__":
    gt = COCO('gt.json')
    det = gt.loadRes('res.json')
    cocoEval = COCOeval(gt, det, "bbox")
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()
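The `res.json` file passed to `loadRes` is a list of detection records in the COCO results format, where each `bbox` is `[x, y, width, height]`. The sketch below shows one way to build such a list from boxes in `(x1, y1, x2, y2)` form. The helper name `to_coco_results` and the sample detection are hypothetical, and the `image_id` and `category_id` values must match those in `gt.json`.

```python
import json


def to_coco_results(detections):
    """detections: iterable of (image_id, category_id, (x1, y1, x2, y2), score)."""
    results = []
    for image_id, category_id, (x1, y1, x2, y2), score in detections:
        results.append({
            "image_id": image_id,
            "category_id": category_id,
            # COCO boxes are [x, y, width, height], not corner coordinates
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": score,
        })
    return results


# One made-up detection: image 1, category 3, score 0.9
detections = [(1, 3, (10.0, 20.0, 50.0, 80.0), 0.9)]
coco_results = to_coco_results(detections)

# Serialize; writing this string to 'res.json' gives a file loadRes can read
res_json = json.dumps(coco_results)
```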