Object Detection Evaluation Metric: mAP

Peanut_X


Article link: https://blog.csdn.net/xiezongsheng1990/article/details/89608923


mAP stands for mean Average Precision. As the name says, two averages are involved: first the Average Precision (AP) is computed within each class, and then the APs of all classes are averaged to give the mean Average Precision.
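Written as a formula, the second averaging step looks as follows. The symbols $C$ (number of classes) and $AP_c$ (the AP of class $c$) are notation introduced here for illustration only.
$mAP = \frac{1}{C} \sum_{c=1}^{C} AP_c$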

Precision

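For a simple binary classification problem, precision is computed as:
$P = \frac{TP}{TP + FP} = \frac{TP}{N_{detection}}$
In other words, it measures how many of the reported detections are correct, i.e. the fraction of True Positives among the samples predicted as Positive.
In object detection, whether a detection counts as TP or FP is usually decided by the intersection over union (IOU) between the detected bounding box and the ground-truth bounding box. Different IOU thresholds therefore give different numbers of TP and FP, as the figure below shows. In the figure, the horizontal axis is the IOU value and the vertical axis is the number of samples; the left "hump" contains the samples labeled False and the right "hump" the samples labeled True. The dividing line in the middle is the IOU decision threshold: the model classifies samples to its left as False and samples to its right as True, which yields the four entries of the confusion matrix (TP, FP, FN and TN), and all four change as the line moves. A model therefore cannot be judged by the precision at a single IOU value alone, which is why AP (Average Precision) is needed.
[Figure: sample distribution over IOU, with the decision threshold separating predicted False from predicted True]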
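The IOU test described above can be sketched in a few lines. This is a minimal illustration, not the code of any particular library: the function names `iou` and `is_true_positive`, the `(x1, y1, x2, y2)` box layout, and the continuous-coordinate area convention (no "+1 pixel" correction) are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped at 0 so disjoint boxes give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(det_box, gt_box, iou_threshold=0.5):
    """A detection counts as TP when its IOU with the ground truth reaches the threshold."""
    return iou(det_box, gt_box) >= iou_threshold


# Hypothetical boxes: intersection area 50, union area 150
gt = (0, 0, 10, 10)
det = (5, 0, 15, 10)
print(iou(det, gt))                    # one third
print(is_true_positive(det, gt, 0.5))  # False
print(is_true_positive(det, gt, 0.3))  # True
```

The same detection flips from FP to TP when the threshold drops from 0.5 to 0.3, which is the effect of moving the dividing line in the figure.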

Average Precision

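Besides precision, another very important metric for judging a model is Recall, computed as:
$R=\frac{TP}{TP+FN}=\frac{TP}{N_{gt}}$
In other words, it measures how many of the positives were found: the ratio of TP to all samples whose ground-truth label is True.
To some extent, Precision and Recall pull against each other. When the decision threshold is raised (only samples with a larger IOU are classified as positive), i.e. the dividing line in the figure above moves right, Precision increases and Recall decreases; when the threshold is lowered and the line moves left, Precision decreases and Recall increases. Seen the other way round, each Recall value comes with its own Precision value. Plotting the results for different thresholds with Recall on the horizontal axis and Precision on the vertical axis gives the PR curve shown below. Note that in the implementation at the end of this article the IOU threshold is fixed (0.5 by default), and the points of the curve come from walking down the detections sorted by confidence.
[Figure: PR curve]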
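A minimal sketch of how the points of a PR curve can be obtained, assuming the detections are already sorted by descending confidence and each has been marked TP (1) or FP (0) at a fixed IOU threshold. The function name `precision_recall_points` and the example flags are hypothetical.

```python
import numpy as np


def precision_recall_points(tp_flags, num_gt):
    """Precision and recall after each detection in a confidence-sorted list."""
    tp_flags = np.asarray(tp_flags, dtype=float)
    acc_tp = np.cumsum(tp_flags)        # TPs seen so far
    acc_fp = np.cumsum(1.0 - tp_flags)  # FPs seen so far
    recall = acc_tp / num_gt
    precision = acc_tp / (acc_tp + acc_fp)
    return precision, recall


# Hypothetical ranking of 5 detections (best confidence first), 4 ground-truth boxes
tp_flags = [1, 0, 1, 1, 0]
precision, recall = precision_recall_points(tp_flags, num_gt=4)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")
```

Each prefix of the ranked list corresponds to one confidence cutoff, so the printed pairs are the points of the PR curve.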
Averaging the Precision over different Recall values gives the Average Precision. There are different ways to choose the Recall values; PASCAL VOC 2007 and PASCAL VOC 2012 use the two different approaches described below.

AP in PASCAL VOC 2007

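For the PR curve above, take the 11 Recall values (0.0, 0.1, 0.2, ..., 1.0) and average the Precision at these points. With a limited number of samples the curve can zigzag as in the figure below, i.e. the Precision at a larger Recall can be higher than at a smaller Recall, so some smoothing is applied. The computation is as follows.
[Figure: PR curve with the interpolated precision shown as a red dashed line]
$AP = \frac{1}{11} \sum_{r \in \{0, 0.1, ..., 1.0\}} \rho_{interp}(r) \\ \rho_{interp}(r)=\max_{\hat{r}:\hat{r}\geqslant r}\rho(\hat{r})$
That is, the Precision at each Recall value is replaced by the maximum Precision over all Recall values greater than or equal to it, where $\rho(\hat{r})$ is the measured Precision at Recall $\hat{r}$. In the figure above, this means taking the highest Precision at or to the right of the current Recall. The smoothed values are the red dashed line in the figure.
This computation can be viewed as splitting the horizontal axis into 11 bins, each of width $\frac{1}{11}$ and of height equal to the interpolated Precision; the AP is the total area of the 11 bins. For the figure above:
$\begin{aligned} AP &= \frac{1}{11}(1 + 0.6666 + 0.4285 + 0.4285 + 0.4285 + 0 + 0 + 0 + 0 + 0 + 0) \\ &=0.2684 \end{aligned}$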
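The 11-point rule can be checked numerically with a short sketch. The function name `eleven_point_ap` is hypothetical, and the TP/FP ranking below is an assumption: it is a made-up sequence of 24 detections against 15 ground-truth boxes, chosen only so that it reproduces the precision values used in the calculation above.

```python
import numpy as np


def eleven_point_ap(recall, precision):
    """11-point interpolated AP: mean of the max precision at recall >= r, r = 0, 0.1, ..., 1."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    levels = np.arange(11) / 10.0
    interp = []
    for r in levels:
        mask = recall >= r
        # No detection reaches this recall level -> interpolated precision is 0
        interp.append(precision[mask].max() if mask.any() else 0.0)
    return float(np.mean(interp)), interp


# Assumed example data: ranks (1-based, by confidence) of the true positives
tp_ranks = {1, 3, 10, 12, 13, 14, 23}
tp = np.array([1.0 if k in tp_ranks else 0.0 for k in range(1, 25)])
acc_tp = np.cumsum(tp)
recall = acc_tp / 15                     # 15 ground-truth boxes
precision = acc_tp / np.arange(1, 25)    # TPs so far / detections so far

ap, interp = eleven_point_ap(recall, precision)
print(round(ap, 4))  # 0.2684
```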

AP in PASCAL VOC 2012

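In PASCAL VOC 2012 (apparently starting with VOC2010), for every distinct Recall value (including 0 and 1) the maximum Precision at Recall greater than or equal to that value is taken, and the area under the resulting PR curve is used as the AP. The smoothed curve is the same as above; only the number of points entering the average is larger.
The result can be seen as the AUC (Area Under Curve) of the smoothed PR curve (the red dashed line in the figure above).
[Figure: area under the interpolated PR curve, split into rectangles A1 to A4]
As the figure shows, the AP equals the total area of the four rectangles.
For the example in the figure:
$\begin{aligned} A1& = (0.0666-0)*1=0.0666 \\ A2 &= (0.1333 - 0.0666)*0.6666 = 0.0444 \\ A3 &= (0.4 - 0.1333) * 0.4285 = 0.1142 \\ A4 &= (0.4666 - 0.4) * 0.3043 = 0.0202 \\ AP &= A1 + A2 + A3 + A4 \approx 0.2456 \end{aligned}$
(The individual areas are truncated to four decimals; summing them without truncation gives the 0.2456 shown.)
In an actual implementation, the gap between two consecutive Recall values is typically used as the bin width and the smoothed Precision as the height; the sum is the same as in the procedure above.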
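The area computation can be sketched with NumPy as below. The function name `every_point_ap` is hypothetical, and the TP/FP ranking is an assumed sequence (24 detections, 15 ground-truth boxes) picked only because it reproduces the recall and precision values of the four rectangles above.

```python
import numpy as np


def every_point_ap(recall, precision):
    """Area under the interpolated PR curve (every-point interpolation)."""
    # Pad so the curve spans recall 0..1 and precision ends at 0
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Smoothing: each precision becomes the max of itself and everything to its right
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Positions where recall changes are the right edges of the rectangles
    idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
    return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))


# Assumed example data: ranks (1-based, by confidence) of the true positives
tp_ranks = {1, 3, 10, 12, 13, 14, 23}
tp = np.array([1.0 if k in tp_ranks else 0.0 for k in range(1, 25)])
acc_tp = np.cumsum(tp)
recall = acc_tp / 15                     # 15 ground-truth boxes
precision = acc_tp / np.arange(1, 25)    # TPs so far / detections so far

ap = every_point_ap(recall, precision)
print(f"{ap:.6f}")  # 0.245687
```

Adjacent recall steps that share the same smoothed precision (here the four steps between recall 0.1333 and 0.4) add up to a single rectangle such as A3, so summing per step and summing per rectangle give the same area.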

Implementation

The following is the implementation from Object-Detection-Metrics.
Input:
boundingboxes: essentially a list of BoundingBox objects plus related methods. BoundingBox is defined as follows

class BoundingBox:
    def __init__(self,
                 imageName,
                 classId,
                 x,
                 y,
                 w,
                 h,
                 typeCoordinates=CoordinatesType.Absolute,
                 imgSize=None,
                 bbType=BBType.GroundTruth,
                 classConfidence=None,
                 format=BBFormat.XYWH):
        ...  # constructor body not shown in this excerpt

mAP is computed as follows

# Module-level imports this method relies on
import sys
from collections import Counter

import numpy as np

def GetPascalVOCMetrics(
    self,
    boundingboxes,
    IOUThreshold=0.5,
    method=MethodAveragePrecision.EveryPointInterpolation):
        ret = []  # result list, one element per class
        groundTruths = []  # one element per label: [imageName, class, confidence=1, (bb coordinates XYX2Y2)]
        detections = []  # one element per detection: [imageName, class, confidence, (bb coordinates XYX2Y2)]
        classes = []
        # boundingboxes holds both ground truths and detections; separate them
        for bb in boundingboxes.getBoundingBoxes():
            if bb.getBBType() == BBType.GroundTruth:  # ground truth
                groundTruths.append([
                    bb.getImageName(),
                    bb.getClassId(), 1,
                    bb.getAbsoluteBoundingBox(BBFormat.XYX2Y2)
                ])
            else:  # detection
                detections.append([
                    bb.getImageName(),
                    bb.getClassId(),
                    bb.getConfidence(),
                    bb.getAbsoluteBoundingBox(BBFormat.XYX2Y2)
                ])
            # collect all class ids
            if bb.getClassId() not in classes:
                classes.append(bb.getClassId())
        classes = sorted(classes)

        for c in classes:  # compute Precision and Recall for each class separately
            # Get only detection of class c
            dects = [d for d in detections if d[1] == c]
            # Get only ground truths of class c
            gts = [g for g in groundTruths if g[1] == c]
            npos = len(gts)
            # sort detections by confidence; higher-ranked ones are usually more accurate
            dects = sorted(dects, key=lambda conf: conf[2], reverse=True)
            # 0/1 indicator arrays, one entry per detection
            TP = np.zeros(len(dects))
            FP = np.zeros(len(dects))
            # count the ground-truth boxes in each image
            det = Counter([cc[0] for cc in gts])
            # turn each count into a zero array that flags which ground truths are already matched
            for key, val in det.items():
                det[key] = np.zeros(val)
            # compute the IOU for each detection
            for d in range(len(dects)):
                # ground truths of class c in the same image as this detection
                gt = [gt for gt in gts if gt[0] == dects[d][0]]
                iouMax = sys.float_info.min
                # find the ground truth with the largest IOU
                for j in range(len(gt)):
                    iou = Evaluator.iou(dects[d][3], gt[j][3])
                    if iou > iouMax:
                        iouMax = iou
                        jmax = j
                # set the corresponding TP or FP entry to 1
                if iouMax >= IOUThreshold:
                    if det[dects[d][0]][jmax] == 0:  # this ground truth has not been "taken" yet
                        TP[d] = 1
                        det[dects[d][0]][jmax] = 1
                    else:  # ground truth already matched by a higher-confidence detection
                        FP[d] = 1
                else:
                    FP[d] = 1
            # cumulative counts of FP and TP
            acc_FP = np.cumsum(FP)  # number of FPs up to and including each detection
            acc_TP = np.cumsum(TP)  # number of TPs up to and including each detection
            rec = acc_TP / npos  # recall up to the current detection
            prec = np.divide(acc_TP, (acc_FP + acc_TP))  # precision up to the current detection
            # two ways to compute AP: ElevenPoint and EveryPoint
            if method == MethodAveragePrecision.EveryPointInterpolation:
                [ap, mpre, mrec, ii] = Evaluator.CalculateAveragePrecision(rec, prec)
            else:
                [ap, mpre, mrec, _] = Evaluator.ElevenPointInterpolatedAP(rec, prec)
            r = {
                'class': c,
                'precision': prec,
                'recall': rec,
                'AP': ap,
                'interpolated precision': mpre,
                'interpolated recall': mrec,
                'total positives': npos,
                'total TP': np.sum(TP),
                'total FP': np.sum(FP)
            }
            ret.append(r)
        return ret
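
The function above returns one dict per class, so the final mAP is the mean of the 'AP' entries. A minimal sketch of that last step follows; the helper name `mean_average_precision` and the sample values are hypothetical, and only the 'AP' key is taken from the code above.

```python
def mean_average_precision(results):
    """Average the 'AP' entries of a list of per-class result dicts."""
    aps = [float(r['AP']) for r in results]
    # Guard against an empty result list (no classes evaluated)
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical per-class results, shaped like the dicts built above
results = [
    {'class': 'cat', 'AP': 0.50},
    {'class': 'dog', 'AP': 0.25},
]
print(mean_average_precision(results))  # 0.375
```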

Key functions: CalculateAveragePrecision (every-point interpolation) and ElevenPointInterpolatedAP (11-point interpolation)

def CalculateAveragePrecision(rec, prec):
    # pad recall with 0 at the start and 1 at the end
    mrec = [0] + list(rec) + [1]
    # pad precision with 0 at both ends (the leading 0 is overwritten by the smoothing below)
    mpre = [0] + list(prec) + [0]
    # smooth precision: each value becomes the maximum of itself and all later values
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # indices where recall changes mark the segment boundaries
    ii = []
    for i in range(len(mrec) - 1):
        if mrec[i + 1] != mrec[i]:
            ii.append(i + 1)
    # sum the rectangle areas: recall step times smoothed precision
    ap = 0
    for i in ii:
        ap = ap + (mrec[i] - mrec[i - 1]) * mpre[i]
    return [ap, mpre[0:len(mpre) - 1], mrec[0:len(mpre) - 1], ii]

def ElevenPointInterpolatedAP(rec, prec):
    # Convert to NumPy arrays so the comparison `mrec[:] >= r` below is explicitly element-wise
    mrec = np.asarray(rec, dtype=float)
    mpre = np.asarray(prec, dtype=float)
    recallValues = np.linspace(0, 1, 11)
    recallValues = list(recallValues[::-1]) #[1, 0.9, ..., 0.1, 0]
    rhoInterp = []
    recallValid = []
    # For each recallValues (0, 0.1, 0.2, ... , 1)
    for r in recallValues:
        # Obtain all recall values higher or equal than r
        argGreaterRecalls = np.argwhere(mrec[:] >= r)
        pmax = 0
        # If there are recalls above r
        if argGreaterRecalls.size != 0:
            pmax = max(mpre[argGreaterRecalls.min():])
        recallValid.append(r)
        rhoInterp.append(pmax)
    # By definition AP = sum(max(precision whose recall is above r))/11
    ap = sum(rhoInterp) / 11
    # Generating values for the plot
    rvals = []
    rvals.append(recallValid[0])
    [rvals.append(e) for e in recallValid]
    rvals.append(0)
    pvals = []
    pvals.append(0)
    [pvals.append(e) for e in rhoInterp]
    pvals.append(0)
    # rhoInterp = rhoInterp[::-1]
    cc = []
    for i in range(len(rvals)):
        p = (rvals[i], pvals[i - 1])
        if p not in cc:
            cc.append(p)
        p = (rvals[i], pvals[i])
        if p not in cc:
            cc.append(p)
    recallValues = [i[0] for i in cc]
    rhoInterp = [i[1] for i in cc]
    return [ap, rhoInterp, recallValues, None]