1. Sort all predicted boxes by confidence in descending order.
2. Choose a confidence threshold: predictions with confidence above the threshold are Positive, those below are Negative.
3. Fix an IoU threshold (typically 0.5). For each Positive prediction, compute its IoU with every ground-truth (GT) box; if the largest such IoU exceeds 0.5 the prediction is a TP, otherwise it is an FP.
An FP is a false detection; an FN is a missed detection.
$$\text{precision}=\frac{TP}{TP+FP}$$
$$\text{recall}=\frac{TP}{TP+FN}$$
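As a quick sanity check on the two formulas, here is a tiny worked example with made-up counts: 10 detections pass the confidence threshold, 6 of them are TPs (so FP = 4), and there are 8 GT boxes in total (so FN = 8 − 6 = 2).

tp, fp, fn = 6, 4, 2  # hypothetical counts, for illustration only

precision = tp / (tp + fp)  # 6 / 10 = 0.6
recall = tp / (tp + fn)     # 6 / 8  = 0.75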
Each GT box can be matched by at most one prediction. If several detections all match the same GT, only the one with the highest confidence counts as a TP; the rest count as FPs. For example, if three detections with confidences 0.9, 0.8, and 0.7 all overlap the same GT with IoU > 0.5, only the 0.9 detection is a TP.
4. Plot the P-R curve, then apply an interpolation step to correct it.
Pick 11 recall values r ∈ [0, 0.1, …, 0.9, 1.0]; for each r, the interpolated precision is the maximum precision over all points whose recall is ≥ r, and the AP is the average of these 11 values (a code sketch follows the Detectron excerpt below).
Facebook's open-source Detectron includes the mAP computation for the VOC dataset (https://github.com/facebookresearch/Detectron/blob/05d04d3a024f0991339de45872d02f2f50669b3d/lib/datasets/voc_eval.py). Its core implementation is excerpted here to give a deeper understanding of how mAP is computed. First, precision and recall:
# sort detections by confidence, descending
sorted_ind = np.argsort(-confidence)
BB = BB[sorted_ind, :]  # predicted box coordinates
image_ids = [image_ids[x] for x in sorted_ind]  # image id of each detection

# walk over the detections and mark TPs and FPs
nd = len(image_ids)
tp = np.zeros(nd)
fp = np.zeros(nd)
for d in range(nd):
    R = class_recs[image_ids[d]]
    bb = BB[d, :].astype(float)  # predicted box
    ovmax = -np.inf
    BBGT = R['bbox'].astype(float)  # ground-truth boxes in this image
    if BBGT.size > 0:
        # compute IoU
        # intersection
        ixmin = np.maximum(BBGT[:, 0], bb[0])
        iymin = np.maximum(BBGT[:, 1], bb[1])
        ixmax = np.minimum(BBGT[:, 2], bb[2])
        iymax = np.minimum(BBGT[:, 3], bb[3])
        iw = np.maximum(ixmax - ixmin + 1., 0.)
        ih = np.maximum(iymax - iymin + 1., 0.)
        inters = iw * ih

        # union
        uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
               (BBGT[:, 2] - BBGT[:, 0] + 1.) *
               (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

        overlaps = inters / uni
        ovmax = np.max(overlaps)    # largest IoU with any GT
        jmax = np.argmax(overlaps)

    if ovmax > ovthresh:                # above the IoU threshold?
        if not R['difficult'][jmax]:    # skip "difficult" objects
            if not R['det'][jmax]:      # this GT not matched yet
                tp[d] = 1.
                R['det'][jmax] = 1      # mark GT as matched
            else:
                fp[d] = 1.              # duplicate detection of a matched GT
    else:
        fp[d] = 1.

# compute precision and recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(npos)  # npos: number of non-difficult GT boxes, accumulated earlier in voc_eval
# avoid divide by zero in case the first detection matches a difficult
# ground truth
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
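To turn these cumulative rec and prec arrays into an AP value, voc_eval passes them to a voc_ap helper in the same file. Below is a minimal sketch of the 11-point interpolation described in step 4, modeled on that helper's use_07_metric branch; the function name voc_ap_11point is my own, not Detectron's.

import numpy as np

def voc_ap_11point(rec, prec):
    # Sample recall at t = 0.0, 0.1, ..., 1.0; at each t take the maximum
    # precision over all points with recall >= t, then average the 11 values.
    ap = 0.
    for t in np.arange(0., 1.1, 0.1):
        if np.sum(rec >= t) == 0:
            p = 0.
        else:
            p = np.max(prec[rec >= t])
        ap += p / 11.
    return ap

This yields the AP for one class; mAP is then the mean of the per-class APs.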