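A while ago I worked on a collaboration with Baidu in which I had to measure the mAP of their model, so I dug into how Darknet actually computes mAP.
The prototype of the validate_detector_map function is: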
float validate_detector_map(char *datacfg, char *cfgfile, char *weightfile, float thresh_calc_avg_iou, const float iou_thresh, const int map_points, int letter_box, network *existing_net)
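datacfg - the .data file
cfgfile - the .cfg file
weightfile - the weights file
thresh_calc_avg_iou - the confidence threshold used to compute precision, recall and average IoU (note: mAP does not depend on this value)
iou_thresh - the IoU threshold, i.e. the minimum IoU between a detection and a gt box for the detection to count as correct
map_points - the number of recall points used to compute mAP. More points give a more precise result; with fewer points the computed mAP tends to come out lower. The default is 0, which means all points are used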
// MS COCO - uses 101-Recall-points on PR-chart.
// PascalVOC2007 - uses 11-Recall-points on PR-chart.
// PascalVOC2010-2012 - uses Area-Under-Curve on PR-chart.
// ImageNet - uses Area-Under-Curve on PR-chart.
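letter_box - whether to letterbox the input, i.e. keep the image's aspect ratio when resizing it to the network size
existing_net - an already-constructed network, if there is one. When mAP is computed during training the network already exists and is passed in; when the map command is invoked directly, a new network is built from the config files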
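Let's walk through the core code.
1. Images are processed in groups of 4 (one per loader thread, nthreads). For each image, run inference to get the detections, then filter out those below the threshold (note: the threshold passed here is 0.005, because we want to keep essentially all detections). hier_thresh is a leftover from YOLOv2 and is no longer used.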
for (t = 0; t < nthreads && i + t - nthreads < m; ++t)
{
    const int image_index = i + t - nthreads;
    char *path = paths[image_index];
    char *id = basecfg(path);
    float *X = val_resized[t].data;
    network_predict(net, X);
    int nboxes = 0;
    float hier_thresh = 0;
    detection *dets;
    if (args.type == LETTERBOX_DATA)
    {
        dets = get_network_boxes(&net, val[t].w, val[t].h, thresh, hier_thresh, 0, 1, &nboxes, letter_box);
    }
    else
    {
        dets = get_network_boxes(&net, 1, 1, thresh, hier_thresh, 0, 0, &nboxes, letter_box);
    }
    if (nms)
    {
        if (l.nms_kind == DEFAULT_NMS) do_nms_sort(dets, nboxes, l.classes, nms);
        else diounms_sort(dets, nboxes, l.classes, nms, l.nms_kind, l.beta_nms);
    }
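2. The detections returned by the network are all stored in detections, an array of box_prob structs whose fields are the bbox, the prob, the image index, the class, whether the detection matches a gt, and the index of the matched gt. This array is what the mAP computation uses later. Then, for every detection with prob greater than 0 (in practice greater than 0.005, because anything below that value has already been zeroed out by the thresholding and NMS step above), find the gt box of the same class whose IoU with it exceeds the threshold and is the largest. If such a gt exists, update truth_flag and unique_truth_index.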
for (i = 0; i < nboxes; ++i)
{
    int class_id;
    for (class_id = 0; class_id < classes; ++class_id)
    {
        float prob = dets[i].prob[class_id];
        if (prob > 0)
        {
            detections_count++;
            detections = (box_prob*)xrealloc(detections, detections_count * sizeof(box_prob));
            detections[detections_count - 1].b = dets[i].bbox;
            detections[detections_count - 1].p = prob;
            detections[detections_count - 1].image_index = image_index;
            detections[detections_count - 1].class_id = class_id;
            detections[detections_count - 1].truth_flag = 0;
            detections[detections_count - 1].unique_truth_index = -1;
            int truth_index = -1;
            float max_iou = 0;
            for (j = 0; j < num_labels; ++j)
            {
                box t = { truth[j].x, truth[j].y, truth[j].w, truth[j].h };
                float current_iou = box_iou(dets[i].bbox, t);
                if (current_iou > iou_thresh && class_id == truth[j].id)
                {
                    if (current_iou > max_iou)
                    {
                        max_iou = current_iou;
                        truth_index = unique_truth_count + j;
                    }
                }
            }
            // best IoU
            if (truth_index > -1)
            {
                detections[detections_count - 1].truth_flag = 1;
                detections[detections_count - 1].unique_truth_index = truth_index;
            }
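The matching rule in step 2 can be isolated into a small standalone sketch. This is not Darknet's code: sketch_box, sketch_truth, sketch_iou and best_truth_match are hypothetical names made up for illustration, and the IoU helper is a plain center/size implementation that I assume behaves like box_iou for ordinary boxes.

```c
#include <assert.h>
#include <stddef.h>

/* Box in center/size form, like Darknet's box {x, y, w, h}. Hypothetical type. */
typedef struct { float x, y, w, h; } sketch_box;

/* Ground-truth label: a box plus its class id. Hypothetical type. */
typedef struct { sketch_box b; int id; } sketch_truth;

/* Length of the overlap of two 1-D segments given by center and width
 * (negative when the segments do not touch). */
float sketch_overlap(float c1, float w1, float c2, float w2)
{
    float left1 = c1 - w1 / 2, left2 = c2 - w2 / 2;
    float right1 = c1 + w1 / 2, right2 = c2 + w2 / 2;
    float left = left1 > left2 ? left1 : left2;
    float right = right1 < right2 ? right1 : right2;
    return right - left;
}

/* Intersection over union of two boxes; 0 when they do not overlap. */
float sketch_iou(sketch_box a, sketch_box b)
{
    float w = sketch_overlap(a.x, a.w, b.x, b.w);
    float h = sketch_overlap(a.y, a.h, b.y, b.h);
    if (w <= 0 || h <= 0) return 0;
    float inter = w * h;
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0 ? inter / uni : 0;
}

/* Returns the index of the same-class gt with the largest IoU above
 * iou_thresh, or -1 if there is none. The best IoU is written to *best_iou. */
int best_truth_match(sketch_box det, int class_id, const sketch_truth *truth,
                     int num_labels, float iou_thresh, float *best_iou)
{
    assert(num_labels >= 0);
    int truth_index = -1;
    float max_iou = 0;
    for (int j = 0; j < num_labels; ++j) {
        float cur = sketch_iou(det, truth[j].b);
        if (cur > iou_thresh && class_id == truth[j].id && cur > max_iou) {
            max_iou = cur;
            truth_index = j;
        }
    }
    if (best_iou != NULL) *best_iou = max_iou;
    return truth_index;
}
```

Note that the class check is part of the match: a detection that overlaps a gt of a different class perfectly still gets -1 and will later count as an FP.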
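3. Once a detection has been stored, TP, FP and the average IoU are computed. The threshold used here is thresh_calc_avg_iou, which is passed in from outside and is only used to compute TP, FP and average IoU at this one confidence threshold. mAP, by contrast, summarizes precision and recall across many thresholds and does not depend on any particular one. found indicates whether the gt matched by the current detection has already been matched. Suppose the current bbox predicts the gt at truth_index, but that gt was already predicted by an earlier bbox (z runs from checkpoint_detections_count up to, but not including, detections_count - 1, i.e. over the bboxes already processed for the current image, excluding the current one). Since the predictions come out of NMS sorted by descending prob, the earlier prediction is the TP and this one is an FP.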
            // calc avg IoU, true-positives, false-positives for required Threshold
            if (prob > thresh_calc_avg_iou)
            {
                int z, found = 0;
                for (z = checkpoint_detections_count; z < detections_count - 1; ++z)
                {
                    if (detections[z].unique_truth_index == truth_index)
                    {
                        found = 1; break;
                    }
                }
                if (truth_index > -1 && found == 0)
                {
                    avg_iou += max_iou;
                    ++tp_for_thresh;
                    avg_iou_per_class[class_id] += max_iou;
                    tp_for_thresh_per_class[class_id]++;
                }
                else
                {
                    fp_for_thresh++;
                    fp_for_thresh_per_class[class_id]++;
                }
            }
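The duplicate handling in step 3 is easier to see on a single image with the boxes stripped away. The sketch below is my own simplification, not Darknet's code: sketch_det, sketch_counts and count_tp_fp are hypothetical names, and each detection is reduced to its prob and the index of the gt it matched (-1 for no match).

```c
#include <assert.h>

/* One stored detection: confidence and matched gt index (-1 = no match).
 * Hypothetical type for this sketch. */
typedef struct { float p; int truth_index; } sketch_det;

typedef struct { int tp, fp; } sketch_counts;

/* Counts TP and FP among one image's detections whose prob exceeds thresh.
 * dets must be in processing order. A detection is a TP only if it matched
 * a gt that no earlier detection in the same image has claimed. */
sketch_counts count_tp_fp(const sketch_det *dets, int n, float thresh)
{
    assert(n >= 0);
    sketch_counts c = {0, 0};
    for (int k = 0; k < n; ++k) {
        if (dets[k].p <= thresh) continue;   /* below thresh: not counted at all */
        int found = 0;
        for (int z = 0; z < k; ++z) {        /* earlier detections only */
            if (dets[z].truth_index == dets[k].truth_index) { found = 1; break; }
        }
        if (dets[k].truth_index > -1 && !found) ++c.tp;
        else ++c.fp;
    }
    return c;
}
```

For example, with detections {0.9, gt 0}, {0.8, gt 0}, {0.7, no match}, {0.1, gt 1} and a threshold of 0.25, the first is a TP, the second is an FP because gt 0 is already taken, the third is an FP, and the fourth is ignored.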
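4. After all images have been processed, compute the overall average IoU and the per-class average IoU. The IoU of each TP has already been added to avg_iou; an FP contributes an IoU of 0.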
if ((tp_for_thresh + fp_for_thresh) > 0)
    avg_iou = avg_iou / (tp_for_thresh + fp_for_thresh);
int class_id;
for (class_id = 0; class_id < classes; class_id++)
{
    if ((tp_for_thresh_per_class[class_id] + fp_for_thresh_per_class[class_id]) > 0)
        avg_iou_per_class[class_id] = avg_iou_per_class[class_id] / (tp_for_thresh_per_class[class_id] + fp_for_thresh_per_class[class_id]);
}
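As a quick worked example of this averaging, with made-up numbers that are not from any real run: three TPs with IoU 0.8, 0.7 and 0.9 plus one FP at the chosen threshold give

```latex
\[
\text{avg\_iou}
  = \frac{\sum_{d \in \mathrm{TP}} \mathrm{IoU}(d)}{\mathrm{TP} + \mathrm{FP}}
  = \frac{0.8 + 0.7 + 0.9}{3 + 1}
  = 0.6
\]
```

So the reported average IoU is pulled down by every FP, even though the TPs alone average 0.8.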
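5. Next, compute the AP of each class and the mAP. First sort detections in descending order of prob, so that detections[0] has the highest prob across all classes. rank denotes the confidence rank: rank = 0 corresponds to the highest prob and rank = detections_count - 1 to the lowest.
Now for the meaning of pr. pr is a classes × detections_count array, and pr[class_id][rank] holds the precision/recall statistics of class class_id when only detections with prob greater than or equal to the prob at that rank are counted, i.e. all detections satisfying prob >= detections[rank].p. That is why pr[class_id][rank].tp is initialized to pr[class_id][rank - 1].tp (and likewise for fp), and why the tp and fp counts at rank are never smaller than those at rank - 1: as rank increases the required prob drops, so no fewer detections are included than before, and TP and FP cannot decrease. At the last rank, rank == detections_count - 1, every detection is included (all of them have prob above 0.005). truth_flags, like found earlier, marks whether each gt has already been matched by some detection. At each detection's prob level, the TP or FP count is incremented depending on whether the detection hit a not-yet-matched gt, and precision and recall are then computed.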
qsort(detections, detections_count, sizeof(box_prob), detections_comparator);
// for PR-curve
pr_t** pr = (pr_t**)calloc(classes, sizeof(pr_t*)); // pr[classes][detections_count]
for (i = 0; i < classes; ++i)
    pr[i] = (pr_t*)calloc(detections_count, sizeof(pr_t));
for (rank = 0; rank < detections_count; ++rank)
{
    if (rank > 0)
    {
        int class_id;
        for (class_id = 0; class_id < classes; ++class_id)
        {
            pr[class_id][rank].tp = pr[class_id][rank - 1].tp;
            pr[class_id][rank].fp = pr[class_id][rank - 1].fp;
        }
    }
    box_prob d = detections[rank];
    // if (detected && isn't detected before)
    if (d.truth_flag == 1)
    {
        if (truth_flags[d.unique_truth_index] == 0)
        {
            truth_flags[d.unique_truth_index] = 1;
            pr[d.class_id][rank].tp++; // true-positive
        }
        else
            pr[d.class_id][rank].fp++;
    }
    else
    {
        pr[d.class_id][rank].fp++; // false-positive
    }
    for (i = 0; i < classes; ++i)
    {
        const int tp = pr[i][rank].tp;
        const int fp = pr[i][rank].fp;
        const int fn = truth_classes_count[i] - tp; // false-negative = objects - true-positive
        pr[i][rank].fn = fn;
        if ((tp + fp) > 0) pr[i][rank].precision = (double)tp / (double)(tp + fp);
        else pr[i][rank].precision = 0;
        if ((tp + fn) > 0) pr[i][rank].recall = (double)tp / (double)(tp + fn);
        else pr[i][rank].recall = 0;
    }
}
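To see how the cumulative counts turn into PR points, here is a single-class sketch. It is a simplification under my own assumptions: Darknet keeps one global ranking for all classes and carries each class's counts forward, whereas this version takes the already-sorted detections of one class. pr_point and build_pr_curve are hypothetical names, and is_tp[rank] is assumed to be 1 exactly when the detection at that rank is the first to claim its gt.

```c
#include <assert.h>

/* One point of the PR curve. Hypothetical type mirroring the fields used above. */
typedef struct { int tp, fp, fn; double precision, recall; } pr_point;

/* Fills pr[0..n-1] for a single class.
 * is_tp[rank] : 1 if the detection at this rank is a true positive, else 0;
 *               detections are assumed sorted by descending confidence.
 * num_truth   : number of gt objects of this class. */
void build_pr_curve(const int *is_tp, int n, int num_truth, pr_point *pr)
{
    assert(n >= 0 && num_truth >= 0);
    int tp = 0, fp = 0;
    for (int rank = 0; rank < n; ++rank) {
        if (is_tp[rank]) ++tp; else ++fp;   /* counts only ever grow with rank */
        int fn = num_truth - tp;            /* objects not found so far */
        pr[rank].tp = tp;
        pr[rank].fp = fp;
        pr[rank].fn = fn;
        pr[rank].precision = (tp + fp) > 0 ? (double)tp / (double)(tp + fp) : 0;
        pr[rank].recall = (tp + fn) > 0 ? (double)tp / (double)(tp + fn) : 0;
    }
}
```

With is_tp = {1, 1, 0, 1, 0} and 4 gt objects, recall goes 0.25, 0.5, 0.5, 0.75, 0.75 while precision goes 1, 1, 2/3, 3/4, 3/5: recall never decreases with rank, but precision dips and recovers.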
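6. With the precision/recall at every point available, mAP can now be computed. There are two cases. When map_points is 0, the precision at every recall point is considered and accumulated, which amounts to the area under the PR curve (note: precision is interpolated, i.e. the precision used at each recall is the maximum precision over all points whose recall is not smaller than it). Because detections are sorted by descending prob, walking rank from high to low traverses recall from high to low, with precision roughly going from low to high. Recall never increases along the way, but precision can fluctuate; if precision does not rise as recall falls, that point's own precision is not used and the running maximum last_precision keeps being applied until a higher precision is encountered. When map_points is not 0 things are more direct: for each recall point, simply search for the maximum precision among the points whose recall is at least that value. On the same dataset, -points 0 usually gives a slightly higher mAP than -points 101.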
for (i = 0; i < classes; ++i)
{
    double avg_precision = 0;
    if (map_points == 0)
    {
        double last_recall = pr[i][detections_count - 1].recall;
        double last_precision = pr[i][detections_count - 1].precision;
        for (rank = detections_count - 2; rank >= 0; --rank)
        {
            double delta_recall = last_recall - pr[i][rank].recall;
            last_recall = pr[i][rank].recall;
            if (pr[i][rank].precision > last_precision)
                last_precision = pr[i][rank].precision;
            avg_precision += delta_recall * last_precision;
        }
        // add remaining area of PR curve when recall isn't 0 at rank-1
        double delta_recall = last_recall - 0;
        avg_precision += delta_recall * last_precision;
    }
    // MSCOCO - 101 Recall-points, PascalVOC - 11 Recall-points
    else
    {
        int point;
        for (point = 0; point < map_points; ++point)
        {
            double cur_recall = point * 1.0 / (map_points - 1);
            double cur_precision = 0;
            for (rank = 0; rank < detections_count; ++rank)
            {
                if (pr[i][rank].recall >= cur_recall) // > or >=
                    if (pr[i][rank].precision > cur_precision)
                        cur_precision = pr[i][rank].precision;
            }
            avg_precision += cur_precision;
        }
        avg_precision = avg_precision / map_points;
    }
    mean_average_precision += avg_precision;
}
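The two branches can be pulled out into standalone functions and tried on a tiny curve. The sketch below follows the loops above for one class, but the function names ap_area_under_curve and ap_n_point and the flat precision/recall arrays are my own choices for illustration, not Darknet's interface.

```c
#include <assert.h>

/* AP as the area under the interpolated PR curve (the map_points == 0 branch).
 * precision[] and recall[] are indexed by rank, rank 0 = highest confidence. */
double ap_area_under_curve(const double *precision, const double *recall, int n)
{
    assert(n > 0);
    double ap = 0;
    double last_recall = recall[n - 1];
    double last_precision = precision[n - 1];
    for (int rank = n - 2; rank >= 0; --rank) {   /* sweep right to left */
        double delta_recall = last_recall - recall[rank];
        last_recall = recall[rank];
        if (precision[rank] > last_precision)
            last_precision = precision[rank];     /* running max = interpolation */
        ap += delta_recall * last_precision;
    }
    ap += last_recall * last_precision;           /* strip between recall 0 and rank 0 */
    return ap;
}

/* AP sampled at map_points evenly spaced recall values in [0, 1]
 * (11 for PascalVOC2007, 101 for MS COCO). */
double ap_n_point(const double *precision, const double *recall, int n, int map_points)
{
    assert(n > 0 && map_points > 1);
    double ap = 0;
    for (int point = 0; point < map_points; ++point) {
        double cur_recall = point * 1.0 / (map_points - 1);
        double cur_precision = 0;
        for (int rank = 0; rank < n; ++rank)
            if (recall[rank] >= cur_recall && precision[rank] > cur_precision)
                cur_precision = precision[rank];  /* max precision at recall >= cur_recall */
        ap += cur_precision;
    }
    return ap / map_points;
}
```

On the toy curve precision = {1, 1, 2/3, 3/4, 3/5}, recall = {0.25, 0.5, 0.5, 0.75, 0.75}, the area version gives 0.5 × 1 + 0.25 × 0.75 = 0.6875, while the 11-point version gives (6 × 1 + 2 × 0.75) / 11 ≈ 0.6818. The dip to 2/3 never shows up in either result, because both use the maximum precision to the right of each recall value.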
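Reading the code alone can be a bit abstract; it helps to look at the figure toward the end of this article. Each element of the pr array corresponds to one point on the PR plot. Throughout the computation the variable last_precision is maintained, holding the largest precision seen so far. The AP is computed by traversing the plot from right to left; picture a point sliding from right to left along the whole green line:
(1) When it moves left, delta_recall is the horizontal distance covered; last_precision stays unchanged, and the AP gained is the area of the rectangle formed by that horizontal distance and the maximum precision (interpolation);
(2) When it moves up, recall is unchanged and delta_recall = 0, so the AP does not grow, but last_precision keeps rising until it reaches the next peak.
That is my understanding of the mAP function. Comments and discussion are welcome.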