COCO detection evaluation metrics

http://cocodataset.org/#detection-eval

1. Detection Evaluation

This page describes the detection evaluation metrics used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple metrics described below. To obtain results on the COCO test set, for which ground-truth annotations are hidden, generated results must be uploaded to the evaluation server. The exact same evaluation code, described below, is used to evaluate results on the test set.

2. Metrics

The following 12 metrics are used for characterizing the performance of an object detector on COCO:

Average Precision (AP):

AP              % AP at IoU=.50:.05:.95 (primary challenge metric)
AP^IoU=.50      % AP at IoU=.50 (PASCAL VOC metric)
AP^IoU=.75      % AP at IoU=.75 (strict metric)

AP Across Scales:

AP^small        % AP for small objects: area < 32²
AP^medium       % AP for medium objects: 32² < area < 96²
AP^large        % AP for large objects: area > 96²

Average Recall (AR):

AR^max=1        % AR given 1 detection per image
AR^max=10       % AR given 10 detections per image
AR^max=100      % AR given 100 detections per image

AR Across Scales:

AR^small        % AR for small objects: area < 32²
AR^medium       % AR for medium objects: 32² < area < 96²
AR^large        % AR for large objects: area > 96²
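In the Python evaluation code (cocoeval.py; see the Evaluation Code section below), these 12 numbers are printed by summarize() and stored, in the same order, in the stats array for the bbox and segm evaluation types. A minimal sketch of reading them back, assuming an already-evaluated COCOeval object named coco_eval (the variable name is a placeholder):

```python
# Assumes coco_eval is a pycocotools COCOeval object on which evaluate(),
# accumulate(), and summarize() have already been called (see Section 3).
ap, ap_50, ap_75 = coco_eval.stats[0:3]             # AP@[.50:.05:.95], AP@.50, AP@.75
ap_small, ap_medium, ap_large = coco_eval.stats[3:6]  # AP across scales
ar_1, ar_10, ar_100 = coco_eval.stats[6:9]          # AR at 1/10/100 detections per image
ar_small, ar_medium, ar_large = coco_eval.stats[9:12] # AR across scales

print(f"primary challenge metric (AP): {ap:.3f}")
```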

 

  1. Unless otherwise specified, AP and AR are averaged over multiple Intersection over Union (IoU) values. Specifically, we use 10 IoU thresholds of .50:.05:.95. This is a break from tradition, where AP is computed at a single IoU of .50 (which corresponds to our metric AP^IoU=.50). Averaging over IoUs rewards detectors with better localization.
  2. AP is averaged over all categories. Traditionally, this is called "mean average precision" (mAP). We make no distinction between AP and mAP (and likewise AR and mAR) and assume the difference is clear from context.
  3. AP (averaged across all 10 IoU thresholds and all 80 categories) will determine the challenge winner. This should be considered the single most important metric when considering performance on COCO.
  4. In COCO, there are more small objects than large objects. Specifically: approximately 41% of objects are small (area < 32²), 34% are medium (32² < area < 96²), and 24% are large (area > 96²). Area is measured as the number of pixels in the segmentation mask.
  5. AR is the maximum recall given a fixed number of detections per image, averaged over categories and IoUs. AR is related to the metric of the same name used in proposal evaluation but is computed on a per-category basis.
  6. All metrics are computed allowing for at most 100 top-scoring detections per image (across all categories).
  7. The evaluation metrics for detection with bounding boxes and segmentation masks are identical in all respects except for the IoU computation (which is performed over boxes or masks, respectively). A minimal box-IoU sketch follows this list.
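To make notes 1 and 7 concrete, here is an illustrative sketch of IoU for axis-aligned boxes, together with the 10 thresholds over which the primary AP metric is averaged. The corner format [x1, y1, x2, y2] and the function name are chosen for illustration only; the official COCO results format stores boxes as [x, y, width, height], and the actual evaluation uses the IoU routines shipped with the COCO API.

```python
import numpy as np

def box_iou(box_a, box_b):
    """Illustrative IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The 10 IoU thresholds over which the primary AP metric is averaged.
iou_thresholds = np.linspace(0.5, 0.95, 10)   # .50, .55, ..., .95

# A detection counts as a true positive at a given threshold if its IoU with a
# still-unmatched ground-truth object of the same category meets that threshold.
iou = box_iou([0, 0, 10, 10], [5, 0, 15, 10])
print(iou)                                    # 0.333... (intersection 50, union 150)
print((iou >= iou_thresholds).sum())          # matched at 0 of the 10 thresholds
```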

3. Evaluation Code

Evaluation code is available on the COCO GitHub. Specifically, see either CocoEval.m or cocoeval.py in the Matlab or Python code, respectively. Also see evalDemo in either the Matlab or Python code. Before running the evaluation code, please prepare your results in the format described on the results format page.
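A minimal end-to-end sketch of the Python workflow using pycocotools; the annotation and results file names below are placeholders for your own paths, and the results file must follow the results format page:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: point these at the val-set annotations and your results file.
coco_gt = COCO('annotations/instances_val2017.json')    # ground-truth annotations
coco_dt = coco_gt.loadRes('my_detections.json')         # detections in the COCO results format

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')  # use iouType='segm' for mask results
coco_eval.evaluate()      # per-image, per-category matching at each IoU threshold
coco_eval.accumulate()    # build precision/recall curves
coco_eval.summarize()     # prints the 12 metrics described above
```

By default this evaluates over all images and categories present in the ground truth; the evaluation can be restricted to a subset through the params attribute (for example, coco_eval.params.imgIds).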

 
