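Computer Vision: Common Evaluation Metrics for Object Detection
Hey, these notes cover the typical evaluation metrics reported in object detection papers, mAP (accuracy) and FPS (speed), along with a concrete PyTorch implementation of mAP. Enjoy 😃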
1. mAP (mean Average Precision)
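- What is mAP?
mAP stands for mean Average Precision. AP (Average Precision) is computed for one single class in the dataset, while mAP covers every class present in the whole dataset: it is simply the mean of the per-class AP values.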
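Concretely, to compute the AP of the person class on COCO, we have to go through the test images one by one. First fix an IoU threshold, which decides whether each predicted person box (pred bbox) in an image is a TP or an FP. (From these TP/FP labels one could also compute Precision and Recall for the person class separately on each image, but COCO does not evaluate that way. As far as I know the standard Pascal VOC evaluation does not do it per image either: it also pools the detections of a class over the whole test set, and differs from COCO mainly in the IoU thresholds used and in how the precision-recall curve is summarized.) Then move on to the next image, and keep collecting the TP and FP person boxes image by image. Once every image has been processed, compute the AP of the person class in one go from the pooled results. Doing the same for every class and averaging the per-class values gives mAP (COCO simply reports this number as AP; it does not really distinguish between AP and mAP).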
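- What is AP50?
As the name suggests, AP50 uses an IoU threshold of 0.5 as the TP/FP criterion: compute the AP of each class at that threshold, then average over the classes to get the AP of the dataset.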
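- What is AP75?
As the name suggests, AP75 uses an IoU threshold of 0.75 as the TP/FP criterion: compute the AP of each class at that threshold, then average over the classes to get the AP of the dataset.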
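To make the effect of the threshold concrete, here is a minimal plain-Python sketch. The helper `iou_corners` and the two boxes are made up for illustration and are not taken from any dataset. It shows one prediction that counts as a TP under the AP50 criterion but as an FP under the AP75 criterion.

```python
def iou_corners(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped at 0 so disjoint boxes give zero intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the prediction lies inside the ground truth
# and covers 60% of its area, so the IoU is 60 / 100.
gt_box = (0.0, 0.0, 10.0, 10.0)
pred_box = (0.0, 0.0, 10.0, 6.0)

iou = iou_corners(pred_box, gt_box)
# Same prediction, two different verdicts depending on the threshold
labels = {t: ("TP" if iou > t else "FP") for t in (0.5, 0.75)}
print(iou, labels)
```

The comparison uses `>` to match the implementation in this note; other evaluation code may treat an IoU exactly equal to the threshold differently.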
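- And what about AP@[0.5:0.95]?
AP@[0.5:0.95] is the mean of the ten values [AP50, AP55, AP60, AP65, AP70, AP75, AP80, AP85, AP90, AP95], that is, AP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.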
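The averaging over thresholds can be sketched in a few lines. This is only an illustration: `ap_at_threshold` is a hypothetical callable standing in for whatever routine returns the dataset AP at a given IoU threshold, and `toy_ap` produces made-up numbers rather than real results.

```python
def ap_over_iou_range(ap_at_threshold):
    """Average AP over the IoU thresholds 0.50, 0.55, ..., 0.95.

    ap_at_threshold: hypothetical callable, IoU threshold -> AP at that threshold.
    """
    # round() keeps the thresholds free of floating-point noise (e.g. 0.65)
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    aps = [ap_at_threshold(t) for t in thresholds]
    return sum(aps) / len(aps)


def toy_ap(iou_threshold):
    # Made-up stand-in: AP drops as the threshold gets stricter
    return 1.0 - iou_threshold


result = ap_over_iou_range(toy_ap)
print(result)
```

With a real evaluator, `toy_ap` would be replaced by a function that reruns the TP/FP assignment at each threshold, since a stricter threshold changes which predictions count as TP.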
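- How is mAP computed?
- Pick a class C, then use a fixed IoU threshold to label the predictions of class C in each image as TP or FP. Go through all images, accumulating the TP and FP predictions. Once every image has been processed, sort the pooled predictions by confidence, build the precision-recall curve from them, and take the area under that curve as the AP of class C. Repeat this for the next class, and when every class has its AP, average them once to get the AP at that threshold, for example AP50.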
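The steps above can be sketched without PyTorch. This is a minimal illustration under stated assumptions, not the official evaluation code: the function name `average_precision` and the input format (a pooled list of `(confidence, is_true_positive)` pairs for one class) are invented here, and the area is taken with the plain trapezoidal rule, whereas the official Pascal VOC and COCO toolkits use interpolated precision.

```python
def average_precision(detections, num_gt):
    """AP of one class from detections pooled over all images.

    detections: list of (confidence, is_true_positive) pairs (assumed format)
    num_gt:     total number of ground-truth boxes of this class
    """
    if num_gt == 0:
        return 0.0
    # Rank all detections of the class by confidence, highest first
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    # Start the curve at recall 0 / precision 1
    recalls, precisions = [0.0], [1.0]
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Trapezoidal area under the precision-recall curve
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
    return ap


# Made-up example: two ground-truth boxes, three detections (TP, FP, TP)
ap_person = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
ap_dog = average_precision([(0.9, True), (0.6, True)], num_gt=2)
# mAP is the plain mean of the per-class APs
mean_ap = (ap_person + ap_dog) / 2
print(ap_person, ap_dog, mean_ap)
```

The false positive ranked second is what pulls the first class below 1.0: it lowers precision before the second true positive is reached.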
- How do we implement mAP?
import torch
from collections import Counter


def intersection_over_union(boxes_preds, boxes_labels, box_format="midpoint"):
    """
    Calculates intersection over union

    Parameters:
        boxes_preds (tensor): Predictions of Bounding Boxes (BATCH_SIZE, 4)
        boxes_labels (tensor): Correct Labels of Boxes (BATCH_SIZE, 4)
        box_format (str): midpoint/corners, if boxes (x,y,w,h) or (x1,y1,x2,y2)

    Returns:
        tensor: Intersection over union for all examples
    """
    # Slicing idx:idx+1 in order to keep tensor dimensionality
    # Doing ... in indexing if there would be additional dimensions
    # Like for Yolo algorithm which would have (N, S, S, 4) in shape
    if box_format == "midpoint":
        box1_x1 = boxes_preds[..., 0:1] - boxes_preds[..., 2:3] / 2
        box1_y1 = boxes_preds[..., 1:2] - boxes_preds[..., 3:4] / 2
        box1_x2 = boxes_preds[..., 0:1] + boxes_preds[..., 2:3] / 2
        box1_y2 = boxes_preds[..., 1:2] + boxes_preds[..., 3:4] / 2
        box2_x1 = boxes_labels[..., 0:1] - boxes_labels[..., 2:3] / 2
        box2_y1 = boxes_labels[..., 1:2] - boxes_labels[..., 3:4] / 2
        box2_x2 = boxes_labels[..., 0:1] + boxes_labels[..., 2:3] / 2
        box2_y2 = boxes_labels[..., 1:2] + boxes_labels[..., 3:4] / 2
    elif box_format == "corners":
        box1_x1 = boxes_preds[..., 0:1]
        box1_y1 = boxes_preds[..., 1:2]
        box1_x2 = boxes_preds[..., 2:3]
        box1_y2 = boxes_preds[..., 3:4]
        box2_x1 = boxes_labels[..., 0:1]
        box2_y1 = boxes_labels[..., 1:2]
        box2_x2 = boxes_labels[..., 2:3]
        box2_y2 = boxes_labels[..., 3:4]
    else:
        # Fail early instead of hitting a NameError further down
        raise ValueError(f"Unknown box_format: {box_format!r}")

    x1 = torch.max(box1_x1, box2_x1)
    y1 = torch.max(box1_y1, box2_y1)
    x2 = torch.min(box1_x2, box2_x2)
    y2 = torch.min(box1_y2, box2_y2)

    # Need clamp(0) in case they do not intersect, then we want intersection to be 0
    # (this handles the special case where the two boxes are disjoint)
    intersection = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    box1_area = abs((box1_x2 - box1_x1) * (box1_y2 - box1_y1))
    box2_area = abs((box2_x2 - box2_x1) * (box2_y2 - box2_y1))

    return intersection / (box1_area + box2_area - intersection + 1e-6)


def mean_average_precision(
    pred_boxes, true_boxes, iou_threshold=0.5, box_format="midpoint", num_classes=20
):
    """
    Calculates mean average precision

    Parameters:
        pred_boxes (list): list of lists containing all bboxes with each bbox
            specified as [train_idx, class_prediction, prob_score, followed by
            four box coordinates interpreted according to box_format]
        true_boxes (list): Similar as pred_boxes except all the correct ones
        iou_threshold (float): threshold where predicted bboxes is correct
        box_format (str): "midpoint" or "corners" used to specify bboxes
        num_classes (int): number of classes

    Returns:
        float: mAP value across all classes given a specific IoU threshold
    """
    # list storing all AP for respective classes
    average_precisions = []

    # used for numerical stability later on
    epsilon = 1e-6

    for c in range(num_classes):
        detections = []
        ground_truths = []

        # Go through all predictions and targets,
        # and only add the ones that belong to the
        # current class c
        for detection in pred_boxes:
            if detection[1] == c:
                detections.append(detection)

        for true_box in true_boxes:
            if true_box[1] == c:
                ground_truths.append(true_box)

        # find the amount of bboxes for each training example
        # Counter here finds how many ground truth bboxes we get
        # for each training example, so let's say img 0 has 3,
        # img 1 has 5 then we will obtain a dictionary with:
        # amount_bboxes = {0:3, 1:5}
        amount_bboxes = Counter([gt[0] for gt in ground_truths])

        # We then go through each key, val in this dictionary
        # and convert to the following (w.r.t same example):
        # amount_bboxes = {0:torch.tensor([0,0,0]), 1:torch.tensor([0,0,0,0,0])}
        for key, val in amount_bboxes.items():
            amount_bboxes[key] = torch.zeros(val)

        # sort by box probabilities which is index 2
        detections.sort(key=lambda x: x[2], reverse=True)
        TP = torch.zeros((len(detections)))
        FP = torch.zeros((len(detections)))
        total_true_bboxes = len(ground_truths)

        # If none exists for this class then we can safely skip
        if total_true_bboxes == 0:
            continue

        for detection_idx, detection in enumerate(detections):
            # Only take out the ground_truths that have the same
            # training idx as detection
            ground_truth_img = [
                bbox for bbox in ground_truths if bbox[0] == detection[0]
            ]

            best_iou = 0
            best_gt_idx = -1  # stays -1 if no ground truth overlaps this detection

            for idx, gt in enumerate(ground_truth_img):
                iou = intersection_over_union(
                    torch.tensor(detection[3:]),
                    torch.tensor(gt[3:]),
                    box_format=box_format,
                )

                if iou > best_iou:
                    best_iou = iou
                    best_gt_idx = idx

            if best_iou > iou_threshold:
                # only detect ground truth detection once
                if amount_bboxes[detection[0]][best_gt_idx] == 0:
                    # true positive and add this bounding box to seen
                    TP[detection_idx] = 1
                    amount_bboxes[detection[0]][best_gt_idx] = 1
                else:
                    FP[detection_idx] = 1

            # if IOU is lower then the detection is a false positive
            else:
                FP[detection_idx] = 1

        TP_cumsum = torch.cumsum(TP, dim=0)
        FP_cumsum = torch.cumsum(FP, dim=0)
        recalls = TP_cumsum / (total_true_bboxes + epsilon)
        precisions = TP_cumsum / (TP_cumsum + FP_cumsum + epsilon)
        # Prepend the point (recall=0, precision=1); float literals keep the dtypes consistent
        precisions = torch.cat((torch.tensor([1.0]), precisions))
        recalls = torch.cat((torch.tensor([0.0]), recalls))
        # torch.trapz for numerical integration
        average_precisions.append(torch.trapz(precisions, recalls))

    # Avoid dividing by zero when no class has any ground-truth box
    if len(average_precisions) == 0:
        return torch.tensor(0.0)

    return sum(average_precisions) / len(average_precisions)
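2. Speed metric (FPS)
Besides detection accuracy, the other important performance metric of an object detection algorithm is speed. Only a fast detector can run in real time, which matters a great deal in some applications. The usual speed metric is Frames Per Second (FPS), the number of images the detector can process per second. Of course, FPS numbers are only comparable when they are measured on the same hardware. Alternatively, you can report the time needed to process a single image: the shorter the time, the faster the detector.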
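A minimal way to measure both numbers is sketched below. It is only an illustration: `measure_fps` and `fake_detector` are hypothetical names, and `fake_detector` just sleeps for a moment to stand in for a real model's forward pass.

```python
import time


def measure_fps(process_image, images, warmup=5):
    """Return (frames per second, milliseconds per image) for a detector.

    process_image: hypothetical callable that runs detection on one image
    images:        list of inputs to time
    warmup:        number of untimed runs, so one-off setup cost is excluded
    """
    for img in images[:warmup]:
        process_image(img)

    start = time.perf_counter()
    for img in images:
        process_image(img)
    elapsed = time.perf_counter() - start

    fps = len(images) / elapsed
    latency_ms = 1000.0 * elapsed / len(images)
    return fps, latency_ms


def fake_detector(image):
    # Stand-in for a real model: pretend each image takes about 2 ms
    time.sleep(0.002)


fps, latency_ms = measure_fps(fake_detector, list(range(20)), warmup=3)
print(f"{fps:.1f} FPS, {latency_ms:.2f} ms per image")
```

When timing a PyTorch model on a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously and the loop would otherwise finish before the work does.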