Preface
This post covers the post-processing stage of object detection.
The code discussed in this post comes from: https://github.com/open-mmlab/mmdetection
Path: mmdet/models/dense_heads/base_dense_head.py
Post-processing
The whole post-processing stage can be summarized as the following steps:

1. The predicted classification scores have not yet been passed through sigmoid (sigmoid is applied when computing the loss), so the first step is to apply sigmoid to them.
2. A first filtering pass removes low-quality predictions with a low score threshold (0.05), then the top-k boxes (typically k = 1000) are selected by classification score. If a quality-estimation score (e.g. centerness) is also predicted, it is filtered the same way.
3. The predicted bbox deltas (dx, dy, dw, dh) are decoded against their assigned anchors into predicted bboxes.
4. The bboxes are rescaled: predictions are made on the resized input image, so they must be mapped back to the original image size.
5. NMS is applied: if a quality-estimation score is predicted, classification score × quality score is used as the ranking key. This second filtering pass produces the output bboxes and labels; at most 100 detections are output per image.
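Step 3 above can be illustrated with a small sketch. The helper `delta2bbox_sketch` below is hypothetical: it mirrors the core math of mmdet's `DeltaXYWHBBoxCoder.decode` (center/size offsets plus exp-scaled width/height, then clipping to `max_shape`), but omits the means/stds normalization and wh-ratio clamping of the real implementation.

```python
import torch

def delta2bbox_sketch(priors, deltas, max_shape=None):
    """Hypothetical, simplified version of delta-xywh decoding."""
    # anchor (prior) center and size
    px = (priors[:, 0] + priors[:, 2]) * 0.5
    py = (priors[:, 1] + priors[:, 3]) * 0.5
    pw = priors[:, 2] - priors[:, 0]
    ph = priors[:, 3] - priors[:, 1]

    dx, dy, dw, dh = deltas.unbind(dim=1)
    # apply the predicted offsets (relative to anchor size) and log-scales
    gx = px + pw * dx
    gy = py + ph * dy
    gw = pw * dw.exp()
    gh = ph * dh.exp()

    # center/size back to x1y1x2y2
    bboxes = torch.stack([gx - gw * 0.5, gy - gh * 0.5,
                          gx + gw * 0.5, gy + gh * 0.5], dim=1)
    if max_shape is not None:  # clip to the (resized) image, like max_shape=img_shape
        bboxes[:, 0::2] = bboxes[:, 0::2].clamp(0, max_shape[1])
        bboxes[:, 1::2] = bboxes[:, 1::2].clamp(0, max_shape[0])
    return bboxes

anchor = torch.tensor([[10., 10., 30., 30.]])  # a 20x20 anchor
delta = torch.tensor([[0., 0., 0., 0.]])       # zero deltas decode to the anchor itself
print(delta2bbox_sketch(anchor, delta))        # -> [[10., 10., 30., 30.]]
```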
1. Getting bboxes: _get_bboxes_single
Inside this function, the per-level classification scores, regression values, and quality-estimation scores of a single image are permuted and reshaped from [C, H, W] to [H*W, C] (or [H*W, 4] for the regression branch), flattening the spatial locations into rows.
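The reshape is just a `permute` followed by a `reshape`; a toy example makes the row-to-location mapping concrete (the sizes here are made up for illustration):

```python
import torch

# One FPN level of one image: cls_score has shape [C, H, W] once the batch
# dim is stripped; permute+reshape flattens spatial positions into rows.
C, H, W = 80, 4, 5                       # 80 classes, 4x5 feature map (toy sizes)
cls_score = torch.randn(C, H, W)
flat = cls_score.permute(1, 2, 0).reshape(-1, C)
print(flat.shape)                        # torch.Size([20, 80]) -> H*W rows

# Row i corresponds to spatial location (i // W, i % W):
assert torch.equal(flat[7], cls_score[:, 7 // W, 7 % W])
```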
def _get_bboxes_single(self,
                       cls_score_list,
                       bbox_pred_list,
                       score_factor_list,
                       mlvl_priors,
                       img_meta,
                       cfg,
                       rescale=False,
                       with_nms=True,
                       **kwargs):
    if score_factor_list[0] is None:  # check whether a quality score is predicted
        # e.g. Retina, FreeAnchor, etc.
        with_score_factors = False
    else:
        # e.g. FCOS, PAA, ATSS, etc.
        with_score_factors = True
    cfg = self.test_cfg if cfg is None else cfg
    img_shape = img_meta['img_shape']
    nms_pre = cfg.get('nms_pre', -1)

    mlvl_bboxes = []
    mlvl_scores = []
    mlvl_labels = []
    if with_score_factors:
        mlvl_score_factors = []
    else:
        mlvl_score_factors = None
    for level_idx, (cls_score, bbox_pred, score_factor, priors) in \
            enumerate(zip(cls_score_list, bbox_pred_list,
                          score_factor_list, mlvl_priors)):
        assert cls_score.size()[-2:] == bbox_pred.size()[-2:]  # feature maps must match in size

        bbox_pred = bbox_pred.permute(1, 2, 0).reshape(-1, 4)  # [H*W, 4]
        if with_score_factors:
            score_factor = score_factor.permute(1, 2, 0).reshape(-1).sigmoid()
        cls_score = cls_score.permute(1, 2, 0).reshape(-1, self.cls_out_channels)
        if self.use_sigmoid_cls:
            scores = cls_score.sigmoid()  # apply sigmoid to the raw logits
        else:
            # remind that we set FG labels to [0, num_class-1]
            # since mmdet v2.0
            # BG cat_id: num_class
            scores = cls_score.softmax(-1)[:, :-1]

        # After https://github.com/open-mmlab/mmdetection/pull/6268/,
        # this operation keeps fewer bboxes under the same `nms_pre`.
        # There is no difference in performance for most models. If you
        # find a slight drop in performance, you can set a larger
        # `nms_pre` than before.
        results = filter_scores_and_topk(     # first filtering pass: threshold at
            scores, cfg.score_thr, nms_pre,   # score_thr (0.05), then keep top-k
            dict(bbox_pred=bbox_pred, priors=priors))
        scores, labels, keep_idxs, filtered_results = results

        bbox_pred = filtered_results['bbox_pred']  # deltas kept after filtering
        priors = filtered_results['priors']        # prior points/anchors kept after filtering

        if with_score_factors:
            score_factor = score_factor[keep_idxs]

        bboxes = self.bbox_coder.decode(
            priors, bbox_pred, max_shape=img_shape)  # decode deltas -> bboxes

        mlvl_bboxes.append(bboxes)
        mlvl_scores.append(scores)
        mlvl_labels.append(labels)
        if with_score_factors:
            mlvl_score_factors.append(score_factor)

    return self._bbox_post_process(mlvl_scores, mlvl_labels, mlvl_bboxes,
                                   img_meta['scale_factor'], cfg, rescale,
                                   with_nms, mlvl_score_factors, **kwargs)
The first filtering pass returns scores, labels, keep_idxs, and filtered_results; the filtered_results dict contains bbox_pred and priors.
def filter_scores_and_topk(scores, score_thr, topk, results=None):
    valid_mask = scores > score_thr  # (num_bboxes, num_classes) boolean mask
    scores = scores[valid_mask]      # flattened to 1-D, shape (N,)
    valid_idxs = torch.nonzero(valid_mask)  # indices (row, col) of nonzero entries;
                                            # the column is in fact the class label
    num_topk = min(topk, valid_idxs.size(0))  # never keep more than we have
    # torch.sort is actually faster than .topk (at least on GPUs)
    scores, idxs = scores.sort(descending=True)  # sort surviving scores in descending
                                                 # order, keeping their indices
    scores = scores[:num_topk]
    topk_idxs = valid_idxs[idxs[:num_topk]]  # look up the indices of the top-k scores
    keep_idxs, labels = topk_idxs.unbind(dim=1)  # split the 2-D index tensor into two
                                                 # 1-D tensors: location and label

    filtered_results = None
    if results is not None:  # results: dict(bbox_pred=bbox_pred, priors=priors)
        if isinstance(results, dict):  # select the kept bbox_pred and priors by index
            filtered_results = {k: v[keep_idxs] for k, v in results.items()}
        elif isinstance(results, list):
            filtered_results = [result[keep_idxs] for result in results]
        elif isinstance(results, torch.Tensor):
            filtered_results = results[keep_idxs]
        else:
            raise NotImplementedError(f'Only supports dict or list or Tensor, '
                                      f'but get {type(results)}.')
    return scores, labels, keep_idxs, filtered_results
Here N is the number of predictions kept by the filtering.
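A tiny concrete run clarifies what each return value means. The function body below is a trimmed copy of `filter_scores_and_topk` above (the list/Tensor branches dropped), so the demo is self-contained:

```python
import torch

def filter_scores_and_topk(scores, score_thr, topk, results=None):
    # trimmed copy of the function above, dict branch only, for a standalone demo
    valid_mask = scores > score_thr
    scores = scores[valid_mask]
    valid_idxs = torch.nonzero(valid_mask)
    num_topk = min(topk, valid_idxs.size(0))
    scores, idxs = scores.sort(descending=True)
    scores = scores[:num_topk]
    topk_idxs = valid_idxs[idxs[:num_topk]]
    keep_idxs, labels = topk_idxs.unbind(dim=1)
    filtered_results = None
    if isinstance(results, dict):
        filtered_results = {k: v[keep_idxs] for k, v in results.items()}
    return scores, labels, keep_idxs, filtered_results

# 3 locations x 2 classes; scores above 0.05 survive, then the top-2 are kept
scores = torch.tensor([[0.90, 0.02],
                       [0.10, 0.60],
                       [0.01, 0.03]])
bbox_pred = torch.arange(12, dtype=torch.float32).reshape(3, 4)
out_scores, labels, keep_idxs, filtered = filter_scores_and_topk(
    scores, 0.05, 2, dict(bbox_pred=bbox_pred))

print(out_scores)  # tensor([0.9000, 0.6000])
print(labels)      # tensor([0, 1]) -- class index, from the mask's column
print(keep_idxs)   # tensor([0, 1]) -- location (row) index into bbox_pred
```

Note how a single location could in principle appear twice with two different labels: the filtering is done on the flattened (location, class) grid, not per box.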
2. Post-processing: _bbox_post_process
The code is as follows (excerpt):
def _bbox_post_process(self,
                       mlvl_scores,
                       mlvl_labels,
                       mlvl_bboxes,
                       scale_factor,
                       cfg,
                       rescale=False,
                       with_nms=True,
                       mlvl_score_factors=None,
                       **kwargs):
    """bbox post-processing method.

    Args:
        mlvl_scores (list[Tensor]): Box scores from all scale
            levels of a single image, each item has shape
            (N, ).
        mlvl_labels (list[Tensor]): Box class labels from all scale
            levels of a single image, each item has shape
            (N, ).
        mlvl_bboxes (list[Tensor]): Decoded bboxes from all scale
            levels of a single image, each item has shape (N, 4).
    """
    assert len(mlvl_scores) == len(mlvl_bboxes) == len(mlvl_labels)

    mlvl_bboxes = torch.cat(mlvl_bboxes)  # concatenate all levels of this image
    if rescale:  # map the bboxes back to the original image size
        mlvl_bboxes /= mlvl_bboxes.new_tensor(scale_factor)
    mlvl_scores = torch.cat(mlvl_scores)
    mlvl_labels = torch.cat(mlvl_labels)

    if mlvl_score_factors is not None:
        # TODO: Add sqrt operation in order to be consistent with
        # the paper.
        mlvl_score_factors = torch.cat(mlvl_score_factors)
        mlvl_scores = mlvl_scores * mlvl_score_factors  # cls score * quality score
                                                        # as the NMS ranking key

    if with_nms:
        if mlvl_bboxes.numel() == 0:
            det_bboxes = torch.cat([mlvl_bboxes, mlvl_scores[:, None]], -1)
            return det_bboxes, mlvl_labels

        det_bboxes, keep_idxs = batched_nms(mlvl_bboxes, mlvl_scores,
                                            mlvl_labels, cfg.nms)  # NMS (second filtering)
        det_bboxes = det_bboxes[:cfg.max_per_img]  # keep at most max_per_img (100) detections
        det_labels = mlvl_labels[keep_idxs][:cfg.max_per_img]
        return det_bboxes, det_labels
    else:
        return mlvl_bboxes, mlvl_scores, mlvl_labels
Summary
In short, post-processing converts the raw network outputs into det_bboxes and det_labels on the original image. To match det_bboxes and det_labels against the original image for visualization, further processing is still needed; the full inference pipeline will be covered in a later post on image_demo.