Preface
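The previous post covered the whole network structure of box_head. Running box_head at inference time also requires a post-processing class, so this post covers the make_roi_box_post_processor() function and the PostProcessor class in the inference file.
The PostProcessor class: the ROI_heads module already receives many Proposals, and during classification the network usually produces one box per class for every Proposal. The number of output boxes is therefore very large, and PostProcessor is needed to decide which boxes and classes are finally output.
An example: with 10 classes and 100 Proposals entering the ROI_head module, the network produces 10 sets of offsets for each Proposal's box (one per class), so the output contains 100 × 10 = 1000 boxes. A model cannot output 1000 boxes for a single image, so PostProcessor filters these boxes.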
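To make the counting concrete, here is a small framework-free sketch. The function names are hypothetical and not part of maskrcnn-benchmark; the code only reproduces the 100 × 10 arithmetic and shows which four columns of one Proposal's regression row belong to each class (the same layout that the prepare_boxlist() docstring describes).

```python
from typing import List, Tuple


def count_candidate_boxes(num_proposals: int, num_classes: int) -> int:
    """Each proposal receives one regressed box per class."""
    return num_proposals * num_classes


def per_class_columns(num_classes: int) -> List[Tuple[int, int]]:
    """Column range [start, end) of each class inside one regression row
    of length 4 * num_classes: class j owns columns 4*j .. 4*j + 3."""
    return [(4 * j, 4 * (j + 1)) for j in range(num_classes)]


print(count_candidate_boxes(100, 10))  # 1000
print(per_class_columns(3))            # [(0, 4), (4, 8), (8, 12)]
```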
I. The make_roi_box_post_processor() function
Let's start with make_roi_box_post_processor(). Its only job is to build a PostProcessor object and return it. Here is the code:
def make_roi_box_post_processor(cfg):
    use_fpn = cfg.MODEL.ROI_HEADS.USE_FPN  # read but not used below

    # Parameters for box encoding/decoding
    bbox_reg_weights = cfg.MODEL.ROI_HEADS.BBOX_REG_WEIGHTS
    box_coder = BoxCoder(weights=bbox_reg_weights)

    # Score threshold that decides whether a box may be output
    score_thresh = cfg.MODEL.ROI_HEADS.SCORE_THRESH
    # NMS IoU threshold (used to remove redundant boxes)
    nms_thresh = cfg.MODEL.ROI_HEADS.NMS
    # Maximum number of detected instances per image
    detections_per_img = cfg.MODEL.ROI_HEADS.DETECTIONS_PER_IMG
    cls_agnostic_bbox_reg = cfg.MODEL.CLS_AGNOSTIC_BBOX_REG
    bbox_aug_enabled = cfg.TEST.BBOX_AUG.ENABLED

    # Build the PostProcessor object
    postprocessor = PostProcessor(
        score_thresh,
        nms_thresh,
        detections_per_img,
        box_coder,
        cls_agnostic_bbox_reg,
        bbox_aug_enabled
    )
    # Return the PostProcessor object
    return postprocessor
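The factory only reads configuration keys and forwards them to the constructor. The sketch below mimics that mapping without torch: `SimpleNamespace` stands in for the yacs config node, and `PostProcessorArgs` / `make_post_processor_args` are hypothetical names, not maskrcnn-benchmark API. The values are illustrative and simply match the defaults of PostProcessor.__init__().

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Tuple


@dataclass
class PostProcessorArgs:
    """Hypothetical stand-in for the arguments PostProcessor receives."""
    score_thresh: float
    nms: float
    detections_per_img: int
    bbox_reg_weights: Tuple[float, float, float, float]
    cls_agnostic_bbox_reg: bool
    bbox_aug_enabled: bool


def make_post_processor_args(cfg) -> PostProcessorArgs:
    # Same config keys as make_roi_box_post_processor() reads
    roi = cfg.MODEL.ROI_HEADS
    return PostProcessorArgs(
        score_thresh=roi.SCORE_THRESH,
        nms=roi.NMS,
        detections_per_img=roi.DETECTIONS_PER_IMG,
        bbox_reg_weights=tuple(roi.BBOX_REG_WEIGHTS),
        cls_agnostic_bbox_reg=cfg.MODEL.CLS_AGNOSTIC_BBOX_REG,
        bbox_aug_enabled=cfg.TEST.BBOX_AUG.ENABLED,
    )


# Illustrative config; values chosen to match PostProcessor.__init__ defaults
cfg = SimpleNamespace(
    MODEL=SimpleNamespace(
        ROI_HEADS=SimpleNamespace(
            SCORE_THRESH=0.05,
            NMS=0.5,
            DETECTIONS_PER_IMG=100,
            BBOX_REG_WEIGHTS=(10.0, 10.0, 5.0, 5.0),
        ),
        CLS_AGNOSTIC_BBOX_REG=False,
    ),
    TEST=SimpleNamespace(BBOX_AUG=SimpleNamespace(ENABLED=False)),
)

args = make_post_processor_args(cfg)
print(args.nms, args.detections_per_img)  # 0.5 100
```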
II. The PostProcessor class
Next comes the PostProcessor class, starting with its __init__() method:
1. __init__()
import torch
import torch.nn.functional as F
from torch import nn

from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_nms
from maskrcnn_benchmark.structures.boxlist_ops import cat_boxlist
from maskrcnn_benchmark.modeling.box_coder import BoxCoder


# Class used during inference
class PostProcessor(nn.Module):
    """
    From a set of classification scores, box regression and proposals,
    computes the post-processed boxes, and applies NMS to obtain the
    final results
    """

    def __init__(
        self,
        score_thresh=0.05,
        nms=0.5,
        detections_per_img=100,
        box_coder=None,
        cls_agnostic_bbox_reg=False,
        bbox_aug_enabled=False
    ):
        """
        Arguments:
            score_thresh (float)
            nms (float)
            detections_per_img (int)
            box_coder (BoxCoder)
        """
        super(PostProcessor, self).__init__()
        # Class-score threshold
        self.score_thresh = score_thresh
        # NMS IoU threshold
        self.nms = nms
        # Maximum number of boxes output per image
        self.detections_per_img = detections_per_img
        if box_coder is None:
            box_coder = BoxCoder(weights=(10., 10., 5., 5.))
        # Box encoder/decoder
        self.box_coder = box_coder
        self.cls_agnostic_bbox_reg = cls_agnostic_bbox_reg
        self.bbox_aug_enabled = bbox_aug_enabled
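The default weights (10., 10., 5., 5.) divide the predicted offsets (dx, dy, dw, dh) before they are applied to a Proposal. Below is a minimal pure-Python sketch of decoding a single box with the standard Faster R-CNN parameterization. `decode_box` is a hypothetical helper; the `+1` width convention and the clip value `log(1000 / 16)` reflect my understanding of maskrcnn-benchmark's `BoxCoder` and should be treated as assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]

# Assumed clip on dw/dh so that exp() cannot explode
BBOX_XFORM_CLIP = math.log(1000.0 / 16)


def decode_box(proposal: Box, deltas: Box,
               weights: Box = (10.0, 10.0, 5.0, 5.0)) -> Box:
    """Apply regression deltas (dx, dy, dw, dh) to one xyxy proposal."""
    x1, y1, x2, y2 = proposal
    # Width/height with the assumed +1 pixel convention
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    cx = x1 + 0.5 * w
    cy = y1 + 0.5 * h
    wx, wy, ww, wh = weights
    # The weights scale the raw network outputs down before use
    dx, dy = deltas[0] / wx, deltas[1] / wy
    dw = min(deltas[2] / ww, BBOX_XFORM_CLIP)
    dh = min(deltas[3] / wh, BBOX_XFORM_CLIP)
    # Shift the center proportionally to the size, scale the size exponentially
    pcx, pcy = cx + dx * w, cy + dy * h
    pw, ph = w * math.exp(dw), h * math.exp(dh)
    return (pcx - 0.5 * pw, pcy - 0.5 * ph,
            pcx + 0.5 * pw - 1, pcy + 0.5 * ph - 1)


print(decode_box((0, 0, 9, 9), (0, 0, 0, 0)))   # (0.0, 0.0, 9.0, 9.0)
print(decode_box((0, 0, 9, 9), (10, 0, 0, 0)))  # (10.0, 0.0, 19.0, 9.0)
```

With zero offsets the Proposal comes back unchanged; a raw dx of 10 is divided by wx = 10, so the box moves by exactly one box width.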
2. forward()
Next, the forward() method:
    def forward(self, x, boxes):
        """
        Arguments:
            x (tuple[tensor, tensor]): x contains the class logits
                and the box_regression from the model.
            boxes (list[BoxList]): bounding boxes that are used as
                reference, one for each image

        Returns:
            results (list[BoxList]): one BoxList for each image, containing
                the extra fields labels and scores
        """
        # Class logits and box offsets that box_head outputs for every proposal
        class_logits, box_regression = x
        # Softmax over the class dimension gives class probabilities
        class_prob = F.softmax(class_logits, -1)

        # TODO think about a representation of batch of boxes
        # Size of each image
        image_shapes = [box.size for box in boxes]
        # Number of proposals in each image
        boxes_per_image = [len(box) for box in boxes]
        # Proposals of all images concatenated into one (total, 4) tensor
        concat_boxes = torch.cat([a.bbox for a in boxes], dim=0)

        # Class-agnostic regression predicts a single foreground box,
        # stored in the last 4 columns
        if self.cls_agnostic_bbox_reg:
            box_regression = box_regression[:, -4:]
        # Apply the offsets to each proposal to get the refined boxes
        proposals = self.box_coder.decode(
            box_regression.view(sum(boxes_per_image), -1), concat_boxes
        )
        # Class-agnostic case: reuse the single refined box for every class
        if self.cls_agnostic_bbox_reg:
            proposals = proposals.repeat(1, class_prob.shape[1])

        # Number of classes (including the background class)
        num_classes = class_prob.shape[1]

        # Split per image: proposals becomes a tuple with one tensor per image,
        # each of shape (proposals in that image, num_classes * 4)
        proposals = proposals.split(boxes_per_image, dim=0)
        class_prob = class_prob.split(boxes_per_image, dim=0)

        results = []
        # Process each image of the batch separately
        for prob, boxes_per_img, image_shape in zip(
            class_prob, proposals, image_shapes
        ):
            # Pack the refined proposals of one image, their scores and
            # the image size into a BoxList
            boxlist = self.prepare_boxlist(boxes_per_img, prob, image_shape)
            # Clip boxes to the image boundary (empty boxes are kept)
            boxlist = boxlist.clip_to_image(remove_empty=False)
            if not self.bbox_aug_enabled:  # If bbox aug is enabled, we will do it later
                # Finally, filter the refined proposals
                boxlist = self.filter_results(boxlist, num_classes)
            results.append(boxlist)
        return results
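Apart from decoding, forward() performs two simple tensor operations: a softmax over the class dimension and a per-image split of the concatenated rows. The pure-Python sketch below reproduces both on nested lists (`softmax` and `split_by_image` are hypothetical helpers; the real code uses `F.softmax(..., -1)` and `Tensor.split(boxes_per_image, dim=0)`).

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_by_image(rows: Sequence[T], boxes_per_image: Sequence[int]) -> List[List[T]]:
    """Cut the concatenated rows back into one chunk per image."""
    chunks, start = [], 0
    for n in boxes_per_image:
        chunks.append(list(rows[start:start + n]))
        start += n
    return chunks


# Two images with 2 and 3 proposals, 2 classes each (made-up logits)
logits = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
probs = [softmax(row) for row in logits]
per_image = split_by_image(probs, [2, 3])
print(len(per_image[0]), len(per_image[1]))  # 2 3
print(probs[0])                              # [0.5, 0.5]
```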
3. prepare_boxlist()
forward() also calls prepare_boxlist() and filter_results(). filter_results() is the function that actually does the filtering; we look at prepare_boxlist() first:
    # Packs the refined proposals (boxes) of one image, their class scores
    # and the image size into a single BoxList object
    def prepare_boxlist(self, boxes, scores, image_shape):
        """
        Returns BoxList from `boxes` and adds probability scores information
        as an extra field
        `boxes` has shape (#detections, 4 * #classes), where each row represents
        a list of predicted bounding boxes for each of the object classes in the
        dataset (including the background class). The detections in each row
        originate from the same object proposal.
        `scores` has shape (#detection, #classes), where each row represents a list
        of object detection confidence scores for each of the object classes in the
        dataset (including the background class). `scores[i, j]` corresponds to the
        box at `boxes[i, j * 4:(j + 1) * 4]`.
        """
        # One box per (proposal, class) pair: (#detections * #classes, 4)
        boxes = boxes.reshape(-1, 4)
        # Matching flat scores: (#detections * #classes,)
        scores = scores.reshape(-1)
        boxlist = BoxList(boxes, image_shape, mode="xyxy")
        boxlist.add_field("scores", scores)
        return boxlist
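The two reshape calls preserve the pairing the docstring describes: `scores[i, j]` goes with `boxes[i, j*4:(j+1)*4]`. A pure-Python sketch on nested lists (hypothetical helper `flatten_per_class`) shows that after flattening, flat index `i * num_classes + j` holds both the box and the score of Proposal i for class j.

```python
from typing import List, Sequence, Tuple


def flatten_per_class(boxes: Sequence[Sequence[float]],
                      scores: Sequence[Sequence[float]],
                      num_classes: int) -> Tuple[List[List[float]], List[float]]:
    """Mimic boxes.reshape(-1, 4) and scores.reshape(-1) on nested lists.

    boxes:  N rows of length 4 * num_classes
    scores: N rows of length num_classes
    """
    flat_boxes = [list(row[4 * j:4 * j + 4]) for row in boxes for j in range(num_classes)]
    flat_scores = [s for row in scores for s in row]
    return flat_boxes, flat_scores


# 2 proposals x 2 classes (made-up numbers)
boxes = [[0, 0, 1, 1, 10, 10, 11, 11],
         [2, 2, 3, 3, 20, 20, 21, 21]]
scores = [[0.9, 0.1],
          [0.3, 0.7]]
flat_boxes, flat_scores = flatten_per_class(boxes, scores, num_classes=2)
i, j = 1, 1
k = i * 2 + j  # flat index of (proposal 1, class 1)
print(flat_boxes[k], flat_scores[k])  # [20, 20, 21, 21] 0.7
```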
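4. filter_results()
Once the refined Proposals, confidence scores and image size of each image are stored in a BoxList, filter_results() decides which boxes in that BoxList are output as results. Here is its code:
Note: in the rest of this post, the final boxes that survive filtering during inference are called instances; intermediate results are still called Proposals.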
    def filter_results(self, boxlist, num_classes):
        """Returns bounding-box detection results by thresholding on scores and
        applying non-maximum suppression (NMS).
        """
        # unwrap the boxlist to avoid additional overhead.
        # if we had multi-class NMS, we could perform this directly on the boxlist
        # Boxes (Proposals) from the BoxList, shape (num Proposals, num_classes * 4)
        boxes = boxlist.bbox.reshape(-1, num_classes * 4)
        # Class confidence scores, shape (num Proposals, num_classes)
        scores = boxlist.get_field("scores").reshape(-1, num_classes)

        device = scores.device
        result = []
        # Apply threshold on detection probabilities and apply NMS
        # Skip j = 0, because it's the background class
        # Boolean mask of scores above the threshold
        inds_all = scores > self.score_thresh
        for j in range(1, num_classes):
            # Indices of Proposals whose class-j score exceeds the threshold
            inds = inds_all[:, j].nonzero().squeeze(1)
            # Class-j scores of those Proposals
            scores_j = scores[inds, j]
            # Class-j boxes of those Proposals
            boxes_j = boxes[inds, j * 4 : (j + 1) * 4]
            # Store the class-j boxes whose score exceeds the threshold,
            # together with the image size, in boxlist_for_class
            boxlist_for_class = BoxList(boxes_j, boxlist.size, mode="xyxy")
            # Attach the matching class-j scores
            boxlist_for_class.add_field("scores", scores_j)
            # NMS within class j; the surviving boxes stay in boxlist_for_class
            boxlist_for_class = boxlist_nms(
                boxlist_for_class, self.nms
            )
            # Label every surviving box with class j
            num_labels = len(boxlist_for_class)
            boxlist_for_class.add_field(
                "labels", torch.full((num_labels,), j, dtype=torch.int64, device=device)
            )
            # Keep the result for this class
            result.append(boxlist_for_class)

        # Merge the per-class results into one BoxList
        result = cat_boxlist(result)
        number_of_detections = len(result)

        # Limit to max_per_image detections **over all classes**
        # If the total number of instances exceeds detections_per_img,
        # keep only the highest-scoring ones
        if number_of_detections > self.detections_per_img > 0:
            cls_scores = result.get_field("scores")
            # The (n - k + 1)-th smallest score is the k-th largest one
            image_thresh, _ = torch.kthvalue(
                cls_scores.cpu(), number_of_detections - self.detections_per_img + 1
            )
            keep = cls_scores >= image_thresh.item()
            keep = torch.nonzero(keep).squeeze(1)
            result = result[keep]
        return result
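The final cap in filter_results() relies on torch.kthvalue: with n detections and a limit of k, the (n - k + 1)-th smallest score equals the k-th largest, so keeping every score >= that threshold keeps the top k. The pure-Python sketch below applies the same rule (hypothetical `cap_detections`, which returns kept indices rather than a BoxList) and also exposes a side effect: with tied scores, more than k detections can survive.

```python
from typing import List, Sequence


def cap_detections(scores: Sequence[float], detections_per_img: int) -> List[int]:
    """Indices kept by the 'top detections_per_img over all classes' rule."""
    n = len(scores)
    if n > detections_per_img > 0:
        # (n - k + 1)-th smallest (1-based) is index n - k after sorting
        image_thresh = sorted(scores)[n - detections_per_img]
        return [i for i, s in enumerate(scores) if s >= image_thresh]
    # No cap when there are few enough detections or the limit is <= 0
    return list(range(n))


print(cap_detections([0.9, 0.1, 0.5, 0.7], 2))  # [0, 3]
print(cap_detections([0.5, 0.5, 0.5], 1))       # [0, 1, 2]  (ties exceed the cap)
print(cap_detections([0.3, 0.2], 0))            # [0, 1]     (cap disabled)
```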
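The key idea is to take, for each class, the Proposals whose score for that class passes the threshold, and then run NMS separately on each class's Proposals (note: NMS is not run once over the Proposals selected for all classes together). The figure below gives a brief illustration:
That completes the inference file of box_head. This file is used only during inference; training does not use it. make_roi_box_post_processor() builds a PostProcessor object, which post-processes the Proposals predicted by box_head and selects the instances that are finally output.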
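A minimal pure-Python sketch of this per-class filtering follows: score threshold, then NMS within each foreground class, skipping background class 0. It is a simplified stand-in for filter_results(), not the library code. `iou`, `nms` and `per_class_filter` are hypothetical helpers, boxes use continuous (x1, y1, x2, y2) coordinates without a +1 pixel convention, and the greedy NMS keeps a box only if its IoU with every already-kept box is at most the threshold.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coords


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already-kept box is <= thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


def per_class_filter(boxes: Sequence[Sequence[Box]],
                     scores: Sequence[Sequence[float]],
                     score_thresh: float,
                     nms_thresh: float) -> List[Tuple[Box, float, int]]:
    """boxes[i][j] / scores[i][j]: box and score of proposal i for class j.
    Returns (box, score, label) triples; class 0 (background) is skipped."""
    num_classes = len(scores[0])
    results = []
    for j in range(1, num_classes):
        inds = [i for i in range(len(scores)) if scores[i][j] > score_thresh]
        cls_boxes = [boxes[i][j] for i in inds]
        cls_scores = [scores[i][j] for i in inds]
        # NMS only compares boxes of the same class j
        for k in nms(cls_boxes, cls_scores, nms_thresh):
            results.append((cls_boxes[k], cls_scores[k], j))
    return results


A, B = (0, 0, 10, 10), (1, 1, 11, 11)  # IoU(A, B) = 81 / 119 ≈ 0.68
example_boxes = [[A, A, A], [B, B, B]]   # same box for every class, for simplicity
example_scores = [[0.1, 0.8, 0.1],       # proposal 0: background, class 1, class 2
                  [0.1, 0.6, 0.3]]       # proposal 1
print(per_class_filter(example_boxes, example_scores, 0.05, 0.5))
# [((0, 0, 10, 10), 0.8, 1), ((1, 1, 11, 11), 0.3, 2)]
```

In this example B is suppressed for class 1 because it overlaps A with IoU ≈ 0.68, yet it survives as the class-2 detection. If all classes were pooled into a single NMS, the higher-scoring A would suppress B entirely.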
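The next post covers the loss file used when training box_head:
maskrcnn-benchmark-master (10): the loss file of box_head
To be continued~
Writing this took real effort. Please do not repost without permission!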