maskrcnn-benchmark-master (9): the box_head inference file

秋名山翻车的

Category: Deep Learning. Tags: computer vision, object detection, artificial intelligence, deep learning, neural networks

Article link: https://blog.csdn.net/foolishpeng/article/details/119333618

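Preface

The previous post covered the network structure of box_head. Running inference with box_head also requires a post-processing class, so this post walks through the make_roi_box_post_processor() function and the PostProcessor class in the inference file.

The PostProcessor class: the ROI_heads module already receives a large number of Proposals, and during classification the network generally produces one box per class for every Proposal. The number of boxes generated at the end is therefore very large, so the PostProcessor class is needed to decide which boxes and classes are finally output.

For example: with 10-way classification and 100 Proposals fed into the ROI_head module, the network produces 10 sets of offsets for each Proposal's box (one per class), so it ends up with 100*10=1000 boxes. A model cannot reasonably draw 1000 boxes on a single image as its output, so the PostProcessor class is needed to filter these boxes.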
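The arithmetic in this example can be written out as a tiny sketch in plain Python. This is not library code: the helper name `count_candidate_boxes` and its `skip_background` flag are made up for illustration, and the second call assumes that background is counted among the 10 classes.

```python
def count_candidate_boxes(num_proposals, num_classes, skip_background=False):
    """Number of class-specific boxes produced for one image."""
    # Each Proposal gets one set of offsets (one box) per class.
    classes = num_classes - 1 if skip_background else num_classes
    return num_proposals * classes


total = count_candidate_boxes(100, 10)
# Assumption: one of the 10 classes is background, which is never output.
foreground_only = count_candidate_boxes(100, 10, skip_background=True)
print(total, foreground_only)
```

Either way the count is far larger than what should be drawn on one image, which is why a filtering step is required.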

I. The make_roi_box_post_processor() function

We start with the make_roi_box_post_processor() function. Its job is to build a PostProcessor object from the config and return it. The code is:

def make_roi_box_post_processor(cfg):
    # Read from the config but not used further in this function
    use_fpn = cfg.MODEL.ROI_HEADS.USE_FPN

    # Parameters for box encoding/decoding
    bbox_reg_weights = cfg.MODEL.ROI_HEADS.BBOX_REG_WEIGHTS
    box_coder = BoxCoder(weights=bbox_reg_weights)

    # Score threshold that decides which boxes may be output
    score_thresh = cfg.MODEL.ROI_HEADS.SCORE_THRESH
    # NMS threshold (used to discard some of the boxes)
    nms_thresh = cfg.MODEL.ROI_HEADS.NMS
    # Maximum number of detected instances per image
    detections_per_img = cfg.MODEL.ROI_HEADS.DETECTIONS_PER_IMG
    # Whether box regression is class-agnostic (one box shared by all classes)
    cls_agnostic_bbox_reg = cfg.MODEL.CLS_AGNOSTIC_BBOX_REG
    # Whether test-time box augmentation is enabled
    bbox_aug_enabled = cfg.TEST.BBOX_AUG.ENABLED

    # Build the PostProcessor object
    postprocessor = PostProcessor(
        score_thresh,
        nms_thresh,
        detections_per_img,
        box_coder,
        cls_agnostic_bbox_reg,
        bbox_aug_enabled
    )
    # Return the PostProcessor object
    return postprocessor

II. The PostProcessor class

Next comes the PostProcessor class. We first look at its __init__() function:

1. The __init__() function

# Imports needed by the code in this post (module paths as in the maskrcnn-benchmark repository)
import torch
import torch.nn.functional as F
from torch import nn

from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_nms
from maskrcnn_benchmark.structures.boxlist_ops import cat_boxlist
from maskrcnn_benchmark.modeling.box_coder import BoxCoder


# Class used during inference
class PostProcessor(nn.Module):
    """
    From a set of classification scores, box regression and proposals,
    computes the post-processed boxes, and applies NMS to obtain the
    final results
    """

    def __init__(
        self,
        score_thresh=0.05,
        nms=0.5,
        detections_per_img=100,
        box_coder=None,
        cls_agnostic_bbox_reg=False,
        bbox_aug_enabled=False
    ):
        """
        Arguments:
            score_thresh (float)
            nms (float)
            detections_per_img (int)
            box_coder (BoxCoder)
        """
        super(PostProcessor, self).__init__()
        # Class score threshold
        self.score_thresh = score_thresh
        # NMS threshold
        self.nms = nms
        # Maximum number of boxes output per image in the final result
        self.detections_per_img = detections_per_img
        if box_coder is None:
            box_coder = BoxCoder(weights=(10., 10., 5., 5.))
        # Box encoder/decoder
        self.box_coder = box_coder
        self.cls_agnostic_bbox_reg = cls_agnostic_bbox_reg
        self.bbox_aug_enabled = bbox_aug_enabled
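The box_coder stored above is what later turns regression offsets into boxes. Its source is not shown in this post, so the following is only a minimal sketch of the standard Faster R-CNN box parameterization, written in plain Python for a single box. The function name `decode_box` is hypothetical, and the real BoxCoder works on tensors and may differ in details such as pixel-offset conventions or clamping of the size offsets.

```python
import math


def decode_box(delta, box, weights=(10.0, 10.0, 5.0, 5.0)):
    """Apply one (dx, dy, dw, dh) offset to one reference box (x1, y1, x2, y2)."""
    wx, wy, ww, wh = weights
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    ctr_x, ctr_y = x1 + 0.5 * width, y1 + 0.5 * height

    # The offsets are stored in weighted form, so divide the weights back out.
    dx, dy = delta[0] / wx, delta[1] / wy
    dw, dh = delta[2] / ww, delta[3] / wh

    # Shift the centre in proportion to the box size; scale the size exponentially.
    pred_ctr_x = ctr_x + dx * width
    pred_ctr_y = ctr_y + dy * height
    pred_w = math.exp(dw) * width
    pred_h = math.exp(dh) * height

    return (
        pred_ctr_x - 0.5 * pred_w,
        pred_ctr_y - 0.5 * pred_h,
        pred_ctr_x + 0.5 * pred_w,
        pred_ctr_y + 0.5 * pred_h,
    )


unchanged = decode_box((0.0, 0.0, 0.0, 0.0), (10.0, 20.0, 50.0, 80.0))
shifted = decode_box((10.0, 0.0, 0.0, 0.0), (10.0, 20.0, 50.0, 80.0))
```

With all-zero offsets the reference box comes back unchanged; with dx equal to the x weight, the box moves right by exactly one box width.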

2. The forward() function

Next, the forward() function:

    def forward(self, x, boxes):
        """
        Arguments:
            x (tuple[tensor, tensor]): x contains the class logits
                and the box_regression from the model.
            boxes (list[BoxList]): bounding boxes that are used as
                reference, one for each image

        Returns:
            results (list[BoxList]): one BoxList for each image, containing
                the extra fields labels and scores
        """
        # Classification logits and box offsets that box_head outputs for every Proposal
        class_logits, box_regression = x
        # Softmax over the class dimension turns logits into probabilities
        class_prob = F.softmax(class_logits, -1)

        # TODO think about a representation of batch of boxes
        # Size of each image
        image_shapes = [box.size for box in boxes]
        # Number of Proposals in each image
        boxes_per_image = [len(box) for box in boxes]
        # Concatenate the Proposals of all images along dim 0
        concat_boxes = torch.cat([a.bbox for a in boxes], dim=0)

        # Class-agnostic regression keeps only the last 4 values (can be ignored for now)
        if self.cls_agnostic_bbox_reg:
            box_regression = box_regression[:, -4:]
        # Apply the offsets to every Proposal to get the refined Proposals
        proposals = self.box_coder.decode(
            box_regression.view(sum(boxes_per_image), -1), concat_boxes
        )
        if self.cls_agnostic_bbox_reg:
            proposals = proposals.repeat(1, class_prob.shape[1])
        # Number of classes (including the background class)
        num_classes = class_prob.shape[1]
        # Split back by the number of Proposals in each image.
        # The result is a tuple with one tensor per image; for proposals each
        # tensor has shape (Proposals in that image, num_classes * 4)
        proposals = proposals.split(boxes_per_image, dim=0)
        class_prob = class_prob.split(boxes_per_image, dim=0)

        results = []
        # Process the images one by one, since they arrive as a batch
        for prob, boxes_per_img, image_shape in zip(
            class_prob, proposals, image_shapes
        ):
            # Store the refined Proposals of this image in a BoxList;
            # each BoxList holds the Proposals of one image after ROI_head refinement
            boxlist = self.prepare_boxlist(boxes_per_img, prob, image_shape)
            boxlist = boxlist.clip_to_image(remove_empty=False)
            if not self.bbox_aug_enabled:
                # If bbox aug is enabled, we will do it later
                # Finally, filter the refined Proposals
                boxlist = self.filter_results(boxlist, num_classes)
            results.append(boxlist)
        return results
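The concatenate-then-split step in forward() can be pictured without tensors. Below is a minimal sketch in plain Python with lists; `split_rows` is a hypothetical stand-in for the tensor `split(boxes_per_image, dim=0)` call, and the row labels are invented for illustration.

```python
def split_rows(rows, sizes):
    """Split a flat list of rows into consecutive chunks of the given sizes."""
    if sum(sizes) != len(rows):
        raise ValueError("sizes must add up to the number of rows")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# Two images in the batch: the first has 3 Proposals, the second has 2.
rows = ["img0_p0", "img0_p1", "img0_p2", "img1_p0", "img1_p1"]
per_image = split_rows(rows, [3, 2])
print(per_image)
```

Because the Proposals were concatenated in image order, splitting by the per-image counts restores exactly one group per image, which is what the loop over images relies on.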

3. The prepare_boxlist() function

forward() also calls the prepare_boxlist() and filter_results() functions. filter_results() is the one that actually does the filtering; we look at prepare_boxlist() first:

    # Packs the refined Proposals (boxes) of one image, their classification
    # scores and the image size into a single BoxList object
    def prepare_boxlist(self, boxes, scores, image_shape):
        """
        Returns BoxList from `boxes` and adds probability scores information
        as an extra field
        `boxes` has shape (#detections, 4 * #classes), where each row represents
        a list of predicted bounding boxes for each of the object classes in the
        dataset (including the background class). The detections in each row
        originate from the same object proposal.
        `scores` has shape (#detection, #classes), where each row represents a list
        of object detection confidence scores for each of the object classes in the
        dataset (including the background class). `scores[i, j]` corresponds to the
        box at `boxes[i, j * 4:(j + 1) * 4]`.
        """
        boxes = boxes.reshape(-1, 4)
        scores = scores.reshape(-1)
        boxlist = BoxList(boxes, image_shape, mode="xyxy")
        boxlist.add_field("scores", scores)
        return boxlist
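The two reshape calls in prepare_boxlist() flatten the (Proposal, class) grid in row-major order, so the box and score of Proposal i for class j both land at flat index i * num_classes + j. A minimal sketch of that bookkeeping in plain Python follows; the helper names `reshape_boxes` and `flat_index` are hypothetical, and the coordinates are dummy numbers.

```python
def reshape_boxes(rows):
    """Turn rows of length 4 * num_classes into a flat, row-major list of 4-tuples."""
    flat = []
    for row in rows:
        for start in range(0, len(row), 4):
            flat.append(tuple(row[start:start + 4]))
    return flat


def flat_index(proposal_idx, class_idx, num_classes):
    """Position of (Proposal i, class j) after the row-major reshape."""
    return proposal_idx * num_classes + class_idx


num_classes = 3
# Two Proposals, each with one 4-number box per class (dummy coordinates).
boxes = [list(range(0, 12)), list(range(100, 112))]
flat_boxes = reshape_boxes(boxes)

i, j = 1, 2
same_box = flat_boxes[flat_index(i, j, num_classes)] == tuple(boxes[i][j * 4:(j + 1) * 4])
print(len(flat_boxes), same_box)
```

This is also why filter_results() can later undo the flattening with reshape(-1, num_classes * 4) and reshape(-1, num_classes).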

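4. The filter_results() function

Once the refined Proposals, confidence scores and image size of each image are stored in a BoxList object, the filter_results() function decides which boxes in that BoxList are output as results. Its code is:

Note: in the rest of this post, the final boxes that survive filtering during inference are called instances, while intermediate results are still called Proposals.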

    def filter_results(self, boxlist, num_classes):
        """Returns bounding-box detection results by thresholding on scores and
        applying non-maximum suppression (NMS).
        """
        # unwrap the boxlist to avoid additional overhead.
        # if we had multi-class NMS, we could perform this directly on the boxlist
        # Take the boxes (Proposals) out of the BoxList; shape is (num Proposals, num classes * 4)
        boxes = boxlist.bbox.reshape(-1, num_classes * 4)
        # Take the class confidence scores out of the BoxList; shape is (num Proposals, num classes)
        scores = boxlist.get_field("scores").reshape(-1, num_classes)

        device = scores.device
        result = []
        # Apply threshold on detection probabilities and apply NMS
        # Skip j = 0, because it's the background class
        # Boolean mask of the scores that exceed the threshold
        inds_all = scores > self.score_thresh
        # Filter class by class; index 0 is background, so start from index 1
        for j in range(1, num_classes):
            # Indices of the Proposals whose score for class j exceeds the threshold
            inds = inds_all[:, j].nonzero().squeeze(1)
            # Scores of class j at those indices
            scores_j = scores[inds, j]
            # Boxes of class j at those indices
            boxes_j = boxes[inds, j * 4 : (j + 1) * 4]
            # Store the class-j boxes whose score is above the threshold,
            # together with the image size, in boxlist_for_class
            boxlist_for_class = BoxList(boxes_j, boxlist.size, mode="xyxy")
            # Attach the above-threshold scores of this class
            boxlist_for_class.add_field("scores", scores_j)
            # Run NMS on boxlist_for_class; the surviving boxes stay in boxlist_for_class
            boxlist_for_class = boxlist_nms(
                boxlist_for_class, self.nms
            )
            # Label the surviving boxes with class j
            num_labels = len(boxlist_for_class)
            boxlist_for_class.add_field(
                "labels", torch.full((num_labels,), j, dtype=torch.int64, device=device)
            )
            # Collect the result for this class
            result.append(boxlist_for_class)

        result = cat_boxlist(result)
        number_of_detections = len(result)

        # Limit to max_per_image detections **over all classes**
        # If the total number of detected instances exceeds the configured maximum,
        # drop the lowest-scoring ones by thresholding on confidence
        if number_of_detections > self.detections_per_img > 0:
            cls_scores = result.get_field("scores")
            image_thresh, _ = torch.kthvalue(
                cls_scores.cpu(), number_of_detections - self.detections_per_img + 1
            )
            keep = cls_scores >= image_thresh.item()
            keep = torch.nonzero(keep).squeeze(1)
            result = result[keep]
        return result
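The final cap on the number of detections uses torch.kthvalue, which returns the k-th smallest value. With k = number_of_detections - detections_per_img + 1, that value is the score of the detections_per_img-th best detection, and everything scoring at least that much is kept. The same logic can be sketched in plain Python; `keep_top_detections` is a hypothetical name, and a sort stands in for kthvalue.

```python
def keep_top_detections(scores, max_detections):
    """Indices of the detections kept by the score-threshold cap."""
    n = len(scores)
    # Same chained condition as in filter_results(): a cap of 0 disables the limit.
    if not (n > max_detections > 0):
        return list(range(n))
    k = n - max_detections + 1            # 1-based rank in ascending order
    threshold = sorted(scores)[k - 1]     # k-th smallest == max_detections-th largest
    return [i for i, s in enumerate(scores) if s >= threshold]


kept = keep_top_detections([0.9, 0.1, 0.5, 0.7, 0.3], 2)
kept_with_tie = keep_top_detections([0.9, 0.7, 0.7, 0.1], 2)
print(kept, kept_with_tie)
```

Because the comparison is `>=`, detections tied with the threshold score are all kept, so in the tie case the result can contain slightly more than max_detections entries.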

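The key idea is simple: for each class, take the Proposals whose score meets the threshold, then run NMS separately on the Proposals selected for that class. (Note: NMS is not run jointly on the Proposals selected across all classes.)

[Figure: illustration of per-class score filtering and NMS]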
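The per-class procedure can be sketched end to end in plain Python. This is only an illustration under simplifying assumptions: `iou`, `nms` and `filter_per_class` are hypothetical helpers, not the library's boxlist_nms; boxes are (x1, y1, x2, y2) tuples; and a box is suppressed when its IoU with an already kept box is strictly greater than the threshold.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def filter_per_class(boxes, scores, score_thresh=0.05, nms_thresh=0.5):
    """boxes[i][j] is the box of Proposal i for class j; scores[i][j] is its probability."""
    if not scores:
        return []
    detections = []
    for j in range(1, len(scores[0])):  # class 0 is background, so skip it
        inds = [i for i in range(len(scores)) if scores[i][j] > score_thresh]
        boxes_j = [boxes[i][j] for i in inds]
        scores_j = [scores[i][j] for i in inds]
        # NMS only ever compares boxes that belong to the same class j.
        for k in nms(boxes_j, scores_j, nms_thresh):
            detections.append((boxes_j[k], scores_j[k], j))
    return detections


# Two heavily overlapping Proposals, three classes (index 0 is background).
boxes = [[(0, 0, 10, 10)] * 3, [(1, 1, 11, 11)] * 3]
scores = [[0.05, 0.55, 0.40], [0.18, 0.80, 0.02]]
detections = filter_per_class(boxes, scores)
print(detections)
```

In this toy input the two Proposals overlap with an IoU of about 0.68. For class 1 the lower-scoring one is suppressed, but for class 2 the first Proposal still survives because the second never passed the score threshold for that class and NMS does not look across classes.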

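This concludes the walkthrough of the box_head inference file. The file is mainly used during inference; training does not use it. Its make_roi_box_post_processor() function builds a PostProcessor object, which post-processes the Proposals predicted by box_head and selects the instances that are finally output.

The next post will cover the loss file used in the box_head training stage:

maskrcnn-benchmark-master (10): the box_head loss file

To be continued~

Writing this took effort; please do not repost without permission!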
