【Detectron2】A walkthrough of the Mask R-CNN code in Detectron2


Overall, the Backbone, RPN, and Fast R-CNN are three relatively independent modules. The Backbone produces 5 levels of features for each image and feeds them into the RPN.

The RPN first passes the incoming features through a 3x3 convolution, then through two sibling 1x1 convolutions that produce the classification and bbox outputs. The classification score indicates whether an anchor contains an object; the bbox output is four-dimensional, (dx, dy, dw, dh). Each anchor is labeled positive, negative, or ignored by matching it against the ground truth, and is assigned its matching gt instance. From these, 256 anchors are sampled (half positive, half negative) to compute the loss. Finally, 1000 proposals are selected by cls_score ranking plus NMS and fed into Fast R-CNN.
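The (dx, dy, dw, dh) deltas follow the standard R-CNN box parameterization. As a minimal sketch (the helper name `apply_deltas` and plain-tuple interface are illustrative, mirroring what `Box2BoxTransform.apply_deltas` does on tensors):

```python
import math

def apply_deltas(anchor, deltas, weights=(1.0, 1.0, 1.0, 1.0)):
    """Shift an (x1, y1, x2, y2) anchor by (dx, dy, dw, dh) deltas.

    dx, dy scale with the anchor's width/height; dw, dh are log-space
    scale factors. In the config above, the RPN uses weights
    (1, 1, 1, 1) and the Fast R-CNN head uses (10, 10, 5, 5).
    """
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    wx, wy, ww, wh = weights
    dx, dy, dw, dh = deltas
    pred_cx = cx + dx / wx * w
    pred_cy = cy + dy / wy * h
    pred_w = w * math.exp(dw / ww)
    pred_h = h * math.exp(dh / wh)

    return (pred_cx - 0.5 * pred_w, pred_cy - 0.5 * pred_h,
            pred_cx + 0.5 * pred_w, pred_cy + 0.5 * pred_h)

# Zero deltas leave the anchor unchanged:
print(apply_deltas((0, 0, 32, 32), (0, 0, 0, 0)))  # (0.0, 0.0, 32.0, 32.0)
```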

Fast R-CNN re-labels the incoming proposals as positive or negative and assigns each one its matching gt instance. From these, 512 proposals are sampled (positive:negative ratio of 1:3) and fed into RoIAlign, which picks a pyramid level for each proposal based on its w and h (note that P6, obtained by subsampling P5, is not used here), scales the proposal by that level's ratio, and crops out the RoI. RoIAlign involves choosing sampling points (the paper found 4 sampling points best, with 1 performing almost as well), bilinear interpolation (interpolating within the grid cell a sampling point falls into; "bilinear" means interpolating linearly twice, first along x and then along y), and max pooling.
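The level a proposal is pooled from follows the FPN paper's assignment rule: a canonical 224x224 box maps to P4, and each doubling/halving of box size moves one level up/down, clamped to [P2, P5]. A sketch (the function name `assign_fpn_level` is illustrative):

```python
import math

def assign_fpn_level(w, h, canonical_level=4, canonical_size=224,
                     min_level=2, max_level=5):
    """FPN level assignment: k = floor(k0 + log2(sqrt(w*h) / 224)),
    clamped to [P2, P5] since P6 is not used by the RoI heads."""
    k = canonical_level + math.log2(math.sqrt(w * h) / canonical_size)
    return int(min(max(math.floor(k), min_level), max_level))

print(assign_fpn_level(224, 224))  # 4  (canonical box -> P4)
print(assign_fpn_level(64, 64))    # 2  (small boxes -> high-res P2)
print(assign_fpn_level(900, 900))  # 5  (large boxes clamped to P5)
```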

The RoIs are used to compute classification and bbox regression: a softmax cross-entropy loss and a smooth L1 loss against the ground truth.
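Smooth L1 is quadratic near zero and linear beyond a threshold `beta`, which makes the regression loss less sensitive to outliers than pure L2. A sketch of the piecewise definition (the scalar helper `smooth_l1` is illustrative; the actual loss is computed elementwise over the box deltas):

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1: 0.5*x^2/beta for |x| < beta, |x| - 0.5*beta otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

print(smooth_l1(0.5))  # 0.125  (quadratic region)
print(smooth_l1(2.0))  # 1.5    (linear region)
```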

Note that NMS is used inside Fast R-CNN to select the final results only at test time.


If you do not initialize the model from a cfg, the Mask R-CNN initialization code is as follows. It splits naturally into three parts: (1) backbone = FPN(); (2) proposal_generator = RPN(); (3) roi_heads = StandardROIHeads().

model = GeneralizedRCNN(
    backbone=FPN(
        ResNet(
            BasicStem(3, 64, norm="FrozenBN"),
            ResNet.make_default_stages(50, stride_in_1x1=True, norm="FrozenBN"),
            out_features=["res2", "res3", "res4", "res5"],
        ).freeze(2),
        ["res2", "res3", "res4", "res5"],
        256,
        top_block=LastLevelMaxPool(),
    ),
    proposal_generator=RPN(
        in_features=["p2", "p3", "p4", "p5", "p6"],
        head=StandardRPNHead(in_channels=256, num_anchors=3),
        anchor_generator=DefaultAnchorGenerator(
            sizes=[[32], [64], [128], [256], [512]],
            aspect_ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64],
            offset=0.0,
        ),
        anchor_matcher=Matcher([0.3, 0.7], [0, -1, 1], allow_low_quality_matches=True),
        box2box_transform=Box2BoxTransform([1.0, 1.0, 1.0, 1.0]),
        batch_size_per_image=256,
        positive_fraction=0.5,
        pre_nms_topk=(2000, 1000),
        post_nms_topk=(1000, 1000),
        nms_thresh=0.7,
    ),
    roi_heads=StandardROIHeads(
        num_classes=80,
        batch_size_per_image=512,
        positive_fraction=0.25,
        proposal_matcher=Matcher([0.5], [0, 1], allow_low_quality_matches=False),
        box_in_features=["p2", "p3", "p4", "p5"],
        box_pooler=ROIPooler(7, (1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32), 0, "ROIAlignV2"),
        box_head=FastRCNNConvFCHead(
            ShapeSpec(channels=256, height=7, width=7), conv_dims=[], fc_dims=[1024, 1024]
        ),
        box_predictor=FastRCNNOutputLayers(
            ShapeSpec(channels=1024),
            test_score_thresh=0.05,
            box2box_transform=Box2BoxTransform((10, 10, 5, 5)),
            num_classes=80,
        ),
        mask_in_features=["p2", "p3", "p4", "p5"],
        mask_pooler=ROIPooler(14, (1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32), 0, "ROIAlignV2"),
        mask_head=MaskRCNNConvUpsampleHead(
            ShapeSpec(channels=256, width=14, height=14),
            num_classes=80,
            conv_dims=[256, 256, 256, 256, 256],
        ),
    ),
    pixel_mean=[103.530, 116.280, 123.675],
    pixel_std=[1.0, 1.0, 1.0],
    input_format="BGR",
)

Notation used below:

  • N: number of images in the minibatch
  • L: number of feature maps per image on which RPN is run
  • A: number of cell anchors (must be the same for all feature maps); i.e., each feature-map cell produces anchors with 3 aspect_ratios
  • Hi, Wi: height and width of the i-th feature map
  • B: size of the box parameterization; i.e., the 4 bbox parameters

1. Backbone

The backbone is a ResNet-FPN that produces 5 levels of features, ["p2", "p3", "p4", "p5", "p6"], each of shape [N, C, Hi, Wi]. The output is a dict whose keys are the five elements of in_features.
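Since each level downsamples the input by its stride (4, 8, 16, 32, 64) and all levels share C=256 channels after the FPN lateral connections, the output shapes can be sketched as follows (the 800x800 input size is an illustrative assumption, and exact sizes can differ slightly depending on padding):

```python
# Approximate per-level feature shapes for a single 800x800 image:
strides = {"p2": 4, "p3": 8, "p4": 16, "p5": 32, "p6": 64}
H = W = 800
shapes = {f: (256, H // s, W // s) for f, s in strides.items()}
print(shapes["p2"])  # (256, 200, 200)
print(shapes["p5"])  # (256, 25, 25)
```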

 

2. RPN

The RPN performs an initial filtering of the anchors and produces the proposals used by Fast R-CNN.

 proposal_generator=RPN(
        in_features=["p2", "p3", "p4", "p5", "p6"],
        head=StandardRPNHead(in_channels=256, num_anchors=3),
        anchor_generator=DefaultAnchorGenerator(
            sizes=[[32], [64], [128], [256], [512]],
            aspect_ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64],
            offset=0.0,
        ),
        anchor_matcher=Matcher([0.3, 0.7], [0, -1, 1], allow_low_quality_matches=True),
        box2box_transform=Box2BoxTransform([1.0, 1.0, 1.0, 1.0]),
        batch_size_per_image=256,
        positive_fraction=0.5,
        pre_nms_topk=(2000, 1000),
        post_nms_topk=(1000, 1000),
        nms_thresh=0.7,
    )

The corresponding RPN forward() code:

        features = [features[f] for f in self.in_features]
        anchors = self.anchor_generator(features)

        pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features)
        # Transpose the Hi*Wi*A dimension to the middle:
        pred_objectness_logits = [
            # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)
            score.permute(0, 2, 3, 1).flatten(1)
            for score in pred_objectness_logits
        ]
        pred_anchor_deltas = [
            # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)
            x.view(x.shape[0], -1, self.anchor_generator.box_dim, x.shape[-2], x.shape[-1])
            .permute(0, 3, 4, 1, 2)
            .flatten(1, -2)
            .float()  # ensure fp32 for decoding precision
            for x in pred_anchor_deltas
        ]

        if self.training:
            assert gt_instances is not None, "RPN requires gt_instances in training!"
            gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances)
            losses = self.losses(
                anchors, pred_objectness_logits, gt_labels, pred_anchor_deltas, gt_boxes
            )
        else:
            losses = {}
        proposals = self.predict_proposals(
            anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
        )

This can be distilled into the following steps:

  1. features = [features[f] for f in self.in_features]: the image features from the backbone, 5 levels, each of shape [N, C, Hi, Wi]
  2. anchors = self.anchor_generator(features) -> DefaultAnchorGenerator(): each feature level corresponds to one anchor size, and every cell of a feature map gets three aspect_ratios. This call generates all the anchors.
  3. pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features) -> StandardRPNHead(): from the features, predicts the objectness score and (dx, dy, dw, dh) for each anchor. Each pred_objectness_logits[i] has shape [N, A, Hi, Wi]; each pred_anchor_deltas[i] has shape [N, A*B, Hi, Wi]
  4. reshape
  5. gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances): internally calls self.anchor_matcher(), which uses the preset IoU thresholds [0.3, 0.7] to label each anchor positive or negative and returns the gt instance matched to each anchor.
  6. losses = self.losses(): randomly samples batch_size_per_image=256 anchors, of which positive_fraction=0.5 are positive and the rest negative, to use for training
  7. proposals = self.predict_proposals(anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes) -> find_top_rpn_proposals(): produces the proposals. The flow: decode the boxes by applying the predicted deltas to the anchors, take the top pre_nms_topk anchors by cls_score, clip boxes to the image and drop boxes that are too small, run NMS with nms_thresh=0.7, and keep the top post_nms_topk proposals by cls_score
  8. return proposals, losses
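Step 2's anchor generation can be sketched for a single level: one area (size²) at three aspect ratios, centered at the origin and later shifted across the feature map by the level's stride. The helper name `cell_anchors` is illustrative; it mirrors what `DefaultAnchorGenerator` builds from the `sizes`/`aspect_ratios` config:

```python
import math

def cell_anchors(size, aspect_ratios=(0.5, 1.0, 2.0)):
    """Anchors centered at (0, 0) for one FPN level, as (x1, y1, x2, y2).

    All anchors share the area size**2; the aspect ratio is h/w, so
    ar=0.5 gives a wide box and ar=2.0 a tall one (A = 3 per cell).
    """
    anchors = []
    area = size * size
    for ar in aspect_ratios:
        w = math.sqrt(area / ar)
        h = ar * w
        anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

# P2 (stride 4) uses size 32; the ar=1.0 anchor is exactly 32x32:
for a in cell_anchors(32):
    print([round(v, 1) for v in a])
```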

 

3. roi_heads = StandardROIHeads()

roi_heads=StandardROIHeads(
        num_classes=80,
        batch_size_per_image=512,
        positive_fraction=0.25,
        proposal_matcher=Matcher([0.5], [0, 1], allow_low_quality_matches=False),
        box_in_features=["p2", "p3", "p4", "p5"],
        box_pooler=ROIPooler(7, (1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32), 0, "ROIAlignV2"),
        box_head=FastRCNNConvFCHead(
            ShapeSpec(channels=256, height=7, width=7), conv_dims=[], fc_dims=[1024, 1024]
        ),
        box_predictor=FastRCNNOutputLayers(
            ShapeSpec(channels=1024),
            test_score_thresh=0.05,
            box2box_transform=Box2BoxTransform((10, 10, 5, 5)),
            num_classes=80,
        ),
        mask_in_features=["p2", "p3", "p4", "p5"],
        mask_pooler=ROIPooler(14, (1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32), 0, "ROIAlignV2"),
        mask_head=MaskRCNNConvUpsampleHead(
            ShapeSpec(channels=256, width=14, height=14),
            num_classes=80,
            conv_dims=[256, 256, 256, 256, 256],
        ),
    ),

1. proposal_matcher = Matcher(): labels proposals as positive or negative according to ROI_HEADS.IOU_THRESHOLDS and reassigns the matching GT bbox to each. Fast R-CNN selects 512 proposals from each image for training, of which a fraction of 0.25 are positive and the rest negative.

2. At training time, Fast R-CNN does not use NMS; only at test time are results first filtered by score (test_score_thresh=0.05) and then passed through NMS.
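The sampling logic in point 1 can be sketched as follows (the helper name `sample_proposals` and list-based interface are illustrative; detectron2's `subsample_labels` does the same on tensors):

```python
import random

def sample_proposals(labels, batch_size=512, positive_fraction=0.25):
    """Take at most batch_size * positive_fraction positives (label 1)
    and fill the remainder with negatives (label 0). Label -1 marks
    ignored proposals, which are never sampled."""
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    num_pos = min(len(pos), int(batch_size * positive_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    return random.sample(pos, num_pos), random.sample(neg, num_neg)

# With 100 positives among 2000 proposals, all 100 are kept and 412
# negatives are drawn to fill the 512-proposal minibatch:
labels = [1] * 100 + [0] * 1900
pos_idx, neg_idx = sample_proposals(labels)
print(len(pos_idx), len(neg_idx))  # 100 412
```

When there are fewer positives than the 128-proposal target (512 x 0.25), negatives fill the gap, which is the common case early in training.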

 

 
