Annotations for Faster R-CNN (mmdetection version)

References (both of these are very well written):

一文读懂Faster RCNN - 知乎 (Zhihu)

https://szukevin.site/2022/01/08/mmdetection%E4%B9%8BFaster-RCNN%E6%B3%A8%E9%87%8A%E8%AF%A6%E8%A7%A3/

深入浅出理解Faster R-CNN - 知乎 (zhihu.com)

1. extract_feat

Extracts feature maps from the input images.

The resulting x is a tuple of 5 tensors, one per feature-map scale, with shapes:

x[0]: (2, 256, 16, 32)
x[1]: (2, 256, 8, 16)
x[2]: (2, 256, 4, 8)
x[3]: (2, 256, 2, 4)
x[4]: (2, 256, 1, 2)

Note: the first dimension is the batch_size; the second is the channel count, which is set in the config; the third and fourth are h and w (the usual PyTorch (N, C, H, W) layout). h and w are not fixed, because the input image size is not fixed.
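To see where these shapes come from: with the usual ResNet + FPN setup, the five levels have strides 4, 8, 16, 32 and 64, so an input of, say, (2, 3, 64, 128) reproduces exactly the shapes above. A minimal sketch (illustration only, not the mmdetection source; the input size is a hypothetical choice):

import torch

# hypothetical input size chosen so the five FPN strides reproduce the shapes above
img = torch.randn(2, 3, 64, 128)
strides = [4, 8, 16, 32, 64]  # typical strides of FPN levels P2-P6
for i, s in enumerate(strides):
    h, w = img.shape[2] // s, img.shape[3] // s
    print(f'x[{i}]: (2, 256, {h}, {w})')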

2. RPN forward and loss

2.1 Computing the classification and regression outputs of the multi-scale feature maps

In the forward_train function of base_dense_head.py:

def forward_train(self,
                  x,
                  img_metas,
                  gt_bboxes,
                  gt_labels=None,
                  gt_bboxes_ignore=None,
                  proposal_cfg=None,
                  **kwargs):
    """ Args: x (list[Tensor]): Features from FPN. img_metas (list[dict]): Meta information of each image, e.g., image size, scaling factor, etc. gt_bboxes (Tensor): Ground truth bboxes of the image, shape (num_gts, 4). gt_labels (Tensor): Ground truth labels of each box, shape (num_gts,). gt_bboxes_ignore (Tensor): Ground truth bboxes to be ignored, shape (num_ignored_gts, 4). proposal_cfg (mmcv.Config): Test / postprocessing configuration, if None, test_cfg would be used Returns: tuple: losses: (dict[str, Tensor]): A dictionary of loss components. proposal_list (list[Tensor]): Proposals of each image. """    
    outs = self(x)
    # binary (objectness) classification, so gt_labels is None
    if gt_labels is None:
        loss_inputs = outs + (gt_bboxes, img_metas)
    else:
        loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
    if proposal_cfg is None:
        return losses
    else:
        proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)
        return losses, proposal_list

Here, self(x) calls the forward function of base_dense_head, which in turn calls the forward of anchor_head; rpn_head overrides forward_single, so in the end it is rpn_head's forward_single that passes each FPN feature map through one shared convolution and the two classification/regression branches to produce the outputs.

outs is a 2-tuple: the first element is rpn_cls_score, the second rpn_bbox_pred. Each element is a list of 5 tensors, corresponding to the 5 feature-map scales in x.

The shapes in outs are:

rpn_cls_score                 rpn_bbox_pred
outs[0][0]: (2, 3, 16, 32)    outs[1][0]: (2, 12, 16, 32)
outs[0][1]: (2, 3, 8, 16)     outs[1][1]: (2, 12, 8, 16)
outs[0][2]: (2, 3, 4, 8)      outs[1][2]: (2, 12, 4, 8)
outs[0][3]: (2, 3, 2, 4)      outs[1][3]: (2, 12, 2, 4)
outs[0][4]: (2, 3, 1, 2)      outs[1][4]: (2, 12, 1, 2)
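The channel counts follow from the anchor configuration: with 3 anchors per location, the cls branch outputs 3 × 1 = 3 channels (sigmoid binary classification) and the reg branch 3 × 4 = 12 channels. A minimal sketch of the forward_single logic (simplified; the layer names follow rpn_head.py, but this is not a verbatim copy):

import torch
import torch.nn as nn

num_anchors, in_channels, feat_channels = 3, 256, 256

# simplified RPN head: one shared 3x3 conv, then 1x1 cls / reg branches
rpn_conv = nn.Conv2d(in_channels, feat_channels, 3, padding=1)
rpn_cls = nn.Conv2d(feat_channels, num_anchors * 1, 1)  # 3 channels
rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4, 1)  # 12 channels

x = torch.randn(2, 256, 16, 32)  # one FPN level
feat = torch.relu(rpn_conv(x))
print(rpn_cls(feat).shape, rpn_reg(feat).shape)
# torch.Size([2, 3, 16, 32]) torch.Size([2, 12, 16, 32])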

2.2 Computing the loss

This enters the loss function in rpn_head.py:

def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         img_metas,
         gt_bboxes_ignore=None):
    """Compute losses of the head. Args: cls_scores (list[Tensor]): Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W) bbox_preds (list[Tensor]): Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W) gt_bboxes (list[Tensor]): Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format. img_metas (list[dict]): Meta information of each image, e.g., image size, scaling factor, etc. gt_bboxes_ignore (None | list[Tensor]): specify which bounding boxes can be ignored when computing the loss. Returns: dict[str, Tensor]: A dictionary of loss components. """
    # 调用 anchor_head 的 loss 函数得到 box loss 和二分类 loss     
   losses = super(RPNHead, self).loss(
        cls_scores,
        bbox_preds,
        gt_bboxes,
        None,
        img_metas,
        gt_bboxes_ignore=gt_bboxes_ignore)
    return dict(
        loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])

Then it enters the loss function in anchor_head.py:

    def loss(self,
             cls_scores,
             bbox_preds,
             gt_bboxes,
             gt_labels,
             img_metas,
             gt_bboxes_ignore=None):
        """Compute losses of the head.
        ------ covers anchor generation, anchor-target generation, and the loss computation ----------
        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W)
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W)  # predicted box offsets
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.  # box ground truth
            gt_labels (list[Tensor]): class indices corresponding to each box
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
                boxes can be ignored when computing the loss. Default: None

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
        assert len(featmap_sizes) == self.prior_generator.num_levels  # (author's note: num_levels has to be changed to 3 here)

        device = cls_scores[0].device

        # anchor_list holds each image's multi-scale anchors: num_imgs -> num_levels -> [num_anchors, 4]
        anchor_list, valid_flag_list = self.get_anchors(
            featmap_sizes, img_metas, device=device)
        label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
        # generate the anchor targets: the anchor labels, weights and bbox targets
        # cls_reg_targets is a 6-element tuple:
        #     - labels_list (list[Tensor]): Labels of each level.
        #     - label_weights_list (list[Tensor]): Label weights of each level.
        #     - bbox_targets_list (list[Tensor]): BBox targets of each level.
        #     - bbox_weights_list (list[Tensor]): BBox weights of each level.
        #     - num_total_pos (int): Number of positive samples in all images.
        #     - num_total_neg (int): Number of negative samples in all images.
        # num_levels -> num_images, num_anchors, 4
        cls_reg_targets = self.get_targets(
            anchor_list,
            valid_flag_list,
            gt_bboxes,
            img_metas,
            gt_bboxes_ignore_list=gt_bboxes_ignore,
            gt_labels_list=gt_labels,
            label_channels=label_channels)
        if cls_reg_targets is None:
            return None
        (labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
         num_total_pos, num_total_neg) = cls_reg_targets
        num_total_samples = (
            num_total_pos + num_total_neg if self.sampling else num_total_pos)

        # anchor number of multi levels
        num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
        # concat all level anchors and flags to a single tensor
        concat_anchor_list = []
        for i in range(len(anchor_list)):
            concat_anchor_list.append(torch.cat(anchor_list[i]))
        all_anchor_list = images_to_levels(concat_anchor_list,
                                           num_level_anchors)

        losses_cls, losses_bbox = multi_apply(
            self.loss_single,
            cls_scores,  # predicted score of each box: num_levels -> (N, num_anchors * num_classes, H, W)
            bbox_preds,  # predicted box offsets: num_levels -> (N, num_anchors * 4, H, W)
            all_anchor_list,  # [batch, num_anchors, 4]
            labels_list,  # labels of each level [batch, num_anchors]; pos = 0, neg = 1
            label_weights_list,
            bbox_targets_list,  # [batch, num_anchors, 4]
            bbox_weights_list,
            num_total_samples=num_total_samples)
        return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
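For each level, loss_single then computes a binary cross-entropy loss over the scores and a regression loss over the deltas, both normalized by num_total_samples. A stripped-down per-level sketch, assuming the default CrossEntropyLoss(use_sigmoid=True) + L1Loss config and pre-flattened targets (not the mmdetection source):

import torch
import torch.nn.functional as F

def loss_single_sketch(cls_score, bbox_pred, labels, label_weights,
                       bbox_targets, bbox_weights, num_total_samples):
    # cls_score: (N, A*1, H, W) -> flat (N*H*W*A,); labels: 0 = fg, 1 = bg
    cls_score = cls_score.permute(0, 2, 3, 1).reshape(-1)
    loss_cls = F.binary_cross_entropy_with_logits(
        cls_score, (labels == 0).float(), weight=label_weights,
        reduction='sum') / num_total_samples
    # bbox_pred: (N, A*4, H, W) -> flat (N*H*W*A, 4); weights zero out non-positives
    bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
    loss_bbox = (F.l1_loss(bbox_pred, bbox_targets, reduction='none')
                 * bbox_weights).sum() / num_total_samples
    return loss_cls, loss_bbox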

First, anchors are generated on every feature map; then all anchors are matched against all gts: for each anchor, find the gt with the highest IoU, and for each gt, find the anchor with the highest IoU. From this matching the regression and classification targets are determined, except that the classification target is just 0 or 1, where 0 means positive and 1 means negative (the mmdet >= v2.5 convention noted in the code below). (How exactly is the matching done? See the assigner in step (3).)

(1) First, the featmap_sizes are obtained: (16, 32), (8, 16), (4, 8), (2, 4), (1, 2)

(2) Then anchors of these featmap_sizes are generated to form anchor_list

The list is first split by batch_size:

anchor_list[0]

[1536, 4]
[384, 4]
[96, 4]
[24, 4]
[6, 4]

anchor_list[1]

[1536, 4]
[384, 4]
[96, 4]
[24, 4]
[6, 4]

Adding these up, each image gets (2046, 4) anchors in total, as checked below.
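A quick check of that count (3 anchors per location at every level):

featmap_sizes = [(16, 32), (8, 16), (4, 8), (2, 4), (1, 2)]
per_level = [h * w * 3 for h, w in featmap_sizes]
print(per_level)       # [1536, 384, 96, 24, 6]
print(sum(per_level))  # 2046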

(3) Then the anchor targets are generated, i.e. the anchor labels, weights and bbox targets

This ultimately happens in the _get_targets_single function in anchor_head.py, which processes each image separately.

There, the two calls that assign positive/negative labels to the anchors and then sample them are:

        # assign labels to the anchors: match them against the gts and
        # record the index of the gt with the highest IoU
        assign_result = self.assigner.assign(
            anchors, gt_bboxes, gt_bboxes_ignore,
            None if self.sampling else gt_labels)
        # sampling: select a subset of positive/negative samples for training
        sampling_result = self.sampler.sample(assign_result, anchors,
                                              gt_bboxes)
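For the sampling step (the self.sampler.sample call above), the stock Faster R-CNN RPN train_cfg uses a RandomSampler that keeps 256 anchors per image, up to half of them positive. The block below is from the standard faster_rcnn_r50_fpn config and is shown for reference, not necessarily the exact setting used here:

sampler=dict(
    type='RandomSampler',
    num=256,                 # total anchors sampled per image
    pos_fraction=0.5,        # at most half of them positive
    neg_pos_ub=-1,           # no upper bound on the neg/pos ratio
    add_gt_as_proposals=False)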

a. Assignment

self.assigner: mmdet.core.bbox.assigners.max_iou_assigner, configured as shown below.

Anchors whose IoU with a gt_box is greater than 0.7 become positive samples, those below 0.3 become negatives, and those in between are ignored.
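For reference, the RPN assigner block in the stock faster_rcnn_r50_fpn config looks like this (representative values, not necessarily the exact setting used here):

assigner=dict(
    type='MaxIoUAssigner',
    pos_iou_thr=0.7,         # IoU > 0.7 -> positive sample
    neg_iou_thr=0.3,         # IoU < 0.3 -> negative sample
    min_pos_iou=0.3,         # threshold for the forced gt-to-anchor match
    match_low_quality=True,  # keep each gt's best anchor even below pos_iou_thr
    ignore_iof_thr=-1)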

Entering the assign function in max_iou_assigner.py:

 overlaps = self.iou_calculator(gt_bboxes, bboxes)
 assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)

The shapes, for the two images of the batch:

                   image 0      image 1
gt_bboxes.shape    [1, 4]       [2, 4]
bboxes.shape       [2046, 4]    [2046, 4]
overlaps.shape     [1, 2046]    [2, 2046]

That is:

 num_gts, num_bboxes = overlaps.size(0), overlaps.size(1)

Then two maxima are taken:

For each anchor, which gt has the highest IoU with it:

max_overlaps, argmax_overlaps = overlaps.max(dim=0)

For each gt, which anchor has the highest IoU with it:

gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1)

For max_overlaps (shape [2046]: for every anchor, the highest IoU over all gts, with the matching gt index in argmax_overlaps): values above 0.7 make the anchor a positive sample, values below 0.3 a negative sample, and everything in between is ignored.
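A minimal sketch of this rule (simplified from assign_wrt_overlaps; the real function also applies the min_pos_iou / match_low_quality conditions when force-matching each gt to its best anchor):

import torch

def assign_wrt_overlaps_sketch(overlaps, pos_iou_thr=0.7, neg_iou_thr=0.3):
    """overlaps: (num_gts, num_bboxes) IoU matrix.
    Returns per-anchor assignments: -1 = ignore, 0 = negative, i+1 = matched to gt i."""
    num_gts, num_bboxes = overlaps.size(0), overlaps.size(1)
    assigned = overlaps.new_full((num_bboxes,), -1, dtype=torch.long)
    max_overlaps, argmax_overlaps = overlaps.max(dim=0)
    # negatives: best IoU over all gts is below neg_iou_thr
    assigned[max_overlaps < neg_iou_thr] = 0
    # positives: best IoU is above pos_iou_thr; store matched gt index + 1
    pos = max_overlaps >= pos_iou_thr
    assigned[pos] = argmax_overlaps[pos] + 1
    # low-quality matching: every gt keeps at least its single best anchor
    gt_argmax_overlaps = overlaps.argmax(dim=1)
    for i in range(num_gts):
        assigned[gt_argmax_overlaps[i]] = i + 1
    return assigned

overlaps = torch.rand(2, 2046)  # fake IoUs: 2 gts x 2046 anchors
print(assign_wrt_overlaps_sketch(overlaps).unique())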

2.3 Obtaining the proposal list

Figure 4 of the referenced Zhihu article shows the concrete structure of the RPN network. The RPN actually splits into two branches: the upper one classifies the anchors as positive or negative via softmax, while the lower one computes the bounding-box regression offsets of the anchors to obtain accurate proposals. The final Proposal layer then combines the positive anchors with their regression offsets to produce the proposals, discarding proposals that are too small or cross the image boundary. By the time the network reaches the Proposal layer, it has essentially completed the object-localization part of the task.

 

            proposal_list = self.get_bboxes(
                *outs, img_metas=img_metas, cfg=proposal_cfg)   # includes non-maximum suppression

Here, the input outs has the same shapes as listed in section 2.1 (rpn_cls_score in outs[0][k], rpn_bbox_pred in outs[1][k]).

The main work is done in the _get_bboxes_single function in rpn_head.py.

The purpose of this function: transform outputs of a single image into bbox predictions.

    def _get_bboxes_single(self,
                           cls_score_list,
                           bbox_pred_list,
                           score_factor_list,
                           mlvl_anchors,
                           img_meta,
                           cfg,
                           rescale=False,
                           with_nms=True,
                           **kwargs):
        """Transform outputs of a single image into bbox predictions.

        Args:
            cls_score_list (list[Tensor]): Box scores from all scale
                levels of a single image, each item has shape
                (num_anchors * num_classes, H, W).
            bbox_pred_list (list[Tensor]): Box energies / deltas from
                all scale levels of a single image, each item has
                shape (num_anchors * 4, H, W).
            score_factor_list (list[Tensor]): Score factor from all scale
                levels of a single image. RPN head does not need this value.
            mlvl_anchors (list[Tensor]): Anchors of all scale level
                each item has shape (num_anchors, 4).
            img_meta (dict): Image meta info.
            cfg (mmcv.Config): Test / postprocessing configuration,
                if None, test_cfg would be used.
            rescale (bool): If True, return boxes in original image space.
                Default: False.
            with_nms (bool): If True, do nms before return boxes.
                Default: True.

        Returns:
            Tensor: Labeled boxes in shape (n, 5), where the first 4 columns
                are bounding box positions (tl_x, tl_y, br_x, br_y) and the
                5-th column is a score between 0 and 1.
        """
        cfg = self.test_cfg if cfg is None else cfg
        cfg = copy.deepcopy(cfg)
        img_shape = img_meta['img_shape']

        # bboxes from different level should be independent during NMS,
        # level_ids are used as labels for batched NMS to separate them
        level_ids = []
        mlvl_scores = []
        mlvl_bbox_preds = []
        mlvl_valid_anchors = []
        nms_pre = cfg.get('nms_pre', -1)
        # loop over each FPN level:
        for level_idx in range(len(cls_score_list)):
            rpn_cls_score = cls_score_list[level_idx]
            rpn_bbox_pred = bbox_pred_list[level_idx]
            assert rpn_cls_score.size()[-2:] == rpn_bbox_pred.size()[-2:]
            rpn_cls_score = rpn_cls_score.permute(1, 2, 0)
            # if use_sigmoid_cls is True, apply a sigmoid to the classification prediction to get the score
            if self.use_sigmoid_cls:
                rpn_cls_score = rpn_cls_score.reshape(-1)  # [16, 32, 3] -> 16*32*3 = 1536
                scores = rpn_cls_score.sigmoid()
            else:
                rpn_cls_score = rpn_cls_score.reshape(-1, 2)
                # We set FG labels to [0, num_class-1] and BG label to
                # num_class in RPN head since mmdet v2.5, which is unified to
                # be consistent with other head since mmdet v2.0. In mmdet v2.0
                # to v2.4 we keep BG label as 0 and FG label as 1 in rpn head.
                scores = rpn_cls_score.softmax(dim=1)[:, 0]
            rpn_bbox_pred = rpn_bbox_pred.permute(1, 2, 0).reshape(-1, 4)  # rpn_bbox_pred: [12, 16, 32] -> [1536, 4]

            anchors = mlvl_anchors[level_idx]
            if 0 < nms_pre < scores.shape[0]:  # skipped here: nms_pre keeps at most 2000 boxes, and this level only has 1536 (< 2000)
                # sort is faster than topk
                # _, topk_inds = scores.topk(cfg.nms_pre)
                ranked_scores, rank_inds = scores.sort(descending=True)
                topk_inds = rank_inds[:nms_pre]
                scores = ranked_scores[:nms_pre]
                rpn_bbox_pred = rpn_bbox_pred[topk_inds, :]
                anchors = anchors[topk_inds, :]

            mlvl_scores.append(scores)
            mlvl_bbox_preds.append(rpn_bbox_pred)
            mlvl_valid_anchors.append(anchors)
            level_ids.append(
                scores.new_full((scores.size(0), ),
                                level_idx,
                                dtype=torch.long))

        return self._bbox_post_process(mlvl_scores, mlvl_bbox_preds,
                                       mlvl_valid_anchors, level_ids, cfg,
                                       img_shape)

What the first part of _get_bboxes_single does:

It reshapes the outputs of outs (cls_score_list and bbox_pred_list). For example, for a single image:

cls_score_list[0]: (3, 16, 32) -> (16, 32, 3) -> (1536), then through the sigmoid function to get scores: (1536)  # 1536 = 16*32*3

rpn_bbox_pred[0]: (12, 16, 32) -> (16, 32, 12) -> (1536, 4)
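The same reshape spelled out on dummy tensors (shapes only, assuming 3 anchors per location):

import torch

cls = torch.randn(3, 16, 32)                    # (num_anchors * 1, H, W)
scores = cls.permute(1, 2, 0).reshape(-1).sigmoid()
print(scores.shape)                             # torch.Size([1536])

reg = torch.randn(12, 16, 32)                   # (num_anchors * 4, H, W)
deltas = reg.permute(1, 2, 0).reshape(-1, 4)
print(deltas.shape)                             # torch.Size([1536, 4])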

It then enters the _bbox_post_process function, which does the NMS operation for bboxes in the same level:
    def _bbox_post_process(self, mlvl_scores, mlvl_bboxes, mlvl_valid_anchors,
                           level_ids, cfg, img_shape, **kwargs):
        """bbox post-processing method.

        Do the nms operation for bboxes in same level.

        Args:
            mlvl_scores (list[Tensor]): Box scores from all scale
                levels of a single image, each item has shape
                (num_bboxes, ).
            mlvl_bboxes (list[Tensor]): Decoded bboxes from all scale
                levels of a single image, each item has shape (num_bboxes, 4).
            mlvl_valid_anchors (list[Tensor]): Anchors of all scale level
                each item has shape (num_bboxes, 4).
            level_ids (list[Tensor]): Indexes from all scale levels of a
                single image, each item has shape (num_bboxes, ).
            cfg (mmcv.Config): Test / postprocessing configuration,
                if None, `self.test_cfg` would be used.
            img_shape (tuple(int)): The shape of model's input image.

        Returns:
            Tensor: Labeled boxes in shape (n, 5), where the first 4 columns
                are bounding box positions (tl_x, tl_y, br_x, br_y) and the
                5-th column is a score between 0 and 1.
        """

The dimensions of the inputs (level 0 is shown for all three lists; the lower levels of mlvl_bboxes and mlvl_valid_anchors shrink in step with mlvl_scores):

mlvl_scores[0]: (1536,)   mlvl_bboxes[0]: (1536, 4)   mlvl_valid_anchors[0]: (1536, 4)
mlvl_scores[1]: (384,)
mlvl_scores[2]: (96,)
mlvl_scores[3]: (24,)
mlvl_scores[4]: (6,)
        # concatenate the scores, bboxes and anchors of all scale levels
        scores = torch.cat(mlvl_scores)
        anchors = torch.cat(mlvl_valid_anchors)
        rpn_bbox_pred = torch.cat(mlvl_bboxes)

        #  decode
        proposals = self.bbox_coder.decode(
            anchors, rpn_bbox_pred, max_shape=img_shape)
        ids = torch.cat(level_ids)

        if cfg.min_bbox_size >= 0:
            w = proposals[:, 2] - proposals[:, 0]
            h = proposals[:, 3] - proposals[:, 1]
            valid_mask = (w > cfg.min_bbox_size) & (h > cfg.min_bbox_size)
            if not valid_mask.all():
                proposals = proposals[valid_mask]
                scores = scores[valid_mask]
                ids = ids[valid_mask]

        if proposals.numel() > 0:
            dets, _ = batched_nms(proposals, scores, ids, cfg.nms)
        else:
            return proposals.new_zeros(0, 5)

        return dets[:cfg.max_per_img]
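batched_nms receives the level ids as its label argument, so boxes from different FPN levels never suppress each other. Conceptually it achieves this by offsetting each group of boxes before a single plain NMS, roughly as in this sketch (the idea behind it, not the mmcv source):

import torch
from torchvision.ops import nms

def batched_nms_sketch(boxes, scores, idxs, iou_threshold=0.7):
    # shift each group's boxes by a distinct offset so that boxes with
    # different idxs can never overlap, then run one ordinary NMS
    offsets = idxs.to(boxes) * (boxes.max() + 1)
    keep = nms(boxes + offsets[:, None], scores, iou_threshold)
    # dets in (n, 5) layout: box coordinates plus score, like batched_nms
    return torch.cat([boxes[keep], scores[keep, None]], dim=1), keep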

The decode function: Apply transformation `pred_bboxes` to `boxes`.

Inside decode:

            decoded_bboxes = delta2bbox(bboxes, pred_bboxes, self.means,
                                        self.stds, max_shape, wh_ratio_clip,
                                        self.clip_border, self.add_ctr_clamp,
                                        self.ctr_clamp)

# Here, bboxes are the anchors, and pred_bboxes is rpn_bbox_pred, i.e. the bboxes predicted from the feature map through the conv layers.
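For reference, delta2bbox implements the standard Faster R-CNN box parameterization. A sketch with means = 0, stds = 1 and no clamping (the real function also denormalizes the deltas with self.means / self.stds and clips the result to max_shape):

import torch

def delta2bbox_sketch(anchors, deltas):
    # anchors: (N, 4) in (x1, y1, x2, y2); deltas: (N, 4) as (dx, dy, dw, dh)
    ax = (anchors[:, 0] + anchors[:, 2]) * 0.5   # anchor center x
    ay = (anchors[:, 1] + anchors[:, 3]) * 0.5   # anchor center y
    aw = anchors[:, 2] - anchors[:, 0]           # anchor width
    ah = anchors[:, 3] - anchors[:, 1]           # anchor height
    # shift the center linearly, scale the size exponentially
    px = ax + deltas[:, 0] * aw
    py = ay + deltas[:, 1] * ah
    pw = aw * deltas[:, 2].exp()
    ph = ah * deltas[:, 3].exp()
    return torch.stack([px - pw * 0.5, py - ph * 0.5,
                        px + pw * 0.5, py + ph * 0.5], dim=-1)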
