The proposal layer combines all of the predicted regression deltas with the foreground anchors to compute refined, accurate proposals, which are then fed into the subsequent RoI Pooling layer.
The Proposal Layer's forward function processes its inputs in the following order (a short usage sketch follows the list):
1. Generate anchors and apply the bbox regression deltas to all of them (anchor generation here is exactly the same as during training).
2. Sort the anchors by the input foreground softmax scores in descending order and keep the top pre_nms_topN (e.g. 6000), i.e. the position-corrected foreground anchors.
3. Clip foreground anchors that extend beyond the image boundary to the boundary (so that a proposal never falls outside the image during the later RoI pooling).
4. Discard very small foreground anchors (width < threshold or height < threshold).
5. Apply non-maximum suppression.
6. Sort the remaining foreground anchors by their softmax scores in descending order once more and keep the top post_nms_topN (e.g. 2000 at training time, 300 at test time) as the output proposals.
7. The output tensor has shape [batch_size, post_nms_topN, 5]. Along the third dimension, column 0 holds the index of the image within the batch that the region proposal belongs to, and columns 1-4 hold the proposal's coordinates [xmin, ymin, xmax, ymax] on the (rescaled) image resolution actually fed to the network.
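Before walking through the code, here is a minimal usage sketch of the layer defined below. The tensor shapes and example values are illustrative assumptions, not taken from the original implementation:

import torch

# One 600x800 input image with feat_stride = 16, so the feature map is roughly 38x50.
B, A, H, W = 1, 9, 38, 50
rpn_cls_prob = torch.rand(B, 2 * A, H, W)    # per-anchor background/foreground softmax scores
rpn_bbox_pred = torch.rand(B, 4 * A, H, W)   # per-anchor box regression deltas
im_info = torch.tensor([[600., 800., 1.]])   # (height, width, scale) for each image

proposal_layer = _ProposalLayer(feat_stride=16, scales=[8, 16, 32], ratios=[0.5, 1, 2])
rois = proposal_layer((rpn_cls_prob, rpn_bbox_pred, im_info, 'TRAIN'))
# rois: [B, post_nms_topN, 5]; column 0 is the image index within the batch,
# columns 1-4 are [xmin, ymin, xmax, ymax] on the network-input resolution.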
# Imports needed by the snippet below; the exact module paths are an assumption based on
# the layout of the faster-rcnn.pytorch style code this layer comes from.
import numpy as np
import torch
import torch.nn as nn

from model.utils.config import cfg
from model.rpn.generate_anchors import generate_anchors
from model.rpn.bbox_transform import bbox_transform_inv, clip_boxes
from model.nms.nms_wrapper import nms

class _ProposalLayer(nn.Module):
"""
Outputs object detection proposals by applying estimated bounding-box
transformations to a set of regular boxes (called "anchors").
"""
def __init__(self, feat_stride, scales, ratios):
super(_ProposalLayer, self).__init__()
self.feat_stride = feat_stride
self._anchors = torch.from_numpy(generate_anchors(scales=np.array(scales),
ratios=np.array(ratios))).float()
self._num_anchors = self._anchors.size(0)
def forward(self, input):
# Slice the RPN classification output along the channel dimension to keep only the
# foreground scores. Note that of the 18 channels, the first 9 are the probability that
# an anchor belongs to the background and the last 9 the probability that it belongs
# to the foreground.
# the shape of scores: [batch_size, 9, H, W]
# H = M / 16, W = N / 16 for an M x N network-input image
scores = input[0][:, self._num_anchors:, :, :]
# bbox_deltas holds the per-anchor box regression deltas predicted by the RPN
bbox_deltas = input[1]
im_info = input[2]
cfg_key = input[3]
pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
min_size = cfg[cfg_key].RPN_MIN_SIZE
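# Typical default values (an assumption, following standard Faster R-CNN configs):
# TRAIN: pre_nms_topN = 12000, post_nms_topN = 2000; TEST: pre_nms_topN = 6000,
# post_nms_topN = 300; nms_thresh = 0.7; min_size is measured on the network-input scale.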
batch_size = bbox_deltas.size(0)
# 1. Generate proposals from bbox deltas and shifted anchors
feat_height, feat_width = scores.size(2), scores.size(3)
# Enumerate all shifts
# [0, 16, 32, 48...]
shift_x = np.arange(0, feat_width) * self.feat_stride
# [0, 16, 32, 48...]
shift_y = np.arange(0, feat_height) * self.feat_stride
# generating grid on the feature map
# shift_x shape: [height, width], shift_y shape: [height, width]
# The grid covers the image actually fed into the network, i.e. the original image
# after the dataset's rescaling and padding operations.
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
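# e.g. with feat_width = 3 and feat_height = 2 (stride 16):
# shift_x = [[0, 16, 32], [0, 16, 32]], shift_y = [[0, 0, 0], [16, 16, 16]]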
# shifts shape:[height * width, 4]
shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose())
shifts = shifts.contiguous().type_as(scores).float()
A = self._num_anchors  # A = 9
# K is the number of grid cells on the feature map, i.e. K = H * W
K = shifts.size(0)
# now the anchors shape:[batch_size, K * A, 4]
# K*A is the number of all anchor boxes in an input image
self._anchors = self._anchors.type_as(scores)
anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)
anchors = anchors.view(1, K * A, 4).expand(batch_size, K * A, 4)
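# Broadcasting (1, A, 4) + (K, 1, 4) gives (K, A, 4): each of the K grid cells gets its
# own copy of the A base anchors, shifted to that cell's position on the input image.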
# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
bbox_deltas = bbox_deltas.permute(0, 2, 3, 1).contiguous()
bbox_deltas = bbox_deltas.view(batch_size, -1, 4)
# Same story for the scores:
scores = scores.permute(0, 2, 3, 1).contiguous()
scores = scores.view(batch_size, -1)
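# After reshaping, bbox_deltas has shape [batch_size, H*W*A, 4] and scores has shape
# [batch_size, H*W*A], matching the anchor ordering built above.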
# Convert anchors into proposals via bbox transformations.
# The regression deltas predicted by the RPN are decoded against the anchors to obtain
# absolute box coordinates on the image resolution actually fed to the network; the
# returned proposals are the RPN's predicted boxes for all anchors.
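# (Decoding sketch, assuming the standard Faster R-CNN box parameterization:
#  pred_ctr_x = dx * w_a + ctr_x_a,  pred_ctr_y = dy * h_a + ctr_y_a,
#  pred_w = exp(dw) * w_a,           pred_h = exp(dh) * h_a,
#  then converted back to corner format [xmin, ymin, xmax, ymax].)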
proposals = bbox_transform_inv(anchors, bbox_deltas, batch_size) # [xmin,ymin,xmax,ymax]
# 2. clip predicted boxes to image
proposals = clip_boxes(proposals, im_info, batch_size)
# proposals = clip_boxes_batch(proposals, im_info, batch_size)
scores_keep = scores  # shape: [batch_size, H*W*9]
proposals_keep = proposals
# For each image in the batch, sort its H*W*9 region proposals in descending order of the
# foreground score predicted by the RPN. The first value returned by torch.sort holds the
# scores in descending order and the second holds the corresponding position indices.
_, order = torch.sort(scores_keep, 1, True)
# output shape [batch_size, post_nms_topN, 5]
# tensor.new() creates a tensor with the same dtype (and device) as scores. At training
# time post_nms_topN = 2000, i.e. 2000 region proposals per image are finally passed on to
# the Fast R-CNN head. This is exactly the role that selective search plays for R-CNN and
# Fast R-CNN, which also hand roughly 2000 proposals to the detector; the difference is
# that selective search is a fixed algorithm with no training process, while the RPN learns
# to produce its region proposals/RoIs. SSD and YOLO, by contrast, drop the proposal stage
# altogether and predict directly from anchor boxes laid out in sliding-window fashion,
# which is why they are the ones regarded as truly dense detectors.
output = scores.new(batch_size, post_nms_topN, 5).zero_()
for i in range(batch_size):
# 3. remove predicted boxes with either height or width < threshold
# (NOTE: convert min_size to input image scale stored in im_info[2])
# (In this snippet the min-size filter, self._filter_boxes, is left disabled.)
proposals_single = proposals_keep[i]
scores_single = scores_keep[i]
# # 4. sort all (proposal, score) pairs by score from highest to lowest
# # 5. take top pre_nms_topN (e.g. 6000)
order_single = order[i]
if pre_nms_topN > 0 and pre_nms_topN < scores_keep.numel():
order_single = order_single[:pre_nms_topN]
proposals_single = proposals_single[order_single, :]
scores_single = scores_single[order_single].view(-1, 1)
# 6. apply nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
keep_idx_i = nms(torch.cat((proposals_single, scores_single), 1), nms_thresh, force_cpu=not cfg.USE_GPU_NMS)
keep_idx_i = keep_idx_i.long().view(-1)
if post_nms_topN > 0:
keep_idx_i = keep_idx_i[:post_nms_topN]
proposals_single = proposals_single[keep_idx_i, :]
scores_single = scores_single[keep_idx_i, :]
# padding 0 at the end.
num_proposal = proposals_single.size(0)
output[i, :, 0] = i
output[i, :num_proposal, 1:] = proposals_single
# output shape: [batch_size, post_nms_topN, 5]
# In the third dimension, column 0 indicates which image in the batch the region proposal
# belongs to, and columns 1-4 hold the proposal's coordinates [xmin, ymin, xmax, ymax]
# on the input image resolution after rescaling.
return output
def backward(self, top, propagate_down, bottom):
"""This layer does not propagate gradients."""
pass
def reshape(self, bottom, top):
"""Reshaping happens during the call to forward."""
pass
def _filter_boxes(self, boxes, min_size):
"""Remove all boxes with any side smaller than min_size."""
ws = boxes[:, :, 2] - boxes[:, :, 0] + 1
hs = boxes[:, :, 3] - boxes[:, :, 1] + 1
keep = ((ws >= min_size.view(-1, 1).expand_as(ws)) & (hs >= min_size.view(-1, 1).expand_as(hs)))
return keep
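The nms(...) call above dispatches to the repository's compiled CPU/GPU routine. For reference, greedy IoU-based non-maximum suppression can be sketched in plain PyTorch as follows; this is a minimal illustration of the standard algorithm, not the implementation actually used by the code above:

import torch

def nms_reference(dets, thresh):
    """Greedy non-maximum suppression.
    dets: [N, 5] tensor of (xmin, ymin, xmax, ymax, score); returns kept row indices."""
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.sort(descending=True)[1]
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        # Intersection of the highest-scoring remaining box with all the others
        xx1 = torch.clamp(x1[order[1:]], min=x1[i].item())
        yy1 = torch.clamp(y1[order[1:]], min=y1[i].item())
        xx2 = torch.clamp(x2[order[1:]], max=x2[i].item())
        yy2 = torch.clamp(y2[order[1:]], max=y2[i].item())
        w = torch.clamp(xx2 - xx1 + 1, min=0.0)
        h = torch.clamp(yy2 - yy1 + 1, min=0.0)
        inter = w * h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop every box that overlaps the kept box by more than the threshold
        order = order[1:][iou <= thresh]
    return torch.tensor(keep, dtype=torch.long)

The compiled routine called in the forward pass consumes the same [N, 5] (box, score) layout, plus a force_cpu flag as shown above.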