Paper-info
- title : Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [NIPS 2015]
- author : Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun.
- Github : fasterrcnn-pytorch
Motivation
Before introducing Faster R-CNN, let us first quickly review the main stages of R-CNN and Fast R-CNN:
# R-CNN
ROIs = region_proposal(image) # Selective Search etc...
for ROI in ROIs:
    patch = get_patch(image, ROI)   # crop and warp each region from the image
    results = detector(patch)       # run the full CNN + classifier once per region
# Fast R-CNN
feature_maps = process(image) # extract feature for image only once!
ROIs = region_proposal(image) # time consuming !!! [ Selective Search etc...]
for ROI in ROIs:
    patch = roi_pooling(feature_maps, ROI)
    results = detector2(patch)
Below are the network architecture diagrams of R-CNN and Fast R-CNN:
- R-CNN
- Fast R-CNN
Fast R-CNN's test-time speed depends on how the region proposals are generated (e.g., Selective Search), and these methods are slow even on a GPU machine because they run on the CPU: the whole inference pipeline takes about 2.3 s per image, of which generating the ~2000 ROIs alone takes roughly 2 s! From the review above, we know that Fast R-CNN merges R-CNN's multiple stages into a single network and thereby enables end-to-end training, but its inference remains slow (about 0.5 FPS, mainly because of the proposal-generation step), which greatly limits the model's practical applications. To address this weakness, Faster R-CNN creatively proposes the anchor-based RPN (Region Proposal Network), which solves Fast R-CNN's pain point elegantly and efficiently; it became the culmination of the R-CNN series and remains one of the most widely used detection models today.
Idea
First, a quick look at how Selective Search [2] roughly works: it is essentially a bottom-up, dynamic clustering of image regions from small to large, and the bounding boxes of all regions produced along the way are output as proposals. The schematic from [2] illustrates this grouping process.
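As a toy, runnable illustration of this bottom-up merging (not the actual Selective Search algorithm of [2], which starts from a graph-based over-segmentation and merges neighbouring regions by colour/texture/size/fill similarity), the sketch below models regions as sets of pixel coordinates, greedily merges the smallest pair, and emits one proposal per region formed along the way; all names here are ours, for illustration only:

from itertools import combinations

def bbox(region):
    # axis-aligned bounding box of a set of (x, y) pixels
    xs = [p[0] for p in region]
    ys = [p[1] for p in region]
    return (min(xs), min(ys), max(xs), max(ys))

def hierarchical_grouping(regions):
    # regions: list of frozensets of (x, y) pixels from an initial over-segmentation
    proposals = [bbox(r) for r in regions]
    while len(regions) > 1:
        # greedily merge the "most similar" pair; real Selective Search scores pairs by
        # colour/texture/size/fill similarity, here we simply pick the smallest union
        r1, r2 = min(combinations(regions, 2), key=lambda pair: len(pair[0] | pair[1]))
        regions = [r for r in regions if r not in (r1, r2)] + [r1 | r2]
        proposals.append(bbox(regions[-1]))
    return proposals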
To speed up proposal generation, Faster R-CNN creatively introduces the RPN (a shallow network built into the detector); with the anchor mechanism it can produce diverse proposals. Concretely:
step - 1. For every pixel of the feature_map, generate k diverse base anchors (k = [number of sizes] x [number of aspect_ratios]; the base anchors are identical for every pixel), then, using the stride between the feature_map and the input image, map the base anchors back to coordinates of the original image.
step - 2. The RPN has two sibling branches that, for every anchor, output an objectness score and four box offsets.
[Why does the paper predict 2k scores for k anchors? Its cls layer is implemented as a 2-way softmax (object vs. background) per anchor, which yields 2k outputs; the paper notes that a single logistic output per anchor (k scores) works as well, and that is what torchvision's RPNHead below does.]
step - 3. Refine each anchor with its predicted offsets. The offsets use the parameterization from R-CNN/Fast R-CNN:
t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a)
where (x, y, w, h) are the center and size of the predicted box and (x_a, y_a, w_a, h_a) those of the anchor (the loss function built on these terms is given below). Decoding offsets back into boxes in torchvision looks like this:
# torchvision BoxCoder.decode_single (models/detection/_utils.py)
def decode_single(self, rel_codes, boxes):
    """
    From a set of original boxes and encoded relative box offsets,
    get the decoded boxes.
    Arguments:
        rel_codes (Tensor): bbox offsets (t_x, t_y, t_w, t_h) predicted by the RPN head
        boxes (Tensor): anchor boxes in (x1, y1, x2, y2) format
    """
    boxes = boxes.to(rel_codes.dtype)
    # anchors: corner format -> center / width / height
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights
    wx, wy, ww, wh = self.weights  # (1.0, 1.0, 1.0, 1.0) for the RPN box coder
    dx = rel_codes[:, 0::4] / wx
    dy = rel_codes[:, 1::4] / wy
    dw = rel_codes[:, 2::4] / ww
    dh = rel_codes[:, 3::4] / wh
    # Prevent sending too large values into torch.exp()
    dw = torch.clamp(dw, max=self.bbox_xform_clip)
    dh = torch.clamp(dh, max=self.bbox_xform_clip)
    # invert the parameterization: x = dx * w_a + x_a, w = exp(dw) * w_a, ...
    pred_ctr_x = dx * widths[:, None] + ctr_x[:, None]
    pred_ctr_y = dy * heights[:, None] + ctr_y[:, None]
    pred_w = torch.exp(dw) * widths[:, None]
    pred_h = torch.exp(dh) * heights[:, None]
    # back to (x1, y1, x2, y2)
    pred_boxes = torch.zeros_like(rel_codes)
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
    return pred_boxes
step - 4. Filter the refined anchors using their objectness scores and NMS; the surviving boxes are the proposals the RPN outputs (a minimal sketch of this filtering step follows).
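A minimal sketch of this step, assuming the proposals of a single image and a single feature level have already been decoded to (x1, y1, x2, y2); torchvision's RegionProposalNetwork.filter_proposals additionally clips boxes to the image, removes tiny and low-scoring boxes, and keeps the top boxes per FPN level. The thresholds below are illustrative defaults, not the library's exact configuration:

import torch
from torchvision.ops import nms

def filter_proposals_sketch(proposals, objectness, pre_nms_top_n=2000,
                            nms_thresh=0.7, post_nms_top_n=1000):
    # proposals: [N, 4] decoded boxes; objectness: [N] raw logits from the RPN head
    scores = objectness.sigmoid()
    keep_pre = scores.topk(min(pre_nms_top_n, scores.numel())).indices  # top-k before NMS
    boxes, scores = proposals[keep_pre], scores[keep_pre]
    keep = nms(boxes, scores, nms_thresh)[:post_nms_top_n]              # suppress overlaps
    return boxes[keep], scores[keep]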
loss function
The RPN is trained with the multi-task loss from the paper (N_cls is the mini-batch size, i.e. 256; N_reg is the number of anchor locations, about 2400; lambda = 10 balances the two terms):
L({p_i}, {t_i}) = (1 / N_cls) * sum_i L_cls(p_i, p_star_i) + lambda * (1 / N_reg) * sum_i p_star_i * L_reg(t_i, t_star_i)
where:
p_i : the predicted probability that anchor i is an object;
p_star_i : the ground-truth label (binary label, 1: positive anchor; 0: negative anchor);
t_i : the predicted bbox parameters {t_x, t_y, t_w, t_h};
t_star_i : the ground-truth bbox parameters {t_star_x, t_star_y, t_star_w, t_star_h}, computed w.r.t. the same anchor;
L_cls(p_i, p_star_i) : the classification loss, a log loss over the two classes (object vs. not object);
L_reg(t_i, t_star_i) : the regression loss, the smooth L1 loss defined in Fast R-CNN (the p_star_i factor makes it active only for positive anchors):
smooth_L1(x) = 0.5 * x^2 if |x| < 1, and |x| - 0.5 otherwise
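The same computation in code, as a minimal sketch close in spirit to torchvision's RegionProposalNetwork.compute_loss (assuming a recent PyTorch; the sampling of 256 anchors per image with a 1:1 positive/negative ratio is assumed to have happened already and to be encoded in labels):

import torch
import torch.nn.functional as F

def rpn_loss_sketch(objectness, pred_bbox_deltas, labels, regression_targets):
    # labels: 1 for positive anchors, 0 for sampled negatives, -1 for ignored anchors
    sampled = torch.where(labels >= 0)[0]
    positive = torch.where(labels == 1)[0]
    # L_cls: binary log loss on the sampled anchors (torchvision uses one sigmoid
    # logit per anchor instead of the paper's 2-way softmax)
    loss_objectness = F.binary_cross_entropy_with_logits(
        objectness[sampled].flatten(), labels[sampled].float())
    # L_reg: smooth L1 on the box offsets, only for positive anchors (the p_star_i factor)
    loss_rpn_box_reg = F.smooth_l1_loss(
        pred_bbox_deltas[positive], regression_targets[positive],
        beta=1.0 / 9, reduction="sum") / max(sampled.numel(), 1)
    return loss_objectness, loss_rpn_box_reg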
Outline
feature_maps = process(image)
ROIs = RPN(feature_maps)   # Faster! proposals are computed on the shared feature maps
for ROI in ROIs:
    patch = roi_pooling(feature_maps, ROI)
    results = detector2(patch)
torchvision source-code analysis
Anchor generation
class AnchorGenerator(nn.Module):
"""
Module that generates anchors for a set of feature maps and image sizes.
The module supports computing anchors at multiple sizes and aspect ratios per feature map.
sizes and aspect_ratios should have the same number of elements, and that number should
correspond to the number of feature maps.
sizes[i] and aspect_ratios[i] can have an arbitrary number of elements,
and AnchorGenerator will output a set of sizes[i] * aspect_ratios[i] anchors
per spatial location for feature map i.
Arguments:
sizes (Tuple[Tuple[int]]):
aspect_ratios (Tuple[Tuple[float]]):
"""
def __init__(self, sizes=(128, 256, 512), aspect_ratios=(0.5, 1.0, 2.0)):
super(AnchorGenerator, self).__init__()
...
self.sizes = sizes
self.aspect_ratios = aspect_ratios
self.cell_anchors = None
self._cache = {}
@staticmethod
def generate_anchors(scales, aspect_ratios, device="cpu"):
'''
Generate the base_anchor
'''
scales = torch.as_tensor(scales, dtype=torch.float32, device=device)
aspect_ratios = torch.as_tensor(aspect_ratios, dtype=torch.float32, device=device)
h_ratios = torch.sqrt(aspect_ratios)
w_ratios = 1 / h_ratios
ws = (w_ratios[:, None] * scales[None, :]).view(-1)
hs = (h_ratios[:, None] * scales[None, :]).view(-1)
base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2  # (x1, y1, x2, y2) centered at the origin
return base_anchors.round()
def set_cell_anchors(self, device):
''' Generate the anchors '''
if self.cell_anchors is not None:
return self.cell_anchors
cell_anchors = [
self.generate_anchors(sizes, aspect_ratios, device)
for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
]
self.cell_anchors = cell_anchors
def num_anchors_per_location(self):
return [len(s) * len(a) for s, a in zip(self.sizes, self.aspect_ratios)]
def grid_anchors(self, grid_sizes, strides):
'''
Generate the anchors according to base_anchor and stride
grid_size : size of feature_map
stride : map_stride between img and feature_map
'''
anchors = list()
for size, stride, base_anchors in zip(grid_sizes, strides, self.cell_anchors):
grid_height, grid_width = size
stride_height, stride_width = stride
device = base_anchors.device
shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
shifts_y = torch.arange(0, grid_height,dtype=torch.float32, device=device) * stride_height
shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
shift_x = shift_x.reshape(-1)
shift_y = shift_y.reshape(-1)
shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)
anchors.append((shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4))
return anchors
def cached_grid_anchors(self, grid_sizes, strides):
key = tuple(grid_sizes) + tuple(strides)
if key in self._cache:
return self._cache[key]
anchors = self.grid_anchors(grid_sizes, strides)
self._cache[key] = anchors
return anchors
def forward(self, image_list, feature_maps):
'''
step - 1. generate the base_anchor according to scales and aspect_ratios
step - 2. generate the anchors over all feature maps for each image in the batch
NOTE : there may be multi-scale feature_maps (e.g. with an FPN backbone)
'''
# step - 1
self.set_cell_anchors(feature_maps[0].device)
# step - 2
grid_sizes = tuple([feature_map.shape[-2:] for feature_map in feature_maps])
image_size = image_list.tensors.shape[-2:]
# calculate the map-stride between img_size and feature_map
strides = tuple((image_size[0] / g[0], image_size[1] / g[1]) for g in grid_sizes)
anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides)
anchors = []
# deal with each instance in batch
for i, (image_height, image_width) in enumerate(image_list.image_sizes):
anchors_in_image = []
for anchors_per_feature_map in anchors_over_all_feature_maps:
anchors_in_image.append(anchors_per_feature_map)
anchors.append(anchors_in_image)
anchors = [torch.cat(anchors_per_image) for anchors_per_image in anchors]
return anchors
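A quick sanity check of the generator (a sketch; in recent torchvision AnchorGenerator is importable from torchvision.models.detection.anchor_utils, older versions define it in torchvision.models.detection.rpn). With 3 sizes x 3 aspect_ratios we get k = 9 anchors per location; a simulated stride-16 feature map of a 600x1000 input gives roughly the ~20k anchors quoted in the paper:

import torch
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.models.detection.image_list import ImageList

images = torch.rand(1, 3, 600, 1000)
features = [torch.rand(1, 256, 37, 62)]                 # simulated stride-16 feature map
image_list = ImageList(images, [(600, 1000)])
anchor_gen = AnchorGenerator(sizes=((128, 256, 512),),  # one tuple per feature map
                             aspect_ratios=((0.5, 1.0, 2.0),))
anchors = anchor_gen(image_list, features)              # list: one Tensor per image
print(anchors[0].shape)                                 # torch.Size([20646, 4]) = 37*62*9 anchors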
- RPN Head
class RPNHead(nn.Module):
"""
Adds a simple RPN Head with classification and regression heads
Arguments:
in_channels (int): number of channels of the input feature
num_anchors (int): number of anchors to be predicted
"""
def __init__(self, in_channels, num_anchors):
super(RPNHead, self).__init__()
self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)  # one objectness logit per anchor (k scores, not 2k)
self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)
# initialize every conv with N(0, 0.01) weights and zero bias
for l in self.children():
torch.nn.init.normal_(l.weight, std=0.01)
torch.nn.init.constant_(l.bias, 0)
def forward(self, x):
logits, bbox_reg = [], []
for feature in x:
t = F.relu(self.conv(feature)) # 3x3 conv: spatial size unchanged, out_channels == in_channels
logits.append(self.cls_logits(t)) # objectness score for each anchor
bbox_reg.append(self.bbox_pred(t)) # bbox offsets for each anchor
return logits, bbox_reg
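A quick shape check (assuming the torchvision implementation, importable from torchvision.models.detection.rpn; in fasterrcnn_resnet50_fpn each FPN level uses a single size with three aspect ratios, so num_anchors = 3 per location):

import torch
from torchvision.models.detection.rpn import RPNHead

head = RPNHead(in_channels=256, num_anchors=3)
feats = [torch.rand(2, 256, 25, 25), torch.rand(2, 256, 13, 13)]  # two FPN levels, batch of 2
logits, bbox_reg = head(feats)
print(logits[0].shape)    # torch.Size([2, 3, 25, 25])  -> k objectness maps per level
print(bbox_reg[0].shape)  # torch.Size([2, 12, 25, 25]) -> 4k offset maps per level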
- Region Proposal Networks
class RegionProposalNetwork(torch.nn.Module):
...
def forward(self, images, features, targets=None):
"""
Arguments:
images (ImageList): images for which we want to compute the predictions
features (List[Tensor]): features computed from the images that are
used for computing the predictions. Each tensor in the list
corresponds to a different feature level
targets (List[Dict[Tensor]]): ground-truth boxes present in the image (optional).
If provided, each element in the dict should contain a field `boxes`,
with the locations of the ground-truth boxes.
Returns:
boxes (List[Tensor]): the predicted boxes from the RPN, one Tensor per
image.
losses (Dict[Tensor]): the losses for the model during training. During
testing, it is an empty dict.
"""
# feature maps extracted by the backbone
features = list(features.values())
# generate anchors on top of these feature maps
anchors = self.anchor_generator(images, features)
# RPNHead predicts the objectness score and bbox offsets for every anchor
objectness, pred_bbox_deltas = self.head(features)
# the feature maps come from the FPN, so there are multiple scales; concatenate the per-level predictions
num_images = len(anchors)
num_anchors_per_level = [obj[0].numel() for obj in objectness]
objectness, pred_bbox_deltas = concat_box_prediction_layers(objectness, pred_bbox_deltas)
# refine the anchors with the predicted bbox offsets to obtain the initial proposals
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
proposals = proposals.view(num_images, -1, 4)
# filter out unreliable boxes using their objectness scores and NMS
boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, \
num_anchors_per_level)
losses = {}
if self.training:
labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
loss_objectness, loss_rpn_box_reg = self.compute_loss(objectness, pred_bbox_deltas, \
labels, regression_targets)
losses = {
"loss_objectness": loss_objectness,
"loss_rpn_box_reg": loss_rpn_box_reg,
}
return boxes, losses
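Finally, a sketch of how the pieces fit together in torchvision: GeneralizedRCNN runs backbone -> rpn -> roi_heads, and model.rpn is exactly the RegionProposalNetwork discussed above (no pretrained weights are downloaded here):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn()  # randomly initialized by default
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 600, 800)])
print(type(model.rpn).__name__)   # RegionProposalNetwork
print(predictions[0].keys())      # dict_keys(['boxes', 'labels', 'scores'])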
Experiment
Conclusion
- RPN cls scores account for the accuracy of the highest-ranked proposals;
- High-quality proposals are mainly due to the RPN-regressed box positions; anchor boxes alone are not sufficient for accurate detection.
Reference
[1]. S. Ren, K. He, R. Girshick, J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.
[2]. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. Selective Search for Object Recognition. IJCV 2013.