R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN

The four papers in the R-CNN series are:
1. R-CNN: https://arxiv.org/abs/1311.2524
2. Fast R-CNN: https://arxiv.org/abs/1504.08083
3. Faster R-CNN: https://arxiv.org/abs/1506.01497
4. Mask R-CNN: https://arxiv.org/abs/1703.06870

R-CNN

R-CNN: R-CNN uses Selective Search to generate about 2,000 region proposals per image, a CNN to extract a feature vector from each warped proposal, class-specific linear SVMs to classify the objects, and linear (bounding-box) regression to tighten the proposed boxes. However, training is a complex multi-stage process, because the CNN, the SVMs and the box regressors are each trained separately. Worse, the four parts run as separate stages and the CNN is run on every proposal independently, which makes detection slow, so R-CNN cannot be used for real-time object detection.
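
The bounding-box regression step can be illustrated with the target parameterization given in the R-CNN paper's appendix: a proposal P and a ground-truth box G, both written as (center x, center y, width, height), are related by center offsets normalized by the proposal size and by log-space size ratios. The sketch below only encodes and decodes these targets. In R-CNN the targets are predicted from CNN features by separately trained linear regressors, which are not shown here, and the names `encode` and `decode` are hypothetical.

```python
import math

# Boxes are (x_center, y_center, width, height) tuples.
# Hypothetical helper names; only the target parameterization is shown,
# not the learned regressor that predicts the targets from CNN features.

def encode(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) that map proposal P onto ground truth G."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw        # center shift, normalized by proposal width
    ty = (gy - py) / ph        # center shift, normalized by proposal height
    tw = math.log(gw / pw)     # width change in log space
    th = math.log(gh / ph)     # height change in log space
    return (tx, ty, tw, th)

def decode(proposal, deltas):
    """Apply (predicted) deltas to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

proposal = (50.0, 50.0, 40.0, 20.0)
gt = (54.0, 48.0, 48.0, 20.0)
t = encode(proposal, gt)
print(t)
print(decode(proposal, t))  # recovers gt up to floating-point rounding
```

Normalizing the offsets by the proposal size and using logarithms for width and height makes the targets independent of the absolute box size, so one regressor can serve small and large proposals alike.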

Fast R-CNN

Fast R-CNN: Fast R-CNN uses the same methods for region proposal generation (Selective Search) and feature extraction (a CNN), but replaces the SVMs with a softmax classifier that is trained jointly with bounding-box regression. It speeds up R-CNN by not running the CNN on each proposal separately: it runs the CNN only once over the whole image to produce a feature map, and then takes each proposal's features from the corresponding region of that feature map. Because the fully connected layers that follow require a fixed-size input, Fast R-CNN applies RoI pooling to turn each proposal's region into a fixed-size feature map (7×7 for VGG-16) before the classification and regression layers. However, it still relies on Selective Search to generate region proposals, and this step becomes the computational bottleneck.
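
A minimal single-channel sketch of RoI max pooling, assuming the RoI has already been projected onto the feature map (in practice, image coordinates are divided by the network stride). The RoI is rounded to integer cells and split into a fixed grid of bins, and each bin is max-pooled, so RoIs of any size yield the same output shape. The 2×2 output size, the floor/ceil bin boundaries and the name `roi_max_pool` are illustrative assumptions, not a library API.

```python
import math

def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a single-channel H x W map into an out_h x out_w grid.

    feature: list of rows (one channel, for simplicity).
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive. The corners are
         rounded to integers; this quantization is what RoIAlign later removes.
    """
    x1, y1, x2, y2 = (int(math.floor(v + 0.5)) for v in roi)
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    bin_w = roi_w / out_w
    bin_h = roi_h / out_h
    H, W = len(feature), len(feature[0])
    out = []
    for i in range(out_h):
        # Quantized bin boundaries: floor for the start, ceil for the end,
        # clamped to the feature map.
        ys = min(max(y1 + math.floor(i * bin_h), 0), H)
        ye = min(max(y1 + math.ceil((i + 1) * bin_h), 0), H)
        row = []
        for j in range(out_w):
            xs = min(max(x1 + math.floor(j * bin_w), 0), W)
            xe = min(max(x1 + math.ceil((j + 1) * bin_w), 0), W)
            values = [feature[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            row.append(max(values) if values else 0.0)
        out.append(row)
    return out
```

For example, with `feat[y][x] = 4*y + x` on a 4×4 map, the RoI (0, 0, 3, 3) gives [[5, 7], [13, 15]] and the narrower RoI (0, 0, 2, 3) gives [[5, 6], [13, 14]]: different RoI sizes, same 2×2 output for the fully connected layers.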

Faster R-CNN

Faster R-CNN: Faster R-CNN consists of two main modules: a Region Proposal Network (RPN) that generates region proposals, and a Fast R-CNN detector that uses those proposals for classification and box regression. Together they form a single end-to-end network. Although Fast R-CNN is faster than R-CNN, Faster R-CNN is even more computationally efficient, for the following reasons. The RPN generates proposals by sliding a small (3×3) window over the convolutional feature map instead of working on the original image, and at each window center it proposes 9 candidate regions (anchors) covering 3 scales and 3 aspect ratios. The RPN is itself trained with a classification loss (object vs. background) and a bounding-box regression loss, and it shares its convolutional layers with Fast R-CNN, so generating proposals adds little extra cost.
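
The anchor scheme can be sketched as follows. The Faster R-CNN paper uses anchor areas of 128², 256² and 512² pixels with aspect ratios 1:1, 1:2 and 2:1, on a feature map whose stride is 16 pixels for VGG-16. Placing anchors at each cell's center, defining the ratio as height/width, and the helper names `anchors_at` and `all_anchors` are assumptions for illustration; reference implementations differ in such details.

```python
import math

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors (x1, y1, x2, y2) centered at (cx, cy).

    Assumed convention: ratio = height / width, and every anchor of a given
    scale s keeps the area s * s.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def all_anchors(feat_h, feat_w, stride=16, **kwargs):
    """Place the same k anchors at every sliding-window position of the feature map."""
    out = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Map the feature-map cell back to image coordinates (assumed: cell center).
            cx = fx * stride + stride / 2
            cy = fy * stride + stride / 2
            out.extend(anchors_at(cx, cy, **kwargs))
    return out
```

A 2×3 feature map therefore yields 2 × 3 × 9 = 54 anchors. The RPN does not generate these boxes; it only scores each anchor as object or background and regresses offsets relative to it.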

What struck me most is the anchor mechanism. It looks like the reverse of Spatial Pyramid Pooling (SPP). SPP maps inputs of different sizes to a single fixed-size output. Anchors go the other way: starting from a single-scale feature map and a single-size sliding window, they produce candidate regions with multiple scales and aspect ratios. All of these candidates (a "pyramid of anchors") are then fed into the classification and regression tasks.
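
To make the SPP side of this comparison concrete, here is a minimal spatial pyramid max-pooling sketch on a single-channel map. The pyramid levels (1, 2, 4) are an arbitrary choice for illustration, and `spp` is a hypothetical name. Whatever the input height and width, the output always has 1 + 4 + 16 = 21 values, which is the "many sizes in, one fixed size out" behavior that anchors invert.

```python
import math

def spp(feature, levels=(1, 2, 4)):
    """Spatial pyramid max pooling of a single-channel H x W map (list of rows).

    For each level n, the map is split into an n x n grid and each cell is
    max-pooled, so the output length is sum(n * n) regardless of H and W.
    """
    H, W = len(feature), len(feature[0])
    out = []
    for n in levels:
        for i in range(n):
            # floor/ceil boundaries guarantee every cell is non-empty.
            ys, ye = math.floor(i * H / n), math.ceil((i + 1) * H / n)
            for j in range(n):
                xs, xe = math.floor(j * W / n), math.ceil((j + 1) * W / n)
                out.append(max(feature[y][x]
                               for y in range(ys, ye) for x in range(xs, xe)))
    return out
```

For example, a 5×7 map and a 13×9 map both produce 21 values, so the layers after SPP can have a fixed input size.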

Mask R-CNN

Mask R-CNN builds on the Faster R-CNN architecture to detect objects and, at the same time, generate a segmentation mask for each instance. Its key innovations are adding a mask prediction branch in parallel with the classification and bounding-box regression branches, and replacing RoIPool with RoIAlign.

For the first point, Mask R-CNN performs classification and mask prediction in parallel, unlike methods in which segmentation precedes recognition. The mask branch predicts a binary mask for every class, and the class predicted by the classification branch selects which mask is output. In this sense, Mask R-CNN decouples mask prediction from class label prediction.

For the second point, the authors found that the quantization in RoIPool misaligns the extracted features with the input; this has only a small negative effect on classification but a large one on pixel-level mask prediction. Hence, they proposed the RoIAlign layer, which avoids quantization and uses bilinear interpolation so that the extracted features stay aligned with the input.

Finally, Mask R-CNN shows good generality, flexibility and accuracy across multiple tasks, including instance segmentation, bounding-box object detection and person keypoint detection.
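
RoIAlign can be sketched on a single-channel map as below, assuming feature values sit at integer coordinates (real implementations differ on half-pixel offsets). Unlike RoIPool, the RoI and its bins are kept as real numbers: each bin samples a 2×2 grid of regularly spaced points with bilinear interpolation and averages them, following the paper's description of four sampling points per bin. The helper names `bilinear` and `roi_align` and the 2×2 output size are hypothetical choices for readability.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a single-channel map (list of rows) at real-valued (y, x)."""
    H, W = len(feature), len(feature[0])
    # Clamp into the valid range so the four neighbours exist.
    y = min(max(y, 0.0), H - 1)
    x = min(max(x, 0.0), W - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    return (feature[y0][x0] * (1 - dy) * (1 - dx)
            + feature[y0][x1] * (1 - dy) * dx
            + feature[y1][x0] * dy * (1 - dx)
            + feature[y1][x1] * dy * dx)

def roi_align(feature, roi, out_h=2, out_w=2, samples=2):
    """RoIAlign for one RoI: neither the box nor the bin edges are rounded.

    roi: (x1, y1, x2, y2) in continuous feature-map coordinates.
    Each output bin averages samples x samples interpolated points.
    """
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_w
    bin_h = (y2 - y1) / out_h
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out
```

On a feature map that varies linearly, such as `f[y][x] = 4*y + x`, bilinear interpolation is exact, so a 1×1 RoIAlign over the RoI (0.3, 0.3, 2.3, 2.3) returns the value at its center, 6.5 (up to floating-point rounding), whereas RoIPool would first round this RoI to integer cells.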
