Deep Learning for Instance Segmentation: Mask R-CNN

leo_whz

Categories: object_detection, segmentation. Tags: deep learning

Article link: https://blog.csdn.net/whz1861/article/details/78783597

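This article examines Mask R-CNN for instance segmentation. Built on Faster R-CNN, it adds a branch that predicts an instance mask for each RoI. The article covers the RoIAlign operation, the network architecture, and experimental results, which show strong performance on tasks such as human pose estimation and Cityscapes.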

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

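Key Ideas

Built on the Faster R-CNN framework: an instance segmentation task is added in parallel with the final classification and regression layers [a small FCN applied to each RoI]
RoI Pooling in Faster R-CNN is replaced with RoIAlign
Features are extracted with an FPN (Feature Pyramid Network)
ResNet-101 serves as the backbone network
RPN anchors use 5 scales and 3 aspect ratios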
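
To make the "5 scales and 3 aspect ratios" anchor setting concrete, here is a minimal sketch in plain Python. The notes give only the counts, so the specific values in `SCALES` and `RATIOS` are assumptions chosen for illustration, and the helper names `anchor_shapes` and `anchors_at` are hypothetical.

```python
import math

def anchor_shapes(scales, ratios):
    """Return (width, height) for every scale / aspect-ratio pair.

    Each anchor keeps an area of scale**2; ratio is height / width.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            shapes.append((w, h))
    return shapes

def anchors_at(cx, cy, shapes):
    """Center every shape at (cx, cy) and return (x1, y1, x2, y2) boxes."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in shapes]

# Assumed values: the notes state only "5 scales" and "3 aspect ratios".
SCALES = (32, 64, 128, 256, 512)
RATIOS = (0.5, 1.0, 2.0)

shapes = anchor_shapes(SCALES, RATIOS)
boxes = anchors_at(100.0, 100.0, shapes)
print(len(boxes))  # one box per (scale, ratio) pair
```

With 5 scales and 3 ratios this yields 15 anchor shapes in total. How those shapes are distributed over feature-map locations and pyramid levels depends on the implementation and is not covered by this sketch.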

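Notes:

Instance segmentation
- Mask R-CNN extends the classification and regression tasks at the end of Faster R-CNN with a segmentation task for each RoI. This task is a small FCN.
RoIAlign
- RoIPool quantizes coordinates coarsely, which causes a large misalignment between the feature map and the original image [this is the main problem with Fast/Faster R-CNN]. RoIAlign was proposed to fix this and preserves spatial precision [preserves exact spatial locations].
- The change is small but its effect is large: it improves mask accuracy by a relative 10% to 50%.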
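
The idea behind RoIAlign can be sketched in plain Python: the RoI coordinates are kept as floats (no rounding), each output bin is sampled at a few regularly spaced points, and each sample is read from the feature map by bilinear interpolation. This is an illustrative sketch, not the reference implementation. The function names, the choice of averaging the samples, and the convention that integer coordinates sit on pixel centers are all assumptions.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D list `feat` at a continuous (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feat, roi, out_size=2, samples=2):
    """Pool `roi` = (y1, x1, y2, x2), given as unrounded floats in
    feature-map coordinates, into an out_size x out_size grid."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # samples x samples regularly spaced points inside the bin
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feat, yy, xx)
            row.append(total / (samples * samples))  # average the samples
        out.append(row)
    return out

# A 4x4 feature map whose value equals its column index.
feat = [[float(x) for x in range(4)] for _ in range(4)]
pooled = roi_align(feat, (0.0, 0.5, 2.0, 2.5))
print(pooled)
```

Because the RoI boundary at x = 0.5 is never rounded, the pooled values follow the fractional offset of the box. A quantizing pool would first snap the box to integer cells and lose that offset.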
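Decoupling
- The segmentation and classification tasks are decoupled
  - The RoI classification branch predicts the class
  - A conventional FCN performs per-pixel multi-class classification [segmentation], which couples segmentation and classification; here the mask branch only has to separate foreground from background for each class.
    - The mask branch outputs a K-channel mask, one channel per class, with a per-pixel logistic (sigmoid) output; the mask is binarized at a threshold of 0.5 to produce the foreground/background segmentation mask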
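
The decoupling described above can be illustrated with a small sketch: the mask branch produces K channels of logits, the class comes from the classification branch, and only that class's channel is passed through a sigmoid and thresholded at 0.5. The function names and the nested-list layout of `mask_logits` are hypothetical and used only for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def select_mask(mask_logits, class_id, threshold=0.5):
    """Pick the channel for `class_id` from K channels of m x m logits
    and binarize it into a foreground (1) / background (0) mask."""
    channel = mask_logits[class_id]
    return [[1 if sigmoid(v) >= threshold else 0 for v in row]
            for row in channel]

# Two classes (K = 2), each with a 2 x 2 mask of logits.
mask_logits = [
    [[-2.0, 3.0], [0.1, -0.1]],   # class 0
    [[5.0, 5.0], [-5.0, -5.0]],   # class 1
]
predicted_class = 1  # would come from the RoI classification branch
print(select_mask(mask_logits, predicted_class))
```

Since each channel is an independent binary mask, the classes do not compete per pixel as they would under a softmax over K classes.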
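Flexibility
- With very small changes, the framework can perform human pose estimation
- Each keypoint of the human body is treated as a separate class for training and detection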
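
One way to picture "each keypoint as a class" is to give every keypoint type its own m x m mask inside the RoI, with a single foreground cell marking the keypoint. The sketch below encodes a keypoint into such a one-hot target and decodes a predicted heatmap back to image coordinates. The helper names, the small grid size `m = 4`, and the choice to decode to the cell center are assumptions for illustration, and the keypoint is assumed to lie inside the RoI.

```python
def keypoint_to_one_hot(x, y, roi, m):
    """Encode an image-space keypoint as an m x m one-hot target
    inside roi = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = roi
    col = min(int((x - x1) / (x2 - x1) * m), m - 1)
    row = min(int((y - y1) / (y2 - y1) * m), m - 1)
    target = [[0] * m for _ in range(m)]
    target[row][col] = 1
    return target

def decode_keypoint(heatmap, roi):
    """Return the image-space center of the highest-scoring cell."""
    m = len(heatmap)
    _, row, col = max((heatmap[r][c], r, c)
                      for r in range(m) for c in range(m))
    x1, y1, x2, y2 = roi
    x = x1 + (col + 0.5) / m * (x2 - x1)
    y = y1 + (row + 0.5) / m * (y2 - y1)
    return x, y

roi = (10.0, 20.0, 50.0, 60.0)
target = keypoint_to_one_hot(25.0, 55.0, roi, m=4)
print(target)
print(decode_keypoint(target, roi))
```

For a person with K keypoint types, the mask branch would output K such maps per RoI, one per keypoint type.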
Speed
- Because the algorithm adds only a very small task on top of Faster R-CNN, the extra computation is small, and it runs at about 5 fps

Mask R-CNN

Faster R-CNN has two outputs for each candidate object, a class label and a bounding-box offset; to this we add a third branch that outputs the object mask.
Mask R-CNN is thus a natural and intuitive idea. But the additional mask output is distinct from the class and box outputs, requiring extraction of much finer spatial layout of an object.