TensorFlow Object Detection API Source Code Reading Notes: Mask R-CNN


Wayne2019


Article link: https://blog.csdn.net/wayne2019/article/details/78780944



In this post we trace Mask R-CNN through the TensorFlow Object Detection API source code. The conclusion first: the API implements ROI Align and a mask branch (with slight differences). At present, no pre-trained mask model is provided.

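There are many blog posts and Zhihu articles summarizing the detection series, for example:

Object Detection: the R-CNN Series

 A Technical Analysis of Mask R-CNN

 A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN

Here we focus mainly on the implementation details in the TensorFlow Object Detection API code.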

ROI align vs ROI pooling
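Because of rounding, the features produced by ROI pooling do not correspond exactly to the ROI's coordinates on the original image; see "ROI Pooling Layer Explained (Caffe)". The original ROI pooling is a special case of SPP: the ROI is first quantized to the feature-map grid and then divided into bins that are max-pooled, and, as in SPP, the bins may differ slightly in size. ROI Align performs no quantization: it divides the ROI evenly into bins and uses bilinear interpolation to compute the value at each sampling point inside the ROI.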
FAQs: how to sample grid points within a cell?
• 4 regular points in 2x2 sub-cells
• other implementation could work
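
To make the difference concrete, here is a minimal pure-Python sketch of both operations on a single-channel feature map. It is an illustration under stated assumptions, not the API's or Caffe's actual code: the function names `bilinear`, `roi_pool` and `roi_align` are hypothetical, the ROI is assumed to be given already in feature-map coordinates as (y1, x1, y2, x2), and the sampled points in each ROI Align bin are averaged (max would also be a valid choice).

```python
import math


def bilinear(fmap, y, x):
    """Interpolate a 2-D map (list of rows) at a real-valued point (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp so all four neighbours exist
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_pool(fmap, roi, out_size):
    """Quantized ROI pooling (sketch): snap to the grid, then max-pool each bin."""
    h, w = len(fmap), len(fmap[0])
    # Quantization 1: round the ROI corners to integer grid cells.
    y1, x1, y2, x2 = (int(math.floor(v + 0.5)) for v in roi)
    bin_h = max(y2 - y1 + 1, 1) / out_size
    bin_w = max(x2 - x1 + 1, 1) / out_size
    out = []
    for i in range(out_size):
        # Quantization 2: bin edges are floored/ceiled, so bin sizes can differ.
        ys = min(max(y1 + int(math.floor(i * bin_h)), 0), h)
        ye = min(max(y1 + int(math.ceil((i + 1) * bin_h)), 0), h)
        row = []
        for j in range(out_size):
            xs = min(max(x1 + int(math.floor(j * bin_w)), 0), w)
            xe = min(max(x1 + int(math.ceil((j + 1) * bin_w)), 0), w)
            cells = [fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            row.append(max(cells) if cells else 0)
        out.append(row)
    return out


def roi_align(fmap, roi, out_size, samples=2):
    """ROI Align (sketch): no rounding, equal bins, bilinear sampling."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # samples x samples regular points, one at the centre of each sub-cell
            vals = [
                bilinear(fmap,
                         y1 + (i + (si + 0.5) / samples) * bin_h,
                         x1 + (j + (sj + 0.5) / samples) * bin_w)
                for si in range(samples) for sj in range(samples)
            ]
            row.append(sum(vals) / len(vals))  # assumption: average the samples
        out.append(row)
    return out


# Demo on a linear ramp f(y, x) = 10*y + x, with an ROI and a slightly shifted copy.
fmap = [[10 * y + x for x in range(8)] for y in range(8)]
for roi in [(1.2, 2.6, 5.2, 6.6), (1.4, 2.8, 5.4, 6.8)]:
    print(roi, roi_pool(fmap, roi, 2), roi_align(fmap, roi, 2))
```

In the demo, the second ROI is the first one shifted by 0.2 in both directions. Both round to the same grid cells, so `roi_pool` returns identical outputs for them, while every `roi_align` value moves by 2.2 (10 × 0.2 + 0.2), following the shift exactly.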

The TensorFlow Object Detection API implements ROI pooling differently: "Additionally, instead of using the ROI Pooling layer and Position-sensitive ROI Pooling layers used by [31, 6], we use Tensorflow’s “crop and resize” operation which uses bilinear interpolation to resample part of an image onto a fixed sized grid." The code is:

"""
An ROI is a crop of features_to_crop (the feature map produced by the first-stage feature extractor), cut out according to the RPN's proposal_boxes.
"""
def _compute_second_stage_input_feature_maps(self, features_to_crop,