Reading Notes and Thoughts - Learning to Segment Object Candidates

JacquesSeven

Article link: https://blog.csdn.net/sinat_34539272/article/details/51061328

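This post covers DeepMask from Facebook AI Research, a region proposal method that does not rely on low-level features and uses a neural network to generate segmentation masks rather than bounding boxes. DeepMask is built on a VGG network and uses two parallel branches to learn a class-agnostic segmentation mask and the probability that an object is present. The experimental results are strong, but focusing only on segmentation may limit what it contributes to the detection task.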


Authors: Pedro O. Pinheiro, Ronan Collobert, Piotr Dollár
Institution: Facebook AI Research
Training/test datasets: MS COCO (training and testing), PASCAL VOC (testing)
GPU and timing: Tesla K40m; 5 days for training; VGG features in under 1 s; 1.6 s per image on MS COCO, 1.2 s on VOC

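The abstract says that current detection systems work in two steps: generate region proposals, then hand those proposals to a classifier. The proposed method does not rely on edges, superpixels, or any other low-level features.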

Introduction: nothing much to say.

Related Work: R-CNN (Selective Search proposals plus a convnet classifier) beat a whole set of earlier detection methods built on hand-designed features.
On region proposals:

For a more complete survey of object proposal methods, we recommend the recent survey from Hosang et al.
- Hosang, Jan, et al. “What makes for effective detection proposals?” (2015).

The authors repeat that earlier region proposal methods all rely on low-level segmentations.
The work closest to this paper is MultiBox:

Erhan, Dumitru, et al. “Scalable object detection using deep neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Szegedy, Christian, et al. “Scalable, high-quality object detection.” arXiv preprint arXiv:1412.1441 (2014).

There are also DeepBox and Faster R-CNN (region proposal networks). The difference is:

… our method generates segmentation masks instead of bounding boxes.

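DeepMask Proposal: the model has two branches, and both are object-agnostic. They share one network (VGG) and are learned jointly (the formula is given in the paper). The first branch outputs an image at the original input size, which is the segmentation mask; the second outputs a single value indicating whether the patch contains an object (denoted y). The requirement for y=1 is fairly strict: the object must be at the center (roughly centered is enough, but it cannot be offset by much), and the patch must fully contain the object.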
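To make the y=1 rule concrete, here is a minimal sketch of how such a label could be assigned from bounding boxes. The function name `is_positive`, the box format, and the tolerance `center_tol=16` are my own assumptions for illustration; the paper's exact criteria may differ.

```python
def is_positive(obj_box, patch_box, center_tol=16):
    """Return True if the patch would get y=1 under the two rules above.

    Boxes are (x_min, y_min, x_max, y_max) in pixels.
    center_tol is a hypothetical tolerance, not a value taken from the notes.
    """
    ox0, oy0, ox1, oy1 = obj_box
    px0, py0, px1, py1 = patch_box

    # Rule 1: the object's center is close to the patch center.
    dx = abs((ox0 + ox1) / 2 - (px0 + px1) / 2)
    dy = abs((oy0 + oy1) / 2 - (py0 + py1) / 2)
    centered = dx <= center_tol and dy <= center_tol

    # Rule 2: the object lies entirely inside the patch.
    contained = ox0 >= px0 and oy0 >= py0 and ox1 <= px1 and oy1 <= py1

    return centered and contained


patch = (0, 0, 224, 224)
centered_object = (62, 62, 162, 162)     # centered and fully inside
shifted_object = (100, 100, 200, 200)    # inside, but too far off center
oversized_object = (-10, 50, 230, 170)   # roughly centered, but spills out
```

With these inputs only `centered_object` counts as a positive; the other two fail one rule each.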

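Network Architecture: the output of VGG-A (the feature map) is 14x14x512 (according to Figure 1), so it still keeps some spatial information. In the segmentation branch, a 1x1 convolution layer comes first, followed by a classification layer. This classification layer consists of h x w pixel classifiers, that is, one classifier for every output pixel. So that each classifier can use information from the entire feature map (global information), every classifier is fully connected to the previous layer. In practice, though, the authors turn the classification layer into two linear layers in series to reduce the number of parameters. But doesn't that lose spatial information? The paper later gives a parameter count of 75M, and in the end h and w are both reduced by 4x (in the figure); the mask is upsampled after segmentation. There is nothing much to say about the second branch. Nothing much to say about Joint Learning either (I did not understand it).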
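A rough count shows how much the two-layer trick saves. This is only back-of-the-envelope arithmetic under my own assumptions: a 14x14x512 input to the classification layer, a 56x56 output mask (224 reduced by 4x), and a 512-unit bottleneck between the two linear layers. The bottleneck width is a guess for illustration and should be checked against Figure 1; biases are ignored.

```python
# Assumed sizes (see the note above): feature map, output mask, bottleneck.
FEATURE_SIZE = 512 * 14 * 14   # flattened input to the classification layer
MASK_SIZE = 56 * 56            # one classifier per output pixel
BOTTLENECK = 512               # hypothetical width between the two linear layers


def full_layer_params(n_in, n_out):
    """Weights of a single fully connected layer (no bias)."""
    return n_in * n_out


def factored_layer_params(n_in, n_hidden, n_out):
    """Weights of two linear layers in series (no bias, no nonlinearity)."""
    return n_in * n_hidden + n_hidden * n_out


full = full_layer_params(FEATURE_SIZE, MASK_SIZE)
factored = factored_layer_params(FEATURE_SIZE, BOTTLENECK, MASK_SIZE)

print(f"single layer : {full:,} weights")
print(f"two layers   : {factored:,} weights")
print(f"reduction    : {full / factored:.1f}x")
```

Under these assumptions the single layer would need about 315M weights against about 53M for the factored version. Two linear layers with no nonlinearity between them are equivalent to one low-rank layer, so every output pixel still sees the whole feature map; what gets limited is the rank of the mapping, which is one way to think about the spatial-information worry above.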

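Thought 1: it is true that taking global information into account matters for classification, but the method in the paper should still have room for improvement. As I recall, there are approaches that stack feature maps together, some of them global and the rest local.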

Full Scene Inference: it is still done at multiple locations and scales.
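As a sketch of what "multiple locations and scales" means, the helper below enumerates patch positions over rescaled copies of an image. The name `window_grid`, the 224-pixel patch, the 16-pixel stride, and the scale range are assumptions chosen for illustration; a real implementation would more likely run the network convolutionally over the whole image than crop each window.

```python
def window_grid(height, width, scales, patch=224, stride=16):
    """Yield (scale, top, left) for each patch position on each rescaled image.

    patch and stride are illustrative defaults, not values from the notes.
    """
    for s in scales:
        h, w = round(height * s), round(width * s)
        if h < patch or w < patch:
            continue  # the rescaled image cannot hold a single patch
        for top in range(0, h - patch + 1, stride):
            for left in range(0, w - patch + 1, stride):
                yield s, top, left


# Hypothetical scale pyramid: 2^-2 up to 2^1 in steps of 2^0.5.
scales = [2 ** (k / 2) for k in range(-4, 3)]
windows = list(window_grid(480, 640, scales))
print(len(windows), "candidate windows")
```

Each window would get one mask and one score from the two branches, and the highest-scoring masks become the proposals.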

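Experimental results: the results are very good. At the end they also retrain Fast R-CNN on DeepMask proposals and on SS (Selective Search) proposals and compare the two, which is a nice touch. DeepMask also uses only 500 proposals, whereas SS uses roughly 1k to 2k.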

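Thought 2: the highlight of the paper is probably still segmentation. For detection, the paper does not propose any other idea (apart from using the bounding box), so further ideas are still needed to get there.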
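For reference, turning a mask proposal into a box for a detector can be as simple as taking the tightest box around the mask. This is a minimal NumPy sketch of that idea, not the paper's code; `mask_to_bbox` is a name I made up.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing the nonzero pixels.

    Coordinates are inclusive pixel indices; returns None for an empty mask.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy binary mask: a filled rectangle covering rows 2-4 and columns 3-6.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True
print(mask_to_bbox(mask))
```

The box throws away the shape information the mask carries, which is why it feels like the detection side could use more ideas.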

References: the reference list has a lot of good papers, and I have not finished reading them yet.