Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun
Abstract
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with “attention” mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
Study blogs: 1. Faster R-CNN paper translation (Chinese and English side by side)
3. Classic Networks Explained series (1): RegionProposal+CNN (rcnn)
RCNN
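RCNN (Regions with CNN) is the representative method that applied deep learning to multi-object detection after DPM (Deformable Parts Model). Its idea breaks down into the following steps (see Figure 3.16):
Step 1: The original image is first split into many small regions by image segmentation. These regions are then examined: two adjacent regions are merged if their color histograms are similar, their texture histograms are similar, the merged region stays small, or the merged region fills a large share of its bounding box. The procedure is run in several color spaces (RGB, HSV, and others) to reduce the chance of missing a candidate. The resulting image patches are the candidate regions, about 2,000 per image.
Step 2: Each extracted candidate region is preprocessed by scaling it to a uniform size of 227 × 227, then fed into a convolutional neural network for feature extraction, which yields a 4096-dimensional feature vector.
Step 3: Each candidate box is classified. In a trained model, every class has its own feature-vector representation. The 4096-dimensional feature extracted from the candidate box is passed through linear binary SVM classifiers, one per class. The decision process is shown in Figure 3.16: is the feature an airplane? No. Is it a monitor? No. Is it a person? Yes. The decision is based on the distance between the extracted feature vector and the feature vector of the class in question.
Step 4: Once classification is done, the target object is boxed in the input image. The class of a detection is sometimes correct while its localization is not very accurate, because the candidate box and the real object do not overlap closely. The candidate boxes therefore need to be corrected: each candidate box is scored by how much of the real object it contains, the Canny operator is then applied for edge detection, and the candidate box with the highest score is kept.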
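The steps above can be made concrete with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the RCNN implementation. Histogram intersection stands in for the "similar histograms" test of step 1. Each per-class linear SVM of step 3 is written as a dot product plus a bias. Intersection over union (IoU) is assumed as the overlap measure behind step 4. The function names, the `(x1, y1, x2, y2)` box convention, and the 3-dimensional toy feature (in place of the 4096-dimensional one) are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Step 1 (assumed measure): similarity of two L1-normalised histograms.

    Returns a value in [0, 1]; 1.0 means the histograms are identical.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))


def svm_scores(
    feature: Sequence[float],
    weights: Dict[str, Sequence[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Step 3: one linear binary SVM per class, score = w . f + b."""
    return {
        cls: sum(w * x for w, x in zip(weights[cls], feature)) + biases[cls]
        for cls in weights
    }


def iou(a: Box, b: Box) -> float:
    """Step 4 (assumed measure): intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Toy data: a 3-dimensional feature instead of the 4096-dimensional one.
feature = [1.0, 0.0, 2.0]
weights = {
    "airplane": [-1.0, 0.5, -0.5],
    "monitor": [0.0, 1.0, -1.0],
    "person": [0.5, 0.0, 1.0],
}
biases = {"airplane": 0.0, "monitor": 0.0, "person": -1.0}

scores = svm_scores(feature, weights, biases)
best_class = max(scores, key=scores.get)  # "person" for this toy data
print(best_class, scores[best_class])
print(histogram_intersection([0.5, 0.5], [0.25, 0.75]))
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

With these toy weights only the "person" classifier returns a positive score, mirroring the airplane / monitor / person sequence described in step 3. A real system would learn the weights from training data and would typically threshold the IoU rather than read it directly.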