Abstract:
Previous region-based detectors (Fast/Faster R-CNN) repeatedly apply a per-region subnetwork to hundreds (or thousands) of proposals. R-FCN instead shares almost all computation across the entire image, avoiding that redundancy. To make this work, R-FCN proposes position-sensitive score maps to address a dilemma between translation invariance in image classification and translation variance in object detection: roughly, classification wants features that are invariant to where an object is, while detection needs features that change with an object's position and shape. R-FCN adopts a fully convolutional classification network (ResNet) as its backbone. Performance: "We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20× faster than the Faster R-CNN counterpart."
Introduction:
Prevalent deep-learning object detectors are split by the RoI pooling layer into two subnetworks: first, a shared fully convolutional subnetwork that is independent of RoIs; second, a per-RoI subnetwork whose computation is not shared. This decomposition was historically motivated by classification architectures (AlexNet and VGG Nets). (Details omitted.)
Method:
R-FCN uses an RPN to generate candidate RoIs; the RPN and R-FCN share their feature computation. The last convolutional layer produces a bank of k² position-sensitive score maps for each category, and thus has a k²(C+1)-channel output layer with C object categories (+1 for background). The bank of k² score maps corresponds to a k×k spatial grid describing relative positions. For example, with k×k = 3×3, the 9 score maps encode the cases of {top-left, top-center, top-right, …, bottom-right} of an object category.
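To make the k²(C+1) channel layout concrete, here is a minimal sketch of the channel bookkeeping. The `channel_index` helper and the position-major ordering are assumptions for illustration (the paper does not fix a particular channel order):

```python
def channel_index(i, j, c, k, num_classes):
    """Hypothetical helper: index of the output channel holding the
    score map for relative position (i, j) and class c, assuming a
    position-major layout: all (C+1) class maps for one grid cell
    are stored contiguously."""
    C1 = num_classes + 1  # +1 for background
    return (i * k + j) * C1 + c

# Example: PASCAL VOC has C = 20 classes; with k = 3 the last conv
# layer outputs k^2 * (C + 1) = 9 * 21 = 189 score-map channels.
k, C = 3, 20
total_channels = k * k * (C + 1)
```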
R-FCN ends with a position-sensitive RoI pooling layer. This layer aggregates the outputs of the last convolutional layer and generates scores for each RoI.
Position-sensitive score maps & Position-sensitive RoI pooling. To explicitly encode position information into each RoI, we divide each RoI rectangle into k×k bins by a regular grid. For an RoI rectangle of size w×h, a bin is of size ≈ w/k × h/k. In our method, the last convolutional layer is constructed to produce k² score maps for each category. Inside the (i, j)-th bin (0 ≤ i, j ≤ k−1), we define a position-sensitive RoI pooling operation that pools only over the (i, j)-th score map. Concretely, the pooling is average pooling: for category c, the response of the (i, j)-th bin is the average of the (i, j)-th score map for class c over that bin's area.
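The position-sensitive pooling described above can be sketched in NumPy. This is a minimal, single-RoI sketch under simplifying assumptions: integer RoI coordinates on the feature map, the position-major channel layout `(i*k + j)*(C+1) + c`, and average pooling followed by the paper's average voting over the k² bin scores:

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling, a minimal NumPy sketch.

    score_maps: (k*k*(C+1), H, W) output of the last conv layer
    roi: (x0, y0, w, h) in feature-map coordinates (assumed integer)
    Returns: (C+1,) per-class scores after averaging the k*k bins.
    """
    x0, y0, w, h = roi
    C1 = num_classes + 1
    scores = np.zeros((C1, k, k))
    for i in range(k):
        for j in range(k):
            # Bin (i, j) covers a region of size roughly w/k x h/k.
            x_lo = x0 + int(np.floor(j * w / k))
            x_hi = x0 + int(np.ceil((j + 1) * w / k))
            y_lo = y0 + int(np.floor(i * h / k))
            y_hi = y0 + int(np.ceil((i + 1) * h / k))
            for c in range(C1):
                # Pool bin (i, j) ONLY from its dedicated score map.
                m = score_maps[(i * k + j) * C1 + c]
                scores[c, i, j] = m[y_lo:y_hi, x_lo:x_hi].mean()
    # Vote: average the k*k position-sensitive bin scores per class.
    return scores.mean(axis=(1, 2))
```

The key difference from ordinary RoI pooling is the indexing: each bin reads from its own dedicated score map rather than from a shared feature map, which is what forces the maps to specialize to relative positions (top-left of an object, bottom-right, etc.).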