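Premise
ResNet works well on classification problems, but it cannot be applied directly to detection. The authors attribute this to the conflict between the translation invariance wanted in classification and the translation variance needed in detection.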
We propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.
Network structure
In short, R-FCN is an RPN plus a classification network. The classification network has the following structure:
ResNet + position-sensitive score maps + position-sensitive RoI pooling
The $k^2$ position-sensitive scores then vote on the ROI. In this paper we simply vote by averaging the scores, producing a $(C+1)$-dimensional vector for each ROI: $r_c(\theta)=\sum_{i,j} r_c(i,j\mid\theta)$. Then we compute the softmax responses across categories: $s_c(\theta)=e^{r_c(\theta)}/\sum_{i=0}^{C} e^{r_i(\theta)}$. They are used for evaluating the cross-entropy loss during training and for ranking the ROIs during inference.
Advantages
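The voting and softmax step quoted above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `vote_and_softmax`, the nested-list layout `bin_scores[i][j][c]`, and the `average` flag are hypothetical and not taken from the paper's code. The quoted text says "averaging" while the formula is written as a sum over bins, so the sketch exposes both.

```python
import math

def vote_and_softmax(bin_scores, average=True):
    """Aggregate k x k position-sensitive bin scores and apply softmax.

    bin_scores[i][j][c] plays the role of r_c(i, j | theta): the pooled
    score of category c in bin (i, j). This layout is an assumption.
    """
    k = len(bin_scores)
    num_categories = len(bin_scores[0][0])  # C + 1, including background

    # Vote: add up the k*k bin scores of each category.
    votes = [0.0] * num_categories
    for row in bin_scores:
        for bin_vec in row:
            for c, score in enumerate(bin_vec):
                votes[c] += score
    if average:
        # "Vote by averaging": divide the sum by the number of bins.
        votes = [v / (k * k) for v in votes]

    # Softmax across categories, shifted by the max for numerical stability.
    m = max(votes)
    exps = [math.exp(v - m) for v in votes]
    total = sum(exps)
    probs = [e / total for e in exps]
    return votes, probs

# Toy ROI with k = 2 and C + 1 = 3 categories (index 0 is background).
toy_bins = [[[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]],
            [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]]
toy_votes, toy_probs = vote_and_softmax(toy_bins)
```

In the toy ROI every bin agrees on category 1, so that category receives the highest softmax response. With real score maps a single bin that disagrees only shifts the vote by a fraction of its score.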
All learnable weight layers are convolutional and are computed on the entire image; the per-RoI computational cost is negligible.
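Accepts input images of arbitrary size.
Removes the fully connected layers. This is excellent: I have always felt that SPPNet and RoI pooling still introduce error, because they compress the features. The position-sensitive score maps quietly do something similar to CRAFT, pooling separately for each class, which improves accuracy.
The 3x3 voting mechanism adds robustness. Because each vote is a binary decision about a single object class (yes or no) rather than a classification over all classes, 3x3 is good enough.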
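The per-class, per-bin pooling mentioned in the points above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ps_roi_pool`, the channel layout `ch = (i * k + j) * num_categories + c`, the use of `i` for the bin row, and the floor/ceil bin boundaries are all assumptions of this sketch. The key idea it shows is that bin (i, j) reads only from its own group of score maps, and each category is average-pooled separately.

```python
import math

def ps_roi_pool(score_maps, roi, k, num_categories):
    """Position-sensitive RoI pooling on nested lists.

    score_maps[ch][y][x] holds k*k*num_categories maps (assumed layout:
    ch = (i * k + j) * num_categories + c, with i the bin row and j the
    bin column). roi = (x0, y0, w, h) in integer score-map coordinates,
    with w >= 1 and h >= 1.
    Returns pooled[i][j][c], the average score of category c in bin (i, j).
    """
    x0, y0, w, h = roi
    pooled = []
    for i in range(k):
        y_start = y0 + math.floor(i * h / k)
        y_end = y0 + math.ceil((i + 1) * h / k)
        row = []
        for j in range(k):
            x_start = x0 + math.floor(j * w / k)
            x_end = x0 + math.ceil((j + 1) * w / k)
            n = (y_end - y_start) * (x_end - x_start)  # pixels in this bin
            bin_vec = []
            for c in range(num_categories):
                # Only the score map dedicated to bin (i, j) and category c is read.
                ch = (i * k + j) * num_categories + c
                total = sum(score_maps[ch][y][x]
                            for y in range(y_start, y_end)
                            for x in range(x_start, x_end))
                bin_vec.append(total / n)
            row.append(bin_vec)
        pooled.append(row)
    return pooled

# Toy input: k = 2, 2 categories, so 8 maps of size 4x4; map ch is filled with ch.
toy_maps = [[[float(ch)] * 4 for _ in range(4)] for ch in range(8)]
toy_pooled = ps_roi_pool(toy_maps, (0, 0, 4, 4), 2, 2)
```

Because each toy map is constant, every pooled value simply equals the index of the map it was read from, which makes the bin-to-channel routing easy to check by eye.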