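Key contribution: the paper proposes two new modules, deformable convolution and deformable RoI pooling. Both learn their sampling offsets from data, so features are sampled at more informative locations for the convolution and pooling operations.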
Deformable Convolution
As shown in the figure above, the overall pipeline is:
- A conv layer produces an offset field with the same spatial size as the input feature map.
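- The offset field has 2N channels, where N is the number of sampling points in the kernel (N = 9 for a 3×3 kernel); each pair of channels holds the x and y offset of one sampling point.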
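- Because the x/y offsets are generally fractional, the feature value at each offset location is computed by bilinear interpolation. Bilinear interpolation is differentiable with respect to the sampling position, so gradients propagate back to the offsets and the offsets can be learned.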
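The steps above can be sketched in NumPy for a single output location. This is a minimal illustration, not the paper's implementation: it assumes a single-channel feature map, a 3×3 kernel, zero values outside the map, and offsets stored as (dy, dx) pairs. The function names `bilinear_sample` and `deform_conv_at` are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2-D map at fractional (y, x).

    Positions outside the map contribute zero (an assumption of this sketch).
    """
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    # Only the four integer neighbours have non-zero interpolation weight.
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feat[yi, xi]
    return value

def deform_conv_at(feat, weight, offsets, py, px):
    """Deformable 3x3 convolution output at location (py, px).

    feat:    (H, W) feature map
    weight:  (3, 3) kernel weights
    offsets: (3, 3, 2) learned (dy, dx) for each of the N = 9 sampling points
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            # Regular grid position plus the learned fractional offset.
            y = py + (i - 1) + dy
            x = px + (j - 1) + dx
            out += weight[i, j] * bilinear_sample(feat, y, x)
    return out
```

With all offsets set to zero this reduces to an ordinary 3×3 convolution (cross-correlation) at that location.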
Deformable RoI Pooling
RoI Pooling
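As shown in the figure above, the overall pipeline is:
- Standard RoI pooling first produces the pooled feature maps.
- An fc layer then predicts normalized offsets from these pooled features.
- The normalized offsets are multiplied element-wise by the RoI's width and height, and by a scaling factor gamma, to obtain the final offsets.
- The final offsets are applied using the same bilinear interpolation, and the pooled feature maps are recomputed from the resampled features.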
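The offset scaling step can be illustrated with a short NumPy sketch. It assumes the RoI is given as (x0, y0, w, h), that offsets are stored as (dx, dy) per bin, and that gamma defaults to 0.1 (the value the paper reports setting empirically; these notes do not state it). The function names are hypothetical, and the sketch only computes shifted bin centers rather than the full pooling.

```python
import numpy as np

def scale_roi_offsets(norm_offsets, roi_w, roi_h, gamma=0.1):
    """Turn normalized per-bin offsets (k, k, 2) into pixel offsets.

    Each (dx, dy) is scaled by gamma and by the RoI width and height,
    which makes the learned offsets invariant to the RoI size.
    """
    return gamma * norm_offsets * np.array([roi_w, roi_h], dtype=float)

def shifted_bin_centers(roi, k, norm_offsets, gamma=0.1):
    """Centers (x, y) of the k x k bins after applying the scaled offsets."""
    x0, y0, w, h = roi
    offsets = scale_roi_offsets(norm_offsets, w, h, gamma)
    centers = np.empty((k, k, 2))
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            cx = x0 + (j + 0.5) * w / k
            cy = y0 + (i + 0.5) * h / k
            centers[i, j] = (cx + offsets[i, j, 0], cy + offsets[i, j, 1])
    return centers
```

In a full implementation every sampling position inside a bin would be shifted by that bin's offset and read with bilinear interpolation before averaging.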
Position-Sensitive (PS) RoI Pooling
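- Original pipeline (lower branch): a conv layer produces k*k*(C+1) score maps at the same spatial resolution as the input feature map, where the RoI is divided into k*k bins and C is the number of object classes (the extra 1 is the background class).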
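- Upper branch: a conv layer generates offset fields, and PS RoI pooling on these fields yields normalized x and y offsets for each RoI and each class.
- As in deformable RoI pooling, the normalized offsets are multiplied by the RoI's width and height and by the scaling factor to obtain the final offsets.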
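To make the k*k*(C+1) score-map layout concrete, here is a minimal NumPy sketch of the lower branch only (plain PS RoI pooling, without offsets). The channel ordering (bin row, bin column, class) is an assumption of this sketch and differs between implementations; the RoI is assumed to be given in integer feature-map pixels and to lie inside the map. The name `ps_roi_pool` is hypothetical here.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI average pooling.

    score_maps: (k*k*(C+1), H, W) array
    roi:        (x0, y0, w, h) in integer feature-map pixels
    Returns an array of shape (C+1, k, k).
    """
    c1 = num_classes + 1
    height, width = score_maps.shape[1:]
    # Assumed layout: channel index = (bin_row * k + bin_col) * (C+1) + class.
    maps = score_maps.reshape(k, k, c1, height, width)
    x0, y0, w, h = roi
    out = np.zeros((c1, k, k))
    for i in range(k):
        ys = y0 + (i * h) // k
        ye = max(ys + 1, y0 + ((i + 1) * h) // k)   # at least one pixel
        for j in range(k):
            xs = x0 + (j * w) // k
            xe = max(xs + 1, x0 + ((j + 1) * w) // k)
            # Bin (i, j) reads only from its own group of C+1 score maps.
            out[:, i, j] = maps[i, j, :, ys:ye, xs:xe].mean(axis=(1, 2))
    return out
```

The deformable variant would shift each bin's pooling region by its final offset and read the shifted positions with bilinear interpolation.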
Related work
- Spatial Transformer Networks (STN): M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In NIPS, 2015.
- Active Convolution: Y. Jeon and J. Kim. Active convolution: Learning the shape of convolution for image classification. In CVPR, 2017.
- Effective Receptive Field: W. Luo, Y. Li, R. Urtasun, and R. Zemel. Understanding the effective receptive field in deep convolutional neural networks. arXiv preprint arXiv:1701.04128, 2017.
- Atrous convolution: M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. Wavelets: Time-Frequency Methods and Phase Space, pages 289–297, 1989.
- Deformable Part Models (DPM): P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
- DeepID-Net: W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, and X. Tang. Deepid-net: Deformable deep convolutional neural networks for object detection. In CVPR, 2015.
- Spatial manipulation in RoI pooling: S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
- Transformation invariant features and their learning
- Dynamic Filter: B. D. Brabandere, X. Jia, T. Tuytelaars, and L. V. Gool. Dynamic filter networks. In NIPS, 2016.
- Combination of low level filters: J. J. Koenderink and A. J. van Doorn. Representation of local geometry in the visual system. Biological Cybernetics, 55(6):367–375, Mar. 1987.
Deformable Convolution/RoI Pooling Backpropagation
Deformable Convolution