Deformable Convolutional Networks

Latest recommended article published on 2024-02-21 12:56:31

GondorFu

Tags: artificial intelligence, deep learning

Article link: https://blog.csdn.net/a40850273/article/details/125559017

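Contribution: the paper proposes two new modules, deformable convolution and deformable RoI pooling. Both learn sampling offsets from the data, so that features are sampled at more informative locations than the fixed regular grid.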

Deformable Convolution

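As the figure above shows, the overall flow is:

A conv layer produces an offset field with the same spatial size as the input feature map.
The offset field has 2N channels, where N is the number of sampling points in the kernel (N = 9 for a 3x3 kernel); each pair of channels holds the x and y offset of one sampling point.
Because the x and y offsets are usually fractional, the feature value at each offset position is obtained by bilinear interpolation. Bilinear interpolation is differentiable with respect to the sampling position, so gradients can be back-propagated to the offsets, and the offsets are learned.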
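
The flow above can be sketched in plain Python for a single output location. This is a minimal illustration, not the paper's implementation: the function names `bilinear_sample` and `deform_conv_at`, the zero padding outside the map, and the (dy, dx) ordering of the offsets are all assumptions made for the example.

```python
import math

def bilinear_sample(feat, y, x):
    """Sample a 2D feature map (list of rows) at fractional (y, x).

    Positions outside the map contribute zero (assumed padding rule).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight: (1 - distance) along each axis
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value

def deform_conv_at(feat, weights, offsets, cy, cx):
    """Deformable 3x3 convolution output at one location (cy, cx).

    weights: 3x3 list of kernel weights
    offsets: 9 (dy, dx) pairs, one per kernel sampling point, i.e. the
             2N = 18 offset channels read at this location (assumed order)
    """
    out = 0.0
    n = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[n]
            sample = bilinear_sample(feat, cy + ky + dy, cx + kx + dx)
            out += weights[ky + 1][kx + 1] * sample
            n += 1
    return out

# Toy 4x4 feature map with values 0..15
feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # picks the centre sample only
no_shift = [(0.0, 0.0)] * 9
half_shift = [(0.5, 0.5)] * 9
print(deform_conv_at(feat, identity, no_shift, 1, 1))    # 5.0
print(deform_conv_at(feat, identity, half_shift, 1, 1))  # 7.5
```

With zero offsets the result matches an ordinary convolution; shifting every sampling point by half a pixel moves the centre sample between four neighbouring values, which bilinear interpolation averages.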

Deformable RoI Pooling

RoI Pooling

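As the figure above shows, the overall flow is:

Pooled feature maps are first computed in the conventional way.
An fc layer then maps them to normalized offsets.
The normalized offsets are multiplied by the width and height of the detection box and by a scaling coefficient gamma, which gives the final offsets.
Using these offsets with the same bilinear interpolation, the pooled feature maps are recomputed from the shifted bins.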
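
The offset scaling step can be illustrated with a short Python sketch. Everything named here is hypothetical: `scale_offsets` and `bin_centers` are illustrative helpers, the (x0, y0, w, h) RoI layout is an assumption, and `gamma=0.1` is only an example value. The sketch stops at the shifted bin centres; the actual pooling would then average bilinearly sampled features inside each shifted bin.

```python
def scale_offsets(norm_offsets, roi_w, roi_h, gamma=0.1):
    """Turn normalized per-bin offsets into feature-map offsets.

    norm_offsets: k x k grid of (dx_hat, dy_hat), as an fc layer might emit
    gamma: scaling coefficient (0.1 is an assumed example value)
    """
    return [[(gamma * dx * roi_w, gamma * dy * roi_h) for (dx, dy) in row]
            for row in norm_offsets]

def bin_centers(roi, k, offsets):
    """Centre of each of the k x k bins after shifting by its offset.

    roi: (x0, y0, w, h) in feature-map coordinates (assumed layout)
    """
    x0, y0, w, h = roi
    bin_w, bin_h = w / k, h / k
    centers = []
    for i in range(k):          # bin row
        row = []
        for j in range(k):      # bin column
            dx, dy = offsets[i][j]
            row.append((x0 + (j + 0.5) * bin_w + dx,
                        y0 + (i + 0.5) * bin_h + dy))
        centers.append(row)
    return centers

roi = (2.0, 4.0, 8.0, 6.0)
normalized = [[(0.5, -0.5)] * 2 for _ in range(2)]
offsets = scale_offsets(normalized, roi_w=roi[2], roi_h=roi[3])
print(bin_centers(roi, 2, offsets))
```

Multiplying by the RoI width and height makes the learned offsets independent of the RoI size: the same normalized value moves a bin proportionally further in a large box than in a small one.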

Position-Sensitive (PS) RoI Pooling

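Original flow (bottom branch): a fully convolutional layer produces k*k*(C+1) score maps at the same resolution as the input feature map, where k is the number of RoI bins per side (k*k bins in total) and C is the number of object classes (the extra 1 is the background class).
Top branch: a parallel conv layer produces offset fields, from which normalized x and y offsets are obtained for each class and each RoI.
As in deformable RoI pooling, these are multiplied by the RoI width and height and by the scaling coefficient gamma to give the final offsets.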
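
To make the k*k*(C+1) score-map layout concrete, here is a minimal Python sketch of position-sensitive pooling for one class, without the offsets. The helper names `score_map_channel` and `ps_roi_pool` are hypothetical, and the class-major channel ordering is an assumption; real implementations may order the channels differently.

```python
def score_map_channel(cls, i, j, k):
    """Channel index of the score map for class `cls` and bin (i, j).

    Assumes a class-major layout: all k*k bins of class 0 first, then
    all bins of class 1, and so on.
    """
    return cls * k * k + i * k + j

def ps_roi_pool(score_maps, roi, k, cls):
    """Position-sensitive average pooling for one class (no offsets).

    score_maps: list of k*k*(C+1) 2D maps
    roi: (x0, y0, w, h) in integer feature-map coordinates, with w and h
         divisible by k (a simplification for this sketch)
    """
    x0, y0, w, h = roi
    bin_w, bin_h = w // k, h // k
    pooled = []
    for i in range(k):
        row = []
        for j in range(k):
            # Each bin reads from its own dedicated score map
            fmap = score_maps[score_map_channel(cls, i, j, k)]
            ys = range(y0 + i * bin_h, y0 + (i + 1) * bin_h)
            xs = range(x0 + j * bin_w, x0 + (j + 1) * bin_w)
            vals = [fmap[y][x] for y in ys for x in xs]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled

# k = 2, C = 1: 2*2*(1+1) = 8 constant maps whose value is the channel index
k, num_maps = 2, 8
maps = [[[ch] * 4 for _ in range(4)] for ch in range(num_maps)]
print(ps_roi_pool(maps, (0, 0, 4, 4), k, cls=1))  # [[4.0, 5.0], [6.0, 7.0]]
```

In the deformable variant, each bin would additionally be shifted by its scaled offset before sampling, with bilinear interpolation replacing the integer indexing used here.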

Related work

Spatial Transformer Networks (STN): M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In NIPS, 2015.
Active Convolution: Y. Jeon and J. Kim. Active convolution: Learning the shape of convolution for image classification. In CVPR, 2017.
Effective Receptive Field: W. Luo, Y. Li, R. Urtasun, and R. Zemel. Understanding the effective receptive field in deep convolutional neural networks. arXiv preprint arXiv:1701.04128, 2017.
Atrous convolution: M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. Wavelets: Time-Frequency Methods and Phase Space, pages 289–297, 1989.
Deformable Part Models (DPM): P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
DeepID-Net: W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, and X. Tang. DeepID-Net: Deformable deep convolutional neural networks for object detection. In CVPR, 2015.
Spatial manipulation in RoI pooling: S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
Transformation invariant features and their learning
Dynamic Filter: B. D. Brabandere, X. Jia, T. Tuytelaars, and L. V. Gool. Dynamic filter networks. In NIPS, 2016.
Combination of low level filters: J. J. Koenderink and A. J. van Doorn. Representation of local geometry in the visual system. Biological Cybernetics, 55(6):367–375, Mar. 1987.