Spatial Transformer Networks

weixin_30539625

Original article: http://www.cnblogs.com/huangxiao2015/p/5687037.html

Reference: Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in Neural Information Processing Systems. 2015.

Abstract

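The paper proposes a learnable spatial transformer network (STN). It needs no keypoint annotations and, driven by classification or another task, adaptively applies spatial transformations and alignment to the data (translation, scaling, rotation, and other geometric transformations). When the input data vary widely in spatial layout, the module can be inserted into an existing convolutional neural network so that the network actively transforms its feature maps, learns invariance to these spatial transformations, and improves classification accuracy.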

Spatial Transformers

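An STN has three parts: 1) Localisation Network, 2) Grid Generator, and 3) Sampler. The Localisation Network computes the parameters $\theta$ of the spatial transformation. The Grid Generator computes the transformation $\tau_{\theta}$ relating the input image $U\in R^{H\times W \times C}$ and the output image $V\in R^{H'\times W' \times C}$. The Sampler produces the final output image from the input image $U$ and the transformation $\tau_{\theta}$.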

Localisation Network

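This network outputs the parameters $\theta$ of the transformation $\tau_{\theta}$: $\theta=f_{loc}(U)$. $f_{loc}()$ can be a network of any form, such as a fully connected network or a convolutional network, but it should end with a regression layer that produces the transformation parameters $\theta$.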
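
As a rough illustration of "any network that ends in a regression layer", here is a minimal NumPy sketch of a fully connected $f_{loc}$. The names `init_loc_net` and `f_loc`, the hidden size, and the choice to initialise the last layer so that it outputs the identity transform are assumptions made for this example; the post itself does not specify them.

```python
import numpy as np

def init_loc_net(in_dim, hidden_dim=32, seed=0):
    """Hypothetical two-layer localisation net; sizes are illustrative only."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        # Assumption: zero weights plus an identity bias, so the regression
        # layer initially outputs the identity affine transform.
        "W2": np.zeros((hidden_dim, 6)),
        "b2": np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]),
    }

def f_loc(params, U):
    """Map an input U of shape (H, W, C) to a 2x3 affine matrix theta."""
    x = U.reshape(-1)                                      # flatten the input
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])   # ReLU hidden layer
    theta = h @ params["W2"] + params["b2"]                # final regression layer
    return theta.reshape(2, 3)
```

Calling `f_loc(init_loc_net(H * W * C), U)` on an untrained net returns the identity matrix rows `[1, 0, 0]` and `[0, 1, 0]`, so in this sketch the STN starts out by passing its input through unchanged.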

Parameterised Sampling Grid

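Let each pixel of $U$ have coordinates $(x_i^s,y_i^s)$ and each pixel of $V$ have coordinates $(x_i^t,y_i^t)$, with homogeneous coordinates $(x_i^t,y_i^t,1)$. If the spatial transformation $\tau_{\theta}$ is an affine transformation, the correspondence between $(x_i^s,y_i^s)$ and $(x_i^t,y_i^t)$ can be written as: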

$$ \begin{pmatrix}x_i^s \\ y_i^s\end{pmatrix}=\tau_{\theta}(G_i)=A_{\theta}\begin{pmatrix}x_i^t \\ y_i^t \\ 1\end{pmatrix}=\begin{pmatrix}\theta_{11}&\theta_{12}&\theta_{13}\\ \theta_{21}&\theta_{22}&\theta_{23}\end{pmatrix}\begin{pmatrix}x_i^t \\ y_i^t \\ 1\end{pmatrix} $$

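where $A_{\theta}$ is the transformation matrix; a 2D affine transformation needs only 6 parameters.

Sampler

The coordinates $(x^s,y^s)$ produced by the affine transformation are generally not integers, whereas pixel positions must be integers, so pixel values cannot simply be copied from the source pixel array at $(x^s,y^s)$. To fill in the missing pixel values, interpolation is required; the paper uses bilinear interpolation. Once $\tau_{\theta}$ has been computed, $V_i^c$ is given by: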

$$ V_i^c=\sum_n^H \sum_m^W U_{nm}^c \max(0,1-|x_i^s-m|)\max(0,1-|y_i^s-n|)\quad \forall i\in [1\ldots H'W'],\ \forall c\in [1\ldots C] $$
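
The grid equation and the sampling sum can be sketched together in NumPy. This is only an illustration under stated assumptions: the function names `affine_grid` and `sample_naive` are made up for this example, and coordinates are taken in plain pixel units (0-based indices) so that the $\max(0,1-|\cdot|)$ weights apply directly to the pixel indices $m$ and $n$.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Source coordinates (x_s, y_s) for every target pixel, in pixel units.

    theta is the 2x3 matrix A_theta; both returned arrays have shape (out_h, out_w).
    """
    y_t, x_t = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
    # Homogeneous target coordinates (x_t, y_t, 1), one column per pixel.
    G = np.stack([x_t.ravel(), y_t.ravel(), np.ones(out_h * out_w)])
    src = theta @ G                                   # shape (2, H'W')
    return src[0].reshape(out_h, out_w), src[1].reshape(out_h, out_w)

def sample_naive(U, x_s, y_s):
    """Literal evaluation of the double sum over all source pixels."""
    H, W, C = U.shape
    n = np.arange(H)
    m = np.arange(W)
    V = np.zeros(x_s.shape + (C,))
    for i in np.ndindex(x_s.shape):
        wx = np.maximum(0.0, 1.0 - np.abs(x_s[i] - m))   # weights over columns m
        wy = np.maximum(0.0, 1.0 - np.abs(y_s[i] - n))   # weights over rows n
        # V_i^c = sum_n sum_m U[n, m, c] * wy[n] * wx[m]
        V[i] = np.einsum("nmc,n,m->c", U, wy, wx)
    return V
```

With the identity matrix as `theta` the output reproduces `U` exactly. Every output pixel visits all $H \times W$ source pixels even though at most four weights are non-zero.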

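Although this formula is simple to write, it loops over every source pixel for each output pixel, so the source code instead interpolates directly from the four neighbouring points, as follows.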

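Bilinear Interpolation

If $(x^s,y^s)$ are real-valued coordinates, first take the integer part to obtain the point $P_{11}$, then step $d$ coordinate units along each axis to obtain $P_{21}$, $P_{12}$, and $P_{22}$. Programs usually take $d=1$, so every denominator in the formula cancels; applying the bilinear interpolation formula from the figure then gives an approximation of $Pixel(x^s,y^s)$.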
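
The four-point scheme might look like the following NumPy sketch. It is not the source code the post refers to; the name `sample_bilinear`, the mapping of $P_{11}$, $P_{21}$, $P_{12}$, $P_{22}$ to array indices, and the choice to treat neighbours outside the image as zero (which matches the sampling sum, where such pixels contribute nothing) are assumptions made here.

```python
import numpy as np

def sample_bilinear(U, x_s, y_s):
    """Interpolate U (H, W, C) at real-valued pixel coordinates using 4 neighbours."""
    H, W, C = U.shape
    x0 = np.floor(x_s).astype(int)      # P11 = (x0, y0): integer part
    y0 = np.floor(y_s).astype(int)
    x1, y1 = x0 + 1, y0 + 1             # step d = 1 along each axis
    ax = x_s - x0                       # fractional offsets in [0, 1)
    ay = y_s - y0

    def gather(yy, xx):
        # Assumption: neighbours outside the image contribute zero.
        inside = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        vals = U[np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)]
        return vals * inside[..., None]

    return (((1 - ax) * (1 - ay))[..., None] * gather(y0, x0)    # P11
            + (ax * (1 - ay))[..., None] * gather(y0, x1)        # P21
            + ((1 - ax) * ay)[..., None] * gather(y1, x0)        # P12
            + (ax * ay)[..., None] * gather(y1, x1))             # P22
```

Because $d=1$, the four weights are just products of the fractional offsets and sum to one, so no division is needed.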

Back Propagation

Following the back-propagation path shown in the figure, differentiate $V_i^c$ with respect to $U$, $x^s$, and $y^s$:

$$ \frac{\partial V_i^c}{\partial U_{nm}^c}=\sum_n^H \sum_m^W \max(0,1-|x_i^s-m|)\max(0,1-|y_i^s-n|) $$

$$ \frac{\partial V_i^c}{\partial x_i^s}=\sum_n^H \sum_m^W U_{nm}^c\max(0,1-|y_i^s-n|)\begin{cases} 0 & |m-x_i^s|\geq 1 \\ 1 & m\geq x_i^s\\ -1 & m < x_i^s \end{cases} $$

$\frac{\partial V_i^c}{\partial y_i^s}$ is analogous to $\frac{\partial V_i^c}{\partial x_i^s}$. Differentiating with respect to $\theta$ gives:

$$ \frac{\partial V_i^c}{\partial \theta}=\begin{pmatrix} \frac{\partial V_i^c}{\partial x_i^s} \cdot \frac{\partial x_i^s}{\partial \theta} \\ \frac{\partial V_i^c}{\partial y_i^s} \cdot \frac{\partial y_i^s}{\partial \theta} \end{pmatrix} $$
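
A finite-difference comparison is a quick way to sanity-check these derivatives. The sketch below is illustrative only: `sample_point`, `grad_x`, and `grad_theta_row1` are hypothetical helper names, coordinates are assumed to be in pixel units, and only the first row of $\theta$ (the one that $x_i^s$ depends on) is covered.

```python
import numpy as np

def sample_point(U, x, y):
    """V^c at one real-valued source location (x, y), via the full double sum."""
    H, W, _ = U.shape
    wx = np.maximum(0.0, 1.0 - np.abs(x - np.arange(W)))
    wy = np.maximum(0.0, 1.0 - np.abs(y - np.arange(H)))
    return np.einsum("nmc,n,m->c", U, wy, wx)

def grad_x(U, x, y):
    """Analytic dV^c/dx^s: the x weight is replaced by its piecewise slope."""
    H, W, _ = U.shape
    m = np.arange(W)
    wy = np.maximum(0.0, 1.0 - np.abs(y - np.arange(H)))
    slope = np.where(np.abs(m - x) >= 1.0, 0.0, np.where(m >= x, 1.0, -1.0))
    return np.einsum("nmc,n,m->c", U, wy, slope)

def grad_theta_row1(U, theta, x_t, y_t):
    """dV^c/d(theta_11, theta_12, theta_13) by the chain rule.

    x_s = theta_11 * x_t + theta_12 * y_t + theta_13, so dx_s/dtheta_1 = (x_t, y_t, 1).
    """
    g = np.array([x_t, y_t, 1.0])
    x_s, y_s = theta @ g
    return np.outer(grad_x(U, x_s, y_s), g)   # shape (C, 3)
```

Away from integer coordinates, where the weights have a kink, `grad_x(U, x, y)` should agree with `(sample_point(U, x + eps, y) - sample_point(U, x - eps, y)) / (2 * eps)` for a small `eps`. The second row of $\theta$ follows the same pattern with the $y$ derivative.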

Together, these parts form the complete STN.

Reference blogs:

http://blog.csdn.net/shaoxiaohu1/article/details/51809605
http://www.cnblogs.com/neopenx/p/4851806.html
