Paper Reading | Spatial Transformer Networks

Latest recommended article published on 2022-03-09 19:54:22

weixin_30569153

Tags: artificial intelligence

Original link: http://www.cnblogs.com/Zak-NoS/p/10941776.html


Role of max-pooling: it helps a CNN achieve spatial invariance, but only to a limited degree.
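The limited invariance that max-pooling provides can be illustrated with a small sketch. The 2x2 non-overlapping window and the toy 4x4 images are assumptions chosen for illustration and do not come from the paper. A shift that stays inside one pooling window leaves the output unchanged, while a shift that crosses a window boundary does not.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max-pooling over a 2-D list of numbers."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# A single bright pixel, then the same pixel shifted by one and by two columns.
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted_within_window = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted_across_window = [
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool_2x2(original)
pooled_within = max_pool_2x2(shifted_within_window)
pooled_across = max_pool_2x2(shifted_across_window)

print(pooled_original == pooled_within)  # shift inside one window: same output
print(pooled_original == pooled_across)  # shift across windows: output changes
```

This is why the invariance is only local: it holds for displacements smaller than the pooling window, which motivates a module that can transform the feature map explicitly.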

Spatial Transformers

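The Spatial Transformer mechanism is split into three parts:
1. Localisation network: takes the input feature map and outputs the parameters of the spatial transformation.
2. Grid generator: uses these parameters to create a sampling grid, the set of points at which the input map should be sampled.
3. Sampler: takes the input feature map and the sampling grid and produces the transformed output map.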

Localisation Network

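The localisation network takes the input feature map \(U\in \mathbb{R}^{H\times W\times C}\) and outputs \(\theta\), the parameters of the transformation \(\tau_{\theta}\) to be applied to the feature map: \(\theta=f_{loc}(U)\). The size of \(\theta\) varies with the transformation type (for example, 6 values for an affine transformation).
The localisation network function \(f_{loc}()\) can be a fully-connected network or a CNN, but it must include a final regression layer to produce the transformation parameters \(\theta\).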
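The final regression layer can be sketched as below. This is a minimal illustration rather than the paper's architecture: the helper names `flatten`, `make_identity_affine_regressor` and `f_loc` are hypothetical, the network is reduced to a single linear layer, and starting from zero weights with an identity bias is an assumption made here so that the initial \(\theta\) is the identity affine transformation.

```python
def flatten(feature_map):
    """Flatten an H x W x C nested list into a flat list of numbers."""
    return [value for row in feature_map for pixel in row for value in pixel]


def make_identity_affine_regressor(num_inputs):
    """Hypothetical regression layer: 6 outputs, zero weights, identity bias."""
    weights = [[0.0] * num_inputs for _ in range(6)]
    bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # row-major 2x3 identity affine matrix
    return weights, bias


def f_loc(feature_map, weights, bias):
    """theta = W * flatten(U) + b, one value per transformation parameter."""
    x = flatten(feature_map)
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# Toy input feature map with H = 2, W = 3, C = 1.
U = [[[0.1 * (r + c)] for c in range(3)] for r in range(2)]
weights, bias = make_identity_affine_regressor(num_inputs=2 * 3 * 1)
theta = f_loc(U, weights, bias)
print(theta)
```

In a trained network the weights would be learned, so \(\theta\) would depend on the content of \(U\); only the output size (6 for affine) is fixed by the transformation type.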

Parameterised Sampling Grid

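Each output pixel is computed by applying a sampling kernel centred at a particular location in the input feature map.
The output pixels lie on a regular grid \(G=\{G_i\}\) of pixels \(G_i=(x_i^t,y_i^t)\), forming the output feature map \(V\in \mathbb{R}^{H'\times W'\times C}\).
[Figure: two examples of applying the parameterised sampling grid to an image \(U\) to produce the output \(V\)]
(a) The sampling grid is the regular grid \(G=\tau_{I}(G)\), where \(I\) is the identity transformation parameters. (b) The sampling grid is the regular grid warped by an affine transformation \(\tau_{\theta}(G)\).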

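As with texture-coordinate transformations in computer graphics, coordinates are mapped through a transformation matrix. Here the matrix is applied to the target (output-grid) coordinates to obtain the source coordinates at which the input is sampled; for an affine transformation, \(\begin{pmatrix}x_i^s\\ y_i^s\end{pmatrix}=\tau_{\theta}(G_i)=A_{\theta}\begin{pmatrix}x_i^t\\ y_i^t\\ 1\end{pmatrix}\), where \(A_{\theta}\) is the 2×3 matrix formed from \(\theta\).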

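A more constrained form of the affine matrix \(A_{\theta}\), used for attention: \(A_{\theta}=\begin{bmatrix}s&0&t_x\\0&s&t_y\end{bmatrix}\), which allows cropping, translation and isotropic scaling by varying \(s\), \(t_x\) and \(t_y\).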
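A grid generator for the affine case might look like the following sketch. The function names `regular_grid` and `affine_grid`, the row-major ordering of points, and the choice to place the outermost grid points exactly at -1 and 1 are assumptions made for illustration. Each target point is mapped to a source point by the 2x3 matrix stored row-major in `theta`.

```python
def regular_grid(height, width):
    """Target coordinates (x_t, y_t), normalised to [-1, 1], in row-major order."""
    def linspace(n):
        return [0.0] if n == 1 else [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    xs, ys = linspace(width), linspace(height)
    return [(x, y) for y in ys for x in xs]


def affine_grid(theta, height, width):
    """Map every target point to its source point: (x_s, y_s) = A_theta (x_t, y_t, 1)^T."""
    a11, a12, a13, a21, a22, a23 = theta
    return [
        (a11 * x + a12 * y + a13, a21 * x + a22 * y + a23)
        for x, y in regular_grid(height, width)
    ]


identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
# Attention-style matrix with s = 0.5, t_x = 0.5, t_y = 0.5 (values chosen for illustration).
attention = [0.5, 0.0, 0.5, 0.0, 0.5, 0.5]

grid_identity = affine_grid(identity, 3, 3)
grid_attention = affine_grid(attention, 3, 3)

print(grid_identity == regular_grid(3, 3))    # identity leaves the grid unchanged
print(grid_attention[0], grid_attention[-1])  # corners of the cropped region
```

With the attention-style parameters the source points cover only the quadrant where both normalised coordinates are non-negative, so sampling at them amounts to cropping and zooming into that part of the input.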

Differentiable Image Sampling

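\(V_{i}^{c}=\sum_{n}^{H}\sum_{m}^{W}U_{nm}^{c}\,k(x_{i}^{s}-m;\Phi_{x})\,k(y_{i}^{s}-n;\Phi_{y}),\quad \forall i\in[1\ldots H'W'],\ \forall c\in[1\ldots C]\)
\(\Phi_{x}\) and \(\Phi_{y}\) are the parameters of a generic sampling kernel \(k()\) that defines the image interpolation (e.g. bilinear), \(U_{nm}^{c}\) is the input value at location \((n,m)\) in channel \(c\), and \(V_{i}^{c}\) is the output value for pixel \(i\) at location \((x_{i}^{t},y_{i}^{t})\) in channel \(c\). The sampling is identical for every channel, so each channel is transformed in the same way.
The coordinates are height- and width-normalised, so that \(-1\le x,y\le 1\) within the spatial bounds of the feature map.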
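The sampling step described above can be sketched directly as a double sum over input pixels. This is an illustrative sketch, not an efficient implementation: the names `bilinear_kernel` and `sample` are hypothetical, and converting normalised coordinates to pixel indices with \((x+1)(W-1)/2\) is an assumed convention that places -1 and 1 on the centres of the outermost pixels. The kernel is passed as an argument to reflect that the formulation is generic in \(k()\).

```python
def bilinear_kernel(d):
    """Bilinear kernel k(d) = max(0, 1 - |d|)."""
    return max(0.0, 1.0 - abs(d))


def sample(U, source_points, kernel=bilinear_kernel):
    """Compute V[i][c] = sum_n sum_m U[n][m][c] * k(x_i - m) * k(y_i - n).

    U is an H x W x C nested list; source_points holds normalised (x_s, y_s) pairs.
    """
    H, W, C = len(U), len(U[0]), len(U[0][0])
    output = []
    for x_norm, y_norm in source_points:
        # Assumed convention: -1 and 1 map to the first and last pixel centres.
        x = (x_norm + 1.0) * (W - 1) / 2.0
        y = (y_norm + 1.0) * (H - 1) / 2.0
        pixel = []
        for c in range(C):  # the same sampling is applied to every channel
            pixel.append(sum(
                U[n][m][c] * kernel(x - m) * kernel(y - n)
                for n in range(H)
                for m in range(W)
            ))
        output.append(pixel)
    return output


# Toy input with H = 2, W = 2, C = 1.
U = [[[0.0], [10.0]],
     [[20.0], [30.0]]]

corners = sample(U, [(-1.0, -1.0), (1.0, 1.0)])
centre = sample(U, [(0.0, 0.0)])
print(corners, centre)
```

Sampling at a pixel centre returns that pixel's value, while sampling midway between the four pixels returns their average, which is the expected behaviour of bilinear interpolation.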

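In theory, any sampling kernel can be used, as long as (sub-)gradients can be defined with respect to \(x_{i}^{s}\) and \(y_{i}^{s}\). For example, a bilinear sampling kernel gives:
\(V_{i}^{c}=\sum_{n}^{H}\sum_{m}^{W}U_{nm}^{c}\max(0,1-|x_{i}^{s}-m|)\max(0,1-|y_{i}^{s}-n|)\)
Taking partial derivatives of the equation above: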