STN - Spatial Transformer Networks

Tags: CNN, Spatial transformer, STN

Article link: https://blog.csdn.net/qq_39426225/article/details/90482099

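This post points out that CNNs lack the ability to cope with spatial transformations of the data they recognise, and that the paper's authors propose the STN (Spatial Transformer Network) to solve this problem. It introduces a new learnable module, the spatial transformer, which can be inserted into existing convolutional architectures to give them spatial transformation ability. It then describes the three main parts of a spatial transformer, which together complete the forward pass and ultimately improve recognition accuracy.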

1.Overview

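（1）A notable weakness of CNNs is that they have only limited ability to handle spatial transformations of the data they recognise. You can recognise your own cup whether you look at it upright, tilted or upside down, but a CNN may not. See the figures below:

You can tell at once that all of these are the digit 2.
[Figure]
And you can tell that all of these are the digit 4!

For this reason, the authors of the paper propose the STN (Spatial Transformer Network) to supply the spatial transformation ability that CNNs lack.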

2.Abstract

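The abstract explains that CNNs lack the ability to be spatially invariant to their input data, and that the authors introduce a new learnable module, the Spatial Transformer. This module can be inserted directly into existing convolutional architectures and gives the network the ability to perform spatial transformations. In addition, it requires no extra training supervision and no change to the optimisation process. The original text reads: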

Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.

3.Introduction

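（1）Pooling layers provide some spatial invariance, but because the pooling window is usually small (typically 2×2), this invariance only emerges across a deep hierarchy of pooling and convolution layers.
（2）A spatial transformer can act on the whole feature map, performing scaling, cropping, rotation and some non-rigid deformations.
（3）A spatial transformer can select the most relevant region of an image (similar to attention) and normalise it to a canonical form (discussed later).
（4）Spatial transformers can be built into CNNs to benefit many tasks: 1. image classification 2. co-localisation 3. spatial attention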
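
Point （1） can be illustrated with a small NumPy sketch. It is only a toy example under assumed settings (a 4×4 image with a single bright pixel and non-overlapping 2×2 max pooling); the helper names `max_pool_2x2` and `single_pixel` are made up for illustration and do not come from the paper. A one-pixel shift that stays inside the same pooling window leaves the pooled output unchanged, while a two-pixel shift changes it.

```python
import numpy as np

def max_pool_2x2(img):
    """Non-overlapping 2x2 max pooling on a 2-D array with even height and width."""
    h, w = img.shape
    # Split rows and columns into blocks of two, then take the max of each block.
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def single_pixel(row, col, size=4):
    """Hypothetical helper: a size x size image with one bright pixel."""
    img = np.zeros((size, size))
    img[row, col] = 1.0
    return img

base = max_pool_2x2(single_pixel(0, 0))
shift_inside = max_pool_2x2(single_pixel(0, 1))   # 1-pixel shift, same pooling window
shift_outside = max_pool_2x2(single_pixel(0, 2))  # 2-pixel shift, next pooling window

print(np.array_equal(base, shift_inside))   # True
print(np.array_equal(base, shift_outside))  # False
```

A single pooling layer therefore absorbs only very small shifts, which is why larger transformations need many stacked layers or an explicit module such as the spatial transformer.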

4.Spatial transformers (Essential Point)

A spatial transformer consists of three main parts: （1）Localisation Network （2）Parameterised Sampling Grid （3）Differentiable Image Sampling

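（1）Localisation Network: regresses the affine transformation matrix θ (which encodes rotation, translation, scaling and similar parameters). This network can be either a fully connected network or a convolutional network.

（2）Parameterised Sampling Grid: generates a sampling grid with one point per output pixel and multiplies it by the θ matrix. As θ is learned, the grid gradually comes to cover the tilted object that is to be recognised.

（3）Differentiable Image Sampling: fetches the input pixels that correspond to the sampling points and assembles them into the output feature map V.
[Figure]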
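
To make part （3） more concrete, here is a minimal NumPy sketch of bilinear interpolation at a single fractional position. It assumes pixel coordinates (x is the column, y is the row) and clamps at the image border; the function name `bilinear_sample` is illustrative and is not the `bilinear_sampler` used in the post's TensorFlow code.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2-D array at a fractional pixel position (x = column, y = row)."""
    h, w = img.shape
    # Clamp the sampling point to the valid pixel range.
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    # Integer corners surrounding the sampling point.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    # Fractional distances from the top-left corner.
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

U = np.array([[0.0, 10.0],
              [20.0, 30.0]])
print(bilinear_sample(U, 0.5, 0.5))  # 15.0
print(bilinear_sample(U, 1.0, 0.0))  # 10.0
```

Because the result is a weighted average whose weights vary linearly with the sampling position, gradients can flow back to the coordinates and hence to θ, which is what makes the sampling step differentiable.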

（1）Localisation Network

The Localisation Network regresses the affine transformation coefficients θ, where θ is a 6-dimensional parameter vector used to transform the feature map.

The localisation network takes the input feature map $U ∈ R^{H×W×C}$ with width W, height H and C channels and outputs θ, the parameters of the transformation $T_θ$ to be applied to the feature map: $θ = f_{loc}(U)$. The size of θ can vary depending on the transformation type that is parameterised, e.g. for an affine transformation θ is 6-dimensional as in (1).
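
As a rough illustration of what the six numbers in θ can encode, the sketch below builds a 2×3 affine matrix from a rotation angle, an isotropic scale and a translation. The helper `make_affine_theta` and its parameters are hypothetical, and translations are assumed to be expressed in normalised coordinates; in an STN these values are predicted by the localisation network rather than set by hand.

```python
import numpy as np

def make_affine_theta(angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Hypothetical helper: 2x3 affine matrix from rotation, isotropic scale and translation."""
    a = np.deg2rad(angle_deg)
    return np.array([
        [scale * np.cos(a), -scale * np.sin(a), tx],
        [scale * np.sin(a),  scale * np.cos(a), ty],
    ])

theta = make_affine_theta(angle_deg=90.0, scale=0.5, tx=0.1)
print(theta.shape)            # (2, 3)
print(theta.flatten().shape)  # (6,)

# With the default arguments the helper returns the identity transform.
print(np.allclose(make_affine_theta(), [[1, 0, 0], [0, 1, 0]]))  # True
```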

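import numpy as np
import tensorflow as tf  # the code below uses the TensorFlow 1.x API

# Load the images: img1..img4 are assumed to be arrays of shape [1, H, W, C]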
input_img = np.concatenate([img1, img2, img3, img4], axis=0)

B, H, W, C = input_img.shape

print("Input Img Shape: {}".format(input_img.shape))

# identity transform
theta = np.array([[1., 0, 0], [0, 1., 0]])

x = tf.placeholder(tf.float32, [None, H, W, C])

with tf.variable_scope('spatial_transformer'):
    theta = theta.astype('float32')
    theta = theta.flatten()

    # Trainable weights and bias that produce the transformation parameters θ
    loc_in = H*W*C
    loc_out = 6
    W_loc = tf.Variable(tf.zeros([loc_in, loc_out]), name='W_loc')
    b_loc = tf.Variable(initial_value=theta, name='b_loc')
    
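    # fc_loc is the trainable transformation parameter θ from the paper; B is the batch size
    # shape of θ: [B, H*W*C] x [H*W*C, 6] + [6] = [B, 6]
    # A zero tensor stands in for the flattened input here, so fc_loc equals b_loc (the identity)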
    
    fc_loc = tf.matmul(tf.zeros([B, loc_in]), W_loc) + b_loc
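
The initialisation above (zero weights, identity bias) means the transformer starts out as the identity transform regardless of its input. The NumPy sketch below checks this with arbitrary random features; the sizes are illustrative and not taken from the post.

```python
import numpy as np

B, H, W, C = 4, 8, 8, 3           # illustrative sizes, not taken from the post
loc_in, loc_out = H * W * C, 6

W_loc = np.zeros((loc_in, loc_out), dtype=np.float32)
b_loc = np.array([1, 0, 0, 0, 1, 0], dtype=np.float32)  # flattened identity theta

rng = np.random.default_rng(0)
features = rng.normal(size=(B, loc_in)).astype(np.float32)  # arbitrary input

# With zero weights the matrix product vanishes and only the bias remains.
fc_loc = features @ W_loc + b_loc

print(fc_loc.shape)                # (4, 6)
print(np.allclose(fc_loc, b_loc))  # True
```

Starting from the identity is a common choice because the network initially passes the feature map through unchanged and only learns to warp it as training updates `W_loc` and `b_loc`.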

Next, the image / feature map and the theta parameters are passed into the Spatial Transformer:

def spatial_transformer_network(input_fmap, theta, out_dims=None, **kwargs):
   
    # grab input dimensions
    B = tf.shape(input_fmap)[0]
    H = tf.shape(input_fmap)[1]
    W = tf.shape(input_fmap)[2]

    # reshape theta to (B, 2, 3)
    theta = tf.reshape(theta, [B, 2, 3])

    # generate grids of same size or upsample/downsample if specified
    # pass out_dims when down- or upsampling is required; normally the else branch runs
    if out_dims:
        out_H = out_dims[0]
        out_W = out_dims[1]
        
        # enter the Parameterised Sampling Grid
        batch_grids = affine_grid_generator(out_H, out_W, theta)
    else:
        batch_grids = affine_grid_generator(H, W, theta)

    x_s = batch_grids[:, 0, :, :]
    y_s = batch_grids[:, 1, :, :]

    # sample input with grid to get output
    out_fmap = bilinear_sampler(input_fmap, x_s, y_s)

    return out_fmap

（2）Parameterised Sampling Grid

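For an affine transformation, applying the coefficients θ directly to an input point (x, y) generally yields output coordinates ( $x^{target}$ , $y^{target}$ ) that are not integers, so the transformation is applied in the reverse direction instead. The reverse mapping first generates a grid of output coordinates according to the size of the transformed output (the code below covers this). For example, when V is 10×10, we obtain a 10×10 matrix of coordinate positions, and the affine transformation is then applied to these positions. The affine transformation formula and a diagram are shown below: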
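
The reverse mapping can be sketched in NumPy as follows. This is a simplified, single-image illustration that assumes coordinates normalised to [-1, 1]; the function name `affine_grid` is made up here and is not the post's `affine_grid_generator`.

```python
import numpy as np

def affine_grid(height, width, theta):
    """Map every output pixel to a source coordinate using a 2x3 matrix theta.

    Coordinates are normalised to [-1, 1]. Returns an array of shape
    (2, height, width) holding the source x and y for each output pixel.
    """
    xs = np.linspace(-1.0, 1.0, width)
    ys = np.linspace(-1.0, 1.0, height)
    x_t, y_t = np.meshgrid(xs, ys)                 # each of shape (height, width)
    ones = np.ones_like(x_t)
    # Homogeneous target coordinates, one column per output pixel.
    target = np.stack([x_t.ravel(), y_t.ravel(), ones.ravel()])  # (3, height*width)
    source = theta @ target                        # (2, height*width)
    return source.reshape(2, height, width)

identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
grid = affine_grid(10, 10, identity)
print(grid.shape)  # (2, 10, 10)

# Scaling by 0.5 makes the output sample only the central half of the input (a zoom-in).
zoom = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
zoom_grid = affine_grid(10, 10, zoom)
print(zoom_grid.min(), zoom_grid.max())  # -0.5 0.5
```

Every output pixel gets exactly one source coordinate, so the output has no holes; the source coordinates are usually fractional, which is why an interpolating sampler is needed afterwards.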