Spatial Transform: Personal Understanding and Summary

小武哥Pod

Column: Articles · Tags: deep learning

Original link: https://blog.csdn.net/u014686388/article/details/105764525

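In mainstream computer-vision research, most attention goes to tasks such as classification, detection and segmentation, while image registration and deformation get comparatively little. Deformation is actually a rich area with a lot of interesting work to be done, especially for faces and medical images.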

Conclusion first: it works, and it works really well!

Let's take a look at this DeepMind paper, Spatial Transformer Networks.

However, due to the typically small spatial support for max-pooling (e.g. 2×2 pixels) this spatial invariance is only realised over a deep hierarchy of max-pooling and convolutions, and the intermediate feature maps (convolutional layer activations) in a CNN are not actually invariant to large transformations of the input data [6, 22]. This limitation of CNNs is due to having only a limited, pre-defined pooling mechanism for dealing with variations in the spatial arrangement of data.

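The paper briefly discusses invariance, that is, the various invariances to transformations of the input that a CNN has to account for. I won't go into it in detail here; the key point is that max pooling, with its small and fixed pooling window, provides this invariance only to a limited extent.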
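To make the limitation concrete, here is a small NumPy sketch. The 8×8 single-peak activation maps are made-up toy inputs, not from the paper. A non-overlapping 2×2 max pool gives the same output when the peak moves by one pixel inside its pooling window, but a different output as soon as the shift crosses a window boundary:

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling with stride 2 (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def feature_map_with_peak(row, col, size=8):
    """Hypothetical toy activation map with a single unit response at (row, col)."""
    x = np.zeros((size, size))
    x[row, col] = 1.0
    return x

base = max_pool_2x2(feature_map_with_peak(2, 2))
small_shift = max_pool_2x2(feature_map_with_peak(2, 3))  # 1 px shift, same 2x2 window
large_shift = max_pool_2x2(feature_map_with_peak(2, 4))  # 2 px shift, next 2x2 window

print(np.array_equal(base, small_shift))  # True
print(np.array_equal(base, large_shift))  # False
```

Stacking more pooling layers enlarges the range of shifts that are absorbed, which is exactly the "deep hierarchy" the quote refers to; a single layer only tolerates shifts within its own window.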

The paper mentions several applications of spatial transformers:

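1. Image classification. This one is easy to understand: warp every irregularly posed image or object into a canonical form, after which it can be recognised with simple rules or straightforward training. There is a lot of related work, for example facial expression recognition and atlas-based segmentation of medical images.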

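2. Co-localisation. Given a set of images that share a common object of interest, learn to find that object in each of them; in effect, the transformer performs the localisation.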

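3. Spatial attention. From an image-registration point of view the transformer is a transformation matrix; from an attention point of view it can be read as spatial attention. The paper notes: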

A key benefit of using attention is that transformed (and so attended), lower resolution inputs can be used in favour of higher resolution raw inputs, resulting in increased computational efficiency.
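The efficiency argument can be illustrated with a small NumPy sketch. It assumes coordinates normalized to [-1, 1] across the image and a hypothetical, hand-picked zoom-plus-translation θ (in the paper θ would come from the localisation network). A 32×32 output grid is mapped onto a window covering only a quarter of the input's width and height:

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map a regular out_h x out_w target grid (normalized to [-1, 1]) through a 2x3 theta.

    Returns source coordinates of shape (out_h, out_w, 2) as (x, y) pairs.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h), np.linspace(-1, 1, out_w), indexing="ij")
    target = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous (x_t, y_t, 1)
    return target @ theta.T                                  # (x_s, y_s) = theta @ (x_t, y_t, 1)

# Hypothetical attention transform: isotropic zoom s = 0.25 centred at (tx, ty) = (0.5, -0.5).
s, tx, ty = 0.25, 0.5, -0.5
theta = np.array([[s, 0.0, tx],
                  [0.0, s, ty]])

src = affine_grid(theta, out_h=32, out_w=32)
print(src[..., 0].min(), src[..., 0].max())  # x window of the input: 0.25 .. 0.75
print(src[..., 1].min(), src[..., 1].max())  # y window of the input: -0.75 .. -0.25
```

For a hypothetical 128×128 input, that window spans roughly 32×32 input pixels, so the attended patch keeps about its native resolution while every later layer processes 16× fewer pixels than the full image would require.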

The network consists of the following parts (the paper splits it into three):

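1. Parameter regression (the localisation network): from the input feature map, it learns to regress θ, the parameters of the coordinate transformation. For an affine transformation θ has 6 parameters, which cover translation and rotation (as well as scaling and shearing); the mapping is linear in homogeneous coordinates.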

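2. Grid generation and sampling (the paper's grid generator and sampler): the regular output grid is mapped through the transformation to source coordinates, and pixels are sampled from the input at those coordinates. Sampling can use interpolation (e.g. bilinear) or nearest neighbour, and the transformation itself can be linear or non-linear (the paper also uses thin plate spline transformations).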
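The steps above can be put together in a minimal NumPy sketch of an affine spatial transformer for a single-channel image. It is an illustration under stated assumptions, not the paper's implementation: θ is built by hand with a hypothetical `affine_theta` helper instead of being regressed by a localisation network, coordinates are normalized to [-1, 1] with the corners aligned to the outermost pixel centres, and out-of-range samples read as zero.

```python
import numpy as np

def affine_theta(angle, scale, tx, ty):
    """Hypothetical helper: the 6-parameter (2x3) theta from rotation, isotropic scale, translation."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty]])

def affine_grid(theta, out_h, out_w):
    """Grid generator: map the regular target grid (normalized to [-1, 1]) through theta.

    Returns source coordinates of shape (out_h, out_w, 2) as (x, y) pairs.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h), np.linspace(-1, 1, out_w), indexing="ij")
    target = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous (x_t, y_t, 1)
    return target @ theta.T                                  # (x_s, y_s) = theta @ (x_t, y_t, 1)

def bilinear_sample(img, grid):
    """Sampler: bilinear interpolation of img (H, W) at the normalized grid points."""
    h, w = img.shape
    # Assumed convention: -1 maps to pixel 0 and +1 maps to pixel size - 1.
    x = (grid[..., 0] + 1) * (w - 1) / 2
    y = (grid[..., 1] + 1) * (h - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1

    def pixel(yy, xx):
        # Zero padding: neighbours outside the image contribute nothing.
        valid = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        return np.where(valid, img[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)], 0.0)

    wa = (x1 - x) * (y1 - y)  # weight of top-left neighbour (x0, y0)
    wb = (x1 - x) * (y - y0)  # bottom-left (x0, y1)
    wc = (x - x0) * (y1 - y)  # top-right (x1, y0)
    wd = (x - x0) * (y - y0)  # bottom-right (x1, y1)
    return (wa * pixel(y0, x0) + wb * pixel(y1, x0)
            + wc * pixel(y0, x1) + wd * pixel(y1, x1))

img = np.arange(16, dtype=float).reshape(4, 4)

# The identity theta reproduces the input.
identity = affine_theta(0.0, 1.0, 0.0, 0.0)
print(np.allclose(bilinear_sample(img, affine_grid(identity, 4, 4)), img))  # True

# A 180-degree rotation about the centre flips both axes.
rot180 = affine_theta(np.pi, 1.0, 0.0, 0.0)
print(np.allclose(bilinear_sample(img, affine_grid(rot180, 4, 4)), img[::-1, ::-1]))  # True
```

Every output value is a weighted sum of four input pixels with weights that are piecewise linear in the sampling coordinates, so gradients can flow back both to the input image and, through the grid, to θ; this is what lets the localisation network be trained end to end. Rounding to the nearest pixel instead gives nearest-neighbour sampling, but its gradient with respect to the coordinates is zero almost everywhere.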
