Paper Reading: Attention Mechanisms in Image Classification

zhuiqiuk


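This post briefly summarizes how attention mechanisms are applied to image classification. As a mechanism, attention has a basis in cognitive neuroscience and biology; see the discussions "What is the cognitive-neural mechanism of attention?" and "How can attention be defined from a biological perspective?"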

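In computer vision, attention has been implemented in many different forms, which can be roughly divided into soft attention and hard attention [1]. Typical examples of soft attention are STN [3], the Residual Attention Network [5], and Two-level Attention [2]; these mechanisms are differentiable and can be trained by backpropagation. Hard attention, by contrast, has to predict the region to attend to and is usually trained with reinforcement learning; see the references listed in [1] for examples.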
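The difference between the two families can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions, not the method of any of the cited papers: the function names, the `(H, W, C)` feature layout, and the idea of scoring each spatial location with one scalar are all hypothetical choices made for the example.

```python
import numpy as np

def _softmax2d(scores):
    """Turn an (H, W) score map into a probability map."""
    w = np.exp(scores - scores.max())  # subtract max for numerical stability
    return w / w.sum()

def soft_attention(features, scores):
    """Soft attention: a weighted average over all locations.

    Every step is a smooth function of `scores`, so gradients can flow
    through it and it can be trained by backpropagation.
    features: (H, W, C), scores: (H, W) -> returns (C,)
    """
    w = _softmax2d(scores)
    return (features * w[..., None]).sum(axis=(0, 1))

def hard_attention(features, scores, rng):
    """Hard attention: sample ONE location and look only there.

    The sampling step is not differentiable, which is why such models
    are usually trained with reinforcement learning instead.
    """
    w = _softmax2d(scores)
    idx = rng.choice(w.size, p=w.ravel())
    i, j = np.unravel_index(idx, w.shape)
    return features[i, j], (i, j)
```

With uniform scores the soft version reduces to plain global average pooling, while the hard version returns the feature vector of a single randomly chosen location.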

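[2] integrates three types of attention: bottom-up attention that proposes candidate patches, object-level top-down attention that selects the patches relevant to a given object, and part-level top-down attention that localizes discriminative parts.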

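[5] is well written and worth a close read. It proposes the Residual Attention Network, which is a stack of attention modules. Each module uses a bottom-up top-down structure (cf. stacked hourglass networks [7]); in the paper's words, "the bottom-up top-down structure mimics the fast feedforward and feedback attention process." A residual mechanism is used so that the network can be made deeper.

[Figure: network structure of the Residual Attention Network]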
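As a rough sketch of the residual idea in [5]: a soft mask `M(x)` in (0, 1) modulates the trunk output, and the module returns `(1 + M(x)) * T(x)` instead of `M(x) * T(x)`, so stacking modules does not keep shrinking the features. The code below is a toy NumPy illustration; `toy_mask_logits` is a hypothetical stand-in for the real mask branch (2x2 average pooling as the bottom-up step, nearest-neighbour upsampling as the top-down step), not the architecture from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def toy_mask_logits(x):
    """Hypothetical bottom-up top-down mask branch for an (H, W, C) input.

    Bottom-up: 2x2 average pooling (H and W must be even).
    Top-down:  nearest-neighbour upsampling back to (H, W, C).
    """
    H, W, C = x.shape
    pooled = x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))
    return pooled.repeat(2, axis=0).repeat(2, axis=1)

def residual_attention(trunk_out, mask_logits):
    """Attention residual learning: (1 + M) * T, with M = sigmoid(logits)."""
    mask = sigmoid(mask_logits)  # soft mask in (0, 1)
    return (1.0 + mask) * trunk_out

# Usage: here the trunk is the identity, purely for illustration.
x = np.random.default_rng(0).standard_normal((4, 4, 2))
y = residual_attention(x, toy_mask_logits(x))
```

When the mask is close to zero the module behaves like an identity mapping on the trunk output, which is what makes deep stacks of such modules trainable.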

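[6] proposes SENet, the model that won the image classification task of the ImageNet 2017 competition. It can be viewed as attention (gating) along the channel dimension; see the author's own talk, "CVPR | A Detailed Explanation of the ImageNet Champion Model SE-Net".

[Figure: the SE block]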
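The channel gating of an SE block can be sketched as three steps: squeeze (global average pooling per channel), excitation (a small two-layer bottleneck ending in a sigmoid), and scale (multiply each channel by its gate). The NumPy code below is a minimal sketch for a single `(H, W, C)` feature map; the function signature, the weight shapes, and the reduction ratio used in the usage lines are assumptions made for the example.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on one feature map.

    x:  (H, W, C) feature map
    w1: (C, C // r), b1: (C // r,)   bottleneck layer (r = reduction ratio)
    w2: (C // r, C), b2: (C,)        expansion layer
    Returns the rescaled feature map and the per-channel gates.
    """
    z = x.mean(axis=(0, 1))                     # squeeze: (C,)
    h = np.maximum(0.0, z @ w1 + b1)            # FC + ReLU: (C // r,)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))    # FC + sigmoid: gates in (0, 1)
    return x * s, s                             # scale: broadcast over H and W

# Usage with random weights (C = 8 channels, reduction ratio r = 4).
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((5, 5, C))
w1, b1 = rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)), np.zeros(C)
y, gates = se_block(x, w1, b1, w2, b2)
```

Because the gates depend only on channel-wise statistics, the block adds few parameters and can be attached to an existing convolutional stage.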

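[4] proposes the Deep Attention Selective Network (dasNet). After the CNN has been trained, reinforcement learning (Separable Natural Evolution Strategies, SNES) is used to change the attention dynamically. Concretely, the attention adjusts the weight of each conv filter (just like SENet, this works along the channel dimension). The policy is a neural network, and the RL part of the algorithm is as follows:

[Figure: the SNES-based training algorithm from [4]]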

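Each pass through the while loop is one SNES iteration. M denotes the trained CNN, μ and Σ are the hyperparameters of the distribution over policy parameters, p is the number of policy parameter vectors sampled per iteration, and n is the number of images drawn at random. See the paper for details; the algorithm is explained clearly there.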
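To make the loop concrete, here is a minimal sketch of a generic SNES iteration in NumPy: sample p parameter vectors from a diagonal Gaussian, score them on n randomly drawn examples, rank them, and update the mean and per-dimension standard deviation. It is not the dasNet implementation. The fitness function is a toy quadratic standing in for "performance of the trained CNN M under a given policy", and the learning rates and rank-based utilities are common SNES defaults assumed here rather than values taken from [4].

```python
import numpy as np

def snes(fitness, data, dim, p=20, n=32, iters=300, seed=0):
    """Separable NES over a diagonal Gaussian N(mu, diag(sigma^2))."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    eta_mu = 1.0                                              # assumed default
    eta_sigma = (3.0 + np.log(dim)) / (5.0 * np.sqrt(dim))    # assumed default
    # Rank-based utilities: the best sample gets the largest weight.
    ranks = np.arange(1, p + 1)
    util = np.maximum(0.0, np.log(p / 2 + 1) - np.log(ranks))
    util = util / util.sum() - 1.0 / p
    for _ in range(iters):                        # one SNES iteration per pass
        batch = data[rng.choice(len(data), size=n, replace=False)]  # n examples
        s = rng.standard_normal((p, dim))         # p standard-normal draws
        thetas = mu + sigma * s                   # p candidate parameter vectors
        fit = np.array([fitness(t, batch) for t in thetas])
        s = s[np.argsort(-fit)]                   # sort draws, best first
        grad_mu = util @ s
        grad_sigma = util @ (s ** 2 - 1.0)
        mu = mu + eta_mu * sigma * grad_mu
        sigma = sigma * np.exp(0.5 * eta_sigma * grad_sigma)
    return mu, sigma

def toy_fitness(theta, batch):
    """Hypothetical stand-in objective: higher is better."""
    return -np.mean(np.sum((batch - theta) ** 2, axis=1))

# Usage on a toy dataset clustered around a known target.
rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 0.5])
data = target + 0.1 * rng.standard_normal((500, 3))
mu, sigma = snes(toy_fitness, data, dim=3)
```

In dasNet the sampled vectors would parameterize the policy network and the fitness would come from running M on the sampled images; only the outer sampling and update structure is illustrated here.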

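Finally, there are also some interesting papers in areas such as human pose estimation [8], which deserve a closer look later. It is worth noting that most of the attention (gating) techniques covered here can be added directly to existing network architectures, and with a carefully designed initialization and training procedure they can also reuse the pretrained parameters of existing networks. This greatly broadens the applicability of these techniques.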

[1] Zhao B, Wu X, Feng J, et al. Diversified visual attention networks for fine-grained object classification[J]. arXiv preprint arXiv:1606.08572, 2016.
[2] Xiao T, Xu Y, Yang K, et al. The application of two-level attention models in deep convolutional neural network for fine-grained image classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 842-850.
[3] Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems. 2015: 2017-2025.
[4] Stollenga M F, Masci J, Gomez F, et al. Deep networks with internal selective attention through feedback connections[C]//Advances in Neural Information Processing Systems. 2014: 3545-3553.
[5] Wang F, Jiang M, Qian C, et al. Residual Attention Network for Image Classification[J]. arXiv preprint arXiv:1704.06904, 2017.
[6] Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks[J]. arXiv preprint arXiv:1709.01507, 2017.
[7] Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 483-499.
[8] Chu X, Yang W, Ouyang W, et al. Multi-context attention for human pose estimation[J]. arXiv preprint arXiv:1702.07432, 2017.
---------------------
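Author: Wayne2019
Source: CSDN
Original: https://blog.csdn.net/Wayne2019/article/details/78488142
Copyright notice: This is an original article by the blogger. Please include a link to the original post when reposting.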