[Paper Notes] Non-local Neural Networks

_陈麒_

Article link: https://blog.csdn.net/qq_40079310/article/details/90903383

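Paper link

First pass: initial read

Answering three basic questions

What is the paper mainly about? What is the authors' goal, and what does the paper achieve?

Contribution: "we present non-local operations as a generic family of building blocks for capturing long-range dependencies" (quoted directly, since the English wording is already precise).
How does the paper argue its case? What evidence do the authors use to support their claims or results?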

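First, the authors list the drawbacks of convolutional and recurrent operations: they are computationally inefficient, they are hard to optimize, and because long-range dependencies can only be captured by applying them repeatedly, passing information back and forth between distant positions is difficult.

Second, they argue that non-local operations address some of these drawbacks, and they state the advantages of the operation: it computes long-range dependencies directly, it reaches its best results with only a few layers, and its output has the same size as its input.

Finally, the authors demonstrate the effectiveness of the operation on video classification. In addition, because it is "a generic family of building blocks", it also performs well in other areas: in static image recognition it improves object detection, segmentation, and pose estimation.
What is the related work? What had already been established before this paper? What do the authors' ideas and results build on, and how do they stand out from other work?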

Foundations
1. Non-local image processing
2. Graphical models
  
  Difference: In contrast, our method is a simpler feedforward block for computing non-local filtering.
  Unlike these methods that were developed for segmentation, our general-purpose component is applied for classification and detection.
3. Feedforward modeling for sequences
4. Self-attention
  
  Difference: In this sense our work bridges self-attention for machine translation to the more general class of non-local filtering operations that are applicable to image and video problems in computer vision.
5. Interaction networks
6. Video classification architectures
The advantages were covered under the second question above, so I will not repeat them here.

Second pass: close reading

Related Work

Non-local image processing

Non-local means [4] is a classical filtering algorithm that computes a weighted mean of all pixels in an image.
Graphical models

Long-range dependencies can be modeled by graphical models such as conditional random fields (CRF). In contrast, our method is a simpler feedforward block for computing non-local filtering. Unlike these methods that were developed for segmentation, our general-purpose component is applied for classification and detection.
Feedforward modeling for sequences

These feedforward models are amenable to parallelized implementations and can be more efficient than widely used recurrent models.
Self-attention

Our work is related to the recent self-attention [49] method for machine translation.

In this sense our work bridges self-attention for machine translation to the more general class of non-local filtering operations that are applicable to image and video problems in computer vision.
Interaction networks

Interaction Networks (IN) [2, 52] were proposed recently for modeling physical systems.
Video classification architectures

Thoughts on f() and g(), and a walkthrough of the non-local block

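$$y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \qquad (1)$$

In the formula above, x is the input and y is the output. i is the index of the position whose output is being computed, and j enumerates every position of the input. x_i is a vector whose dimension equals the number of channels of x. f is a function that computes the affinity between any two positions, g is a function that maps a single position to a vector and can be seen as computing the feature of that position, and C(x) is a normalization factor. In other words, computing a single output position requires visiting every position of the input.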
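The operation described above can be written out almost literally. The following is a minimal NumPy sketch, not the authors' implementation: the name `non_local`, the toy input, and the choice of normalizing by the sum of the affinities are assumptions made for illustration.

```python
import numpy as np

def non_local(x, f, g):
    """Generic non-local operation: y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j).

    x has shape (N, C): N positions, each a C-dimensional feature vector.
    Assumption: C(x) is taken to be sum_j f(x_i, x_j); other choices exist.
    """
    n = x.shape[0]
    gx = np.stack([g(x[j]) for j in range(n)])               # g applied to every position
    y = np.zeros_like(gx, dtype=float)
    for i in range(n):
        # Affinity between position i and every position j of the input.
        weights = np.array([f(x[i], x[j]) for j in range(n)])
        y[i] = (weights / weights.sum()) @ gx                # normalized weighted sum
    return y

# Toy input: 4 positions with 3 channels each (made-up values).
x = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

gaussian = lambda a, b: np.exp(a @ b)   # one possible pairwise function f
identity = lambda v: v                  # stand-in for the learned mapping g
y = non_local(x, gaussian, identity)
```

Every row of `y` depends on all rows of `x`, which is what distinguishes this from a convolution that only looks at a small neighborhood.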

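Interestingly, the authors show experimentally that non-local models are not sensitive to the choice of f() and g(), which suggests that the generic non-local behavior is the main reason for the observed improvements.

The f() and g() mentioned in the paper are described below.

To keep things simple, the authors set g to a 1x1 convolution. There are several choices for the affinity function f: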

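Gaussian: $f(x_i, x_j) = e^{x_i^T x_j}$, with normalization factor $\mathcal{C}(x) = \sum_{\forall j} f(x_i, x_j)$
Embedded Gaussian: $f(x_i, x_j) = e^{\theta(x_i)^T \phi(x_j)}$, where $\theta(x_i) = W_\theta x_i$ and $\phi(x_j) = W_\phi x_j$ are learned embeddings, with $\mathcal{C}(x) = \sum_{\forall j} f(x_i, x_j)$
Dot Product: $f(x_i, x_j) = \theta(x_i)^T \phi(x_j)$, with $\mathcal{C}(x) = N$, the number of positions in x
Concatenation: $f(x_i, x_j) = \mathrm{ReLU}(w_f^T [\theta(x_i), \phi(x_j)])$, with $\mathcal{C}(x) = N$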
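To make the four choices of f concrete, here is a rough NumPy sketch that computes the normalized affinity matrix f(x_i, x_j) / C(x) for each of them, following the definitions in the paper. The weight matrices `W_theta`, `W_phi` and the vector `w_f` are random placeholders standing in for learned parameters, and the sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, D = 5, 4, 2                      # positions, input channels, embedding channels
x = rng.normal(size=(N, C))
W_theta = rng.normal(size=(C, D))      # placeholder for the learned embedding theta
W_phi = rng.normal(size=(C, D))        # placeholder for the learned embedding phi
w_f = rng.normal(size=(2 * D,))        # placeholder weight vector for concatenation

theta, phi = x @ W_theta, x @ W_phi    # (N, D) each

def gaussian(x):
    f = np.exp(x @ x.T)
    return f / f.sum(axis=1, keepdims=True)       # C(x) = sum_j f(x_i, x_j)

def embedded_gaussian(theta, phi):
    f = np.exp(theta @ phi.T)
    return f / f.sum(axis=1, keepdims=True)       # same as a softmax over j

def dot_product(theta, phi):
    return (theta @ phi.T) / theta.shape[0]       # C(x) = N, no softmax

def concatenation(theta, phi, w_f):
    n = theta.shape[0]
    # Row i * n + j holds the concatenated pair [theta(x_i), phi(x_j)].
    pairs = np.concatenate(
        [np.repeat(theta, n, axis=0), np.tile(phi, (n, 1))], axis=1)
    f = np.maximum(pairs @ w_f, 0.0).reshape(n, n)  # ReLU of a scalar projection
    return f / n                                    # C(x) = N
```

Only the two Gaussian versions produce rows that sum to 1; the dot-product and concatenation versions are simply scaled by 1/N.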

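The last two, Dot Product and Concatenation, are included to show that "the attentional behavior (due to softmax) is not essential in the applications we study."

The authors then wrap Eq. (1) into a non-local block that can be incorporated into many existing architectures: $z_i = W_z y_i + x_i$, where $y_i$ is the output of Eq. (1) and $+ x_i$ is a residual connection.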
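As a rough illustration of how the block wraps Eq. (1), the sketch below implements the embedded-Gaussian version on a flattened input of shape (N, C), with the residual form z_i = W_z y_i + x_i from the paper. The function name, the random weights, and the sizes are assumptions; a 1x1 convolution is modeled as a per-position linear map.

```python
import numpy as np

def non_local_block(x, W_theta, W_phi, W_g, W_z):
    """Embedded-Gaussian non-local block on x of shape (N, C)."""
    # 1x1 convolutions act as per-position linear maps on the channel axis.
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
    logits = theta @ phi.T                              # (N, N) pairwise scores
    logits -= logits.max(axis=1, keepdims=True)         # for numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)             # softmax over j
    y = attn @ g                                        # Eq. (1)
    return y @ W_z + x                                  # residual: z_i = W_z y_i + x_i

rng = np.random.default_rng(0)
N, C, D = 6, 8, 4                                       # made-up sizes, D = C / 2
x = rng.normal(size=(N, C))
W_theta, W_phi, W_g = (rng.normal(size=(C, D)) for _ in range(3))

# With W_z set to zero the block is an identity mapping.
z = non_local_block(x, W_theta, W_phi, W_g, np.zeros((D, C)))
```

The zero-initialized case shows why the residual connection matters: the paper notes that it lets the block be inserted into a pre-trained model without changing its initial behavior.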

The figure below, taken from another blog, illustrates the non-local block.

[Figure: structure of the non-local block]

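The paper then introduces some tricks for reducing computation: the embeddings θ, φ and g use half as many channels as x (a bottleneck design), and x can be subsampled with max pooling before φ and g.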
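One of these tricks in the paper is subsampling: a max pooling layer is applied to the inputs of φ and g, so each output position is compared against fewer positions. The small sketch below just counts the pairwise terms to show the effect. The helper name `pairwise_terms` and the 4 x 14 x 14 feature map size are illustrative assumptions.

```python
def pairwise_terms(t, h, w, pool=1):
    """Number of f(x_i, x_j) evaluations for a T x H x W feature map.

    pool is the spatial stride of the max pooling applied before phi and g
    (pool=1 means no subsampling). Output positions are never subsampled.
    """
    n_query = t * h * w
    n_key = t * (h // pool) * (w // pool)
    return n_query * n_key

full = pairwise_terms(4, 14, 14)
pooled = pairwise_terms(4, 14, 14, pool=2)
print(full, pooled, full / pooled)   # 614656 153664 4.0
```

Pooling by 2 in each spatial direction cuts the pairwise computation to a quarter while the output keeps its full resolution.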

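Thoughts

What "local" means

The word "local" mainly refers to the receptive field. Take convolution as an example: the receptive field of a single convolution is the size of its kernel, and we usually use kernels such as 3 x 3 or 5 x 5, which only look at a local region, so these are local operations. Pooling is local for the same reason. By contrast, non-local means that the receptive field can be very large rather than a local neighborhood.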
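A quick calculation shows how slowly a local operation grows its receptive field. The sketch below assumes plain stacked convolutions with stride 1 and no dilation or pooling; the function names and the 56-pixel feature map width are made up for illustration.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_cover(size, kernel_size=3):
    """Smallest number of stride-1 layers whose receptive field spans `size`."""
    layers = 0
    while receptive_field(layers, kernel_size) < size:
        layers += 1
    return layers

print(receptive_field(1))     # 3
print(layers_to_cover(56))    # 28
```

Under these assumptions, 28 stacked 3 x 3 layers are needed before one output position can see a span of 56 pixels, whereas a single non-local operation relates every pair of positions directly.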