Paper Reading: Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

1. Paper Overview

This paper studies the effective receptive field (ERF). It finds that the ERF occupies only a fraction of the theoretical receptive field and follows a Gaussian distribution. The theoretical receptive field can exceed the input image, but the ERF is usually smaller than the image. Another interesting finding, backed by experiments in the paper, is that the ERF grows larger over the course of training.

I am writing up this paper not because I learned a great deal from it (there is plenty in it, though some of it I cannot follow), but because the receptive field matters so much in practice. Many networks are designed so that the effective receptive field covers enough of the target's useful information; only then does training work. If the receptive field is not large enough but you still expect the network to learn structure at that scale, it is unlikely to work. The prior boxes in SSD and the anchors in Faster R-CNN arguably embody the same intuition (my own guess). The paper also notes that in image segmentation the receptive field matters even more and needs to be sized appropriately. The other reason I am recording it is that the derivation proving the ERF is Gaussian uses the Fourier transform; this is the most equation-heavy CNN paper I have read so far.

Another point: SiamDW, which I have been reading recently, also stresses that receptive field size matters. Typically the last layer's receptive field covers 60% to 80% of the template image Z, which turns out to be about right; although the receptive field can in theory exceed the image, doing so hurts tracking performance.

2. Definition of the Receptive Field

One of the basic concepts in deep CNNs is the receptive field, or field of view, of a unit in a certain layer in the network. Unlike in fully connected networks, where the value of each unit depends on the entire input to the network, a unit in convolutional networks only depends on a region of the input. This region in the input is the receptive field for that unit.

In a CNN, the region of the input that determines a single element of some layer's output is called that element's receptive field.

3. Computing the Theoretical Receptive Field

i) The first method works backward: take the layer whose receptive field you want to know, treat it as the last layer of the network, and recurse toward the input:
RF_i = (RF_{i+1} − 1) × s_i + k_i

where k_i and s_i are the kernel size and stride of layer i, and RF_i is the receptive field of the final output with respect to the input of layer i.

Notes:

The receptive field of the last layer (conv or pooling) on its input equals its kernel size.
The receptive field of layer i depends on layer i's kernel size and stride, as well as on the receptive field of layer (i+1).
Padding is ignored in this computation: it shifts where the receptive field sits on the input, but it does not change the receptive field's size.
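
A minimal Python sketch of this backward recursion (the layer list is made up for illustration):

```python
# Backward recursion: start from a single output unit and fold layers in reverse.
# Each layer is a (kernel_size, stride) pair; padding is ignored, as noted above.
def receptive_field(layers):
    rf = 1
    for k, s in reversed(layers):
        rf = (rf - 1) * s + k
    return rf

# Example: three 3x3 stride-1 convs followed by a 2x2 stride-2 pool.
print(receptive_field([(3, 1), (3, 1), (3, 1), (2, 2)]))  # -> 8
```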

ii) The second method works forward, computing the receptive field layer by layer from the input:
n_out = floor((n_in + 2p − k) / s) + 1
j_out = j_in × s
r_out = r_in + (k − 1) × j_in
start_out = start_in + ((k − 1)/2 − p) × j_in

Here n_out is the feature map size, j is the jump (the distance, in input pixels, between adjacent feature map elements; for the input image j = 1), r_out is the receptive field size, and start is the center coordinate of the receptive field of the feature map's first element.
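
The same bookkeeping in Python (the layer settings are illustrative; the first two AlexNet-style layers serve as an example):

```python
# Forward, layer-by-layer receptive field bookkeeping.
# Each layer is (kernel_size k, stride s, padding p).
def forward_rf(layers, n_in):
    n, j, r, start = n_in, 1, 1, 0.5  # input image: jump 1, RF 1, first pixel centered at 0.5
    for k, s, p in layers:
        n = (n + 2 * p - k) // s + 1
        start = start + ((k - 1) / 2 - p) * j
        r = r + (k - 1) * j
        j = j * s
    return n, j, r, start

# Example: 11x11/4 conv then 3x3/2 pool on a 227x227 input -> (27, 8, 19, 9.5)
print(forward_rf([(11, 4, 0), (3, 2, 0)], 227))
```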

iii) The third method computes only the receptive field, in a single pass, also from input to output:

RF = 1 + Σ_i (k_i − 1) × Π_{j<i} s_j
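
That formula in code (same toy layer list as in method i); it returns the same answer:

```python
# One-pass closed form: each layer contributes (k - 1) times the product
# of all earlier strides (the "jump"), on top of the initial single pixel.
def receptive_field_closed_form(layers):
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field_closed_form([(3, 1), (3, 1), (3, 1), (2, 2)]))  # -> 8
```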

4. Three Ways to Enlarge the Receptive Field

i) Stack more layers (make the network deeper). This grows the receptive field linearly; each stride-1 layer adds its kernel size minus one.
ii) Downsampling: downsampling multiplies the contribution of every later layer, so it enlarges the receptive field multiplicatively.
iii) Dilation: dilated convolution also enlarges the receptive field multiplicatively (see the sketch below).
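
A quick numerical comparison of the three (my own illustration; the layer configurations are made up). The effective kernel size of a dilated conv is k + (k − 1)(d − 1):

```python
def rf_of(layers):  # closed-form recursion from section 3 iii); layers are (kernel, stride)
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

def dilated_kernel(k, d):  # effective kernel size of a kxk conv with dilation d
    return k + (k - 1) * (d - 1)

print(rf_of([(3, 1)] * 6))                                    # six plain 3x3 convs -> RF 13 (linear growth)
print(rf_of([(3, 2)] * 3))                                    # three stride-2 convs -> RF 15 (multiplicative)
print(rf_of([(dilated_kernel(3, d), 1) for d in (1, 2, 4)]))  # dilations 1, 2, 4 -> RF 15, at full resolution
```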

Here we consider the effect of some standard CNN approaches on the effective receptive field. Dropout is a popular technique to prevent overfitting; we show that dropout does not change the Gaussian ERF shape. Subsampling and dilated convolutions turn out to be effective ways to increase receptive field size quickly. Skip-connections on the other hand make ERFs smaller. We present the analysis for all these cases in the Appendix.

5. Pixels at the Center of a Receptive Field Affect the Output More Than Pixels at Its Edge

In particular, we discover that not all pixels in a receptive field contribute equally to an output unit's response. Intuitively it is easy to see that pixels at the center of a receptive field have a much larger impact on an output. In the forward pass, central pixels can propagate information to the output through many different paths, while the pixels in the outer area of the receptive field have very few paths to propagate its impact. In the backward pass, gradients from an output unit are propagated across all the paths, and therefore the central pixels have a much larger magnitude for the gradient from that output.
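
The path-counting argument is easy to check numerically: the number of forward paths from each input position to a single output unit equals the n-fold self-convolution of an all-ones kernel, which approaches a Gaussian profile. A minimal 1-D numpy sketch:

```python
import numpy as np

# Paths from each input position to one output unit after five
# stacked size-3, stride-1 convs: convolve an all-ones kernel five times.
paths = np.array([1.0])
for _ in range(5):
    paths = np.convolve(paths, np.ones(3))

print(paths)                # 1, 5, 15, 30, 45, 51, 45, 30, 15, 5, 1
print(paths / paths.max())  # edge pixels carry ~2% of the center pixel's weight
```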

6. Measuring Which Input Pixels Influence a Point on the Feature Map

The measure of impact we use in this paper is the partial derivative $\partial y_{0,0} / \partial x^{0}_{i,j}$. It measures how much $y_{0,0}$ changes as $x^{0}_{i,j}$ changes by a small amount; it is therefore a natural measure of the importance of $x^{0}_{i,j}$ with respect to $y_{0,0}$. However, this measure depends not only on the weights of the network, but is in most cases also input-dependent, so most of our results will be presented in terms of expectations over the input distribution.

In other words, impact is measured by the value of the partial derivative.
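
This translates directly into a measurement procedure: set the gradient at the central output unit to 1, backpropagate, and read the gradient magnitude off the input. A rough PyTorch sketch (the five-layer toy network is mine, not the paper's exact setup):

```python
import torch
import torch.nn as nn

# Toy stack of five 3x3 convs, purely for illustration.
net = nn.Sequential(*[nn.Conv2d(1, 1, 3, padding=1) for _ in range(5)])

x = torch.randn(1, 1, 32, 32, requires_grad=True)
y = net(x)

# Inject a gradient of 1 at the central output unit and backpropagate;
# |d y_center / d x| on the input is exactly the paper's impact measure.
grad = torch.zeros_like(y)
grad[0, 0, y.shape[2] // 2, y.shape[3] // 2] = 1.0
y.backward(grad)

erf = x.grad.abs()[0, 0]
print(erf.max().item(), erf[0, 0].item())  # Gaussian-like blob: center >> corner
```

Averaging this gradient map over many random inputs gives the expectation over the input distribution that the paper works with.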

7. Comparing the effect of number of layers, random weight initialization and nonlinear activation on the ERF

[Figure: ERFs compared across different numbers of layers, weight initializations, and nonlinear activations]

8. ERF: √n absolute growth and 1/√n relative shrinkage

[Figure 2: ERF size, and the ratio of ERF to theoretical RF, versus the number of convolution layers]

In Fig. 2, we show the change of ERF size and the relative ratio of ERF over theoretical RF w.r.t. the number of convolution layers.

The rising curve is the ERF size; the falling curve is the ratio of the ERF to the theoretical RF.
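
The √n behaviour is easy to verify numerically: the n-fold self-convolution of a normalized uniform kernel is approximately Gaussian with standard deviation ∝ √n, while the theoretical RF grows linearly in n. A small numpy sketch (a size-3 kernel is assumed):

```python
import numpy as np

kernel = np.ones(3) / 3.0  # normalized uniform 1-D kernel
for n in (5, 20, 80):
    g = np.array([1.0])
    for _ in range(n):          # impact profile after n stride-1 layers
        g = np.convolve(g, kernel)
    xs = np.arange(len(g)) - len(g) // 2
    std = np.sqrt((g * xs**2).sum() / g.sum())  # ERF-like spread
    print(n, 2 * n + 1, round(std, 2))  # theoretical RF ~ 2n+1; spread ~ sqrt(2n/3)
```

Quadrupling the depth only doubles the spread, so the ERF-to-RF ratio decays like 1/√n.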

9. How the ERF evolves during training

For both tasks, we adopt the ResNet architecture which makes extensive use of skip-connections. As the analysis shows, the ERF of this network should be significantly smaller than the theoretical receptive field. This is indeed what we have observed initially. Intriguingly, as the network learns, the ERF gets bigger, and at the end of training is significantly larger than the initial ERF.

In Fig. 3 we show the effective receptive field on the 32×32 image space at the beginning of training (with randomly initialized weights) and at the end of training when it reaches best validation accuracy. Note that the theoretical receptive field of our network is actually 74×74, bigger than the image size, but the ERF is still not able to fully fill the image. Comparing the results before and after training, we see that the effective receptive field has grown significantly.

10. Reduce the Gaussian Damage (i.e., enlarge the effective receptive field)

New Initialization.

One simple way to increase the effective receptive field is to manipulate the initial weights. We propose a new random weight initialization scheme that makes the weights at the center of the convolution kernel to have a smaller scale, and the weights on the outside to be larger; this diffuses the concentration on the center out to the periphery.

We note that no matter what we do to change w(m), the effective receptive field is still distributed like a Gaussian, so the above proposal only solves the problem partially. (So this is only a partial fix; the gain is modest.)
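
A toy version of the idea (my own illustration; the paper derives the weight scales analytically rather than using this ad-hoc ramp): scale the initial weights up with distance from the kernel center.

```python
import torch
import torch.nn as nn

def center_light_init(conv: nn.Conv2d):
    """Rescale an already-initialized kernel so center weights are smaller
    and peripheral weights larger (illustrative, not the paper's scheme)."""
    k = conv.kernel_size[0]
    ys, xs = torch.meshgrid(torch.arange(k), torch.arange(k), indexing="ij")
    dist = ((ys - (k - 1) / 2) ** 2 + (xs - (k - 1) / 2) ** 2).sqrt()
    scale = 0.5 + dist / dist.max()  # 0.5 at the center, 1.5 at the corners
    with torch.no_grad():
        conv.weight.mul_(scale)

conv = nn.Conv2d(16, 16, 5, padding=2)
center_light_init(conv)
```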

Architectural changes.

A potentially better approach is to make architectural changes to the CNNs, which may change the ERF in more fundamental ways. For example, instead of connecting each unit in a CNN to a local rectangular convolution window, we can sparsely connect each unit to a larger area in the lower layer using the same number of connections. Dilated convolution [21] belongs to this category, but we may push even further and use sparse connections that are not grid-like.

(In other words, the convolution window need not be rectangular; dilated convolution is one instance of this idea.)

11. A Possible Explanation for Why the ERF Is Smaller Than the RF

In our analysis we have established that the effective receptive field in deep CNNs actually grows a lot slower than we used to think. This indicates that a lot of local information is still preserved even after many convolution layers.

However, if the ERF is smaller than the RF, this suggests that representations may retain position information, and also raises an interesting question concerning changes in the size of these fields during development. (That is, the smaller ERF may be what allows position information to be preserved.)

