1. Receptive Field
The receptive field is defined as the region in the input space that a particular CNN feature is looking at (i.e., is affected by).
Equivalently, it is the window of input voxels that affects one particular output voxel.
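A receptive field can be described by its center position and its size.
Within a receptive field, pixels closer to the center contribute more to the CNN feature. A CNN feature therefore not only has a receptive field; it also weights the pixels inside that field according to their distance from the center.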
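One way to see why central pixels carry more weight is to count how many paths connect each input pixel to a single output feature. The sketch below assumes stacked stride-1 convolutions with a 3-tap all-ones kernel; that kernel and the helper name `convolve_full` are illustrative choices, not part of the original text. Real learned kernels differ, but the path-counting argument is the same.

```python
def convolve_full(a, b):
    """Full 1-D discrete convolution of two sequences (pure Python)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

box = [1, 1, 1]  # hypothetical all-ones 3-tap kernel

# Contribution of each input pixel to one output feature
# after two and after three stacked layers.
two_layers = convolve_full(box, box)
three_layers = convolve_full(two_layers, box)

print(two_layers)    # [1, 2, 3, 2, 1]
print(three_layers)  # [1, 3, 6, 7, 6, 3, 1]
```

The weights peak at the center and fall off toward the edges of the receptive field, and the peak sharpens as more layers are stacked.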
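The figure above shows some example CNN feature maps. The input image is 5x5, the kernel is 3x3, padding = 1, and the stride is 2. The first convolution produces a 3x3 result (the green map in the figure). A second convolution produces a 2x2 result (the yellow map in the figure).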
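The image size before and after a convolution is related by:
$$n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1$$
where $n_{in}$ is the input size, $k$ the kernel size, $p$ the padding, and $s$ the stride.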
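As a quick check of the size relation, the short sketch below applies it to the 5x5 example described above (3x3 kernel, padding 1, stride 2). The function name `conv_output_size` is introduced here only for illustration.

```python
def conv_output_size(n_in, k, s, p):
    """Output size of a convolution: floor((n_in + 2p - k) / s) + 1."""
    return (n_in + 2 * p - k) // s + 1

first = conv_output_size(5, k=3, s=2, p=1)       # 5x5 input -> first feature map
second = conv_output_size(first, k=3, s=2, p=1)  # first map -> second feature map

print(first, second)  # 3 2
```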
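The left column of Figure 1 is the usual way of drawing CNN feature maps. It shows directly how many features there are, but it shows neither the center position nor the size of each receptive field.
The right column of Figure 1 draws each feature map at the same size as the input image. Every feature is placed at the center of the kernel position it was computed from, and the size of its receptive field is drawn around it.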
The one-dimensional convolution shown in the figure below illustrates this further:
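In the figure above, $k=3, s=1$. After the first convolution, each output contains information from three positions of the input image. After the second convolution, each output contains information from five input positions. Adding more convolutional layers enlarges the receptive field further.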
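The growth described above can be sketched in a few lines. The function below is a hypothetical helper, not part of the original code; it assumes every layer uses the same kernel size and stride and starts from an input layer with receptive field 1 and spacing 1.

```python
def stacked_receptive_field(num_layers, k=3, s=1):
    """Receptive field size after each of num_layers identical 1-D convolutions."""
    r, j = 1, 1  # input layer: receptive field 1, feature spacing 1
    sizes = []
    for _ in range(num_layers):
        r = r + (k - 1) * j  # each layer adds (k - 1) * current spacing
        j = j * s            # the spacing grows by the stride
        sizes.append(r)
    return sizes

print(stacked_receptive_field(4))  # [3, 5, 7, 9]
```

With $k=3, s=1$ each extra layer adds two input positions, which matches the three and five positions mentioned for the first two convolutions.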
2. Calculation Method
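In the figure above, Equation 1 computes the number of features after the convolution.
Equation 2 computes the spacing between adjacent features. It equals the spacing of the input features $j_{in}$ multiplied by the stride $s$, the number of features skipped at each step.
Equation 3 computes the size of the receptive field. It equals the extra region covered by the kernel, $(k-1) \cdot j_{in}$, plus the receptive field of the input feature, $r_{in}$. The term $(k-1) \cdot j_{in}$ comes from $2 \cdot \dfrac{k-1}{2} \cdot j_{in}$, i.e. the kernel extends the field by $\dfrac{k-1}{2} \cdot j_{in}$ on each side.
Equation 4 computes the center position of the receptive field of the first feature. Here start is the center coordinate of the top-left feature.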
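Written out, the four equations take the following form. This is a reconstruction that agrees with the Python code in section 3, where $p$ denotes the (left) padding and the subscripts $in$ and $out$ mark a layer's input and output:

$$
\begin{aligned}
n_{out} &= \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1 && (1)\\
j_{out} &= j_{in} \cdot s && (2)\\
r_{out} &= r_{in} + (k-1) \cdot j_{in} && (3)\\
start_{out} &= start_{in} + \left(\frac{k-1}{2} - p\right) \cdot j_{in} && (4)
\end{aligned}
$$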
The first layer of a CNN is the input layer, for which we always have $n = \text{image size}$, $r=1$, $j=1$, and $start = 0.5$, as shown in the figure below:
The figure above computes the receptive field and the center positions of the feature maps after two convolutions.
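The two-convolution calculation can be reproduced with a small sketch. It assumes the 5x5 example from section 1 (kernel 3, stride 2, padding 1) and symmetric padding; the name `next_layer` is illustrative only.

```python
def next_layer(layer, k, s, p):
    """Apply the four recurrences to a layer given as (n, j, r, start)."""
    n_in, j_in, r_in, start_in = layer
    n_out = (n_in + 2 * p - k) // s + 1              # number of features
    j_out = j_in * s                                 # spacing between features
    r_out = r_in + (k - 1) * j_in                    # receptive field size
    start_out = start_in + ((k - 1) / 2 - p) * j_in  # center of first feature
    return n_out, j_out, r_out, start_out

layer0 = (5, 1, 1, 0.5)               # input layer: n = 5, j = 1, r = 1, start = 0.5
layer1 = next_layer(layer0, 3, 2, 1)  # first convolution
layer2 = next_layer(layer1, 3, 2, 1)  # second convolution

print(layer1)  # (3, 2, 3, 0.5)
print(layer2)  # (2, 4, 7, 0.5)
```

After the second convolution each feature sees a 7x7 region whose centers are 4 pixels apart, so part of each receptive field falls into the padding of the 5x5 input.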
3. Python Code
The following code makes the calculation convenient:
# Each entry of convnet is [filter size, stride, padding].
# Assume the two dimensions are the same.
# Each kernel requires the following parameters:
# - k_i: kernel size
# - s_i: stride
# - p_i: padding (if padding is uneven, right padding will be higher than left padding; "SAME" option in TensorFlow)
#
# Each layer i requires the following parameters to be fully represented:
# - n_i: number of features (data layer has n_1 = imagesize)
# - j_i: distance (projected to image pixel distance) between the centers of two adjacent features
# - r_i: receptive field of a feature in layer i
# - start_i: position of the first feature's receptive field in layer i (index starts from 0; negative means the center falls into padding)
import math

convnet = [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0], [3, 1, 1], [3, 1, 1], [3, 1, 1], [3, 2, 0], [6, 1, 0], [1, 1, 0]]
layer_names = ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4', 'conv5', 'pool5', 'fc6-conv', 'fc7-conv']
imsize = 227


def outFromIn(conv, layerIn):
    """Compute (n, j, r, start) of a layer's output from its input."""
    n_in = layerIn[0]
    j_in = layerIn[1]
    r_in = layerIn[2]
    start_in = layerIn[3]
    k = conv[0]
    s = conv[1]
    p = conv[2]

    n_out = math.floor((n_in - k + 2*p)/s) + 1
    # Total padding actually used, split into right (pR) and left (pL) parts.
    actualP = (n_out-1)*s - n_in + k
    pR = math.ceil(actualP/2)
    pL = math.floor(actualP/2)

    j_out = j_in * s
    r_out = r_in + (k - 1)*j_in
    start_out = start_in + ((k-1)/2 - pL)*j_in
    return n_out, j_out, r_out, start_out


def printLayer(layer, layer_name):
    print(layer_name + ":")
    print("\t n features: %s \n \t jump: %s \n \t receptive size: %s \t start: %s " % (layer[0], layer[1], layer[2], layer[3]))


layerInfos = []
if __name__ == '__main__':
    # The first layer is the data layer (image) with n_0 = image size; j_0 = 1; r_0 = 1; and start_0 = 0.5
    print("-------Net summary------")
    currentLayer = [imsize, 1, 1, 0.5]
    printLayer(currentLayer, "input image")
    for i in range(len(convnet)):
        currentLayer = outFromIn(convnet[i], currentLayer)
        layerInfos.append(currentLayer)
        printLayer(currentLayer, layer_names[i])
    print("------------------------")
    # input() replaces the Python 2 raw_input() used originally.
    layer_name = input("Layer name where the feature is: ")
    layer_idx = layer_names.index(layer_name)
    idx_x = int(input("index of the feature in x dimension (from 0)"))
    idx_y = int(input("index of the feature in y dimension (from 0)"))

    n = layerInfos[layer_idx][0]
    j = layerInfos[layer_idx][1]
    r = layerInfos[layer_idx][2]
    start = layerInfos[layer_idx][3]
    assert(idx_x < n)
    assert(idx_y < n)

    print("receptive field: (%s, %s)" % (r, r))
    print("center: (%s, %s)" % (start+idx_x*j, start+idx_y*j))
Output:
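For a non-interactive check, the same recurrences can be run as a short sketch over the layer list used above. It assumes symmetric padding (true for every layer in this list), and the names `summarize` and `feature_center` are introduced here only for illustration.

```python
CONVNET = [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0], [3, 1, 1],
           [3, 1, 1], [3, 1, 1], [3, 2, 0], [6, 1, 0], [1, 1, 0]]
LAYER_NAMES = ['conv1', 'pool1', 'conv2', 'pool2', 'conv3',
               'conv4', 'conv5', 'pool5', 'fc6-conv', 'fc7-conv']

def summarize(convnet, names, imsize):
    """Return {layer name: (n, j, r, start)} for every layer."""
    n, j, r, start = imsize, 1, 1, 0.5  # input layer
    layers = {}
    for (k, s, p), name in zip(convnet, names):
        start = start + ((k - 1) / 2 - p) * j  # uses the input spacing j
        r = r + (k - 1) * j
        n = (n + 2 * p - k) // s + 1
        j = j * s
        layers[name] = (n, j, r, start)
    return layers

def feature_center(layer, idx_x, idx_y):
    """Center of the receptive field of feature (idx_x, idx_y), 0-based."""
    n, j, r, start = layer
    assert 0 <= idx_x < n and 0 <= idx_y < n
    return start + idx_x * j, start + idx_y * j

layers = summarize(CONVNET, LAYER_NAMES, 227)
center = feature_center(layers['pool5'], 2, 3)

print(layers['conv1'])  # (55, 4, 11, 5.5)
print(layers['pool5'])  # (6, 32, 195, 33.5)
print(center)           # (97.5, 129.5)
```

Under these assumptions a pool5 feature covers a 195x195 region of the 227x227 input, and neighboring pool5 features are 32 pixels apart.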
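4. Receptive Field Calculation for Deconvolution
Deconvolution (transposed convolution) shrinks the spacing $j$ between features, but it still enlarges the receptive field.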
For a k = 2 , p