# A guide to receptive field arithmetic for CNNs

The local receptive field is one of the two key ideas behind CNNs (the other being weight sharing). Human perception is generally thought to proceed from local to global, and the same holds for images: nearby pixels are strongly correlated in space (for example, neighboring pixels often share the same color and texture), while distant pixels are only weakly related. Each neuron therefore does not need to perceive the whole image; it only perceives a local region, and higher layers combine this local information to obtain global information. In other words, the local receptive field means that each neuron in a convolutional layer is connected only to a local patch of the previous layer's feature map.

## Receptive Field Arithmetic

- number of features along one dimension: $n$
- receptive field size of a feature: $r$
- jump (distance between the centers of two adjacent features): $j$
- center coordinate of the top-left feature: $start$

Note that the center coordinate of a feature is defined to be the center coordinate of its receptive field, as shown in the fixed-sized CNN feature map above.

- The first equation computes the number of output features from the number of input features and the convolution parameters: $n_{out} = \lfloor (n_{in} - k + 2p)/s \rfloor + 1$.
- The second equation computes the jump between output features, which equals the input jump multiplied by the stride: $j_{out} = j_{in} \times s$.
- The third equation computes the receptive field size of the output feature map: covering $k$ input features spans $(k-1)\times j_{in}$, plus the region $r_{in}$ already covered by one input feature's receptive field: $r_{out} = r_{in} + (k-1)\times j_{in}$.
- The fourth equation computes the receptive field center position of the first output feature: the center of the first input feature, plus the offset from the first input feature to the center of the first kernel position, $(k-1)/2 \times j_{in}$, minus the padding space $p\times j_{in}$: $start_{out} = start_{in} + ((k-1)/2 - p)\times j_{in}$. (Multiplying by the jump converts these offsets into real distances in image-pixel space.)

For the input image layer itself, the recursion starts from:

- $n$ = image size
- $r = 1$
- $j = 1$
- $start = 0.5$
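The four equations above can be checked by hand on AlexNet's first convolution (k=11, s=4, p=0, 227×227 input). The sketch below uses $p$ directly in the fourth equation, which is adequate whenever the nominal padding matches the padding actually applied (the function name `out_from_in` is my own):

```python
import math

def out_from_in(k, s, p, n_in, j_in, r_in, start_in):
    """Apply the four receptive-field equations to one conv/pool layer."""
    n_out = math.floor((n_in - k + 2 * p) / s) + 1   # eq. 1: output feature count
    j_out = j_in * s                                  # eq. 2: jump between features
    r_out = r_in + (k - 1) * j_in                     # eq. 3: receptive field size
    start_out = start_in + ((k - 1) / 2 - p) * j_in   # eq. 4: center of first feature
    return n_out, j_out, r_out, start_out

# AlexNet conv1: k=11, s=4, p=0 applied to the 227x227 input layer
print(out_from_in(11, 4, 0, 227, 1, 1, 0.5))  # (55, 4, 11, 5.5)
```

So conv1 produces 55 features per dimension, each looking at an 11×11 image patch, with the first patch centered at pixel 5.5 and adjacent centers 4 pixels apart.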

## Python example

```python
# [filter size, stride, padding]
# Assume the two spatial dimensions are the same.
# Each kernel requires the following parameters:
# - k_i: kernel size
# - s_i: stride
#
# Each layer i requires the following parameters to be fully represented:
# - n_i: number of features (the data layer has n_1 = image size)
# - j_i: distance (projected to image pixel distance) between the centers
#        of two adjacent features
# - r_i: receptive field size of a feature in layer i
# - start_i: center position of the first feature's receptive field in layer i
#            (indices start from 0; negative means the center falls into padding)

import math

convnet = [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0], [3, 1, 1],
           [3, 1, 1], [3, 1, 1], [3, 2, 0], [6, 1, 0], [1, 1, 0]]
layer_names = ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4',
               'conv5', 'pool5', 'fc6-conv', 'fc7-conv']
imsize = 227

def outFromIn(conv, layerIn):
    n_in, j_in, r_in, start_in = layerIn
    k, s, p = conv

    n_out = math.floor((n_in - k + 2 * p) / s) + 1
    actualP = (n_out - 1) * s - n_in + k  # total padding actually used
    pL = math.floor(actualP / 2)          # padding on the left/top side

    j_out = j_in * s
    r_out = r_in + (k - 1) * j_in
    start_out = start_in + ((k - 1) / 2 - pL) * j_in
    return n_out, j_out, r_out, start_out

def printLayer(layer, layer_name):
    print(layer_name + ":")
    print("\t n features: %s \n \t jump: %s \n \t receptive size: %s \t start: %s "
          % (layer[0], layer[1], layer[2], layer[3]))

layerInfos = []
if __name__ == '__main__':
    # The first layer is the data layer (image) with
    # n_0 = image size; j_0 = 1; r_0 = 1; start_0 = 0.5.
    print("-------Net summary------")
    currentLayer = [imsize, 1, 1, 0.5]
    printLayer(currentLayer, "input image")
    for i in range(len(convnet)):
        currentLayer = outFromIn(convnet[i], currentLayer)
        layerInfos.append(currentLayer)
        printLayer(currentLayer, layer_names[i])
    print("------------------------")
    layer_name = input("Layer name where the feature is: ")
    layer_idx = layer_names.index(layer_name)
    idx_x = int(input("index of the feature in x dimension (from 0): "))
    idx_y = int(input("index of the feature in y dimension (from 0): "))

    n, j, r, start = layerInfos[layer_idx]
    assert idx_x < n
    assert idx_y < n

    print("receptive field: (%s, %s)" % (r, r))
    print("center: (%s, %s)" % (start + idx_x * j, start + idx_y * j))
```
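Once a layer's $(j, r, start)$ triple is known, mapping a feature back to image pixels is just its center ± r/2. A minimal sketch (the helper name `rf_bbox` is my own, not part of the original script):

```python
def rf_bbox(start, j, r, idx_x, idx_y):
    """Bounding box (x0, y0, x1, y1), in image-pixel coordinates, of the
    receptive field of feature (idx_x, idx_y).  Coordinates can fall
    outside the image when the field overlaps padding."""
    cx = start + idx_x * j  # receptive field center, x
    cy = start + idx_y * j  # receptive field center, y
    return (cx - r / 2, cy - r / 2, cx + r / 2, cy + r / 2)

# Feature (0, 0) of AlexNet conv1 (j=4, r=11, start=5.5) sees the
# top-left 11x11 image patch:
print(rf_bbox(5.5, 4, 11, 0, 0))  # (0.0, 0.0, 11.0, 11.0)
```

This is handy for visualizing which image patch drives a particular activation, e.g. when debugging detection anchors.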
