# A guide to receptive field arithmetic for CNNs

The local receptive field is one of the two key ideas behind CNNs (the other being weight sharing). Human perception is generally thought to proceed from local to global, and the same holds for images: nearby pixels are strongly correlated in space (for example, neighboring pixels often share the same color and texture), while distant pixels are only weakly related. Each neuron therefore does not need to perceive the whole image; it only perceives a local region, and higher layers combine this local information to obtain global information. In other words, the local receptive field means that each neuron in a convolutional layer is connected only to a local patch of the previous layer's feature map.

## Receptive Field Arithmetic

- number of features along one dimension: $n$
- receptive field size of a feature: $r$
- jump (distance between the centers of two adjacent features): $j$
- center coordinate of the top-left feature: $start$

Note that the center coordinate of a feature is defined to be the center coordinate of its receptive field, as shown in the fixed-sized CNN feature map above.

- The first equation computes the number of output features from the number of input features and the convolution parameters: $n_{out} = \lfloor (n_{in} - k + 2p)/s \rfloor + 1$.
- The second equation computes the jump between output features, which equals the input jump multiplied by the stride: $j_{out} = j_{in} \times s$.
- The third equation computes the receptive field size of the output feature map: covering $k$ input features spans $(k-1)\times j_{in}$, plus the region $r_{in}$ already covered by one input feature's receptive field: $r_{out} = r_{in} + (k-1)\times j_{in}$.
- The fourth equation computes the receptive field center position of the first output feature: the center of the first input feature, plus the offset from the first input feature to the center of the first kernel position, $(k-1)/2 \times j_{in}$, minus the padding space $p\times j_{in}$: $start_{out} = start_{in} + ((k-1)/2 - p)\times j_{in}$. (Multiplying by the jump converts these offsets into real distances in image-pixel space.)

For the input image layer itself, the recursion starts from:

- $n$ = image size
- $r = 1$
- $j = 1$
- $start = 0.5$
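The four equations above can be checked by hand on AlexNet's first convolution (k=11, s=4, p=0, 227×227 input). The sketch below uses $p$ directly in the fourth equation, which is adequate whenever the nominal padding matches the padding actually applied (the function name `out_from_in` is my own):

```python
import math

def out_from_in(k, s, p, n_in, j_in, r_in, start_in):
    """Apply the four receptive-field equations to one conv/pool layer."""
    n_out = math.floor((n_in - k + 2 * p) / s) + 1   # eq. 1: output feature count
    j_out = j_in * s                                  # eq. 2: jump between features
    r_out = r_in + (k - 1) * j_in                     # eq. 3: receptive field size
    start_out = start_in + ((k - 1) / 2 - p) * j_in   # eq. 4: center of first feature
    return n_out, j_out, r_out, start_out

# AlexNet conv1: k=11, s=4, p=0 applied to the 227x227 input layer
print(out_from_in(11, 4, 0, 227, 1, 1, 0.5))  # (55, 4, 11, 5.5)
```

So conv1 produces 55 features per dimension, each looking at an 11×11 image patch, with the first patch centered at pixel 5.5 and adjacent centers 4 pixels apart.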

## Python example

```python
# [filter size, stride, padding]
# Assume the two spatial dimensions are the same.
# Each kernel requires the following parameters:
# - k_i: kernel size
# - s_i: stride
#
# Each layer i requires the following parameters to be fully represented:
# - n_i: number of features (the data layer has n_1 = image size)
# - j_i: distance (projected to image pixel distance) between the centers
#        of two adjacent features
# - r_i: receptive field size of a feature in layer i
# - start_i: center position of the first feature's receptive field in layer i
#            (indices start from 0; negative means the center falls into padding)

import math

convnet = [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0], [3, 1, 1],
           [3, 1, 1], [3, 1, 1], [3, 2, 0], [6, 1, 0], [1, 1, 0]]
layer_names = ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4',
               'conv5', 'pool5', 'fc6-conv', 'fc7-conv']
imsize = 227

def outFromIn(conv, layerIn):
    n_in, j_in, r_in, start_in = layerIn
    k, s, p = conv

    n_out = math.floor((n_in - k + 2 * p) / s) + 1
    actualP = (n_out - 1) * s - n_in + k  # total padding actually used
    pL = math.floor(actualP / 2)          # padding on the left/top side

    j_out = j_in * s
    r_out = r_in + (k - 1) * j_in
    start_out = start_in + ((k - 1) / 2 - pL) * j_in
    return n_out, j_out, r_out, start_out

def printLayer(layer, layer_name):
    print(layer_name + ":")
    print("\t n features: %s \n \t jump: %s \n \t receptive size: %s \t start: %s "
          % (layer[0], layer[1], layer[2], layer[3]))

layerInfos = []
if __name__ == '__main__':
    # The first layer is the data layer (image) with
    # n_0 = image size; j_0 = 1; r_0 = 1; start_0 = 0.5.
    print("-------Net summary------")
    currentLayer = [imsize, 1, 1, 0.5]
    printLayer(currentLayer, "input image")
    for i in range(len(convnet)):
        currentLayer = outFromIn(convnet[i], currentLayer)
        layerInfos.append(currentLayer)
        printLayer(currentLayer, layer_names[i])
    print("------------------------")
    layer_name = input("Layer name where the feature is: ")
    layer_idx = layer_names.index(layer_name)
    idx_x = int(input("index of the feature in x dimension (from 0): "))
    idx_y = int(input("index of the feature in y dimension (from 0): "))

    n, j, r, start = layerInfos[layer_idx]
    assert idx_x < n
    assert idx_y < n

    print("receptive field: (%s, %s)" % (r, r))
    print("center: (%s, %s)" % (start + idx_x * j, start + idx_y * j))
```
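Once a layer's $(j, r, start)$ triple is known, mapping a feature back to image pixels is just its center ± r/2. A minimal sketch (the helper name `rf_bbox` is my own, not part of the original script):

```python
def rf_bbox(start, j, r, idx_x, idx_y):
    """Bounding box (x0, y0, x1, y1), in image-pixel coordinates, of the
    receptive field of feature (idx_x, idx_y).  Coordinates can fall
    outside the image when the field overlaps padding."""
    cx = start + idx_x * j  # receptive field center, x
    cy = start + idx_y * j  # receptive field center, y
    return (cx - r / 2, cy - r / 2, cx + r / 2, cy + r / 2)

# Feature (0, 0) of AlexNet conv1 (j=4, r=11, start=5.5) sees the
# top-left 11x11 image patch:
print(rf_bbox(5.5, 4, 11, 0, 0))  # (0.0, 0.0, 11.0, 11.0)
```

This is handy for visualizing which image patch drives a particular activation, e.g. when debugging detection anchors.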
