Thoughts on the PyTorch VGG network

Why use small 3x3 convolution kernels, and can they achieve the same effect as 5x5, 7x7, or 11x11 kernels?

  1. 3x3 is used instead of 2x2 or 1x1 because 3x3 is the smallest size that can capture the notions of left/right, up/down, and center in an image.
  2. The receptive field of two stacked 3x3 convolution layers is equivalent to that of a single 5x5 convolution, and three stacked 3x3 convolutions are equivalent to a single 7x7 convolution (under the condition that every convolution uses padding = [1,1] and stride = [1,1]). The receptive field formula is given in the next section.
  3. From the parameter-count formula, two 3x3 convolutions have 3x3x2 = 18 weights while a single 5x5 convolution has 25; three 3x3 convolutions have 27 weights while a single 7x7 convolution has 49 (per input/output channel pair); see the sketch below.
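
As a quick sanity check on item 3, here is a minimal PyTorch sketch (my own illustration, not from the original post; hypothetical single-channel, bias-free layers) that counts the kernel weights directly:

import torch.nn as nn

def n_weights(*layers):
    # sum of all parameter elements in the given layers
    return sum(p.numel() for layer in layers for p in layer.parameters())

# single input/output channel, no bias, so only the kernel weights are counted
two_3x3   = n_weights(nn.Conv2d(1, 1, 3, bias=False), nn.Conv2d(1, 1, 3, bias=False))
one_5x5   = n_weights(nn.Conv2d(1, 1, 5, bias=False))
three_3x3 = n_weights(*[nn.Conv2d(1, 1, 3, bias=False) for _ in range(3)])
one_7x7   = n_weights(nn.Conv2d(1, 1, 7, bias=False))

print(two_3x3, one_5x5, three_3x3, one_7x7)  # 18 25 27 49

With C input and output channels the comparison scales in the same way: 2 x 9 x C x C versus 25 x C x C, and 3 x 9 x C x C versus 49 x C x C.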

Receptive field

Receptive field formulas

A receptive field calculator website implements the computation of the receptive field size for each convolutional layer.

The formulas use the following parameters:

Parameter  Meaning
n          number of features along one dimension of the feature map (num of features)
k          convolution kernel size
p          convolution padding size
s          convolution stride size
r          receptive field size
j          jump (distance between two consecutive features)
start      center coordinate, on the original image, of the first feature

Feature map output size formula:
$n_o = \left\lfloor \frac{n_i + 2p - k}{s} \right\rfloor + 1$

Feature map jump formula:
$j_o = j_i \times s$

Receptive field formula:
$r_o = r_i + (k - 1) \times j_i$
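
Taken together, these three formulas can be applied layer by layer. A minimal sketch (my own illustration, separate from the full code below) for a single 3x3 convolution with p = 1, s = 1 on a 224-wide input:

import math

def conv_out(n_i, j_i, r_i, k, p, s):
    # apply the three formulas above to one convolution layer
    n_o = math.floor((n_i + 2 * p - k) / s) + 1  # feature map size
    j_o = j_i * s                                # jump between adjacent features
    r_o = r_i + (k - 1) * j_i                    # receptive field size
    return n_o, j_o, r_o

print(conv_out(224, 1, 1, k=3, p=1, s=1))  # (224, 1, 3)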

Let us now compute the receptive field of two stacked 3x3 convolutions versus a single 5x5 convolution, for an input image with input_size = 224, p = 1, s = 1.

初始值:

# For the input image, use the following initial values: the initial receptive field size and the jump between features are both 1
imsize = 224
j_i = 1
r_i = 1

Receptive field of two stacked 3x3 convolutions:

  • Receptive field after the first 3x3 convolution

$r_o = r_i + (k-1)\times j_i = 1 + (3-1)\times 1 = 3$

  • Receptive field after the second 3x3 convolution

$r_o = r_i + (k-1)\times j_i = 3 + (3-1)\times 1 = 5$

Receptive field of a single 5x5 convolution:

$r_o = r_i + (k-1)\times j_i = 1 + (5-1)\times 1 = 5$

This shows that, with the same stride and padding, two stacked 3x3 convolutions have the same receptive field as a single 5x5 convolution.
In the same way, three stacked 3x3 convolutions can be shown to have a receptive field of 7, the same as a single 7x7 convolution, as the sketch below also verifies.
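
The same conclusion can be checked with a short loop (a sketch reusing the receptive field formula above) that stacks 3x3 convolutions with stride 1:

# receptive field after stacking N 3x3 convolutions with stride 1
r, j = 1, 1
for n in range(1, 4):
    r = r + (3 - 1) * j   # r_o = r_i + (k - 1) * j_i
    j = j * 1             # stride 1, so the jump stays 1
    print("%d x (3x3 conv): receptive field = %d" % (n, r))
# prints receptive fields 3, 5, 7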

How the receptive field region is computed

Each convolutional layer outputs a feature map, and every feature point on that feature map corresponds to a receptive field of a certain size. How, then, is the position of this receptive field region on the original image computed?

We already know the size of the receptive field; if we also work out the center point of that receptive field mapped back onto the original image, we can recover the receptive field region.

The calculation uses the following additional parameter:

start    center coordinate, on the original image, of the first feature in the feature map

Note that start refers to the center point, on the original image, of the receptive field region of the first feature point of each feature map. It is computed as:

$start_o = start_i + \left(\frac{k-1}{2} - p\right)\times j_i$

Again taking the two stacked 3x3 convolutions as an example, let us compute, after each layer, where the first feature point of the feature map lands on the original image:

初始值:

# For the input image, use the following initial values: the initial receptive field size and the jump between features are both 1, and start = 0.5
imsize = 224
j_i = 1
r_i = 1
start = 0.5

After the first convolution (3x3 kernel, receptive field 3x3), the start of the first feature point is:
$start_o = start_i + \left(\frac{k-1}{2} - p\right)\times j_i = 0.5 + \left(\frac{3-1}{2} - 1\right)\times 1 = 0.5$

$j_o = j_i \times s = 1$

Similarly, after the second convolution (3x3 kernel, receptive field now 5x5), the start of the first feature point is:

$start_o = start_i + \left(\frac{k-1}{2} - p\right)\times j_i = 0.5 + \left(\frac{3-1}{2} - 1\right)\times 1 = 0.5$

$j_o = j_i \times s = 1$

If we now want the center coordinate, on the original image, of the feature at index [2,2] in the feature map output by the second layer, we use the formulas below.

The regions of the other feature points are converted in the same way:

start_i_x
= start_0_x + feature_i_x * j
= 0.5 + 2*1 = 2.5

start_i_y
= start_0_y + feature_i_y * j
= 0.5 + 2*1 = 2.5

Here start_0 is the center position, on the original image, of the receptive field of the first feature point of the current feature map; feature_i is the index of the i-th feature point in the current feature map; and j is the jump between adjacent feature points of the current feature map.
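
A minimal sketch of this index-to-center mapping (a hypothetical helper, separate from the full script in the next section):

def feature_center(start_0, j, feature_i_x, feature_i_y):
    # map a feature index to the center of its receptive field on the original image
    return start_0 + feature_i_x * j, start_0 + feature_i_y * j

# conv2 output of the two stacked 3x3 convolutions: start = 0.5, jump = 1
print(feature_center(0.5, 1, 2, 2))  # (2.5, 2.5)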

Receptive field calculation code

# [filter size, stride, padding]
# Assume the two spatial dimensions are the same.
# Each kernel requires the following parameters:
# - k_i: kernel size
# - s_i: stride
# - p_i: padding (if padding is uneven, right padding will be higher than left padding; "SAME" option in tensorflow)
#
# Each layer i requires the following parameters to be fully represented:
# - n_i: number of features (the data layer has n_0 = image size)
# - j_i: distance (projected to image pixel distance) between the centers of two adjacent features
# - r_i: receptive field of a feature in layer i
# - start_i: position of the first feature's receptive field in layer i (index starts from 0; negative means the center falls into the padding)

import math


def outFromIn(conv, layerIn):
  n_in = layerIn[0]
  j_in = layerIn[1]
  r_in = layerIn[2]
  start_in = layerIn[3]
  k = conv[0]
  s = conv[1]
  p = conv[2]
  
  n_out = math.floor((n_in - k + 2*p)/s) + 1
  actualP = (n_out-1)*s - n_in + k 
  pR = math.ceil(actualP/2)
  pL = math.floor(actualP/2)
  
  j_out = j_in * s
  r_out = r_in + (k - 1)*j_in
  start_out = start_in + ((k-1)/2 - pL)*j_in
  return n_out, j_out, r_out, start_out
  
def printLayer(layer, layer_name):
  print(layer_name + ":")
  print("\t n features: %s \n \t jump: %s \n \t receptive size: %s \t start: %s " % (layer[0], layer[1], layer[2], layer[3]))


def showReceptiveField(currentLayer, convnet, layer_names):
  layerInfos = []
  printLayer(currentLayer, "input image")
  for i in range(len(convnet)):
    currentLayer = outFromIn(convnet[i], currentLayer)
    layerInfos.append(currentLayer)
    printLayer(currentLayer, layer_names[i])
  print ("------------------------")
  layer_name = input ("Layer name where the feature in: ") # modify here as needed
  layer_idx = layer_names.index(layer_name)
  idx_x = int(input ("index of the feature in x dimension (from 0)")) # modify here as needed
  idx_y = int(input ("index of the feature in y dimension (from 0)")) # modify here as needed

  n = layerInfos[layer_idx][0]
  j = layerInfos[layer_idx][1]
  r = layerInfos[layer_idx][2]
  start = layerInfos[layer_idx][3]
  assert(idx_x < n)
  assert(idx_y < n)

  print ("receptive field: (%s, %s)" % (r, r))
  print ("center: (%s, %s)" % (start+idx_x*j, start+idx_y*j))


if __name__ == '__main__':
  #first layer is the data layer (image) with n_0 = image size; j_0 = 1; r_0 = 1; and start_0 = 0.5
  imsize = 224
  currentLayer = [imsize, 1, 1, 0.5]


  print ("-------Net summary 3x3 ------")
  convnet_3x3 = [[3,1,1],[3,1,1]]
  layer_names_3x3 = ['conv1','conv2']
  showReceptiveField(currentLayer, convnet_3x3,layer_names_3x3)


  print ("-------Net summary 5x5 ------")
  convnet_5x5 = [[5,1,1]]
  layer_names_5x5 = ['conv1']
  showReceptiveField(currentLayer, convnet_5x5,layer_names_5x5)



  print ("-------Net summary (deeper example) ------")
  convnet =   [[11,4,0],[3,2,0],[5,1,2],[3,2,0],[3,1,1],[3,1,1],[3,1,1],[3,2,0],[6,1,0], [1, 1, 0]]
  layer_names = ['conv1','pool1','conv2','pool2','conv3','conv4','conv5','pool5','fc6-conv', 'fc7-conv']
  showReceptiveField(currentLayer, convnet,layer_names)


Result of the two stacked 3x3 convolutions on a 224-sized input:

-------Net summary 3x3 ------
input image:
	 n features: 224
 	 jump: 1
 	 receptive size: 1 	 start: 0.5
conv1:
	 n features: 224
 	 jump: 1
 	 receptive size: 3 	 start: 0.5
conv2:
	 n features: 224
 	 jump: 1
 	 receptive size: 5 	 start: 0.5
------------------------
Layer name where the feature in: conv2
index of the feature in x dimension (from 0)2
index of the feature in y dimension (from 0)2
receptive field: (5, 5)
center: (2.5, 2.5)

For the underlying theory, see the article A guide to receptive field arithmetic for Convolutional Neural Networks.
