一、Theoretical Receptive Field
1、Definition
- The receptive field of a pixel on a CNN's output feature map is the size of the region of the original image that this pixel can "see"; the output feature is influenced by every pixel inside that region.
- Spatial relationships in images are local. Just as humans perceive the outside world through local receptive fields, each neuron need not respond to the entire image: it senses only a local region, and higher layers combine neurons covering different local regions to recover global information.
2、Computation formula
- Receptive field of a standard (or dilated) convolution:
  - $RF_{l+1} = RF_{l} + (kernel\_size_{l+1} - 1) \times feature\_stride_{l} \times dilation_{l+1}$
  - $RF$ is the receptive-field size, $l$ is the layer index, and $feature\_stride_{l} = \prod_{i=1}^{l} stride_{i}$
  - $l = 0$ denotes the input layer: each input pixel can only see itself, so $RF_{0} = 1$ and $feature\_stride_{0} = 1$
  - Standard convolution has dilation = 1 in every layer; dilated convolution uses dilation > 1 in some layers
- Note:
  - For the first conv layer (with dilation = 1), the receptive field of each output pixel equals the filter size
  - The computation ignores image borders, i.e. padding is not taken into account
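The recursion above is easy to check in a few lines of Python. A minimal sketch (the layer configurations below are illustrative, not taken from any particular network):

```python
# Receptive-field recursion: RF_{l+1} = RF_l + (k - 1) * feature_stride_l * dilation
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, first layer to last."""
    rf, feature_stride = 1, 1          # RF_0 = 1, feature_stride_0 = 1
    for k, s, d in layers:
        rf += (k - 1) * feature_stride * d
        feature_stride *= s
    return rf

# Two stacked 3x3, stride-1 convs see a 5x5 region ...
print(receptive_field([(3, 1, 1), (3, 1, 1)]))   # 5
# ... and three see 7x7, matching a single 7x7 kernel
print(receptive_field([(3, 1, 1)] * 3))          # 7
# A 3x3 conv with dilation 2 behaves like a 5x5 kernel
print(receptive_field([(3, 1, 2)]))              # 5
```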
- Some useful conclusions:
  - $1 \times 1$ convolutions do not change the receptive field, nor do element-wise operations such as ReLU/BN/dropout
  - Across parallel branches, the receptive field is computed from the branch with the largest receptive field; shortcut connections do not change it
  - Stride-1 conv layers increase the receptive field linearly, so deep networks can enlarge it by stacking layers; stride-2 downsampling layers increase it multiplicatively, but their number is limited by the input resolution; a stride-1 conv layer placed late in the network enlarges the receptive field more than the same layer placed early, because the accumulated feature stride is larger there
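The linear vs. multiplicative growth patterns can be verified with the same recursion (a small sketch with made-up layer configurations, dilation fixed at 1):

```python
def rf_after(layers):
    """layers: list of (kernel_size, stride) tuples; returns the receptive field."""
    rf, fs = 1, 1
    for k, s in layers:
        rf += (k - 1) * fs
        fs *= s
    return rf

# stride-1 3x3 convs: RF grows linearly, +2 per layer
print([rf_after([(3, 1)] * n) for n in range(1, 5)])   # [3, 5, 7, 9]

# after a stride-2 layer, every subsequent layer adds (k - 1) * 2,
# so downsampling multiplies the growth rate of all later layers
print(rf_after([(3, 2), (3, 1), (3, 1)]))              # 3 + 4 + 4 = 11
# the same stride-1 layers placed *before* the downsampling gain less:
print(rf_after([(3, 1), (3, 1), (3, 2)]))              # 3 + 2 + 2 = 7
```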
3、Handy tools
- Receptive-field computations can be verified with the online tool: Receptive Field Calculator
- TensorFlow's official receptive-field computation code: https://github.com/tensorflow/tensorflow/tree/v2.0.0-rc2/tensorflow/contrib/receptive_field
4、Uses
- The receptive-field size gives a rough indication of each layer's level of abstraction:
  - The larger the receptive field, the larger the portion of the original image the layer can reach, and the more likely its features are global and semantically high-level
  - The smaller the receptive field, the more local and detail-oriented its features
- It also guides network design:
  - General tasks: a larger receptive field is usually better; e.g. in image classification, the receptive field of the last conv layer should exceed the input image, and deeper networks with larger receptive fields tend to perform better
  - Object detection: anchors should correspond closely to the receptive field; anchors that are much larger than, or offset from, the receptive field severely hurt detection performance
  - Semantic segmentation: each output pixel's receptive field should be large enough that no important information is ignored when making a decision; again, deeper is generally better
- Replacing one large conv layer with several small ones deepens the network (increasing its capacity and complexity) while reducing the parameter count:
  - Stacked small kernels (e.g. $3 \times 3$) can achieve the same receptive field as a single large kernel (e.g. $7 \times 7$)
  - [Figure: intuition for why stacked small kernels match a large kernel's receptive field with fewer parameters]
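The parameter saving is simple arithmetic. A sketch assuming a hypothetical channel count C (same number of input and output channels in every layer, biases ignored):

```python
C = 64  # hypothetical channel count, same in and out

# one 7x7 conv layer
params_large = 7 * 7 * C * C          # 49 * C^2
# three stacked 3x3 conv layers (each adds +2 to the RF: 3 -> 5 -> 7)
params_small = 3 * (3 * 3 * C * C)    # 27 * C^2

print(params_large)                   # 200704
print(params_small)                   # 110592
print(params_small / params_large)    # ~0.55: same receptive field, ~half the parameters
```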
二、Effective Receptive Field and Its Relation to Anchors
1、Effective receptive field
- The computation above gives the theoretical receptive field; in practice, a feature's effective receptive field is much smaller than the theoretical one
- Values near the center of the receptive field are used far more often than values near its edge
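The center bias can be illustrated by counting contribution paths: repeatedly convolving a box kernel of ones gives the number of paths connecting each input position to one output unit. This is a 1-D sketch with uniform weights; in a real network the paths are weighted by learned kernels, which concentrates the effective receptive field even further:

```python
import numpy as np

# number of paths from each input pixel to a single output unit,
# for a stack of four 3-tap, stride-1 convolutions
paths = np.array([1.0])
for _ in range(4):
    paths = np.convolve(paths, np.ones(3))

print(paths.astype(int))   # [ 1  4 10 16 19 16 10  4  1]
print(len(paths))          # 9, the theoretical RF of four stacked 3-tap convs
# center pixels lie on far more paths than edge pixels,
# so they influence the output much more strongly
```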
2、Relation between the receptive field and anchors
- Both classic object detectors and recent object trackers use an RPN (region proposal network); anchors are the basis of the RPN, and the receptive field is the basis of anchors
- In object detection, anchor sizes and aspect ratios should match the effective receptive field of that layer's features; anchors much larger than the effective receptive field are harmful, and so are anchors much smaller
- Equal-proportion interval principle: set the receptive field (anchor size) to n times the stride, with n = 4
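Under the equal-proportion interval principle, each detection layer's anchor size is tied to that layer's stride. A sketch using the detection-layer strides of S3FD (4 through 128) with n = 4:

```python
# equal-proportion interval: anchor size = n * stride (n = 4 here)
n = 4
strides = [4, 8, 16, 32, 64, 128]     # detection-layer strides, as in S3FD
anchors = [n * s for s in strides]

print(anchors)                        # [16, 32, 64, 128, 256, 512]
# each anchor scale covers a proportional band of object sizes,
# so objects of different scales get roughly equal anchor density
```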
三、Code Implementation

```python
# -*- coding: utf-8 -*-
"""
@File    : compute_receptive_field.py
@Version : 1.0
"""
import numpy as np

# Each layer is specified as [kernel_size, stride, pad, dilation, channel_num]
net_struct = {
    'vgg16': {
        'net': [[3, 1, 1, 1, 64], [3, 1, 1, 1, 64], [2, 2, 0, 1, 64],
                [3, 1, 1, 1, 128], [3, 1, 1, 1, 128], [2, 2, 0, 1, 128],
                [3, 1, 1, 1, 256], [3, 1, 1, 1, 256], [3, 1, 1, 1, 256], [2, 2, 0, 1, 256],
                [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [2, 2, 0, 1, 512],
                [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [2, 2, 0, 1, 512]],
        'name': ['conv1_1', 'conv1_2', 'pool1', 'conv2_1', 'conv2_2', 'pool2',
                 'conv3_1', 'conv3_2', 'conv3_3', 'pool3', 'conv4_1', 'conv4_2',
                 'conv4_3', 'pool4', 'conv5_1', 'conv5_2', 'conv5_3', 'pool5']},
    'vgg16_ssd': {
        'net': [[3, 1, 1, 1, 64], [3, 1, 1, 1, 64], [2, 2, 0, 1, 64],
                [3, 1, 1, 1, 128], [3, 1, 1, 1, 128], [2, 2, 0, 1, 128],
                [3, 1, 1, 1, 256], [3, 1, 1, 1, 256], [3, 1, 1, 1, 256], [2, 2, 0, 1, 256],
                [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [2, 2, 0, 1, 512],
                [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [3, 1, 1, 1, 512], [3, 1, 1, 1, 512],
                [3, 1, 6, 6, 1024], [1, 1, 0, 1, 1024],
                [1, 1, 0, 1, 256], [3, 2, 1, 1, 512], [1, 1, 0, 1, 128], [3, 2, 1, 1, 256],
                [1, 1, 0, 1, 128], [3, 1, 0, 1, 256], [1, 1, 0, 1, 128], [3, 1, 0, 1, 256]],
        'name': ['conv1_1', 'conv1_2', 'pool1', 'conv2_1', 'conv2_2', 'pool2',
                 'conv3_1', 'conv3_2', 'conv3_3', 'pool3', 'conv4_1', 'conv4_2',
                 'conv4_3', 'pool4', 'conv5_1', 'conv5_2', 'conv5_3', 'pool5',
                 'fc6', 'fc7', 'conv6_1', 'conv6_2', 'conv7_1', 'conv7_2',
                 'conv8_1', 'conv8_2', 'conv9_1', 'conv9_2']}
}


def out_from_in(img_size_f, net_p, net_n, layer_len):
    """Bottom-up pass: output feature-map size and total stride after layer_len layers."""
    total_stride = 1
    in_size = img_size_f
    for layer in range(layer_len):
        k_size, stride, pad, dilation, _ = net_p[layer]
        net_name = net_n[layer]
        if 'pool' in net_name:
            # pooling layers round up (ceil mode)
            out_size = np.ceil(1.0 * (in_size + 2 * pad - dilation * (k_size - 1) - 1) / stride).astype(np.int32) + 1
        else:
            # conv layers round down (floor mode)
            out_size = np.floor(1.0 * (in_size + 2 * pad - dilation * (k_size - 1) - 1) / stride).astype(np.int32) + 1
        in_size = out_size
        total_stride = total_stride * stride
    return out_size, total_stride


def in_from_out(net_p, layer_len):
    """Top-down pass: receptive field of one output pixel after layer_len layers."""
    RF = 1
    for layer in reversed(range(layer_len)):
        k_size, stride, pad, dilation, _ = net_p[layer]
        RF = (RF - 1) * stride + dilation * (k_size - 1) + 1
    return RF


if __name__ == '__main__':
    img_size = 300
    print("layer output sizes given image = %dx%d" % (img_size, img_size))
    for net in net_struct.keys():
        print('************net structure name is %s**************' % net)
        for i in range(len(net_struct[net]['net'])):
            p = out_from_in(img_size, net_struct[net]['net'], net_struct[net]['name'], i + 1)
            rf = in_from_out(net_struct[net]['net'], i + 1)
            print(
                "layer_name = {:<8} output_size = {:<4} total_stride = {:<3} "
                "output_channel = {:<4} rf_size = {:<4}".format(
                    net_struct[net]['name'][i], p[0], p[1], net_struct[net]['net'][i][4], rf)
            )
```
Output for vgg16:

```text
layer output sizes given image = 300x300
************net structure name is vgg16**************
layer_name = conv1_1 output_size = 300 total_stride = 1 output_channel = 64 rf_size = 3
layer_name = conv1_2 output_size = 300 total_stride = 1 output_channel = 64 rf_size = 5
layer_name = pool1 output_size = 150 total_stride = 2 output_channel = 64 rf_size = 6
layer_name = conv2_1 output_size = 150 total_stride = 2 output_channel = 128 rf_size = 10
layer_name = conv2_2 output_size = 150 total_stride = 2 output_channel = 128 rf_size = 14
layer_name = pool2 output_size = 75 total_stride = 4 output_channel = 128 rf_size = 16
layer_name = conv3_1 output_size = 75 total_stride = 4 output_channel = 256 rf_size = 24
layer_name = conv3_2 output_size = 75 total_stride = 4 output_channel = 256 rf_size = 32
layer_name = conv3_3 output_size = 75 total_stride = 4 output_channel = 256 rf_size = 40
layer_name = pool3 output_size = 38 total_stride = 8 output_channel = 256 rf_size = 44
layer_name = conv4_1 output_size = 38 total_stride = 8 output_channel = 512 rf_size = 60
layer_name = conv4_2 output_size = 38 total_stride = 8 output_channel = 512 rf_size = 76
layer_name = conv4_3 output_size = 38 total_stride = 8 output_channel = 512 rf_size = 92
layer_name = pool4 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 100
layer_name = conv5_1 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 132
layer_name = conv5_2 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 164
layer_name = conv5_3 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 196
layer_name = pool5 output_size = 10 total_stride = 32 output_channel = 512 rf_size = 212
```

Output for vgg16_ssd:

```text
************net structure name is vgg16_ssd**************
layer_name = conv1_1 output_size = 300 total_stride = 1 output_channel = 64 rf_size = 3
layer_name = conv1_2 output_size = 300 total_stride = 1 output_channel = 64 rf_size = 5
layer_name = pool1 output_size = 150 total_stride = 2 output_channel = 64 rf_size = 6
layer_name = conv2_1 output_size = 150 total_stride = 2 output_channel = 128 rf_size = 10
layer_name = conv2_2 output_size = 150 total_stride = 2 output_channel = 128 rf_size = 14
layer_name = pool2 output_size = 75 total_stride = 4 output_channel = 128 rf_size = 16
layer_name = conv3_1 output_size = 75 total_stride = 4 output_channel = 256 rf_size = 24
layer_name = conv3_2 output_size = 75 total_stride = 4 output_channel = 256 rf_size = 32
layer_name = conv3_3 output_size = 75 total_stride = 4 output_channel = 256 rf_size = 40
layer_name = pool3 output_size = 38 total_stride = 8 output_channel = 256 rf_size = 44
layer_name = conv4_1 output_size = 38 total_stride = 8 output_channel = 512 rf_size = 60
layer_name = conv4_2 output_size = 38 total_stride = 8 output_channel = 512 rf_size = 76
layer_name = conv4_3 output_size = 38 total_stride = 8 output_channel = 512 rf_size = 92
layer_name = pool4 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 100
layer_name = conv5_1 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 132
layer_name = conv5_2 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 164
layer_name = conv5_3 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 196
layer_name = pool5 output_size = 19 total_stride = 16 output_channel = 512 rf_size = 228
layer_name = fc6 output_size = 19 total_stride = 16 output_channel = 1024 rf_size = 420
layer_name = fc7 output_size = 19 total_stride = 16 output_channel = 1024 rf_size = 420
layer_name = conv6_1 output_size = 19 total_stride = 16 output_channel = 256 rf_size = 420
layer_name = conv6_2 output_size = 10 total_stride = 32 output_channel = 512 rf_size = 452
layer_name = conv7_1 output_size = 10 total_stride = 32 output_channel = 128 rf_size = 452
layer_name = conv7_2 output_size = 5 total_stride = 64 output_channel = 256 rf_size = 516
layer_name = conv8_1 output_size = 5 total_stride = 64 output_channel = 128 rf_size = 516
layer_name = conv8_2 output_size = 3 total_stride = 64 output_channel = 256 rf_size = 644
layer_name = conv9_1 output_size = 3 total_stride = 64 output_channel = 128 rf_size = 644
layer_name = conv9_2 output_size = 1 total_stride = 64 output_channel = 256 rf_size = 772
```
四、References
1、A summary of receptive fields
2、The receptive field of convolutional neural networks
3、Feature-map size and receptive-field computation explained
4、A guide to receptive field arithmetic for Convolutional Neural Networks
5、Convolutional Pose Machines
6、S3FD: Single Shot Scale-invariant Face Detector