Dilated convolution is also called atrous convolution ("convolution with holes") or expanded convolution.
Dilated convolution emerged from image segmentation, where an image is fed into the network and passed through a CNN […].
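Explanations of dilated convolution almost always rely on one figure from the original paper.
Dilated convolution has one important parameter, rate, which sets the size of the holes.
There are two ways to look at what a hole is and how the operation works.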
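1) From the input image's perspective, the holes amount to sampling the input. The sampling interval is set by rate. With rate = 1 the input is sampled without losing any information, and the operation is a standard convolution. With rate > 1, say 2, the input is sampled with (rate-1) pixels skipped between consecutive samples, as in panel (b) of the figure: think of the red dots as the sampling points on the input. The sampled values are then convolved with the kernel, which in effect enlarges the receptive field.
2) From the kernel's perspective, the holes enlarge the kernel: rate-1 zeros are inserted between neighbouring kernel entries, and the enlarged kernel is convolved with the input. This also enlarges the receptive field.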
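The two views above describe the same computation. The following is a minimal 1-D sketch in plain Python (not taken from the original post; the helper names `dilate_kernel`, `conv1d_valid`, and `conv1d_sampled` are made up for illustration) that checks both views produce identical outputs.

```python
def dilate_kernel(kernel, rate):
    """Kernel view: insert rate-1 zeros between neighbouring kernel entries."""
    dilated = []
    for i, w in enumerate(kernel):
        dilated.append(w)
        if i < len(kernel) - 1:
            dilated.extend([0.0] * (rate - 1))
    return dilated


def conv1d_valid(signal, kernel):
    """Ordinary stride-1 convolution (cross-correlation) without padding."""
    n_out = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n_out)]


def conv1d_sampled(signal, kernel, rate):
    """Input view: keep the small kernel, read the input every `rate` positions."""
    span = (len(kernel) - 1) * rate + 1  # input positions covered by one output
    n_out = len(signal) - span + 1
    return [sum(signal[i + j * rate] * kernel[j] for j in range(len(kernel)))
            for i in range(n_out)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
kernel = [1.0, 2.0, 3.0]
rate = 2

big_kernel = dilate_kernel(kernel, rate)          # [1.0, 0.0, 2.0, 0.0, 3.0]
view_kernel = conv1d_valid(signal, big_kernel)    # kernel view
view_input = conv1d_sampled(signal, kernel, rate) # input view

print(big_kernel)
print(view_kernel == view_input)
```

Only three weights are ever multiplied per output, yet each output depends on a span of five input positions.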
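VGG showed that stacking small kernels in place of one large kernel reduces the number of parameters while reaching the same receptive field as the large kernel. However, stacking small kernels grows the receptive field only linearly, as $(\mathrm{kernelSize} - 1) \times \mathrm{layers} + 1$, whereas dilated convolution can grow the receptive field exponentially.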
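To make the linear versus exponential growth concrete, here is a small sketch. It assumes stride-1 layers and, for the dilated stack, that the rate doubles from layer to layer (1, 2, 4, ...). That schedule is an assumption for illustration and is not stated in the text above. The function name `receptive_field` is likewise hypothetical.

```python
def receptive_field(kernel_size, rates):
    """Receptive field (side length) of stacked stride-1 convolutions.

    A layer with dilation `rate` widens the field by (kernel_size - 1) * rate.
    """
    field = 1
    for rate in rates:
        field += (kernel_size - 1) * rate
    return field


max_layers = 4

# Standard 3x3 convolutions: rate 1 in every layer.
standard = [receptive_field(3, [1] * n) for n in range(1, max_layers + 1)]

# Dilated 3x3 convolutions: rate doubles per layer (assumed schedule).
dilated = [receptive_field(3, [2 ** i for i in range(n)])
           for n in range(1, max_layers + 1)]

print(standard)  # [3, 5, 7, 9]
print(dilated)   # [3, 7, 15, 31]
```

With rate 1 everywhere the function reduces to (kernelSize - 1) * layers + 1, while the doubling schedule gives 2^(layers + 1) - 1 for a 3x3 kernel.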
Standard convolution
Dilated convolution
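In fully convolutional networks (FCN), dilated convolution used together with bilinear interpolation can replace transposed convolution. It effectively enlarges the kernel's receptive field without adding model parameters or computation. It works well wherever an image task needs global context, or a speech or text task needs long-range sequence dependencies. Dilated convolution appears in image segmentation, in speech synthesis (WaveNet), and in machine translation (ByteNet).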
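In an earlier post I briefly summarized the concept and usage of deconv (transposed convolution). Here I compare deconv with dilated conv.
deconv […] is 1).
For a standard k*k convolution kernel with stride s, there are three cases:
1) s>1 […] the operation enlarges the input and then convolves it, so the resulting feature map is larger.
Dilated […] convolves after expansion by zero filling, which enlarges the receptive field.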
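The contrast can be sketched in one dimension: a deconv-style operation puts zeros between input samples and then convolves, so the output is longer, while a dilated convolution puts zeros between kernel entries, so the output keeps its length but each value sees a wider span. This is a simplified illustration under my own assumptions (zero insertion with a gap of 1 as a stand-in for a stride-2 transposed convolution, odd-length kernels, hypothetical helper names), not the exact procedure of any library.

```python
def insert_zeros(values, gap):
    """Put `gap` zeros between neighbouring elements."""
    out = []
    for i, v in enumerate(values):
        out.append(v)
        if i < len(values) - 1:
            out.extend([0.0] * gap)
    return out


def conv1d_same(signal, kernel):
    """Stride-1 convolution, zero-padded so the output length equals the input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(signal))]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 1.0, 0.5]

# deconv-style: zeros go into the INPUT, then convolve -> longer output
upsampled = conv1d_same(insert_zeros(signal, 1), kernel)

# dilated: zeros go into the KERNEL, then convolve -> same length, wider view
dilated = conv1d_same(signal, insert_zeros(kernel, 1))

print(len(signal), len(upsampled), len(dilated))  # 4 7 4
print(upsampled)  # [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
```

With this particular kernel the first case reduces to linear interpolation between the original samples, which is one way to see it as upsampling.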
In TensorFlow, dilated convolution is available as tf.nn.atrous_conv2d.
The tf.nn.conv2d function also has a parameter named dilations, which achieves the same dilated-convolution effect.
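Update 2018/4/2:
In practice I found that the documentation for atrous_conv2d and conv2d does not describe the output shape of a dilated convolution clearly. After some searching, it turns out the output shape depends on rate as well as on padding. The calculation goes as follows. Look at padding first. If padding is SAME, the standard SAME rule applies regardless of rate: output size = ceil(input size / stride). If padding is VALID, the standard VALID rule applies, output size = ceil((input size - filter_size + 1) / stride), except that filter_size must be recomputed from rate. The holes are added to the kernel, so first fill the kernel with zeros to get the new kernel size, filter_height = height + (height-1)*(rate-1), and likewise for the width. Feeding the new filter size into the VALID rule gives the final output shape. The following code prints the actual shapes: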
import tensorflow as tf
import numpy as np

# TensorFlow 1.x graph-mode API (tf.Session, tf.placeholder).
input_img_np = np.random.random((1, 256, 256, 1)).astype(np.float32)
kernel = np.random.random((6, 6, 1, 1)).astype(np.float32)

with tf.Session() as sess:
    # tf.nn.convolution, fully known input shape, SAME padding
    concrete_input_op = tf.constant(input_img_np)
    concrete_output_op = tf.nn.convolution(concrete_input_op, kernel, padding='SAME', dilation_rate=np.array([2, 2]))
    concrete_output = sess.run(concrete_output_op)
    print('convolution + CONCRETE + SAME')
    print('concrete_input_op: ', concrete_input_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output:', concrete_output.shape)
    assert(concrete_input_op.get_shape() == concrete_output_op.get_shape())

    # tf.nn.convolution, unknown batch size, SAME padding
    undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    undef_output_op = tf.nn.convolution(undef_input_op, kernel, padding='SAME', dilation_rate=np.array([2, 2]))
    undef_output = sess.run(undef_output_op, feed_dict={undef_input_op: input_img_np})
    print('convolution + UNDEF + SAME')
    print('undef_input_op: ', undef_input_op.get_shape())
    print('undef_output_op: ', undef_output_op.get_shape())
    print('undef_output:', undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(undef_input_op.get_shape() == undef_output_op.get_shape())

    # tf.nn.convolution, fully known input shape, VALID padding
    valid_concrete_input_op = tf.constant(input_img_np)
    valid_concrete_output_op = tf.nn.convolution(valid_concrete_input_op, kernel, padding='VALID', dilation_rate=np.array([2, 2]))
    valid_concrete_output = sess.run(valid_concrete_output_op)
    print('convolution + CONCRETE + VALID')
    print('valid_concrete_input_op: ', valid_concrete_input_op.get_shape())
    print('valid_concrete_output_op: ', valid_concrete_output_op.get_shape())
    print('valid_concrete_output:', valid_concrete_output.shape)

    # tf.nn.convolution, unknown batch size, VALID padding
    valid_undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    valid_undef_output_op = tf.nn.convolution(valid_undef_input_op, kernel, padding='VALID', dilation_rate=np.array([2, 2]))
    valid_undef_output = sess.run(valid_undef_output_op, feed_dict={valid_undef_input_op: input_img_np})
    print('convolution + UNDEF + VALID')
    print('valid_undef_input_op: ', valid_undef_input_op.get_shape())
    print('valid_undef_output_op: ', valid_undef_output_op.get_shape())
    print('valid_undef_output:', valid_undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(valid_undef_input_op.get_shape() == valid_undef_output_op.get_shape())

    ############################################################################
    # Now atrous

    # tf.nn.atrous_conv2d, fully known input shape, SAME padding
    concrete_input_op = tf.constant(input_img_np)
    concrete_output_op = tf.nn.atrous_conv2d(concrete_input_op, kernel, padding='SAME', rate=2)
    concrete_output = sess.run(concrete_output_op)
    print('atrous_conv2d + CONCRETE + SAME')
    print('concrete_input_op: ', concrete_input_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output:', concrete_output.shape)
    assert(concrete_input_op.get_shape() == concrete_output_op.get_shape())

    # tf.nn.atrous_conv2d, unknown batch size, SAME padding
    undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    undef_output_op = tf.nn.atrous_conv2d(undef_input_op, kernel, padding='SAME', rate=2)
    undef_output = sess.run(undef_output_op, feed_dict={undef_input_op: input_img_np})
    print('atrous_conv2d + UNDEF + SAME')
    print('undef_input_op: ', undef_input_op.get_shape())
    print('undef_output_op: ', undef_output_op.get_shape())
    print('undef_output:', undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(undef_input_op.get_shape() == undef_output_op.get_shape())

    # tf.nn.atrous_conv2d, fully known input shape, VALID padding
    valid_concrete_input_op = tf.constant(input_img_np)
    valid_concrete_output_op = tf.nn.atrous_conv2d(valid_concrete_input_op, kernel, padding='VALID', rate=2)
    valid_concrete_output = sess.run(valid_concrete_output_op)
    print('atrous_conv2d + CONCRETE + VALID')
    print('valid_concrete_input_op: ', valid_concrete_input_op.get_shape())
    print('valid_concrete_output_op: ', valid_concrete_output_op.get_shape())
    print('valid_concrete_output:', valid_concrete_output.shape)

    # tf.nn.atrous_conv2d, unknown batch size, VALID padding
    valid_undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    valid_undef_output_op = tf.nn.atrous_conv2d(valid_undef_input_op, kernel, padding='VALID', rate=2)
    valid_undef_output = sess.run(valid_undef_output_op, feed_dict={valid_undef_input_op: input_img_np})
    print('atrous_conv2d + UNDEF + VALID')
    print('valid_undef_input_op: ', valid_undef_input_op.get_shape())
    print('valid_undef_output_op: ', valid_undef_output_op.get_shape())
    print('valid_undef_output:', valid_undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(valid_undef_input_op.get_shape() == valid_undef_output_op.get_shape())
convolution + CONCRETE + SAME
('concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output:', (1, 256, 256, 1))
convolution + UNDEF + SAME
('undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('undef_output_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('undef_output:', (1, 256, 256, 1))
convolution + CONCRETE + VALID
('valid_concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('valid_concrete_output_op: ', TensorShape([Dimension(1), Dimension(246), Dimension(246), Dimension(1)]))
('valid_concrete_output:', (1, 246, 246, 1))
convolution + UNDEF + VALID
('valid_undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('valid_undef_output_op: ', TensorShape([Dimension(None), Dimension(246), Dimension(246), Dimension(1)]))
('valid_undef_output:', (1, 246, 246, 1))
atrous_conv2d + CONCRETE + SAME
('concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output:', (1, 256, 256, 1))
atrous_conv2d + UNDEF + SAME
('undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('undef_output_op: ', TensorShape([Dimension(None), Dimension(None), Dimension(None), Dimension(1)]))
('undef_output:', (1, 256, 256, 1))
atrous_conv2d + CONCRETE + VALID
('valid_concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('valid_concrete_output_op: ', TensorShape([Dimension(1), Dimension(246), Dimension(246), Dimension(1)]))
('valid_concrete_output:', (1, 246, 246, 1))
atrous_conv2d + UNDEF + VALID
('valid_undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('valid_undef_output_op: ', TensorShape([Dimension(None), Dimension(None), Dimension(None), Dimension(1)]))
('valid_undef_output:', (1, 246, 246, 1))
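The 256 and 246 in the output above can be reproduced by hand. The sketch below is plain Python 3 with no TensorFlow dependency. The function names `effective_filter_size` and `output_size` are hypothetical helpers, and the SAME and VALID rules are the standard TensorFlow padding formulas as I understand them, applied to one spatial dimension.

```python
import math


def effective_filter_size(filter_size, rate):
    """Kernel size after inserting rate-1 zeros between neighbouring entries."""
    return filter_size + (filter_size - 1) * (rate - 1)


def output_size(input_size, filter_size, rate, padding, stride=1):
    """Output length along one spatial dimension of a dilated convolution."""
    if padding == 'SAME':
        # SAME ignores the filter size, and therefore the rate as well.
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        effective = effective_filter_size(filter_size, rate)
        return math.ceil((input_size - effective + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# Same setting as the experiment: 256x256 input, 6x6 kernel, rate 2.
print(effective_filter_size(6, 2))       # 11
print(output_size(256, 6, 2, 'SAME'))    # 256
print(output_size(256, 6, 2, 'VALID'))   # 246
```

The 6x6 kernel behaves like an 11x11 kernel at rate 2, so VALID padding yields 256 - 11 + 1 = 246, which matches the printed shapes.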
References:
TensorFlow official documentation
Zhihu: How should dilated convolution be understood?
https://github.com/tensorflow/tensorflow/issues/4742