TensorFlow: Convolution and Deconvolution with Variable-Length Inputs
When processing certain images or time-series inputs with a CNN, we need to handle inputs whose length is not fixed. In scene text recognition, for example, the inputs are rectangular text regions that have already been detected, and these "detection boxes" vary in length. There are two common approaches. The first works on the data: pad or resize the inputs. Padding means choosing a fixed length, zero-padding samples shorter than it, and truncating or discarding samples longer than it; resizing means up- or down-sampling each sample (a periodic signal can also simply be repeated or cropped). The second works on the model: use an architecture that accepts inputs of arbitrary length, such as an RNN. In fact, the part of a CNN that constrains the input size is the FC layer; remove the FC layer and a CNN can also accept inputs of arbitrary length.
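As a concrete illustration of the first (padding) approach, here is a minimal NumPy sketch; `pad_batch` and `max_len` are illustrative names, not from any library:

```python
import numpy as np

def pad_batch(samples, max_len):
    # Zero-pad samples shorter than max_len along the first axis,
    # truncate samples longer than it, and record each kept length.
    shape = (len(samples), max_len) + samples[0].shape[1:]
    batch = np.zeros(shape, dtype=samples[0].dtype)
    lengths = []
    for i, s in enumerate(samples):
        L = min(len(s), max_len)
        batch[i, :L] = s[:L]
        lengths.append(L)
    return batch, lengths
```

Keeping the `lengths` list alongside the padded batch is what later makes masking possible.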
This post shows how to implement convolution and deconvolution layers in TensorFlow that accept variable-length inputs.
Convolution: accepting inputs of arbitrary size
The convolution op tf.nn.conv2d itself already accepts inputs of arbitrary size; the real constraint comes from the FC layer.
def conv2d(x, channel, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, padding='VALID', name='conv2d'):
    with tf.variable_scope(name):
        # filter shape: [filter_height, filter_width, in_channels, out_channels]
        w = tf.get_variable('weights', [k_h, k_w, x.get_shape()[-1], channel],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[channel], initializer=tf.zeros_initializer())
        conv = tf.nn.conv2d(x, w, strides=[1, d_h, d_w, 1], padding=padding)
        conv = tf.nn.bias_add(conv, biases)
    return conv
The test below shows that the convolution layer produces the correct output for inputs of different sizes (H, W).
import numpy as np
import tensorflow as tf

x = tf.placeholder(shape=(64, None, None, 1), dtype=tf.float32)
y = conv2d(x, 16, 2, 2, 1, 1, name='conv')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # test different pairs of (H, W)
    for H in range(25, 30):
        for W in range(25, 30):
            out = sess.run(y, feed_dict={x: np.random.normal(size=(64, H, W, 1))})
            print("H={}, W={},".format(H, W), out.shape)
H=25, W=25, (64, 24, 24, 16)
H=25, W=26, (64, 24, 25, 16)
H=25, W=27, (64, 24, 26, 16)
H=25, W=28, (64, 24, 27, 16)
H=25, W=29, (64, 24, 28, 16)
...
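The shapes printed above follow the standard output-size formula for a VALID convolution, out = floor((L − k) / d) + 1. A quick sketch (`conv_out_len` is a hypothetical helper, not a TensorFlow function):

```python
def conv_out_len(L, k, d):
    # VALID convolution: output length = floor((L - k) / d) + 1
    return (L - k) // d + 1
```

With k=2 and d=1 as in the test, an input length of 25 gives 24 and 29 gives 28, matching the printed shapes.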
Convolution: handling padded variable-length samples in one batch
Although the tensor fed to a convolution layer usually has shape [N, H, W, C] (i.e. [batch_size, height, width, channel]), the samples inside it may in fact be padded variable-length samples. If a sample's actual length is L, shorter than the padded length N, then the convolution result over its trailing segment of length N − L is meaningless and should not contribute to the output or to training. To solve this, we can mask out that part of the convolution result, which requires a function that computes the convolution output length from the kernel size and stride.
For simplicity, define a single convolution layer with kernel (2, 2) and stride (2, 2), define the function get_conv_lens accordingly to compute the convolution output lengths, obtain a mask with tf.sequence_mask, and multiply it with the convolution result. As the final output shows, everything beyond each sample's true length becomes 0, so this part of the data is ignored when gradients are computed.
def get_conv_lens(lengths):
    # output lengths for the kernel-2, stride-2 convolution below
    return tf.floor_div(lengths - 1, 2)

x = tf.placeholder(shape=(3, None, None, 1), dtype=tf.float32)
lens = tf.placeholder(shape=(3,), dtype=tf.int32)

# Masking
conv_lens = get_conv_lens(lens)
mask = tf.sequence_mask(conv_lens)                   # (N, max_len), boolean
mask = tf.expand_dims(tf.expand_dims(mask, -1), -1)  # (N, max_len, 1, 1)
mask = tf.to_float(mask)

y = conv2d(x, channel=1, k_w=2, k_h=2, d_w=2, d_h=2, name='conv')
y = tf.multiply(y, mask)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed_dict = {
        x: np.random.normal(size=(3, 9, 9, 1)),
        lens: [5, 7, 9]
    }
    out, l = sess.run([y, conv_lens], feed_dict=feed_dict)
    print('conv_output_lens:', l)
    for i in range(3):
        print("Sample {}, len:{}".format(i, l[i]))
        print(out[i, :, :, 0])
conv_output_lens: [2 3 4]
Sample 0, len:2
[[-0.01515051 -0.05866513 -0.03959468 -0.04031102]
[ 0.01430748 -0.01260181 0.06486305 0.01679482]
[ 0. -0. 0. -0. ]
[ 0. -0. 0. -0. ]]
Sample 1, len:3
[[-0.00693851 -0.06252004 0.04867405 0.00893762]
[-0.01766482 -0.01576694 0.00474036 0.05953841]
[ 0.02577864 -0.05794765 0.07342847 0.02103793]
[-0. -0. -0. 0. ]]
Sample 2, len:4
[[ 0.0050444 -0.01620883 -0.01921686 -0.01786101]
[ 0.02900701 0.02657226 -0.00322832 -0.07596755]
[-0.03624581 0.05622911 -0.00773423 0.04726247]
[ 0.0512427 0.01688698 -0.03030321 0.01135093]]
Deconvolution: accepting inputs of arbitrary size
The deconvolution op tf.nn.conv2d_transpose requires an explicit output size output_shape. To accept inputs of arbitrary size, the middle two dimensions of output_shape must not be fixed constants; they must be computed dynamically from the kernel size and stride. These two values may be Tensors, but they cannot be None. So we cannot obtain the input size with get_shape().as_list() and compute from it, since that just returns None; instead use tf.shape(), which returns a Tensor.
def deconv2d(x, channel, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name='deconv2d'):
    def get_deconv_lens(H, k, d):
        # dynamic output length of the transposed convolution
        return tf.multiply(H, d) + k - 1

    shape = tf.shape(x)                   # dynamic shape: H, W may be unknown
    H, W = shape[1], shape[2]
    N, _, _, C = x.get_shape().as_list()  # static shape: N, C are known
    with tf.variable_scope(name):
        # filter shape: [filter_height, filter_width, out_channels, in_channels]
        w = tf.get_variable('weights', [k_h, k_w, channel, x.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))
        biases = tf.get_variable('biases', shape=[channel], initializer=tf.zeros_initializer())
        H1 = get_deconv_lens(H, k_h, d_h)
        W1 = get_deconv_lens(W, k_w, d_w)
        deconv = tf.nn.conv2d_transpose(x, w, output_shape=[N, H1, W1, channel],
                                        strides=[1, d_h, d_w, 1], padding='VALID')
        deconv = tf.nn.bias_add(deconv, biases)
    return deconv
The test below shows that the deconvolution layer produces the correct output for inputs of different sizes (H, W).
x = tf.placeholder(shape=(3, None, None, 1), dtype=tf.float32)
y = deconv2d(x, k_w=2, k_h=2, d_w=2, d_h=2, channel=18)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for H in range(4, 9):
        for W in range(4, 9):
            feed_dict = {x: np.random.normal(size=(3, H, W, 1))}
            out = sess.run(y, feed_dict=feed_dict)
            print("H={}, W={},".format(H, W), out.shape)
H=4, W=4, (3, 9, 9, 18)
H=4, W=5, (3, 9, 11, 18)
H=4, W=6, (3, 9, 13, 18)
H=4, W=7, (3, 9, 15, 18)
H=4, W=8, (3, 9, 17, 18)
...
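The sizes printed above come from the L*d + k − 1 rule in get_deconv_lens. A quick sanity check with hypothetical plain-Python helpers: feeding that length back through the VALID convolution formula recovers the original input length, which is exactly the consistency tf.nn.conv2d_transpose requires of output_shape.

```python
def deconv_out_len(L, k, d):
    # the rule used for output_shape above
    return L * d + k - 1

def conv_out_len(L, k, d):
    # VALID convolution output length
    return (L - k) // d + 1

# round trip: e.g. L=4, k=2, d=2 -> 9 -> 4, matching the printed (3, 9, 9, 18)
```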
Deconvolution: handling padded variable-length samples in one batch
def get_deconv_lens(lens):
    # output lengths for the kernel-2, stride-2 deconvolution below: 2 * L + 1
    return tf.multiply(lens, 2) + 1

x = tf.placeholder(shape=(3, None, None, 1), dtype=tf.float32)
lens = tf.placeholder(shape=(3,), dtype=tf.int32)

deconv_lens = get_deconv_lens(lens)
mask = tf.sequence_mask(deconv_lens)
mask = tf.expand_dims(tf.expand_dims(mask, -1), -1)
mask = tf.to_float(mask)

y = deconv2d(x, k_h=2, k_w=2, d_h=2, d_w=1, channel=1)
y = tf.multiply(y, mask)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    H, W = 3, 3
    feed_dict = {
        x: np.random.normal(size=(3, H, W, 1)),
        lens: [1, 2, 3]
    }
    out, ls = sess.run([y, deconv_lens], feed_dict=feed_dict)
    print("H={}, W={},".format(H, W), out.shape)
    print("deconv_output_lengths:", ls)
    for i in range(3):
        print("Sample {}: len={}".format(i, ls[i]))
        print(out[i, :, :, 0])
H=3, W=3, (3, 7, 4, 1)
deconv_output_lengths: [3 5 7]
Sample 0: len=3
[[-0.0066564 -0.0110612 0.01348369 0.02262646]
[-0.00557604 0.0001205 0.01187689 -0.00099025]
[ 0.00473566 0.01897985 0.01987015 0.0026021 ]
[ 0. 0. 0. -0. ]
[-0. 0. 0. -0. ]
[-0. 0. -0. 0. ]
[ 0. 0. 0. 0. ]]
Sample 1: len=5
[[-0.01708268 -0.02356728 0.01687649 0.01737293]
[-0.01431008 0.00434665 0.00883375 -0.00076033]
[ 0.01680406 0.04338101 0.01744103 -0.01432209]
[ 0.01407669 0.01264412 -0.00865467 0.00062681]
[-0.01829475 -0.02353049 0.01568151 0.0104046 ]
[-0. 0. 0. -0. ]
[ 0. 0. 0. 0. ]]
Sample 2: len=7
[[ 4.8716185e-03 -5.5075656e-03 -1.9949302e-02 2.1265401e-03]
[ 4.0809335e-03 -1.1483297e-02 2.0447900e-03 -9.3068171e-05]
[-1.1571520e-02 -5.5464655e-03 2.3397136e-02 4.2484370e-03]
[-9.6934121e-03 1.1671140e-02 1.3168794e-03 -1.8593312e-04]
[-1.0423118e-02 -3.6943398e-02 -2.9132590e-02 5.2677775e-03]
[-8.7314006e-03 -1.6249334e-02 4.1774949e-03 -2.3054463e-04]
[ 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]]
Some experience and tips
1. Building a U-Net-like structure
Variable-length inputs are allowed, but the inputs must be padded to certain fixed lengths, and the admissible lengths are not contiguous; this is especially noticeable with multiple Conv-Deconv layers. The root cause is that when tf.nn.conv2d performs a strided convolution (stride > 1) and the length is not evenly divisible, the output size is rounded.
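To see why the admissible padded lengths are not contiguous, one can check which lengths survive a k=2, d=2 conv followed by the matching deconv unchanged. This is a sketch with hypothetical plain-Python helpers, assuming VALID padding as in the layers above:

```python
def conv_out_len(L, k, d):
    return (L - k) // d + 1   # VALID convolution output length

def deconv_out_len(L, k, d):
    return L * d + k - 1      # deconvolution output length used in this post

# lengths mapped back to themselves by one conv + deconv round trip (k=2, d=2)
fixed = [L for L in range(2, 30)
         if deconv_out_len(conv_out_len(L, 2, 2), 2, 2) == L]
# only every other length qualifies, and stacking more layers thins them further
```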
2. Masking for multiple Conv or Deconv layers
There is no need to mask layer by layer: compute the final lengths once, build a single mask from them in one step, and multiply it with the output.
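A sketch of that one-shot length computation, assuming VALID padding and hypothetical plain-Python helpers:

```python
def conv_out_len(L, k, d):
    # VALID convolution output length
    return (L - k) // d + 1

def stacked_conv_lens(lengths, layers):
    # layers: list of (kernel, stride) pairs, applied in order;
    # returns the final lengths used to build a single mask
    for k, d in layers:
        lengths = [conv_out_len(L, k, d) for L in lengths]
    return lengths
```

For the single k=2, d=2 layer used earlier, stacked_conv_lens([5, 7, 9], [(2, 2)]) gives [2, 3, 4], the same conv_output_lens as in the masking example.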
3. Getting a fixed-length output from Conv
Classification tasks usually need an FC layer and a Softmax afterwards. Suppose the Conv output is [N, T, W, C], where T is variable and can be understood as the timestep dimension. Apply mask-meanpooling: mask the output, sum it along the T dimension to get a tensor of shape [N, 1, W, C], divide by the lengths, and finally flatten it.
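A NumPy sketch of this mask-meanpooling (`masked_mean_pool` is an illustrative name; in the graph the same steps would use tf.sequence_mask, a reduce-sum over T, and a division by the lengths):

```python
import numpy as np

def masked_mean_pool(x, lengths):
    # x: (N, T, W, C) with variable T; lengths: true length of each sample
    N, T, W, C = x.shape
    lengths = np.asarray(lengths)
    mask = (np.arange(T)[None, :] < lengths[:, None]).astype(x.dtype)  # (N, T)
    summed = (x * mask[:, :, None, None]).sum(axis=1)                  # (N, W, C)
    pooled = summed / lengths[:, None, None].astype(x.dtype)           # mean over true T
    return pooled.reshape(N, -1)                                       # flatten for FC
```

The result has a fixed size N x (W*C) regardless of T, so it can feed a normal FC layer.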
---------------------
Author: 蕉叉熵
Source: CSDN
Original: https://blog.csdn.net/songbinxu/article/details/84983220
Copyright notice: this is an original post by the author; please include a link to it when reposting.