tf.nn.dynamic_rnn 代码举例详解一看就明白了

最新推荐文章于 2020-08-26 15:31:29 发布

weixin_42713739

最新推荐文章于 2020-08-26 15:31:29 发布

阅读量1k

点赞数 1

分类专栏： tensorflow

本文链接：https://blog.csdn.net/weixin_42713739/article/details/103700670

版权

tensorflow 专栏收录该内容

20 篇文章 5 订阅

订阅专栏

先上源码：

tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)

batch_size: 是输入的这批数据的数量。一般是矩阵的第一个维度的大小。

max_time: 就是这批数据中序列的最长长度，如果输入的三个句子，那max_time对应的就是最长句子的单词数量，cell.output_size其实就是rnn cell中神经元的个数

outputs : outputs是一个tensor，如果time_major 为True，outputs形状为 [max_time, batch_size, cell.output_size ]（要求rnn输入与rnn输出形状保持一致）,如果time_major==False（默认），outputs形状为 [ batch_size, max_time, cell.output_size ]

state： state是一个tensor。state是最终的状态，也就是序列中最后一个cell输出的状态。一般情况下state的形状为 [batch_size, cell.output_size ]，但当输入的cell为BasicLSTMCell时，state的形状为[2，batch_size, cell.output_size ]，其中2也对应着LSTM中的cell state和hidden state

state含义

对于第一问题“state”形状为什么会发生变化呢？

我们以LSTM和GRU分别为tf.nn.dynamic_rnn的输入cell类型为例，当cell为LSTM，state形状为[2，batch_size, cell.output_size ]；当cell为GRU时，state形状为[batch_size, cell.output_size ]。其原因是因为LSTM和GRU的结构本身不同，如下面两个图所示，这是LSTM的cell结构，每个cell会有两个输出C_t 和 H_t，上面这个图是输出C_t，代表哪些信息应该被记住哪些应该被遗忘；下面这个图是输出H_t，代表这个cell的最终输出，LSTM的state是由C_t 和 H_t组成的。

在这里插入图片描述

当cell为GRU时，state就只有一个了，原因是GRU将C_t 和 H_t进行了简化，将其合并成了H_t，如下图所示，GRU将遗忘门和输入门合并成了更新门，另外cell不在有细胞状态cell state，只有hidden state。

在这里插入图片描述
对于第二个问题outputs和state有什么关系？
结论上来说，如果cell为LSTM，那 state是个tuple，分别代表C_t 和 H_t，其中 H_t与outputs中的对应的最后一个时刻的输出相等，假设state形状为[ 2，batch_size, cell.output_size ]，outputs形状为 [ batch_size, max_time, cell.output_size ]，那么state[ 1, batch_size, : ] == outputs[ batch_size, -1, : ]；如果cell为GRU，那么同理，state其实就是 H_t，state ==outputs[ -1 ]

代码举例

单个隐藏层

import tensorflow as tf
import numpy as np
 
n_steps = 2
n_inputs = 3
n_neurons = 5 #神经元的个数

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)

seq_length = tf.placeholder(tf.int32, [None])
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32,
                                    sequence_length=seq_length)

init = tf.global_variables_initializer()

X_batch = np.array([
    # step 0     step 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2 (padded with zero vectors)
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
])
print(X_batch.shape) #[4,2,3]
seq_length_batch = np.array([2, 1, 2, 2]) #必须是一维矩阵并且元素的个数必须是X_batch矩阵的第一维的大小

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})
    print("outputs_val.shape:", outputs_val.shape)
    print("states_val.shape:", states_val.shape)
    print("outputs_val:", outputs_val) 
    print("states_val:", states_val)

输出结果是：

(4, 2, 3)
outputs_val.shape: (4, 2, 5)
states_val.shape: (4, 5)
outputs_val: [[[ 0.50321156 -0.41240135 -0.84440345 -0.02686742 -0.8964557 ]
  [ 0.9573825  -0.99886006 -0.9999997   0.991583   -1.        ]]

 [[ 0.92471015 -0.93656176 -0.9996478   0.5771828  -0.9999348 ]
  [ 0.          0.          0.          0.          0.        ]]

 [[ 0.99078256 -0.9948545  -0.99999946  0.87246966 -0.99999994]
  [ 0.58877105 -0.9849114  -0.9999669   0.9589857  -0.99999225]]

 [[-0.0043383  -0.9785873  -0.99291325 -0.7907419  -0.99364054]
  [-0.48218966 -0.94694626 -0.8577129   0.88309634 -0.9994416 ]]]
  
states_val: [[ 0.9573825  -0.99886006 -0.9999997   0.991583   -1.        ]
 [ 0.92471015 -0.93656176 -0.9996478   0.5771828  -0.9999348 ]
 [ 0.58877105 -0.9849114  -0.9999669   0.9589857  -0.99999225]
 [-0.48218966 -0.94694626 -0.8577129   0.88309634 -0.9994416 ]]

首先输入X是一个 [batch_size，step，input_size] = [4，2，3] 的tensor，注意我们这里调用的是BasicRNNCell(同GRUCell)，只有一层循环网络，经过tf.nn.dynamic_rnn后outputs的形状为 [ 4, 2, 5 ]，state形状为 [ 4, 5 ]。可以看到 state 与对应的outputs的最后一行是相等的。

多个隐藏层的情况

n_steps = 2
n_inputs = 3
n_neurons = 5
n_layers = 3

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])

layers = [tf.contrib.rnn.BasicRNNCell(num_units=n_neurons,
                                      activation=tf.nn.relu)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32, sequence_length=seq_length)

init = tf.global_variables_initializer()

X_batch = np.array([
    # step 0     step 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2 (padded with zero vectors)
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
])

seq_length_batch = np.array([2, 1, 2, 2])

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})
    print("outputs_val.shape:", outputs)
    print("states_val.shape:", states)
    print("outputs_val:", outputs_val)
    print("states_val:", states_val)

输出结果：

outputs_val.shape: Tensor("rnn/transpose_1:0", shape=(?, 2, 5), dtype=float32)
states_val.shape: (<tf.Tensor 'rnn/while/Exit_3:0' shape=(?, 5) dtype=float32>, 
                              <tf.Tensor 'rnn/while/Exit_4:0' shape=(?, 5) dtype=float32>, 
                              <tf.Tensor 'rnn/while/Exit_5:0' shape=(?, 5) dtype=float32>)
                              
outputs_val: [[[0.         0.         0.2550385  0.12092578 0.40954068]
  [0.         0.         1.896489   0.6294941  3.8094907 ]]

 [[0.         0.         1.0293288  0.4813519  1.824276  ]
  [0.         0.         0.         0.         0.        ]]

 [[0.         0.         1.803619   0.84177816 3.2390108 ]
  [0.         0.78937817 0.         0.         4.249421  ]]

 [[0.         0.         2.3075228  0.9682029  3.3698235 ]
  [0.         2.1534934  0.         0.         2.5193954 ]]]
  
states_val: (array([[0.        , 5.516548  , 5.836305  , 0.        , 1.5091596 ],
       [0.        , 2.882547  , 2.6163056 , 0.        , 0.77435017],
       [0.        , 1.4631326 , 1.4441929 , 0.        , 2.7281032 ],
       [3.9093783 , 0.        , 0.        , 0.        , 3.2143312 ]],
      dtype=float32), 
      array([[0.        , 1.8190625 , 2.4112606 , 0.        , 2.3922706 ],
       [0.        , 1.048449  , 1.0651987 , 0.        , 1.2866479 ],
       [2.540713  , 0.        , 2.0854363 , 0.        , 0.15023205],
       [1.324711  , 0.        , 0.        , 0.        , 0.        ]],
      dtype=float32), 
      array([[0.        , 0.        , 1.896489  , 0.6294941 , 3.8094907 ],
       [0.        , 0.        , 1.0293288 , 0.4813519 , 1.824276  ],
       [0.        , 0.78937817, 0.        , 0.        , 4.249421  ],
       [0.        , 2.1534934 , 0.        , 0.        , 2.5193954 ]],
      dtype=float32))

我们说过，outputs是最后一层的输出，即 [batch_size，step，n_neurons] = [4，2，5]
states是每一层的最后一个step的输出，即三个结构为 [batch_size，n_neurons] = [4，5] 的tensor
继续观察数据，states中的最后一个array，正好是outputs的最后那个step的输出

BasicLSTMCell 多个隐藏层

n_steps = 2
n_inputs = 3
n_neurons = 5
n_layers = 3

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])

#cell = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_hidden_size, state_is_tuple=True)
layers = [tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons,
                                      activation=tf.nn.relu)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32, sequence_length=seq_length)

init = tf.global_variables_initializer()

X_batch = np.array([
    # step 0     step 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2 (padded with zero vectors)
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
])

seq_length_batch = np.array([2, 1, 2, 2])

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})
    print("outputs_val.shape:", outputs)
    print("states_val.shape:", states)
    print("outputs_val:", outputs_val)
    print("states_val:", states_val)

输出结果：

outputs_val.shape: Tensor("rnn/transpose_1:0", shape=(?, 2, 5), dtype=float32)

states_val.shape:
 (LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_3:0' shape=(?, 5) dtype=float32>, 
 							 h=<tf.Tensor 'rnn/while/Exit_4:0' shape=(?, 5) dtype=float32>),                      
 LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_5:0' shape=(?, 5) dtype=float32>, 
 							h=<tf.Tensor 'rnn/while/Exit_6:0' shape=(?, 5) dtype=float32>), 
 LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_7:0' shape=(?, 5) dtype=float32>, 
 							h=<tf.Tensor 'rnn/while/Exit_8:0' shape=(?, 5) dtype=float32>))
 							
outputs_val: [[[0.00209509 0.00182596 0.         0.00067289 0.00347673]
  [0.04328459 0.07312859 0.         0.01560373 0.06793488]]

 [[0.01704519 0.02663972 0.         0.00636475 0.02475912]
  [0.         0.         0.         0.         0.        ]]

 [[0.03605811 0.0760659  0.         0.01464115 0.04976365]
  [0.09704518 0.3786289  0.         0.05542209 0.04016687]]

 [[0.         0.         0.         0.         0.        ]
  [0.01460283 0.05219102 0.         0.00475517 0.        ]]]
  
states_val: (LSTMStateTuple(c=array([[4.2300582e+00, 6.1886315e-03, 1.9541669e+00, 4.7516477e-01,
        7.1493559e+00],
       [2.2236171e+00, 0.0000000e+00, 4.6114996e-01, 0.0000000e+00,
        2.5368762e+00],
       [3.3151534e+00, 0.0000000e+00, 9.3500906e-01, 1.7955723e+00,
        9.6494026e+00],
       [1.9809818e+00, 0.0000000e+00, 3.0963078e+00, 1.6501746e+00,
        4.6886072e+00]], dtype=float32), h=array([[4.2053738e+00, 3.8800889e-03, 1.9386519e+00, 4.7059658e-01,
        3.4508269e+00],
       [2.0354471e+00, 0.0000000e+00, 4.3142736e-01, 0.0000000e+00,
        1.7041360e+00],
       [3.1316614e+00, 0.0000000e+00, 7.4865651e-01, 1.7757093e+00,
        6.9275556e+00],
       [1.4433841e+00, 0.0000000e+00, 2.3309155e+00, 1.5163686e+00,
        2.4487319e+00]], dtype=float32)),
        
         LSTMStateTuple(c=array([[0.        , 0.89624435, 1.9348668 , 0.699529  , 0.        ],
       [0.        , 0.41140077, 0.85579544, 0.31884393, 0.        ],
       [0.        , 2.424478  , 3.900877  , 3.1388435 , 0.        ],
       [0.        , 0.4629493 , 0.251346  , 0.54938227, 0.        ]],
      dtype=float32), h=array([[0.        , 0.5113401 , 0.8787495 , 0.3594259 , 0.        ],
       [0.        , 0.21974793, 0.37868136, 0.14333102, 0.        ],
       [0.        , 1.6213391 , 1.3959492 , 1.379813  , 0.        ],
       [0.        , 0.27001444, 0.14599895, 0.3638716 , 0.        ]],
      dtype=float32)), 
      
      LSTMStateTuple(c=array([[0.07640141, 0.12756428, 0.        , 0.02928707, 0.11843267],
       [0.03226396, 0.0501599 , 0.        , 0.01238208, 0.04656338],
       [0.14331082, 0.5718589 , 0.        , 0.09317783, 0.0627808 ],
       [0.02764782, 0.0983929 , 0.        , 0.0092742 , 0.        ]],
      dtype=float32), h=array([[0.04328459, 0.07312859, 0.        , 0.01560373, 0.06793488],
       [0.01704519, 0.02663972, 0.        , 0.00636475, 0.02475912],
       [0.09704518, 0.3786289 , 0.        , 0.05542209, 0.04016687],
       [0.01460283, 0.05219102, 0.        , 0.00475517, 0.        ]],
      dtype=float32)))

我们的states包含三个LSTMStateTuple，每一个表示每一层的最后一个step的输出，这个输出有两个信息，一个是h表示短期记忆信息，一个是c表示长期记忆信息。维度都是[batch_size，n_neurons] = [4，5]，states的最后一个LSTMStateTuple中的h就是outputs的最后一个step的输出

参考文章：
https://blog.csdn.net/junjun150013652/article/details/81331448
https://blog.csdn.net/u010960155/article/details/81707498
https://blog.csdn.net/zhylhy520/article/details/82631736

weixin_42713739

关注

1
点赞
踩
9

收藏

觉得还不错? 一键收藏
0
评论
tf.nn.dynamic_rnn 代码举例详解一看就明白了

先上源码：tf.nn.dynamic_rnn( cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=No...
复制链接

扫一扫