tf.nn.dynamic_rnn explained in detail with code examples

First, the function signature:

tf.nn.dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)

batch_size: the number of examples in the input batch; normally the size of the first dimension of the input tensor.

max_time: the length of the longest sequence in the batch. If the input is three sentences, max_time is the word count of the longest sentence. cell.output_size is simply the number of units (neurons) in the RNN cell.

outputs: a tensor holding the output at every time step. If time_major is True, outputs has shape [max_time, batch_size, cell.output_size] (the RNN input must then use the same time-major layout); if time_major is False (the default), outputs has shape [batch_size, max_time, cell.output_size].

state: the final state, i.e. the state produced at the last (valid) step of each sequence. In general state has shape [batch_size, cell.output_size], but when the cell is a BasicLSTMCell it is an LSTMStateTuple, which can be viewed as [2, batch_size, cell.output_size]; the 2 corresponds to the LSTM's cell state and hidden state.
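
To make these shapes concrete, here is a minimal TF 1.x sketch (the variable names are illustrative, not part of the API) that builds one GRU-based and one LSTM-based dynamic_rnn and prints the static shapes of what they return:

import tensorflow as tf

max_time, n_inputs, n_neurons = 2, 3, 5
X = tf.placeholder(tf.float32, [None, max_time, n_inputs])

# GRU: state is a single tensor of shape [batch_size, cell.output_size]
gru_out, gru_state = tf.nn.dynamic_rnn(
    tf.contrib.rnn.GRUCell(n_neurons), X, dtype=tf.float32, scope="gru")

# LSTM: state is an LSTMStateTuple (c, h), each of shape [batch_size, cell.output_size]
lstm_out, lstm_state = tf.nn.dynamic_rnn(
    tf.contrib.rnn.BasicLSTMCell(n_neurons), X, dtype=tf.float32, scope="lstm")

print(gru_out.shape)    # (?, 2, 5)  -> [batch_size, max_time, cell.output_size]
print(gru_state.shape)  # (?, 5)     -> [batch_size, cell.output_size]
print(lstm_state)       # LSTMStateTuple(c=<... (?, 5)>, h=<... (?, 5)>)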

The meaning of state

First question: why does the shape of state change?

Take LSTM and GRU as the cell type passed to tf.nn.dynamic_rnn. When the cell is an LSTM, state has shape [2, batch_size, cell.output_size]; when the cell is a GRU, state has shape [batch_size, cell.output_size]. The reason is that the LSTM and GRU structures themselves differ, as shown in the two figures below. An LSTM cell produces two outputs at each step, Ct and Ht: the first figure shows Ct, which encodes what should be remembered and what should be forgotten; the second figure shows Ht, the cell's final output at that step. The LSTM state is therefore made up of both Ct and Ht.

[Figure: LSTM cell, the cell-state output Ct]
[Figure: LSTM cell, the hidden-state output Ht]
When the cell is a GRU, there is only a single state. The GRU simplifies things by merging Ct and Ht into one Ht, as shown in the figure below: the forget gate and the input gate are combined into an update gate, and the cell no longer keeps a separate cell state, only a hidden state.

[Figure: GRU cell structure]
Second question: what is the relationship between outputs and state?
In short, if the cell is an LSTM, state is a tuple holding Ct and Ht, and Ht equals the corresponding last-step output in outputs. Viewing state as shape [2, batch_size, cell.output_size] and outputs as shape [batch_size, max_time, cell.output_size], that means state[1] == outputs[:, -1, :] (for sequences that run the full max_time; when sequence_length is given, it is the output at each sequence's last valid step). If the cell is a GRU, then by the same reasoning state is simply Ht, so state == outputs[:, -1, :].
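
A minimal TF 1.x sketch checking this claim (the variable names are illustrative, and the sequences run the full two steps so that the last time step is also the last valid one):

import tensorflow as tf
import numpy as np

X = tf.placeholder(tf.float32, [None, 2, 3])
cell = tf.contrib.rnn.BasicLSTMCell(num_units=5)
outputs, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x = np.random.rand(4, 2, 3).astype(np.float32)
    out_v, st_v = sess.run([outputs, state], feed_dict={X: x})
    print(st_v.c.shape, st_v.h.shape)            # (4, 5) (4, 5)
    print(np.allclose(st_v.h, out_v[:, -1, :]))  # True: h is the last-step output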

Code examples

A single hidden layer

import tensorflow as tf
import numpy as np
 
n_steps = 2
n_inputs = 3
n_neurons = 5  # number of neurons in the RNN cell

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)

seq_length = tf.placeholder(tf.int32, [None])
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32,
                                    sequence_length=seq_length)

init = tf.global_variables_initializer()

X_batch = np.array([
    # step 0     step 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2 (padded with zero vectors)
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
])
print(X_batch.shape)  # (4, 2, 3)
seq_length_batch = np.array([2, 1, 2, 2])  # 1-D array; its length must equal the first dimension (batch size) of X_batch

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})
    print("outputs_val.shape:", outputs_val.shape)
    print("states_val.shape:", states_val.shape)
    print("outputs_val:", outputs_val) 
    print("states_val:", states_val)

The output is:

(4, 2, 3)
outputs_val.shape: (4, 2, 5)
states_val.shape: (4, 5)
outputs_val: [[[ 0.50321156 -0.41240135 -0.84440345 -0.02686742 -0.8964557 ]
  [ 0.9573825  -0.99886006 -0.9999997   0.991583   -1.        ]]

 [[ 0.92471015 -0.93656176 -0.9996478   0.5771828  -0.9999348 ]
  [ 0.          0.          0.          0.          0.        ]]

 [[ 0.99078256 -0.9948545  -0.99999946  0.87246966 -0.99999994]
  [ 0.58877105 -0.9849114  -0.9999669   0.9589857  -0.99999225]]

 [[-0.0043383  -0.9785873  -0.99291325 -0.7907419  -0.99364054]
  [-0.48218966 -0.94694626 -0.8577129   0.88309634 -0.9994416 ]]]
  
states_val: [[ 0.9573825  -0.99886006 -0.9999997   0.991583   -1.        ]
 [ 0.92471015 -0.93656176 -0.9996478   0.5771828  -0.9999348 ]
 [ 0.58877105 -0.9849114  -0.9999669   0.9589857  -0.99999225]
 [-0.48218966 -0.94694626 -0.8577129   0.88309634 -0.9994416 ]]

The input X is a tensor of shape [batch_size, step, input_size] = [4, 2, 3]. Note that the cell here is a BasicRNNCell (the same holds for GRUCell), i.e. a single recurrent layer. After tf.nn.dynamic_rnn, outputs has shape [4, 2, 5] and state has shape [4, 5]. Comparing the values, each row of state equals the row of outputs at that instance's last valid time step (for instance 2, whose sequence length is 1, that is step 0, and its padded step-1 output is all zeros), as the check below confirms.
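
A small check, reusing the Python variables from the run above (np.allclose is only used for a float comparison):

last_step = seq_length_batch - 1                      # [1, 0, 1, 1]
last_outputs = outputs_val[np.arange(len(last_step)), last_step, :]
print(np.allclose(last_outputs, states_val))          # True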

Multiple hidden layers

# Run this as a fresh script (or call tf.reset_default_graph() first if continuing from the example above).
import tensorflow as tf
import numpy as np

n_steps = 2
n_inputs = 3
n_neurons = 5
n_layers = 3

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])

layers = [tf.contrib.rnn.BasicRNNCell(num_units=n_neurons,
                                      activation=tf.nn.relu)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32, sequence_length=seq_length)

init = tf.global_variables_initializer()

X_batch = np.array([
    # step 0     step 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2 (padded with zero vectors)
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
])

seq_length_batch = np.array([2, 1, 2, 2])

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})
    print("outputs_val.shape:", outputs)
    print("states_val.shape:", states)
    print("outputs_val:", outputs_val)
    print("states_val:", states_val)

The output is:

outputs_val.shape: Tensor("rnn/transpose_1:0", shape=(?, 2, 5), dtype=float32)
states_val.shape: (<tf.Tensor 'rnn/while/Exit_3:0' shape=(?, 5) dtype=float32>, 
                              <tf.Tensor 'rnn/while/Exit_4:0' shape=(?, 5) dtype=float32>, 
                              <tf.Tensor 'rnn/while/Exit_5:0' shape=(?, 5) dtype=float32>)
                              
outputs_val: [[[0.         0.         0.2550385  0.12092578 0.40954068]
  [0.         0.         1.896489   0.6294941  3.8094907 ]]

 [[0.         0.         1.0293288  0.4813519  1.824276  ]
  [0.         0.         0.         0.         0.        ]]

 [[0.         0.         1.803619   0.84177816 3.2390108 ]
  [0.         0.78937817 0.         0.         4.249421  ]]

 [[0.         0.         2.3075228  0.9682029  3.3698235 ]
  [0.         2.1534934  0.         0.         2.5193954 ]]]
  
states_val: (array([[0.        , 5.516548  , 5.836305  , 0.        , 1.5091596 ],
       [0.        , 2.882547  , 2.6163056 , 0.        , 0.77435017],
       [0.        , 1.4631326 , 1.4441929 , 0.        , 2.7281032 ],
       [3.9093783 , 0.        , 0.        , 0.        , 3.2143312 ]],
      dtype=float32), 
      array([[0.        , 1.8190625 , 2.4112606 , 0.        , 2.3922706 ],
       [0.        , 1.048449  , 1.0651987 , 0.        , 1.2866479 ],
       [2.540713  , 0.        , 2.0854363 , 0.        , 0.15023205],
       [1.324711  , 0.        , 0.        , 0.        , 0.        ]],
      dtype=float32), 
      array([[0.        , 0.        , 1.896489  , 0.6294941 , 3.8094907 ],
       [0.        , 0.        , 1.0293288 , 0.4813519 , 1.824276  ],
       [0.        , 0.78937817, 0.        , 0.        , 4.249421  ],
       [0.        , 2.1534934 , 0.        , 0.        , 2.5193954 ]],
      dtype=float32))

As noted above, outputs is the output of the top (last) layer, with shape [batch_size, step, n_neurons] = [4, 2, 5].
states holds the final-step state of every layer (for a BasicRNNCell the state equals its output): a tuple of three tensors, each of shape [batch_size, n_neurons] = [4, 5].
Looking at the numbers, the last array in states is exactly the top-layer output in outputs at each instance's last valid step, as checked below.
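
A quick check, continuing with the variables from the run above:

last_step = seq_length_batch - 1
top_layer_final = outputs_val[np.arange(len(last_step)), last_step, :]
print(np.allclose(states_val[-1], top_layer_final))   # True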

BasicLSTMCell with multiple hidden layers

# Run this as a fresh script (or call tf.reset_default_graph() first).
import tensorflow as tf
import numpy as np

n_steps = 2
n_inputs = 3
n_neurons = 5
n_layers = 3

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])

#cell = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_hidden_size, state_is_tuple=True)
layers = [tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons,
                                      activation=tf.nn.relu)
          for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32, sequence_length=seq_length)

init = tf.global_variables_initializer()

X_batch = np.array([
    # step 0     step 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2 (padded with zero vectors)
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
])

seq_length_batch = np.array([2, 1, 2, 2])

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})
    print("outputs_val.shape:", outputs)
    print("states_val.shape:", states)
    print("outputs_val:", outputs_val)
    print("states_val:", states_val)

The output is:

outputs_val.shape: Tensor("rnn/transpose_1:0", shape=(?, 2, 5), dtype=float32)

states_val.shape:
 (LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_3:0' shape=(?, 5) dtype=float32>,
                 h=<tf.Tensor 'rnn/while/Exit_4:0' shape=(?, 5) dtype=float32>),
  LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_5:0' shape=(?, 5) dtype=float32>,
                 h=<tf.Tensor 'rnn/while/Exit_6:0' shape=(?, 5) dtype=float32>),
  LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_7:0' shape=(?, 5) dtype=float32>,
                 h=<tf.Tensor 'rnn/while/Exit_8:0' shape=(?, 5) dtype=float32>))
 							
outputs_val: [[[0.00209509 0.00182596 0.         0.00067289 0.00347673]
  [0.04328459 0.07312859 0.         0.01560373 0.06793488]]

 [[0.01704519 0.02663972 0.         0.00636475 0.02475912]
  [0.         0.         0.         0.         0.        ]]

 [[0.03605811 0.0760659  0.         0.01464115 0.04976365]
  [0.09704518 0.3786289  0.         0.05542209 0.04016687]]

 [[0.         0.         0.         0.         0.        ]
  [0.01460283 0.05219102 0.         0.00475517 0.        ]]]
  
states_val: (LSTMStateTuple(c=array([[4.2300582e+00, 6.1886315e-03, 1.9541669e+00, 4.7516477e-01,
        7.1493559e+00],
       [2.2236171e+00, 0.0000000e+00, 4.6114996e-01, 0.0000000e+00,
        2.5368762e+00],
       [3.3151534e+00, 0.0000000e+00, 9.3500906e-01, 1.7955723e+00,
        9.6494026e+00],
       [1.9809818e+00, 0.0000000e+00, 3.0963078e+00, 1.6501746e+00,
        4.6886072e+00]], dtype=float32), h=array([[4.2053738e+00, 3.8800889e-03, 1.9386519e+00, 4.7059658e-01,
        3.4508269e+00],
       [2.0354471e+00, 0.0000000e+00, 4.3142736e-01, 0.0000000e+00,
        1.7041360e+00],
       [3.1316614e+00, 0.0000000e+00, 7.4865651e-01, 1.7757093e+00,
        6.9275556e+00],
       [1.4433841e+00, 0.0000000e+00, 2.3309155e+00, 1.5163686e+00,
        2.4487319e+00]], dtype=float32)),
        
         LSTMStateTuple(c=array([[0.        , 0.89624435, 1.9348668 , 0.699529  , 0.        ],
       [0.        , 0.41140077, 0.85579544, 0.31884393, 0.        ],
       [0.        , 2.424478  , 3.900877  , 3.1388435 , 0.        ],
       [0.        , 0.4629493 , 0.251346  , 0.54938227, 0.        ]],
      dtype=float32), h=array([[0.        , 0.5113401 , 0.8787495 , 0.3594259 , 0.        ],
       [0.        , 0.21974793, 0.37868136, 0.14333102, 0.        ],
       [0.        , 1.6213391 , 1.3959492 , 1.379813  , 0.        ],
       [0.        , 0.27001444, 0.14599895, 0.3638716 , 0.        ]],
      dtype=float32)), 
      
      LSTMStateTuple(c=array([[0.07640141, 0.12756428, 0.        , 0.02928707, 0.11843267],
       [0.03226396, 0.0501599 , 0.        , 0.01238208, 0.04656338],
       [0.14331082, 0.5718589 , 0.        , 0.09317783, 0.0627808 ],
       [0.02764782, 0.0983929 , 0.        , 0.0092742 , 0.        ]],
      dtype=float32), h=array([[0.04328459, 0.07312859, 0.        , 0.01560373, 0.06793488],
       [0.01704519, 0.02663972, 0.        , 0.00636475, 0.02475912],
       [0.09704518, 0.3786289 , 0.        , 0.05542209, 0.04016687],
       [0.01460283, 0.05219102, 0.        , 0.00475517, 0.        ]],
      dtype=float32)))

Here states contains three LSTMStateTuple objects, one per layer, each representing that layer's state at its last step. Each tuple carries two pieces of information: h, the short-term (hidden) state, and c, the long-term (cell) state, both of shape [batch_size, n_neurons] = [4, 5]. The h of the last LSTMStateTuple in states is exactly the top-layer output in outputs at each instance's last valid step, as the check below shows.
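
A quick check, continuing with the variables from the run above:

last_step = seq_length_batch - 1
top_layer_final = outputs_val[np.arange(len(last_step)), last_step, :]
print(np.allclose(states_val[-1].h, top_layer_final))  # True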

