Using RNN & LSTM in TensorFlow

I. RNN & LSTM Base Classes

1. RNN base class

class tf.contrib.rnn.BasicRNNCell(num_units, activation=None, reuse=None, name=None)
Input arguments:

  • num_units: int, the number of units in the RNN cell.
  • activation: Nonlinearity to use. Default: tanh.
  • reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  • name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

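Returns:

  • An instantiated basic RNN cell with num_units hidden units

Common properties:

  • state_size: size(s) of state(s) used by this cell; equal to the number of hidden units
  • output_size: size of outputs produced by this cell
  • Note: for this class, state_size always equals output_size

Common methods:

  • call(inputs, state): returns (output, new_state); for BasicRNNCell the two are the same hidden-state tensor
  • zero_state(batch_size, dtype): returns an all-zero tensor of shape [batch_size, state_size]
  • Code example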
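import tensorflow as tf  # TensorFlow 1.x (tf.contrib was removed in 2.x)

cell = tf.contrib.rnn.BasicRNNCell(num_units=128)
print(cell.state_size)  # 128

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is the batch_size
# zero_state gives an all-zero initial state of shape (batch_size, state_size)
h0 = cell.zero_state(batch_size=32, dtype=tf.float32)

# Advance one time step. Calling the cell object (rather than cell.call directly)
# lets it create its variables first and then run call().
output, h1 = cell(inputs, h0)
print(h1.shape)      # (32, 128)
print(output is h1)  # True: call() returns the same tensor as both output and state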
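To make the "output equals state" point concrete, here is a minimal pure-Python sketch of one vanilla RNN step. It is not TensorFlow's implementation: the function name rnn_step, the row-wise weight layout W, and the toy values are all assumptions made for illustration.

```python
import math

def rnn_step(x, h, W, b):
    """One vanilla RNN step: new_h = tanh(W . [x; h] + b).

    x: input features, h: previous hidden state (both plain lists).
    W: one weight row per hidden unit, each of length len(x) + len(h).
    b: one bias per hidden unit.
    """
    xh = x + h  # concatenate the input and the previous state
    new_h = [math.tanh(sum(w * v for w, v in zip(row, xh)) + bias)
             for row, bias in zip(W, b)]
    return new_h, new_h  # (output, new_state) are the very same values

# Toy sizes (hypothetical): 2 input features, 3 hidden units, zero initial state.
x = [0.5, -1.0]
h0 = [0.0, 0.0, 0.0]
W = [[0.1] * 5, [0.2] * 5, [-0.3] * 5]
b = [0.0, 0.0, 0.0]
output, h1 = rnn_step(x, h0, W, b)
print(len(h1), output is h1)
```

The hidden state has one entry per unit, which is why state_size and output_size coincide for this kind of cell.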

2. LSTM base class

class tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None)
Input arguments:

  • num_units: int, the number of units in the RNN cell.
  • forget_bias: float, The bias added to forget gates. Must set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
  • state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state
  • activation: Nonlinearity to use. Default: tanh.
  • reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  • name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

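Returns:

  • An instantiated basic LSTM cell (lstm_cell) with num_units hidden units

Common properties:

  • state_size: size(s) of state(s) used by this cell; with state_is_tuple=True it is LSTMStateTuple(c=num_units, h=num_units)
  • output_size: size of outputs produced by this cell; equal to num_units
  • Note: unlike BasicRNNCell, state_size differs from output_size here, because the state holds both c and h

Common methods:

  • call(inputs, state): returns new_h and new_state (an LSTMStateTuple containing c and h)
  • zero_state(batch_size, dtype): returns an all-zero state; since state_size is LSTMStateTuple(c=num_units, h=num_units), the result is an LSTMStateTuple of two zero tensors, each of shape [batch_size, num_units]
  • Definition of BasicLSTMCell's call function
    • The returned state is the combination of new_c and new_h, while output is new_h alone
    • For a classification problem, a separate Softmax layer still has to be added on top of new_h to obtain the final class probabilities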
new_c = c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j)
new_h = self._activation(new_c) * sigmoid(o)

if self._state_is_tuple:
  new_state = LSTMStateTuple(new_c, new_h)
else:
  new_state = array_ops.concat([new_c, new_h], 1)
return new_h, new_state
  • Code example
import tensorflow as tf

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
print(lstm_cell.output_size)  # 128
print(lstm_cell.state_size)   # LSTMStateTuple(c=128, h=128)

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is the batch_size
h0 = lstm_cell.zero_state(batch_size=32, dtype=tf.float32)
print(h0)
# LSTMStateTuple(c=<tf.Tensor 'BasicLSTMCellZeroState/zeros:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'BasicLSTMCellZeroState/zeros_1:0' shape=(32, 128) dtype=float32>)

# Advance one time step by calling the cell object, which builds its variables and then runs call()
new_h, new_state = lstm_cell(inputs, h0)
print(new_h.shape)        # (32, 128)
print(new_state.h.shape)  # (32, 128)
print(new_state.c.shape)  # (32, 128)
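The gate equations quoted from the call function can also be traced by hand. The following pure-Python sketch applies them element-wise to already computed gate pre-activations i, j, f, o. The helper names (sigmoid, lstm_step, softmax) and the toy values are assumptions, and the softmax is applied directly to new_h only for illustration, with no projection layer in between.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(i, j, f, o, c, forget_bias=1.0):
    """Element-wise LSTM update from gate pre-activations and the previous cell state c."""
    new_c = [c_k * sigmoid(f_k + forget_bias) + sigmoid(i_k) * math.tanh(j_k)
             for c_k, f_k, i_k, j_k in zip(c, f, i, j)]
    new_h = [math.tanh(c_k) * sigmoid(o_k) for c_k, o_k in zip(new_c, o)]
    return new_h, (new_c, new_h)  # output, and the (c, h) state pair

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy case (hypothetical values): all gate pre-activations are zero.
zeros = [0.0, 0.0]
new_h, new_state = lstm_step(i=zeros, j=zeros, f=zeros, o=zeros, c=[1.0, -1.0])
probs = softmax(new_h)  # class probabilities, assuming one class per hidden unit
print(new_state[0], new_h, probs)
```

With zero pre-activations the candidate term vanishes, so new_c is just the old cell state scaled by sigmoid(forget_bias), which shows why a forget_bias of 1.0 keeps most of the memory early in training.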

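II. Running multiple steps at once: tf.nn.dynamic_rnn

Purpose: overcome the limitation that a basic RNNCell advances only one time step per call.
Function: TF provides tf.nn.dynamic_rnn; using it is equivalent to invoking call n times, so $(h_0, x_1, x_2, \ldots, x_n)$ directly yields $(h_1, h_2, \ldots, h_n)$.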


1. RNN

tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)


Input arguments:

  • cell: an RNNCell instance
  • inputs: the RNN input sequence
  • initial_state: the initial state of the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
  • sequence_length: an int vector of shape [batch_size] giving the valid length (number of time steps) of each sequence in the batch, e.g. sequence_length=tf.fill([batch_size], time_steps) when every sequence is full length. Past a sequence's length, outputs are zeroed and the state is copied through.
  • time_major: defaults to False, in which case the input and output tensors have shape [batch_size, max_time, depth]; when True, it avoids transposes at the beginning and end of the RNN calculation, and the input and output tensors have shape [max_time, batch_size, depth]
  • scope: VariableScope for the created subgraph; defaults to “rnn”.

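Returns (outputs, state):

  • outputs: the outputs of all time steps, of shape [batch_size, max_time, cell.output_size] (or [max_time, batch_size, cell.output_size] when time_major=True)
  • state: the final state, of shape [batch_size, cell.state_size]; for an LSTM cell it is an LSTMStateTuple of c and h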
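The "call n times" idea, together with the effect of sequence_length, can be sketched without TensorFlow. The function below is a simplified, batch-major illustration and not the library's code: the name unroll, the scalar states, and the running-sum toy step are assumptions.

```python
def unroll(step, inputs, h0, sequence_length=None):
    """Run step(x, h) -> (output, new_h) over every sequence in a batch.

    inputs: list of sequences (batch-major), each a list of per-step values.
    Returns (outputs, final_states), one entry per sequence.
    """
    batch_outputs, final_states = [], []
    for b, seq in enumerate(inputs):
        length = len(seq) if sequence_length is None else sequence_length[b]
        h, outs = h0, []
        for t, x in enumerate(seq):
            if t < length:
                out, h = step(x, h)
            else:
                out = 0.0  # past the valid length: zero output, state carried through
            outs.append(out)
        batch_outputs.append(outs)
        final_states.append(h)
    return batch_outputs, final_states

# Toy step (hypothetical): the state is a running sum of the inputs.
running_sum = lambda x, h: (h + x, h + x)

# Two sequences padded to 3 steps; the second one has only 2 valid steps.
outputs, states = unroll(running_sum, [[1, 2, 3], [4, 5, 0]], h0=0,
                         sequence_length=[3, 2])
print(outputs)  # [[1, 3, 6], [4, 9, 0.0]]
print(states)   # [6, 9]
```

Note how the second sequence's final state is the one from its last valid step, not from the padded step.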

[Figure: visualization of the transpose ops in the computation graph when time_major=False]

2. BLSTM

tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)


Input arguments:

  • Compared with 1 above, there is one extra cell instance for the backward direction (cell_bw) and one extra backward initial state; the inputs are the same, but information flows in both directions

Returns (outputs, output_states):

  • outputs
    • The outputs of all time steps, as a tuple (output_fw, output_bw) holding the forward and backward results, of shapes [batch_size, max_time, cell_fw.output_size] and [batch_size, max_time, cell_bw.output_size]
    • It returns a tuple instead of a single concatenated Tensor. If the concatenated one is preferred, the forward and backward outputs can be concatenated as tf.concat(outputs, 2)
  • output_states: a tuple (output_state_fw, output_state_bw) containing the final states of the forward and backward passes
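As a rough picture of what "bidirectional" means for the returned tuple, the sketch below runs one cell over the sequence and another over the reversed sequence, then re-aligns the backward outputs with the original time order. It handles a single full-length sequence only; the function names and the running-sum toy step are assumptions, not TensorFlow's implementation.

```python
def run_forward(step, seq, h0):
    """Apply step(x, h) -> (output, new_h) along seq; return all outputs and the last state."""
    h, outs = h0, []
    for x in seq:
        out, h = step(x, h)
        outs.append(out)
    return outs, h

def bidirectional(step_fw, step_bw, seq, h0_fw, h0_bw):
    out_fw, state_fw = run_forward(step_fw, seq, h0_fw)
    out_bw_rev, state_bw = run_forward(step_bw, seq[::-1], h0_bw)
    out_bw = out_bw_rev[::-1]  # re-align backward outputs with the original time order
    return (out_fw, out_bw), (state_fw, state_bw)

# Toy step (hypothetical): the state is a running sum of the inputs.
running_sum = lambda x, h: (h + x, h + x)

outputs, output_states = bidirectional(running_sum, running_sum, [1, 2, 3], 0, 0)
print(outputs)        # ([1, 3, 6], [6, 5, 3])
print(output_states)  # (6, 6)

# Pairing the two directions per time step plays the role of tf.concat(outputs, 2).
concatenated = list(zip(*outputs))
print(concatenated)   # [(1, 6), (3, 5), (6, 3)]
```

At each time step the forward output summarizes the past and the backward output summarizes the future, which is why the two are usually concatenated along the feature axis.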

III. Stacking multiple layers: MultiRNNCell

A single-layer RNN often has limited capacity, so multiple layers are needed. In TensorFlow, the tf.nn.rnn_cell.MultiRNNCell class stacks RNNCells.

import tensorflow as tf

# Input batch of shape [batch_size, max_time, depth] (20 steps and depth 100 are illustrative values)
data = tf.placeholder(tf.float32, shape=[None, 20, 100])

# Create 2 LSTMCells with 128 and 256 hidden units respectively
rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [128, 256]]

# create a RNN cell composed sequentially of a number of RNNCells
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)

# 'outputs' is a tensor of shape [batch_size, max_time, 256]
# 'state' is a N-tuple where N is the number of LSTMCells containing a
# tf.contrib.rnn.LSTMStateTuple for each cell
outputs, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                   inputs=data,
                                   dtype=tf.float32)
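Conceptually, a stacked cell feeds the output of each layer into the next layer and keeps one state per layer, which is why the final state above is an N-tuple and the output size is that of the last layer. A minimal pure-Python sketch of that wiring follows; make_stacked_step and the two toy layers are hypothetical and stand in for real cells.

```python
def make_stacked_step(steps):
    """Combine per-layer step(x, h) -> (output, new_h) functions into one stacked step."""
    def stacked_step(x, states):
        new_states = []
        layer_input = x
        for step, h in zip(steps, states):
            layer_input, new_h = step(layer_input, h)  # this layer's output feeds the next layer
            new_states.append(new_h)
        return layer_input, tuple(new_states)  # top-layer output, one state per layer
    return stacked_step

# Two toy layers (hypothetical): a running sum, then a layer that doubles its input.
layer1 = lambda x, h: (h + x, h + x)
layer2 = lambda x, h: (2 * x + h, 2 * x + h)
stacked = make_stacked_step([layer1, layer2])

out1, states1 = stacked(1, (0, 0))
out2, states2 = stacked(2, states1)
print(out1, states1)  # 2 (1, 2)
print(out2, states2)  # 8 (3, 8)
```

Because the stacked step has the same (input, state) to (output, state) shape as a single cell, it can be unrolled over time in exactly the same way.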

IV. References

1. TensorFlow中RNN实现的正确打开方式 (The right way to implement RNNs in TensorFlow)
2. https://www.tensorflow.org/api_guides/python/contrib.rnn
3. https://www.tensorflow.org/api_guides/python/nn#Recurrent_Neural_Networks
