# Using RNN & LSTM in TensorFlow

## I. Basic RNN & LSTM cell classes

### 1. Basic RNN cell

class tf.contrib.rnn.BasicRNNCell(num_units, activation=None, reuse=None, name=None)

• num_units: int, the number of units in the RNN cell.
• activation: Nonlinearity to use. Default: tanh.
• reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
• name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

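• Instantiating the class returns a basic RNN unit (the cell) with num_units hidden units.

• state_size: size(s) of state(s) used by this cell; equal to the number of hidden units, num_units.
• output_size: size of outputs produced by this cell.
• Note: for this class, state_size always equals output_size.

• call(inputs, state) returns a pair (output, new_state); for BasicRNNCell both elements are the same hidden-state value.
• zero_state(batch_size, dtype) returns an all-zero tensor of shape [batch_size, state_size].
• Code example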
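```python
import tensorflow as tf  # TensorFlow 1.x (tf.contrib was removed in 2.x)

cell = tf.contrib.rnn.BasicRNNCell(num_units=128)
print(cell.state_size)  # 128

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is batch_size, 100 is the input depth
# zero_state gives an all-zero initial state of shape (batch_size, state_size)
h0 = cell.zero_state(batch_size=32, dtype=tf.float32)

# Calling the cell advances the sequence by one time step. cell(...) goes through
# __call__, which in recent 1.x releases builds the variables before invoking call()
output, h1 = cell(inputs, h0)
print(h1.shape)  # (32, 128)
output == h1  # True: the output and the new state are the same tensor
```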
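
To make the "output equals state" behaviour concrete, here is a minimal sketch of one vanilla RNN step, h1 = tanh([x, h0] · W + b), written in plain Python so it runs without TensorFlow. The helper name `rnn_step` and the toy weights are assumptions for illustration; this is not TensorFlow's implementation.

```python
import math


def rnn_step(x, h, kernel, bias):
    """One vanilla RNN step for a single example.

    x: list of input_depth floats; h: list of num_units floats;
    kernel: (input_depth + num_units) x num_units nested list;
    bias: list of num_units floats.
    Returns (output, new_state); both names refer to the same list.
    """
    concat = x + h  # concatenate the input with the previous state
    new_h = [
        math.tanh(sum(concat[i] * kernel[i][j] for i in range(len(concat))) + bias[j])
        for j in range(len(bias))
    ]
    return new_h, new_h


# Toy sizes (hypothetical): input_depth=3, num_units=2
x = [1.0, 0.5, -0.5]
h0 = [0.0, 0.0]              # what zero_state would give for one example
kernel = [[0.1, -0.2]] * 5   # 5 rows = input_depth + num_units
bias = [0.0, 0.0]

output, h1 = rnn_step(x, h0, kernel, bias)
print(len(h1))       # 2, i.e. num_units
print(output is h1)  # True
```

Because the step returns the same value twice, state_size and output_size cannot differ for this kind of cell.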

### 2. Basic LSTM cell

class tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None)

• num_units: int, the number of units in the LSTM cell.
• forget_bias: float, the bias added to forget gates. Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
• state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state.
• activation: Nonlinearity to use. Default: tanh.
• reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
• name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

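• Instantiating the class returns a basic LSTM unit (lstm_cell) with num_units hidden units.
• state_size: size(s) of state(s) used by this cell. With state_is_tuple=True it is LSTMStateTuple(c=num_units, h=num_units); otherwise it is 2 * num_units.
• output_size: size of outputs produced by this cell, equal to num_units.
• Note: unlike BasicRNNCell, state_size here is not the same as output_size, because the state holds both c and h.

• call(inputs, state) returns two values: new_h and new_state (an LSTMStateTuple containing c and h).
• zero_state(batch_size, dtype) returns an all-zero state. Because state_size is LSTMStateTuple(c=num_units, h=num_units), the result is an LSTMStateTuple of two zero tensors, each of shape [batch_size, num_units].
• Definition of the call function of BasicLSTMCell:
• The returned state is the combination of new_c and new_h, while output is new_h alone.
• For a classification problem, a separate Softmax layer still has to be applied to new_h to obtain the final class probabilities.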
```python
# Excerpt from BasicLSTMCell.call (not runnable on its own)
new_c = c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j)
new_h = self._activation(new_c) * sigmoid(o)

if self._state_is_tuple:
    new_state = LSTMStateTuple(new_c, new_h)
else:
    new_state = array_ops.concat([new_c, new_h], 1)
return new_h, new_state
```
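
The update in this excerpt can be replayed in plain Python. In the sketch below the gate pre-activations i, j, f and o are passed in directly as lists of numbers (hypothetical values); in the real cell they come from a linear layer over the concatenation of inputs and h. The function name `lstm_update` is an assumption for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_update(c, i, j, f, o, forget_bias=1.0, state_is_tuple=True):
    """Element-wise LSTM update over lists of per-unit gate pre-activations."""
    new_c = [
        ck * sigmoid(fk + forget_bias) + sigmoid(ik) * math.tanh(jk)
        for ck, ik, jk, fk in zip(c, i, j, f)
    ]
    new_h = [math.tanh(ck) * sigmoid(ok) for ck, ok in zip(new_c, o)]
    # A tuple state keeps c and h apart; otherwise they are concatenated
    new_state = (new_c, new_h) if state_is_tuple else new_c + new_h
    return new_h, new_state


c = [0.5, -0.5]      # previous cell state for num_units=2
zeros = [0.0, 0.0]   # all gate pre-activations set to zero

new_h, (new_c, state_h) = lstm_update(c, i=zeros, j=zeros, f=zeros, o=zeros)
print(new_c)             # each entry is c * sigmoid(forget_bias)
print(state_h is new_h)  # True: the output is the h part of the state

_, flat_state = lstm_update(c, zeros, zeros, zeros, zeros, state_is_tuple=False)
print(len(flat_state))   # 4, i.e. 2 * num_units
```

With f = 0 the forget gate evaluates to sigmoid(1.0), roughly 0.73, instead of 0.5, so the default forget_bias=1.0 makes the cell keep more of its previous state early in training.
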
• Code example
```python
import tensorflow as tf  # TensorFlow 1.x

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
print(lstm_cell.output_size)  # 128
print(lstm_cell.state_size)   # LSTMStateTuple(c=128, h=128)

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is batch_size
h0 = lstm_cell.zero_state(batch_size=32, dtype=tf.float32)
print(h0)
# LSTMStateTuple(c=<tf.Tensor 'BasicLSTMCellZeroState/zeros:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'BasicLSTMCellZeroState/zeros_1:0' shape=(32, 128) dtype=float32>)

# Calling the cell advances the sequence by one time step
new_h, new_state = lstm_cell(inputs, h0)
print(new_h.shape)        # (32, 128)
print(new_state.h.shape)  # (32, 128)
print(new_state.c.shape)  # (32, 128)
```
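
For the classification case mentioned above, the extra Softmax layer amounts to a linear projection of new_h followed by a softmax. The following is a rough plain-Python sketch of that idea; the function names, weights and sizes are made up for illustration and do not correspond to a specific TensorFlow API.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(new_h, weights, bias):
    """Project a hidden state (num_units floats) to class probabilities."""
    logits = [
        sum(h * weights[k][c] for k, h in enumerate(new_h)) + bias[c]
        for c in range(len(bias))
    ]
    return softmax(logits)


new_h = [0.2, -0.1, 0.4]                         # stand-in for the LSTM output
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # num_units x num_classes (hypothetical)
bias = [0.0, 0.0]

probs = classify(new_h, weights, bias)
print(probs)       # two class probabilities
print(sum(probs))  # approximately 1.0
```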

## II. Running multiple steps at once: tf.nn.dynamic_rnn

### 1. RNN

tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

• cell: an RNNCell instance.
• inputs: the input sequence of the RNN.
• initial_state: the initial state of the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
• sequence_length: an int vector of shape [batch_size]; each value is the valid length (number of time steps) of the corresponding sequence, e.g. sequence_length=tf.fill([batch_size], time_steps) when every sequence uses all time steps. Past a sequence's length, outputs are zeroed and the state is copied through.
• time_major: defaults to False, in which case the input and output tensors have shape [batch_size, max_time, depth]. When True, the shape is [max_time, batch_size, depth], which avoids transposes at the beginning and end of the RNN calculation.
• scope: VariableScope for the created subgraph; defaults to "rnn".

• outputs: the outputs of all time_steps steps, of shape [batch_size, max_time, cell.output_size].
• state: the state after the last step, of shape [batch_size, cell.state_size].
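
Conceptually, dynamic_rnn applies the cell once per time step and collects the results. The sketch below imitates that in plain Python, one example at a time, including the sequence_length behaviour. `toy_cell` is a hypothetical running-sum cell and `unroll` is an illustrative name; the real function processes the whole batch at once inside a graph-level loop.

```python
def toy_cell(x, state):
    """Hypothetical stand-in for an RNN cell: a running sum, output == new state."""
    new_state = [s + xi for s, xi in zip(state, x)]
    return new_state, new_state


def unroll(cell, inputs, initial_state, sequence_length):
    """inputs: nested list [batch_size][max_time][depth]; returns (outputs, final_state)."""
    outputs, final_state = [], []
    for b, sequence in enumerate(inputs):
        state = initial_state[b]
        sequence_outputs = []
        for t, x in enumerate(sequence):
            if t < sequence_length[b]:
                out, state = cell(x, state)
            else:
                # Past the valid length: zero output, state copied through
                out = [0.0] * len(state)
            sequence_outputs.append(out)
        outputs.append(sequence_outputs)
        final_state.append(state)
    return outputs, final_state


inputs = [
    [[1.0], [2.0], [3.0]],  # full-length sequence
    [[4.0], [5.0], [0.0]],  # only the first 2 steps are valid
]
outputs, state = unroll(toy_cell, inputs, [[0.0], [0.0]], sequence_length=[3, 2])
print(outputs)  # [[[1.0], [3.0], [6.0]], [[4.0], [9.0], [0.0]]]
print(state)    # [[6.0], [9.0]]
```

The outputs keep one entry per time step, while the state holds only the value after each sequence's last valid step.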

[Figure: visualization of the transpose ops in the computation graph when time_major=False]
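
As a rough illustration of that transpose: the loop runs over the time axis, so batch-major data is rearranged to time-major on the way in and the outputs are rearranged back on the way out. The helper below (a hypothetical name, plain nested lists instead of tensors) swaps the first two axes.

```python
def swap_batch_and_time(tensor):
    """Swap the first two axes of a nested list: [a][b][depth] -> [b][a][depth]."""
    return [list(step) for step in zip(*tensor)]


batch_major = [
    [[1, 1], [2, 2], [3, 3]],  # example 0: 3 time steps, depth 2
    [[4, 4], [5, 5], [6, 6]],  # example 1
]

time_major = swap_batch_and_time(batch_major)
print(len(time_major), len(time_major[0]))  # 3 2
print(time_major[0])                        # [[1, 1], [4, 4]]

# Applying the swap twice restores the batch-major layout
print(swap_batch_and_time(time_major) == batch_major)  # True
```

Feeding data that is already time-major with time_major=True skips both rearrangements.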

### 2. BLSTM

tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

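• Compared with 1 above, this only adds a backward LSTMCell instance and a backward initial state. The inputs are the same; information simply flows in both directions.

• outputs
• The outputs of all time_steps steps, as a tuple (output_fw, output_bw) holding the forward and backward results, of shape [batch_size, max_time, cell_fw.output_size] and [batch_size, max_time, cell_bw.output_size] respectively.
• It returns a tuple instead of a single concatenated Tensor. If the concatenated one is preferred, the forward and backward outputs can be concatenated as tf.concat(outputs, 2).
• output_states: a tuple (output_state_fw, output_state_bw) containing the final forward and backward states.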
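
The two directions can be sketched in plain Python for a single example: the backward cell reads the sequence in reverse, and its outputs are reversed again so they line up with the original time order. `toy_cell`, `run` and `bidirectional` are hypothetical names used only for this illustration.

```python
def toy_cell(x, state):
    """Hypothetical stand-in for an RNN cell: a running sum, output == new state."""
    new_state = [s + xi for s, xi in zip(state, x)]
    return new_state, new_state


def run(cell, sequence, state):
    """Apply the cell to every step of one sequence."""
    outputs = []
    for x in sequence:
        out, state = cell(x, state)
        outputs.append(out)
    return outputs, state


def bidirectional(cell_fw, cell_bw, sequence, state_fw, state_bw):
    output_fw, final_fw = run(cell_fw, sequence, state_fw)
    reversed_out, final_bw = run(cell_bw, sequence[::-1], state_bw)
    output_bw = reversed_out[::-1]  # re-align with the original time order
    return (output_fw, output_bw), (final_fw, final_bw)


sequence = [[1.0], [2.0], [3.0]]  # one example: max_time=3, depth=1
(out_fw, out_bw), (st_fw, st_bw) = bidirectional(
    toy_cell, toy_cell, sequence, [0.0], [0.0])
print(out_fw)  # [[1.0], [3.0], [6.0]]
print(out_bw)  # [[6.0], [5.0], [3.0]]

# Counterpart of tf.concat(outputs, 2): join both directions along the depth axis
# (there is no batch axis in this sketch, so depth is the innermost list)
concatenated = [fw + bw for fw, bw in zip(out_fw, out_bw)]
print(concatenated)  # [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

After concatenation, each time step carries information from both the past (forward half) and the future (backward half).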

## III. Stacking multiple layers: MultiRNNCell

```python
import tensorflow as tf  # TensorFlow 1.x

# Example input of shape [batch_size, max_time, depth]; the sizes 50 and 100 are placeholders
data = tf.placeholder(tf.float32, shape=[None, 50, 100])

# Create 2 LSTMCells with 128 and 256 hidden units respectively
rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [128, 256]]

# Create an RNN cell composed sequentially of a number of RNNCells
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)

# 'outputs' is a tensor of shape [batch_size, max_time, 256]
# 'state' is an N-tuple where N is the number of LSTMCells, containing a
# tf.contrib.rnn.LSTMStateTuple for each cell
outputs, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                   inputs=data,
                                   dtype=tf.float32)
```
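
What the stacked cell does in one time step can be sketched in plain Python: each layer's output becomes the next layer's input, and every layer keeps its own state. `make_toy_cell` and `multi_cell_step` are hypothetical helpers for illustration, with small layer sizes (4 and 6) standing in for 128 and 256.

```python
def make_toy_cell(num_units):
    """Hypothetical cell: every unit adds the mean of the input to its state."""
    def cell(x, state):
        mean = sum(x) / len(x)
        new_state = [s + mean for s in state]
        return new_state, new_state
    cell.num_units = num_units
    return cell


def multi_cell_step(cells, x, states):
    """One time step through a stack of cells."""
    new_states = []
    current = x
    for cell, state in zip(cells, states):
        current, new_state = cell(current, state)  # output feeds the next layer
        new_states.append(new_state)
    return current, tuple(new_states)


cells = [make_toy_cell(n) for n in (4, 6)]
states = tuple([0.0] * cell.num_units for cell in cells)

output, new_states = multi_cell_step(cells, [1.0, 2.0, 3.0], states)
print(len(output))                   # 6: the size of the last layer
print([len(s) for s in new_states])  # [4, 6]: one state per layer
```

This mirrors the shapes noted in the comments above: the output has the size of the last layer, and the state is a tuple with one entry per layer.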

