Using RNN & LSTM in TensorFlow

I. RNN & LSTM Basic Cell Classes

1. Basic RNN cell class

class tf.contrib.rnn.BasicRNNCell(num_units, activation=None, reuse=None, name=None)
Arguments:

  • num_units: int, the number of units in the RNN cell.
  • activation: Nonlinearity to use. Default: tanh.
  • reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  • name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

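Returns:

  • An RNN basic cell with num_units hidden units (an instantiated cell)

Common attributes:

  • state_size: size(s) of state(s) used by this cell; equal to the number of hidden units
  • output_size: size of outputs produced by this cell
  • Note: for this class, state_size always equals output_size

Common methods:

  • call(inputs, state) returns (output, new_state), which for this cell are the same hidden-state tensor; it is normally invoked as cell(inputs, state)
  • zero_state(batch_size, dtype) returns an all-zero tensor of shape [batch_size, state_size]
  • Code example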
import tensorflow as tf

cell = tf.contrib.rnn.BasicRNNCell(num_units=128)
print(cell.state_size)  # 128

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is the batch_size
h0 = cell.zero_state(batch_size=32, dtype=tf.float32)  # all-zero initial state of shape (batch_size, state_size)

# Calling the cell object builds its variables and then runs call(), advancing one time step.
output, h1 = cell(inputs, h0)
print(h1.shape)  # (32, 128)
output == h1  # True: BasicRNNCell returns the same tensor as output and as new state
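
To make the single step concrete, here is a minimal NumPy sketch of the computation a basic tanh RNN cell performs. It is not the TensorFlow implementation: the function name basic_rnn_step and the randomly initialised kernel and bias are assumptions made for illustration only.

```python
import numpy as np

def basic_rnn_step(x, h, kernel, bias):
    """One vanilla RNN step: new_h = tanh([x, h] @ kernel + bias)."""
    new_h = np.tanh(np.concatenate([x, h], axis=1) @ kernel + bias)
    # Mirror BasicRNNCell: the same array serves as output and as new state.
    return new_h, new_h

rng = np.random.default_rng(0)
batch_size, input_depth, num_units = 32, 100, 128

x = rng.standard_normal((batch_size, input_depth))
h0 = np.zeros((batch_size, num_units))  # plays the role of zero_state
# Hypothetical weights; a real cell learns these during training.
kernel = rng.standard_normal((input_depth + num_units, num_units)) * 0.1
bias = np.zeros(num_units)

output, h1 = basic_rnn_step(x, h0, kernel, bias)
print(h1.shape)      # (32, 128)
print(output is h1)  # True
```

The sketch shows why state_size equals output_size for this cell: there is only one quantity, the hidden state, and it is returned twice.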

2. Basic LSTM cell class

class tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None)
Arguments:

  • num_units: int, the number of units in the LSTM cell.
  • forget_bias: float, The bias added to forget gates. Must set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
  • state_is_tuple: If True, accepted and returned states are 2-tuples of the c_state and m_state
  • activation: Nonlinearity to use. Default: tanh.
  • reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  • name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.

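Returns:

  • An LSTM basic cell with num_units hidden units (an instantiated lstm_cell)

Common attributes:

  • state_size: size(s) of state(s) used by this cell; with state_is_tuple=True this is LSTMStateTuple(c=num_units, h=num_units)
  • output_size: size of outputs produced by this cell; equal to num_units
  • Note: unlike BasicRNNCell, state_size does not equal output_size here, because the state holds both c and h

Common methods:

  • call(inputs, state) returns (new_h, new_state), where new_state is an LSTMStateTuple containing c and h
  • zero_state(batch_size, dtype) returns an all-zero state; since state_size is LSTMStateTuple(c=num_units, h=num_units), this is a tuple of two tensors, each of shape [batch_size, num_units]
  • Definition of BasicLSTMCell's call function
    • The returned state is the combination of new_c and new_h, while output is new_h alone
    • For a classification problem, a separate Softmax layer must still be added on top of new_h to obtain the final class probabilities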
new_c = c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j)
new_h = self._activation(new_c) * sigmoid(o)

if self._state_is_tuple:
  new_state = LSTMStateTuple(new_c, new_h)
else:
  new_state = array_ops.concat([new_c, new_h], 1)
return new_h, new_state
  • Code example
import tensorflow as tf

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=128)
print(lstm_cell.output_size)  # 128
print(lstm_cell.state_size)   # LSTMStateTuple(c=128, h=128)

inputs = tf.placeholder(shape=[32, 100], dtype=tf.float32)  # 32 is the batch_size
h0 = lstm_cell.zero_state(batch_size=32, dtype=tf.float32)
print(h0)
# LSTMStateTuple(c=<tf.Tensor 'BasicLSTMCellZeroState/zeros:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'BasicLSTMCellZeroState/zeros_1:0' shape=(32, 128) dtype=float32>)

# Calling the cell object builds its variables and then runs call(), advancing one time step.
new_h, new_state = lstm_cell(inputs, h0)
print(new_h.shape)        # (32, 128)
print(new_state.h.shape)  # (32, 128)
print(new_state.c.shape)  # (32, 128)
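
The new_c / new_h formulas in the call definition quoted earlier can be traced with a small NumPy sketch. This is an illustration under assumptions, not the library code: basic_lstm_step, softmax, the random kernel, and the output projection w_out (standing in for the separate Softmax layer mentioned for classification) are all hypothetical names and values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def basic_lstm_step(x, state, kernel, bias, forget_bias=1.0):
    """One LSTM step following the quoted new_c / new_h formulas."""
    c, h = state
    gates = np.concatenate([x, h], axis=1) @ kernel + bias
    # i: input gate, j: candidate values, f: forget gate, o: output gate
    i, j, f, o = np.split(gates, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)  # output is new_h; the state carries both

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch_size, input_depth, num_units, num_classes = 32, 100, 128, 10

x = rng.standard_normal((batch_size, input_depth))
zero_state = (np.zeros((batch_size, num_units)), np.zeros((batch_size, num_units)))
# Hypothetical weights; one kernel produces all four gate pre-activations.
kernel = rng.standard_normal((input_depth + num_units, 4 * num_units)) * 0.1
bias = np.zeros(4 * num_units)

new_h, (new_c, state_h) = basic_lstm_step(x, zero_state, kernel, bias)

# Hypothetical classification head: project new_h to class scores, then softmax.
w_out = rng.standard_normal((num_units, num_classes)) * 0.1
probs = softmax(new_h @ w_out)
print(new_h.shape, new_c.shape, probs.shape)  # (32, 128) (32, 128) (32, 10)
```

Note how the state is a pair (c, h) while the output is h alone, which is why state_size and output_size differ for the LSTM cell.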

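II. Running multiple steps at once: tf.nn.dynamic_rnn

Purpose: overcome the limitation that a basic RNNCell advances only one time step per call.
Function: TF provides tf.nn.dynamic_rnn; using it is equivalent to invoking call n times, i.e. going from $(h_0, x_1, x_2, \ldots, x_n)$ directly to $(h_1, h_2, \ldots, h_n)$.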
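
The "n calls in a row" idea can be sketched as a plain loop over the time axis. The NumPy code below is only a conceptual model of that unrolling, assuming a tanh RNN step with made-up weights; unrolled_rnn and rnn_step are hypothetical helper names, and the real function builds a graph-level loop rather than a Python one.

```python
import numpy as np

def rnn_step(x, h, kernel, bias):
    """One tanh RNN step on a batch."""
    return np.tanh(np.concatenate([x, h], axis=1) @ kernel + bias)

def unrolled_rnn(inputs, h0, kernel, bias):
    """inputs: [batch_size, max_time, depth] -> (outputs, final_state)."""
    h = h0
    outputs = []
    for t in range(inputs.shape[1]):  # one step per time index
        h = rnn_step(inputs[:, t, :], h, kernel, bias)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

rng = np.random.default_rng(0)
batch_size, max_time, depth, num_units = 4, 6, 5, 3

inputs = rng.standard_normal((batch_size, max_time, depth))
h0 = np.zeros((batch_size, num_units))
kernel = rng.standard_normal((depth + num_units, num_units)) * 0.5  # hypothetical weights
bias = np.zeros(num_units)

outputs, state = unrolled_rnn(inputs, h0, kernel, bias)
print(outputs.shape)  # (4, 6, 3)
print(state.shape)    # (4, 3)
```

For this simple cell, the final state equals the output at the last time step, outputs[:, -1, :].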


1. RNN

tf.nn.dynamic_rnn(cell, inputs, initial_state=None, sequence_length=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)


Arguments:

  • cell: an RNNCell instance
  • inputs: the RNN input sequence
  • initial_state: the initial state of the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
  • sequence_length: an int vector of shape [batch_size]; each value is the actual length of that batch element's sequence (at most time_steps), e.g. sequence_length=tf.fill([batch_size], time_steps) when every sequence is full length
  • time_major: defaults to False, in which case input and output tensors have shape [batch_size, max_time, depth]; when True, it avoids transposes at the beginning and end of the RNN calculation, and input and output tensors have shape [max_time, batch_size, depth]
  • scope: VariableScope for the created subgraph; defaults to “rnn”.

Returns (outputs, state):

  • outputs: the outputs of all time_steps steps, of shape [batch_size, max_time, cell.output_size]
  • state: the hidden state after the last step, of shape [batch_size, cell.state_size]
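
When sequence_length is supplied, outputs past a sequence's end are zeroed and its state is copied through unchanged, so the returned state is the one from the last valid step. The NumPy sketch below models that behaviour for a tanh RNN; masked_unrolled_rnn and the weights are hypothetical, and it is a conceptual model rather than the library's implementation.

```python
import numpy as np

def rnn_step(x, h, kernel, bias):
    """One tanh RNN step on a batch."""
    return np.tanh(np.concatenate([x, h], axis=1) @ kernel + bias)

def masked_unrolled_rnn(inputs, sequence_length, h0, kernel, bias):
    """Batch-major unrolling that respects per-example sequence lengths."""
    batch_size, max_time, _ = inputs.shape
    h = h0
    outputs = np.zeros((batch_size, max_time, h0.shape[1]))
    for t in range(max_time):
        valid = (t < sequence_length)[:, None]          # [batch_size, 1] boolean mask
        new_h = rnn_step(inputs[:, t, :], h, kernel, bias)
        outputs[:, t, :] = np.where(valid, new_h, 0.0)  # zero output past the end
        h = np.where(valid, new_h, h)                   # copy state through past the end
    return outputs, h

rng = np.random.default_rng(0)
batch_size, max_time, depth, num_units = 4, 6, 5, 3

inputs = rng.standard_normal((batch_size, max_time, depth))
sequence_length = np.array([6, 3, 1, 4])  # actual length of each padded sequence
h0 = np.zeros((batch_size, num_units))
kernel = rng.standard_normal((depth + num_units, num_units)) * 0.5  # hypothetical weights
bias = np.zeros(num_units)

outputs, state = masked_unrolled_rnn(inputs, sequence_length, h0, kernel, bias)
print(outputs.shape)  # (4, 6, 3)
print(state.shape)    # (4, 3)
```

In this model, state[b] matches outputs[b, sequence_length[b] - 1], which is usually what a downstream classifier wants when the batch contains padding.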

[Figure: visualization of the transpose ops in the computation graph when time_major=False]

2. BLSTM

tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, initial_state_fw=None, initial_state_bw=None, sequence_length=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)


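Arguments:

  • Compared with 1 above, the only additions are a backward LSTMCell instance (cell_bw) and a backward initial state (initial_state_bw); the inputs are the same, but information flows in both directions

Returns (outputs, output_states):

  • outputs
    • The outputs of all time_steps steps, as a tuple (output_fw, output_bw) holding the forward and backward results; output_fw has shape [batch_size, max_time, cell_fw.output_size] and output_bw has shape [batch_size, max_time, cell_bw.output_size]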
    • It returns a tuple instead of a single concatenated Tensor. If the concatenated one is preferred, the forward and backward outputs can be concatenated as tf.concat(outputs, 2)
  • output_states: a tuple (output_state_fw, output_state_bw) containing the final forward and backward states
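
A rough NumPy model of the bidirectional computation: run one RNN forward in time, run a second RNN over the time-reversed inputs, flip its outputs back, and optionally concatenate along the last axis. The sketch assumes tanh RNN cells instead of LSTM cells for brevity, full-length sequences (no sequence_length handling), and hypothetical weights and helper names.

```python
import numpy as np

def rnn_step(x, h, kernel, bias):
    """One tanh RNN step on a batch."""
    return np.tanh(np.concatenate([x, h], axis=1) @ kernel + bias)

def unrolled_rnn(inputs, h0, kernel, bias):
    """inputs: [batch_size, max_time, depth] -> (outputs, final_state)."""
    h, outputs = h0, []
    for t in range(inputs.shape[1]):
        h = rnn_step(inputs[:, t, :], h, kernel, bias)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

def bidirectional_rnn(inputs, fw_params, bw_params, fw_units, bw_units):
    """Returns ((output_fw, output_bw), (state_fw, state_bw))."""
    batch_size = inputs.shape[0]
    output_fw, state_fw = unrolled_rnn(
        inputs, np.zeros((batch_size, fw_units)), *fw_params)
    # Backward direction: reverse time, run the cell, then reverse the outputs back.
    reversed_out, state_bw = unrolled_rnn(
        inputs[:, ::-1, :], np.zeros((batch_size, bw_units)), *bw_params)
    output_bw = reversed_out[:, ::-1, :]
    return (output_fw, output_bw), (state_fw, state_bw)

rng = np.random.default_rng(0)
batch_size, max_time, depth, fw_units, bw_units = 4, 6, 5, 3, 2
inputs = rng.standard_normal((batch_size, max_time, depth))
# Hypothetical (kernel, bias) pairs for the two directions.
fw_params = (rng.standard_normal((depth + fw_units, fw_units)) * 0.5, np.zeros(fw_units))
bw_params = (rng.standard_normal((depth + bw_units, bw_units)) * 0.5, np.zeros(bw_units))

outputs, output_states = bidirectional_rnn(inputs, fw_params, bw_params, fw_units, bw_units)
concatenated = np.concatenate(outputs, axis=2)  # analogue of tf.concat(outputs, 2)
print(concatenated.shape)  # (4, 6, 5)
```

In this model the final backward state lines up with output_bw at time index 0, because the backward cell finishes at the start of the sequence.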

III. Stacking multiple layers: MultiRNNCell

Often a single-layer RNN has limited capacity and a multi-layer RNN is needed. In TensorFlow, the tf.nn.rnn_cell.MultiRNNCell class can be used to stack RNNCells.

import tensorflow as tf

# Hypothetical input placeholder (not in the original snippet): [batch_size, max_time, depth]
data = tf.placeholder(tf.float32, shape=[None, 20, 100])

# Create 2 LSTMCells with 128 and 256 hidden units respectively
rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [128, 256]]

# create a RNN cell composed sequentially of a number of RNNCells
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)

# 'outputs' is a tensor of shape [batch_size, max_time, 256]
# 'state' is a N-tuple where N is the number of LSTMCells containing a
# tf.contrib.rnn.LSTMStateTuple for each cell
outputs, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                   inputs=data,
                                   dtype=tf.float32)
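
To see what stacking means at each time step, the NumPy sketch below feeds every layer's output into the next layer and keeps one state per layer. It swaps the LSTM cells for plain tanh RNN layers to stay short; multi_rnn_step, the layer sizes, and the random weights are assumptions for illustration only.

```python
import numpy as np

def rnn_step(x, h, kernel, bias):
    """One tanh RNN step on a batch."""
    return np.tanh(np.concatenate([x, h], axis=1) @ kernel + bias)

def multi_rnn_step(x, states, layer_params):
    """One time step through all layers; layer k's output is layer k+1's input."""
    new_states = []
    layer_input = x
    for h, (kernel, bias) in zip(states, layer_params):
        layer_input = rnn_step(layer_input, h, kernel, bias)
        new_states.append(layer_input)
    return layer_input, tuple(new_states)

rng = np.random.default_rng(0)
batch_size, max_time, depth = 4, 6, 5
layer_sizes = [128, 256]
data = rng.standard_normal((batch_size, max_time, depth))

# Hypothetical weights: each layer's kernel is sized for [its input, its own state].
layer_params, in_size = [], depth
for size in layer_sizes:
    kernel = rng.standard_normal((in_size + size, size)) * 0.1
    layer_params.append((kernel, np.zeros(size)))
    in_size = size

states = tuple(np.zeros((batch_size, size)) for size in layer_sizes)
outputs = []
for t in range(max_time):
    out, states = multi_rnn_step(data[:, t, :], states, layer_params)
    outputs.append(out)
outputs = np.stack(outputs, axis=1)

print(outputs.shape)              # (4, 6, 256)
print([s.shape for s in states])  # [(4, 128), (4, 256)]
```

The output width is set by the last layer (256 here), while the state is a tuple with one entry per layer, matching the shape comments in the TensorFlow snippet.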

IV. References

1. TensorFlow中RNN实现的正确打开方式 (The Right Way to Use the RNN Implementation in TensorFlow)
2. https://www.tensorflow.org/api_guides/python/contrib.rnn
3. https://www.tensorflow.org/api_guides/python/nn#Recurrent_Neural_Networks
