I. Recurrent Neural Networks (RNN)
As increasingly effective recurrent architectures have been proposed, the RNN's capacity for deep representation of the temporal and semantic information in data has been fully exploited, yielding breakthroughs in speech recognition, language modeling, machine translation, and time-series analysis.
The total loss of an RNN is the sum of the per-step losses over all time steps (or over a subset of them).
In theory an RNN supports sequences of arbitrary length. In practice, however, overly long sequences cause two problems during training: vanishing and exploding gradients during optimization, and excessive memory use by the unrolled feed-forward network. A maximum length is therefore usually fixed, and sequences exceeding it are truncated.
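The truncate-or-pad preprocessing described above can be sketched in plain Python (the function name and padding value are illustrative, not part of TensorFlow):

```python
def pad_or_truncate(seq, max_len, pad_value=0):
    """Clip sequences longer than max_len; right-pad shorter ones."""
    if len(seq) > max_len:
        return seq[:max_len]                              # truncate
    return seq + [pad_value] * (max_len - len(seq))       # pad

print(pad_or_truncate([1, 2, 3, 4, 5], 3))  # [1, 2, 3]
print(pad_or_truncate([1, 2], 4))           # [1, 2, 0, 0]
```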
1. RNN in TensorFlow
import tensorflow as tf
batch_size = 32
input_dim = 32
rnn_hidden_size = 32
inputs = tf.random_normal(shape=(batch_size, input_dim), dtype=tf.float32)
# Create a BasicRNNCell
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(rnn_hidden_size)
# Generate the all-zero initial state
init_state = rnn_cell.zero_state(batch_size, tf.float32)
print(init_state) # Tensor(shape=(32, 32))
# Run one step
output, h1 = rnn_cell(inputs, init_state)
print(output) # Tensor(shape=(32, 32))
print(h1) # Tensor(shape=(32, 32))
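A single BasicRNNCell step computes output = new_state = tanh([x, h] · W + b). A minimal NumPy sketch of that one call (the random weights are illustrative placeholders, not TensorFlow's initializers):

```python
import numpy as np

batch_size, input_dim, hidden_size = 32, 32, 32
rng = np.random.default_rng(0)

x = rng.standard_normal((batch_size, input_dim))
h = np.zeros((batch_size, hidden_size))             # zero_state equivalent
W = rng.standard_normal((input_dim + hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

# One cell call: concatenate input and state, affine transform, tanh.
h_new = np.tanh(np.concatenate([x, h], axis=1) @ W + b)
output = h_new          # for a basic RNN, the output equals the new state
print(output.shape)     # (32, 32)
```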
II. Long Short-Term Memory Networks (LSTM)
An RNN can draw on stored history to inform the current decision, for example using earlier words to sharpen the understanding of the current one. But this introduces the long-term dependency problem: information many steps back becomes hard to learn.
The LSTM (Long Short-Term Memory) network was proposed to address this.
1. LSTM in TensorFlow
import tensorflow as tf
batch_size = 32
input_dim = 32
lstm_hidden_size = 32
inputs = tf.random_normal(shape=(batch_size, input_dim), dtype=tf.float32)
# Create a BasicLSTMCell
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
# Generate the all-zero initial state
init_state = lstm_cell.zero_state(batch_size, tf.float32)
print(init_state) # LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32))
# Run one step
output, h1 = lstm_cell(inputs, init_state)
print(output) # Tensor(32, 32)
print(h1) # LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32))
Note:
The RNNCell base class provides zero_state() to generate an all-zero initial state.
Unlike a plain RNN, an LSTM's init_state has two parts: init_state.c (the cell state) and init_state.h (the hidden state).
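The c/h split can be made concrete with a NumPy sketch of one LSTM step. The gate ordering (i, j, f, o) and the forget_bias of 1.0 follow BasicLSTMCell's implementation; the random weights are placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

batch_size, input_dim, hidden = 32, 32, 32
rng = np.random.default_rng(0)
x = rng.standard_normal((batch_size, input_dim))
c = np.zeros((batch_size, hidden))   # init_state.c
h = np.zeros((batch_size, hidden))   # init_state.h
# One weight matrix produces all four gates at once, as BasicLSTMCell does.
W = rng.standard_normal((input_dim + hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)

gates = np.concatenate([x, h], axis=1) @ W + b
i, j, f, o = np.split(gates, 4, axis=1)       # input, candidate, forget, output
c_new = c * sigmoid(f + 1.0) + sigmoid(i) * np.tanh(j)   # forget_bias = 1.0
h_new = np.tanh(c_new) * sigmoid(o)
print(h_new.shape, c_new.shape)  # (32, 32) (32, 32)
```

The pair (c_new, h_new) is exactly what the LSTMStateTuple carries between steps.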
III. Multi-layer RNN
1. MultiRNNCell in TensorFlow
import tensorflow as tf
batch_size = 32
input_dim = 32
lstm_hidden_size = 32
number_of_layers = 2
inputs = tf.random_normal(shape=(batch_size, input_dim), dtype=tf.float32)
# Reference the BasicLSTMCell class (one instance is created per layer below)
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell
# Define a MultiRNNCell; it takes a list of cells
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(lstm_hidden_size) for _ in range(number_of_layers)])
# Generate the all-zero initial state
init_state = stacked_lstm.zero_state(batch_size, tf.float32)
print(init_state)
# (LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32)), LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32)))
# Run one step
output, h1 = stacked_lstm(inputs, init_state)
print(output) # Tensor(32, 32)
print(h1)
# (LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32)), LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32)))
The state is a nested tuple with one LSTMStateTuple per layer.
Note: although the cell has multiple layers, output is only the top layer's output. See the source below:
def call(self, inputs, state):
    """Run this multi-layer cell on inputs, starting from state."""
    cur_state_pos = 0
    cur_inp = inputs
    new_states = []
    for i, cell in enumerate(self._cells):
        with vs.variable_scope("cell_%d" % i):
            if self._state_is_tuple:
                if not nest.is_sequence(state):
                    raise ValueError(
                        "Expected state to be a tuple of length %d, but received: %s" %
                        (len(self.state_size), state))
                cur_state = state[i]  # Note: the state is taken layer by layer
            else:
                cur_state = array_ops.slice(state, [0, cur_state_pos],
                                            [-1, cell.state_size])
                cur_state_pos += cell.state_size
            cur_inp, new_state = cell(cur_inp, cur_state)  # Note: cur_inp is the lower layer's output
            new_states.append(new_state)
    new_states = (
        tuple(new_states) if self._state_is_tuple else array_ops.concat(
            new_states, 1))
    return cur_inp, new_states
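The stacking loop can be mimicked with any per-layer step functions. In this pure-Python sketch each "cell" is a toy function, not a TensorFlow cell; it only illustrates the data flow — output feeds upward, states are collected per layer:

```python
# Each "cell" maps (input, state) -> (output, new_state).
def make_cell(k):
    def cell(inp, state):
        out = inp + k            # placeholder computation
        return out, state + out
    return cell

cells = [make_cell(1), make_cell(10)]
state = [0, 0]                   # one state per layer (zero_state equivalent)

cur_inp, new_states = 5, []
for i, cell in enumerate(cells):
    cur_inp, s = cell(cur_inp, state[i])  # layer i consumes the layer below's output
    new_states.append(s)

print(cur_inp)            # 16 -> only the top layer's output is returned
print(tuple(new_states))  # (6, 16) -> but every layer's new state is kept
```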
IV. Dynamic RNN
1. dynamic_rnn in TensorFlow
With an RNNCell alone, one call advances only a single time step; dynamic_rnn produces all time steps in a single call.
Internally it wraps repeated calls to the RNNCell, so the cell argument must be an RNNCell instance.
tf.nn.dynamic_rnn(
    cell,                   # must be an instance of RNNCell
    inputs,                 # shape=(batch_size, max_time, dim) if time_major is False, else (max_time, batch_size, dim)
    sequence_length=None,   # (batch_size,)
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)
return: (outputs, state)
outputs  # (batch_size, max_time, dim) when time_major is False
state    # the final state; (batch_size, dim) for a basic cell, an LSTMStateTuple for an LSTM cell
import tensorflow as tf
batch_size = 32
sequence_max_len = 32
input_dim = 32
lstm_hidden_size = 32
inputs = tf.random_normal(shape=(batch_size, sequence_max_len, input_dim), dtype=tf.float32)
# Create a BasicLSTMCell
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
# Generate the all-zero initial state
init_state = lstm_cell.zero_state(batch_size, tf.float32)
print(init_state) # LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32))
# Run all time steps with dynamic_rnn
outputs, state = tf.nn.dynamic_rnn(lstm_cell, inputs, initial_state=init_state)
print(outputs) # Tensor(32, 32, 32)
print(state) # LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32))
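What sequence_length does can be sketched in NumPy: steps past each sequence's length produce zero output and stop updating the state. This is a simplified model of dynamic_rnn's copy-through behavior, with a toy step function standing in for the cell:

```python
import numpy as np

def unroll(step_fn, inputs, seq_len, state):
    """inputs: (batch, max_time, dim); seq_len: (batch,)."""
    batch, max_time, _ = inputs.shape
    outputs = np.zeros((batch, max_time) + state.shape[1:])
    for t in range(max_time):
        out, new_state = step_fn(inputs[:, t], state)
        valid = (t < seq_len)[:, None]              # mask finished sequences
        outputs[:, t] = np.where(valid, out, 0.0)   # zero output past the end
        state = np.where(valid, new_state, state)   # freeze state past the end
    return outputs, state

# Toy step function: identity output, running-sum state (illustrative).
step = lambda x, h: (x, h + x)
inputs = np.ones((2, 4, 3))
outputs, state = unroll(step, inputs, np.array([2, 4]), np.zeros((2, 3)))
print(outputs[0, 3])  # [0. 0. 0.] -> past sequence 0's length of 2
print(state[0])       # [2. 2. 2.] -> state frozen after step 2
```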
2. bidirectional_dynamic_rnn in TensorFlow
tf.nn.bidirectional_dynamic_rnn(
    cell_fw,                 # forward cell
    cell_bw,                 # backward cell
    inputs,                  # inputs
    sequence_length=None,    # length of each input sequence
    initial_state_fw=None,   # initial state of the forward cell
    initial_state_bw=None,   # initial state of the backward cell
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)
return: (outputs, outputs_states)
outputs: A tuple(output_fw, output_bw) containing the forward and the backward rnn output Tensor
Note: a bidirectional LSTM's output contains the outputs of both directions, regardless of whether the cell is multi-layer.
import tensorflow as tf
batch_size = 32
sequence_max_len = 32
input_dim = 32
lstm_hidden_size = 32
inputs = tf.random_normal(shape=(batch_size, sequence_max_len, input_dim), dtype=tf.float32)
# Create the forward and backward LSTM cells
fw_lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
bw_lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)
# Generate the all-zero initial states
fw_init_state = fw_lstm_cell.zero_state(batch_size, tf.float32)
bw_init_state = bw_lstm_cell.zero_state(batch_size, tf.float32)
print(fw_init_state) # LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32))
print(bw_init_state) # LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32))
# Run bidirectional_dynamic_rnn
outputs, state = tf.nn.bidirectional_dynamic_rnn(fw_lstm_cell, bw_lstm_cell, inputs,
initial_state_fw=fw_init_state, initial_state_bw=bw_init_state)
print(outputs) # (Tensor(32, 32, 32), Tensor(32, 32, 32))
print(state) # (LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32)), LSTMStateTuple(c = Tensor(32, 32), h = Tensor(32, 32)))
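Bidirectional processing itself is simple to sketch: run one pass forward, one over the time-reversed input, then re-reverse the backward outputs so both align per time step. A NumPy sketch with a toy step function (not a TensorFlow cell):

```python
import numpy as np

def run(step_fn, inputs, state):
    """Unroll step_fn over the time axis of inputs (batch, time, dim)."""
    outs = []
    for t in range(inputs.shape[1]):
        out, state = step_fn(inputs[:, t], state)
        outs.append(out)
    return np.stack(outs, axis=1), state

step = lambda x, h: (x + h, x + h)               # toy cell: running sum
x = np.arange(6, dtype=float).reshape(1, 3, 2)   # (batch=1, time=3, dim=2)

fw_out, _ = run(step, x, np.zeros((1, 2)))
bw_out, _ = run(step, x[:, ::-1], np.zeros((1, 2)))
bw_out = bw_out[:, ::-1]                         # re-align to forward time order

outputs = np.concatenate([fw_out, bw_out], axis=-1)  # what tf.concat(..., -1) does
print(outputs.shape)  # (1, 3, 4)
```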
V. Wrappers around RNNCell
Because a Wrapper inherits from RNNCell, a wrapped cell is still an RNNCell.
1. DropoutWrapper in TensorFlow
lstm_hidden_size = 32
number_of_layers = 2
# Reference the BasicLSTMCell class
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell
# Wrap each layer's cell with DropoutWrapper, then stack them
# (the keep probabilities default to 1.0, so set one explicitly to apply dropout)
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.DropoutWrapper(lstm_cell(lstm_hidden_size), output_keep_prob=0.8)
     for _ in range(number_of_layers)])
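What an output keep probability does per layer can be sketched as standard inverted dropout (training-time behavior; at inference the mask is disabled). This NumPy function is illustrative, not DropoutWrapper's actual implementation:

```python
import numpy as np

def output_dropout(h, keep_prob, rng):
    """Inverted dropout: zero units with prob 1-keep_prob, rescale the rest."""
    mask = rng.random(h.shape) < keep_prob
    return np.where(mask, h / keep_prob, 0.0)

rng = np.random.default_rng(0)
h = np.ones((4, 8))
dropped = output_dropout(h, keep_prob=0.5, rng=rng)
print(dropped)  # entries are either 0.0 or 2.0 (= 1 / keep_prob)
```

Rescaling by 1/keep_prob keeps the expected activation unchanged, so no adjustment is needed at inference time.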
2. AttentionWrapper in TensorFlow
# Define the forward and backward encoder cells
enc_cell_fw = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
enc_cell_bw = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
# Define the decoder cell
dec_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
# Encoder
with tf.variable_scope("encoder"):
    enc_outputs, enc_state = tf.nn.bidirectional_dynamic_rnn(
        enc_cell_fw, enc_cell_bw, inputs, inputs_lengths, dtype=tf.float32)
    # Concatenate the two directions' outputs into one tensor
    enc_outputs = tf.concat([enc_outputs[0], enc_outputs[1]], -1)
with tf.variable_scope("decoder"):
    # Attention mechanism
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
        hidden_size, enc_outputs, memory_sequence_length=inputs_lengths)
    # Wrap the attention mechanism and the cell together
    attention_cell = tf.contrib.seq2seq.AttentionWrapper(
        dec_cell, attention_mechanism, attention_layer_size=hidden_size)
    # Build the decoder.
    # Note: this is the training mode; inference requires a decoding loop.
    dec_outputs, _ = tf.nn.dynamic_rnn(attention_cell, tgt_inputs, tgt_lengths, dtype=tf.float32)
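The score BahdanauAttention computes can be sketched in NumPy: score = vᵀ tanh(W1·query + W2·keys), softmaxed into weights that average the encoder outputs into a context vector. The weight matrices below are random placeholders, not trained parameters:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

batch, src_len, enc_dim, dec_dim, attn_units = 2, 5, 8, 8, 8
rng = np.random.default_rng(0)
enc_outputs = rng.standard_normal((batch, src_len, enc_dim))  # the "memory"
dec_state = rng.standard_normal((batch, dec_dim))             # decoder query

W1 = rng.standard_normal((dec_dim, attn_units)) * 0.1
W2 = rng.standard_normal((enc_dim, attn_units)) * 0.1
v = rng.standard_normal(attn_units) * 0.1

# Additive (Bahdanau) score for every encoder position.
scores = np.tanh(dec_state[:, None, :] @ W1 + enc_outputs @ W2) @ v  # (batch, src_len)
weights = softmax(scores)                                            # attention weights
context = (weights[:, :, None] * enc_outputs).sum(axis=1)            # (batch, enc_dim)
print(weights.sum(axis=1))  # each row sums to 1
print(context.shape)        # (2, 8)
```

The context vector is what AttentionWrapper feeds into the decoder cell at each step (and, with attention_layer_size set, projects together with the cell output).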