1. Creating an LSTMCell
LSTMCell = tf.contrib.rnn.BasicLSTMCell(num_units)
# or, equivalently:
LSTMCell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
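BasicLSTMCell is the simplest LSTM class. It does not implement advanced LSTM variants such as cell clipping, a projection layer or peephole connections, and exists only as a basic baseline. To use those variants, use the tf.nn.rnn_cell.LSTMCell class instead.
Commonly used parameters: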
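__init__(
    num_units,            # number of units in the LSTM cell
    forget_bias=1.0,      # bias added to the forget gate (a kind of forgetting threshold); 1.0 reduces forgetting early in training
    state_is_tuple=True,  # True: states are 2-tuples (c_state, m_state); False: c and m concatenated along the column axis (False is slated for deprecation)
    activation=None,      # activation of the inner states; defaults to tanh
    reuse=None,           # whether to reuse variables in an existing scope
    name=None             # name of the layer
)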
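To make forget_bias and state_is_tuple concrete, here is a minimal NumPy sketch of one BasicLSTMCell step. It follows the gate layout TensorFlow's BasicLSTMCell uses (a single kernel producing the i, j, f, o pre-activations), but it is an illustration under that assumption, not the TF source; the helper `basic_lstm_step` and the weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def basic_lstm_step(x, c, h, kernel, bias, forget_bias=1.0, activation=np.tanh):
    """One step of a basic LSTM cell (hypothetical helper, NumPy only).

    x: [batch, input_dim], c/h: [batch, num_units]
    kernel: [input_dim + num_units, 4 * num_units], bias: [4 * num_units]
    """
    gates = np.concatenate([x, h], axis=1) @ kernel + bias
    # i: input gate, j: candidate values, f: forget gate, o: output gate
    i, j, f, o = np.split(gates, 4, axis=1)
    # forget_bias is added to the forget-gate pre-activation
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * activation(j)
    new_h = activation(new_c) * sigmoid(o)
    return new_c, new_h

batch, input_dim, num_units = 2, 3, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(batch, input_dim))
c = np.ones((batch, num_units))
h = np.zeros((batch, num_units))

# With all-zero weights every pre-activation is 0, so the forget gate is
# sigmoid(0 + forget_bias): forget_bias=1.0 keeps ~73% of the old cell state.
kernel = np.zeros((input_dim + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
new_c, new_h = basic_lstm_step(x, c, h, kernel, bias)
print(new_c[0, 0])  # sigmoid(1.0), about 0.731

# state_is_tuple=True keeps (c, h) separate; False concatenates them on axis 1
state_tuple = (new_c, new_h)
state_concat = np.concatenate([new_c, new_h], axis=1)
print(state_concat.shape)  # (2, 8)
```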
2. Initializing the state
2.1 zero_state
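The cell also provides a state-initialization method, zero_state(batch_size, dtype), which returns an all-zeros state (for an LSTM cell with state_is_tuple=True, an LSTMStateTuple of two [batch_size, num_units] tensors). batch_size is the number of samples in an input batch, and dtype is the data type.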
zero_state(
batch_size,
dtype
)
init_state = LSTMCell.zero_state(batch_size, dtype=tf.float32)
# with time_major=True, inputs must have shape [max_time, batch_size, depth]
output, final_state = tf.nn.dynamic_rnn(LSTMCell, inputs, initial_state=init_state, time_major=True)
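What zero_state plus dynamic_rnn with time_major=True does can be pictured as a loop over the first (time) axis that starts from an all-zeros (c, h). The NumPy sketch below illustrates that idea under stated assumptions: `zero_state` and `run_time_major` are hypothetical helpers, the cell is the BasicLSTMCell update with small random weights, and no sequence_length masking is modelled.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def zero_state(batch_size, num_units, dtype=np.float32):
    # the idea behind cell.zero_state: an all-zeros (c, h) pair
    return (np.zeros((batch_size, num_units), dtype),
            np.zeros((batch_size, num_units), dtype))

def lstm_step(x, state, kernel, bias, forget_bias=1.0):
    c, h = state
    i, j, f, o = np.split(np.concatenate([x, h], axis=1) @ kernel + bias, 4, axis=1)
    c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    h = np.tanh(c) * sigmoid(o)
    return h, (c, h)

def run_time_major(inputs, init_state, kernel, bias):
    """inputs: [max_time, batch_size, depth] (the time_major=True layout)."""
    state, outputs = init_state, []
    for x_t in inputs:  # iterate over the time axis
        out, state = lstm_step(x_t, state, kernel, bias)
        outputs.append(out)
    return np.stack(outputs), state  # outputs: [max_time, batch_size, num_units]

max_time, batch_size, depth, num_units = 5, 2, 3, 4
rng = np.random.default_rng(0)
inputs = rng.normal(size=(max_time, batch_size, depth)).astype(np.float32)
kernel = rng.normal(scale=0.1, size=(depth + num_units, 4 * num_units)).astype(np.float32)
bias = np.zeros(4 * num_units, np.float32)

output, final_state = run_time_major(inputs, zero_state(batch_size, num_units), kernel, bias)
print(output.shape)          # (5, 2, 4)
print(final_state[1].shape)  # (2, 4)
```

Without per-sequence lengths, the last time slice of output equals the h part of final_state, which is a handy sanity check when reading dynamic_rnn results.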
2.2 LSTMStateTuple
tf.contrib.rnn.LSTMStateTuple
tf.nn.rnn_cell.LSTMStateTuple
Stores two elements, (c, h), in that order, where c is the cell state and h is the output (hidden state).
It is used to hold c and h, for example to build a custom initial state:
self.c = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
self.h = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
self.initial_state = tf.contrib.rnn.LSTMStateTuple(c=self.c, h=self.h)
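LSTMStateTuple behaves like a named tuple whose fields are ordered (c, h); in TF 1.x it is built on collections.namedtuple. The stand-in below uses a plain namedtuple and NumPy instead of TensorFlow, so treat it as an analogy rather than the TF class; the sizes and seed are arbitrary.

```python
from collections import namedtuple
import numpy as np

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple: a namedtuple with fields (c, h)
LSTMStateTuple = namedtuple("LSTMStateTuple", ("c", "h"))

batch_size, num_units = 2, 4
rng = np.random.default_rng(42)
# Random initial state, analogous to tf.random_normal(..., mean=0, stddev=4)
c = rng.normal(loc=0.0, scale=4.0, size=(batch_size, num_units))
h = rng.normal(loc=0.0, scale=4.0, size=(batch_size, num_units))
initial_state = LSTMStateTuple(c=c, h=h)

# Fields can be read by name or unpacked positionally, in (c, h) order
c0, h0 = initial_state
print(initial_state.c is c0, initial_state[1] is h)  # True True
```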
3. Dropout
tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(num_units), input_keep_prob=keep_prob)
tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicLSTMCell(num_units), input_keep_prob=keep_prob)
DropoutWrapper wraps a cell that has already been created and adds dropout to it, which helps prevent overfitting.
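__init__(
    cell,
    input_keep_prob=1.0,          # probability of KEEPING each input unit (1.0 = no dropout)
    output_keep_prob=1.0,         # probability of keeping each output unit
    state_keep_prob=1.0,          # probability of keeping each state unit
    variational_recurrent=False,  # if True, the same dropout mask is used at every time step; input_size (when input_keep_prob < 1) and dtype must then be set
    input_size=None,              # depth of the inputs; only needed with variational_recurrent=True
    dtype=None,                   # dtype of inputs, state and outputs; only needed with variational_recurrent=True
    seed=None                     # random seed
)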
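The *_keep_prob arguments are keep probabilities, not drop rates: each unit survives with probability keep_prob and surviving units are scaled by 1/keep_prob (inverted dropout, as tf.nn.dropout does). The NumPy sketch below contrasts a fresh mask per time step with variational_recurrent=True, where one mask is reused at every step; `make_masks` is a hypothetical helper, not part of TensorFlow.

```python
import numpy as np

def make_masks(keep_prob, shape, num_steps, variational, rng):
    """Return one dropout mask per time step (hypothetical helper).

    Kept units get the value 1/keep_prob, dropped units get 0.
    """
    def one_mask():
        return (rng.random(shape) < keep_prob) / keep_prob

    if variational:
        mask = one_mask()
        return [mask] * num_steps                  # same pattern at every step
    return [one_mask() for _ in range(num_steps)]  # new pattern at each step

rng = np.random.default_rng(0)
keep_prob, shape, steps = 0.5, (2, 6), 3
per_step = make_masks(keep_prob, shape, steps, variational=False, rng=rng)
variational = make_masks(keep_prob, shape, steps, variational=True, rng=rng)

x = np.ones(shape)
print(x * per_step[0])  # entries are 0.0 or 2.0 (= 1 / keep_prob)
print(all(np.array_equal(m, variational[0]) for m in variational))  # True
```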
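Note that, by default (state_keep_prob=1.0), no dropout is applied to the memory when the state is passed from time step t-1 to time step t. As shown in the figure, dropout is applied only to the inputs along the dashed arrows, i.e. only to the output of the layer below.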
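In a two-layer stack, this means the mask touches the vertical connections (the input into layer 1 and the layer-1 output into layer 2), while the state carried from t-1 to t is never masked. To keep the NumPy sketch short it uses a plain tanh RNN in place of an LSTM. That substitution, the helper names and the weights are assumptions.

```python
import numpy as np

def dropout(v, keep_prob, rng):
    # inverted dropout: keep with prob keep_prob, scale kept units by 1/keep_prob
    return v * (rng.random(v.shape) < keep_prob) / keep_prob

def run_two_layer(inputs, W1, U1, W2, U2, h1, h2, drop):
    """inputs: [time, batch, depth]; drop is applied only to vertical connections."""
    for x_t in inputs:
        h1 = np.tanh(drop(x_t) @ W1 + h1 @ U1)  # h1 from t-1 is NOT dropped
        h2 = np.tanh(drop(h1) @ W2 + h2 @ U2)   # layer-1 output IS dropped
    return h1, h2

rng = np.random.default_rng(0)
T, B, D, H = 4, 2, 3, 5
inputs = rng.normal(size=(T, B, D))
W1, U1 = rng.normal(size=(D, H)), rng.normal(size=(H, H))
W2, U2 = rng.normal(size=(H, H)), rng.normal(size=(H, H))
h1_0, h2_0 = np.ones((B, H)), np.ones((B, H))

# Ordinary training-style run with keep_prob = 0.8
h1, h2 = run_two_layer(inputs, W1, U1, W2, U2, h1_0, h2_0,
                       drop=lambda v: dropout(v, 0.8, rng))

# Extreme case: drop EVERY vertical input. The recurrent state still carries
# information forward, because the t-1 -> t path is never masked.
h1_all, h2_all = run_two_layer(inputs, W1, U1, W2, U2, h1_0, h2_0,
                               drop=np.zeros_like)
print(np.any(h1_all != 0), np.any(h2_all != 0))  # True True
```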
Combining the two steps above, the LSTM cell is created as:
self.decoder_cell = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(num_units),input_keep_prob=self.keep_prob)
4. seq2seq
4.1 tf.contrib.seq2seq.TrainingHelper
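A helper class for the Decoder that can only be used during training. Its job is to read the inputs: at each step it feeds the decoder the ground-truth input (teacher forcing).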
__init__(
    inputs,            # embedded inputs x, shape = [batch_size, sequence_length, embedding_size]
    sequence_length,   # int32 vector of shape [batch_size]: the length of each sequence
    time_major=False,  # if True, inputs shape = [sequence_length, batch_size, embedding_size]
    name=None
)
Example:
helper_pt = tf.contrib.seq2seq.TrainingHelper(
    inputs=self.emb_x,
    sequence_length=self.sequence_lengths,
    time_major=False
)
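Conceptually, TrainingHelper hands the decoder the ground-truth input for the next step and marks a sequence as finished once the step index reaches its sequence_length. Below is a minimal NumPy sketch of that bookkeeping, assuming batch-major inputs. The class `ToyTrainingHelper` is hypothetical and only loosely modelled on the Helper initialize / next_inputs idea; it does not reproduce TF's exact signatures.

```python
import numpy as np

class ToyTrainingHelper:
    """Teacher forcing over inputs [batch, time, emb] (hypothetical class)."""

    def __init__(self, inputs, sequence_length, time_major=False):
        # store time-major internally so that self.inputs[t] is the step-t slice
        self.inputs = inputs if time_major else np.transpose(inputs, (1, 0, 2))
        self.sequence_length = np.asarray(sequence_length)

    def initialize(self):
        finished = self.sequence_length == 0
        return finished, self.inputs[0]

    def next_inputs(self, time):
        next_time = time + 1
        finished = next_time >= self.sequence_length
        # once every sequence is done, the returned input no longer matters
        if finished.all():
            return finished, np.zeros_like(self.inputs[0])
        return finished, self.inputs[next_time]

emb_x = np.arange(2 * 3 * 1, dtype=float).reshape(2, 3, 1)  # [batch=2, time=3, emb=1]
helper = ToyTrainingHelper(emb_x, sequence_length=[3, 2])
finished, x0 = helper.initialize()
print(x0.ravel())             # [0. 3.]  step-0 inputs of both sequences
finished, x1 = helper.next_inputs(0)
print(finished, x1.ravel())   # [False False] [1. 4.]
finished, x2 = helper.next_inputs(1)
print(finished)               # [False  True]  the second sequence has length 2
```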
4.2 tf.contrib.seq2seq.BasicDecoder
Creates a basic decoder.
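__init__(
    cell,               # the LSTM cell created above
    helper,             # the helper, e.g. helper_pt
    initial_state,      # initial state, e.g. self.initial_state
    output_layer=None   # optional layer (e.g. Dense) applied to the RNN output at each step, e.g. a fully connected projection to logits; softmax is applied afterwards
)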
Example:
First create the Dense layer:
from tensorflow.python.layers import core as layers_core
self.output_layer = layers_core.Dense(self.num_emb, use_bias=False)
decoder_pt = tf.contrib.seq2seq.BasicDecoder(
    cell=self.decoder_cell,
    helper=helper_pt,
    initial_state=self.initial_state,  # or init_state
    output_layer=self.output_layer
)
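At each step, BasicDecoder runs the cell, passes the cell output through output_layer, and asks the helper for a sample. With TrainingHelper, that sample is the argmax of the logits. The NumPy sketch below covers only the projection-and-sample part; `dense_no_bias` is a hypothetical stand-in for Dense(num_emb, use_bias=False), and the sizes are arbitrary.

```python
import numpy as np

def dense_no_bias(cell_output, kernel):
    # like Dense(num_emb, use_bias=False): [batch, num_units] @ [num_units, num_emb]
    return cell_output @ kernel

batch_size, num_units, num_emb = 2, 4, 6
rng = np.random.default_rng(1)
cell_output = rng.normal(size=(batch_size, num_units))
kernel = rng.normal(size=(num_units, num_emb))

logits = dense_no_bias(cell_output, kernel)  # becomes rnn_output for this step
sample_id = np.argmax(logits, axis=-1)       # TrainingHelper-style sample
print(logits.shape, sample_id.shape)         # (2, 6) (2,)
```

Without an output_layer, rnn_output would simply be the cell output, num_units wide, and any projection to the vocabulary would have to happen after decoding.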
4.3 tf.contrib.seq2seq.dynamic_decode
Performs dynamic decoding with the Decoder instance passed in. Internally it calls the Decoder object's initialize() once and then step() repeatedly; at its core is a control_flow_ops.while_loop.
Return value: (final_outputs, final_state, final_sequence_lengths)
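tf.contrib.seq2seq.dynamic_decode(
    decoder,                  # a Decoder instance, e.g. decoder_pt
    output_time_major=False,  # True: outputs are [time, batch, ...]; False: [batch, time, ...]
    impute_finished=False,    # True: copy state through and zero the outputs of finished sequences
    maximum_iterations=None,  # maximum number of decoding steps; None decodes until all sequences finish
    parallel_iterations=32,   # passed to the underlying while_loop
    swap_memory=False,        # passed to while_loop; allows GPU-CPU memory swapping
    scope=None
)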
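The control flow of dynamic_decode can be sketched as a plain Python loop. It calls initialize() once, then calls step() until every sequence is finished or maximum_iterations is reached, recording each sequence's length along the way. This is a simplified model, not TF's while_loop implementation. `ToyDecoder` and `toy_dynamic_decode` are hypothetical (the toy "cell" just emits the step number), and impute_finished is modelled only as zeroing outputs past the end.

```python
import numpy as np

class ToyDecoder:
    """Hypothetical decoder: emits time+1; sequence b finishes after lengths[b] steps."""

    def __init__(self, lengths):
        self.lengths = np.asarray(lengths)

    def initialize(self):
        finished = self.lengths == 0
        return finished, None, 0  # (finished, first_inputs, initial_state)

    def step(self, time, inputs, state):
        outputs = np.full(len(self.lengths), float(time + 1))
        next_state = state + 1
        finished = next_state >= self.lengths
        return outputs, next_state, None, finished  # (outputs, state, inputs, finished)

def toy_dynamic_decode(decoder, maximum_iterations=None, impute_finished=False):
    finished, inputs, state = decoder.initialize()
    seq_lengths = np.zeros(len(finished), dtype=np.int32)
    outputs_per_step, time = [], 0
    while not finished.all():
        if maximum_iterations is not None and time >= maximum_iterations:
            break
        outputs, state, inputs, step_finished = decoder.step(time, inputs, state)
        if impute_finished:
            outputs = np.where(finished, 0.0, outputs)  # zero rows already finished
        seq_lengths += (~finished).astype(np.int32)     # still-running rows grow by 1
        finished = finished | step_finished
        outputs_per_step.append(outputs)
        time += 1
    # batch-major result: [batch, time]
    return np.stack(outputs_per_step, axis=1), state, seq_lengths

outputs, final_state, lengths = toy_dynamic_decode(ToyDecoder([3, 1]), impute_finished=True)
print(outputs)   # [[1. 2. 3.]
                 #  [1. 0. 0.]]
print(lengths)   # [3 1]
```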
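A further note on time_major, which appears both in TrainingHelper and here:
batch major means batch_size is the first dimension, i.e. [batch_size, sequence_length, embedding_size];
time major means the time step is the first dimension, i.e. [sequence_length, batch_size, embedding_size].
The documentation notes that returning batch major tensors "adds extra time to the computation", so the second (time-major) layout is faster.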
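Switching between the two layouts amounts to transposing the first two axes (tf.transpose(x, [1, 0, 2]) in TensorFlow), and that transpose is the extra work batch-major mode pays for. Here is a small NumPy check with arbitrary sizes:

```python
import numpy as np

batch_size, sequence_length, embedding_size = 2, 3, 4
batch_major = np.arange(batch_size * sequence_length * embedding_size).reshape(
    batch_size, sequence_length, embedding_size)

# batch major [B, T, E] -> time major [T, B, E]: swap axes 0 and 1
time_major = np.transpose(batch_major, (1, 0, 2))
print(time_major.shape)  # (3, 2, 4)

# the step-t slice is batch_major[:, t] in one layout and time_major[t] in the other
t = 1
print(np.array_equal(batch_major[:, t], time_major[t]))  # True
```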
Example:
outputs_pt, _final_state, sequence_lengths_pt = tf.contrib.seq2seq.dynamic_decode(
    decoder=decoder_pt,
    output_time_major=False,
    maximum_iterations=self.max_sequence_length,
    swap_memory=True,
)
self.logits_pt = outputs_pt.rnn_output
self.g_predictions = tf.nn.softmax(self.logits_pt)
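Here final_outputs is a namedtuple with two fields, (rnn_output, sample_id):
rnn_output: [batch_size, sequence_length, vocab_size], the RNN output after output_layer (the logits), used to compute tf.nn.softmax(rnn_output).
sample_id: [batch_size, sequence_length], tf.int32, the token id chosen at each step (with TrainingHelper, the argmax of rnn_output); it can be read as the decoded result.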
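Given rnn_output of shape [batch_size, sequence_length, vocab_size], the softmax probabilities and the per-step argmax ids are related as shown below. The `softmax` function is a NumPy stand-in for tf.nn.softmax, which normalizes over the last axis by default, and the logits are random placeholders.

```python
import numpy as np

def softmax(logits, axis=-1):
    # subtract the max for numerical stability, then normalize over the vocab axis
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

batch_size, sequence_length, vocab_size = 2, 3, 5
rng = np.random.default_rng(0)
rnn_output = rng.normal(size=(batch_size, sequence_length, vocab_size))

g_predictions = softmax(rnn_output)         # like tf.nn.softmax(rnn_output)
sample_id = np.argmax(rnn_output, axis=-1)  # [batch_size, sequence_length]
print(g_predictions.sum(axis=-1))  # ones: each step is a distribution over the vocab
print(sample_id.shape)             # (2, 3)
```

Because softmax is monotonic, taking the argmax of the probabilities gives the same ids as taking the argmax of the raw logits.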
Complete example:
# Inside a model-class method (TF 1.x); assumes self.keep_prob, self.num_units,
# self.batch_size, self.emb_x, self.sequence_lengths, self.max_sequence_length
# and self.output_layer (the Dense layer created earlier) are already defined.
def _get_cell(_num_units):
    return tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units),
                                         input_keep_prob=self.keep_prob)

with tf.variable_scope("decoder"):
    self.decoder_cell = _get_cell(self.num_units)
    # initial states
    self.c = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
    self.h = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
    self.initial_state = tf.contrib.rnn.LSTMStateTuple(c=self.c, h=self.h)
    ###################### pretrain with targets ######################
    helper_pt = tf.contrib.seq2seq.TrainingHelper(
        inputs=self.emb_x,
        sequence_length=self.sequence_lengths,
        time_major=False,
    )
    decoder_pt = tf.contrib.seq2seq.BasicDecoder(
        cell=self.decoder_cell,
        helper=helper_pt,
        initial_state=self.initial_state,
        output_layer=self.output_layer
    )
    outputs_pt, _final_state, sequence_lengths_pt = tf.contrib.seq2seq.dynamic_decode(
        decoder=decoder_pt,
        output_time_major=False,
        maximum_iterations=self.max_sequence_length,
        swap_memory=True,
    )
    self.logits_pt = outputs_pt.rnn_output
    self.g_predictions = tf.nn.softmax(self.logits_pt)