tf.contrib.seq2seq series: LSTMCell

wn87947

Category: TensorFlow API explanations and examples
Tags: tensorflow, rnn, LSTMCell, seq2seq

Article link: https://blog.csdn.net/wn87947/article/details/82254034


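1. Creating an LSTMCell

Either of the following equivalent constructors creates the cell:

LSTMCell = tf.contrib.rnn.BasicLSTMCell(num_units)
LSTMCell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

BasicLSTMCell is the simplest LSTM class. It does not implement clipping, a projection layer, peephole connections, or other advanced LSTM variants, and exists only as a basic baseline. To use those variants, use the tf.nn.rnn_cell.LSTMCell class instead.
Common parameters: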

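__init__(
    num_units,            # number of units in the LSTM cell
    forget_bias=1.0,      # bias added to the forget gate, so the cell forgets less early in training
    state_is_tuple=True,  # True: states are 2-tuples of (c_state, m_state); False: c and m are concatenated (deprecated, will be removed)
    activation=None,      # activation of the inner states; defaults to tanh
    reuse=None,
    name=None
)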
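To make the role of forget_bias concrete, here is a minimal scalar sketch of one LSTM step in plain Python. It assumes the gate pre-activations i, j, f, o have already been computed (in the real cell they come from a matrix multiply over the input and previous h); the function name basic_lstm_step is hypothetical and only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def basic_lstm_step(c_prev, i, j, f, o, forget_bias=1.0):
    """One scalar LSTM step from gate pre-activations (illustrative only)."""
    # forget_bias is added to the forget gate's pre-activation before the sigmoid
    forget_gate = sigmoid(f + forget_bias)
    new_c = c_prev * forget_gate + sigmoid(i) * math.tanh(j)
    new_h = math.tanh(new_c) * sigmoid(o)
    return new_c, new_h


# Same pre-activations, with and without the forget bias
c_with_bias, _ = basic_lstm_step(1.0, i=0.0, j=0.0, f=0.0, o=0.0, forget_bias=1.0)
c_no_bias, _ = basic_lstm_step(1.0, i=0.0, j=0.0, f=0.0, o=0.0, forget_bias=0.0)
print(c_with_bias, c_no_bias)
```

With forget_bias=1.0 more of the previous cell state survives the step than with forget_bias=0.0, which is why the default helps the cell retain memory at the start of training.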

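2. Initialization

2.1 zero_state

The cell also provides a state-initialization method, zero_state(batch_size, dtype). batch_size is the number of samples in a batch and dtype is the data type of the state.

zero_state(
    batch_size,
    dtype
)

init_state = LSTMCell.zero_state(batch_size, dtype=tf.float32)
# time_major=True: inputs has shape [sequence_length, batch_size, embedding_size]
output, final_state = tf.nn.dynamic_rnn(LSTMCell, inputs, initial_state=init_state, time_major=True)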

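2.2 LSTMStateTuple

tf.contrib.rnn.LSTMStateTuple
tf.nn.rnn_cell.LSTMStateTuple

Stores two elements, (c, h), in that order. c is the cell state (the TensorFlow docs call it the hidden state) and h is the output.
It is used to hold c and h, for example to build a custom initial state: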

self.c = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
self.h = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
self.initial_state = tf.contrib.rnn.LSTMStateTuple(c=self.c, h=self.h)
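As a rough illustration of what these state objects look like, the sketch below uses a plain namedtuple as a stand-in for LSTMStateTuple and nested lists as a stand-in for tensors. StateTuple and this zero_state helper are hypothetical names, not TensorFlow APIs; the point is only that the state is a (c, h) pair whose members each have shape [batch_size, num_units].

```python
from collections import namedtuple

# Stand-in for LSTMStateTuple: a named pair (c, h)
StateTuple = namedtuple("StateTuple", ["c", "h"])


def zero_state(batch_size, num_units):
    """Build an all-zero (c, h) pair, each of shape [batch_size, num_units]."""
    def zeros():
        return [[0.0] * num_units for _ in range(batch_size)]
    return StateTuple(c=zeros(), h=zeros())


state = zero_state(batch_size=2, num_units=3)
c, h = state  # unpacks in (c, h) order
print(len(state.c), len(state.c[0]))
```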

3. Dropout

tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units), input_keep_prob=keep_prob)
tf.nn.rnn_cell.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units), input_keep_prob=keep_prob)

DropoutWrapper wraps an existing cell and adds dropout to it, which helps prevent overfitting.

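__init__(
    cell,
    input_keep_prob=1.0,          # probability of keeping each input unit (1.0 = no dropout)
    output_keep_prob=1.0,         # probability of keeping each output unit
    state_keep_prob=1.0,          # probability of keeping each state unit
    variational_recurrent=False,  # if True, the same dropout mask is applied at every time step; input_size must then be set
    input_size=None,
    dtype=None,                   # needed when variational_recurrent=True
    seed=None
)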
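
In other words, when the state at time t-1 is passed on to time t, no dropout is applied to the memory in between. In the original figure, dropout is applied only to the inputs along the dashed arrows, that is, only to the output of the previous layer.
[Figure: direction in which dropout is applied]
Combining the two steps above, the LSTM cell is created as: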

self.decoder_cell = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(num_units),input_keep_prob=self.keep_prob)
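The difference between per-step dropout and variational_recurrent=True can be sketched in plain Python. This is an illustration under simplifying assumptions, not the DropoutWrapper implementation: the helper names are hypothetical, inputs are nested lists of shape [time][features], and kept values are scaled by 1/keep_prob as TensorFlow's dropout does.

```python
import random


def dropout_mask(size, keep_prob, rng):
    # Kept units are scaled by 1/keep_prob, dropped units become 0
    return [(1.0 / keep_prob) if rng.random() < keep_prob else 0.0
            for _ in range(size)]


def apply_dropout(sequence, keep_prob, variational, seed=0):
    """Apply dropout to a [time][features] sequence; returns (outputs, masks)."""
    rng = random.Random(seed)
    size = len(sequence[0])
    # Variational dropout samples one mask and reuses it at every time step
    shared = dropout_mask(size, keep_prob, rng) if variational else None
    outputs, masks = [], []
    for step in sequence:
        mask = shared if variational else dropout_mask(size, keep_prob, rng)
        masks.append(mask)
        outputs.append([x * m for x, m in zip(step, mask)])
    return outputs, masks


sequence = [[1.0] * 8 for _ in range(4)]  # 4 time steps, 8 features
out_v, masks_v = apply_dropout(sequence, keep_prob=0.5, variational=True)
out_s, masks_s = apply_dropout(sequence, keep_prob=0.5, variational=False)
```

In the variational case every entry of masks_v is the same mask, while in the standard case a fresh mask is drawn at each step.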

4. seq2seq

4.1 tf.contrib.seq2seq.TrainingHelper

A helper class used by the decoder. It can only be used during training, and its job is to read the inputs.

__init__(
    inputs,            # embedded input x, shape = [batch_size, sequence_length, embedding_size]
    sequence_length,   # length of each sequence in the batch
    time_major=False,  # if True, inputs has shape [sequence_length, batch_size, embedding_size]
    name=None
)

Example:

helper_pt = tf.contrib.seq2seq.TrainingHelper(
    inputs=self.emb_x,
    sequence_length=self.sequence_lengths,
    time_major=False
)

4.2 tf.contrib.seq2seq.BasicDecoder

Creates a basic decoder.

__init__(
    cell,              # the LSTMCell created above
    helper,            # the helper created above (helper_pt)
    initial_state,     # initial state, e.g. self.initial_state
    output_layer=None  # optional layer (e.g. Dense) applied to the RNN output to produce logits; softmax is applied afterwards
)

Example:
First create the Dense layer:

from tensorflow.python.layers import core as layers_core
self.output_layer = layers_core.Dense(self.num_emb, use_bias=False)

decoder_pt = tf.contrib.seq2seq.BasicDecoder(
    cell=self.decoder_cell,
    helper=helper_pt,
    initial_state=self.initial_state,  # or init_state
    output_layer=self.output_layer
)

4.3 tf.contrib.seq2seq.dynamic_decode

Builds a dynamic decoder, that is, it decodes step by step using the Decoder instance passed in. Internally it calls the decoder's initialize() once and then step() repeatedly; the core is a control_flow_ops.while_loop loop.
Return value: (final_outputs, final_state, final_sequence_lengths)

tf.contrib.seq2seq.dynamic_decode(
    decoder,  # a Decoder instance, i.e. decoder_pt
    output_time_major=False,
    impute_finished=False,
    maximum_iterations=None,
    parallel_iterations=32,
    swap_memory=False,
    scope=None
)
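The initialize-once, step-repeatedly pattern described above can be sketched with a toy loop in plain Python. Everything here is hypothetical and greatly simplified: CountdownDecoder and toy_dynamic_decode are illustrative stand-ins, the real function runs inside a graph-mode while_loop on tensors, and the outputs here are simply collected as a time-major list.

```python
class CountdownDecoder:
    """Toy decoder: each sequence emits its remaining step count until it hits 0."""

    def __init__(self, lengths):
        self.lengths = lengths

    def initialize(self):
        finished = [n == 0 for n in self.lengths]
        return finished, list(self.lengths)

    def step(self, time, state):
        outputs = list(state)
        next_state = [max(n - 1, 0) for n in state]
        finished = [n == 0 for n in next_state]
        return outputs, next_state, finished


def toy_dynamic_decode(decoder, maximum_iterations=None):
    finished, state = decoder.initialize()  # called exactly once
    sequence_lengths = [0] * len(finished)
    all_outputs = []
    time = 0
    while not all(finished):
        if maximum_iterations is not None and time >= maximum_iterations:
            break
        outputs, state, step_finished = decoder.step(time, state)
        # A step counts toward a sequence's length only if it was still running
        sequence_lengths = [n if done else n + 1
                            for n, done in zip(sequence_lengths, finished)]
        finished = [a or b for a, b in zip(finished, step_finished)]
        all_outputs.append(outputs)
        time += 1
    return all_outputs, state, sequence_lengths


final_outputs, final_state, final_lengths = toy_dynamic_decode(CountdownDecoder([2, 3]))
```

The loop stops when every sequence reports finished or when maximum_iterations is reached, whichever comes first.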

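A further note on time_major, which appears both in TrainingHelper and here (as output_time_major):
batch major means batch_size is the first dimension, i.e. [batch_size, sequence_length, embedding_size]
time major means the time step is the first dimension, i.e. [sequence_length, batch_size, embedding_size]
The documentation notes that using batch-major tensors adds extra time to the computation, so the second (time-major) layout is faster.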
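The two layouts differ only in the order of the first two axes. A minimal sketch with nested Python lists standing in for tensors (the helper name batch_to_time_major is hypothetical):

```python
def batch_to_time_major(batch_major):
    """Convert [batch][time][feature] nested lists to [time][batch][feature]."""
    return [list(step) for step in zip(*batch_major)]


batch_major = [
    [[1, 1], [2, 2], [3, 3]],  # sample 0: 3 time steps, 2 features each
    [[4, 4], [5, 5], [6, 6]],  # sample 1
]
time_major = batch_to_time_major(batch_major)
print(len(time_major), len(time_major[0]))  # time steps, batch size
```

An RNN consumes one time step across the whole batch at once, so with batch-major data this swap has to happen before and after the loop, which is the extra cost the documentation mentions.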

Example:

outputs_pt, _final_state, sequence_lengths_pt = tf.contrib.seq2seq.dynamic_decode(
    decoder=decoder_pt,
    output_time_major=False,
    maximum_iterations=self.max_sequence_length,
    swap_memory=True,
)
self.logits_pt = outputs_pt.rnn_output
self.g_predictions = tf.nn.softmax(self.logits_pt)

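Here final_outputs is a tuple with two fields, (rnn_output, sample_id):
rnn_output: [batch_size, sequence_length, vocab_size], the RNN output after the output layer (the logits), which is passed to tf.nn.softmax(rnn_output)
sample_id: [batch_size, sequence_length], tf.int32, the token ids chosen at each step, which can be read as the final decoded answer.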
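The relationship between the two fields can be sketched in plain Python, under the assumption that a TrainingHelper is used, in which case the sample ids are the argmax of the logits at each step. Nested lists stand in for tensors, and the softmax and argmax helpers are written out by hand for illustration.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# rnn_output stand-in with shape [batch_size=1, sequence_length=2, vocab_size=3]
rnn_output = [[[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]]

g_predictions = [[softmax(step) for step in seq] for seq in rnn_output]
sample_id = [[argmax(step) for step in seq] for seq in rnn_output]
print(sample_id)
```

Each row of g_predictions is a probability distribution over the vocabulary, and sample_id holds one token id per batch element per time step.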

Complete example (a fragment from inside a model class's graph-building method, hence the self. references):

from tensorflow.python.layers import core as layers_core

def _get_cell(_num_units):
    # BasicLSTMCell with dropout applied to its inputs
    return tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units),
                                         input_keep_prob=self.keep_prob)

with tf.variable_scope("decoder"):
    self.decoder_cell = _get_cell(self.num_units)
    # projects the RNN output to logits of size num_emb
    self.output_layer = layers_core.Dense(self.num_emb, use_bias=False)

    # initial states
    self.c = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
    self.h = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
    self.initial_state = tf.contrib.rnn.LSTMStateTuple(c=self.c, h=self.h)

    ###################### pretrain with targets ######################
    helper_pt = tf.contrib.seq2seq.TrainingHelper(
        inputs=self.emb_x,
        sequence_length=self.sequence_lengths,
        time_major=False,
    )
    decoder_pt = tf.contrib.seq2seq.BasicDecoder(
        cell=self.decoder_cell,
        helper=helper_pt,
        initial_state=self.initial_state,
        output_layer=self.output_layer
    )

    outputs_pt, _final_state, sequence_lengths_pt = tf.contrib.seq2seq.dynamic_decode(
        decoder=decoder_pt,
        output_time_major=False,
        maximum_iterations=self.max_sequence_length,
        swap_memory=True,
    )
    # logits of shape [batch_size, sequence_length, num_emb]
    self.logits_pt = outputs_pt.rnn_output
    self.g_predictions = tf.nn.softmax(self.logits_pt)
