tf.contrib.seq2seq series: LSTMCell

1. Creating an LSTMCell

LSTMCell = tf.contrib.rnn.BasicLSTMCell(num_units)
# equivalently:
LSTMCell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

BasicLSTMCell is the simplest LSTM class. It does not implement clipping, a projection layer, peepholes, or other advanced LSTM variants; it exists only as a basic baseline. If you need those variants, use the tf.nn.rnn_cell.LSTMCell class instead.
Common parameters:

__init__(
    num_units,           # number of units (neurons) in the cell
    forget_bias=1.0,     # bias added to the forget gate, acting like a forgetting threshold
    state_is_tuple=True, # True: return a 2-tuple of the c_state and m_state; False: return them concatenated (False is deprecated and will be removed)
    activation=None,     # defaults to tanh
    reuse=None,
    name=None
)
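If you do need clipping, a projection layer, or peepholes, the richer tf.nn.rnn_cell.LSTMCell exposes them directly. A minimal sketch (the parameter values below are purely illustrative, not taken from the original):

advanced_cell = tf.nn.rnn_cell.LSTMCell(
    num_units,
    use_peepholes=True,  # enable peephole connections
    cell_clip=3.0,       # clip the cell state to [-3.0, 3.0]
    num_proj=64,         # project the output down to 64 dimensions
    state_is_tuple=True
)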

2. Initialization

2.1 zero_state

There is also a state-initialization method, zero_state(batch_size, dtype), with two arguments: batch_size is the number of samples in the input batch, and dtype is the data type.

zero_state(
    batch_size,
    dtype
)
init_state = LSTMCell.zero_state(batch_size, dtype=tf.float32)
output, final_state = tf.nn.dynamic_rnn(LSTMCell, inputs, initial_state=init_state, time_major=True)

2.2 LSTMStateTuple

LSTMStateTuple

tf.contrib.rnn.LSTMStateTuple
tf.nn.rnn_cell.LSTMStateTuple

Stores two elements, (c, h), where c is the cell (memory) state and h is the hidden state, i.e. the output.
Used to hold c and h.

self.c = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
self.h = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
self.initial_state = tf.contrib.rnn.LSTMStateTuple(c=self.c, h=self.h)
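Note that for a cell built with state_is_tuple=True, zero_state also returns an LSTMStateTuple, so the two initialization styles above are interchangeable. A small sketch for inspecting the fields (variable names follow the earlier examples):

init_state = LSTMCell.zero_state(batch_size, dtype=tf.float32)
print(type(init_state).__name__)  # LSTMStateTuple
print(init_state.c.shape)         # (batch_size, num_units)
print(init_state.h.shape)         # (batch_size, num_units)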

3. Dropout

tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units), input_keep_prob=keep_prob)
tf.nn.rnn_cell.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units), input_keep_prob=keep_prob)

DropoutWrapper adds dropout to a cell after it has been created, to help prevent overfitting.

__init__(
    cell,
    input_keep_prob=1.0,        # keep probability for the inputs (dropout rate is 1 - keep_prob)
    output_keep_prob=1.0,       # keep probability for the outputs
    state_keep_prob=1.0,        # keep probability for the recurrent state
    variational_recurrent=False,# if True, the same dropout mask is applied at every time step; input_size must then be provided
    input_size=None,
    dtype=None,
    seed=None
)

That is, when the state is passed from time step t-1 to time step t, no dropout is applied to the memory along the way. As the figure showed, dropout is applied only along the dashed direction, i.e. only to the output of the previous layer.
[Figure: dropout direction]
Combining the two steps above, the LSTM cell is created as:

self.decoder_cell = tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(num_units),input_keep_prob=self.keep_prob)
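If you also want dropout on the recurrent state, or a single dropout mask reused across all time steps (variational dropout), DropoutWrapper supports this through state_keep_prob and variational_recurrent. A minimal sketch, assuming embedding_size is the size of the inputs fed to the cell:

cell = tf.contrib.rnn.DropoutWrapper(
    tf.contrib.rnn.BasicLSTMCell(num_units),
    input_keep_prob=keep_prob,
    state_keep_prob=keep_prob,   # also drop the recurrent state
    variational_recurrent=True,  # reuse one dropout mask over every time step
    input_size=embedding_size,   # required when variational_recurrent=True
    dtype=tf.float32
)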

4. seq2seq

4.1 tf.contrib.seq2seq.TrainingHelper

A helper class for the Decoder that can only be used during training; its job is to read the (teacher-forcing) inputs.

__init__(
    inputs,          # embedded inputs x, shape = [batch_size, sequence_length, embedding_size]
    sequence_length, # sequence lengths
    time_major=False,# if True, inputs must have shape = [sequence_length, batch_size, embedding_size]
    name=None
)

Example:

helper_pt = tf.contrib.seq2seq.TrainingHelper(
    inputs=self.emb_x,
    sequence_length=self.sequence_lengths,
    time_major=False
)
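Since TrainingHelper is only for training, at inference time the usual counterpart is tf.contrib.seq2seq.GreedyEmbeddingHelper, which feeds the embedding of the previously emitted token back in. A minimal sketch, where self.embeddings, start_token and end_token are assumed names not defined in the original:

helper_infer = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=self.embeddings,                             # embedding matrix (or a callable)
    start_tokens=tf.fill([self.batch_size], start_token),  # <GO> id for every batch entry
    end_token=end_token                                    # decoding stops once this id is produced
)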

4.2 tf.contrib.seq2seq.BasicDecoder

Creates a basic decoder.

__init__(
    cell,             # the LSTM cell created above
    helper,           # the helper_pt created above
    initial_state,    # initial state, e.g. self.initial_state
    output_layer=None # optional fully connected output layer applied to the decoder output, typically followed by a softmax
)

Example:
First create the Dense (output projection) layer:

from tensorflow.python.layers import core as layers_core
self.output_layer = layers_core.Dense(self.num_emb, use_bias=False)

decoder_pt = tf.contrib.seq2seq.BasicDecoder(
    cell=self.decoder_cell,
    helper=helper_pt,
    initial_state=self.initial_state,  # or init_state
    output_layer=self.output_layer
)

4.3 tf.contrib.seq2seq.dynamic_decode

Builds a dynamic decoder, i.e. decodes step by step using the Decoder instance passed in. Internally it calls the Decoder's initialize() once and then step() repeatedly; the core is a control_flow_ops.while_loop.
Return value: (final_outputs, final_state, final_sequence_lengths)

tf.contrib.seq2seq.dynamic_decode(
    decoder,  # a Decoder instance, here decoder_pt
    output_time_major=False,
    impute_finished=False,
    maximum_iterations=None,
    parallel_iterations=32,
    swap_memory=False,
    scope=None
)

A note on time_major, both in TrainingHelper and here:
batch major means batch_size is the leading dimension, i.e. [batch_size, sequence_length, embedding_size];
time major means the time step is the leading dimension, i.e. [sequence_length, batch_size, embedding_size].
The documentation notes that returning batch-major tensors "adds extra time to the computation", so the second (time-major) layout is faster.
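Converting between the two layouts is just a transpose of the first two dimensions; a small sketch (inputs_batch_major is an illustrative name):

# [batch_size, sequence_length, embedding_size] -> [sequence_length, batch_size, embedding_size]
inputs_time_major = tf.transpose(inputs_batch_major, perm=[1, 0, 2])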

Example:

outputs_pt, _final_state, sequence_lengths_pt = tf.contrib.seq2seq.dynamic_decode(
    decoder=decoder_pt,
    output_time_major=False,
    maximum_iterations=self.max_sequence_length,
    swap_memory=True,
)
self.logits_pt = outputs_pt.rnn_output
self.g_predictions = tf.nn.softmax(self.logits_pt)

Here final_outputs is a namedtuple with two fields: (rnn_output, sample_id).
rnn_output: [batch_size, sequence_length, vocab_size], the decoder output logits, used to compute tf.nn.softmax(rnn_output).
sample_id: [batch_size, sequence_length], tf.int32, the sampled token ids, which can be taken as the final decoded answer.
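During pretraining these logits typically feed a masked cross-entropy loss. A minimal sketch using tf.contrib.seq2seq.sequence_loss, assuming self.x holds the target token ids and the decode runs the full self.max_sequence_length steps (these names are illustrative):

# mask out padding positions beyond each sequence's true length
weights = tf.sequence_mask(self.sequence_lengths, self.max_sequence_length, dtype=tf.float32)
self.pretrain_loss = tf.contrib.seq2seq.sequence_loss(
    logits=self.logits_pt,  # [batch_size, sequence_length, vocab_size]
    targets=self.x,         # [batch_size, sequence_length], tf.int32 token ids
    weights=weights         # [batch_size, sequence_length] mask
)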

Complete example:

def _get_cell(_num_units):
    return tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(_num_units),
                                         input_keep_prob=self.keep_prob)

with tf.variable_scope("decoder"):
    self.decoder_cell = _get_cell(self.num_units)

    # initial states
    self.c = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
    self.h = tf.random_normal([self.batch_size, self.num_units], mean=0, stddev=4)
    self.initial_state = tf.contrib.rnn.LSTMStateTuple(c=self.c, h=self.h)

    ###################### pretrain with targets ######################
    helper_pt = tf.contrib.seq2seq.TrainingHelper(
        inputs=self.emb_x,
        sequence_length=self.sequence_lengths,
        time_major=False,
    )
    decoder_pt = tf.contrib.seq2seq.BasicDecoder(
        cell=self.decoder_cell,
        helper=helper_pt,
        initial_state=self.initial_state,
        output_layer=self.output_layer
    )

    outputs_pt, _final_state, sequence_lengths_pt = tf.contrib.seq2seq.dynamic_decode(
        decoder=decoder_pt,
        output_time_major=False,
        maximum_iterations=self.max_sequence_length,
        swap_memory=True,
    )
    self.logits_pt = outputs_pt.rnn_output
    self.g_predictions = tf.nn.softmax(self.logits_pt)
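
For reference, the complete example assumes members such as self.emb_x, self.sequence_lengths, self.keep_prob and self.output_layer have already been defined. One possible setup, purely as an illustrative sketch (all sizes and names below are assumptions, not from the original):

self.batch_size = 64
self.num_units = 128
self.num_emb = 5000                  # vocabulary size
self.max_sequence_length = 20
self.keep_prob = tf.placeholder(tf.float32, name="keep_prob")
self.x = tf.placeholder(tf.int32, [self.batch_size, self.max_sequence_length], name="x")
self.sequence_lengths = tf.placeholder(tf.int32, [self.batch_size], name="sequence_lengths")

# embed the input token ids to get self.emb_x
self.embeddings = tf.get_variable("embeddings", [self.num_emb, 32])
self.emb_x = tf.nn.embedding_lookup(self.embeddings, self.x)

# output projection to the vocabulary
from tensorflow.python.layers import core as layers_core
self.output_layer = layers_core.Dense(self.num_emb, use_bias=False)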