1. Concept
A bidirectional LSTM (BiLSTM) consists of two LSTM layers: one processes the sequence front to back, the other back to front. At each time step the input is fed to both layers, and their outputs are then merged (here, by concatenation).
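The forward/backward merge can be sketched in plain NumPy. This is a minimal illustration, not Keras's implementation: a vanilla tanh RNN cell stands in for the LSTM, and the weight shapes (3 input features, 4 hidden units) are arbitrary assumptions.

```python
import numpy as np

def simple_rnn_pass(inputs, w_x, w_h):
    # vanilla RNN cell: h_t = tanh(x_t @ w_x + h_{t-1} @ w_h)
    h = np.zeros(w_h.shape[0])
    outputs = []
    for x in inputs:
        h = np.tanh(x @ w_x + h @ w_h)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq = rng.standard_normal((5, 3))  # 5 time steps, 3 features
w_x_f, w_h_f = rng.standard_normal((3, 4)), rng.standard_normal((4, 4))
w_x_b, w_h_b = rng.standard_normal((3, 4)), rng.standard_normal((4, 4))

fwd = simple_rnn_pass(seq, w_x_f, w_h_f)               # front-to-back pass
bwd = simple_rnn_pass(seq[::-1], w_x_b, w_h_b)[::-1]   # back-to-front, re-aligned to forward time
merged = np.concatenate([fwd, bwd], axis=-1)           # per-step concat, shape (5, 8)
```

Note that the backward outputs are reversed again before concatenation, so at each time step the merged vector pairs the forward and backward states for the same position.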
2. Architecture Diagram
3. Code
from tensorflow import keras as K


def build_bidi_basic_model(num_encoder_tokens, encoder_embedding_dim,
                           num_decoder_tokens, decoder_embedding_dim, latent_dim):
    '''
    description: build the encoder-decoder model
    param {*} num_encoder_tokens    encoder vocabulary size
    param {*} encoder_embedding_dim encoder word-embedding size
    param {*} num_decoder_tokens    decoder vocabulary size
    param {*} decoder_embedding_dim decoder word-embedding size
    param {*} latent_dim            number of LSTM units
    return {*} the training model
    '''
    ############################### Encoder ###################################
    # encoder inputs
    encoder_inputs = K.layers.Input(shape=(None,), name='encoder_inputs')
    # encoder embedding
    encoder_embedding = K.layers.Embedding(num_encoder_tokens, encoder_embedding_dim,
                                           name='encoder_embedding')(encoder_inputs)
    # encoder lstm
    bidi_encoder_lstm = K.layers.Bidirectional(
        K.layers.LSTM(latent_dim, return_state=True, return_sequences=True,
                      dropout=0.2, recurrent_dropout=0.5, name='encoder_lstm'))
    # The bidirectional LSTM returns five tensors: the per-step outputs of the
    # forward and backward passes concatenated together, followed by the final
    # forward h and c states and the final backward h and c states.
    encoder_outputs, forward_h, forward_c, backward_h, backward_c = bidi_encoder_lstm(encoder_embedding)
    state_h = K.layers.Concatenate()([forward_h, backward_h])
    state_c = K.layers.Concatenate()([forward_c, backward_c])
    encoder_states = [state_h, state_c]
    attention = K.layers.Attention(name='attention')
    ############################### Decoder ###################################
    # decoder inputs
    decoder_inputs = K.layers.Input(shape=(None,), name='decoder_inputs')
    # decoder embedding
    decoder_embedding = K.layers.Embedding(num_decoder_tokens, decoder_embedding_dim,
                                           name='decoder_embedding')(decoder_inputs)
    # decoder lstm: 2*latent_dim units, to match the concatenated
    # [forward_h, backward_h] / [forward_c, backward_c] encoder states
    decoder_lstm = K.layers.LSTM(latent_dim * 2, return_state=True, return_sequences=True,
                                 dropout=0.2, recurrent_dropout=0.5, name='decoder_lstm')
    # run the decoder seeded with the encoder states; the decoder's own
    # returned states are not needed for training, so they are discarded
    decoder_outputs, *_ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
    # attention: decoder outputs are the query, encoder outputs the value
    query_value_attention = attention([decoder_outputs, encoder_outputs])
    attention_outputs = K.layers.Concatenate(axis=-1)([query_value_attention, decoder_outputs])
    # decoder dense: softmax over the decoder vocabulary at every time step
    decoder_dense = K.layers.Dense(num_decoder_tokens, activation='softmax', name='decoder_dense')
    decoder_outputs = decoder_dense(attention_outputs)
    bidi_basic_model = K.models.Model([encoder_inputs, decoder_inputs], [decoder_outputs])
    return bidi_basic_model
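The five return values of the bidirectional encoder can be verified with a small standalone check. The batch size, sequence length, feature size, and unit count (2, 5, 3, and 8) below are arbitrary assumptions for illustration; with 8 units, the concatenated per-step output has 16 features while each state tensor keeps 8.

```python
import numpy as np
from tensorflow import keras as K

# a Bidirectional LSTM with return_sequences and return_state, as in the encoder above
bilstm = K.layers.Bidirectional(
    K.layers.LSTM(8, return_sequences=True, return_state=True))

x = np.random.rand(2, 5, 3).astype('float32')  # (batch, time steps, features)
seq, forward_h, forward_c, backward_h, backward_c = bilstm(x)

print(tuple(seq.shape))        # (2, 5, 16): forward and backward outputs concatenated
print(tuple(forward_h.shape))  # (2, 8): final forward hidden state
print(tuple(backward_c.shape)) # (2, 8): final backward cell state
```

Concatenating `forward_h` with `backward_h` (and likewise for the cell states) therefore yields vectors of size 16, which is why the decoder LSTM above must use `latent_dim * 2` units.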