NMT十篇必读论文(九)Sequence to Sequence Learning with Neural Networks

清华大学NLP整理的神经机器翻译reading list中提到了十篇必读论文

https://github.com/THUNLP-MT/MT-Reading-List

 

又回到神经机器翻译,这篇论文present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure。使用了两个LSTM作为encoder和decoder(没用attention),生词用UNK表示,在WMT14的数据集上比传统的基于短语的统计机器翻译提高了近2个bleu值。

 

论文的大体思想和RNN encoder-decoder是一样的,只是用来LSTM来实现。

paper提到三个important point:

1)encoder和decoder的LSTM是两个不同的模型

2)deep LSTM表现比shallow好,选用了4层的LSTM

We found that the LSTM models are fairly easy to train. We used deep LSTMs with 4 layers,with 1000 cells at each layer and 1000 dimensional word embeddings, with an input vocabulary of 160,000 and an output vocabulary of 80,000. We found deep LSTMs to significantly outperform shallow LSTMs, where each additional layer reduced perplexity by nearly 10%, possibly due to their much larger hidden state. We used a naive softmax over 80,000 words at each output. The resulting LSTM has 380M parameters of which 64M are pure recurrent connections (32M for the “encoder” LSTM and 32M for the “decoder” LSTM).

38亿个参数。。。。。。8个GPU跑了10天

3)实践中发现将输入句子reverse后再进行训练效果更好,好了近5个BLEU值(the test BLEU scores of its decoded translations increased from 25.9 to 30.6.)

得出结论:

 

最后一段相当鼓舞人心:

Most importantly, we demonstrated that a simple, straightforward and a relatively unoptimized approach can outperform a mature SMT system, so further work will likely lead to even greater translation accuracies. These results suggest that our approach will likely do well on other challenging sequence to sequence problems.
 

 

参考博客:

https://www.cnblogs.com/duye/p/9433013.html

https://www.codercto.com/a/37657.html

https://blog.csdn.net/u013713117/article/details/54773467

https://blog.csdn.net/MebiuW/article/details/52832847

论文复现:

https://blog.csdn.net/as472780551/article/details/82289944

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
以下是一个使用神经机器翻译(NMT)的英语到印地语的 seq2seq 模型代码示例: ```python from keras.models import Model from keras.layers import Input, LSTM, Dense, Embedding from keras.preprocessing.text import Tokenizer from keras.preprocessing.sequence import pad_sequences from keras.utils import to_categorical import numpy as np # 定义模型输入和输出序列的最大长度 max_encoder_seq_length = 50 max_decoder_seq_length = 50 # 定义输入序列的维度 num_encoder_tokens = ... num_decoder_tokens = ... # 定义LSTM层的维度 latent_dim = 256 # 定义编码器模型 encoder_inputs = Input(shape=(None,)) encoder_embedding = Embedding(num_encoder_tokens, latent_dim) encoder_lstm = LSTM(latent_dim, return_state=True) encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding(encoder_inputs)) encoder_states = [state_h, state_c] # 定义解码器模型 decoder_inputs = Input(shape=(None,)) decoder_embedding = Embedding(num_decoder_tokens, latent_dim) decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True) decoder_outputs, _, _ = decoder_lstm(decoder_embedding(decoder_inputs), initial_state=encoder_states) decoder_dense = Dense(num_decoder_tokens, activation='softmax') decoder_outputs = decoder_dense(decoder_outputs) # 定义整个模型 model = Model([encoder_inputs, decoder_inputs], decoder_outputs) # 编译模型 model.compile(optimizer='rmsprop', loss='categorical_crossentropy') # 训练模型 model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=batch_size, epochs=epochs, validation_split=0.2) # 预测模型 encoder_model = Model(encoder_inputs, encoder_states) decoder_state_input_h = Input(shape=(latent_dim,)) decoder_state_input_c = Input(shape=(latent_dim,)) decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c] decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding(decoder_inputs), initial_state=decoder_states_inputs) decoder_states = [state_h, state_c] decoder_outputs = decoder_dense(decoder_outputs) decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states) # 定义预测函数 def decode_sequence(input_seq): states_value = encoder_model.predict(input_seq) target_seq = np.zeros((1, 1)) target_seq[0, 0] = target_token_index['\t'] stop_condition = False decoded_sentence = '' while not stop_condition: output_tokens, h, c = decoder_model.predict([target_seq] + states_value) sampled_token_index = np.argmax(output_tokens[0, -1, :]) sampled_char = reverse_target_char_index[sampled_token_index] decoded_sentence += sampled_char if (sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length): stop_condition = True target_seq = np.zeros((1, 1)) target_seq[0, 0] = sampled_token_index states_value = [h, c] return decoded_sentence ``` 需要注意的是,NMT 的 seq2seq 模型相对于简单的 seq2seq 模型要复杂得多,需要更多的调整和优化才能在实际任务中获得好的性能。此外,还需要对数据进行预处理,比如分词、标记化等,这些内容在上述代码中并未包含。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值