Paper Notes 09: Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation

Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation

Drawbacks of NMT

(1) Both training and inference are computationally expensive. ----> Solution in the paper: low-precision arithmetic.
(2) Lack of robustness when the input sentence contains rare words. ----> Solution in the paper: split words into a limited set of common sub-word units ("wordpieces") for both input and output (see the sketch after this list).
(3) The model sometimes fails to translate all of the words in the source sentence.
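
As a rough illustration of the wordpiece idea only: the sketch below uses a made-up toy vocabulary and a `##` continuation marker chosen just for this example, not the segmentation algorithm or marker convention actually used in the paper. It shows how a greedy longest-match segmenter splits a word into known subword units.

```python
# Minimal sketch of greedy longest-match subword segmentation.
# The vocabulary and the "##" continuation marker are toy assumptions,
# not the learned wordpiece inventory described in the GNMT paper.
def wordpiece_segment(word, vocab):
    """Split `word` into subword units found in `vocab`, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark word-internal pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["<unk>"]  # cannot segment: fall back to an unknown token
        pieces.append(piece)
        start = end
    return pieces

toy_vocab = {"trans", "##lat", "##ion", "un", "##able"}
print(wordpiece_segment("translation", toy_vocab))  # ['trans', '##lat', '##ion']
```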

GNMT

Model Architecture

[Figure: GNMT model architecture, showing the encoder network, decoder network, and attention network]
The model has three components: an encoder network, a decoder network, and an attention network. The encoder transforms the source sentence into a list of vectors, one vector per input symbol. Given this list of vectors, the decoder produces one symbol at a time, until it emits the special end-of-sentence symbol (EOS). The encoder and decoder are connected through an attention module, which allows the decoder to focus on different regions of the source sentence during decoding.

Encoder side:

$\mathrm{x}_1, \mathrm{x}_2, \ldots, \mathrm{x}_M = EncoderRNN(x_1, x_2, \ldots, x_M)$
where each $\mathrm{x}_i$ is a vector.
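
A minimal PyTorch sketch of such an encoder: a single bidirectional LSTM layer producing one output vector per input symbol. The paper's encoder is much deeper and uses residual connections, so the structure and dimensions here are only illustrative.

```python
import torch
import torch.nn as nn

emb_size, hidden_size, vocab_size = 32, 64, 1000
embed = nn.Embedding(vocab_size, emb_size)
encoder_rnn = nn.LSTM(emb_size, hidden_size, bidirectional=True, batch_first=True)

# A toy batch of 2 source sentences, each with M = 5 symbols.
src = torch.randint(0, vocab_size, (2, 5))
encoder_outputs, _ = encoder_rnn(embed(src))
print(encoder_outputs.shape)  # torch.Size([2, 5, 128]): one vector x_i per input symbol
```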

Decoder side: RNN + softmax. The decoder RNN produces a hidden state for the next symbol to be predicted, which is then passed through a softmax layer to produce a probability distribution over candidate output symbols.
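
A sketch of a single decoder step under these assumptions: an `nn.LSTMCell` followed by a linear projection and a softmax over the target vocabulary. Attention and the source context are omitted here for brevity, and the layer sizes and names are illustrative rather than the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

tgt_vocab_size, emb_size, hidden_size = 1000, 32, 64
decoder_cell = nn.LSTMCell(emb_size, hidden_size)
output_proj = nn.Linear(hidden_size, tgt_vocab_size)

batch = 2
prev_embedded = torch.randn(batch, emb_size)    # embedding of the previously emitted symbol
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)

h, c = decoder_cell(prev_embedded, (h, c))      # new decoder hidden state
probs = F.softmax(output_proj(h), dim=-1)       # distribution over candidate output symbols
print(probs.shape)                              # torch.Size([2, 1000]), each row sums to 1
```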

During inference, we compute the probability of the next symbol given the source-sentence encoding and the target symbols decoded so far:
$P(y_i \mid y_0, y_1, \ldots, y_{i-1}, \mathrm{x}_1, \mathrm{x}_2, \ldots, \mathrm{x}_M)$

The conditional probability of the whole target sequence is then the product of these per-step probabilities:
$P(Y \mid X) = P(Y \mid \mathrm{x}_1, \mathrm{x}_2, \ldots, \mathrm{x}_M) = \prod_{i=1}^{N} P(y_i \mid y_0, y_1, \ldots, y_{i-1}, \mathrm{x}_1, \mathrm{x}_2, \ldots, \mathrm{x}_M)$
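
By the chain rule, the log of this sequence probability is just the sum of the per-step log probabilities. A small sketch, assuming `step_logits` holds the decoder's pre-softmax scores at each target position (already conditioned on the prefix and the source encoding); the function name and shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def sequence_log_prob(step_logits, target):
    """log P(Y | X) = sum_i log P(y_i | y_<i, x_1..x_M).

    step_logits: (tgt_len, vocab_size) pre-softmax scores, one row per position i,
                 each already conditioned on y_0..y_{i-1} and the source encoding.
    target:      (tgt_len,) indices of the reference symbols y_1..y_N.
    """
    log_probs = F.log_softmax(step_logits, dim=-1)             # log P(. | y_<i, x)
    picked = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)
    return picked.sum()                                        # chain rule: product -> sum of logs

# Toy example: 3 target symbols over a 5-word vocabulary.
logits = torch.randn(3, 5)
target = torch.tensor([2, 0, 4])
print(sequence_log_prob(logits, target))
```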

A PyTorch reference implementation

The snippet below defines a named tuple `Hypothesis` with two fields, `value` and `score`, a convenient way to store and manipulate candidate translations in sequence-to-sequence models (for example during beam search). The `NMT` class is a PyTorch module implementing a simple neural machine translation model: a bidirectional LSTM encoder, a unidirectional LSTM decoder, and a global attention mechanism in the style of Luong et al. (2015).

```python
from collections import namedtuple

import torch
import torch.nn as nn
import torch.nn.functional as F

# A hypothesis is a candidate translation (`value`) together with its `score`.
Hypothesis = namedtuple('Hypothesis', ['value', 'score'])


class NMT(nn.Module):
    """Simple seq2seq model: bidirectional LSTM encoder, LSTM-cell decoder,
    and a global (Luong-style) dot-product attention mechanism."""

    def __init__(self, src_vocab_size, tgt_vocab_size, emb_size, hidden_size):
        super(NMT, self).__init__()
        self.src_embed = nn.Embedding(src_vocab_size, emb_size)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, emb_size)
        # batch_first=True so inputs are (batch, seq_len, emb_size).
        self.encoder = nn.LSTM(emb_size, hidden_size, bidirectional=True, batch_first=True)
        # Decoder input = target embedding concatenated with the attention context,
        # which has size 2 * hidden_size because the encoder is bidirectional.
        self.decoder = nn.LSTMCell(emb_size + 2 * hidden_size, hidden_size)
        # Projects encoder outputs into the decoder's hidden space for attention scoring.
        self.attention = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, tgt_vocab_size)
        self.hidden_size = hidden_size

    def forward(self, src, tgt):
        batch_size = src.size(0)
        tgt_len = tgt.size(1)

        # Encode the source sentence: one output vector per source symbol.
        src_embedded = self.src_embed(src)                         # (batch, src_len, emb)
        encoder_outputs, (last_hidden, last_cell) = self.encoder(src_embedded)
        # encoder_outputs: (batch, src_len, 2 * hidden)

        # Initialize the decoder state from the encoder's final forward/backward states.
        decoder_hidden = last_hidden[0] + last_hidden[1]           # (batch, hidden)
        decoder_cell = last_cell[0] + last_cell[1]                 # (batch, hidden)

        # Initial attention context and output buffer.
        context = torch.zeros(batch_size, 2 * self.hidden_size, device=src.device)
        outputs = torch.zeros(batch_size, tgt_len, self.out.out_features, device=src.device)

        # Decode the target sentence one step at a time (teacher forcing).
        for t in range(tgt_len):
            tgt_embedded = self.tgt_embed(tgt[:, t])               # (batch, emb)
            decoder_input = torch.cat([tgt_embedded, context], dim=1)
            decoder_hidden, decoder_cell = self.decoder(
                decoder_input, (decoder_hidden, decoder_cell))

            # Attention: project encoder outputs, score them against the current
            # decoder hidden state, and take a weighted sum as the new context.
            attention_scores = self.attention(encoder_outputs)     # (batch, src_len, hidden)
            attention_weights = F.softmax(
                torch.bmm(attention_scores, decoder_hidden.unsqueeze(2)).squeeze(2), dim=1)
            context = torch.bmm(attention_weights.unsqueeze(1), encoder_outputs).squeeze(1)

            outputs[:, t] = self.out(decoder_hidden)               # (batch, tgt_vocab_size)

        return outputs
```

The `__init__` method initializes the model parameters and layers. It takes four arguments:

- `src_vocab_size`: the size of the source vocabulary
- `tgt_vocab_size`: the size of the target vocabulary
- `emb_size`: the size of the word embeddings
- `hidden_size`: the size of the encoder and decoder hidden states

The model has six main components:

- `src_embed`: an embedding layer for the source sentence
- `tgt_embed`: an embedding layer for the target sentence
- `encoder`: a bidirectional LSTM that encodes the source sentence
- `decoder`: a unidirectional LSTM cell that generates the target sentence
- `attention`: a linear projection used to score encoder outputs against the decoder state
- `out`: a linear layer that maps the decoder state to scores over the target vocabulary

The attention mechanism is implemented in the `forward` method, which takes two arguments:

- `src`: the source sentence tensor of shape `(batch_size, src_len)`
- `tgt`: the target sentence tensor of shape `(batch_size, tgt_len)`

The method first encodes the source sentence with the bidirectional LSTM; the per-symbol outputs and the final hidden and cell states are stored in `encoder_outputs`, `last_hidden`, and `last_cell`, and the decoder state is initialized from those final states.
At each time step, the decoder takes as input the embedded target word concatenated with the context vector, a weighted sum of the encoder outputs given by the attention weights, and the `LSTMCell` updates the decoder hidden and cell states. The attention scores are computed by projecting the encoder outputs through the `attention` layer and taking a dot product with the current decoder hidden state, followed by a softmax; the resulting weights yield the context vector for the next step. Finally, the decoder hidden state is passed through the output layer to produce scores over the target vocabulary at each position. These scores are collected in the `outputs` tensor and returned.
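
A quick smoke test of the class above on random data, continuing from the imports and definitions in the previous block; the vocabulary sizes, batch size, and sequence lengths are arbitrary.

```python
# Instantiate the model and run a forward pass on dummy batches.
model = NMT(src_vocab_size=1000, tgt_vocab_size=1200, emb_size=64, hidden_size=128)

src = torch.randint(0, 1000, (8, 12))   # batch of 8 source sentences, 12 tokens each
tgt = torch.randint(0, 1200, (8, 10))   # corresponding target sentences, 10 tokens each

scores = model(src, tgt)
print(scores.shape)  # torch.Size([8, 10, 1200]): one score per target position and vocab entry
```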
