Introduction to Neural Machine Translation (NMT)

We are all familiar with Google Translate and have probably already used it. But have you ever wondered how it translates almost any known language into the language of our choice?

So in this article, we are going to decode this mystery and learn how to build a language translator using Long Short Term Memory (LSTM). To learn more about LSTM, check out this link.

I have broken this article into two parts. The first part consists of a brief explanation of NMT and the Encoder-Decoder structure. Following this, the second part of the article provides a step-by-step approach to creating a language translator yourself using Python.

So let's get started and understand the core concepts involved.

What is Machine Translation?

[Figure: Google Translator]

Machine translation is a subfield of computational linguistics that is focused on the task of automatically converting source text in one language to text in another language.

In machine translation, the input already consists of a series of symbols in some language, and the computer program must convert this into a series of symbols in a different language.

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the probability of a sequence of words, typically modeling whole sentences in a single integrated model.

With the power of neural networks, Neural Machine Translation (NMT) has emerged as the most powerful algorithm to perform this task. This state-of-the-art algorithm is an application of deep learning in which massive datasets of translated sentences are used to train a model capable of translating between any two languages.

Understanding the Sequence-to-Sequence (Seq2Seq) Architecture

As the name suggests, seq2seq takes a sequence of words (a sentence or sentences) as input and generates an output sequence of words. It does so by using a recurrent neural network (RNN). The idea is to use two RNNs that work together with a special token, trying to predict the next state sequence from the previous sequence.

[Figure: A simple Encoder-Decoder architecture]

It mainly has two components, i.e. the encoder and the decoder, hence it is sometimes called the Encoder-Decoder Network.

Encoder: It uses deep neural network layers and converts the input words to corresponding hidden vectors. Each vector represents the current word and the context of the word.

Decoder: It is similar to the encoder. It takes as input the hidden vector generated by the encoder, its own hidden states, and the current word to produce the next hidden vector and finally predict the next word.

The ultimate goal of any NMT model is to take a sentence in one language as input and return that sentence translated into a different language as output.

The figure below is a naive representation of a translation algorithm trained to translate from Chinese to English.

[Figure: Translation of the sentence "Knowledge is power" from Chinese to English]

How Does It Work?

The first step is to somehow convert our textual data into a numeric form. To do this in machine translation, each word is transformed into a One-Hot Encoding vector which can then be fed into the model. A One-Hot Encoding vector is simply a vector with a 0 at every index except for a 1 at the single index corresponding to that particular word.

[Figure: One-Hot Encoding]

These vectors are created by assigning an index to each unique word in the input language and then repeating this process for the output language. In assigning a unique index to each unique word, we create what is referred to as a Vocabulary for each language. Ideally, the Vocabulary for each language would simply contain every unique word in that language.

[Figure: Each word becomes a vector of length 9 (the size of our vocabulary), consisting entirely of 0s except for a 1 at the word's index]

By creating a vocabulary for both the input and output languages, we can perform this technique on every sentence in each language to completely transform any corpus of translated sentences into a format suitable for the task of machine translation.

Let us now look at the magic behind this Encoder-Decoder algorithm. At the most basic level, the Encoder portion of the model takes a sentence in the input language and creates a thought vector from this sentence. This thought vector stores the meaning of the sentence and is subsequently passed to a Decoder which outputs the translation of the sentence in the output language.

[Figure: Encoder-Decoder structure translating the English sentence "I am a student" to German]

In the case of the Encoder, each word in the input sentence is fed separately into the model in a number of consecutive time-steps. At each time-step, t, the model updates a hidden vector, h, using information from the word inputted to the model at that time-step. This hidden vector works to store information about the inputted sentence. In this way, since no words have yet been inputted to the Encoder at time-step t=0, the hidden state in the Encoder starts out as an empty vector at this time-step. The hidden state is represented with the blue box in the below figure, where the subscript t=0 indicates the time-step and the superscript E corresponds to the fact that it’s a hidden state of the Encoder (rather than a D for the Decoder).

At each time-step, this hidden vector takes in information from the inputted word at that time-step, while preserving the information it has already stored from previous time-steps. Thus, at the final time-step, the meaning of the whole input sentence is stored in the hidden vector. This hidden vector at the final time-step is the thought vector referred to above, which is then inputted into the Decoder.

Also, notice how the final hidden state of the Encoder becomes the thought vector and is relabeled with superscript D at t=0. This is because this final hidden vector of the Encoder becomes the initial hidden vector of the Decoder. In this way, we are passing the encoded meaning of the sentence to the Decoder to be translated to a sentence in the output language. However, unlike the Encoder, we need the Decoder to output a translated sentence of variable length. Thus, we are going to have our Decoder output a prediction word at each time-step until we have outputted a complete sentence.

In order to start this translation, we are going to input a <SOS> tag as the input at the first time-step in the Decoder. Just as in the Encoder, the Decoder will use the <SOS> input at time-step t=1 to update its hidden state. However, rather than just proceeding to the next time-step, the Decoder will use an additional weight matrix to create a probability over all of the words in the output vocabulary. In this way, the word with the highest probability in the output vocabulary will become the first word in the predicted output sentence.

Since the Decoder has to output predicted sentences of variable length, it will continue predicting words in this fashion until it predicts the next word in the sentence to be an <EOS> tag. Once this tag has been predicted, the decoding process is complete and we are left with a complete predicted translation of the input sentence.

Python Implementation of NMT using Keras

Now that we understand the encoder-decoder architecture, let's create a model that will translate English sentences into their French-language counterparts using Keras and Python.

As a first step, we will import the required libraries and configure values for the different parameters that we will be using in the code. Let's first import the required libraries:
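
A minimal set of imports and configuration values for this walkthrough might look like the sketch below; the exact constants (batch size, epochs, LSTM size, and so on) are assumptions you can tune.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumed configuration values used throughout this walkthrough
BATCH_SIZE = 64        # training batch size
EPOCHS = 20            # number of training epochs
LSTM_NODES = 256       # dimensionality of the LSTM hidden state
NUM_SENTENCES = 20000  # number of sentence pairs to load
MAX_NUM_WORDS = 20000  # cap on the tokenizer vocabulary size
EMBEDDING_SIZE = 100   # GloVe embedding dimension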

The Dataset

We need a dataset that contains English sentences and their French translations which can be freely downloaded from this link. Download the file fra-eng.zip and extract it. On each line, the text file contains an English sentence and its French translation, separated by a tab.

Let’s go ahead and split each line into input text and target text.

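The loading and splitting step can be sketched roughly as follows; the 20,000-sentence cap and the fra.txt path are assumptions based on the output shown below.

input_sentences = []
output_sentences = []
output_sentences_inputs = []

count = 0
for line in open('fra.txt', encoding='utf-8'):
    count += 1
    if count > NUM_SENTENCES:
        break
    if '\t' not in line:
        continue
    # English sentence on the left of the tab, French translation on the right
    input_sentence, output = line.rstrip().split('\t')[:2]
    output_sentence = output + ' <eos>'          # decoder target
    output_sentence_input = '<sos> ' + output    # decoder input
    input_sentences.append(input_sentence)
    output_sentences.append(output_sentence)
    output_sentences_inputs.append(output_sentence_input)

print("Number of sample input:", len(input_sentences))
print("Number of sample output:", len(output_sentences))
print("Number of sample output input:", len(output_sentences_inputs))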

Output:
Number of sample input: 20000
Number of sample output: 20000
Number of sample output input: 20000

In the script above we created three lists input_sentences[], output_sentences[], and output_sentences_inputs[]. Next, in the for loop the fra.txt file is read one line at a time. Each line is split into two substrings at the position where the tab occurs. The left substring (the English sentence) is inserted into the input_sentences[] list. The substring to the right of the tab is the corresponding translated French sentence.

Here, the <eos> token, which denotes the end of the sentence, is appended to the translated sentence (as seen in the sample output below). Similarly, the <sos> token, which denotes the start of the sentence, is prepended to the translated sentence that will be fed to the decoder as input.

Let us also print a random sentence from the lists:

print("English sentence: ",input_sentences[180])
print("French translation: ",output_sentences[180])Output:English sentence: Join us.
French translation: Joignez-vous à nous. <eos>

Tokenization and Padding

The next step is tokenizing the original and translated sentences and applying padding to sentences that are longer or shorter than a certain length: for the inputs, this will be the length of the longest input sentence, and for the outputs, the length of the longest output sentence.

But before we do that, let’s visualize the length of the sentences. We will capture the lengths of all the sentences in two separate lists for English and French, respectively.

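A quick way to do this, assuming matplotlib, is to collect the word counts and plot them as histograms:

eng_len = [len(sentence.split()) for sentence in input_sentences]
fra_len = [len(sentence.split()) for sentence in output_sentences]

plt.hist(eng_len, bins=20, alpha=0.5, label='English')
plt.hist(fra_len, bins=20, alpha=0.5, label='French')
plt.xlabel('sentence length (words)')
plt.ylabel('count')
plt.legend()
plt.show()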

[Figure: Histogram of English and French sentence lengths]

The histogram above shows the maximum length of the French sentences is 12 and that of the English sentence is 6.

Next, vectorize our text data by using Keras’s Tokenizer() class. It will turn our sentences into sequences of integers. We can then pad those sequences with zeros to make all the sequences of the same length.

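A sketch of this step for the input (English) side, using the configuration values assumed earlier:

# Tokenize the input (English) sentences
input_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
input_tokenizer.fit_on_texts(input_sentences)
input_integer_seq = input_tokenizer.texts_to_sequences(input_sentences)

word2idx_inputs = input_tokenizer.word_index
print('Total unique words in the input: %s' % len(word2idx_inputs))

max_input_len = max(len(sen) for sen in input_integer_seq)
print("Length of longest sentence in input: %g" % max_input_len)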

The word_index attribute of the Tokenizer class returns a word-to-index dictionary where the keys represent words and the values represent the corresponding integers. Finally, the script above prints the number of unique words in the dictionary and the length of the longest sentence in the input English language.

Output:
Total unique words in the input: 3501
Length of longest sentence in input: 6

Likewise, the output sentences can also be tokenized in the same way:

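For instance, a sketch of the output-side tokenization; the relaxed filters argument, which keeps the < and > characters so the <sos> and <eos> tokens survive, is an assumption about the original setup:

# Tokenize the output (French) sentences; keep '<' and '>' so <sos>/<eos> survive
output_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS,
                             filters='!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n')
output_tokenizer.fit_on_texts(output_sentences + output_sentences_inputs)
output_integer_seq = output_tokenizer.texts_to_sequences(output_sentences)
output_input_integer_seq = output_tokenizer.texts_to_sequences(output_sentences_inputs)

word2idx_outputs = output_tokenizer.word_index
print('Total unique words in the output: %s' % len(word2idx_outputs))

num_words_output = len(word2idx_outputs) + 1
max_out_len = max(len(sen) for sen in output_integer_seq)
print("Length of longest sentence in the output: %g" % max_out_len)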

Output:
Total unique words in the output: 9511
Length of longest sentence in the output: 12

Now the lengths of the longest sentences in both languages can be verified from the histogram above. It can also be concluded that English sentences are normally shorter and contain a smaller number of words on average, compared to the translated French sentences.

Next, we need to pad the input. The reason behind padding the input and the output is that text sentences can be of varying length; however, an LSTM expects input instances of the same length. Therefore, we need to convert our sentences into fixed-length vectors. One way to do this is via padding.
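
A sketch of the padding step, assuming the tokenized sequences from above (pre-padding for the encoder, post-padding for the decoder, as discussed below):

# Pre-pad the encoder inputs, post-pad the decoder inputs and outputs
encoder_input_sequences = pad_sequences(input_integer_seq, maxlen=max_input_len)
decoder_input_sequences = pad_sequences(output_input_integer_seq, maxlen=max_out_len, padding='post')
decoder_output_sequences = pad_sequences(output_integer_seq, maxlen=max_out_len, padding='post')

print("encoder_input_sequences.shape:", encoder_input_sequences.shape)
print("decoder_input_sequences.shape:", decoder_input_sequences.shape)
print("decoder_output_sequences.shape:", decoder_output_sequences.shape)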

encoder_input_sequences.shape: (20000, 6)
decoder_input_sequences.shape: (20000, 12)
decoder_output_sequences.shape: (20000, 12)

Since there are 20,000 sentences in the input (English) and each input sentence is of length 6, the shape of the input is now (20000, 6). Similarly, there are 20,000 sentences in the output (French) and each output sentence is of length 12, so the shape of the output is now (20000, 12).

You may recall that the original sentence at index 180 was Join us. The tokenizer divided the sentence into two words, join and us, converted them to integers, and then applied pre-padding by adding four zeros at the start of the corresponding integer sequence for the sentence at index 180 of the input list.

print("encoder_input_sequences[180]:", encoder_input_sequences[180])Output: 
encoder_input_sequences[180]: [ 0 0 0 0 464 59]

To verify that the integer values for join and us are 464 and 59 respectively, you can pass the words to the word2idx_inputs dictionary, as shown below:

print(word2idx_inputs["join"])
print(word2idx_inputs["us"])Output:
464
59

It is also important to mention that in the case of the decoder, post-padding is applied, which means that zeros are appended at the end of the sentence, whereas in the encoder, zeros were padded at the beginning. The reason behind this approach is that the encoder output is based on the words occurring at the end of the sentence, so the original words are kept at the end of the sentence and zeros are padded at the beginning. In the case of the decoder, on the other hand, processing starts from the beginning of a sentence, and therefore post-padding is performed on the decoder inputs and outputs.

Word Embeddings

We always have to convert our words into their corresponding numeric vector representations before feeding them to any deep learning model, and we have already converted our words into integers. So what's the difference between an integer representation and word embeddings?

There are two main differences between single integer representation and word embeddings. With integer representation, a word is represented only with a single integer. With vector representation, a word is represented by a vector of 50, 100, 200, or whatever dimensions you like. Hence, word embeddings capture a lot more information about words. Secondly, the single-integer representation doesn’t capture the relationships between different words. On the contrary, word embeddings retain relationships between the words.

For English sentences, i.e. the inputs, we will use the GloVe word embeddings. For the translated French sentences in the output, we will use custom word embedding. You can download GloVe word embedding from here.

Let's create word embeddings for the inputs first. To do so, we need to load the GloVe word vectors into memory. We will then create a dictionary where words are the keys and the corresponding vectors are the values.
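
A sketch of loading the GloVe vectors, assuming the 100-dimensional glove.6B.100d.txt file has been downloaded and extracted locally:

embeddings_dictionary = {}
with open('glove.6B.100d.txt', encoding='utf-8') as glove_file:
    for line in glove_file:
        records = line.split()
        word = records[0]
        vector_dimensions = np.asarray(records[1:], dtype='float32')
        embeddings_dictionary[word] = vector_dimensions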

Recall that we have 3501 unique words in the input. We will create a matrix where the row number represents the integer value of the word and the columns correspond to the embedding dimensions. This matrix will contain the word embeddings for the words in our input sentences.
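
A sketch of building that matrix; words missing from GloVe simply keep an all-zero row:

num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_SIZE))
for word, index in word2idx_inputs.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector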

Creating the Model

The first step is to create an Embedding layer for our neural network.

The Embedding layer is defined as the first hidden layer of a network. It must specify 3 arguments (a sketch of the layer follows the list below):

  • input_dim: This is the size of the vocabulary in the text data. For example, if your data is integer encoded to values between 0–10, then the size of the vocabulary would be 11 words.

  • output_dim: This is the size of the vector space in which words will be embedded. It defines the size of the output vectors from this layer for each word. For example, it could be 32 or 100 or even larger. Test different values for your problem.

  • input_length: This is the length of input sequences, as you would define for any input layer of a Keras model. For example, if all of your input documents are comprised of 1000 words, this would be 1000.

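With the GloVe matrix built above, the encoder-side Embedding layer might look like the following sketch; freezing it with trainable=False is an assumption, not something the article prescribes.

embedding_layer = Embedding(
    num_words,                    # input_dim: size of the input vocabulary
    EMBEDDING_SIZE,               # output_dim: dimension of each word vector
    weights=[embedding_matrix],   # initialize with the GloVe vectors
    input_length=max_input_len,   # input_length: longest input sentence
    trainable=False               # assumption: keep pre-trained vectors fixed
)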

The next thing we need to do is to define our outputs, as we know that the output will be a sequence of words. Recall that the total number of unique words in the output is 9511. Therefore, each word in the output can be any of the 9511 words. The length of an output sentence is 12. And for each input sentence, we need a corresponding output sentence. Therefore, the final shape of the output will be:

(number of inputs, length of the output sentence, the number of words in the output)

# shape of the one-hot decoder targets
decoder_targets_one_hot = np.zeros(
    (len(input_sentences), max_out_len, num_words_output),
    dtype='float32'
)

decoder_targets_one_hot.shape

Output:
Shape: (20000, 12, 9512)

To make predictions, the final layer of the model will be a dense layer, therefore we need the outputs in the form of one-hot encoded vectors since we will be using softmax activation function at the dense layer. To create such one-hot encoded output, the next step is to assign 1 to the column number that corresponds to the integer representation of the word.

for i, d in enumerate(decoder_output_sequences):
    for t, word in enumerate(d):
        decoder_targets_one_hot[i, t, word] = 1

The next step is to define the encoder and decoder network.

The input to the encoder will be the sentence in English and the output will be the hidden state and cell state of the LSTM.

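A sketch of the encoder, using the layer sizes assumed earlier:

encoder_inputs = Input(shape=(max_input_len,))
x = embedding_layer(encoder_inputs)
encoder = LSTM(LSTM_NODES, return_state=True)
encoder_outputs, state_h, state_c = encoder(x)
# Only the final hidden and cell states are passed on to the decoder
encoder_states = [state_h, state_c]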

The next step is to define the decoder. The decoder will have two inputs: the hidden state and cell state from the encoder, and the input sentence, which is actually the output sentence with an <sos> token prepended.
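
A corresponding sketch of the decoder and the combined training model; the size of the French-side embedding is an assumption:

decoder_inputs = Input(shape=(max_out_len,))
# The French side uses its own (trainable) embedding rather than GloVe
decoder_embedding = Embedding(num_words_output, LSTM_NODES)
decoder_inputs_x = decoder_embedding(decoder_inputs)

decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_x, initial_state=encoder_states)

# Dense softmax layer predicts a word from the output vocabulary at each time-step
decoder_dense = Dense(num_words_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)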

Training the Model

Let’s compile the model defining the optimizer and our cross-entropy loss.

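A minimal compile call, consistent with the loss and optimizer used later in the article:

model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()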

[Figure: Model summary showing the encoder (lstm_2) and decoder (lstm_3) layers]

No surprises here. lstm_2, our encoder, takes its input from the embedding layer, while the decoder, lstm_3, uses the encoder's internal states as well as its own embedding layer. Our model has around 6,500,000 parameters in total!

It's time to train our model. I would recommend specifying an EarlyStopping() callback to avoid wasting computational resources and to prevent overfitting.
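
A sketch of the training call; the 10% validation split and the patience value are assumptions:

from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)
history = model.fit(
    [encoder_input_sequences, decoder_input_sequences],
    decoder_targets_one_hot,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_split=0.1,
    callbacks=[es],
)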

Save the model weights.

model.save('seq2seq_eng-fra.h5')

Plot the accuracy curve for the train and test data.

#Accuracy
plt.title('model accuracy')
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
[Figure: Training and validation accuracy per epoch]

As we can see, our model achieved a training accuracy of around 87% and a test accuracy of around 77%, which shows that the model is overfitting. We are only training on 20,000 records, so you can add more records and also add a dropout layer to reduce overfitting.

Testing the Machine Translation Model

Let us load the model weights and test our model.

encoder_model = Model(encoder_inputs, encoder_states)

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.load_weights('seq2seq_eng-fra.h5')

Ok, with the weights in place it’s time to test our machine translation model by translating a few test sentences.

The inference mode works a bit differently than the training procedure. The procedure can be broken down into 4 steps:

1. Encode the input sequence, return its internal states.

2. Run the decoder using just the start-of-sequence character as input and the encoder internal states as the decoder’s initial states.

3. Append the word predicted by the decoder (after looking up the predicted token) to the decoded sequence.

4. Repeat the process with the previously predicted word token as input, updating the internal states.

Let's go ahead and implement this. Since we only need the encoder for encoding the input sequence, we'll split the encoder and decoder into two separate models.
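
A sketch of the inference-mode decoder, reusing the trained layers; the encoder_model is the one compiled above, and the decoder model feeds its own states back in at every step:

decoder_state_input_h = Input(shape=(LSTM_NODES,))
decoder_state_input_c = Input(shape=(LSTM_NODES,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

# At inference time the decoder receives a single word per step
decoder_inputs_single = Input(shape=(1,))
decoder_inputs_single_x = decoder_embedding(decoder_inputs_single)

decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs_single_x, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs_single] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)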

We want our output to be a sequence of words in the French language. To do so, we need to convert the integers back to words. We will create new dictionaries for both inputs and outputs, where the keys will be the integers and the corresponding values will be the words.
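
A minimal sketch of these reverse-lookup dictionaries:

idx2word_input = {v: k for k, v in word2idx_inputs.items()}
idx2word_target = {v: k for k, v in word2idx_outputs.items()}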

The method will accept an English sentence as a padded input sequence (in integer form) and will return the translated French sentence.
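
A sketch of such a method, assuming the inference models and dictionaries above; it follows the four inference steps listed earlier:

def translate_sentence(input_seq):
    # 1. Encode the input sequence into the thought vector (h, c)
    states_value = encoder_model.predict(input_seq)

    # 2. Start decoding with the <sos> token
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = word2idx_outputs['<sos>']
    eos = word2idx_outputs['<eos>']

    output_sentence = []
    for _ in range(max_out_len):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        idx = np.argmax(output_tokens[0, 0, :])
        if idx == eos:
            break
        if idx > 0:
            # 3. Append the predicted word to the decoded sequence
            output_sentence.append(idx2word_target[idx])
        # 4. Feed the predicted word and the updated states back in
        target_seq[0, 0] = idx
        states_value = [h, c]

    return ' '.join(output_sentence)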

Predictions

To test the performance we will randomly choose a sentence from the input_sentences list, retrieve the corresponding padded sequence for the sentence, and will pass it to the translate_sentence() method. The method will return the translated sentence.

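A sketch of picking a random sentence and translating it:

i = np.random.choice(len(input_sentences))
input_seq = encoder_input_sequences[i:i+1]
translation = translate_sentence(input_seq)
print('Input:', input_sentences[i])
print('Translation:', translation)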

Results:

[Figure: Sample input sentences and their predicted French translations]

Splendid, isn’t it? Our NMT model has successfully translated so many sentences into French. You can verify that on Google Translate too.

Of course, not all sentences can be translated correctly. To increase the accuracy even further, you can look into the Attention mechanism and embed it in the encoder-decoder structure.

You can download datasets of different languages like German, Hindi, Spanish, Russian, Italian, etc from manythings.org and build NMT models for language translation.

You can find the code in my GitHub repository.

Conclusion

Neural machine translation (NMT) is a fairly advanced application of natural language processing and involves a very complex architecture.

In this article, we saw the capabilities of encoder-decoder models combined with LSTM layers for sequence-to-sequence learning. The encoder is an LSTM that encodes the input sentences, while the decoder decodes the inputs and generates the corresponding outputs.

Well, that's all for this article. I hope you have enjoyed reading it, and I'll be glad if the article is of any help. Feel free to share your comments/thoughts/feedback in the comment section.

Thanks for reading!!!

Translated from: https://medium.com/analytics-vidhya/introduction-to-neural-machine-translation-nmt-c37b264eb9f1
