2021-2-6 Andrew Ng - C5 Sequence Models - W3 Sequence models and attention mechanism (Programming Assignment 1: Neural Machine Translation)

没人不认识我

Article link: https://blog.csdn.net/weixin_42555985/article/details/113715934

Original link
If it does not open, you can also copy the link into https://nbviewer.jupyter.org and open it there.

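Sequence Models and Attention Mechanism: Neural Machine Translation

1. Translating human-readable dates into machine-readable dates
- 1.1 Dataset
2. Neural machine translation with attention
- 2.1 Attention mechanism
3. Visualizing attention (optional)
- 3.1 Getting the activations from the network
4. Full code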

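Welcome to your first programming assignment for this week!
You will build a Neural Machine Translation (NMT) model that translates human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25"). You will do this with an attention model, one of the most sophisticated sequence-to-sequence models.

This notebook was produced together with NVIDIA's Deep Learning Institute.

Let's load all the packages you will need for this assignment.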

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np

from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
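from nmt_utils import *  # course-provided helper module (load_dataset, preprocess_data, ...)
import matplotlib.pyplot as plt
# %matplotlib inline  (uncomment when running inside a Jupyter notebook)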

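1. Translating human-readable dates into machine-readable dates

The model you will build could be used to translate from one language to another, for example from English to Hindi. However, language translation requires massive datasets and usually takes days of training on GPUs. To let you experiment with these models without a massive dataset, we will use a simpler "date translation" task.

The network takes as input dates written in a variety of possible formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987") and translates them into standardized, machine-readable dates (e.g. "1958-08-29", "1968-03-30", "1987-06-24"). The network will learn to output dates in the common machine-readable format YYYY-MM-DD.

1.1 Dataset

We will train the model on a dataset of 10,000 human-readable dates and their equivalent, standardized, machine-readable dates. Run the following code to load the dataset and print some examples.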
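To make the task concrete, here is a minimal standard-library sketch of how (human-readable, machine-readable) pairs like these could be generated. The format list, the 1950-2020 date range and the helper names make_pair/make_dataset are hypothetical; the course's own generator in nmt_utils (which uses Faker and babel) is not reproduced here.

```python
import random
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical human-readable formats (strftime codes), for illustration only.
HUMAN_FORMATS = [
    "%d %B %Y",     # e.g. 29 august 1958 (after lowercasing, C locale)
    "%m/%d/%Y",     # e.g. 03/30/1968
    "%d %b %Y",     # e.g. 24 jun 1987
    "%A %B %d %Y",  # e.g. tuesday june 24 1987
    "%d.%m.%y",     # e.g. 24.06.87
]


def make_pair(d: date, fmt: str) -> Tuple[str, str]:
    """Return (human-readable string, machine-readable YYYY-MM-DD string)."""
    human = d.strftime(fmt).lower()  # month/day names depend on the active locale
    machine = d.isoformat()          # always YYYY-MM-DD
    return human, machine


def make_dataset(m: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Sample m random dates between 1950 and 2020, each in a random format."""
    rng = random.Random(seed)
    start, end = date(1950, 1, 1), date(2020, 12, 31)
    span = (end - start).days
    pairs = []
    for _ in range(m):
        d = start + timedelta(days=rng.randrange(span + 1))
        pairs.append(make_pair(d, rng.choice(HUMAN_FORMATS)))
    return pairs


print(make_pair(date(1968, 3, 30), "%m/%d/%Y"))  # ('03/30/1968', '1968-03-30')
print(make_dataset(3))
```

Whatever the input format, the target is always produced by the same isoformat() call, which is why every target has exactly 10 characters.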

m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)

Output:

100%|█████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 17749.83it/s]

Print the first ten examples:

print(dataset[:10])

Output:

[('9 may 1998', '1998-05-09'), ('10.11.19', '2019-11-10'), ('9/10/70', '1970-09-10'), ('saturday april 28 1990', '1990-04-28'), ('thursday january 26 1995', '1995-01-26'), ('monday march 7 1983', '1983-03-07'), ('sunday may 22 1988', '1988-05-22'), ('08 jul 2008', '2008-07-08'), ('8 sep 1999', '1999-09-08'), ('thursday january 1 1981', '1981-01-01')]

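You have loaded:

dataset: a list of tuples (human-readable date, machine-readable date).
human_vocab: a Python dictionary mapping every character used in the human-readable dates to an integer index.
machine_vocab: a Python dictionary mapping every character used in the machine-readable dates to an integer index. These indices are not necessarily consistent with those of human_vocab.
inv_machine_vocab: the inverse of machine_vocab, mapping indices back to characters.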
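The three dictionaries can be pictured with the small sketch below, built from pairs taken from the printed dataset. build_vocabs is a hypothetical helper, and appending '<unk>' and '<pad>' after the sorted human characters is an assumption; the real load_dataset may build or order them differently.

```python
def build_vocabs(pairs):
    """Build human_vocab, machine_vocab and inv_machine_vocab from (human, machine) pairs."""
    human_chars, machine_chars = set(), set()
    for human, machine in pairs:
        human_chars.update(human)
        machine_chars.update(machine)

    # Assumption: two special tokens are appended after the sorted characters.
    human_tokens = sorted(human_chars) + ["<unk>", "<pad>"]
    human_vocab = {tok: i for i, tok in enumerate(human_tokens)}

    machine_vocab = {ch: i for i, ch in enumerate(sorted(machine_chars))}
    inv_machine_vocab = {i: ch for ch, i in machine_vocab.items()}
    return human_vocab, machine_vocab, inv_machine_vocab


pairs = [("9 may 1998", "1998-05-09"), ("10.11.19", "2019-11-10"),
         ("9/10/70", "1970-09-10"), ("8 sep 1999", "1999-09-08"),
         ("thursday january 26 1995", "1995-01-26"), ("monday march 7 1983", "1983-03-07"),
         ("sunday may 22 1988", "1988-05-22"), ("saturday april 28 1990", "1990-04-28")]
human_vocab, machine_vocab, inv_machine_vocab = build_vocabs(pairs)
print(len(machine_vocab), machine_vocab["-"], machine_vocab["9"])  # 11 0 10
```

With all 11 characters present, sorting puts '-' at index 0 and the digits '0' to '9' at indices 1 to 10, which agrees with the target indices printed for '1998-05-09' in this post.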

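Let's preprocess the data and map the raw text to index values. We will also use Tx=30 (which we assume is the maximum length of a human-readable date; if we get a longer input, we will have to truncate it) and Ty=10 (since "YYYY-MM-DD" is 10 characters long).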

Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)

Output:

X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)
Yoh.shape: (10000, 10, 11)

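You now have:

X: a processed version of the human-readable dates in the training set, where each character is replaced by its index in human_vocab. Each date is further padded to $T_x$ values with a special padding character (<pad>). X.shape = (m, Tx).
Y: a processed version of the machine-readable dates in the training set, where each character is replaced by its index in machine_vocab. Y.shape = (m, Ty).
Xoh: the one-hot version of X, where the index of the "1" entry is the index of the corresponding character in human_vocab. Xoh.shape = (m, Tx, len(human_vocab)).
Yoh: the one-hot version of Y, where the index of the "1" entry is the index of the corresponding character in machine_vocab. Yoh.shape = (m, Ty, len(machine_vocab)). Here, len(machine_vocab) = 11, since there are 11 characters ('-' and the digits 0-9).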
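A minimal NumPy sketch of the preprocessing described above: lowercase, truncate to Tx, map unknown characters to <unk>, right-pad with <pad>, then one-hot encode by indexing rows of an identity matrix. The helper names string_to_ids and one_hot and the toy vocabularies are hypothetical stand-ins for what preprocess_data does, not its actual code.

```python
import numpy as np


def string_to_ids(s, length, vocab):
    """Lowercase, truncate to `length`, map chars to indices, then right-pad with <pad>."""
    s = s.lower()[:length]
    ids = [vocab.get(ch, vocab["<unk>"]) for ch in s]
    ids += [vocab["<pad>"]] * (length - len(ids))
    return ids


def one_hot(indices, depth):
    """Integer array of shape (..., T) -> float array of shape (..., T, depth)."""
    return np.eye(depth)[np.asarray(indices)]


# Toy vocabularies (assumed, for illustration only).
human_chars = sorted(set(" ./0123456789abcdefghijklmnopqrstuvwxyz"))
human_vocab = {ch: i for i, ch in enumerate(human_chars + ["<unk>", "<pad>"])}
machine_vocab = {ch: i for i, ch in enumerate("-0123456789")}

Tx, Ty = 30, 10
X = np.array([string_to_ids("9 may 1998", Tx, human_vocab)])  # (1, Tx)
Y = np.array([[machine_vocab[ch] for ch in "1998-05-09"]])    # (1, Ty), targets are never padded
Xoh = one_hot(X, len(human_vocab))                            # (1, Tx, len(human_vocab))
Yoh = one_hot(Y, len(machine_vocab))                          # (1, Ty, len(machine_vocab))

print(X.shape, Y.shape, Xoh.shape, Yoh.shape)  # (1, 30) (1, 10) (1, 30, 41) (1, 10, 11)
print(Y[0].tolist())                           # [2, 10, 10, 9, 0, 1, 6, 0, 1, 10]
```

Note that the second axis of Yoh is Ty, not Tx: each target date has exactly 10 characters.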

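Let's also look at some preprocessed training examples. Feel free to change index in the code below to navigate the dataset and see how the source/target dates are preprocessed.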

index = 0
print("Source date:", dataset[index][0])
print("Target date:", dataset[index][1])
print()
print("Source after preprocessing (indices):", X[index])
print("Target after preprocessing (indices):", Y[index])
print()
print("Source after preprocessing (one-hot):", Xoh[index])
print("Target after preprocessing (one-hot):", Yoh[index])

Output:

Source date: 9 may 1998
Target date: 1998-05-09

Source after preprocessing (indices): [12  0 24 13 34  0  4 12 12 11 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36]
Target after preprocessing (indices): [ 2 10 10  9  0  1  6  0  1 10]

Source after preprocessing (one-hot): [[0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot): [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
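Going the other way, inv_machine_vocab turns indices (or one-hot rows, via argmax) back into a date string; the same argmax-then-lookup step applies to the model's per-step softmax outputs. The vocabulary ordering below is an assumption that happens to match the target indices printed above, and decode is a hypothetical helper name.

```python
import numpy as np

# Assumed machine vocabulary ordering: '-' -> 0, '0' -> 1, ..., '9' -> 10
machine_vocab = {ch: i for i, ch in enumerate("-0123456789")}
inv_machine_vocab = {i: ch for ch, i in machine_vocab.items()}


def decode(indices, inv_vocab):
    """Map a sequence of integer indices back to a string."""
    return "".join(inv_vocab[int(i)] for i in indices)


y_indices = np.array([2, 10, 10, 9, 0, 1, 6, 0, 1, 10])  # target indices printed above
print(decode(y_indices, inv_machine_vocab))              # 1998-05-09

# From a (Ty, 11) one-hot matrix, argmax over the last axis recovers the indices.
y_onehot = np.eye(len(machine_vocab))[y_indices]
print(decode(y_onehot.argmax(axis=-1), inv_machine_vocab))  # 1998-05-09
```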

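2. Neural machine translation with attention

If you had to translate a paragraph of a book from French to English, you would not read the whole paragraph, then close the book and translate. Even during the translation process, you would read and re-read the parts of the French paragraph that correspond to the part of the English you are currently writing down.

The attention mechanism tells a Neural Machine Translation model where it should pay attention at each step.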

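2.1 Attention mechanism

In this part, you will implement the attention mechanism presented in the lecture videos. Here is a figure to remind you how the model works. The diagram on the left shows the attention model. The diagram on the right shows what one "attention" step does: it calculates the attention variables $\alpha^{\langle t, t' \rangle}$, which are used to compute the context variable $context^{\langle t \rangle}$ for each time step in the output ($t = 1, \ldots, T_y$).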
[Figure: left, the attention model; right, one attention step]
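One attention step can be sketched in NumPy as follows: score each $a^{\langle t' \rangle}$ against the previous post-attention state $s^{\langle t-1 \rangle}$, normalize the scores with a softmax over $t'$ to get $\alpha^{\langle t, t' \rangle}$, and take $context^{\langle t \rangle} = \sum_{t'=1}^{T_x} \alpha^{\langle t, t' \rangle} a^{\langle t' \rangle}$. The scoring network (a tanh hidden layer of size 10 followed by a ReLU scalar), the random weights and the function name one_step_attention_np are assumptions for illustration, not the assignment's Keras code.

```python
import numpy as np


def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def one_step_attention_np(a, s_prev, W1, b1, W2, b2):
    """
    a:      (Tx, 2*n_a) pre-attention Bi-LSTM activations a^<t'>
    s_prev: (n_s,)      previous post-attention LSTM state s^<t-1>
    Returns context^<t> of shape (2*n_a,) and alphas^<t,t'> of shape (Tx,).
    """
    Tx = a.shape[0]
    s_rep = np.repeat(s_prev[None, :], Tx, axis=0)  # (Tx, n_s): s^<t-1> copied for every t'
    concat = np.concatenate([a, s_rep], axis=1)     # (Tx, 2*n_a + n_s)
    hidden = np.tanh(concat @ W1 + b1)              # (Tx, n_hidden)
    energies = np.maximum(hidden @ W2 + b2, 0.0)    # (Tx, 1), ReLU
    alphas = softmax(energies, axis=0)[:, 0]        # normalize over the Tx input positions
    context = alphas @ a                            # weighted sum of the a^<t'>
    return context, alphas


rng = np.random.default_rng(0)
Tx, n_a, n_s, n_hidden = 30, 32, 64, 10
a = rng.standard_normal((Tx, 2 * n_a))
s_prev = rng.standard_normal(n_s)
W1 = rng.standard_normal((2 * n_a + n_s, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, 1)) * 0.1
b2 = np.zeros(1)

context, alphas = one_step_attention_np(a, s_prev, W1, b1, W2, b2)
print(context.shape, alphas.shape, round(float(alphas.sum()), 6))  # (64,) (30,) 1.0
```

Calling this once per output step, each time with the latest post-attention state, yields a different set of weights $\alpha^{\langle t, t' \rangle}$ for every output character.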
Here are some properties of the model that you may notice:

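There are two separate LSTMs in this model (see the diagram on the left). The one at the bottom of the figure, before the attention step, is a bi-directional LSTM, so we call it the pre-attention Bi-LSTM. The LSTM at the top of the figure comes after the attention step, so we call it the post-attention LSTM.
- The pre-attention Bi-LSTM goes through $T_x$ time steps;
- the post-attention LSTM goes through $T_y$ time steps.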
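To see how the pre-attention Bi-LSTM produces one activation per input step, here is a NumPy sketch in which a plain tanh RNN stands in for each LSTM direction (a simplification swapped in to keep the code short). One pass reads the input forward, another reads it reversed, and the two activation sequences are concatenated per time step into an array of shape (Tx, 2*n_a). The sizes, weights and function names are illustrative assumptions.

```python
import numpy as np


def simple_rnn(x, Wx, Wa, b):
    """Plain tanh RNN (stand-in for an LSTM): x of shape (T, n_x) -> activations (T, n_a)."""
    a = np.zeros(Wa.shape[0])
    out = np.zeros((x.shape[0], Wa.shape[0]))
    for t in range(x.shape[0]):
        a = np.tanh(x[t] @ Wx + a @ Wa + b)
        out[t] = a
    return out


def bidirectional(x, fwd, bwd):
    a_fwd = simple_rnn(x, *fwd)              # reads t' = 1 .. Tx
    a_bwd = simple_rnn(x[::-1], *bwd)[::-1]  # reads t' = Tx .. 1, then re-aligned to t'
    return np.concatenate([a_fwd, a_bwd], axis=1)  # row t' = [forward a^<t'> ; backward a^<t'>]


rng = np.random.default_rng(0)
Tx, n_x, n_a = 30, 37, 32  # 37 = len(human_vocab), as in the Xoh.shape printed above


def init():
    """Random (Wx, Wa, b) for one direction."""
    return (rng.standard_normal((n_x, n_a)) * 0.1,
            rng.standard_normal((n_a, n_a)) * 0.1,
            np.zeros(n_a))


x = np.eye(n_x)[rng.integers(0, n_x, size=Tx)]  # a random one-hot input sequence (Tx, n_x)
fwd, bwd = init(), init()
a = bidirectional(x, fwd, bwd)
print(a.shape)  # (30, 64)
```

The forward half of row t' has only seen inputs 1..t', while the backward half has only seen t'..Tx, so each row summarizes context from both sides of character t'.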
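The post-attention LSTM passes $s^{\langle t \rangle}, c^{\langle t \rangle}$ from one time step to the next. In the lecture videos, we used only a basic RNN for the post-attention sequence model, so the state was captured by the RNN output activations $s^{\langle t\rangle}$. Since we are using an LSTM here, it has both the output activation $s^{\langle t\rangle}$ and the hidden cell state $c^{\langle t\rangle}$. However, unlike previous text-generation examples (such as Dinosaurus Island in Week 1), in this model the post-attention LSTM at time $t$ does not take the previously generated $y^{\langle t-1\rangle}$ as input; its only inputs are the context vector $context^{\langle t\rangle}$ and the previous states $s^{\langle t-1\rangle}$ and $c^{\langle t-1\rangle}$. We designed the model this way because, unlike language generation where adjacent characters are highly correlated, there is no such strong dependency between the previous character and the next character in a YYYY-MM-DD date.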
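The loop below sketches the post-attention side in NumPy to make this concrete: at each output step it computes a context vector from the previous state $s^{\langle t-1\rangle}$, feeds only that context (never $y^{\langle t-1\rangle}$) into an LSTM cell together with $s^{\langle t-1\rangle}, c^{\langle t-1\rangle}$, and applies a softmax over the 11 machine_vocab characters. The weight shapes, the compact attention scorer and all function names are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def lstm_cell(x, s_prev, c_prev, W, b):
    """One LSTM step; W has shape (n_x + n_s, 4 * n_s). Returns (s, c)."""
    n_s = s_prev.shape[0]
    z = np.concatenate([x, s_prev]) @ W + b
    f = sigmoid(z[:n_s])             # forget gate
    i = sigmoid(z[n_s:2 * n_s])      # update (input) gate
    o = sigmoid(z[2 * n_s:3 * n_s])  # output gate
    g = np.tanh(z[3 * n_s:])         # candidate cell value
    c = f * c_prev + i * g
    s = o * np.tanh(c)
    return s, c


def attend(a, s_prev, W1, W2):
    """Concat-style scoring of every a^<t'> against s^<t-1>; returns context^<t>."""
    s_rep = np.repeat(s_prev[None, :], a.shape[0], axis=0)
    hidden = np.tanh(np.concatenate([a, s_rep], axis=1) @ W1)
    energies = np.maximum(hidden @ W2, 0.0)[:, 0]
    return softmax(energies) @ a


def decode(a, p, Ty=10):
    n_s = p["W_out"].shape[0]
    s, c = np.zeros(n_s), np.zeros(n_s)  # s^<0>, c^<0>
    outputs = []
    for _ in range(Ty):
        context = attend(a, s, p["W1"], p["W2"])                   # uses s^<t-1>
        s, c = lstm_cell(context, s, c, p["W_lstm"], p["b_lstm"])  # input: context only
        outputs.append(softmax(s @ p["W_out"]))                    # distribution over machine_vocab
    return np.stack(outputs)                                       # (Ty, n_y)


rng = np.random.default_rng(0)
Tx, n_a, n_s, n_hidden, n_y = 30, 32, 64, 10, 11
p = {
    "W1": rng.standard_normal((2 * n_a + n_s, n_hidden)) * 0.1,
    "W2": rng.standard_normal((n_hidden, 1)) * 0.1,
    "W_lstm": rng.standard_normal((2 * n_a + n_s, 4 * n_s)) * 0.1,
    "b_lstm": np.zeros(4 * n_s),
    "W_out": rng.standard_normal((n_s, n_y)) * 0.1,
}
a = rng.standard_normal((Tx, 2 * n_a))  # stand-in for the Bi-LSTM activations
probs = decode(a, p)
print(probs.shape)  # (10, 11)
```

Because $y^{\langle t-1\rangle}$ never enters the loop, each output character depends on the input only through the attention contexts and the carried states $s, c$.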
We use $a^{\langle t\rangle}=[\overrightarrow{a}^{\langle t\rangle}$