L5W2 Assignment 1: Emojify!

Welcome to the second assignment of Week 2. This time you will use word vector representations to build an Emojifier.

Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that. So rather than writing "Congratulations on the promotion! Let's get coffee and talk. Love you!", the emojifier can automatically turn this into "Congratulations on the promotion 👍! Let's get coffee and talk ☕️. Love you! ❤️"

You will implement a model that takes a sentence as input (such as "Let's go see the baseball game tonight!") and finds the most appropriate emoji to use with it (⚾️). In many emoji interfaces, you need to remember that ❤️ is the "heart" symbol rather than the "love" symbol. But using word vectors, you'll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate words in the test set to that same emoji, even if those words never appear in the training set. This allows you to build an accurate classifier mapping from sentences to emojis, even with a small training set.

In this exercise, you'll start with a baseline model (Emojifier-V1) that uses word embeddings, and then build a more sophisticated model (Emojifier-V2) that incorporates an LSTM.

Let's get started! Run the following cell to load the packages you are going to use.


In [1]:

cd /home/kesci/input/deeplearning157914
/home/kesci/input/deeplearning157914

In [3]:

!pip install emoji
# only about 51 kB, so it downloads quickly; no need to set up a mirror
Collecting emoji
  Downloading https://files.pythonhosted.org/packages/40/8d/521be7f0091fe0f2ae690cc044faf43e3445e0ff33c574eae752dd7e39fa/emoji-0.5.4.tar.gz (43kB)
    100% |████████████████████████████████| 51kB 66kB/s ta 0:00:0111
Building wheels for collected packages: emoji
  Running setup.py bdist_wheel for emoji ... done
  Stored in directory: /home/kesci/.cache/pip/wheels/2a/a9/0a/4f8e8cce8074232aba240caca3fade315bb49fac68808d1a9c
Successfully built emoji
Installing collected packages: emoji
Successfully installed emoji-0.5.4
You are using pip version 9.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

In [4]:

import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt

%matplotlib inline

1 Baseline model: Emojifier-V1

1.1 The EMOJISET dataset

Let's start by building a simple baseline classifier.

You have a tiny dataset (X, Y) where:

  • X contains 127 sentences (strings)
  • Y contains an integer label between 0 and 4 corresponding to the emoji for each sentence

(figure image omitted)

Figure 1: EMOJISET, a 5-class classification problem. A few example sentences are shown here.

Let's load the dataset using the code below. We split the dataset between a training set (127 examples) and a test set (56 examples).

In [5]:

X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/tesss.csv')

In [6]:

# maxLen = number of words in the longest training sentence (used later for padding)
maxLen = len(max(X_train, key=len).split())

Run the following cell to print sentences from X_train and the corresponding labels from Y_train. Change index to see different examples. Because of the font the iPython notebook uses, the heart emoji may be colored black rather than red.

In [7]:

index = 1
print(X_train[index], label_to_emoji(Y_train[index]))
I am proud of your achievements 😄

1.2 Overview of Emojifier-V1

In this part, you are going to implement a baseline model called "Emojifier-v1".

(figure image omitted)

Figure 2: Baseline model (Emojifier-V1).

The input of the model is a string corresponding to a sentence (e.g., "I love you"). In the code, the output will be a probability vector of shape (1, 5), which is then passed into an argmax layer to extract the index of the most likely emoji.

To get our labels into a format suitable for training a softmax classifier, let's convert Y from its current shape (m, 1) into a "one-hot representation" of shape (m, 5), where each row is a one-hot vector giving the label of one example. You can do so with the next code snippet. Here, Y_oh stands for "Y-one-hot" in the variable names Y_oh_train and Y_oh_test:

In [8]:

Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)

Let's see what convert_to_one_hot() did. Feel free to change index to print out different values.

In [9]:

index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])
0 is converted into one hot [1. 0. 0. 0. 0.]
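If you are curious how such a helper can be written, here is a minimal sketch of the idea (the actual convert_to_one_hot provided in emo_utils may differ in its details): indexing the rows of an identity matrix with the integer labels picks out the corresponding one-hot vectors.

import numpy as np

def convert_to_one_hot_sketch(Y, C):
    # Turns integer labels of shape (m,) into one-hot rows of shape (m, C).
    # np.eye(C) is the C x C identity matrix; indexing its rows with the label
    # values selects the matching one-hot vectors.
    return np.eye(C)[Y.reshape(-1)]

print(convert_to_one_hot_sketch(np.array([0, 3]), C=5))
# [[1. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0.]]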

All the data is now ready to be fed into the Emojify-V1 model. Let's implement the model!

1.3 Implementing Emojifier-V1

As shown in Figure 2, the first step is to convert an input sentence into its word-vector representation and then average the vectors together. As in the previous exercise, we will use pretrained 50-dimensional GloVe embeddings. Run the following cell to load word_to_vec_map, which contains all the vector representations.

In [10]:

word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

You have loaded:

  • word_to_index: a dictionary mapping words to their indices in the vocabulary (400,001 words, with valid indices ranging from 0 to 400,000)
  • index_to_word: a dictionary mapping indices back to their corresponding words in the vocabulary
  • word_to_vec_map: a dictionary mapping words to their GloVe vector representations.

Run the following cell to check that it works.

In [11]:

word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])
the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos

Exercise: Implement sentence_to_avg(). You will need to carry out two steps:

  1. Convert every sentence to lowercase, then split the sentence into a list of words. X.lower() and X.split() might be useful.
  2. For each word in the sentence, access its GloVe representation. Then average all of these values.

In [12]:

# GRADED FUNCTION: sentence_to_avg

def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.
    
    Arguments:
    sentence -- string, one training example from X
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    
    Returns:
    avg -- average vector encoding information about the sentence, numpy-array of shape (50,)
    """
    
    ### START CODE HERE ###
    # Step 1: Split sentence into list of lower case words (≈ 1 line)
    words = (sentence.lower()).split() 

    # Initialize the average word vector, should have the same shape as your word vectors.
    avg = np.zeros(50)
    
    # Step 2: average the word vectors. You can loop over the words in the list "words".
    for w in words:
        avg +=  word_to_vec_map[w]
    avg = avg/len(words)
    
    ### END CODE HERE ###
    
    return avg

In [13]:

avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = ", avg)
avg =  [-0.008005    0.56370833 -0.50427333  0.258865    0.55131103  0.03104983
 -0.21013718  0.16893933 -0.09590267  0.141784   -0.15708967  0.18525867
  0.6495785   0.38371117  0.21102167  0.11301667  0.02613967  0.26037767
  0.05820667 -0.01578167 -0.12078833 -0.02471267  0.4128455   0.5152061
  0.38756167 -0.898661   -0.535145    0.33501167  0.68806933 -0.2156265
  1.797155    0.10476933 -0.36775333  0.750785    0.10282583  0.348925
 -0.27262833  0.66768    -0.10706167 -0.283635    0.59580117  0.28747333
 -0.3366635   0.23393817  0.34349183  0.178405    0.1166155  -0.076433
  0.1445417   0.09808667]

Expected output:
avg = [-0.008005 0.56370833 -0.50427333 0.258865 0.55131103 0.03104983
-0.21013718 0.16893933 -0.09590267 0.141784 -0.15708967 0.18525867
0.6495785 0.38371117 0.21102167 0.11301667 0.02613967 0.26037767
0.05820667 -0.01578167 -0.12078833 -0.02471267 0.4128455 0.5152061
0.38756167 -0.898661 -0.535145 0.33501167 0.68806933 -0.2156265
1.797155 0.10476933 -0.36775333 0.750785 0.10282583 0.348925
-0.27262833 0.66768 -0.10706167 -0.283635 0.59580117 0.28747333
-0.3366635 0.23393817 0.34349183 0.178405 0.1166155 -0.076433
0.1445417 0.09808667]

Model

You now have all the pieces to finish implementing the model() function. After using sentence_to_avg(), you need to pass the average through forward propagation, compute the cost, and then backpropagate to update the softmax parameters.

Exercise: Implement the model() function described in Figure 2. Assuming Yoh ("Y one-hot") is the one-hot encoding of the output labels, the equations you need to implement in the forward pass and to compute the cross-entropy cost are:
$$z^{(i)} = W \cdot avg^{(i)} + b$$

$$a^{(i)} = softmax(z^{(i)})$$

$$\mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Yoh^{(i)}_k \, \log(a^{(i)}_k)$$

It is possible to come up with a more efficient, vectorized implementation. But since we are using a for-loop to convert the sentences one at a time into their $avg^{(i)}$ representation anyway, let's not bother this time.

We provide you with a function softmax().
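For reference, the gradients of this cost with respect to the softmax parameters (they are given to you in the code below and are not graded) take the standard softmax-regression form:

$$\frac{\partial \mathcal{L}^{(i)}}{\partial z^{(i)}} = a^{(i)} - Yoh^{(i)}, \qquad \frac{\partial \mathcal{L}^{(i)}}{\partial W} = \left(a^{(i)} - Yoh^{(i)}\right)\,{avg^{(i)}}^{\top}, \qquad \frac{\partial \mathcal{L}^{(i)}}{\partial b} = a^{(i)} - Yoh^{(i)}$$

These are exactly the dz, dW, and db computed in model() below.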

In [14]:

# GRADED FUNCTION: model

def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.
    
    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 4, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    
    np.random.seed(1)

    # Define number of training examples
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 
    
    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    
    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    
    # Optimization loop
    for t in range(num_iterations):                       # Loop over the number of iterations
        for i in range(m):                                # Loop over the training examples
            
            ### START CODE HERE ### (≈ 4 lines of code)
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i],word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W,avg) + b
            a = softmax(z)

            # Compute cost using the i'th training label's one hot representation and "A" (the output of the softmax)
            cost = -np.sum(Y_oh[i]*np.log(a))
            ### END CODE HERE ###
            
            # Compute gradients 
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
        
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b

In [15]:

print(X_train.shape)
print(Y_train.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(X_train[0])
print(type(X_train))
Y = np.asarray([5,0,0,5, 4, 4, 4, 6, 6, 4, 1, 1, 5, 6, 6, 3, 6, 3, 4, 4])
print(Y.shape)

X = np.asarray(['I am going to the bar tonight', 'I love you', 'miss you my dear',
 'Lets go party and drinks','Congrats on the new job','Congratulations',
 'I am so happy for you', 'Why are you feeling bad', 'What is wrong with you',
 'You totally deserve this prize', 'Let us go play football',
 'Are you down for football this afternoon', 'Work hard play harder',
 'It is suprising how people can be dumb sometimes',
 'I am very disappointed','It is the best day in my life',
 'I think I will end up alone','My life is so boring','Good job',
 'Great so awesome'])

print(X.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(type(X_train))
(132,)
(132,)
(132, 5)
never talk to me again
<class 'numpy.ndarray'>
(20,)
(20,)
(132, 5)
<class 'numpy.ndarray'>

Run the next cell to train your model and learn the softmax parameters (W, b).

In [17]:

pred, W, b = model(X_train, Y_train, word_to_vec_map)
print(pred.T)
Epoch: 0 --- cost = 1.9520498812810076
Accuracy: 0.3484848484848485
Epoch: 100 --- cost = 0.07971818726014794
Accuracy: 0.9318181818181818
Epoch: 200 --- cost = 0.04456369243681402
Accuracy: 0.9545454545454546
Epoch: 300 --- cost = 0.03432267378786059
Accuracy: 0.9696969696969697
[[3. 2. 3. 0. 4. 0. 3. 2. 3. 1. 3. 3. 1. 3. 2. 3. 2. 3. 1. 2. 3. 0. 2. 2.
  2. 1. 4. 3. 3. 4. 0. 3. 4. 2. 0. 3. 2. 2. 3. 4. 2. 2. 0. 2. 3. 0. 3. 2.
  4. 3. 0. 3. 3. 3. 4. 2. 1. 1. 1. 2. 3. 1. 0. 0. 0. 3. 4. 4. 2. 2. 1. 2.
  0. 3. 2. 2. 0. 3. 3. 1. 2. 1. 2. 2. 4. 3. 3. 2. 4. 0. 0. 3. 3. 3. 3. 2.
  0. 1. 2. 3. 0. 2. 2. 2. 3. 2. 2. 2. 4. 1. 1. 3. 3. 4. 1. 2. 1. 1. 3. 1.
  0. 4. 0. 3. 3. 4. 4. 1. 4. 3. 0. 2.]]

Expected output (on a subset of iterations):
Epoch: 0 — cost = 1.9520498812810076
Accuracy: 0.3484848484848485
Epoch: 100 — cost = 0.07971818726014794
Accuracy: 0.9318181818181818
Epoch: 200 — cost = 0.04456369243681402
Accuracy: 0.9545454545454546
Epoch: 300 — cost = 0.03432267378786059
Accuracy: 0.9696969696969697

Great! Your model has pretty high accuracy on the training set. Let's now see how it does on the test set.

1.4 Examining test set performance

In [18]:

print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
Training set:
Accuracy: 0.9772727272727273
Test set:
Accuracy: 0.8571428571428571

Expected output:
Training set:
Accuracy: 0.9772727272727273
Test set:
Accuracy: 0.8571428571428571

Random guessing would get 20% accuracy, given that there are 5 classes. This is pretty good performance after training on only 127 examples.

In the training set, the algorithm saw the sentence "I love you" with the label ❤️. You can check, however, that the word "adore" does not appear in the training set. Nonetheless, let's see what happens if you write "I adore you."

In [19]:

X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])

pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
Accuracy: 0.8333333333333334

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄

Amazing! Because adore has an embedding similar to love, the algorithm generalizes correctly even to a word it has never seen before. Words such as heart, dear, beloved, and adore have embedding vectors similar to love, so it works for them too. Feel free to modify the inputs above and try out a variety of sentences. How well does it do?
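You can check this intuition directly with the GloVe vectors you loaded. Here is a small sketch (the cosine_similarity helper below is written inline just for this illustration; it is not part of the graded code):

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors; values near 1 mean the
    # words have similar embeddings.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

for w in ["adore", "beloved", "heart", "baseball"]:
    print(w, cosine_similarity(word_to_vec_map["love"], word_to_vec_map[w]))
# "adore" and "beloved" should score noticeably higher against "love" than "baseball" does.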

Note though that it doesn't get "not feeling happy" right. This algorithm ignores word ordering, so it is not good at understanding phrases like "not happy".

Printing the confusion matrix can also help you understand which classes are harder for your model. A confusion matrix shows how often an example whose label is one class ("actual" class) is mislabeled by the algorithm as a different class ("predicted" class).

In [20]:

print(Y_test.shape)
print('           '+ label_to_emoji(0)+ '    ' + label_to_emoji(1) + '    ' +  label_to_emoji(2)+ '    ' + label_to_emoji(3)+'   ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)
(56,)
           ❤️    ⚾    😄    😞   🍴
Predicted  0.0  1.0  2.0  3.0  4.0  All
Actual                                 
0            6    0    0    1    0    7
1            0    8    0    0    0    8
2            2    0   16    0    0   18
3            1    1    2   12    0   16
4            0    0    1    0    6    7
All          9    9   19   13    6   56

(confusion matrix heatmap omitted)

What you should remember from this part:

  • Even with just 127 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power word vectors give you.
  • Emojify-V1 will perform poorly on sentences such as "This movie is not good and not enjoyable" because it doesn't understand combinations of words; it just averages all the words' embedding vectors together without paying attention to word ordering. You will build a better algorithm in the next part.

2 Emojifier-V2: Using LSTMs in Keras

Let's build an LSTM model that takes word sequences as input. This model will be able to take word ordering into account. Emojifier-V2 will continue to use pretrained word embeddings to represent words, but will feed them into an LSTM, whose job it is to predict the most appropriate emoji.

Run the following cell to load the Keras packages.

In [21]:

import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)
Using TensorFlow backend.
/opt/conda/lib/python3.6/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.7) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)

2.1 Overview of the model

Here is the Emojifier-v2 you will implement:
(figure image omitted)

Figure 3: Emojifier-V2, a 2-layer LSTM sequence classifier.

2.2 Keras and mini-batching

In this exercise, we want to train Keras using mini-batches. However, most deep learning frameworks require that all sequences in the same mini-batch have the same length. This is what allows vectorization to work: if you had a 3-word sentence and a 4-word sentence, the computations they need are different (one takes 3 steps of an LSTM, the other takes 4 steps), so it's just not possible to do them both at the same time.

The common solution to this is to use padding. Specifically, set a maximum sequence length and pad all sequences to that same length. For example, with a maximum sequence length of 20, we can pad every sentence with "0"s so that each input sentence is of length 20. Thus, the sentence "i love you" would be represented as $(e_{i}, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})$. In this example, any sentence longer than 20 words would have to be truncated. One simple way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set.
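As a small illustration of this idea (not part of the graded code), here is what zero-padding a batch of word-index lists could look like. Keras also ships a pad_sequences helper in keras.preprocessing.sequence (imported in the cell above); the graded sentences_to_indices() function below performs the same padding manually.

# Two sentences already converted to word-index lists (indices taken from the
# sentences_to_indices() example further below):
sents = [[155345, 225122],                              # "funny lol"
         [151204, 192973, 302254, 151349, 394475]]      # "food is ready for you"

padded = sequence.pad_sequences(sents, maxlen=5, padding='post', value=0)
print(padded)
# [[155345 225122      0      0      0]
#  [151204 192973 302254 151349 394475]]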

2.3 The Embedding layer

In Keras, the embedding matrix is represented as a "layer" that maps positive integers (indices corresponding to words) into dense vectors of fixed size (the embedding vectors). It can be trained or initialized with a pretrained embedding. In this part, you will learn how to create an Embedding() layer in Keras and initialize it with the GloVe 50-dimensional vectors loaded earlier in the notebook. Because our training set is quite small, we will not update the word embeddings but will instead keep their values fixed. But in the code below, we'll show you how Keras allows you to either train or fix this layer.

The Embedding() layer takes an integer matrix of size (batch size, max input length) as input. This corresponds to sentences converted into lists of indices, as shown in the figure below.

(figure image omitted)

Figure 4: Embedding layer. This example shows the propagation of two examples through the embedding layer. Both have been zero-padded to a length of max_len=5. The final dimension of the representation is (2, max_len, 50) because the word embeddings we are using are 50-dimensional.

The largest integer (i.e., word index) in the input should be no larger than the vocabulary size. The layer outputs an array of shape (batch size, max input length, dimension of word vectors).
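As a quick toy sanity check of those shapes (the vocabulary size of 10 and embedding dimension of 3 here are made up purely for illustration):

import numpy as np
from keras.models import Model
from keras.layers import Input
from keras.layers.embeddings import Embedding

toy_indices = Input(shape=(5,), dtype='int32')                        # sentences padded to length 5
toy_embeddings = Embedding(input_dim=10, output_dim=3)(toy_indices)   # random, untrained weights
toy_model = Model(toy_indices, toy_embeddings)

batch = np.array([[1, 2, 3, 0, 0],
                  [4, 5, 6, 7, 0]])                                   # shape (2, 5), every index < 10
print(toy_model.predict(batch).shape)                                 # (2, 5, 3)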

The first step is to convert all your training sentences into lists of indices, and then zero-pad all of these lists so that their length is the length of the longest sentence.

Exercise: Implement the function below to convert X (an array of sentences as strings) into an array of indices corresponding to the words in the sentences. The output shape should be such that it can be given to Embedding() (as described in Figure 4).

In [22]:

# GRADED FUNCTION: sentences_to_indices

def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4). 
    
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary containing each word mapped to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    
    m = X.shape[0]                                   # number of training examples
    
    ### START CODE HERE ###
    # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
    X_indices = np.zeros((m,max_len))
    
    for i in range(m):                               # loop over training examples
        
        # Convert the ith training sentence to lower case and split it into words. You should get a list of words.
        sentence_words =(X[i].lower()).split() 
        
        # Initialize j to 0
        j = 0
        
        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i, j] = word_to_index[w]
            # Increment j to j + 1
            j = j+1
            
    ### END CODE HERE ###
    
    return X_indices

Run the following cell to check what sentences_to_indices() does, and look at the results.

In [23]:

X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1,word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =", X1_indices)
X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices = [[155345. 225122.      0.      0.      0.]
 [220930. 286375.  69714.      0.      0.]
 [151204. 192973. 302254. 151349. 394475.]]

Expected output:
X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices = [[155345. 225122. 0. 0. 0.]
[220930. 286375. 69714. 0. 0.]
[151204. 192973. 302254. 151349. 394475.]]

Let's build the Embedding() layer in Keras, using pretrained word vectors. After this layer is built, you will pass the output of sentences_to_indices() to it as an input, and the Embedding() layer will return the word embeddings for the sentence.

Exercise: Implement pretrained_embedding_layer(). You will need to carry out the following steps:

  • Initialize the embedding matrix as a numpy array of zeros with the correct shape.
  • Fill in the embedding matrix with all the word embeddings extracted from word_to_vec_map.
  • Define the Keras embedding layer. Use Embedding(). Make this layer non-trainable by setting trainable = False when calling Embedding(). If you were to set trainable = True, the optimization algorithm would be allowed to modify the values of the word embeddings.
  • Set the embedding weights to be equal to the embedding matrix.

In [24]:

# GRADED FUNCTION: pretrained_embedding_layer

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
    
    ### START CODE HERE ###
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
    emb_matrix = np.zeros((vocab_len,emb_dim))
    
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define Keras embedding layer with the correct input/output sizes and make it non-trainable: use Embedding(...) and set trainable=False.
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    ### END CODE HERE ###

    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
    
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    
    return embedding_layer

In [25]:

embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
weights[0][1][3] = -0.3403

Expected output:
weights[0][1][3] = -0.3403

2.4 Building Emojifier-V2

Let's now build the Emojifier-V2 model. You will do so using the embedding layer you have built, and feed its output to an LSTM network.

(figure image omitted)

Figure 3: Emojifier-V2, a 2-layer LSTM sequence classifier.

Exercise: Implement Emojify_V2(), which builds a Keras graph of the architecture shown in Figure 3. The model takes as input an array of sentences of shape (m, max_len, ) defined by input_shape. It should output a softmax probability vector of shape (m, C = 5). You may need Input(shape = ..., dtype = '...'), LSTM(), Dropout(), Dense(), and Activation().

In [26]:

# GRADED FUNCTION: Emojify_V2

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape,dtype='int32')
    
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)  
    
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X trough another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    
    # Create Model instance which converts sentence_indices into X.
    model = Model(sentence_indices,X)
    
    ### END CODE HERE ###
    
    return model

Run the following cell to create your model and check its summary. Because all sentences in the dataset are shorter than 10 words, we chose max_len = 10. You should see that your architecture uses 20,223,927 parameters, of which 20,000,050 (the word embeddings) are non-trainable, with the remaining 223,877 being trainable. Because our vocabulary has 400,001 words (with valid indices from 0 to 400,000), there are 400,001 * 50 = 20,000,050 non-trainable parameters.
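If you want to verify those numbers by hand: a Keras LSTM layer with n_a units and an n_x-dimensional input has 4 * (n_a * (n_a + n_x) + n_a) parameters (four gates, each with weights for the input, weights for the previous hidden state, and a bias). A quick check:

embedding_params = 400001 * 50                      # vocab_len * emb_dim, frozen
lstm1_params = 4 * (128 * (128 + 50) + 128)         # first LSTM: 50-dim input, 128 units
lstm2_params = 4 * (128 * (128 + 128) + 128)        # second LSTM: 128-dim input, 128 units
dense_params = 128 * 5 + 5                          # softmax layer: weights + biases

print(embedding_params)                             # 20000050  (non-trainable)
print(lstm1_params, lstm2_params, dense_params)     # 91648 131584 645
print(lstm1_params + lstm2_params + dense_params)   # 223877    (trainable)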

In [27]:

model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 10)                0         
_________________________________________________________________
embedding_2 (Embedding)      (None, 10, 50)            20000050  
_________________________________________________________________
lstm_1 (LSTM)                (None, 10, 128)           91648     
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 128)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 645       
_________________________________________________________________
activation_1 (Activation)    (None, 5)                 0         
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________

As usual, after creating your model in Keras, you need to compile it and define the loss, optimizer, and metrics you want to use. Compile your model using the categorical_crossentropy loss, the adam optimizer, and the ['accuracy'] metric:

In [28]:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

It's time to train your model. Your Emojifier-V2 model takes as input an array of shape (m, max_len) and outputs probability vectors of shape (m, number of classes). We thus have to convert X_train (an array of sentences as strings) to X_train_indices (an array of sentences as lists of word indices), and Y_train (labels as indices) to Y_train_oh (labels as one-hot vectors).

In [29]:

X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)

X_train_indicesY_train_oh上拟合Keras模型。我们将使用 epochs = 50batch_size = 32

In [30]:

model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/50
132/132 [==============================] - 4s 27ms/step - loss: 1.6083 - acc: 0.1970
Epoch 2/50
132/132 [==============================] - 0s 2ms/step - loss: 1.5324 - acc: 0.2955
Epoch 3/50
132/132 [==============================] - 0s 3ms/step - loss: 1.5011 - acc: 0.3258
Epoch 4/50
132/132 [==============================] - 0s 2ms/step - loss: 1.4391 - acc: 0.3561
Epoch 5/50
132/132 [==============================] - 0s 2ms/step - loss: 1.3481 - acc: 0.4545
Epoch 6/50
132/132 [==============================] - 0s 2ms/step - loss: 1.2347 - acc: 0.5152
Epoch 7/50
132/132 [==============================] - 0s 3ms/step - loss: 1.1768 - acc: 0.4470
Epoch 8/50
132/132 [==============================] - 0s 2ms/step - loss: 1.0552 - acc: 0.5758
Epoch 9/50
132/132 [==============================] - 0s 2ms/step - loss: 0.8779 - acc: 0.7121
Epoch 10/50
132/132 [==============================] - 0s 2ms/step - loss: 0.8241 - acc: 0.6970
Epoch 11/50
132/132 [==============================] - 0s 3ms/step - loss: 0.7038 - acc: 0.7500
Epoch 12/50
132/132 [==============================] - 0s 2ms/step - loss: 0.6011 - acc: 0.8030
Epoch 13/50
132/132 [==============================] - 0s 2ms/step - loss: 0.4941 - acc: 0.8333
Epoch 14/50
132/132 [==============================] - 0s 2ms/step - loss: 0.5102 - acc: 0.8333
Epoch 15/50
132/132 [==============================] - 0s 3ms/step - loss: 0.4796 - acc: 0.8258
Epoch 16/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3544 - acc: 0.8636
Epoch 17/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3910 - acc: 0.8561
Epoch 18/50
132/132 [==============================] - 0s 2ms/step - loss: 0.6495 - acc: 0.8106
Epoch 19/50
132/132 [==============================] - 0s 3ms/step - loss: 0.5181 - acc: 0.8182
Epoch 20/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3964 - acc: 0.8409
Epoch 21/50
132/132 [==============================] - 0s 2ms/step - loss: 0.4708 - acc: 0.8182
Epoch 22/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3883 - acc: 0.8636
Epoch 23/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3782 - acc: 0.8561
Epoch 24/50
132/132 [==============================] - 0s 3ms/step - loss: 0.3052 - acc: 0.9091
Epoch 25/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3471 - acc: 0.8864
Epoch 26/50
132/132 [==============================] - 0s 2ms/step - loss: 0.2424 - acc: 0.9394
Epoch 27/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3171 - acc: 0.8788
Epoch 28/50
132/132 [==============================] - 0s 3ms/step - loss: 0.2406 - acc: 0.9318
Epoch 29/50
132/132 [==============================] - 0s 2ms/step - loss: 0.3935 - acc: 0.8712
Epoch 30/50
132/132 [==============================] - 0s 2ms/step - loss: 0.2671 - acc: 0.9091
Epoch 31/50
132/132 [==============================] - 0s 2ms/step - loss: 0.2963 - acc: 0.8864
Epoch 32/50
132/132 [==============================] - 0s 3ms/step - loss: 0.2038 - acc: 0.9318
Epoch 33/50
132/132 [==============================] - 0s 2ms/step - loss: 0.2120 - acc: 0.9470
Epoch 34/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1580 - acc: 0.9621
Epoch 35/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1644 - acc: 0.9621
Epoch 36/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1890 - acc: 0.9394
Epoch 37/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1818 - acc: 0.9470
Epoch 38/50
132/132 [==============================] - 0s 2ms/step - loss: 0.2176 - acc: 0.9318
Epoch 39/50
132/132 [==============================] - 0s 3ms/step - loss: 0.1409 - acc: 0.9621
Epoch 40/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1565 - acc: 0.9545
Epoch 41/50
132/132 [==============================] - 0s 2ms/step - loss: 0.0876 - acc: 0.9848
Epoch 42/50
132/132 [==============================] - 0s 2ms/step - loss: 0.0816 - acc: 0.9773
Epoch 43/50
132/132 [==============================] - 0s 2ms/step - loss: 0.0812 - acc: 0.9848
Epoch 44/50
132/132 [==============================] - 0s 3ms/step - loss: 0.0499 - acc: 0.9924
Epoch 45/50
132/132 [==============================] - 0s 2ms/step - loss: 0.0776 - acc: 0.9848
Epoch 46/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1006 - acc: 0.9773
Epoch 47/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1436 - acc: 0.9470
Epoch 48/50
132/132 [==============================] - 0s 3ms/step - loss: 0.3119 - acc: 0.9242
Epoch 49/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1112 - acc: 0.9848
Epoch 50/50
132/132 [==============================] - 0s 2ms/step - loss: 0.1712 - acc: 0.9545

Out[30]:

<keras.callbacks.History at 0x7f0c6f629860>

Your model should perform close to 100% accuracy on the training set. The exact accuracy you get may be a little different. Run the following cell to evaluate your model on the test set.

In [31]:

X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
56/56 [==============================] - 0s 4ms/step

Test accuracy =  0.8214285629136222

You should get a test accuracy between 80% and 95%. Run the cell below to see the mislabelled examples.

In [32]:

# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    x = X_test_indices
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())
Expected emoji:😄 prediction: she got me a nice present	❤️
Expected emoji:😞 prediction: work is hard	😄
Expected emoji:😞 prediction: This girl is messing with me	❤️
Expected emoji:🍴 prediction: any suggestions for dinner	😄
Expected emoji:❤️ prediction: I love taking breaks	😞
Expected emoji:😄 prediction: you brighten my day	❤️
Expected emoji:😄 prediction: will you be my valentine	❤️
Expected emoji:🍴 prediction: See you at the restaurant	😄
Expected emoji:😞 prediction: go away	⚾
Expected emoji:🍴 prediction: I did not have breakfast ❤️

Now you can try it on your own example. Write your own sentence below.

In [33]:

# Change the sentence below to see your prediction. Make sure all the words are in the Glove embeddings.  
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))
not feeling happy 😞

Previously, Emojify-V1 did not correctly label "not feeling happy", but our Emojify-V2 got it right. (Keras' outputs are slightly random each time, so you may not have obtained the same result.) The current model is still not very robust at understanding negation (such as "not happy") because the training set is small and doesn't have many negative examples. But if the training set were larger, the LSTM model would be much better than the Emojify-V1 model at understanding such complex sentences.

Congratulations! You have completed this notebook! ❤️❤️❤️

What you should remember:

  • If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even appear in the training set.
  • Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:
    - To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.
    - An Embedding() layer can be initialized with pretrained values. These values can be fixed, or trained further on your dataset. If your labeled dataset is small, however, it's usually not worth trying to train a large set of pretrained embeddings.
    - LSTM() has a flag called return_sequences that decides whether to return every hidden state or only the last one (see the small shape check after this list).
    - You can use Dropout() right after LSTM() to regularize your network.
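As a minimal shape check of that return_sequences flag (the input sizes here are toy values, for illustration only):

from keras.models import Model
from keras.layers import Input, LSTM

x = Input(shape=(10, 50))                                             # (time steps, features)
print(Model(x, LSTM(128, return_sequences=True)(x)).output_shape)     # (None, 10, 128): one hidden state per time step
print(Model(x, LSTM(128, return_sequences=False)(x)).output_shape)    # (None, 128): only the last hidden state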

Congratulations on finishing this assignment and building an Emojifier. We hope you're happy with what you've accomplished in this notebook!

😀😀😀😀😀😀

Acknowledgments

Thanks to Alison Darcy and the Woebot team for their advice on the creation of this assignment. Woebot is a chatbot that is ready to speak with you 24/7. As part of Woebot's technology, it uses word embeddings to understand the emotions of what you say. You can try it out by visiting http://woebot.io

In [ ]:

 