Learning RNN - Part 2

RNN Learning: Word Vectors

The word embedding layer is hard to train from scratch, so in practice we load a pre-trained model. In this exercise you will learn three things:

  • Load pre-trained word embeddings and measure word similarity with the cosine formula (written out below)
  • Use word embeddings to solve word analogy problems, e.g. from man→woman the model infers king→?
  • Some word embeddings need to be modified to remove undesirable biases such as gender bias (see the sketch after the analogy code below)
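
For reference, the cosine similarity used throughout this part (and implemented by the cosine_similarity function below) is

$$\mathrm{CosineSimilarity}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2} = \cos(\theta)$$

where θ is the angle between the two word vectors u and v: similar words score close to 1, dissimilar ones lower (down to -1 for opposite directions).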

Hands-on code:

# 1 Imports
import numpy as np
from w2v_utils import *
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

# 2 Given the embedding map, compute the similarity between two words
def cosine_similarity(u, v):
    dot = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    cosine_similarity = dot / norm_u / norm_v
    return cosine_similarity
# 2.1 Similarity between two words
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))  # 0.89
# 2.2 Given an analogy pair a->b, find the word that best completes c->?
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]
    words = word_to_vec_map.keys()
    max_cosine_sim = -100              
    best_word = None                   
    for w in words:        
        if w in [word_a, word_b, word_c] :
            continue
        cosine_sim = cosine_similarity((e_b - e_a), (word_to_vec_map[w] - e_c))
        if cosine_sim > max_cosine_sim:
            max_cosine_sim = cosine_sim
            best_word = w
    return best_word
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad,word_to_vec_map)))
'''italy -> italian :: spain -> spanish
india -> delhi :: japan -> tokyo
man -> woman :: boy -> girl
small -> smaller :: large -> larger'''
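
The third item from the list above (debiasing) is not covered by the code above. Here is a minimal sketch of the neutralization step, reusing word_to_vec_map and cosine_similarity; the bias direction g and the helper neutralize are illustrative assumptions, not part of w2v_utils:

# Debiasing sketch (illustrative): project a word vector onto the bias direction g
# and subtract that component, making the word neutral with respect to g.
g = word_to_vec_map['woman'] - word_to_vec_map['man']     # rough gender direction

def neutralize(word, g, word_to_vec_map):
    e = word_to_vec_map[word]
    e_biascomponent = (np.dot(e, g) / np.sum(g * g)) * g   # projection of e onto g
    return e - e_biascomponent                             # remove the bias component

print(cosine_similarity(word_to_vec_map["receptionist"], g))                 # before neutralizing
print(cosine_similarity(neutralize("receptionist", g, word_to_vec_map), g))  # ~0 after neutralizing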

RNN Learning: Sentiment Analysis (Emojify)

Using an embedding layer, the model attaches a matching emoji to the input text and outputs the original text together with the emoji.


Hands-on code 1: build a baseline emojifier (Emojify-V1: averaged word vectors + softmax)

# 1 Imports
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt
X_train, Y_train = read_csv('data/train_emoji.csv') # m=127
X_test, Y_test = read_csv('data/tesss.csv') # m=56
maxLen = len(max(X_train, key=len).split())
# 1.1 Preview an example
index = 1
print(X_train[index], label_to_emoji(Y_train[index]))
"""I am proud of your achievements ?"""
# 1.2 Preprocessing: convert Y to (m, 5) one-hot vectors
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
# 1.3 Preview the GloVe vocabulary
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt') # 400,001 words
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])
"""the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos"""

# 2 Implement the model
# 2.1 Turn an input sentence into a single feature vector
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Extract the GloVe representation of each word in the sentence, sum them,
    and divide by the number of words to get the sentence's feature vector."""
    words = sentence.lower().split()
    avg = np.zeros(50,)
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)
    return avg
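
# Quick sanity check (the example sentence is arbitrary; any sentence whose words
# are all in the GloVe vocabulary works):
avg = sentence_to_avg("food is ready for you", word_to_vec_map)
print(avg.shape)   # (50,): one 50-dimensional feature vector per sentence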
# 2.2 Build the baseline model: a softmax classifier on the averaged word vectors
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Arguments:
    X -- shape (m, 1)
    Y -- shape (m, 1)
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    np.random.seed(1)
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    # Optimization loop
    for t in range(num_iterations):                       
        for i in range(m):                                
            avg = sentence_to_avg(X[i],word_to_vec_map)
            z = np.dot(W,avg) + b
            a = softmax(z)
            cost = -np.sum(np.multiply(Y_oh[i], np.log(a)))  # cross-entropy loss for this example
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz
            W = W - learning_rate * dW
            b = b - learning_rate * db       
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)
    return pred, W, b
# 2.3 Train
pred, W, b = model(X_train, Y_train, word_to_vec_map)
'''Epoch: 0 --- cost = [ 2.82117539  2.22537435  3.90409976  3.65077617  4.17192113]
Accuracy: 0.348484848485
Epoch: 100 --- cost = [  7.39085514   6.39666398   0.15943637   9.61056197  11.77782592]
Accuracy: 0.931818181818
Epoch: 200 --- cost = [  7.86956435   7.883712     0.08912738  11.25652113  13.75952996]
Accuracy: 0.954545454545
Epoch: 300 --- cost = [  8.06494045   8.67838712   0.06864535  12.0741376   14.92485916]
Accuracy: 0.969696969697'''
# 2.4 Evaluate the model
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
'''Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143'''
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])
pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
'''
Accuracy: 0.833333333333

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄'''

Amazing! Because adore has an embedding similar to love's, the algorithm generalizes correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love's, and so might work too.

What you should remember from this part:

  • Even with only 127 training examples, you can get a reasonably good model for emojifying. This is due to the generalization power that word vectors give you.
  • Emojify-V1 will perform poorly on sentences such as “This movie is not good and not enjoyable” because it doesn’t understand combinations of words; it just averages all the words’ embedding vectors together, without paying attention to word order (see the quick check below). You will build a better algorithm in the next part.
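
A quick way to see the averaging problem for yourself, reusing sentence_to_avg and cosine_similarity from above (the exact value depends on the GloVe vectors, but the two averages typically point in almost the same direction):

# Adding "not" barely moves the averaged vector, so V1 cannot represent negation.
avg_pos = sentence_to_avg("feeling happy", word_to_vec_map)
avg_neg = sentence_to_avg("not feeling happy", word_to_vec_map)
print(cosine_similarity(avg_pos, avg_neg))   # typically high, close to 1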

Hands-on code 2: build an emojifier with an LSTM (Emojify-V2)

# 1 Imports
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

# 2 Input preprocessing: padding
def sentences_to_indices(X, word_to_index, max_len):
    """
    Convert the sentences in X to arrays of word indices, zero-padded to max_len
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary mapping each word to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """    
    m = X.shape[0]                                   
    X_indices = np.zeros((m,max_len))
    
    for i in range(m):                               
        sentence_words = X[i].lower().split()
        j = 0
        for w in sentence_words:
            X_indices[i, j] = word_to_index[w]
            j = j+1
    return X_indices
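
# Small usage check (example sentences are arbitrary; the exact indices depend on the GloVe vocabulary):
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1, word_to_index, max_len=5)
print(X1_indices.shape)   # (3, 5): each sentence becomes 5 indices, zero-padded at the end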

# 3 Pre-trained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
    emb_matrix = np.zeros((vocab_len,emb_dim))
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]
    embedding_layer = Embedding(input_dim = vocab_len,output_dim = emb_dim,trainable=False)
    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])    
    return embedding_layer
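
# Quick check that the layer is built and pre-loaded correctly
# (get_weights() returns the embedding matrix we just set):
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights shape =", embedding_layer.get_weights()[0].shape)   # (400001, 50)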

# 4 Build the model
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype = 'int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences = True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X trough another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences = False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation("softmax")(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs = sentence_indices, outputs=X)
    return model
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
"""_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 10)                0         
_________________________________________________________________
embedding_3 (Embedding)      (None, 10, 50)            20000050  
_________________________________________________________________
lstm_3 (LSTM)                (None, 10, 128)           91648     
_________________________________________________________________
dropout_3 (Dropout)          (None, 10, 128)           0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 645       
_________________________________________________________________
activation_2 (Activation)    (None, 5)                 0         
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________"""
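
The parameter counts in the summary can be verified by hand with the standard Keras formulas (the embedding layer is frozen, which is why almost all parameters are non-trainable):

  • Embedding: 400,001 × 50 = 20,000,050 (non-trainable)
  • First LSTM: 4 × 128 × (50 + 128 + 1) = 91,648
  • Second LSTM: 4 × 128 × (128 + 128 + 1) = 131,584
  • Dense + softmax: 128 × 5 + 5 = 645
  • Trainable total: 91,648 + 131,584 + 645 = 223,877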

# 5 Training
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
"""Epoch 50/50
132/132 [==============================] - 0s - loss: 0.0797 - acc: 0.9848"""

# 6 Evaluation
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
"""Test accuracy =  0.925000008515"""
# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())
"""
Expected emoji:❤️ prediction: I love taking breaks	?
Expected emoji:? prediction: she is a bully	?
Expected emoji:? prediction: she said yes	?
Expected emoji:❤️ prediction: I love you to the stars and back	?"""
# Change the sentence below to see your prediction. Make sure all the words are in the GloVe embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))
"""not feeling happy ?"""

What we can learn

  • In Keras, every mini-batch must contain inputs of the same length so that it can be vectorized, but real sentences have different lengths. Therefore we pad.
  • How to create a Keras embedding layer: keras.layers.Embedding(vocab_len, emb_dim)
    • step 1: split X into mini-batches and convert each sentence into a list of word indices
    • step 2: pad every list of indices to the maximum length
    • step 3: feed the padded indices to the embedding layer; its weight matrix E has shape (400001, 50), i.e. (vocab_len, emb_dim)
    • step 4: the layer outputs the corresponding embeddings, of shape (m, max_len, 50)
  • If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even have appeared in your training set.
  • Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:
    • To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.
    • An Embedding() layer can be initialized with pretrained values. These values can be either fixed or trained further on your dataset. If however your labeled dataset is small, it’s usually not worth trying to train a large pre-trained set of embeddings.
    • LSTM() has a flag called return_sequences to decide if you would like to return every hidden state or only the last one.
    • You can use Dropout() right after LSTM() to regularize your network.