Learning RNN - Part 2

RNN Learning: Word Vectors

The word embedding layer is hard to train from scratch, so in practice we load a pre-trained model. In this exercise you will learn three things:

  • Load pre-trained word embeddings and measure word similarity with the cosine formula (written out below)
  • Use word embeddings to solve word analogy problems, e.g. from man→woman the model infers king→?
  • Some word embeddings need to be modified to remove undesirable biases such as gender bias (see the sketch after the analogy code below)
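
For reference, the cosine similarity used throughout this part (and implemented by the cosine_similarity function below) is

$$\mathrm{CosineSimilarity}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2} = \cos(\theta)$$

where θ is the angle between the two word vectors u and v: similar words score close to 1, dissimilar ones lower (down to -1 for opposite directions).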

Hands-on code:

# 1 Imports
import numpy as np
from w2v_utils import *
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

# 2 Given the embedding map, compute the similarity between two words
def cosine_similarity(u, v):
    dot = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    cosine_similarity = dot / norm_u / norm_v
    return cosine_similarity
# 2.1 Similarity between two words
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))  # 0.89
# 2.2 Given an analogy pair a->b, find the word that best completes c->?
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]
    words = word_to_vec_map.keys()
    max_cosine_sim = -100              
    best_word = None                   
    for w in words:        
        if w in [word_a, word_b, word_c] :
            continue
        cosine_sim = cosine_similarity((e_b - e_a), (word_to_vec_map[w] - e_c))
        if cosine_sim > max_cosine_sim:
            max_cosine_sim = cosine_sim
            best_word = w
    return best_word
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad,word_to_vec_map)))
'''italy -> italian :: spain -> spanish
india -> delhi :: japan -> tokyo
man -> woman :: boy -> girl
small -> smaller :: large -> larger'''
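
The third item from the list above (debiasing) is not covered by the code above. Here is a minimal sketch of the neutralization step, reusing word_to_vec_map and cosine_similarity; the bias direction g and the helper neutralize are illustrative assumptions, not part of w2v_utils:

# Debiasing sketch (illustrative): project a word vector onto the bias direction g
# and subtract that component, making the word neutral with respect to g.
g = word_to_vec_map['woman'] - word_to_vec_map['man']     # rough gender direction

def neutralize(word, g, word_to_vec_map):
    e = word_to_vec_map[word]
    e_biascomponent = (np.dot(e, g) / np.sum(g * g)) * g   # projection of e onto g
    return e - e_biascomponent                             # remove the bias component

print(cosine_similarity(word_to_vec_map["receptionist"], g))                 # before neutralizing
print(cosine_similarity(neutralize("receptionist", g, word_to_vec_map), g))  # ~0 after neutralizing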

RNN Learning: Sentiment Analysis (Emojify)

Using an embedding layer, the model attaches a matching emoji to the input text and outputs the original text together with the emoji.


Hands-on code 1: build a baseline emojifier (Emojify-V1: averaged word vectors + softmax)

# 1 Imports
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt
X_train, Y_train = read_csv('data/train_emoji.csv') # m=127
X_test, Y_test = read_csv('data/tesss.csv') # m=56
maxLen = len(max(X_train, key=len).split())
# 1.1 Preview an example
index = 1
print(X_train[index], label_to_emoji(Y_train[index]))
"""I am proud of your achievements ?"""
# 1.2 Preprocessing: convert Y to (m, 5) one-hot vectors
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
# 1.3 Preview the GloVe vocabulary
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt') # 400,001 words
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])
"""the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos"""

# 2 Implement the model
# 2.1 Turn an input sentence into a single feature vector
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Extract the GloVe representation of each word in the sentence, sum them,
    and divide by the number of words to get the sentence's feature vector."""
    words = sentence.lower().split()
    avg = np.zeros(50,)
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)
    return avg
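
# Quick sanity check (the example sentence is arbitrary; any sentence whose words
# are all in the GloVe vocabulary works):
avg = sentence_to_avg("food is ready for you", word_to_vec_map)
print(avg.shape)   # (50,): one 50-dimensional feature vector per sentence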
# 2.2 Build the baseline model: a softmax classifier on the averaged word vectors
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Arguments:
    X -- shape (m, 1)
    Y -- shape (m, 1)
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    np.random.seed(1)
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    # Optimization loop
    for t in range(num_iterations):                       
        for i in range(m):                                
            avg = sentence_to_avg(X[i],word_to_vec_map)
            z = np.dot(W,avg) + b
            a = softmax(z)
            cost = -np.sum(np.multiply(Y_oh[i], np.log(a)))  # cross-entropy loss for this example
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz
            W = W - learning_rate * dW
            b = b - learning_rate * db       
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)
    return pred, W, b
# 2.3 Train
pred, W, b = model(X_train, Y_train, word_to_vec_map)
'''Epoch: 0 --- cost = [ 2.82117539  2.22537435  3.90409976  3.65077617  4.17192113]
Accuracy: 0.348484848485
Epoch: 100 --- cost = [  7.39085514   6.39666398   0.15943637   9.61056197  11.77782592]
Accuracy: 0.931818181818
Epoch: 200 --- cost = [  7.86956435   7.883712     0.08912738  11.25652113  13.75952996]
Accuracy: 0.954545454545
Epoch: 300 --- cost = [  8.06494045   8.67838712   0.06864535  12.0741376   14.92485916]
Accuracy: 0.969696969697'''
# 2.4 Evaluate the model
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
'''Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143'''
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])
pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
'''
Accuracy: 0.833333333333

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄'''

Amazing! Because adore has an embedding similar to love's, the algorithm generalizes correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love's, and so might work too.

What you should remember from this part:

  • Even with only 127 training examples, you can get a reasonably good model for emojifying. This is due to the generalization power that word vectors give you.
  • Emojify-V1 will perform poorly on sentences such as “This movie is not good and not enjoyable” because it doesn’t understand combinations of words; it just averages all the words’ embedding vectors together, without paying attention to word order (see the quick check below). You will build a better algorithm in the next part.
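
A quick way to see the averaging problem for yourself, reusing sentence_to_avg and cosine_similarity from above (the exact value depends on the GloVe vectors, but the two averages typically point in almost the same direction):

# Adding "not" barely moves the averaged vector, so V1 cannot represent negation.
avg_pos = sentence_to_avg("feeling happy", word_to_vec_map)
avg_neg = sentence_to_avg("not feeling happy", word_to_vec_map)
print(cosine_similarity(avg_pos, avg_neg))   # typically high, close to 1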

Hands-on code 2: build an emojifier with an LSTM (Emojify-V2)

# 1 Imports
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

# 2 Input preprocessing: padding
def sentences_to_indices(X, word_to_index, max_len):
    """
    Convert the sentences in X to arrays of word indices, zero-padded to max_len
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary mapping each word to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """    
    m = X.shape[0]                                   
    X_indices = np.zeros((m,max_len))
    
    for i in range(m):                               
        sentence_words = X[i].lower().split()
        j = 0
        for w in sentence_words:
            X_indices[i, j] = word_to_index[w]
            j = j+1
    return X_indices
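
# Small usage check (example sentences are arbitrary; the exact indices depend on the GloVe vocabulary):
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1, word_to_index, max_len=5)
print(X1_indices.shape)   # (3, 5): each sentence becomes 5 indices, zero-padded at the end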

# 3 Pre-trained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
    emb_matrix = np.zeros((vocab_len,emb_dim))
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]
    embedding_layer = Embedding(input_dim = vocab_len,output_dim = emb_dim,trainable=False)
    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])    
    return embedding_layer
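
# Quick check that the layer is built and pre-loaded correctly
# (get_weights() returns the embedding matrix we just set):
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights shape =", embedding_layer.get_weights()[0].shape)   # (400001, 50)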

# 4 Build the model
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype = 'int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences = True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X trough another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences = False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation("softmax")(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs = sentence_indices, outputs=X)
    return model
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
"""_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 10)                0         
_________________________________________________________________
embedding_3 (Embedding)      (None, 10, 50)            20000050  
_________________________________________________________________
lstm_3 (LSTM)                (None, 10, 128)           91648     
_________________________________________________________________
dropout_3 (Dropout)          (None, 10, 128)           0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 645       
_________________________________________________________________
activation_2 (Activation)    (None, 5)                 0         
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________"""
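
The parameter counts in the summary can be verified by hand with the standard Keras formulas (the embedding layer is frozen, which is why almost all parameters are non-trainable):

  • Embedding: 400,001 × 50 = 20,000,050 (non-trainable)
  • First LSTM: 4 × 128 × (50 + 128 + 1) = 91,648
  • Second LSTM: 4 × 128 × (128 + 128 + 1) = 131,584
  • Dense + softmax: 128 × 5 + 5 = 645
  • Trainable total: 91,648 + 131,584 + 645 = 223,877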

# 5 Training
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
"""Epoch 50/50
132/132 [==============================] - 0s - loss: 0.0797 - acc: 0.9848"""

# 6 Evaluation
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
"""Test accuracy =  0.925000008515"""
# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())
"""
Expected emoji:❤️ prediction: I love taking breaks	?
Expected emoji:? prediction: she is a bully	?
Expected emoji:? prediction: she said yes	?
Expected emoji:❤️ prediction: I love you to the stars and back	?"""
# Change the sentence below to see your prediction. Make sure all the words are in the GloVe embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))
"""not feeling happy ?"""

What we can learn

  • In Keras, every mini-batch must contain inputs of the same length so that it can be vectorized, but real sentences have different lengths. Therefore we pad.
  • How to create a Keras embedding layer: keras.layers.Embedding(vocab_len, emb_dim)
    • step 1: split X into mini-batches and convert each sentence into a list of word indices
    • step 2: pad every list of indices to the maximum length
    • step 3: feed the padded indices to the embedding layer; its weight matrix E has shape (400001, 50), i.e. (vocab_len, emb_dim)
    • step 4: the layer outputs the corresponding embeddings, of shape (m, max_len, 50)
  • If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even have appeared in your training set.
  • Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:
    • To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.
    • An Embedding() layer can be initialized with pretrained values. These values can be either fixed or trained further on your dataset. If however your labeled dataset is small, it’s usually not worth trying to train a large pre-trained set of embeddings.
    • LSTM() has a flag called return_sequences to decide if you would like to return every hidden state or only the last one.
    • You can use Dropout() right after LSTM() to regularize your network.