Andrew Ng Deep Learning: Course 5, Week 2

Author: 未知丶丶

Category: Deep Learning. Tags: Deep Learning

Article link: https://blog.csdn.net/qq_43310834/article/details/88726700

Contents
Preface
Natural Language Processing
Quiz

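Preface

NetEase Cloud Classroom (bilingual subtitles, no buffering): https://mooc.study.163.com/smartSpec/detail/1001319001.htmcourseId=1004570029
Coursera (expensive): https://www.coursera.org/specializations/deep-learning
I am a beginner. I watch the lectures on NetEase Cloud Classroom first and then do the assignments on Coursera, and I keep this blog as a record. All images referenced in the post are screenshots from the course.
Quiz questions reposted from: http://www.cnblogs.com/hezhiyao/p/7810725.html
Libraries needed for the programming assignments: link: https://pan.baidu.com/s/1aS1Oia2fskemBHHEMnSepw password: 66gd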

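Natural Language Processing

Word Embeddings

[Image: course screenshot]
Tips: If training lets the model recognize that orange and apple belong to the same class of words, it becomes easy to fill the blank with the required word, juice.

Tips: Compared with the one-hot vectors from last week, each word can instead be represented by a word embedding. Put simply, fix a set of features and score every word on each of them; similar words (orange and apple) then have close values on many features.

[Image: course screenshot]
Tips: Compressing the features down to 2D and plotting them shows that similar words lie close together in the 2D plot, i.e. they form clusters.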
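
As a rough illustration of the tips above, the sketch below compares one-hot vectors with a hand-made feature representation. The three words, the four features and all numeric values are assumptions made up for this example; they are not taken from the course data.

```python
import numpy as np

# Hypothetical 4-feature embeddings (gender, royal, age, food).
# The values are illustrative only, not real GloVe numbers.
embeddings = {
    "apple":  np.array([0.00, -0.01, 0.03, 0.95]),
    "orange": np.array([0.01, 0.00, -0.02, 0.97]),
    "king":   np.array([-0.95, 0.93, 0.70, 0.02]),
}

def one_hot(index, size):
    """Return a one-hot vector of the given size with a 1 at position index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

vocab = list(embeddings)
oh_apple = one_hot(vocab.index("apple"), len(vocab))
oh_orange = one_hot(vocab.index("orange"), len(vocab))

# One-hot vectors of two different words are always orthogonal,
# so they carry no notion of similarity.
onehot_dot = float(oh_apple @ oh_orange)

# With feature vectors, similar words end up close to each other.
dist_apple_orange = np.linalg.norm(embeddings["apple"] - embeddings["orange"])
dist_apple_king = np.linalg.norm(embeddings["apple"] - embeddings["king"])

print(onehot_dot, dist_apple_orange, dist_apple_king)
```

With these made-up values, apple is much closer to orange than to king, while the one-hot dot product is zero for every pair of distinct words.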

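Analogy Reasoning

[Image: course screenshot]

Tips: e denotes a word's embedding vector (i.e. its feature vector). For an analogy A->B, C->D to hold, the two difference vectors must be highly similar, that is, e_B - e_A should be close to e_D - e_C. Here sim is the similarity function.

Tips: The similarity function sim(a, b) is the cosine of the angle between the vectors a and b.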
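
Written out as a formula, and assuming A, B and C are given while D is the unknown word, the analogy search can be sketched as follows. The subscript notation is chosen here for illustration; the search compares the same two difference vectors that the tips above describe.

```latex
\[
\hat{D} = \arg\max_{w} \; \mathrm{sim}\left(e_B - e_A,\; e_w - e_C\right),
\qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert_2 \, \lVert v \rVert_2}
\]
```

The value of sim lies in [-1, 1]: it is 1 when the two vectors point in the same direction and -1 when they point in opposite directions.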

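Embedding Matrix E

Tips: Every word has an embedding vector e as described above. It can be computed from a newly built matrix E and the word's one-hot vector over the vocabulary.
[Image: course screenshot]
Tips: Suppose each word has m features, so a single word's embedding vector has shape (m, 1), and the vocabulary holds n words, so a word's one-hot vector has shape (n, 1). E then has shape (m, n). To look up a word, multiply E by that word's one-hot vector to get its (m, 1) embedding vector.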
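
The shape bookkeeping above can be checked with a small numpy sketch. The sizes m = 4 and n = 6, the random matrix and the index j are assumptions for illustration only.

```python
import numpy as np

m, n = 4, 6                        # hypothetical sizes: 4 features, 6 vocabulary words
rng = np.random.default_rng(0)
E = rng.standard_normal((m, n))    # embedding matrix, shape (m, n)

j = 2                              # index of the word to look up
o_j = np.zeros((n, 1))
o_j[j, 0] = 1.0                    # one-hot column vector, shape (n, 1)

e_j = E @ o_j                      # embedding vector, shape (m, 1)

print(e_j.shape)
print(np.allclose(e_j[:, 0], E[:, j]))
```

Multiplying by a one-hot vector simply selects column j of E, so in practice the column is usually read directly (E[:, j]) rather than computed with a full matrix product.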

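Learning Word Embeddings

[Image: course screenshot]
Tips: Look up the embedding vector of each word in the sentence, feed the vectors through the network, and let a softmax output the word for the target blank. To keep the number of parameters down, only the embedding vectors of the previous k words need to be fed into the network.

Tips: The context used to predict the target word can be the previous k words or the k words surrounding the target.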
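
The two context choices mentioned above can be sketched in plain Python. The function names last_k_context and window_context are hypothetical helpers introduced for this example, and the whitespace tokenization is a simplifying assumption.

```python
def last_k_context(tokens, k):
    """Pair each target word with the k words that precede it."""
    return [(tokens[i - k:i], tokens[i]) for i in range(k, len(tokens))]

def window_context(tokens, k):
    """Pair each target word with up to k words on each side of it."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - k):i]
        right = tokens[i + 1:i + 1 + k]
        pairs.append((left + right, target))
    return pairs

tokens = "i want a glass of orange juice".split()
pairs_last = last_k_context(tokens, 4)    # previous 4 words -> target
pairs_window = window_context(tokens, 1)  # 1 word on each side -> target

print(pairs_last[-1])
print(pairs_window[5])
```

Each (context, target) pair would then become one training example: the context words are mapped to embedding vectors and the network is trained to predict the target.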

Programming Assignment

import numpy as np
import w2v_utils
words, word_to_vec_map = w2v_utils.read_glove_vecs('data/glove.6B.50d.txt')
def cosine_similarity(u, v):
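    """
    The cosine similarity of u and v reflects how similar u and v are.
    
    Arguments:
        u -- word vector of shape (n,)
        v -- word vector of shape (n,)
        
    Returns:
        cosine_similarity -- the cosine similarity between u and v, i.e. dot(u, v) / (||u|| * ||v||).
    """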
    distance = 0.0
     
    ### START CODE HERE ###
    # Compute the dot product between u and v (≈1 line)
    dot=np.dot(u,v)
    # Compute the L2 norm of u (≈1 line)
    unorm=np.sqrt(np.sum(np.square(u)))
     
    # Compute the L2 norm of v (≈1 line)
    vnorm=np.sqrt(np.sum(np.square(v)))
    # Compute the cosine similarity defined by formula (1) (≈1 line)
    cosine_similarity=dot/(unorm*vnorm)
    ### END CODE HERE ###
     
    return cosine_similarity
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
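    """
    Solves analogies of the form "a is to b as c is to ____".
    
    Arguments:
        word_a -- a word, string
        word_b -- a word, string
        word_c -- a word, string
        word_to_vec_map -- dictionary mapping words to their GloVe vectors
        
    Returns:
        best_word -- the word for which (v_b - v_a) is closest to (v_best_word - v_c), as measured by cosine similarity
    """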
     # convert words to lower case
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
     
    ### START CODE HERE ###
    # Get the word embeddings v_a, v_b and v_c (≈1-3 lines)
    v_a,v_b,v_c=word_to_vec_map[word_a],word_to_vec_map[word_b],word_to_vec_map[word_c]
    ### END CODE HERE ###
     
    words = word_to_vec_map.keys()
    max_cosine_sim = -100              # Initialize max_cosine_sim to a large negative number
    best_word = None                   # Initialize best_word with None, it will help keep track of the word to output
 
    # loop over the whole word vector set
    for w in words:        
        # to avoid best_word being one of the input words, pass on them.
        if w in [word_a, word_b, word_c] :
            continue
         
        ### START CODE HERE ###
        # Compute cosine similarity between the vector (e_b - e_a) and the vector ((w's vector representation) - e_c)  (≈1 line)
        similarity=cosine_similarity(v_b-v_a,word_to_vec_map[w]-v_c)  
        # If the cosine_sim is more than the max_cosine_sim seen so far,
            # then: set the new max_cosine_sim to the current cosine_sim and the best_word to the current word (≈3 lines)
        if (similarity>max_cosine_sim):
            max_cosine_sim=similarity
            best_word=w
        ### END CODE HERE ###
         
    return best_word

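Word2Vec

[Image: course screenshot]
Tips: Given a context word, the model is optimized to pick the target word. First look up the context's embedding vector, feed it into the network, and let the softmax output yhat (the predicted y); the cost function is then used to optimize both the network and the embeddings.

Tips: Computing every value of the softmax is expensive, because the denominator sums over the whole vocabulary. Here θ is the parameter matrix.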
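
For reference, the softmax and cost described above can be written as follows. The symbols are assumptions chosen to match the tips: e_c is the context word's embedding, θ_t is the row of θ associated with target word t, and n is the vocabulary size.

```latex
\[
p(t \mid c) = \frac{\exp\left(\theta_t^{\top} e_c\right)}{\sum_{j=1}^{n} \exp\left(\theta_j^{\top} e_c\right)},
\qquad
\mathcal{L}(\hat{y}, y) = -\sum_{i=1}^{n} y_i \log \hat{y}_i
\]
```

The sum over all n words in the denominator has to be evaluated for every training example, which is where the heavy computation comes from.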

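Negative Sampling

[Image: course screenshot]

Tips: Once a context and a target have been chosen (i.e. they appear together in a sentence), label the pair 1. Then pick k words from the vocabulary as targets for the same context and label those pairs 0 (i.e. they do not usually appear in the same sentence).

Tips: Compared with the softmax above, this approach uses sigmoids. The 10000 words of the vocabulary are treated as 10000 logistic classifiers, and each training step only trains k+1 binary classifications, 1 positive sample and k negative samples. Each step randomly picks k different negative samples, and this still yields the corresponding yhat.
[Image: course screenshot]
Tips: The image above shows how the negative samples are drawn at random.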
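
Since the slide itself is not reproduced here, the sketch below assumes the heuristic usually given for this step: sample word i with probability proportional to f(w_i)^(3/4), where f is the word's frequency. The word counts, the function names and the choice to exclude the positive target from the negatives are all assumptions made for this example.

```python
import numpy as np

def negative_sampling_probs(counts):
    """Sampling distribution proportional to count ** (3/4)."""
    freq = np.asarray(counts, dtype=float)
    weights = freq ** 0.75
    return weights / weights.sum()

def sample_negatives(probs, k, positive_index, rng):
    """Draw k distinct negative word indices, excluding the positive target."""
    p = probs.copy()
    p[positive_index] = 0.0      # assumption: never pick the true target as a negative
    p /= p.sum()
    return rng.choice(len(p), size=k, replace=False, p=p)

counts = [1000, 200, 50, 10, 5, 1]   # hypothetical word counts
probs = negative_sampling_probs(counts)

rng = np.random.default_rng(0)
negatives = sample_negatives(probs, k=3, positive_index=1, rng=rng)

print(probs)
print(negatives)
```

Raising the counts to the power 3/4 flattens the distribution: very frequent words are sampled less often than their raw frequency would suggest, and rare words somewhat more often.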

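GloVe Word Vectors

[Image: course screenshot]
Tips: Here X_ij is the number of times word i and word j appear together in the same sentence, i.e. how often word i occurs in the context of word j.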
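
For reference, the GloVe objective built on X_ij is usually written as below. The symbols are assumptions following the notation of the earlier sections: θ_i and e_j are the two vectors learned for words i and j, b_i and b'_j are bias terms, n is the vocabulary size, and f is a weighting function with f(X_ij) = 0 whenever X_ij = 0, so that the log term is never evaluated at zero.

```latex
\[
\min_{\theta,\, e,\, b,\, b'} \;
\sum_{i=1}^{n} \sum_{j=1}^{n}
f(X_{ij}) \left( \theta_i^{\top} e_j + b_i + b'_j - \log X_{ij} \right)^2
\]
```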

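Sentiment Classification

[Image: course screenshot]
Tips: Here each word's embedding vector is fed as the input x into an RNN, which is trained as a many-to-one model.

Programming Assignment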

import numpy as np
import emo_utils
import emoji
import matplotlib.pyplot as plt

%matplotlib inline
X_train, Y_train = emo_utils.read_csv('data/train_emoji.csv')
X_test, Y_test = emo_utils.read_csv('data/test.csv')

maxLen = len(max(X_train, key=len).split())

Y_oh_train = emo_utils.convert_to_one_hot(Y_train, C=5)
Y_oh_test = emo_utils.convert_to_one_hot(Y_test, C=5)
word_to_index, index_to_word, word_to_vec_map = emo_utils.read_glove_vecs('data/glove.6B.50d.txt')
def sentence_to_avg(sentence, word_to_vec_map):
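    """
    Splits the sentence into a list of words, looks up the GloVe vector of each word, and averages them.
    
    Arguments:
        sentence -- string, one sample taken from X
        word_to_vec_map -- dictionary mapping each word to its 50-dimensional vector
        
    Returns:
        avg -- the averaged encoding of the sentence, of shape (50,)
    """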
    ### START CODE HERE ###
    # Step 1: Split sentence into list of lower case words (≈ 1 line)
    # words is a list
    words=(sentence.lower()).split()
    # Initialize the average word vector, should have the same shape as your word vectors.
    avg=np.zeros([50,])
     
    # Step 2: average the word vectors. You can loop over the words in the list "words".
    for w in words:
        avg += word_to_vec_map[w]
    avg = np.divide(avg, len(words)) 
    ### END CODE HERE ###
     
    return avg
def model(X, Y, word_to_vec_map, learning_rate=0.01, num_iterations=400):
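    """
    Trains the word-vector model in numpy.
    
    Arguments:
        X -- input data, array of sentences as strings, of shape (m, 1)
        Y -- labels, array of integers in 0-4 (n_y = 5 classes), of shape (m, 1)
        word_to_vec_map -- dictionary mapping each word to its 50-dimensional vector
        learning_rate -- learning rate
        num_iterations -- number of iterations
        
    Returns:
        pred -- vector of predictions, of shape (m, 1)
        W -- weight matrix, of shape (n_y, n_h)
        b -- bias vector, of shape (n_y,)
    """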
    np.random.seed(1)
 
    # Define number of training examples
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 
     
    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
     
    # Convert Y to Y_onehot with n_y classes
    Y_oh = emo_utils.convert_to_one_hot(Y, C = n_y)      # (m,n_y)
     
    # Optimization loop
    for t in range(num_iterations):                       # Loop over the number of iterations
        for i in range(m):                                # Loop over the training examples
             
            ### START CODE HERE ### (≈ 4 lines of code)
            # Average the word vectors of the words from the i'th training example
            avg=sentence_to_avg(X[i], word_to_vec_map)
 
            # Forward propagate the avg through the softmax layer
            z=np.dot(W,avg)+b
            a=emo_utils.softmax(z)
            # Compute cost using the i'th training label's one hot representation and "A" (the output of the softmax)
            cost=-np.sum(Y_oh[i]*np.log(a))
            ### END CODE HERE ###
             
            # Compute gradients 
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz
 
            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
         
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = emo_utils.predict(X, Y, W, b, word_to_vec_map)
 
    return pred, W, b

Keras Version

import numpy as np
np.random.seed(0)
import keras
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers import Embedding  # keras.layers.embeddings is not available in newer Keras releases
from keras.preprocessing import sequence

np.random.seed(1)
from keras.initializers import glorot_uniform
def sentences_to_indices(X, word_to_index, max_len):
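    """
    Converts an array of sentences (strings) into an array of word indices
    that can be fed to Embedding() (see Figure 4 of the assignment).
    
    Arguments:
        X -- array of sentences, of shape (m, 1)
        word_to_index -- dictionary mapping each word to its index
        max_len -- maximum sentence length; no sentence in the dataset is longer than this
        
    Returns:
        X_indices -- array of the word indices corresponding to X, of shape (m, max_len)
    """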
    m = X.shape[0]                                   # number of training examples
     
    ### START CODE HERE ###
    # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
    X_indices=np.zeros(([m,max_len]))
     
    for i in range(m):                               # loop over training examples
         
        # Convert the ith training sentence to lower case and split it into words. You should get a list of words.
        sentence_words=(X[i].lower()).split() 
         
        # Initialize j to 0
        j=0
         
        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i][j]=word_to_index[w]
            # Increment j to j + 1
            j+=1
             
    ### END CODE HERE ###
     
    return X_indices
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
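    """
    Creates a Keras Embedding() layer and loads the pre-trained 50-dimensional GloVe vectors into it.
    
    Arguments:
        word_to_vec_map -- dictionary mapping words to their word embeddings
        word_to_index -- dictionary mapping words to their indices in the vocabulary (400,001 words)
        
    Returns:
        embedding_layer -- pretrained Keras layer instance
    """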
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
     
    ### START CODE HERE ###
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
    emb_matrix=np.zeros([vocab_len,emb_dim])
     
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index][:]=word_to_vec_map[word]
 
    # Define Keras embedding layer with the correct output/input sizes and make it non-trainable. Use Embedding(...). Make sure to set trainable=False.
    embedding_layer=Embedding(vocab_len, emb_dim, trainable=False)
    ### END CODE HERE ###
 
    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
     
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
     
    return embedding_layer
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
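    """
    Builds the computation graph of the Emojify-V2 model.
    
    Arguments:
        input_shape -- shape of the input, usually (max_len,)
        word_to_vec_map -- dictionary mapping words to their word embeddings
        word_to_index -- dictionary mapping words to their indices in the vocabulary (400,001 words)
    
    Returns:
        model -- Keras model instance
    """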
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices=Input(input_shape,dtype='int32')
     
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer=pretrained_embedding_layer(word_to_vec_map, word_to_index)
     
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
     
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
     
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
     
    ### END CODE HERE ###
     
    return model
# Use maxLen (computed from X_train earlier) so the model's input length matches the index arrays.
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = emo_utils.convert_to_one_hot(Y_train, C = 5)
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = emo_utils.convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)

print("Test accuracy = ", acc)

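Debiasing Word Embeddings

[Image: course screenshot]
Tips: Step 1: identify the bias direction that needs to be removed. Step 2 (neutralize): for words that should not carry the biased feature (gender, for example), project out their component along the bias direction. Step 3 (equalize): balance pairs of related words so that they are treated symmetrically.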
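
The neutralize step can be sketched with a simple vector projection. The function name neutralize, the bias direction g and the example embedding e are hypothetical values chosen for illustration; in practice g would be estimated from real embeddings, for instance from differences between gendered word pairs.

```python
import numpy as np

def neutralize(e, g):
    """Remove from e its component along the bias direction g."""
    e_bias = (np.dot(e, g) / np.dot(g, g)) * g   # projection of e onto g
    return e - e_bias

g = np.array([1.0, 0.0, 0.0])     # hypothetical bias direction
e = np.array([0.4, 0.3, -0.2])    # hypothetical word embedding

e_debiased = neutralize(e, g)

print(e_debiased)
print(np.dot(e_debiased, g))
```

After the projection, the debiased vector is orthogonal to g, so it no longer carries any component along the bias direction while its other components are unchanged.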

Quiz

[Image: course screenshot]
