Andrew Ng Deep Learning: Course 5, Week 3

未知丶丶

Category: Deep Learning; tags: Deep Learning

Article link: https://blog.csdn.net/qq_43310834/article/details/88769250


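Preface
Sequence Models and Attention Mechanism
Programming Assignment
Quiz

Preface

NetEase Cloud Classroom (bilingual subtitles, smooth playback): https://mooc.study.163.com/smartSpec/detail/1001319001.htmcourseId=1004570029
Coursera (expensive): https://www.coursera.org/specializations/deep-learning
I am a beginner: I first watch the lectures on NetEase Cloud Classroom, then do the assignments on Coursera, and keep this blog as a record. All images referenced in this post are screenshots from the course.
Quiz questions are reproduced from: http://www.cnblogs.com/hezhiyao/p/7810725.html
Libraries needed for the programming assignment: link: https://pan.baidu.com/s/1aS1Oia2fskemBHHEMnSepw password: 66gd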

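Sequence Models and Attention Mechanism

The seq2seq model

Machine translation

Tips: The model is split into two parts, an encoder network and a decoder network. The encoder is an RNN; the decoder is a language model that generates the output sequence.

Tips: Machine translation yields many candidate translations; the answer is chosen by picking the one with the highest probability P(y|x).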

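Beam search

Beam search simply has the computer keep several candidate translations at each step, then pick the most probable one at the end.
Tips: B is the beam width, a hyperparameter.

Tips: Step 1: run the encoder and decoder networks to find the B words in the vocabulary most likely to appear in the first position, i.e. evaluate P(y^<1>|x). Here B = 3 and the three most probable words are assumed to be in, jane, and september.

Tips: Step 2: in the decoder network, take each word chosen in step 1 as the output of the first time step and feed it in as the input of the second time step, then evaluate every vocabulary word for the second position, e.g. P(y^<2>|x,"in"). Keep the B most probable two-word prefixes, ranked by P(y^<1>,y^<2>|x) = P(y^<1>|x) × P(y^<2>|x,y^<1>).
Tips: Repeat step 2 until the end-of-sentence token is produced.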
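
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's implementation: `next_log_probs` is a hypothetical stand-in for the decoder network, and `TOY_MODEL` is a made-up probability table used so the example runs without a trained model.

```python
import math

def beam_search(next_log_probs, beam_width, max_len, eos="<eos>"):
    """Return (best_sequence, log_probability) found by beam search.

    next_log_probs(prefix) is a hypothetical stand-in for the decoder: it
    maps a tuple of words to a dict {word: log P(word | x, prefix)}.
    """
    beams = [((), 0.0)]  # (prefix, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (word,), score + logp))
        # Keep only the B most probable partial translations.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off at max_len
    return max(finished, key=lambda c: c[1])

# Made-up conditional probabilities, for illustration only.
TOY_MODEL = {
    (): {"jane": 0.5, "in": 0.4, "september": 0.1},
    ("jane",): {"visits": 0.4, "is": 0.35, "<eos>": 0.25},
    ("in",): {"september": 0.9, "<eos>": 0.1},
    ("september",): {"<eos>": 1.0},
    ("jane", "visits"): {"<eos>": 1.0},
    ("jane", "is"): {"<eos>": 1.0},
    ("in", "september"): {"<eos>": 1.0},
}

def toy_next_log_probs(prefix):
    return {w: math.log(p) for w, p in TOY_MODEL[prefix].items()}

best, score = beam_search(toy_next_log_probs, beam_width=3, max_len=5)
greedy, greedy_score = beam_search(toy_next_log_probs, beam_width=1, max_len=5)
print(best, math.exp(score))
print(greedy, math.exp(greedy_score))
```

With B = 1 the search is greedy and commits to "jane" because it is the most probable first word; with B = 3 it also keeps "in" alive and ends up with a more probable full sentence.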

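Refinements to beam search

Tips: Beam search tries to maximize P(y^<1>…y^<Ty>|x), which by the chain rule equals P(y^<1>|x) × P(y^<2>|x,y^<1>) × … × P(y^<Ty>|x,y^<1>,…,y^<Ty-1>). Multiplying many probabilities below 1 underflows numerically, so in practice the sum of log P(y^<t>|x,y^<1>,…,y^<t-1>) over t is maximized instead.

Tips: The objective is then changed to the length-normalized form (1/Ty^α) × Σ_t log P(y^<t>|x,y^<1>,…,y^<t-1>), where α is a hyperparameter between 0 and 1. Without normalization, the objective favors short translations because every extra word lowers the score.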
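
A small sketch of the length-normalized score, assuming the per-step probabilities of a candidate are already available as a list. The function name, the two candidate lists, and the value α = 0.7 are illustrative choices, not taken from the assignment.

```python
import math

def normalized_log_score(step_probs, alpha=0.7):
    """(1 / Ty**alpha) * sum over t of log P(y^<t> | x, y^<1..t-1>)."""
    ty = len(step_probs)
    return sum(math.log(p) for p in step_probs) / (ty ** alpha)

# Made-up per-step probabilities for two candidate translations.
short_seq = [0.5, 0.5]                 # 2 words, product 0.25
long_seq = [0.7, 0.7, 0.7, 0.7, 0.7]   # 5 words, product about 0.168

for alpha in (0.0, 0.7, 1.0):
    print(alpha,
          normalized_log_score(short_seq, alpha),
          normalized_log_score(long_seq, alpha))
```

With α = 0 (no normalization) the short candidate scores higher; with α = 0.7 the longer candidate, whose individual steps are more probable, comes out ahead.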

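Error analysis for beam search

Tips: Pull out the mistranslated examples. For each one, use the RNN to compute the probability of the algorithm's translation, P(ŷ|x), and of the human translation, P(y*|x), and compare them. If P(y*|x) is higher, the RNN scores the translations correctly but beam search failed to keep the better one, so the error belongs to beam search. Otherwise the probabilities produced by the RNN are wrong and the error belongs to the RNN.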
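
The comparison can be tallied over the error set to see which component deserves attention first. A minimal sketch, assuming each mistranslated example has been reduced to a pair (P(y*|x), P(ŷ|x)); the numbers below are made up.

```python
def attribute_error(p_human, p_model_output):
    """Blame beam search if the RNN prefers the human translation, else the RNN."""
    return "beam search" if p_human > p_model_output else "RNN"

def tally(examples):
    """Count how many errors are attributed to each component."""
    counts = {"beam search": 0, "RNN": 0}
    for p_human, p_model_output in examples:
        counts[attribute_error(p_human, p_model_output)] += 1
    return counts

# (P(y*|x), P(y_hat|x)) for three hypothetical mistranslated sentences.
errors = [(2e-10, 1e-10), (1e-10, 3e-10), (5e-9, 1e-9)]
print(tally(errors))
```

If most errors fall on beam search, increasing B is the natural next step; if most fall on the RNN, the model itself needs work.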

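BLEU score

Tips: The BLEU score is defined to measure how good a machine translation is. For example, with n = 2 (bigrams), the machine output decomposes into the cat (twice), cat the (once), cat on (once), on the (once), and the mat (once), so the denominator is 6. In the references, the cat, cat on, on the, and the mat each appear once, so the numerator is 4 and the bigram precision is P = 4/6. (When counting the numerator, each n-gram's count is clipped to the maximum number of times it appears in any single reference.) This precision is one ingredient of the BLEU score, not the score itself.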
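
The clipped count is easy to check in code. The sketch below assumes the sentences behind the bigram list are the machine output "the cat the cat on the mat" and the two references "The cat is on the mat" and "There is a cat on the mat"; these are not spelled out in the notes above, so treat them as an assumption. Function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """All consecutive n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Return (clipped_count, total_count) for the n-gram precision."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.lower().split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped, total

candidate = "the cat the cat on the mat"
references = ["The cat is on the mat", "There is a cat on the mat"]
print(modified_precision(candidate, references, 2))
```

For these sentences the bigram result is 4 out of 6, matching the hand count.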

Tips: The definition given in the original course slides is incorrect.

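Attention model

Tips: The source sentence is fed into a bidirectional RNN, and at each time step the forward and backward activations are concatenated into a single a. α denotes the attention weights. c is a new quantity, the context: it is computed from the lower layer as the α-weighted sum of the a, and fed as input to the RNN units of the upper layer.

Tips: The attention weights come from a small neural network that takes s^<t-1> and a^<t'> as input and outputs a score e^<t,t'>; a softmax over t' turns the scores into α^<t,t'>.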
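
To make the weighted sum concrete, here is a minimal pure-Python sketch of one attention step. It assumes the scores e have already been produced by the small network, so they are passed in as plain numbers; the function names and inputs are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(energies, activations):
    """Return (alphas, c) with c = sum over t' of alpha^<t'> * a^<t'>."""
    alphas = softmax(energies)
    dim = len(activations[0])
    context = [
        sum(alpha * a[i] for alpha, a in zip(alphas, activations))
        for i in range(dim)
    ]
    return alphas, context

# Two encoder time steps with 2-dimensional activations (made-up values).
alphas, c = context_vector([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(alphas, c)
```

Equal scores give equal weights, so the context is the average of the activations; a much larger score for one time step pulls the context toward that step's activation.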

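Image captioning

Tips: The model is again split into an encoder network and a decoder network. The encoder is AlexNet; the decoder is likewise a language model that generates the sequence.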

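Speech recognition

Tips: The attention model can also be used for speech recognition.

Tips: In general the output is not one-to-one with the input. Collapsing repeated characters and then deleting the blank symbols (the space character is kept) yields the transcript.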
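
The collapsing rule fits in two lines. In this sketch the blank symbol is written as "_" and the input strings are made up for illustration; only the rule itself comes from the notes.

```python
from itertools import groupby

def ctc_collapse(output, blank="_"):
    """Merge runs of repeated characters, then drop the blank symbol."""
    merged = (char for char, _ in groupby(output))
    return "".join(char for char in merged if char != blank)

print(ctc_collapse("ttt_h_eee___ ___qqq__"))
print(ctc_collapse("c_oo_o_kk___b_ooooo__oo__kkk"))
```

Note the order: repeats are merged first and blanks removed second, which is why a blank between two identical characters keeps both of them in the result.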

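Trigger word detection

Tips: Unlike basic speech recognition, which outputs characters, the RNN here outputs 0 while no trigger word has been detected and 1 once it is detected; this is enough for trigger word detection.

Programming Assignment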

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np
 
from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt
%matplotlib inline
m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)
dataset[:10]
Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation = "tanh")
densor2 = Dense(1, activation = "relu")
activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook
dotor = Dot(axes = 1)
def one_step_attention(a, s_prev):
    """
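    Performs one step of attention: outputs a context vector computed as a dot product
    of the attention weights "alphas" and the hidden states "a" of the Bi-LSTM.
    Arguments:
    a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)
    Returns:
    context -- context vector, input of the next (post-attention) LSTM cell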
    """
     
    ### START CODE HERE ###
    # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
    s_prev = repeator(s_prev)
    # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
    concat = concatenator([a,s_prev])
    # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈1 lines)
    e = densor1(concat)
    # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈1 lines)
    energies = densor2(e)
    # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
    alphas = activator(energies)
    # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
    context = dotor([ alphas,a])
    ### END CODE HERE ###
     
    return context
n_a = 32
n_s = 64
post_activation_LSTM_cell = LSTM(n_s, return_state = True)
output_layer = Dense(len(machine_vocab), activation=softmax)
def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    """
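    Arguments:
    Tx -- length of the input sequence
    Ty -- length of the output sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    human_vocab_size -- size of the python dictionary "human_vocab"
    machine_vocab_size -- size of the python dictionary "machine_vocab"
    Returns:
    model -- Keras model instance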
    """
     
    # Define the inputs of your model with a shape (Tx, human_vocab_size)
    # Define s0 and c0, initial hidden state for the decoder LSTM of shape (n_s,)
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0
     
    # Initialize empty list of outputs
    outputs = []
     
    ### START CODE HERE ###
     
    # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
    a = Bidirectional(LSTM(n_a, return_sequences = True))(X)
     
    # Step 2: Iterate for Ty steps
    for t in range(Ty):
     
        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
        context = one_step_attention(a, s)
         
        # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
        # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
        s, _, c = post_activation_LSTM_cell(context,initial_state = [s, c])
         
        # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
        out = output_layer(s)
         
        # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
        outputs.append(out)
     
    # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
    model = Model([X, s0, c0], outputs = outputs)
     
    ### END CODE HERE ###
     
    return model

model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))

### START CODE HERE ### (≈2 lines)
opt = Adam(lr = 0.005, beta_1 = 0.9, beta_2 = 0.999,decay = 0.01)  
model.compile(loss = 'categorical_crossentropy', optimizer = opt,metrics = ['accuracy']) 
### END CODE HERE ###

s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(Yoh.swapaxes(0,1))

model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)

EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
for example in EXAMPLES:
     
    source = string_to_int(example, Tx, human_vocab)
    source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source))).swapaxes(0,1)
    prediction = model.predict([source, s0, c0])
    prediction = np.argmax(prediction, axis = -1)
    output = [inv_machine_vocab[int(i)] for i in prediction]
     
    print("source:", example)
    print("output:", ''.join(output))

Quiz

Tips: The input x is not a probability; it is the sequence of characters (or words) of the input sentence.

Tips: c->c, oo->o, o->o, kk->k, b->b, ooooo->o, oo->o, kkk->k, which spells "cookbook".
