Most of my day-to-day work is image-related, so I hadn't originally planned to study RNNs. Then I figured that since I had already worked through CNNs, I might as well keep going and finish RNNs too. Having done so, I find them fascinating: an algorithm quite unlike CNNs, almost like a different mode of human thinking. Thanks to Andrew Ng for making it all so approachable; both the lectures and the assignments are excellent.
The Sequence Models course spans three weeks:
- Week 1: Recurrent Neural Networks
- Week 2: Natural Language Processing & Word Embeddings
- Week 3: Sequence Models & Attention Mechanism
RNN
There are already plenty of RNN tutorials online, so I'll just give a brief summary based on my course notes. RNN here means Recurrent Neural Network.
The output y at each time step depends on the previous hidden state a and the current input x, and the hidden state a passed on to the next cell likewise depends on the previous a and the current x. This is much like how we combine earlier context with what follows when speaking or reading, so an RNN is a chain of cells connected one after another in temporal order.
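Written out, the single-step update (which the `rnn_cell_forward` code below implements with `np.tanh` and a provided `softmax`) is:

$$a^{\langle t \rangle} = \tanh\left(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = \operatorname{softmax}\left(W_{ya}\, a^{\langle t \rangle} + b_y\right)$$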
Sequence models come in several architectures:
- one-to-one;
- one-to-many: e.g., music generation;
- many-to-many: e.g., (1) tagging every word in a sentence, such as part-of-speech or named-entity detection (input and output have the same length); (2) translation between languages (input and output lengths can differ).
RNN code
As with building the DNNs earlier in the specialization, the heart of this assignment is computing forward and backward propagation (mainly the derivatives).
import numpy as np
from rnn_utils import *
# GRADED FUNCTION: rnn_cell_forward
def rnn_cell_forward(xt, a_prev, parameters):
"""
Implements a single forward step of the RNN-cell as described in Figure (2)
Arguments:
xt -- your input data at timestep "t", numpy array of shape (n_x, m).
a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
parameters -- python dictionary containing:
Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
ba -- Bias, numpy array of shape (n_a, 1)
by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
Returns:
a_next -- next hidden state, of shape (n_a, m)
yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
"""
# Retrieve parameters from "parameters"
Wax = parameters["Wax"]
Waa = parameters["Waa"]
Wya = parameters["Wya"]
ba = parameters["ba"]
by = parameters["by"]
### START CODE HERE ### (≈2 lines)
# compute next activation state using the formula given above
a_next = np.tanh( np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba )
# compute output of the current cell using the formula given above
yt_pred = softmax( np.dot(Wya, a_next) + by )
### END CODE HERE ###
# store values you need for backward propagation in cache
cache = (a_next, a_prev, xt, parameters)
return a_next, yt_pred, cache
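# A quick shape check for rnn_cell_forward. The dimensions below are arbitrary and
# chosen only for illustration; softmax() is assumed to be provided by rnn_utils,
# as in the assignment.
np.random.seed(1)
xt_tmp = np.random.randn(3, 10)        # n_x = 3, m = 10
a_prev_tmp = np.random.randn(5, 10)    # n_a = 5
parameters_tmp = {"Waa": np.random.randn(5, 5),
                  "Wax": np.random.randn(5, 3),
                  "Wya": np.random.randn(2, 5),
                  "ba": np.random.randn(5, 1),
                  "by": np.random.randn(2, 1)}
a_next_tmp, yt_pred_tmp, _ = rnn_cell_forward(xt_tmp, a_prev_tmp, parameters_tmp)
print(a_next_tmp.shape)   # (5, 10)
print(yt_pred_tmp.shape)  # (2, 10)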
# GRADED FUNCTION: rnn_forward
def rnn_forward(x, a0, parameters):
"""
Implement the forward propagation of the recurrent neural network described in Figure (3).
Arguments:
x -- Input data for every time-step, of shape (n_x, m, T_x).
a0 -- Initial hidden state, of shape (n_a, m)
parameters -- python dictionary containing:
Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
ba -- Bias numpy array of shape (n_a, 1)
by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
Returns:
a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
caches -- tuple of values needed for the backward pass, contains (list of caches, x)
"""
# Initialize "caches" which will contain the list of all caches
caches = []
# Retrieve dimensions from shapes of x and Wy
n_x, m, T_x = x.shape # n_x:每个样本每个时刻的向量长度; m:样本个数; T_x:时间维度
n_y, n_a = parameters["Wya"].shape # 参数是共享的,所以Wya只有两个维度
### START CODE HERE ###
# initialize "a" and "y" with zeros (≈2 lines)
a = np.zeros((n_a, m, T_x))
y_pred = np.zeros((n_y, m, T_x))
# Initialize a_next (≈1 line)
a_next = a0
# loop over all time-steps
for t in range(T_x):
# Update next hidden state, compute the prediction, get the cache (≈1 line)
a_next, yt_pred, cache = rnn_cell_forward(x[:,:,t], a_next, parameters)
# Save the value of the new "next" hidden state in a (≈1 line)
a[:,:,t] = a_next
# Save the value of the prediction in y (≈1 line)
y_pred[:,:,t] = yt_pred
# Append "cache" to "caches" (≈1 line)
caches.append(cache)
### END CODE HERE ###
# store values needed for backward propagation in cache
caches = (caches, x)
return a, y_pred, caches
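# The same kind of shape check for rnn_forward, now with a time dimension T_x = 4.
# It reuses parameters_tmp from the check above; as before, the sizes are arbitrary
# and only meant to make the expected output shapes visible.
np.random.seed(1)
x_tmp = np.random.randn(3, 10, 4)      # (n_x, m, T_x)
a0_tmp = np.random.randn(5, 10)        # (n_a, m)
a_tmp, y_pred_tmp, caches_tmp = rnn_forward(x_tmp, a0_tmp, parameters_tmp)
print(a_tmp.shape)       # (5, 10, 4)
print(y_pred_tmp.shape)  # (2, 10, 4)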
# GRADED FUNCTION: lstm_cell_forward
def lstm_cell_forward(xt, a_prev, c_prev, parameters):
"""
Implement a single forward step of the LSTM-cell as described in Figure (4)
Arguments:
xt -- your input data at timestep "t", numpy array of shape (n_x, m).
a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m)
parameters -- python dictionary containing:
Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
bf -- Bias of the forget gate, numpy array of