Coursera Andrew Ng DeepLearning.AI Course 5 (Sequence Models), Week 1: Building your Recurrent Neural Network - Step by Step

This is the fifth course of Andrew Ng's deep learning specialization on Coursera. It covers how to build a basic recurrent neural network (RNN) and a long short-term memory (LSTM) network: forward propagation through the RNN cell, the structure and gating mechanisms of the LSTM, and the backward pass for both. The assignment asks you to implement forward propagation for the RNN and LSTM and to understand their use in sequence tasks such as natural language processing.

This week has quite a few exercises. I'm not very familiar with Python and ran short on time; I only started watching the videos after the "about to expire" reminder, and the optional parts are not finished.

Building your Recurrent Neural Network - Step by Step

Welcome to Course 5's first assignment! In this assignment, you will implement your first Recurrent Neural Network in numpy.

Recurrent Neural Networks (RNN) are very effective for Natural Language Processing and other sequence tasks because they have "memory". They can read inputs $x^{\langle t \rangle}$ (such as words) one at a time, and remember some information/context through the hidden layer activations that get passed from one time-step to the next. This allows a uni-directional RNN to take information from the past to process later inputs. A bidirectional RNN can take context from both the past and the future.

Notation:

  • Superscript $[l]$ denotes an object associated with the $l^{th}$ layer.

    • Example: $a^{[4]}$ is the $4^{th}$ layer activation. $W^{[5]}$ and $b^{[5]}$ are the $5^{th}$ layer parameters.
  • Superscript $(i)$ denotes an object associated with the $i^{th}$ example.

    • Example: $x^{(i)}$ is the $i^{th}$ training example input.
  • Superscript $\langle t \rangle$ denotes an object at the $t^{th}$ time-step.

    • Example: $x^{\langle t \rangle}$ is the input x at the $t^{th}$ time-step. $x^{(i)\langle t \rangle}$ is the input at the $t^{th}$ time-step of example $i$.
  • Subscript $i$ denotes the $i^{th}$ entry of a vector.

    • Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the activations in layer $l$.

We assume that you are already familiar with numpy and/or have completed the previous courses of the specialization. Let's get started!

Let's first import all the packages that you will need during this assignment.

In [ ]:
import numpy as np
from rnn_utils import *
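
The helper module rnn_utils is not shown in this post. As a stand-in so the cells below can run on their own, here is a minimal sketch of the softmax helper the assignment says it provides, assuming it normalizes each column (the $n_y$ dimension) independently; the exact implementation in rnn_utils may differ.

In [ ]:
# Assumed stand-in for the softmax helper provided by rnn_utils.
def softmax(x):
    # subtract the column-wise max for numerical stability, then normalize each column
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / np.sum(e_x, axis=0, keepdims=True)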

1 - Forward propagation for the basic Recurrent Neural Network

Later this week, you will generate music using an RNN. The basic RNN that you will implement has the structure below. In this example, $T_x = T_y$.

Figure 1: Basic RNN model

Here's how you can implement an RNN:

Steps:

  1. Implement the calculations needed for one time-step of the RNN.
  2. Implement a loop over $T_x$ time-steps in order to process all the inputs, one at a time.

Let's go!

1.1 - RNN cell

A Recurrent neural network can be seen as the repetition of a single cell. You are first going to implement the computations for a single time-step. The following figure describes the operations for a single time-step of an RNN cell.

Figure 2: Basic RNN cell. Takes as input $x^{\langle t \rangle}$ (current input) and $a^{\langle t-1 \rangle}$ (previous hidden state containing information from the past), and outputs $a^{\langle t \rangle}$ which is given to the next RNN cell and also used to predict $y^{\langle t \rangle}$

Exercise: Implement the RNN-cell described in Figure (2).

Instructions:

  1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$.
  2. Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We provided you a function: softmax.
  3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, \text{parameters})$ in cache
  4. Return $a^{\langle t \rangle}$, $y^{\langle t \rangle}$ and cache

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.

In [ ]:
# GRADED FUNCTION: rnn_cell_forward
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)
    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    
    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###
    
    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    
    return a_next, yt_pred, cache
In [ ]:
np.random.seed(1)
xt = np.random.randn(3,10)
a_prev = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}
a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)

Expected Output:

a_next[4] = [ 0.59584544 0.18141802 0.61311866 0.99808218 0.85016201 0.99980978 -0.18887155 0.99815551 0.6531151 0.82872037]
a_next.shape = (5, 10)
yt_pred[1] = [ 0.9888161 0.01682021 0.21140899 0.36817467 0.98988387 0.88945212 0.36920224 0.9966312 0.9982559 0.17746526]
yt_pred.shape = (2, 10)

1.2 - RNN forward pass

You can see an RNN as the repetition of the cell you've just built. If your input sequence of data is carried over 10 time steps, then you will copy the RNN cell 10 times. Each cell takes as input the hidden state from the previous cell ($a^{\langle t-1 \rangle}$) and the current time-step's input data ($x^{\langle t \rangle}$). It outputs a hidden state ($a^{\langle t \rangle}$) and a prediction ($y^{\langle t \rangle}$) for this time-step.

Figure 3: Basic RNN. The input sequence $x = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is carried over $T_x$ time steps. The network outputs $y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$.

Exercise: Code the forward propagation of the RNN described in Figure (3).

Instructions:

  1. Create a vector of zeros ($a$) that will store all the hidden states computed by the RNN.
  2. Initialize the "next" hidden state as $a_0$ (initial hidden state).
  3. Start looping over each time step, your incremental index is $t$:
    • Update the "next" hidden state and the cache by running rnn_cell_forward
    • Store the "next" hidden state in $a$ (at the $t^{th}$ position)
    • Store the prediction in $y$
    • Append the cache to the list of caches
  4. Return $a$, $y$ and caches; a minimal sketch of this function is given below.
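
The rnn_forward code cell does not appear above, so here is a minimal sketch that follows the steps just listed and reuses rnn_cell_forward. It assumes the inputs for all time steps are stacked into a 3-D array x of shape (n_x, m, T_x) and that the initial hidden state a0 has shape (n_a, m); these shape conventions are assumptions, not stated in the visible text.

In [ ]:
def rnn_forward(x, a0, parameters):
    """
    Sketch of the forward pass of the basic RNN over T_x time steps.
    Assumed shapes: x is (n_x, m, T_x), a0 is (n_a, m).

    Returns:
    a -- hidden states for every time-step, of shape (n_a, m, T_x)
    y_pred -- predictions for every time-step, of shape (n_y, m, T_x)
    caches -- (list of per-step caches, x), kept for the backward pass
    """
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # 1. Vectors of zeros that will store all hidden states and predictions
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # 2. Initialize the "next" hidden state as a0
    a_next = a0

    # 3. Loop over each time step t
    for t in range(T_x):
        # update the "next" hidden state and the cache by running rnn_cell_forward
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next        # store the hidden state at position t
        y_pred[:, :, t] = yt_pred  # store the prediction at position t
        caches.append(cache)       # append the cache

    # 4. Return a, y_pred and the caches
    caches = (caches, x)
    return a, y_pred, caches

For example, with x = np.random.randn(3, 10, 4), a0 = np.random.randn(5, 10) and the parameters dictionary from the test cell above, a comes out with shape (5, 10, 4) and y_pred with shape (2, 10, 4).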