This week's exercises were quite heavy for me, mainly because I am not yet comfortable with Python and ran short on time; I only started watching the videos after the reminder that they were about to expire, so the optional part is not finished.
Building your Recurrent Neural Network - Step by Step
Welcome to Course 5's first assignment! In this assignment, you will implement your first Recurrent Neural Network in numpy.
Recurrent Neural Networks (RNN) are very effective for Natural Language Processing and other sequence tasks because they have "memory". They can read inputs $x^{\langle t \rangle}$ (such as words) one at a time, and remember some information/context through the hidden layer activations that get passed from one time-step to the next. This allows a uni-directional RNN to take information from the past to process later inputs. A bidirectional RNN can take context from both the past and the future.
Notation:
Superscript $[l]$ denotes an object associated with the $l^{th}$ layer.
- Example: $a^{[4]}$ is the $4^{th}$ layer activation. $W^{[5]}$ and $b^{[5]}$ are the $5^{th}$ layer parameters.
Superscript $(i)$ denotes an object associated with the $i^{th}$ example.
- Example: $x^{(i)}$ is the $i^{th}$ training example input.
Superscript $\langle t \rangle$ denotes an object at the $t^{th}$ time-step.
- Example: $x^{\langle t \rangle}$ is the input x at the $t^{th}$ time-step. $x^{(i)\langle t \rangle}$ is the input at the $t^{th}$ time-step of example $i$.
Subscript $i$ denotes the $i^{th}$ entry of a vector.
- Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the activations in layer $l$.
We assume that you are already familiar with numpy and/or have completed the previous courses of the specialization. Let's get started!
Let's first import all the packages that you will need during this assignment.
import numpy as np
from rnn_utils import *
Here's how you can implement an RNN:
Steps:
- Implement the calculations needed for one time-step of the RNN.
- Implement a loop over $T_x$ time-steps in order to process all the inputs, one at a time.
Let's go!
1.1 - RNN cell
A Recurrent neural network can be seen as the repetition of a single cell. You are first going to implement the computations for a single time-step. The following figure describes the operations for a single time-step of an RNN cell.
Exercise: Implement the RNN-cell described in Figure (2).
Instructions:
- Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$.
- Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We provided you the function `softmax` (a rough sketch of such a helper appears after these instructions).
- Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, parameters)$ in `cache`.
- Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and `cache`.

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.
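The `softmax` function itself lives in `rnn_utils` and is pulled in by the starred import above, so its code does not appear in this notebook. As a rough sketch of what such a column-wise softmax helper might look like (the name `softmax_sketch` and its exact details are assumptions for illustration, not the graded utility):

import numpy as np

def softmax_sketch(x):
    """Column-wise softmax: maps each column of x (one column per example)
    to a probability distribution over the rows. Subtracting the
    per-column max keeps np.exp numerically stable."""
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)

# Example: scores for n_y = 2 classes and m = 3 examples
scores = np.array([[1.0, 2.0, 0.5],
                   [0.0, 1.0, 3.0]])
print(softmax_sketch(scores).sum(axis=0))  # each column sums to 1

Each column is normalized independently, which matches the $(n_y, m)$ layout that $\hat{y}^{\langle t \rangle}$ will have once we vectorize over the $m$ examples.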
# GRADED FUNCTION: rnn_cell_forward
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """

    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###

    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache
np.random.seed(1)
xt = np.random.randn(3,10)
a_prev = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {
"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}
a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)
Expected Output:

| Variable | Value |
|---|---|
| a_next[4] | [ 0.59584544 0.18141802 0.61311866 0.99808218 0.85016201 0.99980978 -0.18887155 0.99815551 0.6531151 0.82872037] |
| a_next.shape | (5, 10) |
| yt_pred[1] | [ 0.9888161 0.01682021 0.21140899 0.36817467 0.98988387 0.88945212 0.36920224 0.9966312 0.9982559 0.17746526] |
| yt_pred.shape | (2, 10) |
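As a quick sanity check (not part of the graded notebook), every column of `yt_pred` is a softmax output over the $n_y = 2$ classes, so after running the test cell above the column sums should all be 1:

# Assumes the test cell above has been run, so yt_pred is defined.
# Each column of yt_pred is a softmax distribution over n_y classes,
# so every column should sum (numerically) to 1.
print(np.allclose(yt_pred.sum(axis=0), 1.0))  # expected: True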
1.2 - RNN forward pass
You can see an RNN as the repetition of the cell you've just built. If your input sequence of data is carried over 10 time steps, then you will copy the RNN cell 10 times. Each cell takes as input the hidden state from the previous cell ($a^{\langle t-1 \rangle}$) and the current time-step's input data ($x^{\langle t \rangle}$). It outputs a hidden state ($a^{\langle t \rangle}$) and a prediction ($\hat{y}^{\langle t \rangle}$) for this time-step.
Exercise: Code the forward propagation of the RNN described in Figure (3).
Instructions:
- Create a vector of zeros ($a$) that will store all the hidden states computed by the RNN.
- Initialize the "next" hidden state as $a_0$ (the initial hidden state).
- Start looping over each time step; your incremental index is $t$:
    - Update the "next" hidden state and the cache by running `rnn_cell_forward`.
    - Store the "next" hidden state in $a$ (at the $t^{th}$ position).
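As a hedged illustration of this loop (not the graded solution), the sketch below assumes the full input `x` is stored as a 3-D array of shape $(n_x, m, T_x)$ and the initial hidden state `a0` has shape $(n_a, m)$; the function name `rnn_forward_sketch` is made up for this example and these shapes are assumptions for the sketch:

def rnn_forward_sketch(x, a0, parameters):
    """Illustrative forward pass over T_x time steps.
    Assumes x has shape (n_x, m, T_x) and a0 has shape (n_a, m);
    these shapes are assumptions for this sketch."""
    n_x, m, T_x = x.shape
    n_a = a0.shape[0]
    n_y = parameters["Wya"].shape[0]

    # vectors of zeros that will store all hidden states and predictions
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    caches = []

    # initialize the "next" hidden state as a0
    a_next = a0
    for t in range(T_x):
        # run one step of the RNN cell built above
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # store the hidden state and prediction for time step t
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)

    return a, y_pred, caches

With the shapes from the test cell above, `x = np.random.randn(3, 10, 4)` and `a0 = np.random.randn(5, 10)` would give `a.shape == (5, 10, 4)` and `y_pred.shape == (2, 10, 4)`.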