Building your Recurrent Neural Network - Step by Step
Welcome to Course 5’s first assignment! In this assignment, you will implement your first Recurrent Neural Network in numpy.
Recurrent Neural Networks (RNNs) are very effective for Natural Language Processing and other sequence tasks because they have "memory". They can read inputs $x^{\langle t \rangle}$ (such as words) one at a time, and remember some information/context through the hidden-layer activations that get passed from one time step to the next. This allows a uni-directional RNN to take information from the past to process later inputs. A bidirectional RNN can take context from both the past and the future.
Notation:
- Superscript $[l]$ denotes an object associated with the $l^{th}$ layer.
    - Example: $a^{[4]}$ is the $4^{th}$ layer activation. $W^{[5]}$ and $b^{[5]}$ are the $5^{th}$ layer parameters.
- Superscript $(i)$ denotes an object associated with the $i^{th}$ example.
    - Example: $x^{(i)}$ is the $i^{th}$ training example input.
- Superscript $\langle t \rangle$ denotes an object at the $t^{th}$ time step.
    - Example: $x^{\langle t \rangle}$ is the input $x$ at the $t^{th}$ time step. $x^{(i)\langle t \rangle}$ is the input at the $t^{th}$ time step of example $i$.
- Subscript $i$ denotes the $i^{th}$ entry of a vector.
    - Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the activations in layer $l$.
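Putting these together (with the indices below chosen purely for illustration), the $5^{th}$ entry of the hidden activation at time step $3$ for training example $2$ would be written as:

$$a^{(2)\langle 3 \rangle}_5$$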
We assume that you are already familiar with numpy and/or have completed the previous courses of the specialization. Let’s get started!
Let’s first import all the packages that you will need during this assignment.
import numpy as np
from rnn_utils import *
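The rnn_utils module provided with this assignment supplies helper functions, including the softmax used later in rnn_cell_forward; its source is not shown in this notebook. Purely for reference, a minimal column-wise softmax consistent with how it is called below might look like the following (a sketch, not the actual provided utility):

def softmax(x):
    # Column-wise softmax: each column of x holds the scores for one example.
    # Subtracting the per-column max keeps np.exp numerically stable.
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / np.sum(e_x, axis=0, keepdims=True)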
1 - Forward propagation for the basic Recurrent Neural Network
Later this week, you will generate music using an RNN. The basic RNN that you will implement has the structure below. In this example, $T_x = T_y$.
[Figure 1: Basic RNN model]
Here’s how you can implement an RNN:
Steps:
1. Implement the calculations needed for one time-step of the RNN.
2. Implement a loop over $T_x$ time-steps in order to process all the inputs, one at a time.
Let’s go!
1.1 - RNN cell
A recurrent neural network can be seen as the repetition of a single cell. You are first going to implement the computations for a single time step. The following figure describes the operations for a single time step of an RNN cell.
Exercise: Implement the RNN-cell described in Figure (2).
Instructions:
1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$.
2. Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We have provided you a softmax function.
3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, parameters)$ in cache.
4. Return $a^{\langle t \rangle}$, $y^{\langle t \rangle}$ and cache.
We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.
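As a quick check that these shapes are consistent (using the parameter shapes listed in the docstring below), the hidden-state update multiplies out as

$$\underbrace{W_{ax}}_{(n_a,\, n_x)}\underbrace{x^{\langle t \rangle}}_{(n_x,\, m)} \;+\; \underbrace{W_{aa}}_{(n_a,\, n_a)}\underbrace{a^{\langle t-1 \rangle}}_{(n_a,\, m)} \;+\; \underbrace{b_a}_{(n_a,\, 1)} \;\longrightarrow\; a^{\langle t \rangle} \in \mathbb{R}^{n_a \times m},$$

with the bias $b_a$ broadcast across the $m$ columns.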
# GRADED FUNCTION: rnn_cell_forward
def rnn_cell_forward(xt, a_prev, parameters):
"""
Implements a single forward step of the RNN-cell as described in Figure (2)
Arguments:
xt -- your input data at timestep "t", numpy array of shape (n_x, m).
a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
parameters -- python dictionary containing:
Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
ba -- Bias, numpy array of shape (n_a, 1)
by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
Returns:
a_next -- next hidden state, of shape (n_a, m)
yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
"""
# Retrieve parameters from "parameters"
Wax = parameters["Wax"]
Waa = parameters["Waa"]
Wya = parameters["Wya"]
ba = parameters["ba"]
by = parameters["by"]
### START CODE HERE ### (≈2 lines)
# compute next activation state using the formula given above
a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
# compute output of the current cell using the formula given above
yt_pred = softmax(np.dot(Wya, a_next) + by)
### END CODE HERE ###
# store values you need for backward propagation in cache
cache = (a_next, a_prev, xt, parameters)
return a_next, yt_pred, cache
np.random.seed(1)
xt = np.random.randn(3,10)
a_prev = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {
"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}
a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)
a_next[4] = [ 0.59584544 0.18141802 0.61311866 0.99808218 0.85016201 0.99980978
-0.18887155 0.99815551 0.6531151 0.82872037]
a_next.shape = (5, 10)
yt_pred[1] = [ 0.9888161 0.01682021 0.21140899 0.36817467 0.98988387 0.88945212
0.36920224 0.9966312 0.9982559 0.17746526]
yt_pred.shape = (2, 10)
Expected Output:
| **a_next[4]**: | [ 0.59584544 0.18141802 0.61311866 0.99808218 0.85016201 0.99980978 -0.18887155 0.99815551 0.6531151 0.82872037] |
| **a_next.shape**: | (5, 10) |
| **yt[1]**: | [ 0.9888161 0.01682021 0.21140899 0.36817467 0.98988387 0.88945212 0.36920224 0.9966312 0.9982559 0.17746526] |
| **yt.shape**: | (2, 10) |
1.2 - RNN forward pass
You can see an RNN as the repetition of the cell you've just built. If your input sequence of data spans 10 time steps, then you will copy the RNN cell 10 times. Each cell takes as input the hidden state from the previous cell ($a^{\langle t-1 \rangle}$) and the current time step's input data ($x^{\langle t \rangle}$). It outputs a hidden state ($a^{\langle t \rangle}$) and a prediction ($y^{\langle t \rangle}$) for this time step.
[Figure: the RNN forward pass, unrolling the RNN cell over $T_x$ time steps]
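To make the looping concrete, here is a minimal sketch of what a forward pass over $T_x$ time steps could look like, built on the rnn_cell_forward function implemented above. The function name rnn_forward and the exact return values are assumptions chosen for illustration; treat this as a sketch of the idea rather than the graded solution.

def rnn_forward(x, a0, parameters):
    """
    Sketch of a forward pass over all time steps (illustrative, not the graded code).
    x  -- input data for every time step, numpy array of shape (n_x, m, T_x)
    a0 -- initial hidden state, numpy array of shape (n_a, m)
    """
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    # containers for the hidden states and predictions at every time step
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    a_next = a0
    for t in range(T_x):
        # one step of the RNN cell, reusing the function implemented above
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)
    return a, y_pred, caches

For example, with the test parameters above (n_x = 3, n_a = 5, n_y = 2, m = 10) and an input x of shape (3, 10, 4), this sketch would return a of shape (5, 10, 4) and y_pred of shape (2, 10, 4).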