Building a Recurrent Neural Network Step by Step

In the first assignment of this course, you will implement your first Recurrent Neural Network (RNN) in numpy. RNNs are particularly well suited to Natural Language Processing and other sequence tasks because they can remember past information. This post walks through forward propagation for the basic RNN and the Long Short-Term Memory (LSTM) network, and includes an optional backward-propagation section.

Building your Recurrent Neural Network - Step by Step

Welcome to Course 5’s first assignment! In this assignment, you will implement your first Recurrent Neural Network in numpy.

Recurrent Neural Networks (RNN) are very effective for Natural Language Processing and other sequence tasks because they have “memory”. They can read inputs $x^{\langle t \rangle}$ (such as words) one at a time, and remember some information/context through the hidden layer activations that get passed from one time-step to the next. This allows a uni-directional RNN to take information from the past to process later inputs. A bidirectional RNN can take context from both the past and the future.

Notation:
- Superscript $[l]$ denotes an object associated with the $l^{th}$ layer.
  - Example: $a^{[4]}$ is the $4^{th}$ layer activation. $W^{[5]}$ and $b^{[5]}$ are the $5^{th}$ layer parameters.
- Superscript $(i)$ denotes an object associated with the $i^{th}$ example.
  - Example: $x^{(i)}$ is the $i^{th}$ training example input.
- Superscript $\langle t \rangle$ denotes an object at the $t^{th}$ time-step.
  - Example: $x^{\langle t \rangle}$ is the input x at the $t^{th}$ time-step. $x^{(i)\langle t \rangle}$ is the input at the $t^{th}$ time-step of example $i$.
- Subscript $i$ denotes the $i^{th}$ entry of a vector.
  - Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the activations in layer $l$. (A short numpy example of this indexing follows the list.)
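To make the notation concrete, here is a minimal numpy sketch. The toy dimensions and the 3-D $(n_x, m, T_x)$ layout for the input batch are assumptions for illustration only, not something the text above prescribes:

import numpy as np

# Assumed toy dimensions: n_x features, m examples, T_x time-steps.
n_x, m, T_x = 3, 10, 4
x = np.random.randn(n_x, m, T_x)   # whole input batch over all time-steps

t, i = 2, 5
x_t   = x[:, :, t]    # x^<t>:    all examples at time-step t, shape (n_x, m)
x_i_t = x[:, i, t]    # x^(i)<t>: example i at time-step t, shape (n_x,)
entry = x_i_t[0]      # subscript: a single entry of that vector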

We assume that you are already familiar with numpy and/or have completed the previous courses of the specialization. Let’s get started!

Let’s first import all the packages that you will need during this assignment.

import numpy as np
from rnn_utils import *
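`rnn_utils` ships with the assignment and provides the `softmax` helper used below. If you are following along without it, a minimal column-wise softmax (assumed here to behave like the provided one) could look like this:

def softmax(x):
    # Column-wise softmax: each column (one example) sums to 1.
    # Subtracting the column max improves numerical stability.
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)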

1 - Forward propagation for the basic Recurrent Neural Network

Later this week, you will generate music using an RNN. The basic RNN that you will implement has the structure below. In this example, $T_x = T_y$.

Here’s how you can implement an RNN:

Steps:
1. Implement the calculations needed for one time-step of the RNN.
2. Implement a loop over $T_x$ time-steps in order to process all the inputs, one at a time.

Let’s go!

1.1 - RNN cell

A Recurrent neural network can be seen as the repetition of a single cell. You are first going to implement the computations for a single time-step. The following figure describes the operations for a single time-step of an RNN cell.


Figure 2: Basic RNN cell. Takes as input $x^{\langle t \rangle}$ (current input) and $a^{\langle t-1 \rangle}$ (previous hidden state containing information from the past), and outputs $a^{\langle t \rangle}$ which is given to the next RNN cell and also used to predict $y^{\langle t \rangle}$.

Exercise: Implement the RNN-cell described in Figure (2).

Instructions:
1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$.
2. Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We provided you a function: `softmax`.
3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, \text{parameters})$ in `cache`.
4. Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and `cache`.

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.

# GRADED FUNCTION: rnn_cell_forward

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """

    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###

    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache

np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
Waa = np.random.randn(5, 5)
Wax = np.random.randn(5, 3)
Wya = np.random.randn(2, 5)
ba = np.random.randn(5, 1)
by = np.random.randn(2, 1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)
a_next[4] =  [ 0.59584544  0.18141802  0.61311866  0.99808218  0.85016201  0.99980978
 -0.18887155  0.99815551  0.6531151   0.82872037]
a_next.shape =  (5, 10)
yt_pred[1] = [ 0.9888161   0.01682021  0.21140899  0.36817467  0.98988387  0.88945212
  0.36920224  0.9966312   0.9982559   0.17746526]
yt_pred.shape =  (2, 10)

Expected Output:

| | |
| --- | --- |
| **a_next[4]** | [ 0.59584544 0.18141802 0.61311866 0.99808218 0.85016201 0.99980978 -0.18887155 0.99815551 0.6531151 0.82872037] |
| **a_next.shape** | (5, 10) |
| **yt[1]** | [ 0.9888161 0.01682021 0.21140899 0.36817467 0.98988387 0.88945212 0.36920224 0.9966312 0.9982559 0.17746526] |
| **yt.shape** | (2, 10) |
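As an optional sanity check (not part of the graded exercise), each column of the softmax output should be a probability distribution over the $n_y$ classes, so, assuming the test snippet above has just been run:

# Each of the m columns of yt_pred should sum to (approximately) 1.
print(yt_pred.sum(axis=0))
assert np.allclose(yt_pred.sum(axis=0), 1.0)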

1.2 - RNN forward pass

You can see an RNN as the repetition of the cell you’ve just built. If your input sequence of data is carried over 10 time steps, then you will copy the RNN cell 10 times. Each cell takes as input the hidden state from the previous cell ($a^{\langle t-1 \rangle}$) and the current time-step’s input data ($x^{\langle t \rangle}$). It outputs a hidden state ($a^{\langle t \rangle}$) and a prediction ($y^{\langle t \rangle}$) for this time-step.


Figure 3: Basic RNN. The input sequence $x = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \ldots, x^{\langle T_x \rangle})$ is carried over $T_x$ time steps.
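Following the two steps listed in Section 1 (one time-step, then a loop over $T_x$), a minimal sketch of the full forward pass might look like the following. It assumes `x` is stored as a 3-D array of shape `(n_x, m, T_x)` and stacks the per-step hidden states and predictions into arrays of shape `(n_a, m, T_x)` and `(n_y, m, T_x)`; those layout choices are assumptions of this sketch, not requirements stated above.

def rnn_forward(x, a0, parameters):
    """
    Sketch of the full RNN forward pass: loops rnn_cell_forward over T_x time-steps.

    Arguments:
    x -- input data for every time-step, shape (n_x, m, T_x)  (assumed layout)
    a0 -- initial hidden state, shape (n_a, m)
    parameters -- same dictionary as for rnn_cell_forward

    Returns:
    a -- hidden states for every time-step, shape (n_a, m, T_x)
    y_pred -- predictions for every time-step, shape (n_y, m, T_x)
    caches -- (list of per-step caches, x), kept for a possible backward pass
    """
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # Initialize the outputs and the running hidden state
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    a_next = a0

    # Loop over time, feeding each step's hidden state into the next step
    for t in range(T_x):
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)

    caches = (caches, x)
    return a, y_pred, caches

For example, with `a0 = np.random.randn(5, 10)` and `x = np.random.randn(3, 10, 4)` (and the `parameters` dictionary from the test above), this sketch returns `a` of shape `(5, 10, 4)` and `y_pred` of shape `(2, 10, 4)`.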