1. RNN: Building an RNN Cell with numpy

xf8964

Article link: https://blog.csdn.net/xf8964/article/details/90524276

Building an RNN Cell with numpy

0. A simple RNN built with numpy
1. What is an RNN
2. Building an RNN
3. LSTM

0. A simple RNN built with numpy

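import numpy as np

timesteps = 10        # number of time steps in the input sequence
input_features = 4    # dimensionality of the input at each time step
output_features = 8   # dimensionality of the output (and of the state)

# Input sequence: one row of features per time step
inputs = np.random.random((timesteps, input_features))
print('inputs shape is ', inputs.shape)

# Initial state: an all-zero vector
state_t = np.zeros((output_features,))
print('state_t shape is ', state_t.shape)

# Random parameters: W acts on the input, U on the previous state, b is the bias
W = np.random.random((output_features, input_features))
print('W shape is ', W.shape)
U = np.random.random((output_features, output_features))
print('U shape is ', U.shape)
b = np.random.random((output_features,))
print('b shape is ', b.shape)

successive_outputs = []
for input_t in inputs:
    # Combine the current input with the previous state to get the current output
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    # The output becomes the state for the next time step
    state_t = output_t

# Stack the per-step outputs into an array of shape (timesteps, output_features)
final_output_sequence = np.stack(successive_outputs, axis=0)

print(final_output_sequence[0])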

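1. What is an RNN

When we train an ordinary deep neural network, the samples are not necessarily related to one another. Take house-price prediction: the features of one house do not affect the features of the next, and the two prices are independent of each other; each price depends only on that house's own features (location, floor area, number of rooms, and so on). When we forecast the weather, process text, or translate between languages, however, we have to account for how the data is related along the time axis. Today's weather influences tomorrow's, and in text the first half of a sentence shapes the meaning of the second half. What these tasks share is that the data at one time step affects the output at the next time step, and that is when a recurrent neural network (RNN) is worth considering.
[Figure: an RNN unrolled over time steps]
Each time step's input maps to an output for that time step, and each box in the figure is one RNN cell. The hidden state of the previous time step (t-1) affects the next time step (t). We denote the hidden state by $a^{<t>}$. Every RNN cell takes an input $x^{<t>}$ and produces a prediction $y^{<t>}$ and an output state $a^{<t>}$; the output state of one time step is the input state of the next.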
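
The dependence on earlier time steps can be checked directly. The following is a minimal sketch, not part of the original article: the weights, the sizes, and the helper name `run_rnn` are all illustrative assumptions. It runs the same small recurrence on two sequences that differ only at the first time step and compares the outputs at the last time step.

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(8, 4))   # input-to-state weights (hypothetical values)
U = 0.5 * rng.normal(size=(8, 8))   # state-to-state weights (hypothetical values)
b = np.zeros(8)

def run_rnn(inputs):
    """Return the output at every time step for a (timesteps, features) input."""
    state = np.zeros(8)
    outputs = []
    for x_t in inputs:
        state = np.tanh(W @ x_t + U @ state + b)
        outputs.append(state)
    return np.stack(outputs)

seq = rng.normal(size=(4, 4))
changed = seq.copy()
changed[0] += 1.0                   # perturb only the first time step

out_a = run_rnn(seq)
out_b = run_rnn(changed)

# The inputs at the last step are identical in both sequences ...
print(np.allclose(seq[-1], changed[-1]))
# ... but the outputs at the last step are compared here; any difference
# can only come from the hidden state carried forward from step 0.
print(np.allclose(out_a[-1], out_b[-1]))
```

A plain feed-forward layer applied to each time step separately would give identical outputs wherever the inputs are identical; the recurrence is what lets the first step influence the last.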

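2. Building an RNN

Next we implement an RNN with numpy.

2.1 RNN cell

A recurrent neural network can be viewed as the repetition of a single cell. You first implement the computation for a single time step. The figure below shows the operations an RNN cell performs in one time step.
[Figure: operations of a single RNN cell at one time step]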

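$x^{<t>}$: the current input
$a^{<t-1>}$: the hidden state of the previous cell, which carries information from the past
$a^{<t>}$: the output state, which is the input state of the next RNN cell
$y^{<t>}$: the prediction

Exercise: implement the RNN cell described in the figure above.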

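Instructions

Compute the hidden state with a tanh (hyperbolic tangent) activation: $a^{<t>} = \tanh(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a)$
Using the new hidden state $a^{<t>}$, compute the prediction $\hat{y}^{<t>} = softmax(W_{ya} a^{<t>} + b_y)$. The activation here is softmax.
Store ($a^{<t>}, a^{<t-1>}, x^{<t>}, parameters$) in cache.
Return $a^{<t>}$, $\hat{y}^{<t>}$ and the cache.

We vectorize over m examples, so $x^{<t>}$ is a matrix of shape ($n_x$, m) and $a^{<t>}$ is a matrix of shape ($n_a$, m).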
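
To make the vectorization concrete, here is a small sketch under assumed sizes (n_x = 3, n_a = 5, m = 10; the random values are only for illustration). It compares the single matrix expression for the hidden state with an explicit loop over the m examples. Note that the bias of shape (n_a, 1) is broadcast across all m columns.

```python
import numpy as np

n_x, n_a, m = 3, 5, 10                 # illustrative sizes
rng = np.random.default_rng(1)

xt = rng.normal(size=(n_x, m))         # one column per example
a_prev = rng.normal(size=(n_a, m))
Waa = rng.normal(size=(n_a, n_a))
Wax = rng.normal(size=(n_a, n_x))
ba = rng.normal(size=(n_a, 1))         # shape (n_a, 1) broadcasts across the m columns

# Vectorized: all m examples in one matrix expression
a_next = np.tanh(Waa @ a_prev + Wax @ xt + ba)

# Reference: process each example (column) separately
a_loop = np.zeros((n_a, m))
for i in range(m):
    a_loop[:, i] = np.tanh(Waa @ a_prev[:, i] + Wax @ xt[:, i] + ba[:, 0])

print(a_next.shape)
print(np.allclose(a_next, a_loop))
```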

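def softmax(x):
    # Helper used below: column-wise softmax (each column is one example).
    # Subtracting the column maximum keeps exp() from overflowing.
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0)

def rnn_cell_forward(xt, a_prev, parameters):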
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)
    
    Arguments:
    xt -- your input data at timestep 't' , numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep 't-1', numpy array of shape (n_a, m).
    parameters -- python dictionary containing:
                            Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                            Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                            Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                            ba --  Bias, numpy array of shape (n_a, 1)
                            by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    
    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    
    return a_next, yt_pred, cache

np.random.seed(1)
xt = np.random.randn(3,10)
print('xt shape ', xt.shape)
a_prev = np.random.randn(5,10)
print('a_prev shape ', a_prev.shape)
Waa = np.random.randn(5,5)
print('Waa shape ', Waa.shape)
Wax = np.random.randn(5,3)
print('Wax shape ', Wax.shape)
Wya = np.random.randn(2,5)
print('Wya shape ', Wya.shape)
ba = np.random.randn(5,1)
print('ba shape ', ba.shape)
by = np.random.randn(2,1)
print('by shape ', by.shape)

parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}
# print(parameters)

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)

print('a_next shape = ', a_next.shape)
print('yt_pred shape = ', yt_pred.shape)

print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)

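2.2 RNN forward propagation

Understanding 3-D arrays

A 3-D array can be pictured as a cube. Place a 3x3x3 array in a cube: x[0] selects everything with row index 0, which you can picture as the top horizontal layer of the cube, and x[0][0] selects column 0 within that layer.
For more on 3-D arrays, see the article "Understanding 3-D data in numpy".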

import numpy as np
x = np.arange(27)
print(x)
x = np.reshape(x, (3, 3, 3))
print('(rows, columns, channels)', x.shape)
print(x)
print('row 0', x[0])

print(x[:, :, 0])  # slice 0 along the depth (last) axis
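
The same slicing pattern is what selects one time step from a batch of sequences stored with shape (n_x, m, T_x), the layout used by the RNN code in this article. The sketch below uses small illustrative sizes so the values can be followed by hand.

```python
import numpy as np

n_x, m, T_x = 2, 3, 4                         # illustrative sizes
x = np.arange(n_x * m * T_x).reshape(n_x, m, T_x)

# All features of all examples at time step 0
x_t0 = x[:, :, 0]
print(x_t0.shape)
print(x_t0)

# The full sequence of example 1: features along axis 0, time along axis 1
x_ex1 = x[:, 1, :]
print(x_ex1.shape)
```

Here `x_t0` has shape (n_x, m), which is exactly the shape a single RNN cell expects for its input at one time step.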

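It may also help to first read about multiplying numpy arrays of different dimensions.
As the figure below shows, an RNN is built by applying a single RNN cell repeatedly. If the input sequence has 10 time steps, you copy the RNN cell 10 times. Each cell takes the hidden state output by the previous cell ($a^{<t-1>}$) together with the current input $x^{<t>}$, and produces the hidden state $a^{<t>}$ and the prediction $y^{<t>}$.
[Figure: RNN cells chained over the time steps of the input sequence]

Input sequence: $x = (x^{<1>}, x^{<2>}, \ldots, x^{<T_x>})$
Output: $y = (y^{<1>}, y^{<2>}, \ldots, y^{<T_x>})$

Exercise: implement the forward propagation of the RNN described in the figure above.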

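Instructions

Create an all-zero array a that will store every hidden state computed by the RNN
Initialize the "next" hidden state to $a_0$
Loop over the time steps, with t as the loop index
- Update the "next" hidden state and the cache by calling rnn_cell_forward
- Store the "next" hidden state in a (at the t-th position)
- Store the prediction in y
- Append the cache to caches
Return a, y and caches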

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).
    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    # Initialize "caches" which will contain the list of all caches
    caches = []
    
    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    
    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    
    # Initialize a_next (≈1 line)
    a_next = a0
    
    # loop over all time-step:
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # save the value of the new "next " hidden state in a 
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # append "cache" to "caches"
        caches.append(cache)
    # store values needed for backward propagation in cache
    caches = (caches, x)
    return a, y_pred, caches

np.random.seed(1)
x = np.random.randn(3,10,4)
a0 = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}

a, y_pred, caches = rnn_forward(x, a0, parameters)

print("a[4][1] = ", a[4][1])
print("a.shape = ", a.shape)
print("y_pred[1][3] =", y_pred[1][3])
print("y_pred.shape = ", y_pred.shape)
print("caches[1][1][3] =", caches[1][1][3])
print("len(caches) = ", len(caches))

3. LSTM

3.1 What is an LSTM

The figure below shows an LSTM cell.
[Figure: LSTM cell]

$a^{<t-1>}$: short-term memory (hidden state)
$c^{<t-1>}$: long-term memory (cell state)

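3.2 Forget gate

To illustrate, suppose we are reading words from a piece of text and want the LSTM to track grammatical structure, such as whether the subject is singular or plural. If the subject changes from singular to plural, we need a way to forget the stored singular/plural value. In an LSTM the forget gate works as follows.

Forget gate activation: $\Gamma^{<t>}_f = \sigma(W_f[a^{<t-1>}, x^{<t>}] + b_f)$

Forgetting step: $\Gamma^{<t>}_f * c^{<t-1>}$

Here $W_f$ is the weight matrix that governs how much the forget gate forgets. We concatenate $[a^{<t-1>}, x^{<t>}]$ and multiply the result by $W_f$. The result $\Gamma^{<t>}_f$ is a vector whose values lie between 0 and 1. This vector is then multiplied element-wise with the previous LSTM cell state $c^{<t-1>}$ (the long-term memory). If an element of $\Gamma^{<t>}_f$ is 0, the corresponding entry of $c^{<t-1>}$ is forgotten; if it is 1, that entry is kept.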
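
A small numeric sketch may help here. The pre-activation values below are hypothetical stand-ins for $W_f[a^{<t-1>}, x^{<t>}] + b_f$, chosen so that the three gate entries land near 0, 0.5 and 1.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activations for three memory slots
z_f = np.array([-20.0, 0.0, 20.0])
gamma_f = sigmoid(z_f)                 # roughly [0, 0.5, 1]

c_prev = np.array([3.0, 3.0, 3.0])     # previous long-term memory
kept = gamma_f * c_prev                # element-wise product

print(gamma_f)
print(kept)                            # first slot nearly erased, last slot nearly intact
```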

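3.3 Update gate

Once we have forgotten that the subject under discussion is singular, we need a way to record that the new subject is plural. This is the job of the update gate.

Update gate activation
$\Gamma^{<t>}_u = \sigma(W_u[a^{<t-1>}, x^{<t>}] + b_u)$

Like the forget gate, $\Gamma^{<t>}_u$ is a vector of values between 0 and 1. To compute $c^{<t>}$, it is multiplied element-wise with $\tilde{c}^{<t>}$.

Candidate value (what is newly learned)

To record the new subject, we combine the short-term memory of the previous LSTM cell ($a^{<t-1>}$) with the current input and compute the newly learned content:

$\tilde{c}^{<t>} = \tanh(W_c[a^{<t-1>}, x^{<t>}] + b_c)$

Updating the memory

Combining the formulas above, the memory that survives forgetting and the newly learned content together give the updated long-term memory that is passed to the next LSTM cell:

$c^{<t>} = \Gamma^{<t>}_f * c^{<t-1>} + \Gamma^{<t>}_u * \tilde{c}^{<t>}$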
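
The update rule can be traced by hand with a few made-up numbers. In the sketch below every value is hypothetical; the three slots show an erase-and-write, a keep-and-ignore, and a half-and-half blend.

```python
import numpy as np

c_prev  = np.array([2.0, -1.0,  0.5])   # previous long-term memory
gamma_f = np.array([0.0,  1.0,  0.5])   # forget gate: erase, keep, halve
gamma_u = np.array([1.0,  0.0,  0.5])   # update gate: write, ignore, half-write
c_tilde = np.array([0.8,  0.8, -0.6])   # candidate values, each in (-1, 1)

# c<t> = gamma_f * c<t-1> + gamma_u * c_tilde<t>, all element-wise
c_next = gamma_f * c_prev + gamma_u * c_tilde
print(c_next)   # slot 0 takes the candidate, slot 1 keeps the old value, slot 2 blends both
```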

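3.4 Output gate

To obtain the short-term memory that is passed to the next LSTM cell ($a^{<t>}$), we use the following formulas.

Output gate activation

$\Gamma^{<t>}_o = \sigma(W_o[a^{<t-1>}, x^{<t>}] + b_o)$

Short-term memory output

$a^{<t>} = \Gamma^{<t>}_o * \tanh(c^{<t>})$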
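
As a quick illustration with hypothetical values: the cell state can grow outside [-1, 1], but tanh squashes it and the output gate then scales each entry, so the hidden state always stays inside (-1, 1).

```python
import numpy as np

c_next  = np.array([5.0, -5.0, 0.1])    # cell state, not limited to [-1, 1]
gamma_o = np.array([1.0,  0.5, 0.0])    # output gate: expose fully, half, not at all

# a<t> = gamma_o * tanh(c<t>), element-wise
a_next = gamma_o * np.tanh(c_next)
print(a_next)
```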

3.5 LSTM cell

It may also help to first read about multiplying numpy arrays of different dimensions.
Instructions

Concatenate $a^{<t-1>}$ and $x^{<t>}$ into a single matrix: concat =
$\left[ \begin{array}{c} a^{<t-1>} \\ x^{<t>} \end{array}\right]$
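
The following sketch, with illustrative sizes and random values, shows one way to build the stacked matrix with `np.concatenate` and checks that multiplying by it is the same as applying the two column blocks of the weight matrix to $a^{<t-1>}$ and $x^{<t>}$ separately.

```python
import numpy as np

n_x, n_a, m = 3, 5, 4                              # illustrative sizes
rng = np.random.default_rng(2)
a_prev = rng.normal(size=(n_a, m))
xt = rng.normal(size=(n_x, m))
Wf = rng.normal(size=(n_a, n_a + n_x))

# Stack a_prev on top of xt along the feature axis
concat = np.concatenate([a_prev, xt], axis=0)      # shape (n_a + n_x, m)

# One product with the stacked matrix ...
combined = Wf @ concat
# ... equals the sum of the two block products
split = Wf[:, :n_a] @ a_prev + Wf[:, n_a:] @ xt

print(concat.shape)
print(np.allclose(combined, split))
```

This is why a single weight matrix of shape (n_a, n_a + n_x) per gate is enough: its first n_a columns act on the hidden state and the remaining n_x columns act on the input.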

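def sigmoid(x):
    # Helper used below: element-wise logistic function, maps any real value into (0, 1)
    return 1 / (1 + np.exp(-x))

def lstm_cell_forward(xt, a_prev, c_prev, parameters):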
    """
    Implement a single forward step of the LSTM cell
    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc --  Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        bo --  Bias of the output gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    Note: ft/it/ot stand for the forget/update/output gates, cct stands for the candidate value (c tilde),
          c stands for the memory value
    
    """
    # Retrieve the weights and biases from "parameters"
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]
    
    # Retrieve dimensions from the shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape
    
    # Concatenate a_prev and xt
    concat = np.zeros((n_x + n_a, m))
    concat[: n_a, :] = a_prev
    concat[n_a :, :] = xt
    
    # Compute ft, it, cct, c_next, ot, a_next
    ft = sigmoid(np.dot(Wf, concat) + bf)
    # print("ft shape = ", ft.shape)
    it = sigmoid(np.dot(Wi, concat) + bi)
    # print("it shape = ", it.shape)
    cct = np.tanh(np.dot(Wc, concat) + bc)
    # print("cct shape = ", cct.shape)
    c_next = ft * c_prev + it * cct
    # print("c_next shape = ", c_next.shape)
    ot = sigmoid(np.dot(Wo, concat) + bo)
    # print("ot shape = ", ot.shape)
    a_next = ot * np.tanh(c_next)
    # print("a_next shape = ", a_next.shape)
    # Compute the LSTM prediction
    yt_pred = softmax(np.dot(Wy, a_next) + by)
    # print("yt_pred shape = ", yt_pred.shape)
    # Store the values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    
    return a_next, c_next, yt_pred, cache

3.6 LSTM forward propagation

[Figure: LSTM cells chained over the time steps of the input sequence]
$c^{<0>}$ is initialized to all zeros.

def lstm_forward(x, a0, parameters):
    """
    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        bo -- Bias of the output gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    c -- Memory (cell) states for every time-step, numpy array of shape (n_a, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of all the caches, x)
    """
    # Initialize "caches", which will hold the cache from every time step
    caches = []
    
    # Retrieve dimensions from the shapes of x and parameters['Wy']
    n_x, m, T_x = x.shape
    n_y, n_a = parameters['Wy'].shape
    
    # Initialize a, c and y with zeros
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))
    
    # Initialize a_next and c_next
    a_next = a0
    c_next = np.zeros((n_a, m))
    
    # Loop over all time steps
    for t in range(T_x):
        # Update next hidden state, next memory state, compute the prediction, get the cache
        a_next, c_next, yt, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        
        # Save the value of the new "next" hidden state in a
        a[:, :, t] = a_next
        
        # Save the value of the prediction in y
        y[:, :, t] = yt
        
        # Save the value of the next cell state 
        c[:, :, t] = c_next
        
        # Append the cache into caches
        caches.append(cache)
        
    caches = (caches, x)
    
    return a, y, c, caches
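
The article gives no test run for the LSTM functions, so here is a self-contained sanity check. It is a sketch, not the author's code: `lstm_step` is a hypothetical compact restatement of the gate equations from sections 3.2 to 3.4, and the sizes and random parameters are assumptions made only for the check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def lstm_step(xt, a_prev, c_prev, p):
    """Compact single LSTM step using the gate equations from this section."""
    concat = np.concatenate([a_prev, xt], axis=0)
    ft = sigmoid(p["Wf"] @ concat + p["bf"])       # forget gate
    it = sigmoid(p["Wi"] @ concat + p["bi"])       # update gate
    cct = np.tanh(p["Wc"] @ concat + p["bc"])      # candidate value
    c_next = ft * c_prev + it * cct                # new long-term memory
    ot = sigmoid(p["Wo"] @ concat + p["bo"])       # output gate
    a_next = ot * np.tanh(c_next)                  # new short-term memory
    y = softmax(p["Wy"] @ a_next + p["by"])
    return a_next, c_next, y

n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 7             # illustrative sizes
rng = np.random.default_rng(1)
p = {}
for gate in "fico":                                # builds Wf/bf, Wi/bi, Wc/bc, Wo/bo
    p["W" + gate] = rng.normal(size=(n_a, n_a + n_x))
    p["b" + gate] = rng.normal(size=(n_a, 1))
p["Wy"] = rng.normal(size=(n_y, n_a))
p["by"] = rng.normal(size=(n_y, 1))

x = rng.normal(size=(n_x, m, T_x))
a_next = rng.normal(size=(n_a, m))                 # plays the role of a0
c_next = np.zeros((n_a, m))                        # c<0> starts at zero
a = np.zeros((n_a, m, T_x))
y = np.zeros((n_y, m, T_x))
for t in range(T_x):
    a_next, c_next, yt = lstm_step(x[:, :, t], a_next, c_next, p)
    a[:, :, t], y[:, :, t] = a_next, yt

print(a.shape, y.shape)
print(np.allclose(y.sum(axis=0), 1.0))             # each prediction column sums to 1
print(np.abs(a).max() < 1.0)                       # hidden states stay inside (-1, 1)
```

The two final checks follow from the equations: softmax outputs sum to 1 per example, and the hidden state is a gate value in (0, 1) times a tanh value in (-1, 1).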

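At this point we have built the forward propagation of a basic RNN and a basic LSTM. When you use a deep learning framework, implementing the forward pass is usually enough to build a well-performing system, because the framework derives the backward pass for you.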
