In the previous lecture we covered the principle and code implementation of the vanilla RNN. The vanilla RNN has a drawback: it suffers from vanishing gradients and cannot capture long-term dependencies. The LSTM redesigns the RNN's hidden layer, which mitigates the vanishing-gradient problem and allows information to be carried across many time steps in the network.
1. LSTM_cell_forward
The LSTM only changes the RNN's hidden layer. Each gate outputs values between 0 and 1, so the network can choose how much of the earlier information to keep: a value near 1 keeps it, a value near 0 discards it.
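For reference, one LSTM cell computes the equations below; they map one-to-one onto ft, it, cct, c_next, ot, a_next and yt_pred in the code that follows ($[a_{t-1}, x_t]$ denotes the concatenation of the previous hidden state with the current input, and $\odot$ is element-wise multiplication):

$f_t = \sigma(W_f [a_{t-1}, x_t] + b_f)$  (forget gate)
$i_t = \sigma(W_i [a_{t-1}, x_t] + b_i)$  (update gate)
$\tilde{c}_t = \tanh(W_c [a_{t-1}, x_t] + b_c)$  (candidate memory)
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$  (new memory state)
$o_t = \sigma(W_o [a_{t-1}, x_t] + b_o)$  (output gate)
$a_t = o_t \odot \tanh(c_t)$  (new hidden state)
$\hat{y}_t = \mathrm{softmax}(W_y a_t + b_y)$  (prediction)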
The code implementation is as follows:
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes its input into (0, 1); used by all three gates
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Column-wise softmax; subtracting the column max keeps the exponentials stable
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Arguments:
    xt -- input at timestep "t", numpy array of shape (n_x, m)
    a_prev -- previous hidden state, numpy array of shape (n_a, m)
    c_prev -- previous memory state, numpy array of shape (n_a, m)
    parameters -- dictionary containing:
        Wf -- weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        bf -- bias of the forget gate, numpy array of shape (n_a, 1)
        Wi -- weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        bi -- bias of the update gate, numpy array of shape (n_a, 1)
        Wc -- weight matrix of the candidate "tanh", numpy array of shape (n_a, n_a + n_x)
        bc -- bias of the candidate "tanh", numpy array of shape (n_a, 1)
        Wo -- weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        bo -- bias of the output gate, numpy array of shape (n_a, 1)
        Wy -- weight matrix relating the hidden state to the output, numpy array of shape (n_y, n_a)
        by -- bias relating the hidden state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass
    """
    # Unpack the parameters
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]

    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    # Concatenate a_prev and xt into a single (n_a + n_x, m) matrix
    concat = np.zeros((n_a + n_x, m))
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt

    ft = sigmoid(np.dot(Wf, concat) + bf)   # forget gate
    it = sigmoid(np.dot(Wi, concat) + bi)   # update gate
    cct = np.tanh(np.dot(Wc, concat) + bc)  # candidate memory
    c_next = ft * c_prev + it * cct         # new memory state
    ot = sigmoid(np.dot(Wo, concat) + bo)   # output gate
    a_next = ot * np.tanh(c_next)           # new hidden state
    yt_pred = softmax(np.dot(Wy, a_next) + by)

    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    return a_next, c_next, yt_pred, cache
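Before moving on, a quick smoke test can confirm the wiring. The snippet below is a minimal sketch on random data, assuming hypothetical sizes n_x = 3, n_a = 5, n_y = 2 and batch size m = 10, and checks only the output shapes:

np.random.seed(1)
n_x, n_a, n_y, m = 3, 5, 2, 10
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
c_prev = np.random.randn(n_a, m)
parameters = {
    "Wf": np.random.randn(n_a, n_a + n_x), "bf": np.random.randn(n_a, 1),
    "Wi": np.random.randn(n_a, n_a + n_x), "bi": np.random.randn(n_a, 1),
    "Wc": np.random.randn(n_a, n_a + n_x), "bc": np.random.randn(n_a, 1),
    "Wo": np.random.randn(n_a, n_a + n_x), "bo": np.random.randn(n_a, 1),
    "Wy": np.random.randn(n_y, n_a),       "by": np.random.randn(n_y, 1),
}
a_next, c_next, yt_pred, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print(a_next.shape)   # (5, 10)
print(c_next.shape)   # (5, 10)
print(yt_pred.shape)  # (2, 10)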
2. LSTM_forward
import numpy as np

def lstm_forward(x, a0, parameters):
    """
    Arguments:
    x -- input for every timestep, numpy array of shape (n_x, m, T_x)
    a0 -- initial hidden state, numpy array of shape (n_a, m)
    parameters -- same dictionary as in lstm_cell_forward:
        Wf, Wi, Wc, Wo of shape (n_a, n_a + n_x), bf, bi, bc, bo of shape (n_a, 1),
        Wy of shape (n_y, n_a), by of shape (n_y, 1)

    Returns:
    a -- hidden states for every timestep, of shape (n_a, m, T_x)
    y -- predictions for every timestep, of shape (n_y, m, T_x)
    c -- memory states for every timestep, of shape (n_a, m, T_x)
    caches -- tuple (list of per-timestep caches, x) for the backward pass
    """
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wy"].shape

    # Initialize the outputs and the running hidden/memory states
    a = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))
    c = np.zeros((n_a, m, T_x))
    a_next = a0
    c_next = np.zeros(a_next.shape)

    # Run the cell once per timestep, feeding each output state into the next step
    for t in range(T_x):
        a_next, c_next, yt_pred, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        a[:, :, t] = a_next
        y[:, :, t] = yt_pred
        c[:, :, t] = c_next
        caches.append(cache)

    caches = (caches, x)
    return a, y, c, caches
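To run a full sequence, we can reuse the parameters dictionary from the smoke test above; a sketch over a hypothetical sequence of T_x = 7 timesteps:

x = np.random.randn(n_x, m, 7)  # n_x = 3 inputs, m = 10 examples, T_x = 7 timesteps
a0 = np.random.randn(n_a, m)    # initial hidden state
a, y, c, caches = lstm_forward(x, a0, parameters)
print(a.shape)  # (5, 10, 7)
print(y.shape)  # (2, 10, 7)
print(c.shape)  # (5, 10, 7)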
lstm_forward is simply lstm_cell_forward repeated until the sequence length T_x is reached. Compared with the GRU, the LSTM is better at handling long-term dependencies, and it also mitigates the vanishing-gradient problem. Still, the GRU is quite popular today, mainly because its principle is simpler while its performance is not much worse than the LSTM's. In fact, the GRU appeared later than the LSTM.