Original article: https://www.luoyoyo.com/articles/04a1458ec185ad2fd05038f89d526641
Today I'd like to share how a recurrent neural network computes internally, taking the LSTM as the object of study. This post works from the Keras source code, combined with a detailed walkthrough from another blogger: [译] 理解 LSTM(Long Short-Term Memory, LSTM) 网络 - wangduo - 博客园. It is a classic, thorough post; do set aside time to read it and think it through carefully. Before we dive in, I'll assume you already have a little background in RNNs, a little linear algebra, and a little general deep learning.
OK, let's get started.
The figure above, taken from the blog linked earlier, shows the overall skeleton of the LSTM structure. The real structure is somewhat more involved; we'll work through the details shortly, adjusting this diagram as we go.
1. One-hot encoding
公 [0 0 0 0 1 0 0 0 0 0]
主 [0 0 0 1 0 0 0 0 0 0]
很 [0 0 1 0 0 0 0 0 0 0]
漂 [0 1 0 0 0 0 0 0 0 0]
亮 [1 0 0 0 0 0 0 0 0 0]
Suppose we have the sentence “公主很漂亮” (“the princess is very beautiful”). After one-hot encoding it becomes a tensor of shape (5, 10): five characters, and an assumed vocabulary of 10 characters in total, hence (5, 10).
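As a quick illustration, here is a minimal numpy sketch of that encoding. The 10-character vocabulary and the character-to-index mapping are made up for this example (the indices need not match the table above):

import numpy as np

# Hypothetical 10-character vocabulary; only the first five are used here.
vocab = ['公', '主', '很', '漂', '亮', '的', '了', '是', '我', '你']
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

sentence = '公主很漂亮'
one_hot = np.zeros((len(sentence), len(vocab)))  # shape (5, 10)
for t, ch in enumerate(sentence):
    one_hot[t, char_to_idx[ch]] = 1.0
print(one_hot.shape)  # (5, 10)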
So how does this sentence travel through the LSTM? What the earliest recurrent networks had to do was predict “主” from “公”, “很” from “主”, “漂” from “很”, and “亮” from “漂”. This process of predicting the next step from the previous one is what we call a “time slice” operation: “公主很漂亮” is split into 5 time slices, usually referred to as time_step (see the toy sketch below).
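Spelled out as (input, target) pairs for this next-character task, the slices look like this:

sentence = '公主很漂亮'
pairs = [(sentence[t], sentence[t + 1]) for t in range(len(sentence) - 1)]
print(pairs)  # [('公', '主'), ('主', '很'), ('很', '漂'), ('漂', '亮')]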
Concretely, the process goes like this: feed in x1 = “主”; after the LSTM, h1 comes out as “很”, and this h1 is the “short-term memory”; at the same time, c1 comes out as a state (a tensor), and this c1 is the “long-term memory”. Next, h1 is combined with x2 (this is not simple addition; we'll discuss this “combination” later) to take part in that time slice's computation, and c1 also enters the same computation; after the LSTM, h2 comes out as “漂” and c2 is the new state. And so on around the loop!
To sum up: the current input, combined with the previous step's “short-term memory” and the previous step's “long-term memory”, passes through the LSTM cell and produces the next “short-term memory” and the next “long-term memory”. That is what a recurrent network does, as the sketch below makes concrete.
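To make the recurrence concrete, here is a minimal numpy sketch of one LSTM step and the loop over the five time slices. It implements the standard LSTM equations and borrows Keras's convention of packing the four gates' weights into single matrices of shape (input_dim, 4 * units) and (units, 4 * units); the weights here are random stand-ins, so treat this as an illustrative sketch under those assumptions, not the actual Keras implementation (that comes next):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b, units):
    # One time slice: combine this step's input with the previous short-term (h)
    # and long-term (c) memories. Gate order [i, f, c, o] follows Keras's packing.
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:units])                    # input gate
    f = sigmoid(z[units:2 * units])           # forget gate
    c_bar = np.tanh(z[2 * units:3 * units])   # candidate long-term memory
    o = sigmoid(z[3 * units:])                # output gate
    c_t = f * c_prev + i * c_bar              # next long-term memory
    h_t = o * np.tanh(c_t)                    # next short-term memory (the output)
    return h_t, c_t

time_steps, input_dim, units = 5, 10, 8      # “公主很漂亮” -> (5, 10); units=8 is arbitrary
x = np.eye(input_dim)[:time_steps]           # stand-in one-hot rows for the 5 characters
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim, 4 * units))  # kernel (acts on the input)
U = rng.normal(size=(units, 4 * units))      # recurrent_kernel (acts on h)
b = np.zeros(4 * units)                      # bias

h, c = np.zeros(units), np.zeros(units)      # initial short-/long-term memories
for t in range(time_steps):
    h, c = lstm_step(x[t], h, c, W, U, b, units)
print(h.shape, c.shape)                      # (8,) (8,)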
Good. Now, armed with the source code, let's redraw that diagram.
Below is the Keras LSTMCell source code; feel free to skim it if you're interested:
class LSTMCell(Layer):
    """Cell class for the LSTM layer.

    # Arguments
        units: Positive integer, dimensionality of the output space.
        activation: Activation function to use
            (see [activations](../activations.md)).
            Default: hyperbolic tangent (`tanh`).
            If you pass `None`, no activation is applied
            (ie. "linear" activation: `a(x) = x`).
        recurrent_activation: Activation function to use
            for the recurrent step
            (see [activations](../activations.md)).
            Default: hard sigmoid (`hard_sigmoid`).
            If you pass `None`, no activation is applied
            (ie. "linear" activation: `a(x) = x`).
        use_bias: Boolean, whether the layer uses a bias vector.
        kernel_initializer: Initializer for the `kernel` weights matrix,
            used for the linear transformation of the inputs
            (see [initializers](../initializers.md)).
        recurrent_initializer: Initializer for the `recurrent_kernel`
            weights matrix,
            used for the linear transformation of the recurrent state
            (see [initializers](../initializers.md)).
        bias_initializer: Initializer for the bias vector
            (see [initializers](../initializers.md)).
        unit_forget_bias: Boolean.
            If True, add 1 to the bias of the forget gate at initialization.
            Setting it to true will also force `bias_initializer="zeros"`.
            This is recommended in [Jozefowicz et al.]
            (http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf).
        kernel_regularizer: Regularizer function applied to
            the `kernel` weights matrix
            (see [regularizer](../regularizers.md)).
        recurrent_regularizer: Regularizer function applied to
            the `recurrent_kernel` weights matrix
            (see [regularizer](../regularizers.md)).
        bias_regularizer: Regularizer function applied to the bias vector
            (see [regularizer](../regularizers.md)).
        kernel_constraint: Constraint function applied to
            the `kernel` weights matrix
            (see [constraints](../constraints.md)).
        recurrent_constraint: Constraint fu