Original article: https://www.luoyoyo.com/articles/04a1458ec185ad2fd05038f89d526641
Today I'd like to share how a recurrent neural network computes internally, taking the LSTM as the object of study. This post works from the Keras source code, combined with a detailed walkthrough from another blogger: [译] 理解 LSTM(Long Short-Term Memory, LSTM) 网络 - wangduo - 博客园. It is a classic, thorough post; do set aside time to read it and think it through carefully. Before we dive in, I'll assume you already have a little background in RNNs, a little linear algebra, and a little general deep learning.
OK, let's get started.
The figure above, taken from the blog linked earlier, shows the overall skeleton of the LSTM structure. The real structure is somewhat more involved; we'll work through the details shortly, adjusting this diagram as we go.
1. One-hot encoding
公 [0 0 0 0 1 0 0 0 0 0]
主 [0 0 0 1 0 0 0 0 0 0]
很 [0 0 1 0 0 0 0 0 0 0]
漂 [0 1 0 0 0 0 0 0 0 0]
亮 [1 0 0 0 0 0 0 0 0 0]
Suppose we have the sentence “公主很漂亮” (“the princess is very beautiful”). After one-hot encoding it becomes a tensor of shape (5, 10): five characters, and an assumed vocabulary of 10 characters in total, hence (5, 10).
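As a quick illustration, here is a minimal numpy sketch of that encoding. The 10-character vocabulary and the character-to-index mapping are made up for this example (the indices need not match the table above):

import numpy as np

# Hypothetical 10-character vocabulary; only the first five are used here.
vocab = ['公', '主', '很', '漂', '亮', '的', '了', '是', '我', '你']
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

sentence = '公主很漂亮'
one_hot = np.zeros((len(sentence), len(vocab)))  # shape (5, 10)
for t, ch in enumerate(sentence):
    one_hot[t, char_to_idx[ch]] = 1.0
print(one_hot.shape)  # (5, 10)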
So how does this sentence travel through the LSTM? What the earliest recurrent networks had to do was predict “主” from “公”, “很” from “主”, “漂” from “很”, and “亮” from “漂”. This process of predicting the next step from the previous one is what we call a “time slice” operation: “公主很漂亮” is split into 5 time slices, usually referred to as time_step (see the toy sketch below).
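Spelled out as (input, target) pairs for this next-character task, the slices look like this:

sentence = '公主很漂亮'
pairs = [(sentence[t], sentence[t + 1]) for t in range(len(sentence) - 1)]
print(pairs)  # [('公', '主'), ('主', '很'), ('很', '漂'), ('漂', '亮')]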
Concretely, the process goes like this: feed in x1 = “主”; after the LSTM, h1 comes out as “很”, and this h1 is the “short-term memory”; at the same time, c1 comes out as a state (a tensor), and this c1 is the “long-term memory”. Next, h1 is combined with x2 (this is not simple addition; we'll discuss this “combination” later) to take part in that time slice's computation, and c1 also enters the same computation; after the LSTM, h2 comes out as “漂” and c2 is the new state. And so on around the loop!
To sum up: the current input, combined with the previous step's “short-term memory” and the previous step's “long-term memory”, passes through the LSTM cell and produces the next “short-term memory” and the next “long-term memory”. That is what a recurrent network does, as the sketch below makes concrete.
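To make the recurrence concrete, here is a minimal numpy sketch of one LSTM step and the loop over the five time slices. It implements the standard LSTM equations and borrows Keras's convention of packing the four gates' weights into single matrices of shape (input_dim, 4 * units) and (units, 4 * units); the weights here are random stand-ins, so treat this as an illustrative sketch under those assumptions, not the actual Keras implementation (that comes next):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b, units):
    # One time slice: combine this step's input with the previous short-term (h)
    # and long-term (c) memories. Gate order [i, f, c, o] follows Keras's packing.
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:units])                    # input gate
    f = sigmoid(z[units:2 * units])           # forget gate
    c_bar = np.tanh(z[2 * units:3 * units])   # candidate long-term memory
    o = sigmoid(z[3 * units:])                # output gate
    c_t = f * c_prev + i * c_bar              # next long-term memory
    h_t = o * np.tanh(c_t)                    # next short-term memory (the output)
    return h_t, c_t

time_steps, input_dim, units = 5, 10, 8      # “公主很漂亮” -> (5, 10); units=8 is arbitrary
x = np.eye(input_dim)[:time_steps]           # stand-in one-hot rows for the 5 characters
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim, 4 * units))  # kernel (acts on the input)
U = rng.normal(size=(units, 4 * units))      # recurrent_kernel (acts on h)
b = np.zeros(4 * units)                      # bias

h, c = np.zeros(units), np.zeros(units)      # initial short-/long-term memories
for t in range(time_steps):
    h, c = lstm_step(x[t], h, c, W, U, b, units)
print(h.shape, c.shape)                      # (8,) (8,)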
Good. Now, armed with the source code, let's redraw that diagram.
Below is the Keras LSTMCell source code; feel free to skim it if you're interested:
class LSTMCell(Layer):
    """Cell class for the LSTM layer.

    # Arguments
        units: Positive integer, dimensionality of the output space.
        activation: Activation function to use
            (see [activations](../activations.md)).
            Default: hyperbolic tangent (`tanh`).
            If you pass `None`, no activation is applied
            (ie. "linear" activation: `a(x) = x`).
        recurrent_activation: Activation function to use
            for the recurrent step
            (see [activations](../activations.md)).
            Default: hard sigmoid (`hard_sigmoid`).
            If you pass `None`, no activation is applied
            (ie. "linear" activation: `a(x) = x`).
        use_bias: Boolean, whether the layer uses a bias vector.
        kernel_initializer: Initializer for the `kernel` weights matrix,
            used for the linear transformation of the inputs
            (see [initializers](../initializers.md)).
        recurrent_initializer: Initializer for the `recurrent_kernel`
            weights matrix,
            used for the linear transformation of the recurrent state
            (see [initializers](../initializers.md)).
        bias_initializer: Initializer for the bias vector
            (see [initializers](../initializers.md)).
        unit_forget_bias: Boolean.
            If True, add 1 to the bias of the forget gate at initialization.
            Setting it to true will also force `bias_initializer="zeros"`.
            This is recommended in [Jozefowicz et al.]
            (http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf).
        kernel_regularizer: Regularizer function applied to
            the `kernel` weights matrix
            (see [regularizer](../regularizers.md)).
        recurrent_regularizer: Regularizer function applied to
            the `recurrent_kernel` weights matrix
            (see [regularizer](../regularizers.md)).
        bias_regularizer: Regularizer function applied to the bias vector
            (see [regularizer](../regularizers.md)).
        kernel_constraint: Constraint function applied to
            the `kernel` weights matrix
            (see [constraints](../constraints.md)).
        recurrent_constraint: Constraint fu