Deep Learning Fundamentals 6: LSTM (with Code Analysis)

闵帆

Article link: https://blog.csdn.net/minfanphd/article/details/112257115

1. Model diagram

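The LSTM model is shown in Figure 1. The line running horizontally through the upper part of the cell is called the $\mathbf{c}$ bus, and the line through the lower part is called the $\mathbf{h}$ bus. This means that $\mathbf{c}_{t - 1}$ and $\mathbf{h}_{t - 1}$ both influence the computation at time $t$. Specifically:

From $x_t$ and the lower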

Figure 1. The LSTM model

2. Related techniques

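As its name (long short-term memory) suggests, LSTM is designed for sequential data in which both long-range and short-range dependencies matter.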

3. Code analysis

The program code is available at: https://github.com/garstka/char-rnn-java
To learn it, I go through the code method by method.

// Core forward-propagation code
// acts stores and retrieves real-valued 2-D matrices by string key
public void active(int t, Map<String, DoubleMatrix> acts) {
    // Input at time t
    DoubleMatrix x = acts.get("x" + t);
    // h and c from the previous time step (all zeros at t == 0)
    DoubleMatrix preH = null, preC = null;
    if (t == 0) {
        preH = new DoubleMatrix(1, getOutSize());
        preC = preH.dup();
    } else {
        preH = acts.get("h" + (t - 1));
        preC = acts.get("c" + (t - 1));
    }

    // Input gate and forget gate: both read x, the previous h, and the previous c
    DoubleMatrix i = Activer.logistic(x.mmul(Wxi).add(preH.mmul(Whi)).add(preC.mmul(Wci)).add(bi));
    DoubleMatrix f = Activer.logistic(x.mmul(Wxf).add(preH.mmul(Whf)).add(preC.mmul(Wcf)).add(bf));
    // Candidate cell state: no contribution from c
    DoubleMatrix gc = Activer.tanh(x.mmul(Wxc).add(preH.mmul(Whc)).add(bc));
    // New cell state: element-wise mix of the previous state and the candidate
    DoubleMatrix c = f.mul(preC).add(i.mul(gc));
    // Output gate: reads the updated c, not preC
    DoubleMatrix o = Activer.logistic(x.mmul(Wxo).add(preH.mmul(Who)).add(c.mmul(Wco)).add(bo));
    DoubleMatrix gh = Activer.tanh(c);
    DoubleMatrix h = o.mul(gh);

    // Store every matrix computed at this time step
    acts.put("i" + t, i);
    acts.put("f" + t, f);
    acts.put("gc" + t, gc);
    acts.put("c" + t, c);
    acts.put("o" + t, o);
    acts.put("gh" + t, gh);
    acts.put("h" + t, h);
}
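
The way `acts` threads state from one time step to the next can be illustrated in isolation. The following is only a sketch: it keeps the `"name" + t` key scheme of the method above but replaces the LSTM arithmetic with a toy recurrence (`h_t = 0.5 * h_{t-1} + x_t`) and uses plain `double[]` instead of `DoubleMatrix`. The class name `ActsCacheSketch` and the helper `run` are hypothetical and do not come from the repository.

```java
import java.util.HashMap;
import java.util.Map;

final class ActsCacheSketch {
    // Toy stand-in for the LSTM step (an assumption for illustration only):
    // h_t = 0.5 * h_{t-1} + x_t, computed element-wise.
    static void active(int t, Map<String, double[]> acts) {
        double[] x = acts.get("x" + t);
        // At t == 0 there is no previous state, so start from zeros.
        double[] preH = (t == 0) ? new double[x.length] : acts.get("h" + (t - 1));
        double[] h = new double[x.length];
        for (int k = 0; k < x.length; k++) {
            h[k] = 0.5 * preH[k] + x[k];
        }
        // Store the result under a time-stamped key so step t + 1 can find it.
        acts.put("h" + t, h);
    }

    // Feed a whole sequence through the step function, one time step at a time.
    static Map<String, double[]> run(double[][] inputs) {
        Map<String, double[]> acts = new HashMap<>();
        for (int t = 0; t < inputs.length; t++) {
            acts.put("x" + t, inputs[t]);
            active(t, acts);
        }
        return acts;
    }

    public static void main(String[] args) {
        Map<String, double[]> acts = run(new double[][] {{1.0}, {1.0}, {1.0}});
        System.out.println(acts.get("h2")[0]); // prints 1.75
    }
}
```

Because every time step keeps its own entries, all intermediate values remain available after the forward pass, which is presumably what a later backward pass would rely on.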

In the program I ran, $x_t$ is a one-hot encoded $1 \times 62$ vector, and $i_t$ through $h_t$ are all $1 \times 100$ vectors.
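
As a rough illustration of the one-hot input, the sketch below encodes a single character as a row vector of length 62. The vocabulary used here (digits plus upper- and lower-case letters) is an assumption chosen only because it has 62 symbols; the article does not say which characters the program actually uses. The names `OneHotDemo`, `VOCAB`, and `oneHot` are hypothetical.

```java
final class OneHotDemo {
    // Hypothetical 62-symbol vocabulary: 10 digits, 26 upper-case and 26 lower-case letters.
    // The real character set of the program is not stated in the article.
    static final String VOCAB =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Encode one character as a 1 x |VOCAB| row vector containing a single 1.0.
    static double[] oneHot(char ch) {
        int index = VOCAB.indexOf(ch);
        if (index < 0) {
            throw new IllegalArgumentException("Character not in vocabulary: " + ch);
        }
        double[] x = new double[VOCAB.length()];
        x[index] = 1.0;
        return x;
    }

    public static void main(String[] args) {
        double[] x = oneHot('A');
        System.out.println(x.length);           // prints 62
        System.out.println(VOCAB.indexOf('A')); // prints 10, the position of the 1.0
    }
}
```

With such an input, `x.mmul(Wxi)` simply selects one row of `Wxi`, which is why the hidden vectors can have a different width (100) from the input (62).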

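The code conveys more information than Figure 1. Matrix variables are combined through operations, and in many cases they are first multiplied by a weight matrix. To keep the structure clear, Figure 1 sacrifices some precision. Below, the vector computations are translated into mathematical expressions; all of these vectors are stored in the model.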

Vector $\mathbf{i}_t$ is the input gate at time $t$:
$\mathbf{i}_t = \sigma(\mathbf{W}^{xi} \cdot \mathbf{x}_t + \mathbf{W}^{hi} \cdot \mathbf{h}_{t - 1} + \mathbf{W}^{ci} \cdot \mathbf{c}_{t - 1} + \mathbf{b}^{i}) \tag{1}$
Vector $\mathbf{f}_t$ is the forget gate:
$\mathbf{f}_t = \sigma(\mathbf{W}^{xf} \cdot \mathbf{x}_t + \mathbf{W}^{hf} \cdot \mathbf{h}_{t - 1} + \mathbf{W}^{cf} \cdot \mathbf{c}_{t - 1} + \mathbf{b}^{f}) \tag{2}$
Vector $\mathbf{gc}_t$ is the candidate cell state:
$\mathbf{gc}_t = \tanh(\mathbf{W}^{xc} \cdot \mathbf{x}_t + \mathbf{W}^{hc} \cdot \mathbf{h}_{t - 1} + \mathbf{b}^{c}) \tag{3}$
Vector $\mathbf{c}_t$ is the new cell state. As in the code line `c = f.mul(preC).add(i.mul(gc))`, no activation function is applied here:
$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t - 1} + \mathbf{i}_{t} \odot \mathbf{gc}_t \tag{4}$
Vector $\mathbf{o}_t$ is the output gate. It reads the updated $\mathbf{c}_t$ rather than $\mathbf{c}_{t - 1}$:
$\mathbf{o}_t = \sigma(\mathbf{W}^{xo} \cdot \mathbf{x}_t + \mathbf{W}^{ho} \cdot \mathbf{h}_{t - 1} + \mathbf{W}^{co} \cdot \mathbf{c}_t + \mathbf{b}^{o}) \tag{5}$
Vector $\mathbf{gh}_t$ is the cell state squashed by $\tanh$:
$\mathbf{gh}_t = \tanh(\mathbf{c}_t) \tag{6}$
Vector $\mathbf{h}_t$ is the output at this time step:
$\mathbf{h}_t = \mathbf{o}_t \odot \mathbf{gh}_t \tag{7}$
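
To check the arithmetic by hand, the same forward step can be written with plain arrays. This is a minimal sketch that mirrors the `active` method above, not the repository's implementation: it uses `double[]` row vectors instead of `DoubleMatrix`, and the names `LstmStepSketch`, `mul`, `affine`, and `step`, as well as the `{i, f, c, o}` ordering of the weight arrays, are assumptions made for illustration.

```java
final class LstmStepSketch {
    // Row vector times matrix: (1 x n) * (n x m) -> (1 x m), like x.mmul(W).
    static double[] mul(double[] v, double[][] w) {
        int m = w[0].length;
        double[] out = new double[m];
        for (int r = 0; r < v.length; r++) {
            for (int col = 0; col < m; col++) {
                out[col] += v[r] * w[r][col];
            }
        }
        return out;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Pre-activation x*Wx + h*Wh + b, plus c*Wc when a cell-state matrix is given.
    static double[] affine(double[] x, double[][] wx, double[] h, double[][] wh,
                           double[] c, double[][] wc, double[] b) {
        double[] z = b.clone();
        double[] xs = mul(x, wx);
        double[] hs = mul(h, wh);
        double[] cs = (wc == null) ? null : mul(c, wc);
        for (int k = 0; k < z.length; k++) {
            z[k] += xs[k] + hs[k] + (cs == null ? 0.0 : cs[k]);
        }
        return z;
    }

    // One time step; returns {c_t, h_t}. Weight arrays are indexed 0..3 = {i, f, c, o}.
    static double[][] step(double[] x, double[] preH, double[] preC,
                           double[][][] wx, double[][][] wh, double[][][] wc, double[][] b) {
        int n = preH.length;
        double[] zi = affine(x, wx[0], preH, wh[0], preC, wc[0], b[0]);
        double[] zf = affine(x, wx[1], preH, wh[1], preC, wc[1], b[1]);
        double[] zg = affine(x, wx[2], preH, wh[2], null, null, b[2]); // no c term
        double[] c = new double[n];
        for (int k = 0; k < n; k++) {
            // New cell state: f * preC + i * gc, with no extra activation.
            c[k] = sigmoid(zf[k]) * preC[k] + sigmoid(zi[k]) * Math.tanh(zg[k]);
        }
        // The output gate is computed after c, because it reads the updated cell state.
        double[] zo = affine(x, wx[3], preH, wh[3], c, wc[3], b[3]);
        double[] h = new double[n];
        for (int k = 0; k < n; k++) {
            h[k] = sigmoid(zo[k]) * Math.tanh(c[k]);
        }
        return new double[][] {c, h};
    }

    public static void main(String[] args) {
        // All-zero weights: every gate equals sigmoid(0) = 0.5 and the candidate is tanh(0) = 0.
        double[][] out = step(new double[] {0, 1, 0}, new double[] {0, 0}, new double[] {1.0, -2.0},
                new double[4][3][2], new double[4][2][2], new double[4][2][2], new double[4][2]);
        System.out.println(out[0][0] + " " + out[0][1]); // prints 0.5 -1.0
    }
}
```

With all weights set to zero the result is easy to verify: each gate is 0.5, the candidate is 0, so the new cell state is half the previous one and the output is `0.5 * tanh(c)`.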
