【NLP】_08_LSTM & GRU

DamonDT

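Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/qq_34330456/article/details/99739269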

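【1】 Long Short-Term Memory Network (LSTM)

An LSTM uses three gates (the forget gate, the input gate, and the output gate) to control the flow of information, which mitigates the vanishing-gradient problem.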

【2】 Forget / Input / Output Gate (the three gates)

Forget Gate (each component lies in 0~1)

$\bm{f^{(t)}} = \sigma(w_f \cdot h_{(t-1)} + u_f \cdot x_t + b_f) \qquad \text{parameters: } (w_f, u_f, b_f)$

Input Gate (each component lies in 0~1)

$\bm{i^{(t)}} = \sigma(w_i \cdot h_{(t-1)} + u_i \cdot x_t + b_i) \qquad \text{parameters: } (w_i, u_i, b_i)$

Output Gate (each component lies in 0~1)

$\bm{o^{(t)}} = \sigma(w_o \cdot h_{(t-1)} + u_o \cdot x_t + b_o) \qquad \text{parameters: } (w_o, u_o, b_o)$
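All three gates share the same form and differ only in their parameters. The following is a minimal NumPy sketch of one such gate, not the author's implementation. The helper names `sigmoid` and `gate`, the sizes (hidden size 4, input size 3), and the random initialization are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes every component into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def gate(w, u, b, h_prev, x_t):
    # Generic gate: sigma(w . h_(t-1) + u . x_t + b).
    # Assumed shapes: w is (hidden, hidden), u is (hidden, inputs), b is (hidden,).
    return sigmoid(w @ h_prev + u @ x_t + b)

# Hypothetical sizes and random values, chosen only for this example.
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
w_f = rng.normal(size=(hidden, hidden))
u_f = rng.normal(size=(hidden, inputs))
b_f = np.zeros(hidden)
h_prev = rng.normal(size=hidden)
x_t = rng.normal(size=inputs)

f_t = gate(w_f, u_f, b_f, h_prev, x_t)  # forget-gate activations, one per hidden unit
```

Because the sigmoid output lies strictly between 0 and 1, each component of `f_t` acts as a soft switch: values near 0 block the corresponding component, values near 1 let it through.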

Extra Information (the candidate information obtained at time $t$)

$\bm{\tilde{c}^{(t)}} = \tanh(w_c \cdot h_{(t-1)} + u_c \cdot x_t + b_c) \qquad \text{parameters: } (w_c, u_c, b_c)$

Final Information (the final cell state at time $t$; $\circ$ denotes element-wise multiplication)

$\bm{c^{(t)}} = f^{(t)} \circ c^{(t-1)} + i^{(t)} \circ \tilde{c}^{(t)}$
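One way to see why this additive update helps against vanishing gradients is to differentiate the cell state with respect to the previous cell state. The derivation below is a simplified sketch: it treats $f^{(t)}$, $i^{(t)}$ and $\tilde{c}^{(t)}$ as constants with respect to $c^{(t-1)}$, which is an assumption (in the full model they depend on $h_{(t-1)}$ and therefore indirectly on earlier cell states).

$\dfrac{\partial c^{(t)}}{\partial c^{(t-1)}} \approx \mathrm{diag}(f^{(t)})$

$\dfrac{\partial c^{(t)}}{\partial c^{(k)}} \approx \prod_{j=k+1}^{t} \mathrm{diag}(f^{(j)}), \qquad k < t$

Under this simplification the gradient along the cell-state path is a product of forget-gate values rather than of repeated weight matrices, so it stays close to 1 whenever the forget gate stays close to 1.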

Final hidden layer (computing $h_t$ from the quantities above)

$\bm{h_t} = o^{(t)} \circ \tanh(c^{(t)})$
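To tie the gate equations together, here is a minimal NumPy sketch of a single LSTM time step, not a reference implementation. The function names `lstm_step` and `init_params`, the parameter dictionary layout, the sizes, and the initialization scale are all assumptions made for illustration. The hidden state is computed from the tanh of the current cell state $c^{(t)}$, as in the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names like 'w_f', 'u_f', 'b_f' to arrays."""
    f = sigmoid(p["w_f"] @ h_prev + p["u_f"] @ x_t + p["b_f"])        # forget gate
    i = sigmoid(p["w_i"] @ h_prev + p["u_i"] @ x_t + p["b_i"])        # input gate
    o = sigmoid(p["w_o"] @ h_prev + p["u_o"] @ x_t + p["b_o"])        # output gate
    c_tilde = np.tanh(p["w_c"] @ h_prev + p["u_c"] @ x_t + p["b_c"])  # candidate information
    c_t = f * c_prev + i * c_tilde   # element-wise: keep part of the old state, add part of the new
    h_t = o * np.tanh(c_t)           # hidden state uses the current cell state
    return h_t, c_t

def init_params(hidden, inputs, seed=0):
    # Hypothetical initializer: small random weights, zero biases.
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fioc":
        p[f"w_{g}"] = rng.normal(scale=0.1, size=(hidden, hidden))
        p[f"u_{g}"] = rng.normal(scale=0.1, size=(hidden, inputs))
        p[f"b_{g}"] = np.zeros(hidden)
    return p

# Run the cell over a short random sequence (5 steps, input size 3, hidden size 4).
params = init_params(hidden=4, inputs=3)
h, c = np.zeros(4), np.zeros(4)
xs = np.random.default_rng(1).normal(size=(5, 3))
for x_t in xs:
    h, c = lstm_step(x_t, h, c, params)
```

The cell state `c` is carried between steps alongside `h`; only `h` is bounded in (-1, 1), because it is a gate value times a tanh.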

【3】 LSTM Use Cases

【4】 Bi-LSTM (Bidirectional LSTM)

【5】 RNN · LSTM · Bi-LSTM - Comparison

【6】 GRU - Gated Recurrent Unit

Model architecture

Update Gate: new data arrives at time $t$; this gate decides how much of that new information is extracted and written into $h_t$.

$\bm{u^{(t)}} = \sigma(w_u \cdot h_{(t-1)} + u_u \cdot x_t + b_u) \qquad \text{parameters: } (w_u, u_u, b_u)$

Reset Gate: similar to the Forget Gate of the LSTM; it decides how much of the old information is forgotten or kept.

$\bm{r^{(t)}} = \sigma(w_r \cdot h_{(t-1)} + u_r \cdot x_t + b_r) \qquad \text{parameters: } (w_r, u_r, b_r)$

Extra Information (the candidate information obtained at time $t$)

$\bm{\tilde{c}^{(t)}} = \tanh(w_c \cdot (r^{(t)} \circ h_{(t-1)}) + u_c \cdot x_t + b_c) \qquad \text{parameters: } (w_c, u_c, b_c)$

Final hidden layer (computing $h_t$ from the quantities above)

$\bm{h_t} = (1 - u^{(t)}) \circ h_{(t-1)} + u^{(t)} \circ \tilde{c}^{(t)}$
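The GRU equations can be sketched in NumPy in the same way. This is an illustrative sketch under stated assumptions, not the author's code: the names `gru_step` and `init_gru_params`, the parameter dictionary layout, the sizes, and the initialization scale are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p maps names like 'w_u', 'u_u', 'b_u' to arrays."""
    u = sigmoid(p["w_u"] @ h_prev + p["u_u"] @ x_t + p["b_u"])  # update gate
    r = sigmoid(p["w_r"] @ h_prev + p["u_r"] @ x_t + p["b_r"])  # reset gate
    # The reset gate scales the old hidden state before it enters the candidate.
    c_tilde = np.tanh(p["w_c"] @ (r * h_prev) + p["u_c"] @ x_t + p["b_c"])
    # Interpolate between the old state and the candidate, per component.
    h_t = (1.0 - u) * h_prev + u * c_tilde
    return h_t

def init_gru_params(hidden, inputs, seed=0):
    # Hypothetical initializer: small random weights, zero biases.
    rng = np.random.default_rng(seed)
    p = {}
    for g in "urc":
        p[f"w_{g}"] = rng.normal(scale=0.1, size=(hidden, hidden))
        p[f"u_{g}"] = rng.normal(scale=0.1, size=(hidden, inputs))
        p[f"b_{g}"] = np.zeros(hidden)
    return p

# Run the cell over a short random sequence (5 steps, input size 3, hidden size 4).
gru_params = init_gru_params(hidden=4, inputs=3)
h_gru = np.zeros(4)
for x_t in np.random.default_rng(1).normal(size=(5, 3)):
    h_gru = gru_step(x_t, h_gru, gru_params)
```

Compared with the LSTM equations earlier in the article, the GRU has no separate cell state and uses three parameter sets (update, reset, candidate) instead of four. When the update gate is close to 0 the old hidden state is copied through almost unchanged; when it is close to 1 the state is replaced by the candidate.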
