RNN, LSTM, GRU, Attention, and Transformer Formula Summary (Detailed, with Figures and Formulas)

胤风

Last modified: 2023-10-01 10:59:52


First published: 2020-08-11 23:58:24

Article link: https://blog.csdn.net/MoreAction_/article/details/107946186


A collection of the formulas for several sequence models, kept for later interview review.

RNN

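Formulas:
$h_{t}=f\left(W \cdot\left[h_{t-1}, x_{t}\right]+b\right)$
or, equivalently by block matrix multiplication (split the weight matrix above column-wise into a block that multiplies $h_{t-1}$, written $W$ again below, and a block $U$ that multiplies $x_{t}$):
$h_{t}=f\left(W \cdot h_{t-1}+U \cdot x_{t}+b\right)$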
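The equivalence of the two RNN forms can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: it assumes $f=\tanh$ (the formulas leave $f$ unspecified), and the function names, dimensions, and random weights are hypothetical, chosen only for illustration.

```python
import numpy as np

def rnn_step_concat(W, b, h_prev, x_t):
    """h_t = f(W . [h_{t-1}, x_t] + b), with f = tanh as an assumption."""
    return np.tanh(W @ np.concatenate([h_prev, x_t]) + b)

def rnn_step_split(W_h, U, b, h_prev, x_t):
    """h_t = f(W_h . h_{t-1} + U . x_t + b), the split form of the same step."""
    return np.tanh(W_h @ h_prev + U @ x_t + b)

rng = np.random.default_rng(0)
hidden, inputs = 4, 3  # illustrative sizes

W = rng.normal(size=(hidden, hidden + inputs))
b = rng.normal(size=hidden)
h_prev = rng.normal(size=hidden)
x_t = rng.normal(size=inputs)

# Split W column-wise: the first `hidden` columns act on h_{t-1}, the rest on x_t.
W_h, U = W[:, :hidden], W[:, hidden:]

h_concat = rnn_step_concat(W, b, h_prev, x_t)
h_split = rnn_step_split(W_h, U, b, h_prev, x_t)
equivalent = np.allclose(h_concat, h_split)
print(equivalent)
```

The two results agree because multiplying a concatenated vector by a matrix is the same as multiplying each part by the matching column block and summing.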

LSTM

Formulas ($\sigma$ is the sigmoid function and $*$ the element-wise product):
Forget gate: $f_{t}=\sigma\left(W_{f} \cdot\left[h_{t-1}, x_{t}\right]+b_{f}\right)$
Input gate: $i_{t}=\sigma\left(W_{i} \cdot\left[h_{t-1}, x_{t}\right]+b_{i}\right)$
Candidate cell state: $\tilde{C}_{t}=\tanh \left(W_{C} \cdot\left[h_{t-1}, x_{t}\right]+b_{C}\right)$
Cell state update: $C_{t}=f_{t} * C_{t-1}+i_{t} * \tilde{C}_{t}$
Output gate: $o_{t}=\sigma\left(W_{o} \cdot\left[h_{t-1}, x_{t}\right]+b_{o}\right)$
Output (hidden state): $h_{t}=o_{t} * \tanh \left(C_{t}\right)$
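The six LSTM formulas map almost line for line onto code. Below is a minimal NumPy sketch of a single step, run over a short random sequence. The `params` dictionary layout, the helper names, the sizes, and the zero initial states are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(params, h_prev, c_prev, x_t):
    """One LSTM step; `params` holds W_f, W_i, W_C, W_o and the matching biases."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                     # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3  # illustrative sizes

params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(size=(hidden, hidden + inputs))
    params[f"b_{gate}"] = np.zeros(hidden)

# Zero initial states (an assumption), then five random inputs.
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):
    h, c = lstm_step(params, h, c, x_t)
print(h, c)
```

Because $h_t$ is a sigmoid output times a $\tanh$ output, every component of `h` stays within $[-1, 1]$, while `c` is not bounded in that way.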

GRU

Formulas (bias terms omitted):
Update gate: $z_{t}=\sigma\left(W_{z} \cdot\left[h_{t-1}, x_{t}\right]\right)$
Reset gate: $r_{t}=\sigma\left(W_{r} \cdot\left[h_{t-1}, x_{t}\right]\right)$
Candidate state: $\tilde{h}_{t}=\tanh \left(W \cdot\left[r_{t} * h_{t-1}, x_{t}\right]\right)$
State update: $h_{t}=\left(1-z_{t}\right) * h_{t-1}+z_{t} * \tilde{h}_{t}$
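A GRU step can be sketched the same way. This is a minimal NumPy illustration that follows the four formulas without bias terms; the function name, sizes, random weights, and zero initial state are assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(W_z, W_r, W, h_prev, x_t):
    """One GRU step without biases, matching the formulas above."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat)                                 # update gate
    r_t = sigmoid(W_r @ concat)                                 # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    return (1 - z_t) * h_prev + z_t * h_tilde                   # interpolate old and new

rng = np.random.default_rng(0)
hidden, inputs = 4, 3  # illustrative sizes

W_z = rng.normal(size=(hidden, hidden + inputs))
W_r = rng.normal(size=(hidden, hidden + inputs))
W = rng.normal(size=(hidden, hidden + inputs))

h = np.zeros(hidden)  # zero initial state (an assumption)
for x_t in rng.normal(size=(5, inputs)):
    h = gru_step(W_z, W_r, W, h, x_t)
print(h)
```

The last line of `gru_step` is a per-component convex combination of $h_{t-1}$ and $\tilde{h}_{t}$, so a state that starts inside $[-1, 1]$ stays there.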

Recommended reading on RNNs: https://www.jianshu.com/p/4b4701beba92

Attention Mechanism

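Attention can be computed in many ways; the formulas below show only one commonly used variant, whose computation resembles the q/k/v of the Transformer. They take the first decoder state $\mathrm{s}_{0}$ as the example. The encoder input has length $m$, with encoder hidden states $\mathbf{h}_{1}, \cdots, \mathbf{h}_{m}$, and the matrices $\mathbf{W}$ are parameters learned during training.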

Formulas:
Compute $\mathrm{q}$: $\mathrm{q}_{0}=\mathbf{W}_{Q} \cdot \mathrm{s}_{0}$
Compute $\mathrm{k}$: $\mathrm{k}_{i}=\mathbf{W}_{K} \cdot \mathbf{h}_{i},$ for $i = 1$ to $m$
Score each position: $\tilde{\alpha}_{i}=\mathrm{k}_{i}^{T} \mathrm{q}_{0},$ for $i = 1$ to $m$
Softmax normalization: $\left[\alpha_{1}, \cdots, \alpha_{m}\right]=\operatorname{Softmax}\left(\left[\tilde{\alpha}_{1}, \cdots, \tilde{\alpha}_{m}\right]\right)$, that is, $\alpha_{i}=\exp \left(\tilde{\alpha}_{i}\right) / \sum_{j=1}^{m} \exp \left(\tilde{\alpha}_{j}\right)$
Finally, compute the current context vector: $c_{0}=\alpha_{1} \mathbf{h}_{1}+\cdots+\alpha_{m} \mathbf{h}_{m}$
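The attention steps above can be sketched in a few lines of NumPy. This is only an illustration of this one variant: the dimension names (`d_s`, `d_h`, `d_a`), the function names, and the random inputs are hypothetical, and the encoder states are stacked as rows of a matrix `H`.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))  # subtract the max for numerical stability
    return e / e.sum()

def attention_context(W_Q, W_K, s0, H):
    """H has shape (m, d_h): one encoder hidden state h_i per row."""
    q0 = W_Q @ s0          # query from the decoder state, shape (d_a,)
    K = H @ W_K.T          # row i is k_i = W_K h_i, shape (m, d_a)
    scores = K @ q0        # unnormalized score k_i^T q_0 per position, shape (m,)
    alpha = softmax(scores)
    c0 = alpha @ H         # weighted sum of the h_i, shape (d_h,)
    return alpha, c0

rng = np.random.default_rng(0)
m, d_s, d_h, d_a = 6, 5, 4, 3  # illustrative sizes

W_Q = rng.normal(size=(d_a, d_s))
W_K = rng.normal(size=(d_a, d_h))
s0 = rng.normal(size=d_s)
H = rng.normal(size=(m, d_h))

alpha, c0 = attention_context(W_Q, W_K, s0, H)
print(alpha.sum(), c0)
```

Note that in this variant the weights multiply the encoder states $\mathbf{h}_{i}$ directly; there is no separate value projection.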

Transformer

The Transformer's formulas are hard to write out in full, so only a few key ones are given below.

Formulas:
Compute $Q$, $K$, $V$: $Q=X W^{Q}$, $K=X W^{K}$, $V=X W^{V}$ (each row of $X$ is one token, the convention that the attention and feed-forward formulas below assume)
Self-attention: $\operatorname{Attention}(Q, K, V)=\operatorname{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V$
Feed-forward layer: $\operatorname{FFN}(Z)=\max \left(0, Z W_{1}+b_{1}\right) W_{2}+b_{2}$
Positional encoding:
$PE_{(pos, 2i)}=\sin \left(\frac{pos}{10000^{\frac{2 i}{d_{model}}}}\right)$, $PE_{(pos, 2i+1)}=\cos \left(\frac{pos}{10000^{\frac{2 i}{d_{model}}}}\right)$
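The Transformer formulas can be tried out together in a small NumPy sketch. It covers a single attention head only, with $d_k = d_v = d_{model}$ so that shapes line up without an output projection; multi-head attention, residual connections, and layer normalization are left out. All function names and sizes are assumptions, and `d_model` is assumed to be even.

```python
import numpy as np

def softmax_rows(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))  # stabilized row-wise softmax
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    """Single-head scaled dot-product self-attention; rows of X are tokens."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_k = K.shape[-1]
    weights = softmax_rows(Q @ K.T / np.sqrt(d_k))  # (n, n), each row sums to 1
    return weights @ V, weights

def ffn(Z, W_1, b_1, W_2, b_2):
    """Position-wise feed-forward layer: ReLU between two linear maps."""
    return np.maximum(0, Z @ W_1 + b_1) @ W_2 + b_2

def positional_encoding(n_pos, d_model):
    """Sinusoidal encoding: sin on even columns, cos on odd columns."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
n, d_model, d_ff = 5, 8, 16  # illustrative sizes

pe = positional_encoding(n, d_model)
X = rng.normal(size=(n, d_model)) + pe  # token embeddings plus position information

W_Q, W_K, W_V = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Z, weights = self_attention(X, W_Q, W_K, W_V)

W_1, b_1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W_2, b_2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
out = ffn(Z, W_1, b_1, W_2, b_2)
print(out.shape)
```

At position 0 every angle is zero, so the first row of `pe` alternates between 0 (sine columns) and 1 (cosine columns), which is a quick sanity check on the indexing.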