A Quick Review of Recurrent Neural Networks

Recurrent Neural Networks


Recurrent neural networks with hidden states

$$\begin{aligned}
\mathbf{H}_t &= \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h),\\
\mathbf{O}_t &= \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q.
\end{aligned}$$
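The update above can be sketched in a few lines of numpy. Everything here (shapes, the choice of tanh for the activation, the random weights) is an illustrative assumption, not a trained model.

```python
import numpy as np

def rnn_step(X_t, H_prev, W_xh, W_hh, b_h):
    """One recurrent step: H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), phi = tanh."""
    return np.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)

def output_layer(H_t, W_hq, b_q):
    """Output: O_t = H_t W_hq + b_q."""
    return H_t @ W_hq + b_q

rng = np.random.default_rng(0)
batch, d, h, q = 2, 4, 3, 5          # batch size, input dim, hidden dim, output dim
X_t = rng.standard_normal((batch, d))
H = np.zeros((batch, h))             # initial hidden state
W_xh = rng.standard_normal((d, h)) * 0.1
W_hh = rng.standard_normal((h, h)) * 0.1
b_h = np.zeros(h)
W_hq = rng.standard_normal((h, q)) * 0.1
b_q = np.zeros(q)

H = rnn_step(X_t, H, W_xh, W_hh, b_h)
O = output_layer(H, W_hq, b_q)
```

The same `rnn_step` is applied at every time step with shared weights; only `H` carries information forward.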

Character-level language model based on an RNN: the input sequence and the label sequence are "machin" and "achine", respectively

Perplexity: the exponential of the average negative log-likelihood per token:
$$\frac{1}{n} \sum_{t=1}^n -\log P(x_t \mid x_{t-1}, \ldots, x_1), \qquad \exp\left(-\frac{1}{n} \sum_{t=1}^n \log P(x_t \mid x_{t-1}, \ldots, x_1)\right).$$
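A minimal sketch of this computation; the per-token probabilities below are made up for illustration.

```python
import math

def perplexity(token_probs):
    """exp(-(1/n) * sum log P(x_t | x_{t-1}, ..., x_1))."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns probability 1 to every target token has perplexity 1
# (best case); uniform guesses over a vocabulary of size V give perplexity V.
ppl_perfect = perplexity([1.0, 1.0, 1.0])
ppl_uniform = perplexity([0.25] * 8)   # uniform over a 4-token vocabulary -> 4.0
```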

Gradient clipping:
$$\mathbf{g} \leftarrow \min\left(1, \frac{\theta}{\|\mathbf{g}\|}\right) \mathbf{g}.$$
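In code, this is a rescale of the gradient vector whenever its norm exceeds the threshold $\theta$; the gradient below is a toy example.

```python
import numpy as np

def clip_gradient(g, theta):
    """Rescale g to norm theta if ||g|| > theta; otherwise leave it unchanged."""
    norm = np.linalg.norm(g)
    if norm > theta:
        g = (theta / norm) * g
    return g

g = np.array([3.0, 4.0])             # ||g|| = 5
clipped = clip_gradient(g, theta=1.0)  # direction kept, norm shrunk to 1
```

Clipping bounds the update size without changing the gradient's direction, which is why it tames exploding gradients in RNN training.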

Gated Recurrent Units (GRU)


GRU

$$\begin{aligned}
\mathbf{R}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xr} + \mathbf{H}_{t-1} \mathbf{W}_{hr} + \mathbf{b}_r),\\
\mathbf{Z}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xz} + \mathbf{H}_{t-1} \mathbf{W}_{hz} + \mathbf{b}_z),\\
\tilde{\mathbf{H}}_t &= \tanh(\mathbf{X}_t \mathbf{W}_{xh} + (\mathbf{R}_t \odot \mathbf{H}_{t-1}) \mathbf{W}_{hh} + \mathbf{b}_h),\\
\mathbf{H}_t &= \mathbf{Z}_t \odot \mathbf{H}_{t-1} + (1 - \mathbf{Z}_t) \odot \tilde{\mathbf{H}}_t.
\end{aligned}$$
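A direct transcription of these four equations; the weights are random placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(X_t, H_prev, p):
    """One GRU step following the equations above."""
    R = sigmoid(X_t @ p["W_xr"] + H_prev @ p["W_hr"] + p["b_r"])   # reset gate
    Z = sigmoid(X_t @ p["W_xz"] + H_prev @ p["W_hz"] + p["b_z"])   # update gate
    H_tilde = np.tanh(X_t @ p["W_xh"] + (R * H_prev) @ p["W_hh"] + p["b_h"])
    return Z * H_prev + (1 - Z) * H_tilde                          # interpolate

rng = np.random.default_rng(1)
d, h = 4, 3
p = {k: rng.standard_normal(s) * 0.1 for k, s in {
    "W_xr": (d, h), "W_hr": (h, h), "b_r": (h,),
    "W_xz": (d, h), "W_hz": (h, h), "b_z": (h,),
    "W_xh": (d, h), "W_hh": (h, h), "b_h": (h,),
}.items()}
X_t = rng.standard_normal((2, d))
H = gru_step(X_t, np.zeros((2, h)), p)
```

Note the update gate interpolates between the old state and the candidate: when `Z` is near 1 the old state is kept almost unchanged, which lets gradients flow across long spans.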

Long Short-Term Memory (LSTM)


LSTM

$$\begin{aligned}
\mathbf{I}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xi} + \mathbf{H}_{t-1} \mathbf{W}_{hi} + \mathbf{b}_i),\\
\mathbf{F}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xf} + \mathbf{H}_{t-1} \mathbf{W}_{hf} + \mathbf{b}_f),\\
\mathbf{O}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xo} + \mathbf{H}_{t-1} \mathbf{W}_{ho} + \mathbf{b}_o),\\
\tilde{\mathbf{C}}_t &= \tanh(\mathbf{X}_t \mathbf{W}_{xc} + \mathbf{H}_{t-1} \mathbf{W}_{hc} + \mathbf{b}_c),\\
\mathbf{C}_t &= \mathbf{F}_t \odot \mathbf{C}_{t-1} + \mathbf{I}_t \odot \tilde{\mathbf{C}}_t,\\
\mathbf{H}_t &= \mathbf{O}_t \odot \tanh(\mathbf{C}_t).
\end{aligned}$$
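The six equations map directly to code. As before, the weights below are random illustrative placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(X_t, H_prev, C_prev, p):
    """One LSTM step: gates, candidate cell, cell update, hidden state."""
    I = sigmoid(X_t @ p["W_xi"] + H_prev @ p["W_hi"] + p["b_i"])   # input gate
    F = sigmoid(X_t @ p["W_xf"] + H_prev @ p["W_hf"] + p["b_f"])   # forget gate
    O = sigmoid(X_t @ p["W_xo"] + H_prev @ p["W_ho"] + p["b_o"])   # output gate
    C_tilde = np.tanh(X_t @ p["W_xc"] + H_prev @ p["W_hc"] + p["b_c"])
    C = F * C_prev + I * C_tilde       # memory cell carries long-term state
    H = O * np.tanh(C)                 # hidden state exposed to the next layer
    return H, C

rng = np.random.default_rng(2)
d, h = 4, 3
shapes = {}
for g in "ifoc":
    shapes[f"W_x{g}"] = (d, h)
    shapes[f"W_h{g}"] = (h, h)
    shapes[f"b_{g}"] = (h,)
p = {k: rng.standard_normal(s) * 0.1 for k, s in shapes.items()}
X_t = rng.standard_normal((2, d))
H, C = lstm_step(X_t, np.zeros((2, h)), np.zeros((2, h)), p)
```

Unlike the GRU, the LSTM keeps two recurrent states: the cell `C` (unbounded, additive updates) and the hidden state `H` (gated, bounded by tanh).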

Deep Recurrent Neural Networks


A deep recurrent neural network

$$\begin{aligned}
\mathbf{H}_t^{(l)} &= \phi_l(\mathbf{H}_t^{(l-1)} \mathbf{W}_{xh}^{(l)} + \mathbf{H}_{t-1}^{(l)} \mathbf{W}_{hh}^{(l)} + \mathbf{b}_h^{(l)}),\\
\mathbf{O}_t &= \mathbf{H}_t^{(L)} \mathbf{W}_{hq} + \mathbf{b}_q.
\end{aligned}$$
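Stacking is just a loop over layers, with layer $l$ consuming layer $l-1$'s hidden state as input. The sketch below assumes tanh activations and two layers; all names are illustrative.

```python
import numpy as np

def deep_rnn_step(X_t, H_prevs, params):
    """One time step of a stacked RNN.

    H_prevs: list of per-layer hidden states from time t-1.
    params:  list of (W_xh, W_hh, b_h) per layer.
    """
    H_new = []
    inp = X_t                            # layer 0's input is the data X_t
    for (W_xh, W_hh, b_h), H_prev in zip(params, H_prevs):
        inp = np.tanh(inp @ W_xh + H_prev @ W_hh + b_h)
        H_new.append(inp)                # this layer's state feeds the next layer
    return H_new                         # H_new[-1] feeds the output layer

rng = np.random.default_rng(3)
batch, d, h, L = 2, 4, 3, 2
dims = [d] + [h] * L                     # input dim of each layer
params = [(rng.standard_normal((dims[l], h)) * 0.1,
           rng.standard_normal((h, h)) * 0.1,
           np.zeros(h)) for l in range(L)]
H_states = deep_rnn_step(rng.standard_normal((batch, d)),
                         [np.zeros((batch, h)) for _ in range(L)], params)
```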

Bidirectional Recurrent Neural Networks


Architecture of a bidirectional recurrent neural network

$$\begin{aligned}
\overrightarrow{\mathbf{H}}_t &= \phi(\mathbf{X}_t \mathbf{W}_{xh}^{(f)} + \overrightarrow{\mathbf{H}}_{t-1} \mathbf{W}_{hh}^{(f)} + \mathbf{b}_h^{(f)}),\\
\overleftarrow{\mathbf{H}}_t &= \phi(\mathbf{X}_t \mathbf{W}_{xh}^{(b)} + \overleftarrow{\mathbf{H}}_{t+1} \mathbf{W}_{hh}^{(b)} + \mathbf{b}_h^{(b)}),\\
\mathbf{O}_t &= \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q,
\end{aligned}$$

where $\mathbf{H}_t$ is the concatenation of the forward state $\overrightarrow{\mathbf{H}}_t$ and the backward state $\overleftarrow{\mathbf{H}}_t$.
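Operationally, the backward pass is the same recurrence run on the reversed sequence, with its outputs re-reversed before concatenation. A sketch under those assumptions:

```python
import numpy as np

def rnn_pass(X, W_xh, W_hh, b_h):
    """Run a tanh RNN over X of shape (steps, batch, d); return states per step."""
    H = np.zeros((X.shape[1], W_hh.shape[0]))
    outs = []
    for x_t in X:
        H = np.tanh(x_t @ W_xh + H @ W_hh + b_h)
        outs.append(H)
    return outs

rng = np.random.default_rng(4)
steps, batch, d, h = 5, 2, 4, 3
X = rng.standard_normal((steps, batch, d))
fwd_p = (rng.standard_normal((d, h)) * 0.1, rng.standard_normal((h, h)) * 0.1, np.zeros(h))
bwd_p = (rng.standard_normal((d, h)) * 0.1, rng.standard_normal((h, h)) * 0.1, np.zeros(h))

H_fwd = rnn_pass(X, *fwd_p)
H_bwd = rnn_pass(X[::-1], *bwd_p)[::-1]   # reverse input, then re-align outputs
# Concatenate forward and backward states at each step: shape (batch, 2h).
H_cat = [np.concatenate([f, b], axis=1) for f, b in zip(H_fwd, H_bwd)]
```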

Bidirectional RNNs are ill-suited to inference tasks such as next-token prediction, since future context is unavailable at prediction time; they are mainly used for feature extraction.

Encoder and Decoder


The encoder-decoder architecture

Sequence-to-Sequence Learning (seq2seq)


Sequence-to-sequence learning with an RNN encoder and an RNN decoder
Layers in the RNN encoder-decoder model
Predicting the output sequence token by token with the RNN encoder-decoder

Model evaluation with BLEU:
$$\exp\left(\min\left(0, 1 - \frac{\mathrm{len}_{\text{label}}}{\mathrm{len}_{\text{pred}}}\right)\right) \prod_{n=1}^k p_n^{1/2^n},$$
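Here $p_n$ is the modified $n$-gram precision, and the exponential factor is a brevity penalty that punishes predictions shorter than the label. A minimal sketch (tokenized toy sentences, default $k=2$):

```python
import math
from collections import Counter

def bleu(pred_tokens, label_tokens, k=2):
    """BLEU as above: brevity penalty times product of p_n^(1/2^n)."""
    len_pred, len_label = len(pred_tokens), len(label_tokens)
    score = math.exp(min(0.0, 1 - len_label / len_pred))   # brevity penalty
    for n in range(1, k + 1):
        pred_ngrams = Counter(tuple(pred_tokens[i:i + n])
                              for i in range(len_pred - n + 1))
        label_ngrams = Counter(tuple(label_tokens[i:i + n])
                               for i in range(len_label - n + 1))
        # Clipped matches: each label n-gram can be credited at most its count.
        matches = sum(min(c, label_ngrams[g]) for g, c in pred_ngrams.items())
        score *= (matches / max(len_pred - n + 1, 1)) ** (1 / 2 ** n)
    return score

perfect = bleu("the cat sat".split(), "the cat sat".split())   # exact match
partial = bleu("the cat".split(), "the cat sat".split())       # short prediction
```

The $1/2^n$ exponents give longer $n$-gram matches more weight, since they are harder to achieve.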

Beam Search


The beam search process (beam width: 2, maximum output length: 3)

A length penalty term is added so that the search does not always select short sequences.
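The sketch below shows beam search with length-normalized scoring, $\frac{1}{L^\alpha}\log P$. The next-token "model" is a hand-made lookup table, and `alpha=0.75` is an assumed hyperparameter, purely for illustration.

```python
import math

def step_probs(prefix):
    """Dummy conditional distribution P(next token | prefix)."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"a": 0.1, "b": 0.5, "<eos>": 0.4},
        ("b",): {"a": 0.7, "b": 0.1, "<eos>": 0.2},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=3, alpha=0.75):
    beams = [((), 0.0)]                      # (sequence, sum of log-probs)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, lp in beams:
            for tok, p in step_probs(seq).items():
                cand = (seq + (tok,), lp + math.log(p))
                # Sequences ending in <eos> are set aside as complete.
                (finished if tok == "<eos>" else candidates).append(cand)
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished += beams
    # Length-normalized score (1/L^alpha) * log P penalizes overly short outputs.
    return max(finished, key=lambda c: c[1] / len(c[0]) ** alpha)[0]

best = beam_search()
```

Without normalization, summed log-probabilities only decrease as sequences grow, so raw scores systematically favor early termination; dividing by $L^\alpha$ counteracts that bias.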
