A Quick Review of Recurrent Neural Networks

Recurrent Neural Networks


Recurrent neural networks with hidden states

$$\begin{aligned}
\mathbf{H}_t &= \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h),\\
\mathbf{O}_t &= \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q.
\end{aligned}$$
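The update above can be sketched in a few lines of numpy. Everything here (shapes, the choice of tanh for the activation, the random weights) is an illustrative assumption, not a trained model.

```python
import numpy as np

def rnn_step(X_t, H_prev, W_xh, W_hh, b_h):
    """One recurrent step: H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), phi = tanh."""
    return np.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)

def output_layer(H_t, W_hq, b_q):
    """Output: O_t = H_t W_hq + b_q."""
    return H_t @ W_hq + b_q

rng = np.random.default_rng(0)
batch, d, h, q = 2, 4, 3, 5          # batch size, input dim, hidden dim, output dim
X_t = rng.standard_normal((batch, d))
H = np.zeros((batch, h))             # initial hidden state
W_xh = rng.standard_normal((d, h)) * 0.1
W_hh = rng.standard_normal((h, h)) * 0.1
b_h = np.zeros(h)
W_hq = rng.standard_normal((h, q)) * 0.1
b_q = np.zeros(q)

H = rnn_step(X_t, H, W_xh, W_hh, b_h)
O = output_layer(H, W_hq, b_q)
```

The same `rnn_step` is applied at every time step with shared weights; only `H` carries information forward.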

Character-level language model based on an RNN: the input sequence and the label sequence are "machin" and "achine", respectively

Perplexity: the exponential of the average negative log-likelihood per token:
$$\frac{1}{n} \sum_{t=1}^n -\log P(x_t \mid x_{t-1}, \ldots, x_1), \qquad \exp\left(-\frac{1}{n} \sum_{t=1}^n \log P(x_t \mid x_{t-1}, \ldots, x_1)\right).$$
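A minimal sketch of this computation; the per-token probabilities below are made up for illustration.

```python
import math

def perplexity(token_probs):
    """exp(-(1/n) * sum log P(x_t | x_{t-1}, ..., x_1))."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns probability 1 to every target token has perplexity 1
# (best case); uniform guesses over a vocabulary of size V give perplexity V.
ppl_perfect = perplexity([1.0, 1.0, 1.0])
ppl_uniform = perplexity([0.25] * 8)   # uniform over a 4-token vocabulary -> 4.0
```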

Gradient clipping:
$$\mathbf{g} \leftarrow \min\left(1, \frac{\theta}{\|\mathbf{g}\|}\right) \mathbf{g}.$$
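In code, this is a rescale of the gradient vector whenever its norm exceeds the threshold $\theta$; the gradient below is a toy example.

```python
import numpy as np

def clip_gradient(g, theta):
    """Rescale g to norm theta if ||g|| > theta; otherwise leave it unchanged."""
    norm = np.linalg.norm(g)
    if norm > theta:
        g = (theta / norm) * g
    return g

g = np.array([3.0, 4.0])             # ||g|| = 5
clipped = clip_gradient(g, theta=1.0)  # direction kept, norm shrunk to 1
```

Clipping bounds the update size without changing the gradient's direction, which is why it tames exploding gradients in RNN training.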

Gated Recurrent Units (GRU)


GRU

$$\begin{aligned}
\mathbf{R}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xr} + \mathbf{H}_{t-1} \mathbf{W}_{hr} + \mathbf{b}_r),\\
\mathbf{Z}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xz} + \mathbf{H}_{t-1} \mathbf{W}_{hz} + \mathbf{b}_z),\\
\tilde{\mathbf{H}}_t &= \tanh(\mathbf{X}_t \mathbf{W}_{xh} + (\mathbf{R}_t \odot \mathbf{H}_{t-1}) \mathbf{W}_{hh} + \mathbf{b}_h),\\
\mathbf{H}_t &= \mathbf{Z}_t \odot \mathbf{H}_{t-1} + (1 - \mathbf{Z}_t) \odot \tilde{\mathbf{H}}_t.
\end{aligned}$$
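A direct transcription of these four equations; the weights are random placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(X_t, H_prev, p):
    """One GRU step following the equations above."""
    R = sigmoid(X_t @ p["W_xr"] + H_prev @ p["W_hr"] + p["b_r"])   # reset gate
    Z = sigmoid(X_t @ p["W_xz"] + H_prev @ p["W_hz"] + p["b_z"])   # update gate
    H_tilde = np.tanh(X_t @ p["W_xh"] + (R * H_prev) @ p["W_hh"] + p["b_h"])
    return Z * H_prev + (1 - Z) * H_tilde                          # interpolate

rng = np.random.default_rng(1)
d, h = 4, 3
p = {k: rng.standard_normal(s) * 0.1 for k, s in {
    "W_xr": (d, h), "W_hr": (h, h), "b_r": (h,),
    "W_xz": (d, h), "W_hz": (h, h), "b_z": (h,),
    "W_xh": (d, h), "W_hh": (h, h), "b_h": (h,),
}.items()}
X_t = rng.standard_normal((2, d))
H = gru_step(X_t, np.zeros((2, h)), p)
```

Note the update gate interpolates between the old state and the candidate: when `Z` is near 1 the old state is kept almost unchanged, which lets gradients flow across long spans.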

Long Short-Term Memory (LSTM)


LSTM

$$\begin{aligned}
\mathbf{I}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xi} + \mathbf{H}_{t-1} \mathbf{W}_{hi} + \mathbf{b}_i),\\
\mathbf{F}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xf} + \mathbf{H}_{t-1} \mathbf{W}_{hf} + \mathbf{b}_f),\\
\mathbf{O}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xo} + \mathbf{H}_{t-1} \mathbf{W}_{ho} + \mathbf{b}_o),\\
\tilde{\mathbf{C}}_t &= \tanh(\mathbf{X}_t \mathbf{W}_{xc} + \mathbf{H}_{t-1} \mathbf{W}_{hc} + \mathbf{b}_c),\\
\mathbf{C}_t &= \mathbf{F}_t \odot \mathbf{C}_{t-1} + \mathbf{I}_t \odot \tilde{\mathbf{C}}_t,\\
\mathbf{H}_t &= \mathbf{O}_t \odot \tanh(\mathbf{C}_t).
\end{aligned}$$
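The six equations map directly to code. As before, the weights below are random illustrative placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(X_t, H_prev, C_prev, p):
    """One LSTM step: gates, candidate cell, cell update, hidden state."""
    I = sigmoid(X_t @ p["W_xi"] + H_prev @ p["W_hi"] + p["b_i"])   # input gate
    F = sigmoid(X_t @ p["W_xf"] + H_prev @ p["W_hf"] + p["b_f"])   # forget gate
    O = sigmoid(X_t @ p["W_xo"] + H_prev @ p["W_ho"] + p["b_o"])   # output gate
    C_tilde = np.tanh(X_t @ p["W_xc"] + H_prev @ p["W_hc"] + p["b_c"])
    C = F * C_prev + I * C_tilde       # memory cell carries long-term state
    H = O * np.tanh(C)                 # hidden state exposed to the next layer
    return H, C

rng = np.random.default_rng(2)
d, h = 4, 3
shapes = {}
for g in "ifoc":
    shapes[f"W_x{g}"] = (d, h)
    shapes[f"W_h{g}"] = (h, h)
    shapes[f"b_{g}"] = (h,)
p = {k: rng.standard_normal(s) * 0.1 for k, s in shapes.items()}
X_t = rng.standard_normal((2, d))
H, C = lstm_step(X_t, np.zeros((2, h)), np.zeros((2, h)), p)
```

Unlike the GRU, the LSTM keeps two recurrent states: the cell `C` (unbounded, additive updates) and the hidden state `H` (gated, bounded by tanh).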

Deep Recurrent Neural Networks


A deep recurrent neural network

$$\begin{aligned}
\mathbf{H}_t^{(l)} &= \phi_l(\mathbf{H}_t^{(l-1)} \mathbf{W}_{xh}^{(l)} + \mathbf{H}_{t-1}^{(l)} \mathbf{W}_{hh}^{(l)} + \mathbf{b}_h^{(l)}),\\
\mathbf{O}_t &= \mathbf{H}_t^{(L)} \mathbf{W}_{hq} + \mathbf{b}_q.
\end{aligned}$$
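Stacking is just a loop over layers, with layer $l$ consuming layer $l-1$'s hidden state as input. The sketch below assumes tanh activations and two layers; all names are illustrative.

```python
import numpy as np

def deep_rnn_step(X_t, H_prevs, params):
    """One time step of a stacked RNN.

    H_prevs: list of per-layer hidden states from time t-1.
    params:  list of (W_xh, W_hh, b_h) per layer.
    """
    H_new = []
    inp = X_t                            # layer 0's input is the data X_t
    for (W_xh, W_hh, b_h), H_prev in zip(params, H_prevs):
        inp = np.tanh(inp @ W_xh + H_prev @ W_hh + b_h)
        H_new.append(inp)                # this layer's state feeds the next layer
    return H_new                         # H_new[-1] feeds the output layer

rng = np.random.default_rng(3)
batch, d, h, L = 2, 4, 3, 2
dims = [d] + [h] * L                     # input dim of each layer
params = [(rng.standard_normal((dims[l], h)) * 0.1,
           rng.standard_normal((h, h)) * 0.1,
           np.zeros(h)) for l in range(L)]
H_states = deep_rnn_step(rng.standard_normal((batch, d)),
                         [np.zeros((batch, h)) for _ in range(L)], params)
```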

Bidirectional Recurrent Neural Networks


Architecture of a bidirectional recurrent neural network

$$\begin{aligned}
\overrightarrow{\mathbf{H}}_t &= \phi(\mathbf{X}_t \mathbf{W}_{xh}^{(f)} + \overrightarrow{\mathbf{H}}_{t-1} \mathbf{W}_{hh}^{(f)} + \mathbf{b}_h^{(f)}),\\
\overleftarrow{\mathbf{H}}_t &= \phi(\mathbf{X}_t \mathbf{W}_{xh}^{(b)} + \overleftarrow{\mathbf{H}}_{t+1} \mathbf{W}_{hh}^{(b)} + \mathbf{b}_h^{(b)}),\\
\mathbf{O}_t &= \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q,
\end{aligned}$$

where $\mathbf{H}_t$ is the concatenation of the forward state $\overrightarrow{\mathbf{H}}_t$ and the backward state $\overleftarrow{\mathbf{H}}_t$.
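Operationally, the backward pass is the same recurrence run on the reversed sequence, with its outputs re-reversed before concatenation. A sketch under those assumptions:

```python
import numpy as np

def rnn_pass(X, W_xh, W_hh, b_h):
    """Run a tanh RNN over X of shape (steps, batch, d); return states per step."""
    H = np.zeros((X.shape[1], W_hh.shape[0]))
    outs = []
    for x_t in X:
        H = np.tanh(x_t @ W_xh + H @ W_hh + b_h)
        outs.append(H)
    return outs

rng = np.random.default_rng(4)
steps, batch, d, h = 5, 2, 4, 3
X = rng.standard_normal((steps, batch, d))
fwd_p = (rng.standard_normal((d, h)) * 0.1, rng.standard_normal((h, h)) * 0.1, np.zeros(h))
bwd_p = (rng.standard_normal((d, h)) * 0.1, rng.standard_normal((h, h)) * 0.1, np.zeros(h))

H_fwd = rnn_pass(X, *fwd_p)
H_bwd = rnn_pass(X[::-1], *bwd_p)[::-1]   # reverse input, then re-align outputs
# Concatenate forward and backward states at each step: shape (batch, 2h).
H_cat = [np.concatenate([f, b], axis=1) for f, b in zip(H_fwd, H_bwd)]
```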

Bidirectional RNNs are ill-suited to inference tasks such as next-token prediction, since future context is unavailable at prediction time; they are mainly used for feature extraction.

Encoder and Decoder


The encoder-decoder architecture

Sequence-to-Sequence Learning (seq2seq)


Sequence-to-sequence learning with an RNN encoder and an RNN decoder
Layers in the RNN encoder-decoder model
Predicting the output sequence token by token with the RNN encoder-decoder

Model evaluation with BLEU:
$$\exp\left(\min\left(0, 1 - \frac{\mathrm{len}_{\text{label}}}{\mathrm{len}_{\text{pred}}}\right)\right) \prod_{n=1}^k p_n^{1/2^n},$$
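Here $p_n$ is the modified $n$-gram precision, and the exponential factor is a brevity penalty that punishes predictions shorter than the label. A minimal sketch (tokenized toy sentences, default $k=2$):

```python
import math
from collections import Counter

def bleu(pred_tokens, label_tokens, k=2):
    """BLEU as above: brevity penalty times product of p_n^(1/2^n)."""
    len_pred, len_label = len(pred_tokens), len(label_tokens)
    score = math.exp(min(0.0, 1 - len_label / len_pred))   # brevity penalty
    for n in range(1, k + 1):
        pred_ngrams = Counter(tuple(pred_tokens[i:i + n])
                              for i in range(len_pred - n + 1))
        label_ngrams = Counter(tuple(label_tokens[i:i + n])
                               for i in range(len_label - n + 1))
        # Clipped matches: each label n-gram can be credited at most its count.
        matches = sum(min(c, label_ngrams[g]) for g, c in pred_ngrams.items())
        score *= (matches / max(len_pred - n + 1, 1)) ** (1 / 2 ** n)
    return score

perfect = bleu("the cat sat".split(), "the cat sat".split())   # exact match
partial = bleu("the cat".split(), "the cat sat".split())       # short prediction
```

The $1/2^n$ exponents give longer $n$-gram matches more weight, since they are harder to achieve.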

Beam Search


The beam search process (beam width: 2, maximum output length: 3)

A length penalty term is added so that the search does not always select short sequences.
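The sketch below shows beam search with length-normalized scoring, $\frac{1}{L^\alpha}\log P$. The next-token "model" is a hand-made lookup table, and `alpha=0.75` is an assumed hyperparameter, purely for illustration.

```python
import math

def step_probs(prefix):
    """Dummy conditional distribution P(next token | prefix)."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"a": 0.1, "b": 0.5, "<eos>": 0.4},
        ("b",): {"a": 0.7, "b": 0.1, "<eos>": 0.2},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=3, alpha=0.75):
    beams = [((), 0.0)]                      # (sequence, sum of log-probs)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, lp in beams:
            for tok, p in step_probs(seq).items():
                cand = (seq + (tok,), lp + math.log(p))
                # Sequences ending in <eos> are set aside as complete.
                (finished if tok == "<eos>" else candidates).append(cand)
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished += beams
    # Length-normalized score (1/L^alpha) * log P penalizes overly short outputs.
    return max(finished, key=lambda c: c[1] / len(c[0]) ** alpha)[0]

best = beam_search()
```

Without normalization, summed log-probabilities only decrease as sequences grow, so raw scores systematically favor early termination; dividing by $L^\alpha$ counteracts that bias.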
