Error Backpropagation Formulas for a Basic Recurrent Neural Network

1. Forward propagation formulas

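$$z^{t}=U\cdot x^{t}+W\cdot h^{t-1}+B$$

$$h^{t}=f\left(z^{t}\right)$$

$$o^{t}=V\cdot h^{t}+C$$

$$y^{t}=g\left(o^{t}\right)$$

$$L=\sum_{t=1}^{T}l^{t}$$

$$l^{t}=e\left(y^{t}\right)$$

Here $x^{t}$, $h^{t}$, $z^{t}$, $o^{t}$ and $y^{t}$ are column vectors, $h^{0}$ is the initial hidden state, $f$ and $g$ are element-wise activation functions, and $\odot$ denotes the element-wise product. Gradients are read off the trace form of the differential: if $dL=tr\left(G^{T}\cdot dX\right)$, then $\frac{\partial L}{\partial X}=G$. The derivations below also use $tr\left(A\cdot B\right)=tr\left(B\cdot A\right)$ and, for vectors, $a^{T}\cdot\left(b\odot c\right)=\left(a\odot b\right)^{T}\cdot c$.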
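To make the forward equations concrete, here is a minimal NumPy sketch. The text keeps $f$, $g$ and $e$ generic; the choices below are assumptions for illustration only: $f=\tanh$, $g$ = sigmoid, $e$ = half squared error against hypothetical targets, $h^{0}=0$, and random weights. `rnn_forward` is a hypothetical helper name.

```python
import numpy as np

# Illustrative choices (assumptions, not from the text):
#   f = tanh, g = sigmoid (both element-wise), e(y^t) = 0.5 * ||y^t - target^t||^2
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rnn_forward(U, W, B, V, C, xs, targets, h0):
    """Run the forward equations; return the total loss L and per-step values."""
    h_prev, L, steps = h0, 0.0, []
    for x, tgt in zip(xs, targets):
        z = U @ x + W @ h_prev + B          # z^t = U x^t + W h^{t-1} + B
        h = np.tanh(z)                      # h^t = f(z^t)
        o = V @ h + C                       # o^t = V h^t + C
        y = sigmoid(o)                      # y^t = g(o^t)
        l = 0.5 * np.sum((y - tgt) ** 2)    # l^t = e(y^t)
        L += l                              # L = sum_t l^t
        steps.append({"x": x, "h_prev": h_prev, "z": z, "h": h, "o": o, "y": y, "l": l})
        h_prev = h
    return L, steps

rng = np.random.default_rng(0)
n_x, n_h, n_y, T = 3, 4, 2, 5               # hypothetical sizes
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
xs, targets = rng.normal(size=(T, n_x)), rng.uniform(size=(T, n_y))
L, steps = rnn_forward(U, W, B, V, C, xs, targets, h0=np.zeros(n_h))
print(f"L = {L:.4f} over {len(steps)} steps")
```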

2. $\frac{\partial L}{\partial V}$, $\frac{\partial L}{\partial C}$

$$
\begin{aligned}
dL&=tr\left(\sum_{t=1}^{T}\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\cdot dy^{t}\right)\\
&=tr\left(\sum_{t=1}^{T}\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\cdot\left(g'\left(o^{t}\right)\odot\left(dV\cdot h^{t}+dC\right)\right)\right)\\
&=tr\left(\sum_{t=1}^{T}\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\odot\left(g'\left(o^{t}\right)\right)^{T}\right)\cdot\left(dV\cdot h^{t}+dC\right)\right)\\
&=tr\left(\sum_{t=1}^{T}h^{t}\cdot\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\odot\left(g'\left(o^{t}\right)\right)^{T}\right)\cdot dV+\sum_{t=1}^{T}\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\odot\left(g'\left(o^{t}\right)\right)^{T}\right)\cdot dC\right)
\end{aligned}
$$

$$\frac{\partial L}{\partial V}=\sum_{t=1}^{T}\left(\frac{\partial l^{t}}{\partial y^{t}}\odot g'\left(o^{t}\right)\right)\cdot\left(h^{t}\right)^{T}$$

$$\frac{\partial L}{\partial C}=\sum_{t=1}^{T}\frac{\partial l^{t}}{\partial y^{t}}\odot g'\left(o^{t}\right)$$
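A quick way to sanity-check these two formulas is to compare them with central finite differences. This sketch assumes, for illustration only, $f=\tanh$, $g$ = sigmoid (so $g'\left(o^{t}\right)=y^{t}\odot\left(1-y^{t}\right)$), a half squared-error loss with hypothetical targets (so $\frac{\partial l^{t}}{\partial y^{t}}=y^{t}-\text{target}^{t}$) and $h^{0}=0$. `grad_V_C` and `numeric_grad` are hypothetical helper names.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^t = 0.5 * ||y^t - target^t||^2, h^0 = 0.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rnn_forward(U, W, B, V, C, xs, targets):
    h, L, steps = np.zeros(W.shape[0]), 0.0, []
    for x, tgt in zip(xs, targets):
        h = np.tanh(U @ x + W @ h + B)          # h^t = f(z^t)
        y = sigmoid(V @ h + C)                  # y^t = g(o^t)
        L += 0.5 * np.sum((y - tgt) ** 2)
        steps.append((h, y, tgt))
    return L, steps

def grad_V_C(V, C, steps):
    """dL/dV = sum_t (dl^t/dy^t ⊙ g'(o^t)) (h^t)^T,  dL/dC = sum_t dl^t/dy^t ⊙ g'(o^t)."""
    dV, dC = np.zeros_like(V), np.zeros_like(C)
    for h, y, tgt in steps:
        delta = (y - tgt) * y * (1.0 - y)       # dl/dy = y - tgt; sigmoid'(o) = y (1 - y)
        dV += np.outer(delta, h)
        dC += delta
    return dV, dC

def numeric_grad(loss_fn, P, eps=1e-6):
    """Central finite differences, perturbing P in place one entry at a time."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        up = loss_fn()
        P[idx] = old - eps
        down = loss_fn()
        P[idx] = old
        G[idx] = (up - down) / (2 * eps)
    return G

rng = np.random.default_rng(1)
n_x, n_h, n_y, T = 3, 4, 2, 5
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
xs, targets = rng.normal(size=(T, n_x)), rng.uniform(size=(T, n_y))

loss = lambda: rnn_forward(U, W, B, V, C, xs, targets)[0]
dV, dC = grad_V_C(V, C, rnn_forward(U, W, B, V, C, xs, targets)[1])
print(np.abs(dV - numeric_grad(loss, V)).max(), np.abs(dC - numeric_grad(loss, C)).max())
```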

3. $\frac{\partial L}{\partial U}$, $\frac{\partial L}{\partial W}$, $\frac{\partial L}{\partial B}$

Here $\frac{\partial L}{\partial z^{t}}$ is the gradient derived in section 5, and $dz^{t}$ is the part of the change in $z^{t}$ caused directly by $dU$, $dW$ and $dB$ at step $t$ (with $x^{t}$ and $h^{t-1}$ held fixed). The indirect effect through $h^{t-1}$ is already accounted for by the term at step $t-1$, because $\frac{\partial L}{\partial z^{t-1}}$ includes everything downstream of $z^{t-1}$.

$$
\begin{aligned}
dL&=tr\left(\sum_{t=1}^{T}\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot dz^{t}\right)\\
&=tr\left(\sum_{t=1}^{T}\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot\left(dU\cdot x^{t}+dW\cdot h^{t-1}+dB\right)\right)\\
&=tr\left(\sum_{t=1}^{T}x^{t}\cdot\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot dU+\sum_{t=1}^{T}h^{t-1}\cdot\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot dW+\sum_{t=1}^{T}\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot dB\right)
\end{aligned}
$$

$$\frac{\partial L}{\partial U}=\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}\cdot\left(x^{t}\right)^{T}$$

$$\frac{\partial L}{\partial W}=\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}\cdot\left(h^{t-1}\right)^{T}$$

$$\frac{\partial L}{\partial B}=\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}$$
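The sketch below assembles $\frac{\partial L}{\partial U}$, $\frac{\partial L}{\partial W}$ and $\frac{\partial L}{\partial B}$ from $\frac{\partial L}{\partial z^{t}}$, which it computes with the backward recursion of section 5.3, and compares the result with central finite differences. For illustration only it assumes $f=\tanh$, $g$ = sigmoid, a half squared-error loss with hypothetical targets and $h^{0}=0$; `dL_dz_all`, `grad_U_W_B` and `numeric_grad` are hypothetical helper names.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^t = 0.5 * ||y^t - target^t||^2, h^0 = 0.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rnn_forward(U, W, B, V, C, xs, targets):
    h, L, steps = np.zeros(W.shape[0]), 0.0, []
    for x, tgt in zip(xs, targets):
        h_prev = h
        h = np.tanh(U @ x + W @ h_prev + B)          # h^t = f(z^t)
        y = sigmoid(V @ h + C)                       # y^t = g(o^t)
        L += 0.5 * np.sum((y - tgt) ** 2)
        steps.append({"x": x, "h_prev": h_prev, "h": h, "y": y, "tgt": tgt})
    return L, steps

def dL_dz_all(V, W, steps):
    """Section 5.3 recursion, run backward from dL/dz^{T+1} = 0."""
    dz, dz_next = [None] * len(steps), np.zeros(W.shape[0])
    for t in reversed(range(len(steps))):
        s = steps[t]
        delta = (s["y"] - s["tgt"]) * s["y"] * (1.0 - s["y"])   # dl^t/dy^t ⊙ g'(o^t)
        f_prime = 1.0 - s["h"] ** 2                             # f'(z^t) = 1 - (h^t)^2 for tanh
        dz[t] = (V.T @ delta + W.T @ dz_next) * f_prime         # both terms share ⊙ f'(z^t)
        dz_next = dz[t]
    return dz

def grad_U_W_B(V, W, steps):
    dz = dL_dz_all(V, W, steps)
    dU = sum(np.outer(d, s["x"]) for d, s in zip(dz, steps))        # sum_t dL/dz^t (x^t)^T
    dW = sum(np.outer(d, s["h_prev"]) for d, s in zip(dz, steps))   # sum_t dL/dz^t (h^{t-1})^T
    dB = sum(dz)                                                    # sum_t dL/dz^t
    return dU, dW, dB

def numeric_grad(loss_fn, P, eps=1e-6):
    """Central finite differences, perturbing P in place one entry at a time."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        up = loss_fn()
        P[idx] = old - eps
        down = loss_fn()
        P[idx] = old
        G[idx] = (up - down) / (2 * eps)
    return G

rng = np.random.default_rng(2)
n_x, n_h, n_y, T = 3, 4, 2, 5
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
xs, targets = rng.normal(size=(T, n_x)), rng.uniform(size=(T, n_y))

loss = lambda: rnn_forward(U, W, B, V, C, xs, targets)[0]
dU, dW, dB = grad_U_W_B(V, W, rnn_forward(U, W, B, V, C, xs, targets)[1])
for name, analytic, P in [("U", dU, U), ("W", dW, W), ("B", dB, B)]:
    print(name, np.abs(analytic - numeric_grad(loss, P)).max())
```

Swapping in another element-wise $f$ or $g$ only changes the two derivative factors in the code (`1 - h**2` for tanh and `y * (1 - y)` for sigmoid).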

4. $\frac{\partial L}{\partial x^{t}}$

$$dL=tr\left(\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot dz^{t}\right)=tr\left(\left(\frac{\partial L}{\partial z^{t}}\right)^{T}\cdot U\cdot dx^{t}\right)$$

$$\frac{\partial L}{\partial x^{t}}=U^{T}\cdot\frac{\partial L}{\partial z^{t}}$$
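Once $\frac{\partial L}{\partial z^{t}}$ is known, the input gradient is one matrix-vector product per step. This sketch recomputes $\frac{\partial L}{\partial z^{t}}$ with the section 5.3 recursion and checks $U^{T}\cdot\frac{\partial L}{\partial z^{t}}$ against finite differences on the inputs. It assumes, for illustration only, $f=\tanh$, $g$ = sigmoid, a half squared-error loss with hypothetical targets and $h^{0}=0$; the helper names are hypothetical.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^t = 0.5 * ||y^t - target^t||^2, h^0 = 0.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rnn_forward(U, W, B, V, C, xs, targets):
    h, L, steps = np.zeros(W.shape[0]), 0.0, []
    for x, tgt in zip(xs, targets):
        h = np.tanh(U @ x + W @ h + B)                 # h^t = f(z^t)
        y = sigmoid(V @ h + C)                         # y^t = g(o^t)
        L += 0.5 * np.sum((y - tgt) ** 2)
        steps.append({"h": h, "y": y, "tgt": tgt})
    return L, steps

def dL_dz_all(V, W, steps):
    """Backward recursion for dL/dz^t (section 5.3), starting from dL/dz^{T+1} = 0."""
    dz, dz_next = [None] * len(steps), np.zeros(W.shape[0])
    for t in reversed(range(len(steps))):
        s = steps[t]
        delta = (s["y"] - s["tgt"]) * s["y"] * (1.0 - s["y"])
        dz[t] = (V.T @ delta + W.T @ dz_next) * (1.0 - s["h"] ** 2)
        dz_next = dz[t]
    return dz

def numeric_grad(loss_fn, P, eps=1e-6):
    """Central finite differences, perturbing P in place one entry at a time."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        up = loss_fn()
        P[idx] = old - eps
        down = loss_fn()
        P[idx] = old
        G[idx] = (up - down) / (2 * eps)
    return G

rng = np.random.default_rng(3)
n_x, n_h, n_y, T = 3, 4, 2, 5
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
xs, targets = rng.normal(size=(T, n_x)), rng.uniform(size=(T, n_y))

loss = lambda: rnn_forward(U, W, B, V, C, xs, targets)[0]
dz = dL_dz_all(V, W, rnn_forward(U, W, B, V, C, xs, targets)[1])
dX = np.stack([U.T @ d for d in dz])                   # row t holds dL/dx^t = U^T dL/dz^t
print(np.abs(dX - numeric_grad(loss, xs)).max())
```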

5. $\frac{\partial L}{\partial z^{t}}$

5.1 $\frac{\partial l^{t}}{\partial z^{t}}$

$$
\begin{aligned}
dl^{t}&=tr\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\cdot dy^{t}\right)\\
&=tr\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\cdot\left(g'\left(o^{t}\right)\odot do^{t}\right)\right)\\
&=tr\left(\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\odot\left(g'\left(o^{t}\right)\right)^{T}\right)\cdot V\cdot dh^{t}\right)\\
&=tr\left(\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\odot\left(g'\left(o^{t}\right)\right)^{T}\right)\cdot V\cdot\left(f'\left(z^{t}\right)\odot dz^{t}\right)\right)\\
&=tr\left\{\left[\left(\left(\left(\frac{\partial l^{t}}{\partial y^{t}}\right)^{T}\odot\left(g'\left(o^{t}\right)\right)^{T}\right)\cdot V\right)\odot\left(f'\left(z^{t}\right)\right)^{T}\right]\cdot dz^{t}\right\}
\end{aligned}
$$

$$\frac{\partial l^{t}}{\partial z^{t}}=\left(V^{T}\cdot\left(\frac{\partial l^{t}}{\partial y^{t}}\odot g'\left(o^{t}\right)\right)\right)\odot f'\left(z^{t}\right)$$
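The local gradient can be checked on a single step, treating $l^{t}$ as a function of $z^{t}$ alone. This sketch assumes, for illustration only, $f=\tanh$, $g$ = sigmoid and a half squared-error loss against a hypothetical target; `step_loss` and `dl_dz_local` are hypothetical names.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^t = 0.5 * ||y^t - target^t||^2.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def step_loss(z, V, C, tgt):
    """l^t as a function of z^t alone: h = f(z), o = V h + C, y = g(o), l = e(y)."""
    y = sigmoid(V @ np.tanh(z) + C)
    return 0.5 * np.sum((y - tgt) ** 2)

def dl_dz_local(z, V, C, tgt):
    """Section 5.1: dl^t/dz^t = (V^T (dl^t/dy^t ⊙ g'(o^t))) ⊙ f'(z^t)."""
    h = np.tanh(z)
    y = sigmoid(V @ h + C)
    delta = (y - tgt) * y * (1.0 - y)      # dl/dy = y - tgt; sigmoid'(o) = y (1 - y)
    return (V.T @ delta) * (1.0 - h ** 2)  # tanh'(z) = 1 - h^2

rng = np.random.default_rng(4)
n_h, n_y = 4, 2
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
z, tgt = rng.normal(size=n_h), rng.uniform(size=n_y)

eps = 1e-6
numeric = np.array([
    (step_loss(z + eps * e, V, C, tgt) - step_loss(z - eps * e, V, C, tgt)) / (2 * eps)
    for e in np.eye(n_h)                   # one basis direction per component of z
])
analytic = dl_dz_local(z, V, C, tgt)
print(np.abs(analytic - numeric).max())
```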

5.2 $\frac{\partial l^{k}}{\partial z^{t}}\ \left(k\ge t+1\right)$

For $k\ge t+1$, $l^{k}$ depends on $z^{t}$ only through $z^{t+1}$, and since $z^{t+1}=U\cdot x^{t+1}+W\cdot h^{t}+B$ with $h^{t}=f\left(z^{t}\right)$, we have $dz^{t+1}=W\cdot\left(f'\left(z^{t}\right)\odot dz^{t}\right)$:

$$
\begin{aligned}
dl^{k}&=tr\left(\left(\frac{\partial l^{k}}{\partial z^{t+1}}\right)^{T}\cdot dz^{t+1}\right)\\
&=tr\left(\left(\frac{\partial l^{k}}{\partial z^{t+1}}\right)^{T}\cdot W\cdot\left(f'\left(z^{t}\right)\odot dz^{t}\right)\right)\\
&=tr\left\{\left(\left(\left(\frac{\partial l^{k}}{\partial z^{t+1}}\right)^{T}\cdot W\right)\odot\left(f'\left(z^{t}\right)\right)^{T}\right)\cdot dz^{t}\right\}
\end{aligned}
$$

$$\frac{\partial l^{k}}{\partial z^{t}}=\left(W^{T}\cdot\frac{\partial l^{k}}{\partial z^{t+1}}\right)\odot f'\left(z^{t}\right)$$
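Applying this one-step rule repeatedly carries $\frac{\partial l^{k}}{\partial z^{k}}$ (from 5.1) back to any earlier $z^{t}$. The sketch does that for $k=t+1,t+2,t+3$ and compares with finite differences. It assumes, for illustration only, $f=\tanh$, $g$ = sigmoid, a half squared-error loss against a hypothetical target and random inputs $x^{t+1},\dots,x^{k}$; all helper names are hypothetical.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^k = 0.5 * ||y^k - target^k||^2.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def step_loss(z, V, C, tgt):
    """l^k as a function of z^k: h = f(z), y = g(V h + C), l = e(y)."""
    y = sigmoid(V @ np.tanh(z) + C)
    return 0.5 * np.sum((y - tgt) ** 2)

def dl_dz_local(z, V, C, tgt):
    """Section 5.1 applied at step k: dl^k/dz^k."""
    h = np.tanh(z)
    y = sigmoid(V @ h + C)
    return (V.T @ ((y - tgt) * y * (1.0 - y))) * (1.0 - h ** 2)

def roll_forward(z_t, U, W, B, xs_later):
    """From z^t, build [z^t, z^{t+1}, ..., z^k] with z^{j+1} = U x^{j+1} + W f(z^j) + B."""
    zs = [z_t]
    for x in xs_later:
        zs.append(U @ x + W @ np.tanh(zs[-1]) + B)
    return zs

def dlk_dzt(z_t, U, W, B, V, C, xs_later, tgt_k):
    zs = roll_forward(z_t, U, W, B, xs_later)
    g = dl_dz_local(zs[-1], V, C, tgt_k)           # start from dl^k/dz^k
    for z in reversed(zs[:-1]):                    # visit z^{k-1}, ..., z^t
        g = (W.T @ g) * (1.0 - np.tanh(z) ** 2)    # 5.2: dl^k/dz^j = (W^T dl^k/dz^{j+1}) ⊙ f'(z^j)
    return g

rng = np.random.default_rng(5)
n_x, n_h, n_y = 3, 4, 2
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
z_t, tgt_k = rng.normal(size=n_h), rng.uniform(size=n_y)

eps, errors = 1e-6, []
for gap in (1, 2, 3):                              # k = t + gap
    xs_later = rng.normal(size=(gap, n_x))         # x^{t+1}, ..., x^k

    def loss_k(z):
        return step_loss(roll_forward(z, U, W, B, xs_later)[-1], V, C, tgt_k)

    numeric = np.array([(loss_k(z_t + eps * e) - loss_k(z_t - eps * e)) / (2 * eps) for e in np.eye(n_h)])
    analytic = dlk_dzt(z_t, U, W, B, V, C, xs_later, tgt_k)
    errors.append(np.abs(analytic - numeric).max())
print(errors)
```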

5.3 Recursive formula for $\frac{\partial L}{\partial z^{t}}$

$$
\begin{aligned}
\frac{\partial L}{\partial z^{t}}&=\frac{\partial l^{t}}{\partial z^{t}}+\sum_{k=t+1}^{T}\frac{\partial l^{k}}{\partial z^{t}}\\
&=\frac{\partial l^{t}}{\partial z^{t}}+\left(W^{T}\cdot\sum_{k=t+1}^{T}\frac{\partial l^{k}}{\partial z^{t+1}}\right)\odot f'\left(z^{t}\right)\\
&=\left(V^{T}\cdot\left(\frac{\partial l^{t}}{\partial y^{t}}\odot g'\left(o^{t}\right)\right)\right)\odot f'\left(z^{t}\right)+\left(W^{T}\cdot\frac{\partial L}{\partial z^{t+1}}\right)\odot f'\left(z^{t}\right)
\end{aligned}
$$

The last step uses the result of 5.1 and $\sum_{k=t+1}^{T}\frac{\partial l^{k}}{\partial z^{t+1}}=\frac{\partial L}{\partial z^{t+1}}$ (see section 6). The recursion is evaluated backward in time starting from $t=T$, where $\frac{\partial L}{\partial z^{T+1}}=0$, so $\frac{\partial L}{\partial z^{T}}=\frac{\partial l^{T}}{\partial z^{T}}$.
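The recursion above is what backpropagation through time evaluates. In this sketch the forward pass accepts an optional perturbation added to one $z^{t}$ (a hypothetical `inject` argument), so every $\frac{\partial L}{\partial z^{t}}$ from the recursion can be compared with a finite difference of the full loss. It assumes, for illustration only, $f=\tanh$, $g$ = sigmoid, a half squared-error loss with hypothetical targets and $h^{0}=0$.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^t = 0.5 * ||y^t - target^t||^2, h^0 = 0.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rnn_forward(U, W, B, V, C, xs, targets, inject=None):
    """inject=(t, v) adds v to z^t (0-based t), so L can be differentiated w.r.t. z^t numerically."""
    h, L, steps = np.zeros(W.shape[0]), 0.0, []
    for t, (x, tgt) in enumerate(zip(xs, targets)):
        z = U @ x + W @ h + B
        if inject is not None and inject[0] == t:
            z = z + inject[1]
        h = np.tanh(z)
        y = sigmoid(V @ h + C)
        L += 0.5 * np.sum((y - tgt) ** 2)
        steps.append({"h": h, "y": y, "tgt": tgt})
    return L, steps

def dL_dz_all(V, W, steps):
    """dL/dz^t = (V^T(dl^t/dy^t ⊙ g'(o^t))) ⊙ f'(z^t) + (W^T dL/dz^{t+1}) ⊙ f'(z^t), dL/dz^{T+1} = 0."""
    dz, dz_next = [None] * len(steps), np.zeros(W.shape[0])
    for t in reversed(range(len(steps))):
        s = steps[t]
        f_prime = 1.0 - s["h"] ** 2                             # tanh'(z^t)
        delta = (s["y"] - s["tgt"]) * s["y"] * (1.0 - s["y"])   # dl^t/dy^t ⊙ g'(o^t)
        dz[t] = (V.T @ delta) * f_prime + (W.T @ dz_next) * f_prime
        dz_next = dz[t]
    return dz

rng = np.random.default_rng(6)
n_x, n_h, n_y, T = 3, 4, 2, 6
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
xs, targets = rng.normal(size=(T, n_x)), rng.uniform(size=(T, n_y))

dz = dL_dz_all(V, W, rnn_forward(U, W, B, V, C, xs, targets)[1])
eps, max_err = 1e-6, 0.0
for t in range(T):
    numeric = np.array([
        (rnn_forward(U, W, B, V, C, xs, targets, (t, eps * e))[0]
         - rnn_forward(U, W, B, V, C, xs, targets, (t, -eps * e))[0]) / (2 * eps)
        for e in np.eye(n_h)
    ])
    max_err = max(max_err, np.abs(dz[t] - numeric).max())
print(max_err)
```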

6. Additional notes

Only the losses at time $t$ or later, $l^{t},l^{t+1},\dots,l^{T}$, depend on $z^{t}$, so $\frac{\partial l^{k}}{\partial z^{t}}=0$ for $k<t$. That is:

$$\frac{\partial L}{\partial z^{t}}=\sum_{k=1}^{T}\frac{\partial l^{k}}{\partial z^{t}}=\sum_{k=t}^{T}\frac{\partial l^{k}}{\partial z^{t}}$$
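This causality argument can be observed directly: perturb $z^{t}$ in a forward pass and see which per-step losses move. The sketch assumes, for illustration only, $f=\tanh$, $g$ = sigmoid, a half squared-error loss with hypothetical targets and $h^{0}=0$; `step_losses` and its `inject` argument are hypothetical. Code indices are 0-based, while the text counts steps from 1.

```python
import numpy as np

# Assumptions (illustrative only): f = tanh, g = sigmoid, l^t = 0.5 * ||y^t - target^t||^2, h^0 = 0.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def step_losses(U, W, B, V, C, xs, targets, inject=None):
    """Return the per-step losses [l^1, ..., l^T]; inject=(t, v) adds v to z^t (0-based t)."""
    h, losses = np.zeros(W.shape[0]), []
    for t, (x, tgt) in enumerate(zip(xs, targets)):
        z = U @ x + W @ h + B
        if inject is not None and inject[0] == t:
            z = z + inject[1]
        h = np.tanh(z)
        y = sigmoid(V @ h + C)
        losses.append(0.5 * np.sum((y - tgt) ** 2))
    return losses

rng = np.random.default_rng(7)
n_x, n_h, n_y, T = 3, 4, 2, 6
U, W, B = rng.normal(size=(n_h, n_x)), rng.normal(size=(n_h, n_h)), rng.normal(size=n_h)
V, C = rng.normal(size=(n_y, n_h)), rng.normal(size=n_y)
xs, targets = rng.normal(size=(T, n_x)), rng.uniform(size=(T, n_y))

t = 2                                                  # perturb z^t (0-based index)
base = step_losses(U, W, B, V, C, xs, targets)
pert = step_losses(U, W, B, V, C, xs, targets, inject=(t, 0.1 * np.ones(n_h)))
changed = [k for k in range(T) if pert[k] != base[k]]
print("losses affected by z^t:", changed)             # only indices k >= t can appear
```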
