Contents
- 1. Forward propagation formulas
- 2. $\frac{\partial L}{\partial V}$ and $\frac{\partial L}{\partial C}$
- 3. $\frac{\partial L}{\partial U}$, $\frac{\partial L}{\partial W}$, $\frac{\partial L}{\partial B}$
- 4. $\frac{\partial L}{\partial x^{t}}$
- 5. $\frac{\partial L}{\partial z^{t}}$
- 6. Supplementary notes
1. Forward propagation formulas
$$z^{t}=U\cdot x^{t}+W\cdot h^{t-1}+B$$
$$h^{t}=f\left(z^{t}\right)$$
$$o^{t}=V\cdot h^{t}+C$$
$$y^{t}=g\left(o^{t}\right)$$
$$L=\sum_{t=1}^{T}l^{t}$$
$$l^{t}=e\left(y^{t}\right)$$

Throughout, $f$ and $g$ act element-wise (this is what gives $dh^{t}=f^{'}\left(z^{t}\right)\odot dz^{t}$ and $dy^{t}=g^{'}\left(o^{t}\right)\odot do^{t}$ below), and $e$ is the per-step loss.
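The forward pass can be sketched in NumPy as below. The derivation leaves $f$, $g$ and $e$ abstract, so this sketch assumes $f=\tanh$, $g=$ sigmoid, a squared-error loss $e\left(y^{t}\right)=\frac{1}{2}\left\|y^{t}-\text{target}^{t}\right\|^{2}$ against a hypothetical per-step target, and an initial state $h^{0}=0$. The name `rnn_forward` is illustrative only. Python lists are 0-based, so `hs[t]` holds $h^{t}$ (with `hs[0]` $=h^{0}$) while `zs[t - 1]` and `ys[t - 1]` hold $z^{t}$ and $y^{t}$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rnn_forward(xs, targets, U, W, B, V, C):
    """Forward pass for t = 1..T. Returns caches, per-step losses and L = sum_t l^t."""
    h = np.zeros(W.shape[0])                  # h^0 = 0 (assumed initial state)
    hs, zs, ys, losses = [h], [], [], []
    for x, target in zip(xs, targets):
        z = U @ x + W @ h + B                 # z^t = U x^t + W h^{t-1} + B
        h = np.tanh(z)                        # h^t = f(z^t), f = tanh (assumed)
        o = V @ h + C                         # o^t = V h^t + C
        y = sigmoid(o)                        # y^t = g(o^t), g = sigmoid (assumed)
        l = 0.5 * np.sum((y - target) ** 2)   # l^t = e(y^t), squared error (assumed)
        hs.append(h); zs.append(z); ys.append(y); losses.append(l)
    return hs, zs, ys, losses, sum(losses)    # L = sum_{t=1}^T l^t

# Random example: input size n, hidden size m, output size p, T time steps.
rng = np.random.default_rng(0)
n, m, p, T = 3, 4, 2, 5
U, W, B = rng.normal(size=(m, n)), rng.normal(size=(m, m)), rng.normal(size=m)
V, C = rng.normal(size=(p, m)), rng.normal(size=p)
xs = rng.normal(size=(T, n))
targets = rng.uniform(size=(T, p))
hs, zs, ys, losses, L = rnn_forward(xs, targets, U, W, B, V, C)
```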
2. $\frac{\partial L}{\partial V}$ and $\frac{\partial L}{\partial C}$
$$\begin{aligned}
dL&=tr\left(\sum_{t=1}^{T}\frac{\partial l^{t}}{\partial y^{t}}^{T}\cdot dy^{t}\right)\\
&=tr\left(\sum_{t=1}^{T}\frac{\partial l^{t}}{\partial y^{t}}^{T}\cdot\left(g^{'}\left(o^{t}\right)\odot\left(dV\cdot h^{t}+dC\right)\right)\right)\\
&=tr\left(\sum_{t=1}^{T}\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\odot\left(g^{'}\left(o^{t}\right)\right)^{T}\right)\cdot\left(dV\cdot h^{t}+dC\right)\right)\\
&=tr\left(\sum_{t=1}^{T}h^{t}\cdot\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\odot\left(g^{'}\left(o^{t}\right)\right)^{T}\right)\cdot dV+\sum_{t=1}^{T}\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\odot\left(g^{'}\left(o^{t}\right)\right)^{T}\right)\cdot dC\right)
\end{aligned}$$

The third equality uses $a^{T}\left(b\odot c\right)=\left(a\odot b\right)^{T}c$ for column vectors $a$, $b$, $c$; the fourth uses the cyclic property $tr\left(ABC\right)=tr\left(CAB\right)$. Matching the result against $dL=tr\left(\frac{\partial L}{\partial V}^{T}dV\right)+tr\left(\frac{\partial L}{\partial C}^{T}dC\right)$ gives:
$$\frac{\partial L}{\partial V}=\sum_{t=1}^{T}\left(\frac{\partial l^{t}}{\partial y^{t}}\odot g^{'}\left(o^{t}\right)\right)\cdot\left(h^{t}\right)^{T}$$
$$\frac{\partial L}{\partial C}=\sum_{t=1}^{T}\frac{\partial l^{t}}{\partial y^{t}}\odot g^{'}\left(o^{t}\right)$$
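A quick way to gain confidence in these two sums is to compare them with central finite differences. The sketch below assumes $f=\tanh$, $g=$ sigmoid (so $g^{'}\left(o^{t}\right)=y^{t}\odot\left(1-y^{t}\right)$), squared-error $e$ (so $\frac{\partial l^{t}}{\partial y^{t}}=y^{t}-\text{target}^{t}$) and $h^{0}=0$; the helper names `forward`, `grads_V_C` and `numerical_grad` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(xs, targets, U, W, B, V, C):
    """Assumed f = tanh, g = sigmoid, e = 0.5*||y - target||^2, h^0 = 0. Returns (hs, ys, L)."""
    h = np.zeros(W.shape[0])
    hs, ys, L = [h], [], 0.0
    for x, target in zip(xs, targets):
        h = np.tanh(U @ x + W @ h + B)
        y = sigmoid(V @ h + C)
        L += 0.5 * np.sum((y - target) ** 2)
        hs.append(h); ys.append(y)
    return hs, ys, L

def grads_V_C(hs, ys, targets):
    """dL/dV and dL/dC as the sums over t derived above."""
    dV = np.zeros((ys[0].size, hs[0].size))
    dC = np.zeros(ys[0].size)
    for t, (y, target) in enumerate(zip(ys, targets)):
        delta_o = (y - target) * y * (1 - y)   # dl^t/dy^t ⊙ g'(o^t) for the assumed e and g
        dV += np.outer(delta_o, hs[t + 1])     # ... · (h^t)^T; hs[t + 1] is h^t
        dC += delta_o
    return dV, dC

def numerical_grad(fn, P, eps=1e-6):
    """Central differences of the scalar fn() w.r.t. each entry of P (restored afterwards)."""
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        plus = fn()
        P[idx] = old - eps
        minus = fn()
        P[idx] = old
        G[idx] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, m, p, T = 3, 4, 2, 5
U, W, B = rng.normal(size=(m, n)), rng.normal(size=(m, m)), rng.normal(size=m)
V, C = rng.normal(size=(p, m)), rng.normal(size=p)
xs, targets = rng.normal(size=(T, n)), rng.uniform(size=(T, p))

def loss():
    return forward(xs, targets, U, W, B, V, C)[2]

hs, ys, L = forward(xs, targets, U, W, B, V, C)
dV, dC = grads_V_C(hs, ys, targets)
print(np.max(np.abs(dV - numerical_grad(loss, V))))  # close to zero
print(np.max(np.abs(dC - numerical_grad(loss, C))))  # close to zero
```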
3. $\frac{\partial L}{\partial U}$, $\frac{\partial L}{\partial W}$, $\frac{\partial L}{\partial B}$
Because $U$, $W$, $B$ are shared across time steps, each step contributes its own term. Here $\frac{\partial L}{\partial z^{t}}$ is the total gradient of $L$ with respect to $z^{t}$ (including every path through later steps; see section 5), while $dz^{t}$ keeps only the direct effect of the parameters at step $t$, with $x^{t}$ and $h^{t-1}$ held fixed:

$$\begin{aligned}
dL&=tr\left(\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}^{T}\cdot dz^{t}\right)\\
&=tr\left(\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}^{T}\cdot\left(dU\cdot x^{t}+dW\cdot h^{t-1}+dB\right)\right)\\
&=tr\left(\sum_{t=1}^{T}x^{t}\cdot\frac{\partial L}{\partial z^{t}}^{T}\cdot dU+\sum_{t=1}^{T}h^{t-1}\cdot\frac{\partial L}{\partial z^{t}}^{T}\cdot dW+\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}^{T}\cdot dB\right)
\end{aligned}$$
$$\frac{\partial L}{\partial U}=\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}\cdot\left(x^{t}\right)^{T}$$
$$\frac{\partial L}{\partial W}=\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}\cdot\left(h^{t-1}\right)^{T}$$
$$\frac{\partial L}{\partial B}=\sum_{t=1}^{T}\frac{\partial L}{\partial z^{t}}$$
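These three sums are accumulations of outer products over time. The sketch below assumes the deltas $\frac{\partial L}{\partial z^{t}}$ are already available (they come from the recursion in section 5) and uses random stand-ins for them and for the hidden states, purely to show the bookkeeping, in particular that $\frac{\partial L}{\partial z^{t}}$ is paired with $h^{t-1}$, not $h^{t}$. The helper name `param_grads` is hypothetical. It also shows that each sum equals a single matrix product over the stacked time axis, a common way to vectorize it.

```python
import numpy as np

def param_grads(xs, hs, deltas):
    """Section-3 sums. xs[t-1] = x^t, hs[t] = h^t (hs[0] = h^0), deltas[t-1] = dL/dz^t."""
    dU = sum(np.outer(d, x) for d, x in zip(deltas, xs))        # sum_t dL/dz^t · (x^t)^T
    dW = sum(np.outer(d, h) for d, h in zip(deltas, hs[:-1]))   # sum_t dL/dz^t · (h^{t-1})^T
    dB = sum(deltas)                                            # sum_t dL/dz^t
    return dU, dW, dB

rng = np.random.default_rng(1)
T, n, m = 5, 3, 4
xs = rng.normal(size=(T, n))
hs = rng.normal(size=(T + 1, m))      # stand-in hidden states h^0..h^T (random, illustration only)
deltas = rng.normal(size=(T, m))      # stand-in for dL/dz^1..dL/dz^T (random, illustration only)
dU, dW, dB = param_grads(xs, hs, deltas)

# The same sums as single products over the stacked time axis.
dU_mat, dW_mat, dB_mat = deltas.T @ xs, deltas.T @ hs[:-1], deltas.sum(axis=0)
```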
4. $\frac{\partial L}{\partial x^{t}}$
$x^{t}$ affects $L$ only through $z^{t}$, and $dz^{t}=U\cdot dx^{t}$, so:

$$\begin{aligned}
dL&=tr\left(\frac{\partial L}{\partial z^{t}}^{T}\cdot dz^{t}\right)\\
&=tr\left(\frac{\partial L}{\partial z^{t}}^{T}\cdot U\cdot dx^{t}\right)
\end{aligned}$$
$$\frac{\partial L}{\partial x^{t}}=U^{T}\cdot\frac{\partial L}{\partial z^{t}}$$
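The rule only needs $\frac{\partial L}{\partial z^{t}}$. To keep the sketch short it checks the last step $t=T$, where no later loss depends on $z^{T}$, so $\frac{\partial L}{\partial z^{T}}$ reduces to $\frac{\partial l^{T}}{\partial z^{T}}$ from section 5.1; earlier steps use the same $U^{T}$ product with the full recursive gradient. As before, $f=\tanh$, $g=$ sigmoid, squared-error $e$ and $h^{0}=0$ are assumptions, and `forward` is a hypothetical helper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(xs, targets, U, W, B, V, C):
    """Assumed f = tanh, g = sigmoid, e = 0.5*||y - target||^2, h^0 = 0. Returns (hs, ys, L)."""
    h = np.zeros(W.shape[0])
    hs, ys, L = [h], [], 0.0
    for x, target in zip(xs, targets):
        h = np.tanh(U @ x + W @ h + B)
        y = sigmoid(V @ h + C)
        L += 0.5 * np.sum((y - target) ** 2)
        hs.append(h); ys.append(y)
    return hs, ys, L

rng = np.random.default_rng(2)
n, m, p, T = 3, 4, 2, 5
U, W, B = rng.normal(size=(m, n)), rng.normal(size=(m, m)), rng.normal(size=m)
V, C = rng.normal(size=(p, m)), rng.normal(size=p)
xs, targets = rng.normal(size=(T, n)), rng.uniform(size=(T, p))
hs, ys, L = forward(xs, targets, U, W, B, V, C)

# Last step: dL/dz^T = dl^T/dz^T = (V^T (dl/dy ⊙ g'(o))) ⊙ f'(z), with f'(z) = 1 - h^2.
delta_T = (V.T @ ((ys[-1] - targets[-1]) * ys[-1] * (1 - ys[-1]))) * (1 - hs[-1] ** 2)
dx_T = U.T @ delta_T                  # dL/dx^T = U^T · dL/dz^T

# Central-difference estimate of dL/dx^T, perturbing only the last input.
eps = 1e-6
num = np.zeros(n)
for i in range(n):
    xp, xm = xs.copy(), xs.copy()
    xp[-1, i] += eps
    xm[-1, i] -= eps
    num[i] = (forward(xp, targets, U, W, B, V, C)[2]
              - forward(xm, targets, U, W, B, V, C)[2]) / (2 * eps)
```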
5. $\frac{\partial L}{\partial z^{t}}$
5.1 $\frac{\partial l^{t}}{\partial z^{t}}$
$$\begin{aligned}
dl^{t}&=tr\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\cdot dy^{t}\right)\\
&=tr\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\cdot\left(g^{'}\left(o^{t}\right)\odot do^{t}\right)\right)\\
&=tr\left(\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\odot\left(g^{'}\left(o^{t}\right)\right)^{T}\right)\cdot V\cdot dh^{t}\right)\\
&=tr\left(\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\odot\left(g^{'}\left(o^{t}\right)\right)^{T}\right)\cdot V\cdot\left(f^{'}\left(z^{t}\right)\odot dz^{t}\right)\right)\\
&=tr\left\{\left[\left(\left(\frac{\partial l^{t}}{\partial y^{t}}^{T}\odot\left(g^{'}\left(o^{t}\right)\right)^{T}\right)\cdot V\right)\odot\left(f^{'}\left(z^{t}\right)\right)^{T}\right]\cdot dz^{t}\right\}
\end{aligned}$$
$$\frac{\partial l^{t}}{\partial z^{t}}=\left(V^{T}\cdot\left(\frac{\partial l^{t}}{\partial y^{t}}\odot g^{'}\left(o^{t}\right)\right)\right)\odot f^{'}\left(z^{t}\right)$$
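The closed form can be checked on a single step in isolation, treating $l^{t}$ as a function of $z^{t}$ alone. This sketch assumes $f=\tanh$ (so $f^{'}\left(z\right)=1-\tanh^{2}\left(z\right)$), $g=$ sigmoid and squared-error $e$; `step_loss` and `dl_dz` are hypothetical names.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def step_loss(z, V, C, target):
    """l^t as a function of z^t: h = f(z), o = V h + C, y = g(o), l = e(y) (assumed f, g, e)."""
    y = sigmoid(V @ np.tanh(z) + C)
    return 0.5 * np.sum((y - target) ** 2)

def dl_dz(z, V, C, target):
    """Closed form from 5.1: (V^T (dl/dy ⊙ g'(o))) ⊙ f'(z)."""
    h = np.tanh(z)
    y = sigmoid(V @ h + C)
    return (V.T @ ((y - target) * y * (1 - y))) * (1 - h ** 2)

rng = np.random.default_rng(3)
m, p = 4, 2
z, V = rng.normal(size=m), rng.normal(size=(p, m))
C, target = rng.normal(size=p), rng.uniform(size=p)

# Central differences along each coordinate direction e_i of z.
eps = 1e-6
num = np.array([(step_loss(z + eps * e_i, V, C, target)
                 - step_loss(z - eps * e_i, V, C, target)) / (2 * eps)
                for e_i in np.eye(m)])
```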
5.2 $\frac{\partial l^{k}}{\partial z^{t}},\left(k\ge t+1\right)$
For $k\ge t+1$, $z^{t}$ reaches $l^{k}$ only through $z^{t+1}$, and with $x^{t+1}$ and the parameters fixed, $dz^{t+1}=W\cdot dh^{t}=W\cdot\left(f^{'}\left(z^{t}\right)\odot dz^{t}\right)$:

$$\begin{aligned}
dl^{k}&=tr\left(\frac{\partial l^{k}}{\partial z^{t+1}}^{T}\cdot dz^{t+1}\right)\\
&=tr\left(\frac{\partial l^{k}}{\partial z^{t+1}}^{T}\cdot W\cdot\left(f^{'}\left(z^{t}\right)\odot dz^{t}\right)\right)\\
&=tr\left\{\left(\left(\frac{\partial l^{k}}{\partial z^{t+1}}^{T}\cdot W\right)\odot\left(f^{'}\left(z^{t}\right)\right)^{T}\right)\cdot dz^{t}\right\}
\end{aligned}$$
$$\frac{\partial l^{k}}{\partial z^{t}}=\left(W^{T}\cdot\frac{\partial l^{k}}{\partial z^{t+1}}\right)\odot f^{'}\left(z^{t}\right)$$
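The sketch below checks the case $k=t+1$: it computes $\frac{\partial l^{t+1}}{\partial z^{t+1}}$ with the 5.1 formula, applies the 5.2 step once, and compares the result with finite differences of $l^{t+1}$ taken directly with respect to $z^{t}$. Larger $k$ just repeat the same step. The choices $f=\tanh$, $g=$ sigmoid and squared-error $e$ are assumptions, and `next_step_loss` is a hypothetical name.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def next_step_loss(z_t, x_next, U, W, B, V, C, target):
    """l^{t+1} as a function of z^t (assumed f = tanh, g = sigmoid, squared-error e)."""
    z_next = U @ x_next + W @ np.tanh(z_t) + B     # z^{t+1} = U x^{t+1} + W f(z^t) + B
    y = sigmoid(V @ np.tanh(z_next) + C)
    return 0.5 * np.sum((y - target) ** 2)

rng = np.random.default_rng(4)
n, m, p = 3, 4, 2
U, W, B = rng.normal(size=(m, n)), rng.normal(size=(m, m)), rng.normal(size=m)
V, C = rng.normal(size=(p, m)), rng.normal(size=p)
z_t, x_next, target = rng.normal(size=m), rng.normal(size=n), rng.uniform(size=p)

# dl^{t+1}/dz^{t+1} from 5.1, then one application of 5.2 with k = t+1.
z_next = U @ x_next + W @ np.tanh(z_t) + B
h_next = np.tanh(z_next)
y = sigmoid(V @ h_next + C)
dl_dz_next = (V.T @ ((y - target) * y * (1 - y))) * (1 - h_next ** 2)
dl_dz_t = (W.T @ dl_dz_next) * (1 - np.tanh(z_t) ** 2)

# Central differences of l^{t+1} with respect to z^t.
eps = 1e-6
num = np.array([(next_step_loss(z_t + eps * e, x_next, U, W, B, V, C, target)
                 - next_step_loss(z_t - eps * e, x_next, U, W, B, V, C, target)) / (2 * eps)
                for e in np.eye(m)])
```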
5.3 Recurrence for $\frac{\partial L}{\partial z^{t}}$
$$\begin{aligned}
\frac{\partial L}{\partial z^{t}}&=\frac{\partial l^{t}}{\partial z^{t}}+\sum_{k=t+1}^{T}\frac{\partial l^{k}}{\partial z^{t}}\\
&=\frac{\partial l^{t}}{\partial z^{t}}+\left(W^{T}\cdot\sum_{k=t+1}^{T}\frac{\partial l^{k}}{\partial z^{t+1}}\right)\odot f^{'}\left(z^{t}\right)\\
&=\left(V^{T}\cdot\left(\frac{\partial l^{t}}{\partial y^{t}}\odot g^{'}\left(o^{t}\right)\right)\right)\odot f^{'}\left(z^{t}\right)+\left(W^{T}\cdot\frac{\partial L}{\partial z^{t+1}}\right)\odot f^{'}\left(z^{t}\right)
\end{aligned}$$

The first equality drops the terms with $k<t$, which are zero (see section 6). The second applies 5.2 to every term and moves the sum inside, since 5.2 is linear in $\frac{\partial l^{k}}{\partial z^{t+1}}$. The third uses 5.1 together with $\sum_{k=t+1}^{T}\frac{\partial l^{k}}{\partial z^{t+1}}=\frac{\partial L}{\partial z^{t+1}}$. The recursion runs backward from $t=T$, where the sum is empty, so $\frac{\partial L}{\partial z^{T}}=\frac{\partial l^{T}}{\partial z^{T}}$.
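Putting sections 2 to 5 together gives a complete backpropagation-through-time pass. The sketch below runs the recurrence backward from $t=T$ with $\frac{\partial L}{\partial z^{T+1}}=0$, factoring the shared $f^{'}\left(z^{t}\right)$ out of both terms, and accumulates every gradient on the way; it then compares all of them against central finite differences. It assumes $f=\tanh$, $g=$ sigmoid, squared-error $e$ and $h^{0}=0$; `forward`, `bptt` and `numerical_grad` are hypothetical names, and Python index `t` corresponds to time step $t+1$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(xs, targets, params):
    U, W, B, V, C = params
    h = np.zeros(W.shape[0])                     # h^0 = 0 (assumed)
    hs, ys, L = [h], [], 0.0
    for x, target in zip(xs, targets):
        h = np.tanh(U @ x + W @ h + B)           # f = tanh (assumed)
        y = sigmoid(V @ h + C)                   # g = sigmoid (assumed)
        L += 0.5 * np.sum((y - target) ** 2)     # e = squared error (assumed)
        hs.append(h); ys.append(y)
    return hs, ys, L

def bptt(xs, targets, params):
    """Backward pass using the 5.3 recurrence and the sums of sections 2-4."""
    U, W, B, V, C = params
    hs, ys, L = forward(xs, targets, params)
    dU, dW, dB = np.zeros_like(U), np.zeros_like(W), np.zeros_like(B)
    dV, dC, dxs = np.zeros_like(V), np.zeros_like(C), np.zeros_like(xs)
    delta_next = np.zeros_like(B)                # dL/dz^{T+1} = 0
    for t in reversed(range(len(xs))):
        h, h_prev, y = hs[t + 1], hs[t], ys[t]
        delta_o = (y - targets[t]) * y * (1 - y)                    # dl/dy ⊙ g'(o)
        delta = (V.T @ delta_o + W.T @ delta_next) * (1 - h ** 2)   # 5.3, f'(z) factored out
        dV += np.outer(delta_o, h); dC += delta_o                   # section 2
        dU += np.outer(delta, xs[t]); dW += np.outer(delta, h_prev); dB += delta  # section 3
        dxs[t] = U.T @ delta                                        # section 4
        delta_next = delta
    return L, (dU, dW, dB, dV, dC), dxs

def numerical_grad(fn, P, eps=1e-6):
    """Central differences of the scalar fn() w.r.t. each entry of P (restored afterwards)."""
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        plus = fn()
        P[idx] = old - eps
        minus = fn()
        P[idx] = old
        G[idx] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(4)
n, m, p, T = 3, 4, 2, 6
params = (rng.normal(size=(m, n)), 0.5 * rng.normal(size=(m, m)), rng.normal(size=m),
          rng.normal(size=(p, m)), rng.normal(size=p))
xs, targets = rng.normal(size=(T, n)), rng.uniform(size=(T, p))
L, grads, dxs = bptt(xs, targets, params)

def loss():
    return forward(xs, targets, params)[2]

errs = [np.max(np.abs(g - numerical_grad(loss, P))) for g, P in zip(grads, params)]
errs.append(np.max(np.abs(dxs - numerical_grad(loss, xs))))
print(max(errs))  # close to zero if the recurrence matches the finite differences
```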
6. Supplementary notes
Only the losses at time $t$ or later, $l^{t},l^{t+1},\dots,l^{T}$, depend on $z^{t}$, so $\frac{\partial l^{k}}{\partial z^{t}}=0$ for $k<t$. That is:

$$\frac{\partial L}{\partial z^{t}}=\sum_{k=1}^{T}\frac{\partial l^{k}}{\partial z^{t}}=\sum_{k=t}^{T}\frac{\partial l^{k}}{\partial z^{t}}$$
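This causality argument can be observed directly: perturb $z$ at one step and compare the per-step losses. The sketch assumes $f=\tanh$, $g=$ sigmoid, squared-error $e$ and $h^{0}=0$, and `step_losses` is a hypothetical helper; `s` is a 0-based Python index, i.e. time step $s+1$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def step_losses(xs, targets, U, W, B, V, C, s=None, dz=None):
    """Per-step losses [l^1, ..., l^T]; if s is given, dz is added to z at 0-based step s only.
    Assumed f = tanh, g = sigmoid, squared-error e, h^0 = 0."""
    h = np.zeros(W.shape[0])
    losses = []
    for t, (x, target) in enumerate(zip(xs, targets)):
        z = U @ x + W @ h + B
        if t == s:
            z = z + dz                   # perturb this z only
        h = np.tanh(z)
        y = sigmoid(V @ h + C)
        losses.append(0.5 * np.sum((y - target) ** 2))
    return np.array(losses)

rng = np.random.default_rng(5)
n, m, p, T = 3, 4, 2, 6
U, W, B = rng.normal(size=(m, n)), rng.normal(size=(m, m)), rng.normal(size=m)
V, C = rng.normal(size=(p, m)), rng.normal(size=p)
xs, targets = rng.normal(size=(T, n)), rng.uniform(size=(T, p))

s = 3
base = step_losses(xs, targets, U, W, B, V, C)
pert = step_losses(xs, targets, U, W, B, V, C, s=s, dz=1e-3 * rng.normal(size=m))
print(base[:s] - pert[:s])   # all zeros: earlier losses never see the perturbed z
print(base[s:] - pert[s:])   # generally nonzero: losses at step s and later do change
```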