Elman Neural Network

by: Z.H. Gao

I. Network Structure

The N-L-M feedforward structure (N inputs, L hidden units, M outputs) is extended to N-L-L-M by adding a context layer of size L. The number of samples is n.
Fig. 1 Elman network: samples are fed in one at a time
Fig. 2 Elman network: structure for a single sample

II. Forward Propagation

$$h^i = f\left( temp{h^i} \right) = f\left( v x^i + b_{in} + u h^{i-1} \right)$$
$$y^i = f\left( temp{y^i} \right) = f\left( w h^i \right) = f\left( w \cdot f\left( v x^i + b_{in} + u h^{i-1} \right) \right)$$
Here $f$ is the activation function, $b_{in}$ is the hidden-layer bias, and $temp{h^i}$ and $temp{y^i}$ are the pre-activation inputs of the hidden and output layers.

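1. All samples are computed one by one

As each new sample arrives, the self-loop passes the previous hidden state forward, and that state is combined with the current input to compute the current step. This repeats until the input or the training ends, and the output of the last step is the final prediction.

2. If the input is $x_{1 \times N}^i$, then the input weights are $v_{N \times L}$

Hidden layer: $h_{1 \times L}^i$
Context layer: $h_{1 \times L}^{i-1}$
Weights linking the context layer to the hidden layer: $u_{L \times L}$
Output weights: $w_{L \times M}$
Output layer: $y_{1 \times M}^i$
Here i indexes the sample, and one sample is fed in at a time. Because of the context layer, samples are usually fed in one by one, although they can also be fed in batches. With these row-vector shapes, the products written $v x^i$, $u h^{i-1}$ and $w h^i$ are evaluated as $x^i \times v$, $h^{i-1} \times u$ and $h^i \times w$.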
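The forward pass with the shapes listed above can be sketched as follows. This is a minimal illustration, not the author's Matlab code: the activation $f$ is assumed to be the sigmoid (the text does not fix it), and the names `matmul`, `sigmoid` and `elman_forward` are hypothetical helpers introduced here.

```python
import math

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(z):
    """Assumed activation f; the text leaves f unspecified."""
    return 1.0 / (1.0 + math.exp(-z))

def elman_forward(X, v, u, w, b_in, h0):
    """Feed the rows of X through the network one sample at a time.

    X: n x N, v: N x L, u: L x L, w: L x M; b_in and h0 are length-L lists.
    Returns (H, Y): hidden states (n x L) and outputs (n x M).
    """
    H, Y = [], []
    h_prev = h0                              # context layer holds h^{i-1}
    for x in X:
        xv = matmul([x], v)[0]               # x^i times v      (1 x L)
        hu = matmul([h_prev], u)[0]          # h^{i-1} times u  (1 x L)
        h = [sigmoid(a + b + c) for a, b, c in zip(xv, b_in, hu)]
        y = [sigmoid(t) for t in matmul([h], w)[0]]
        H.append(h)
        Y.append(y)
        h_prev = h                           # copy hidden state into the context layer
    return H, Y

if __name__ == "__main__":
    # Two identical inputs give different hidden states because of the context layer.
    X = [[1.0, 0.0], [1.0, 0.0]]
    v = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    u = [[0.1] * 3 for _ in range(3)]
    w = [[0.5], [0.5], [0.5]]
    H, Y = elman_forward(X, v, u, w, [0.0] * 3, [0.0] * 3)
    print(H[0], H[1])
```

Starting the context layer at zero (`h0`) is also an assumption; any initial state of length L works with the same loop.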

III. Backward Computation (Back Propagation Through Time, BPTT)

1. Given:

$$h^i = f\left( temp{h^i} \right) = f\left( v x^i + b_{in} + u h^{i-1} \right)$$
$$y^i = f\left( temp{y^i} \right) = f\left( w h^i \right) = f\left( w \cdot f\left( v x^i + b_{in} + u h^{i-1} \right) \right)$$
where i is the sample index.

2. Gradient of the parameter w

Let the loss function be $J\left( Y, Target \right)$. Then, for a single sample i:
Note: throughout, $*$ denotes element-wise multiplication and $\times$ denotes matrix multiplication.
$$\frac{\partial J^i}{\partial temp{y^i}} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial temp{y^i}} = \frac{\partial J^i}{\partial y^i} * f'\left( temp{y^i} \right)$$
$$\frac{\partial J^i}{\partial w} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial temp{y^i}} \times \frac{\partial temp{y^i}}{\partial w} = \left( h^i \right)^T \times \left[ \frac{\partial J^i}{\partial y^i} * f'\left( temp{y^i} \right) \right]$$
$$\frac{\partial J^i}{\partial h^i} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial temp{y^i}} \times \frac{\partial temp{y^i}}{\partial h^i} = \left[ \frac{\partial J^i}{\partial y^i} * f'\left( temp{y^i} \right) \right] \times w^T$$
Over all samples, the gradient of w is therefore computed exactly as in an ordinary feedforward neural network:
$$\frac{\partial J}{\partial w} = \sum\limits_{i = 1}^n \left( h^i \right)^T \times \left[ \frac{\partial J^i}{\partial y^i} * f'\left( temp{y^i} \right) \right] = H^T \times \left[ \frac{\partial J}{\partial Y} * f'\left( tempY \right) \right]$$
Similarly, $\frac{\partial J}{\partial H} = \left[ \frac{\partial J}{\partial Y} * f'\left( tempY \right) \right] \times w^T$
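The two batch formulas for $\frac{\partial J}{\partial w}$ and $\frac{\partial J}{\partial H}$ can be sketched as below. The loss and activation are assumptions made only for this example: a squared error $J = \frac{1}{2}\sum (Y - Target)^2$, so that $\frac{\partial J}{\partial Y} = Y - Target$, and a sigmoid $f$, so that $f'(tempY) = Y * (1 - Y)$. The function names are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def output_layer_grads(H, Y, T, w):
    """Gradients of J with respect to w and H for all n samples at once.

    H: n x L hidden states, Y: n x M outputs, T: n x M targets, w: L x M.
    Assumes squared-error loss and sigmoid output activation.
    """
    # dJ/dtempY = dJ/dY * f'(tempY), element-wise
    delta = [[(y - t) * y * (1.0 - y) for y, t in zip(yr, tr)]
             for yr, tr in zip(Y, T)]
    dW = matmul(transpose(H), delta)      # H^T times delta   (L x M)
    dH = matmul(delta, transpose(w))      # delta times w^T   (n x L)
    return dW, dH

if __name__ == "__main__":
    H = [[1.0, 2.0], [0.0, 1.0]]
    Y = [[0.5], [0.5]]
    T = [[0.0], [1.0]]
    w = [[2.0], [4.0]]
    dW, dH = output_layer_grads(H, Y, T, w)
    print(dW)
    print(dH)
```

The product `matmul(transpose(H), delta)` performs the sum over samples implicitly, which is why the batch form equals the per-sample sum.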

3. Gradients of the hidden layer $h^i$ and the context layer $h^{i-1}$

The gradient of u involves the link between the current sample and the earlier samples, so it has to be computed with a loop.
$$\frac{\partial J}{\partial tempH} = \frac{\partial J}{\partial H} * f'\left( tempH \right) = \left[ \left( \frac{\partial J}{\partial Y} * f'\left( tempY \right) \right) \times w^T \right] * f'\left( tempH \right)$$
Then $\frac{\partial J^i}{\partial temp{h^i}} = \frac{\partial J}{\partial tempH}\left( i,: \right)$. This is the core of the loop, which processes a single sample at a time:
$$\begin{array}{l} \frac{\partial J^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temp{h^i}} \times u^T \\ \frac{\partial J^i}{\partial temp{h^{i-1}}} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temp{h^{i-1}}} = \left( \frac{\partial J^i}{\partial temp{h^i}} \times u^T \right) * f'\left( temp{h^{i-1}} \right) \end{array}$$
$$\begin{array}{l} \frac{\partial J^i}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temp{h^{i-1}}} * \frac{\partial temp{h^{i-1}}}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temp{h^{i-1}}} \times u^T \\ \frac{\partial J^i}{\partial temp{h^{i-2}}} = \frac{\partial J^i}{\partial temp{h^{i-1}}} * \frac{\partial temp{h^{i-1}}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temp{h^{i-2}}} = \left( \frac{\partial J^i}{\partial temp{h^{i-1}}} \times u^T \right) * f'\left( temp{h^{i-2}} \right) \end{array}$$
The loop captures how the error $J^i$ of the current sample is affected by the previous k samples. Computationally, the error $J^i$ of the current sample is propagated back through the previous k steps in order to obtain the gradient of the link weights u that connect those earlier steps to the current one.
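The backward loop over earlier steps can be sketched as below. It is only an illustration: `backprop_deltas` is a hypothetical helper, and $f$ is again assumed to be the sigmoid, so that $f'(temp{h}) = h * (1 - h)$ can be evaluated from the stored hidden states.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def backprop_deltas(delta_i, u, prev_hidden, k):
    """Propagate dJ^i/dtemph^i back through k earlier steps.

    delta_i:     dJ^i/dtemph^i, a length-L list
    u:           L x L context-to-hidden weights
    prev_hidden: [h^{i-1}, h^{i-2}, ...], at least k entries
    Returns [delta^i, delta^{i-1}, ..., delta^{i-k}].
    """
    u_t = transpose(u)
    deltas = [delta_i]
    for step in range(k):
        h = prev_hidden[step]                     # h^{i-1-step}
        back = matmul([deltas[-1]], u_t)[0]       # dJ^i/dh^{i-1-step}
        # multiply element-wise by f'(temph) = h * (1 - h) for a sigmoid
        deltas.append([g * hj * (1.0 - hj) for g, hj in zip(back, h)])
    return deltas

if __name__ == "__main__":
    u = [[1.0, 2.0], [3.0, 4.0]]
    print(backprop_deltas([1.0, 0.0], u, [[0.5, 0.5]], 1))
```

Each pass through the loop applies one line of the recursion above: a matrix product with $u^T$ followed by an element-wise product with $f'$.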

4. Gradients of the parameters u and v

For a single sample i, the contribution of the current step to the gradients is:
$$\begin{array}{l} \frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial u} = \left( h^{i-1} \right)^T \times \frac{\partial J^i}{\partial temp{h^i}} \\ \frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial v} = \left( x^i \right)^T \times \frac{\partial J^i}{\partial temp{h^i}} \end{array}$$
The influence of the earlier samples on sample i also passes through u and v at every earlier step, so the full gradient sums the contributions of all steps:
$$\begin{array}{l} \frac{\partial J^i}{\partial u} = \sum\limits_{k = 1}^i \left( h^{k-1} \right)^T \times \frac{\partial J^i}{\partial temp{h^k}} \\ \frac{\partial J^i}{\partial v} = \sum\limits_{k = 1}^i \left( x^k \right)^T \times \frac{\partial J^i}{\partial temp{h^k}} \end{array}$$

IV. General Formula (for the i-th sample)

Taking k = 3, we have:
$$\begin{array}{l} \frac{\partial J^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temp{h^i}} \times u^T \\ \frac{\partial J^i}{\partial temp{h^{i-1}}} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temp{h^{i-1}}} = \left( \frac{\partial J^i}{\partial temp{h^i}} \times u^T \right) * f'\left( temp{h^{i-1}} \right) \end{array}$$
$$\begin{array}{l} \frac{\partial J^i}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temp{h^{i-1}}} * \frac{\partial temp{h^{i-1}}}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temp{h^{i-1}}} \times u^T \\ \frac{\partial J^i}{\partial temp{h^{i-2}}} = \frac{\partial J^i}{\partial temp{h^{i-1}}} * \frac{\partial temp{h^{i-1}}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temp{h^{i-2}}} = \left( \frac{\partial J^i}{\partial temp{h^{i-1}}} \times u^T \right) * f'\left( temp{h^{i-2}} \right) \end{array}$$
$$\begin{array}{l} \frac{\partial J^i}{\partial h^{i-3}} = \frac{\partial J^i}{\partial temp{h^{i-2}}} * \frac{\partial temp{h^{i-2}}}{\partial h^{i-3}} = \frac{\partial J^i}{\partial temp{h^{i-2}}} \times u^T \\ \frac{\partial J^i}{\partial temp{h^{i-3}}} = \frac{\partial J^i}{\partial temp{h^{i-2}}} * \frac{\partial temp{h^{i-2}}}{\partial h^{i-3}} * \frac{\partial h^{i-3}}{\partial temp{h^{i-3}}} = \left( \frac{\partial J^i}{\partial temp{h^{i-2}}} \times u^T \right) * f'\left( temp{h^{i-3}} \right) \end{array}$$
The general formula follows by induction:
$$\begin{array}{l} \frac{\partial J^i}{\partial h^{i-k}} = \frac{\partial J^i}{\partial temp{h^{i-k+1}}} \times u^T \\ \frac{\partial J^i}{\partial temp{h^{i-k}}} = \left( \frac{\partial J^i}{\partial temp{h^{i-k+1}}} \times u^T \right) * f'\left( temp{h^{i-k}} \right) \end{array}$$
Correspondingly, the contributions to the gradients of u and v from step i, step i-1 and step i-2 are, in that order:
$$\begin{array}{l} \frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temp{h^i}} \times \frac{\partial temp{h^i}}{\partial u} = \left( h^{i-1} \right)^T \times \frac{\partial J^i}{\partial temp{h^i}} \\ \frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temp{h^i}} \times \frac{\partial temp{h^i}}{\partial v} = \left( x^i \right)^T \times \frac{\partial J^i}{\partial temp{h^i}} \end{array}$$
$$\begin{array}{l} \frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temp{h^{i-1}}} \times \frac{\partial temp{h^{i-1}}}{\partial u} = \left( h^{i-2} \right)^T \times \frac{\partial J^i}{\partial temp{h^{i-1}}} \\ \frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temp{h^{i-1}}} \times \frac{\partial temp{h^{i-1}}}{\partial v} = \left( x^{i-1} \right)^T \times \frac{\partial J^i}{\partial temp{h^{i-1}}} \end{array}$$
$$\begin{array}{l} \frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temp{h^{i-1}}} * \frac{\partial temp{h^{i-1}}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temp{h^{i-2}}} \times \frac{\partial temp{h^{i-2}}}{\partial u} = \left( h^{i-3} \right)^T \times \frac{\partial J^i}{\partial temp{h^{i-2}}} \\ \frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temp{h^i}} * \frac{\partial temp{h^i}}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temp{h^{i-1}}} * \frac{\partial temp{h^{i-1}}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temp{h^{i-2}}} \times \frac{\partial temp{h^{i-2}}}{\partial v} = \left( x^{i-2} \right)^T \times \frac{\partial J^i}{\partial temp{h^{i-2}}} \end{array}$$
Summing the contributions to u and v from every step reached by back propagation gives the final gradients:
$$\begin{array}{l} \frac{\partial J^i}{\partial u} = \sum\limits_{k = 1}^i \left( h^{k-1} \right)^T \times \frac{\partial J^i}{\partial temp{h^k}} \\ \frac{\partial J^i}{\partial v} = \sum\limits_{k = 1}^i \left( x^k \right)^T \times \frac{\partial J^i}{\partial temp{h^k}} \end{array}$$
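The summed gradients can be checked numerically with a small sketch. Everything here is an assumption made for illustration rather than the author's Matlab implementation: sigmoid activations, a squared-error loss on the last sample only, a zero initial context state, and the hypothetical helper names `forward`, `loss` and `bptt_u_v`.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def outer(col, row):
    """(col)^T times row for two flat vectors."""
    return [[c * r for r in row] for c in col]

def add(a, b):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(X, v, u, w, b_in, h0):
    """Return H with H[k] = h^k (H[0] is the initial context) and the last output y^i."""
    H, h = [h0], h0
    for x in X:
        xv, hu = matmul([x], v)[0], matmul([h], u)[0]
        h = [sigmoid(a + b + c) for a, b, c in zip(xv, b_in, hu)]
        H.append(h)
    return H, [sigmoid(t) for t in matmul([h], w)[0]]

def loss(X, target, v, u, w, b_in, h0):
    """Assumed loss J^i = 0.5 * sum((y^i - target)^2) on the last sample."""
    _, y = forward(X, v, u, w, b_in, h0)
    return 0.5 * sum((a - t) ** 2 for a, t in zip(y, target))

def bptt_u_v(X, target, v, u, w, b_in, h0):
    """Sum (h^{k-1})^T delta^k and (x^k)^T delta^k over k = i, ..., 1."""
    H, y = forward(X, v, u, w, b_in, h0)
    i = len(X)
    delta_y = [(a - t) * a * (1.0 - a) for a, t in zip(y, target)]
    dh = matmul([delta_y], transpose(w))[0]
    delta = [g * h * (1.0 - h) for g, h in zip(dh, H[i])]    # dJ^i/dtemph^i
    dU = [[0.0] * len(u[0]) for _ in u]
    dV = [[0.0] * len(v[0]) for _ in v]
    for k in range(i, 0, -1):
        dU = add(dU, outer(H[k - 1], delta))
        dV = add(dV, outer(X[k - 1], delta))
        if k > 1:                                            # step back to temph^{k-1}
            back = matmul([delta], transpose(u))[0]
            delta = [g * h * (1.0 - h) for g, h in zip(back, H[k - 1])]
    return dU, dV

if __name__ == "__main__":
    X = [[0.5, -1.0], [1.0, 0.3], [-0.2, 0.8]]
    v = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
    u = [[0.2, -0.1, 0.3], [0.0, 0.4, -0.5], [0.1, 0.2, 0.3]]
    w = [[0.7], [-0.3], [0.5]]
    b_in, h0, target = [0.1, 0.0, -0.1], [0.0, 0.0, 0.0], [1.0]
    dU, dV = bptt_u_v(X, target, v, u, w, b_in, h0)
    eps = 1e-6
    up = [r[:] for r in u]; up[0][1] += eps
    um = [r[:] for r in u]; um[0][1] -= eps
    numeric = (loss(X, target, v, up, w, b_in, h0) - loss(X, target, v, um, w, b_in, h0)) / (2 * eps)
    print(dU[0][1], numeric)   # the two values should agree closely
```

Comparing one entry of the analytic gradient against a central finite difference is a quick way to confirm that the sum over k covers every path through u and v.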

Matlab Implementation

https://blog.csdn.net/vendetta_gg/article/details/106444683

