Title: Elman Neural Network
By: Z.H.Gao
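I. Network structure
The feed-forward N-L-M topology (N inputs, L hidden units, M outputs) is extended to N-L-L-M by adding a context layer of L units. The number of samples is n.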
Fig.1 Elman network: sample-by-sample input mode
Fig.2 Elman network: network structure for a single sample
II. Forward propagation
$$h^i = f\left(temph^i\right) = f\left(v x^i + b_{in} + u h^{i-1}\right)$$
$$y^i = f\left(tempy^i\right) = f\left(w h^i\right) = f\left(w \cdot f\left(v x^i + b_{in} + u h^{i-1}\right)\right)$$
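1. All samples are processed one at a time
As more input data arrive, the self-loop passes the previous state on to the current input. Together they form the new input for the current round of training and learning. This continues until the input or the training ends, and the last output obtained is the final prediction.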
2. If the input is $x_{1 \times N}^i$, then the input weights are $v_{N \times L}$
Hidden layer: $h_{1 \times L}^i$
Context layer: $h_{1 \times L}^{i-1}$
Weights linking the context layer to the hidden layer: $u_{L \times L}$
Output weights: $w_{L \times M}$
Output layer: $y_{1 \times M}^i$
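Here i indexes the sample, and one sample is fed in at a time. Because of the context layer, samples are usually fed in one by one, although they can also be fed in batch by batch. With these row-vector shapes, the products in the forward equations are evaluated as $x^i v$, $h^{i-1} u$ and $h^i w$.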
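The forward equations and shapes above can be sketched in NumPy as follows. This is a minimal illustration, not the author's Matlab code. The function names, the sigmoid activation, and the all-zero initial context layer `h0` are assumptions; the text fixes neither f nor the context state before the first sample.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, used here as an assumed choice for f."""
    return 1.0 / (1.0 + np.exp(-z))


def elman_forward(X, v, u, w, b_in, h0=None, f=sigmoid):
    """Forward pass, one sample (row of X) at a time.

    Shapes: X (n, N), v (N, L), u (L, L), w (L, M), b_in (1, L).
    Returns tempH, H with shape (n, L) and tempY, Y with shape (n, M).
    """
    n = X.shape[0]
    L, M = w.shape
    # Assumed initial context layer: zeros before the first sample.
    h_prev = np.zeros((1, L)) if h0 is None else h0
    tempH, H = np.zeros((n, L)), np.zeros((n, L))
    tempY, Y = np.zeros((n, M)), np.zeros((n, M))
    for i in range(n):
        x_i = X[i:i + 1]                              # (1, N) row vector
        tempH[i:i + 1] = x_i @ v + b_in + h_prev @ u  # temph^i
        H[i:i + 1] = f(tempH[i:i + 1])                # h^i
        tempY[i:i + 1] = H[i:i + 1] @ w               # tempy^i
        Y[i:i + 1] = f(tempY[i:i + 1])                # y^i
        h_prev = H[i:i + 1]                           # h^i becomes the next context
    return tempH, H, tempY, Y


rng = np.random.default_rng(0)
n, N, L, M = 5, 3, 4, 2
X = rng.normal(size=(n, N))
v, u, w = rng.normal(size=(N, L)), rng.normal(size=(L, L)), rng.normal(size=(L, M))
b_in = rng.normal(size=(1, L))
tempH, H, tempY, Y = elman_forward(X, v, u, w, b_in)
print(H.shape, Y.shape)  # (5, 4) (5, 2)
```

Keeping `tempH` and `tempY` alongside `H` and `Y` is deliberate: the backward pass needs the pre-activation values to evaluate f'.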
III. Backward pass (Back Propagation Through Time, BPTT)
1. Given:
$$h^i = f\left(temph^i\right) = f\left(v x^i + b_{in} + u h^{i-1}\right)$$
$$y^i = f\left(tempy^i\right) = f\left(w h^i\right) = f\left(w \cdot f\left(v x^i + b_{in} + u h^{i-1}\right)\right)$$
where i is the sample index.
2. Gradient of the parameter w
Let the loss function be $J\left(Y, Target\right)$ and consider a single sample i.
Note: throughout, $*$ denotes element-wise multiplication and $\times$ denotes matrix multiplication.
$$\frac{\partial J^i}{\partial tempy^i} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial tempy^i} = \frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right)$$
$$\frac{\partial J^i}{\partial w} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial tempy^i} \times \frac{\partial tempy^i}{\partial w} = \left(h^i\right)^T \times \frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right)$$
$$\frac{\partial J^i}{\partial h^i} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial tempy^i} \times \frac{\partial tempy^i}{\partial h^i} = \frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right) \times w^T$$
Over all samples, the gradient of w is then computed exactly as in an ordinary feed-forward neural network:
$$\frac{\partial J}{\partial w} = \sum\limits_{i=1}^{n} \left(h^i\right)^T \times \frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right) = H^T \times \frac{\partial J}{\partial Y} * f'\left(tempY\right)$$
Similarly,
$$\frac{\partial J}{\partial H} = \frac{\partial J}{\partial Y} * f'\left(tempY\right) \times w^T$$
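The batched output-layer formulas can be sketched as below. The text leaves $J$ and $f$ generic, so this example assumes a squared-error loss $J = \frac{1}{2}\sum (Y - Target)^2$ and a sigmoid activation. The function name `output_layer_grads` is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def output_layer_grads(H, w, T):
    """Batched gradients of J = 0.5 * sum((Y - T)**2) w.r.t. w and H.

    H: (n, L) hidden outputs, w: (L, M) output weights, T: (n, M) targets.
    """
    tempY = H @ w
    Y = sigmoid(tempY)
    dJ_dY = Y - T                            # dJ/dY for the assumed squared error
    delta_Y = dJ_dY * sigmoid_prime(tempY)   # element-wise: dJ/dtempY
    dJ_dw = H.T @ delta_Y                    # (L, M): H^T x [dJ/dY * f'(tempY)]
    dJ_dH = delta_Y @ w.T                    # (n, L): [dJ/dY * f'(tempY)] x w^T
    return dJ_dw, dJ_dH


rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))
w = rng.normal(size=(4, 2))
T = rng.uniform(size=(5, 2))
dJ_dw, dJ_dH = output_layer_grads(H, w, T)

# The batched product H^T x (...) equals the sum of per-sample terms (h^i)^T x (...).
per_sample = sum(output_layer_grads(H[i:i + 1], w, T[i:i + 1])[0] for i in range(5))
print(np.allclose(dJ_dw, per_sample))  # True
```

The element-wise product is evaluated before the matrix product with $H^T$, which is how the unbracketed formula for $\partial J / \partial w$ has to be read for the shapes to match.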
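3. Gradients of the hidden layer $h^i$ and the context layer $h^{i-1}$
The gradient of the parameter u involves the link between the current sample and the earlier samples, so it has to be computed in a loop.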
$$\frac{\partial J}{\partial tempH} = \frac{\partial J}{\partial H} * f'\left(tempH\right) = \left[\frac{\partial J}{\partial Y} * f'\left(tempY\right) \times w^T\right] * f'\left(tempH\right)$$
Then $\frac{\partial J^i}{\partial temph^i} = \frac{\partial J}{\partial tempH}\left(i,:\right)$, i.e. row i of the matrix above. This is the core of the loop, which processes one sample at a time:
$$\begin{array}{l}
\frac{\partial J^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temph^i} \times u^T \\
\frac{\partial J^i}{\partial temph^{i-1}} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} = \left(\frac{\partial J^i}{\partial temph^i} \times u^T\right) * f'\left(temph^{i-1}\right)
\end{array}$$
$$\begin{array}{l}
\frac{\partial J^i}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} \times u^T \\
\frac{\partial J^i}{\partial temph^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} = \left(\frac{\partial J^i}{\partial temph^{i-1}} \times u^T\right) * f'\left(temph^{i-2}\right)
\end{array}$$
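The loop computes how the error $J^i$ of the current sample is affected by the previous k samples. Computationally, the error $J^i$ of the current sample is used to compute the gradient with respect to the link weights u that connect the networks of the previous k steps to the current one.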
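The step-by-step recursion above can be written as a short loop. This is a sketch under assumptions: the helper name `backprop_deltas` is hypothetical, indices are 0-based (row j of `tempH` holds $temph$ for sample j), and tanh is only a default choice for f.

```python
import numpy as np


def tanh_prime(z):
    """Derivative of tanh, an assumed choice for f'."""
    return 1.0 - np.tanh(z) ** 2


def backprop_deltas(delta_i, tempH, u, i, k, f_prime=tanh_prime):
    """Return [dJ^i/dtemph^i, dJ^i/dtemph^(i-1), ..., dJ^i/dtemph^(i-k)].

    delta_i: (1, L) row vector dJ^i/dtemph^i.
    tempH:   (n, L) pre-activations, one row per sample (0-based).
    i, k:    0-based sample index and number of steps to go back (k <= i).
    """
    if not 0 <= k <= i:
        raise ValueError("k must satisfy 0 <= k <= i")
    deltas = [delta_i]
    for j in range(i - 1, i - k - 1, -1):
        dJ_dh_j = deltas[-1] @ u.T                        # dJ^i/dh^j
        deltas.append(dJ_dh_j * f_prime(tempH[j:j + 1]))  # dJ^i/dtemph^j
    return deltas


# Hand-checkable case: linear activation (f' = 1) and u = 0.5 * I,
# so every step back halves the propagated term.
L = 3
u = 0.5 * np.eye(L)
tempH = np.zeros((4, L))
delta_i = np.ones((1, L))
deltas = backprop_deltas(delta_i, tempH, u, i=3, k=3, f_prime=np.ones_like)
print([float(d[0, 0]) for d in deltas])  # [1.0, 0.5, 0.25, 0.125]
```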
4. Gradients of the parameters u and v
For a single sample i, its effect on the current network gives the corresponding gradients:
$$\begin{array}{l}
\frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial u} = \left(h^{i-1}\right)^T \times \frac{\partial J^i}{\partial temph^i} \\
\frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial v} = \left(x^i\right)^T \times \frac{\partial J^i}{\partial temph^i}
\end{array}$$
The influence of the previous k samples on the single sample i also passes through the parameters u and v, so:
$$\begin{array}{l}
\frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial u} = \sum\limits_{k=1}^{i} \left(h^{k-1}\right)^T \times \frac{\partial J^i}{\partial temph^k} \\
\frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial v} = \sum\limits_{k=1}^{i} \left(x^k\right)^T \times \frac{\partial J^i}{\partial temph^k}
\end{array}$$
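The two sums can be accumulated in one backward loop over k. The sketch below is illustrative only: `grads_u_v` is a hypothetical name, indices are 0-based, and `h0` is an assumed initial context state that plays the role of $h^{0}$ for the first sample.

```python
import numpy as np


def grads_u_v(i, X, H, tempH, h0, u, delta_i, f_prime):
    """Accumulate dJ^i/du and dJ^i/dv over samples k = i, i-1, ..., 0 (0-based).

    X (n, N) inputs, H and tempH (n, L) from the forward pass,
    h0 (1, L) initial context, u (L, L), delta_i (1, L) = dJ^i/dtemph^i.
    """
    dJ_du = np.zeros_like(u, dtype=float)
    dJ_dv = np.zeros((X.shape[1], u.shape[0]))
    delta = delta_i
    for k in range(i, -1, -1):
        h_prev = H[k - 1:k] if k > 0 else h0   # h^(k-1); h0 feeds the first sample
        dJ_du += h_prev.T @ delta              # (h^(k-1))^T x dJ^i/dtemph^k
        dJ_dv += X[k:k + 1].T @ delta          # (x^k)^T x dJ^i/dtemph^k
        if k > 0:                              # step back to dJ^i/dtemph^(k-1)
            delta = (delta @ u.T) * f_prime(tempH[k - 1:k])
    return dJ_du, dJ_dv


# Scalar example with a linear activation (f' = 1), small enough to check by hand:
# dJ/du = 2*1 + 1*0.5 + 0*0.25 = 2.5 and dJ/dv = 3*1 + 2*0.5 + 1*0.25 = 4.25.
X = np.array([[1.0], [2.0], [3.0]])
H = np.array([[1.0], [2.0], [4.0]])
tempH = np.zeros((3, 1))
h0 = np.zeros((1, 1))
u = np.array([[0.5]])
dJ_du, dJ_dv = grads_u_v(2, X, H, tempH, h0, u, np.array([[1.0]]), np.ones_like)
print(float(dJ_du[0, 0]), float(dJ_dv[0, 0]))  # 2.5 4.25
```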
IV. General formula (for the i-th sample)
Assuming k = 3, we clearly have
$$\begin{array}{l}
\frac{\partial J^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temph^i} \times u^T \\
\frac{\partial J^i}{\partial temph^{i-1}} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} = \left(\frac{\partial J^i}{\partial temph^i} \times u^T\right) * f'\left(temph^{i-1}\right)
\end{array}$$
$$\begin{array}{l}
\frac{\partial J^i}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} \times u^T \\
\frac{\partial J^i}{\partial temph^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} = \left(\frac{\partial J^i}{\partial temph^{i-1}} \times u^T\right) * f'\left(temph^{i-2}\right)
\end{array}$$
$$\begin{array}{l}
\frac{\partial J^i}{\partial h^{i-3}} = \frac{\partial J^i}{\partial temph^{i-2}} * \frac{\partial temph^{i-2}}{\partial h^{i-3}} = \frac{\partial J^i}{\partial temph^{i-2}} \times u^T \\
\frac{\partial J^i}{\partial temph^{i-3}} = \frac{\partial J^i}{\partial temph^{i-2}} * \frac{\partial temph^{i-2}}{\partial h^{i-3}} * \frac{\partial h^{i-3}}{\partial temph^{i-3}} = \left(\frac{\partial J^i}{\partial temph^{i-2}} \times u^T\right) * f'\left(temph^{i-3}\right)
\end{array}$$
This generalizes to:
$$\begin{array}{l}
\frac{\partial J^i}{\partial h^{i-k}} = \frac{\partial J^i}{\partial temph^{i-k+1}} \times u^T \\
\frac{\partial J^i}{\partial temph^{i-k}} = \left(\frac{\partial J^i}{\partial temph^{i-k+1}} \times u^T\right) * f'\left(temph^{i-k}\right)
\end{array}$$
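As a worked special case of the general formula, assume a linear activation so that $f'\left(\cdot\right) = 1$. This assumption is added here only for illustration and is not part of the derivation. The element-wise factors then drop out, and applying the recursion k times gives a matrix power:

$$\frac{\partial J^i}{\partial temph^{i-k}} = \frac{\partial J^i}{\partial temph^i} \times \left(u^T\right)^k$$

In this linear case the back-propagated term tends to shrink with k when all eigenvalues of u have magnitude below 1, and can grow with k when some exceed 1.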
Correspondingly, for the parameters u and v, the contribution from each step back is:
$$\begin{array}{l}
\frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temph^i} \times \frac{\partial temph^i}{\partial u} = \left(h^{i-1}\right)^T \times \frac{\partial J^i}{\partial temph^i} \\
\frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temph^i} \times \frac{\partial temph^i}{\partial v} = \left(x^i\right)^T \times \frac{\partial J^i}{\partial temph^i}
\end{array}$$
$$\begin{array}{l}
\frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} \times \frac{\partial temph^{i-1}}{\partial u} = \left(h^{i-2}\right)^T \times \frac{\partial J^i}{\partial temph^{i-1}} \\
\frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} \times \frac{\partial temph^{i-1}}{\partial v} = \left(x^{i-1}\right)^T \times \frac{\partial J^i}{\partial temph^{i-1}}
\end{array}$$
$$\begin{array}{l}
\frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} \times \frac{\partial temph^{i-2}}{\partial u} = \left(h^{i-3}\right)^T \times \frac{\partial J^i}{\partial temph^{i-2}} \\
\frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} \times \frac{\partial temph^{i-2}}{\partial v} = \left(x^{i-2}\right)^T \times \frac{\partial J^i}{\partial temph^{i-2}}
\end{array}$$
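Summing the contributions to the link weights u and v that are back-propagated through the previous k layers gives the final gradients: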
$$\begin{array}{l}
\frac{\partial J^i}{\partial u} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial u} = \sum\limits_{k=1}^{i} \left(h^{k-1}\right)^T \times \frac{\partial J^i}{\partial temph^k} \\
\frac{\partial J^i}{\partial v} = \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial v} = \sum\limits_{k=1}^{i} \left(x^k\right)^T \times \frac{\partial J^i}{\partial temph^k}
\end{array}$$
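One way to gain confidence in the final sums is a numerical gradient check. The sketch below compares the analytic gradients of $J^i$ with central finite differences. It is not the linked Matlab implementation. All names are hypothetical, and the tanh activation, the squared-error loss, the zero initial context `h0`, and the optional `k_steps` truncation argument are assumptions made for the example.

```python
import numpy as np


def forward(X, v, u, w, b_in, h0):
    """tanh Elman forward pass (tanh is an assumed activation)."""
    n, L = X.shape[0], u.shape[0]
    tempH, H = np.zeros((n, L)), np.zeros((n, L))
    h_prev = h0
    for j in range(n):
        tempH[j:j + 1] = X[j:j + 1] @ v + b_in + h_prev @ u
        H[j:j + 1] = np.tanh(tempH[j:j + 1])
        h_prev = H[j:j + 1]
    return tempH, H, np.tanh(H @ w)


def loss_i(params, X, t_i, i, b_in, h0):
    """J^i = 0.5 * ||y^i - t^i||^2 (assumed squared-error loss)."""
    v, u, w = params
    Y = forward(X, v, u, w, b_in, h0)[2]
    return 0.5 * np.sum((Y[i:i + 1] - t_i) ** 2)


def bptt_i(params, X, t_i, i, b_in, h0, k_steps=None):
    """Analytic dJ^i/dv, dJ^i/du, dJ^i/dw; k_steps limits how far back to sum."""
    v, u, w = params
    tempH, H, Y = forward(X, v, u, w, b_in, h0)
    delta_y = (Y[i:i + 1] - t_i) * (1.0 - Y[i:i + 1] ** 2)   # dJ^i/dtempy^i
    g_w = H[i:i + 1].T @ delta_y
    delta = (delta_y @ w.T) * (1.0 - H[i:i + 1] ** 2)        # dJ^i/dtemph^i
    g_u, g_v = np.zeros_like(u), np.zeros_like(v)
    last = 0 if k_steps is None else max(0, i - k_steps)
    for k in range(i, last - 1, -1):
        h_prev = H[k - 1:k] if k > 0 else h0
        g_u += h_prev.T @ delta
        g_v += X[k:k + 1].T @ delta
        if k > 0:
            delta = (delta @ u.T) * (1.0 - H[k - 1:k] ** 2)  # tanh'(temph^(k-1))
    return g_v, g_u, g_w


def numeric_grad(params, idx, *args, eps=1e-6):
    """Central finite differences of loss_i w.r.t. params[idx]."""
    grad = np.zeros_like(params[idx])
    for pos in np.ndindex(grad.shape):
        plus = [p.copy() for p in params]
        minus = [p.copy() for p in params]
        plus[idx][pos] += eps
        minus[idx][pos] -= eps
        grad[pos] = (loss_i(plus, *args) - loss_i(minus, *args)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n, N, L, M = 5, 3, 4, 2
X = rng.normal(size=(n, N))
params = [rng.normal(size=s) * 0.5 for s in [(N, L), (L, L), (L, M)]]
b_in, h0 = rng.normal(size=(1, L)), np.zeros((1, L))
t_i, i = rng.uniform(size=(1, M)), n - 1
analytic = bptt_i(params, X, t_i, i, b_in, h0)
for idx, name in enumerate(["v", "u", "w"]):
    numeric = numeric_grad(params, idx, X, t_i, i, b_in, h0)
    print(name, "max abs difference:", np.max(np.abs(analytic[idx] - numeric)))
```

With `k_steps=None` the loop runs back to the first sample, matching the full sums over k. Passing a small `k_steps` keeps only the most recent terms, which trades accuracy for less computation.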
Matlab implementation
https://blog.csdn.net/vendetta_gg/article/details/106444683