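1. Notation
- $a_{i-1}$ is the input vector of layer $i$
- $x_i$ is the output vector of layer $i$
- $W_i$ is the weight matrix from layer $i$ to the next layer
- $\sigma$ is the sigmoid activation function used in every layer
- $o$ is the index of the last layer, $y$ is the true value at the last layer, and $h$ is the predicted value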
- $L$ is the loss function for a single sample, where $L=\frac{1}{2}(y-h)^2$
From these definitions we have the following equations:

$$x_i=\sigma(a_{i-1}),\qquad a_i=W_ix_i,\qquad h=\sigma(a_o)=\sigma(W_ox_o)$$
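The three equations above can be sketched as a forward pass. This is a minimal illustration, not a reference implementation: it assumes one unit per layer so that every $a_i$, $x_i$ and $W_i$ is a scalar, and the names `sigmoid`, `forward` and `loss` are chosen here for illustration only.

```python
import math

def sigmoid(z):
    """Sigmoid activation sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(a0, weights):
    """Forward pass through a chain of single-unit layers (an assumption).

    a0      : the input a_0 fed to layer 1
    weights : [W_1, ..., W_o]
    Returns xs = [x_1, ..., x_o], as_ = [a_1, ..., a_o] and the prediction h.
    """
    xs, as_ = [], []
    a = a0
    for W in weights:
        x = sigmoid(a)  # x_i = sigma(a_{i-1})
        a = W * x       # a_i = W_i x_i
        xs.append(x)
        as_.append(a)
    h = sigmoid(a)      # h = sigma(a_o)
    return xs, as_, h

def loss(y, h):
    """Single-sample loss L = 1/2 (y - h)^2."""
    return 0.5 * (y - h) ** 2

# Example call with arbitrary illustrative values
xs, as_, h = forward(0.3, [0.5, -1.2])
```

With wider layers the same loop applies, with `W * x` read as a matrix-vector product and `sigmoid` applied elementwise.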
2. Output layer

By the chain rule, with $\frac{\partial L}{\partial h}=h-y$, $\frac{\partial h}{\partial a_o}=h(1-h)$ and $\frac{\partial a_o}{\partial W_o}=x_o$:

$$\frac{\partial L}{\partial W_o}=\frac{\partial L}{\partial h}\frac{\partial h}{\partial a_o}\frac{\partial a_o}{\partial W_o}=(h-y)h(1-h)x_o$$
Define the error term $\delta_o=\frac{\partial L}{\partial a_o}=(h-y)h(1-h)$. Then

$$\frac{\partial L}{\partial W_o}=\delta_ox_o$$
The gradient-descent update for $W_o$, with learning rate $\eta$, is:

$$W_o=W_o-\eta\delta_ox_o$$
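As a quick sanity check of the output-layer result, the sketch below compares $\delta_ox_o$ with a central finite-difference estimate of $\frac{\partial L}{\partial W_o}$ and then applies one update. It assumes scalar $W_o$ and $x_o$; the function names and the numeric values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output_loss(W_o, x_o, y):
    """L = 1/2 (y - h)^2 with h = sigma(W_o x_o)."""
    h = sigmoid(W_o * x_o)
    return 0.5 * (y - h) ** 2

def output_layer_grad(W_o, x_o, y):
    """Return (delta_o, dL/dW_o) from the closed-form expressions."""
    h = sigmoid(W_o * x_o)
    delta_o = (h - y) * h * (1.0 - h)  # delta_o = dL/da_o
    return delta_o, delta_o * x_o      # dL/dW_o = delta_o x_o

# Illustrative values (assumptions, not taken from the text)
W_o, x_o, y, eta = 0.8, 0.6, 1.0, 0.5

delta_o, grad = output_layer_grad(W_o, x_o, y)

# Central finite difference as an independent estimate of dL/dW_o
eps = 1e-6
numeric = (output_loss(W_o + eps, x_o, y)
           - output_loss(W_o - eps, x_o, y)) / (2 * eps)

# One gradient-descent step: W_o = W_o - eta * delta_o * x_o
W_o_new = W_o - eta * grad
```

The analytic and numeric gradients should agree to several decimal places, and the loss at `W_o_new` should be lower than at `W_o` for this small step.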
3. The preceding hidden layer $k$
Since $a_o=W_ox_o=W_o\sigma(a_k)$ and $a_k=W_kx_k$, the chain rule gives

$$\frac{\partial L}{\partial W_k}=\frac{\partial L}{\partial h}\frac{\partial h}{\partial a_o}\frac{\partial a_o}{\partial a_k}\frac{\partial a_k}{\partial W_k}=(h-y)h(1-h)\,W_o\,x_o(1-x_o)\,x_k$$

Here $\frac{\partial a_o}{\partial a_k}=W_o\sigma'(a_k)=W_o\sigma(a_k)\bigl(1-\sigma(a_k)\bigr)=W_ox_o(1-x_o)$, because $x_o=\sigma(a_k)$.

Define the error term of hidden layer $k$ as

$$\delta_k=\frac{\partial L}{\partial a_k}=\delta_oW_ox_o(1-x_o)$$

Then

$$\frac{\partial L}{\partial W_k}=\delta_kx_k$$
The gradient-descent update for $W_k$ is:

$$W_k=W_k-\eta\delta_kx_k$$
If there is a further hidden layer $k-1$, the same argument applies. Since $x_k=\sigma(a_{k-1})$, its error term is

$$\delta_{k-1}=\frac{\partial L}{\partial a_{k-1}}=\delta_kW_kx_k(1-x_k)$$

Then

$$\frac{\partial L}{\partial W_{k-1}}=\delta_{k-1}x_{k-1}$$

and the gradient-descent update for $W_{k-1}$ is:

$$W_{k-1}=W_{k-1}-\eta\delta_{k-1}x_{k-1}$$
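The hidden-layer error term can be checked numerically in the same way. The sketch below assumes a two-layer chain (layer $k$ followed by the output layer $o$) with scalar weights, and uses $\sigma'(a_k)=x_o(1-x_o)$ with $x_o=\sigma(a_k)$. All names and values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(W_k, W_o, x_k):
    """Return (x_o, h) for the chain a_k = W_k x_k, x_o = sigma(a_k), h = sigma(W_o x_o)."""
    a_k = W_k * x_k
    x_o = sigmoid(a_k)
    h = sigmoid(W_o * x_o)
    return x_o, h

def chain_loss(W_k, W_o, x_k, y):
    _, h = predict(W_k, W_o, x_k)
    return 0.5 * (y - h) ** 2

def hidden_layer_grad(W_k, W_o, x_k, y):
    """Return (delta_k, dL/dW_k) by propagating delta_o one layer back."""
    x_o, h = predict(W_k, W_o, x_k)
    delta_o = (h - y) * h * (1.0 - h)            # output-layer error term
    delta_k = delta_o * W_o * x_o * (1.0 - x_o)  # sigma'(a_k) = x_o (1 - x_o)
    return delta_k, delta_k * x_k                # dL/dW_k = delta_k x_k

# Illustrative values (assumptions, not taken from the text)
W_k, W_o, x_k, y = 0.4, -0.7, 0.9, 0.0

delta_k, grad_k = hidden_layer_grad(W_k, W_o, x_k, y)

# Central finite difference with respect to W_k
eps = 1e-6
numeric_k = (chain_loss(W_k + eps, W_o, x_k, y)
             - chain_loss(W_k - eps, W_o, x_k, y)) / (2 * eps)
```

For layers with more than one unit, the product $\delta_oW_o$ becomes $W_o^{\top}\delta_o$ and the factor $x_o(1-x_o)$ is applied elementwise.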
4. Order of computation
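From sections 2 and 3, updating all the weights in the network requires each layer's input and output values together with its error term. The error terms are computed starting from the output layer and moving backward through the hidden layers one at a time, since each $\delta$ depends on the $\delta$ of the layer after it.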
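Putting the pieces together, a full training step might look like the sketch below: one forward pass to record every $x_i$, then one backward sweep from the output layer that produces each $\delta_i$ and gradient. It again assumes a chain of single-unit layers so the formulas apply literally; `forward`, `backward`, `train_step` and the chosen values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(a0, weights):
    """Return ([x_1, ..., x_o], h) for weights = [W_1, ..., W_o]."""
    xs = []
    a = a0
    for W in weights:
        x = sigmoid(a)  # x_i = sigma(a_{i-1})
        xs.append(x)
        a = W * x       # a_i = W_i x_i
    return xs, sigmoid(a)

def backward(xs, h, y, weights):
    """Return [dL/dW_1, ..., dL/dW_o], sweeping from the output layer back."""
    grads = [0.0] * len(weights)
    delta = (h - y) * h * (1.0 - h)  # delta_o
    for i in reversed(range(len(weights))):
        grads[i] = delta * xs[i]     # dL/dW_i = delta_i x_i
        # Error term of the previous layer: delta_i W_i x_i (1 - x_i)
        delta = delta * weights[i] * xs[i] * (1.0 - xs[i])
    return grads

def train_step(a0, y, weights, eta):
    """One gradient-descent step; all gradients use the pre-update weights."""
    xs, h = forward(a0, weights)
    grads = backward(xs, h, y, weights)
    return [W - eta * g for W, g in zip(weights, grads)]

def loss(a0, y, weights):
    _, h = forward(a0, weights)
    return 0.5 * (y - h) ** 2

# Illustrative three-layer chain and a single training sample (assumptions)
initial = [0.5, -0.3, 0.8]
a0, y = 0.2, 1.0

weights = list(initial)
before = loss(a0, y, weights)
for _ in range(100):
    weights = train_step(a0, y, weights, eta=0.5)
after = loss(a0, y, weights)
```

Computing every gradient before changing any weight matters here: the backward sweep reads $W_i$ when forming the previous layer's error term, so updating in place mid-sweep would mix old and new weights.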