An efficient algorithm for computing gradients
1.1 Chain rule
$$\frac{\partial C}{\partial w} = \frac{\partial z}{\partial w}\frac{\partial C}{\partial z}$$
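The factorization above can be checked numerically. The sketch below is only an illustration under assumptions that are not in the original notes: a single neuron with one input, a sigmoid activation, a squared-error cost, and made-up values for `x`, `b`, `y`, and `w`. It compares the chain-rule product with a central finite difference.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, x=1.5, b=0.1, y=1.0):
    """C(w) for one neuron: z = w*x + b, a = sigmoid(z), C = (a - y)^2."""
    z = w * x + b
    a = sigmoid(z)
    return (a - y) ** 2


def chain_rule_grad(w, x=1.5, b=0.1, y=1.0):
    """dC/dw assembled as (dz/dw) * (dC/dz)."""
    z = w * x + b
    a = sigmoid(z)
    dz_dw = x                               # factor from the forward pass
    dC_dz = 2.0 * (a - y) * a * (1.0 - a)   # factor from the backward pass
    return dz_dw * dC_dz


def numeric_grad(w, eps=1e-6):
    """Central finite difference of C with respect to w."""
    return (cost(w + eps) - cost(w - eps)) / (2.0 * eps)


w = 0.8  # hypothetical weight value
analytic = chain_rule_grad(w)
numeric = numeric_grad(w)
print(analytic, numeric)
```

The two printed numbers should agree to several decimal places; a large gap would point to a mistake in one of the two factors.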
Forward pass: compute $\frac{\partial z}{\partial w}$ for every parameter $w$.
Backward pass: compute $\frac{\partial C}{\partial z}$ for every activation-function input $z$.
$\frac{\partial z}{\partial w}$ is simply the input value attached to that weight:
$$\frac{\partial z}{\partial w_{1}} = x_{1}$$
$$\frac{\partial z}{\partial w_{2}} = x_{2}$$
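As a quick sanity check of the two identities above, here is a minimal sketch assuming a two-input neuron with $z = w_1 x_1 + w_2 x_2 + b$. The helper names and all numeric values are hypothetical. It perturbs each weight in turn and confirms that the change in $z$ matches the corresponding input.

```python
def weighted_input(w, x, b):
    """z = w_1*x_1 + w_2*x_2 + b for one neuron."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def dz_dw_numeric(w, x, b, i, eps=1e-6):
    """Central finite difference of z with respect to w[i]."""
    w_plus = list(w)
    w_plus[i] += eps
    w_minus = list(w)
    w_minus[i] -= eps
    return (weighted_input(w_plus, x, b) - weighted_input(w_minus, x, b)) / (2.0 * eps)


x = [-1.0, 2.0]   # hypothetical inputs x_1, x_2
w = [0.5, -0.3]   # hypothetical weights w_1, w_2
b = 0.1           # hypothetical bias

# The forward pass needs no extra work: dz/dw_i is just the input x_i.
dz_dw = list(x)

for i in range(len(w)):
    print(i + 1, dz_dw[i], dz_dw_numeric(w, x, b, i))
```

For a weight in a deeper layer the same rule applies, with the previous layer's activation taking the place of $x_i$, which is why a single forward pass yields every $\frac{\partial z}{\partial w}$.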
For example: compute the forward pass
Backward pass
In summary:
$$\frac{\partial C}{\partial z} = \sigma'(z)\left[w_{3}\frac{\partial C}{\partial z'} + w_{4}\frac{\partial C}{\partial z''}\right]$$
Here $\sigma'(z)$ is a constant, because $z$ was already computed in the forward pass.
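The recursion can be tried on a tiny network. The sketch below rests on assumptions that are not stated in the notes: $a = \sigma(z)$ feeds two next-layer neurons with $z' = w_3 a + b_1$ and $z'' = w_4 a + b_2$, both of those are sigmoid output neurons, and the cost is a squared error. All constants are made up for illustration. The result of the backward formula is compared with a finite difference taken directly on $z$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


# Hypothetical parameters and targets for the two output neurons.
W3, W4 = 0.7, -1.2
B1, B2 = 0.05, -0.1
Y1, Y2 = 1.0, 0.0


def cost_from_z(z):
    """Run the rest of the network forward from z and return C."""
    a = sigmoid(z)
    z1 = W3 * a + B1   # z'
    z2 = W4 * a + B2   # z''
    return (sigmoid(z1) - Y1) ** 2 + (sigmoid(z2) - Y2) ** 2


def backward_dC_dz(z):
    """dC/dz = sigma'(z) * (w3 * dC/dz' + w4 * dC/dz'')."""
    a = sigmoid(z)
    z1 = W3 * a + B1
    z2 = W4 * a + B2
    # At the output layer dC/dz' and dC/dz'' are available directly.
    dC_dz1 = 2.0 * (sigmoid(z1) - Y1) * sigmoid_prime(z1)
    dC_dz2 = 2.0 * (sigmoid(z2) - Y2) * sigmoid_prime(z2)
    # sigma'(z) acts as a fixed scale factor, since z is known from the forward pass.
    return sigmoid_prime(z) * (W3 * dC_dz1 + W4 * dC_dz2)


z = 0.3  # hypothetical value of z from the forward pass
analytic = backward_dC_dz(z)
numeric = (cost_from_z(z + 1e-6) - cost_from_z(z - 1e-6)) / 2e-6
print(analytic, numeric)
```

In a deeper network the two output-layer derivatives would themselves come from the same formula applied one layer further on, so the computation proceeds from the output back toward the input.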
Summary