Backpropagation
Gradient Descent
Network parameters
$\theta=\{w_1,w_2,\dots,b_1,b_2,\dots\}$
Although the gradient could in theory be computed directly, the network has far too many parameters for that to be practical.
Solution: the Backpropagation algorithm.
Prerequisite: the Chain Rule
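As a quick refresher, these are the two forms the derivation below relies on; the second is why contributions from multiple paths are summed:

$y=g(x),\ z=h(y) \;\Rightarrow\; \frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx}$

$x=g(s),\ y=h(s),\ z=k(x,y) \;\Rightarrow\; \frac{dz}{ds}=\frac{\partial z}{\partial x}\frac{dx}{ds}+\frac{\partial z}{\partial y}\frac{dy}{ds}$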
The total loss is the sum of the per-example costs $C_n$, so its gradient is the sum of the per-example gradients:
$L(\theta)=\sum_{n=1}^{N}C_n(\theta) \;\Rightarrow\; \frac{\partial L(\theta)}{\partial w}=\sum_{n=1}^{N}\frac{\partial C_n(\theta)}{\partial w}$
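A minimal sketch of this decomposition, using a hypothetical one-weight model and cost $C_n=(wx_n-y_n)^2$ chosen only for illustration:

```python
import numpy as np

# Hypothetical tiny model: prediction = w * x, cost C_n = (w*x_n - y_n)^2
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.1, 5.9])
w = 0.5

def cost_grad(x, y, w):
    # dC_n/dw for C_n = (w*x - y)^2
    return 2.0 * (w * x - y) * x

# Gradient of L(θ) = Σ C_n(θ) is the sum of the per-example gradients
grad_L = sum(cost_grad(x, y, w) for x, y in zip(xs, ys))

# Numerical check via a finite difference of the total loss
eps = 1e-6
L = lambda w: np.sum((w * xs - ys) ** 2)
grad_numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
assert np.isclose(grad_L, grad_numeric)
```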
Example:
Take a single weight $w$ feeding a neuron whose weighted input is $z$ (e.g. $z=x_1w_1+x_2w_2+b$). By the chain rule,
$\frac{\partial C}{\partial w}=\frac{\partial z}{\partial w}\,\frac{\partial C}{\partial z}$
Forward pass: compute $\frac{\partial z}{\partial w}$ for all parameters.
Backward pass: compute $\frac{\partial C}{\partial z}$ for all weighted inputs $z$.
- Forward pass: $\frac{\partial z}{\partial w_1}=x_1,\ \frac{\partial z}{\partial w_2}=x_2$; the derivative of $z$ with respect to a weight is simply the input connected to that weight, so it is easy to compute.
- Backward pass:
Case 1: with activation $a=\sigma(z)$,
$\frac{\partial C}{\partial z}=\frac{\partial a}{\partial z}\,\frac{\partial C}{\partial a}=\sigma'(z)\,\frac{\partial C}{\partial a}$
$\frac{\partial C}{\partial a}=\frac{\partial z'}{\partial a}\frac{\partial C}{\partial z'}+\frac{\partial z''}{\partial a}\frac{\partial C}{\partial z''}=w_3\frac{\partial C}{\partial z'}+w_4\frac{\partial C}{\partial z''}$
Here $z'$ and $z''$ are the next layer's weighted inputs, which receive $a$ through weights $w_3$ and $w_4$ (so $\frac{\partial z'}{\partial a}=w_3$ and $\frac{\partial z''}{\partial a}=w_4$, both already known); suppose for now that $\frac{\partial C}{\partial z'}$ and $\frac{\partial C}{\partial z''}$ have already been computed.
$\therefore\ \frac{\partial C}{\partial z}=\sigma'(z)\left[w_3\frac{\partial C}{\partial z'}+w_4\frac{\partial C}{\partial z''}\right]$
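A minimal sketch of this single backward step (the names `sigma_prime`, `dC_dzp`, `dC_dzpp` are illustrative, and sigmoid is assumed for $\sigma$, which the source does not specify):

```python
import numpy as np

def sigma(z):
    # Sigmoid activation (an assumption; the source only writes σ)
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

def backward_step(z, w3, w4, dC_dzp, dC_dzpp):
    """∂C/∂z = σ'(z) * [w3 * ∂C/∂z' + w4 * ∂C/∂z'']"""
    return sigma_prime(z) * (w3 * dC_dzp + w4 * dC_dzpp)
```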
Expanding once more. If $z'$ and $z''$ belong to the output layer, producing the network outputs $y_1$ and $y_2$:
$\frac{\partial C}{\partial z'}=\frac{\partial y_1}{\partial z'}\frac{\partial C}{\partial y_1},\qquad \frac{\partial C}{\partial z''}=\frac{\partial y_2}{\partial z''}\frac{\partial C}{\partial y_2}$
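Both factors are now directly computable: $\frac{\partial y_1}{\partial z'}$ is just the derivative of the output activation, and $\frac{\partial C}{\partial y_1}$ comes straight from the cost function, so the recursion terminates here. For example, with a squared-error cost (an illustrative assumption, not from the source):

$C=\tfrac{1}{2}\sum_i(y_i-\hat{y}_i)^2 \;\Rightarrow\; \frac{\partial C}{\partial y_1}=y_1-\hat{y}_1$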
Case 2:
If $z'$ and $z''$ are not at the output layer, just keep applying the method of Case 1 recursively until the output layer is reached.
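A sketch of this recursion over a small fully connected network; the `Ws`/`zs` layout, sigmoid activation, and squared-error cost are assumptions for illustration:

```python
import numpy as np

def sigma_prime(z):
    # Derivative of the sigmoid, assumed here for σ
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def backward_pass(Ws, zs, y, y_hat):
    """Return δ_l = ∂C/∂z_l for every layer, computed back to front.

    Ws : weight matrices; Ws[l] maps layer-l activations to the next
         layer's weighted inputs
    zs : weighted inputs cached by the forward pass (zs[-1] = output layer)
    y  : network output; y_hat : target
    Assumes squared-error cost C = ½‖y − ŷ‖², so ∂C/∂y = y − ŷ.
    """
    # Output layer: recursion terminates, ∂C/∂y is known directly
    delta = sigma_prime(zs[-1]) * (y - y_hat)
    deltas = [delta]
    # Hidden layers (Case 2): δ = σ'(z) ⊙ (Wᵀ δ_next), layer by layer
    for W, z in zip(reversed(Ws[1:]), reversed(zs[:-1])):
        delta = sigma_prime(z) * (W.T @ delta)
        deltas.insert(0, delta)
    return deltas
```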
Summary
Backpropagation is just the chain rule, applied recursively from the output layer back through the network.
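Putting the two passes together, a minimal end-to-end sketch (the 2-3-2 layer sizes, sigmoid, and squared-error cost are all illustrative assumptions): the forward pass caches each layer's inputs (the $\frac{\partial z}{\partial w}$ terms), the backward pass produces each $\frac{\partial C}{\partial z}$, and their product gives every weight gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-3-2 network; sizes chosen only for illustration
Ws = [rng.normal(size=(3, 2)), rng.normal(size=(2, 3))]
bs = [np.zeros(3), np.zeros(2)]
x = np.array([1.0, -1.0])
y_hat = np.array([1.0, 0.0])

# Forward pass: cache the activations (these are the ∂z/∂w terms)
a, acts, zs = x, [x], []
for W, b in zip(Ws, bs):
    z = W @ a + b
    a = sigma(z)
    zs.append(z)
    acts.append(a)

# Backward pass: δ = ∂C/∂z, from the output layer backwards
delta = sigma(zs[-1]) * (1 - sigma(zs[-1])) * (a - y_hat)
grads = []
for l in range(len(Ws) - 1, -1, -1):
    # ∂C/∂W_l[i,j] = δ_{l+1}[i] * a_l[j]: backward term × forward term
    grads.insert(0, np.outer(delta, acts[l]))
    if l > 0:
        delta = sigma(zs[l-1]) * (1 - sigma(zs[l-1])) * (Ws[l].T @ delta)

# Check one entry against a finite difference of the full loss
def loss(Ws):
    a = x
    for W, b in zip(Ws, bs):
        a = sigma(W @ a + b)
    return 0.5 * np.sum((a - y_hat) ** 2)

eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[0][0, 0] += eps
Wm = [W.copy() for W in Ws]; Wm[0][0, 0] -= eps
assert np.isclose(grads[0][0, 0], (loss(Wp) - loss(Wm)) / (2 * eps))
```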