BP
For each epoch:
    For each batch:
        For each level (n = N, …, 1, i.e., from the last layer to the first):
            Compute the derivatives of the loss at this level (w.r.t. the level's parameters and the level's input):
                \frac{\partial L}{\partial \omega^{n}} = \frac{\partial L}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial \omega^{n}}   (used right away to update this level's \omega^{n})
                \frac{\partial L}{\partial x^{n}} = \frac{\partial L}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial x^{n}}   (handed down to the level below)
            Update the parameters:
                \omega^{n} \leftarrow \omega^{n} - \eta \frac{\partial L}{\partial \omega^{n}}
                b^{n} \leftarrow b^{n} - \eta \frac{\partial L}{\partial b^{n}}
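The loop above can be sketched in NumPy. The network shape, sigmoid activation, squared-error loss, learning rate, and toy data below are illustrative assumptions, not from the original note; the backward loop mirrors the two per-level derivatives and the two update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sizes = [4, 8, 1]                      # x^1 -> x^2 -> x^3 (output)
W = [rng.normal(0.0, 0.5, (sizes[i], sizes[i + 1])) for i in range(2)]
b = [np.zeros(sizes[i + 1]) for i in range(2)]
eta = 0.5                              # learning rate (eta)

X = rng.normal(size=(32, 4))                  # one toy batch
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary targets

losses = []
for epoch in range(500):               # per epoch (a single batch here)
    # forward pass: cache every level's input x^n
    xs = [X]
    for n in range(2):
        xs.append(sigmoid(xs[-1] @ W[n] + b[n]))
    losses.append(0.5 * np.mean((xs[-1] - y) ** 2))

    # backward pass: per level, from back to front
    dL_dx = (xs[-1] - y) / len(X)      # dL/dx^{N+1} for the mean squared error
    for n in reversed(range(2)):
        dz = dL_dx * xs[n + 1] * (1.0 - xs[n + 1])  # through the sigmoid
        dL_dW = xs[n].T @ dz           # dL/dw^n: used to update this level
        dL_db = dz.sum(axis=0)         # dL/db^n
        dL_dx = dz @ W[n].T            # dL/dx^n: passed to the level below
        W[n] -= eta * dL_dW            # w^n <- w^n - eta dL/dw^n
        b[n] -= eta * dL_db            # b^n <- b^n - eta dL/db^n
```

Note that dL/dx^n must be computed before W[n] is overwritten, since it uses the same upstream gradient; this is why the outline computes both derivatives first and updates afterwards.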
Args:
- \omega: omega
- \eta: eta (the learning rate)
Note:
- The \frac{\partial L}{\partial \omega^{n}} and \frac{\partial L}{\partial x^{n}} used in BP both come from differentiating the feed-forward computation L = f(\omega^{n} x^{n}).
  Chain rule:
  \frac{\partial L}{\partial \omega^{n}} = \frac{\partial L}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial \omega^{n}} = \frac{\partial L}{\partial x^{n+2}} \frac{\partial x^{n+2}}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial \omega^{n}}
- Because every extra layer multiplies in one more factor, the BP mechanism can make \frac{\partial L}{\partial \omega^{i}} decay exponentially as we move layer by layer toward the input (the vanishing-gradient problem).
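This decay can be checked numerically: stack several sigmoid layers, backpropagate a unit upstream gradient, and record |∂L/∂ω^n| at each level. The depth, width, and weight scale below are arbitrary assumptions chosen so each chain-rule factor is well below 1.

```python
import numpy as np

rng = np.random.default_rng(1)
depth, width = 8, 16
Ws = [rng.normal(0.0, 0.3, (width, width)) for _ in range(depth)]

# forward pass through `depth` sigmoid layers, caching activations x^n
xs = [rng.normal(size=(1, width))]
for Wn in Ws:
    xs.append(1.0 / (1.0 + np.exp(-(xs[-1] @ Wn))))

# backward pass with a unit upstream gradient; record |dL/dw^n| per level
dL_dx = np.ones((1, width))
grad_norms = []
for n in reversed(range(depth)):
    dz = dL_dx * xs[n + 1] * (1.0 - xs[n + 1])   # sigmoid' = s(1 - s) <= 0.25
    grad_norms.append(np.abs(xs[n].T @ dz).mean())
    dL_dx = dz @ Ws[n].T                         # dL/dx^n for the level below

# grad_norms[0] is the level nearest the output, grad_norms[-1] the one
# nearest the input: the magnitudes shrink roughly geometrically per level.
```

Whether gradients vanish or explode depends on the typical size of each factor \frac{\partial x^{n+1}}{\partial x^{n}}; with sigmoid activations the factor includes s(1-s) \le 0.25, which is why this setup shrinks.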