Brief derivation (every sum below runs over the output nodes, k \in K):

\frac{\partial E}{\partial w_{ij}}=\frac{\partial}{\partial w_{ij}}\sum\frac{(O_k-t_k)^2}{2}

\frac{\partial E}{\partial w_{ij}}=\sum (O_k-t_k)\frac{\partial}{\partial w_{ij}}\sigma(x_k)

Using \sigma'(x)=\sigma(x)(1-\sigma(x)) and O_k=\sigma(x_k):

\frac{\partial E}{\partial w_{ij}}=\sum (O_k-t_k)\sigma(x_k)(1-\sigma(x_k))\frac{\partial x_k}{\partial w_{ij}}

\frac{\partial E}{\partial w_{ij}}=\sum (O_k-t_k)O_k(1-O_k)\frac{\partial x_k}{\partial O_j}\frac{\partial O_j}{\partial w_{ij}}

Since x_k=\sum_j W_{jk}O_j, we have \partial x_k/\partial O_j=W_{jk}:

\frac{\partial E}{\partial w_{ij}}=\sum (O_k-t_k)O_k(1-O_k)W_{jk}\frac{\partial O_j}{\partial w_{ij}}

The factor \partial O_j/\partial w_{ij}=O_j(1-O_j)\,\partial x_j/\partial w_{ij} does not depend on k and can be pulled out of the sum:

\frac{\partial E}{\partial w_{ij}}= O_j(1-O_j)\frac{\partial x_j}{\partial w_{ij}}\sum(O_k-t_k)O_k(1-O_k)W_{jk}

Finally, since x_j=\sum_i w_{ij}O_i, we have \partial x_j/\partial w_{ij}=O_i:

\frac{\partial E}{\partial w_{ij}}= O_j(1-O_j)O_i\sum(O_k-t_k)O_k(1-O_k)W_{jk}

Letting (O_k-t_k)O_k(1-O_k)=\sigma_k, the expression above simplifies to

\frac{\partial E}{\partial w_{ij}}= O_j(1-O_j)O_i\sum \sigma_kW_{jk}
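The final expression can be verified numerically. The sketch below (a minimal check I added; the layer sizes and random values are arbitrary assumptions, and the names O_i, W_jk, sigma_k follow the derivation) compares the analytic hidden-layer gradient O_i O_j(1-O_j) \sum \sigma_k W_{jk} against a central-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
O_i = rng.normal(size=4)           # input activations
W_ij = rng.normal(size=(4, 3))     # input -> hidden weights
W_jk = rng.normal(size=(3, 2))     # hidden -> output weights
t = rng.normal(size=2)             # targets t_k

def forward(W1):
    O_j = sigmoid(O_i @ W1)        # hidden activations
    O_k = sigmoid(O_j @ W_jk)      # output activations
    return O_j, O_k

def loss(W1):
    _, O_k = forward(W1)
    return 0.5 * np.sum((O_k - t) ** 2)

# Analytic gradient from the derivation:
# dE/dw_ij = O_i * O_j(1-O_j) * sum_k sigma_k W_jk
O_j, O_k = forward(W_ij)
sigma_k = (O_k - t) * O_k * (1 - O_k)          # output-layer error term
delta_j = O_j * (1 - O_j) * (W_jk @ sigma_k)   # propagated back to hidden layer
grad_analytic = np.outer(O_i, delta_j)

# Central-difference numerical gradient for comparison
eps = 1e-6
grad_num = np.zeros_like(W_ij)
for i in range(W_ij.shape[0]):
    for j in range(W_ij.shape[1]):
        Wp = W_ij.copy(); Wp[i, j] += eps
        Wm = W_ij.copy(); Wm[i, j] -= eps
        grad_num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_num)))  # should be close to 0
```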
3. 2D function optimization (backpropagation) implementation

import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import pyplot as plt
import tensorflow as tf

def DBG(x):
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
X, Y = np.meshgrid(x, y)
Z = DBG([X, Y])

fig = plt.figure('DBG')
# fig.gca(projection='3d') was removed in Matplotlib 3.6; use add_subplot instead
ax = fig.add_subplot(projection='3d')
ax.plot_surface(X, Y, Z)
ax.view_init(60, -30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

# Set the starting point
x = tf.constant([3., 0.])
for step in range(150):
    with tf.GradientTape() as tape:
        tape.watch([x])
        y = DBG(x)
    # Index [0] to avoid "can't multiply sequence by non-int of type 'float'"
    grads = tape.gradient(y, [x])[0]
    x -= 0.01 * grads
    if step % 20 == 0:
        print('step {}: x = {}, f(x) = {}'.format(step, x.numpy(), y.numpy()))
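As a variant of the loop above (my own sketch, not part of the original), wrapping the starting point in a tf.Variable lets GradientTape track it automatically, so neither tape.watch nor the [0] indexing workaround is needed:

```python
import tensorflow as tf

def DBG(x):
    # Himmelblau-style function: all local minima have f(x) = 0
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

# A tf.Variable is watched by the tape automatically
x = tf.Variable([3.0, 0.0])
for step in range(150):
    with tf.GradientTape() as tape:
        y = DBG(x)
    grads = tape.gradient(y, x)   # tensor of shape (2,), no [0] needed
    x.assign_sub(0.01 * grads)    # in-place gradient-descent update

print(x.numpy(), DBG(x).numpy())
```

The update converges toward one of the function's minima, where f(x) is close to 0.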