Preface:
I have been training deep learning models for a while, but something always felt slightly off, so I recently decided to derive a neural network's forward and backward passes from scratch; I may add convolutions later. This post gives a relatively accessible walkthrough of the BP (backpropagation) algorithm. It is largely based on https://blog.csdn.net/cc514981717/article/details/73832119, with some of my own understanding added; if there is any infringement, please let me know and I will remove it.
Figure 1 below shows a simple neural network:
Here i1 and i2 form the input layer, h1 and h2 the hidden layer, and o1 and o2 the output layer; b1 and b2 are biases, and sigmoid is the activation function.
Forward propagation
i1->h1:
net_{h1} = i_1 \times w_1 + i_2 \times w_2 + b_1 \times 1
h1->sigmoid:
out_{h1} = sigmoid(net_{h1}) = \frac{1}{1 + e^{-net_{h1}}}
net_{h2} and out_{h2} are computed analogously.
out_h1->o1:
net_{o1} = out_{h1} \times w_5 + out_{h2} \times w_6 + b_2 \times 1
o1->sigmoid:
out_{o1} = sigmoid(net_{o1}) = \frac{1}{1 + e^{-net_{o1}}}
net_{o2} and out_{o2} are computed analogously.
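As a sketch, the forward pass above can be written directly in NumPy. The inputs, weights, and biases below are assumed placeholder values for illustration, not values taken from Figure 1:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# assumed example values (placeholders, not from the figure)
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# hidden layer: weighted sum, then sigmoid
net_h1 = i1 * w1 + i2 * w2 + b1 * 1
net_h2 = i1 * w3 + i2 * w4 + b1 * 1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# output layer: weighted sum of hidden activations, then sigmoid
net_o1 = out_h1 * w5 + out_h2 * w6 + b2 * 1
net_o2 = out_h1 * w7 + out_h2 * w8 + b2 * 1
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)
```

With these placeholder numbers, net_{h1} = 0.05 × 0.15 + 0.10 × 0.20 + 0.35 = 0.3775, and every activation lands in (0, 1) because of the sigmoid.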
Backward propagation
Total error:
E_{total} = \sum\frac{1}{2}(target - output)^2
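The total error is just the sum of squared differences, halved so the factor cancels when differentiating. A minimal sketch, where the target and output values are hypothetical:

```python
import numpy as np

def total_error(target, output):
    # E_total = sum over output units of 0.5 * (target - output)^2
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    return np.sum(0.5 * (target - output) ** 2)

# hypothetical targets and outputs for the two output units
E = total_error([0.01, 0.99], [0.75, 0.77])
```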
Hidden-to-output weight update (taking w_5 as an example), we need:
\frac{\partial E_{total}}{\partial w_5}
By the chain rule:
\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5}
where
E_{total} = \frac{1}{2}(gt_{o1}-out_{o1})^2 + \frac{1}{2}(gt_{o2} - out_{o2})^2
so
\frac{\partial E_{total}}{\partial out_{o1}} = out_{o1} - gt_{o1}
\frac{\partial out_{o1}}{\partial net_{o1}} = \frac{e^{-net_{o1}}}{(1+e^{-net_{o1}})^2} = out_{o1} \times (1-out_{o1})
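The identity sigmoid'(z) = sigmoid(z) × (1 − sigmoid(z)) can be checked numerically with a central finite difference; the test point z below is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 0.7                              # arbitrary test point
out = sigmoid(z)
analytic = out * (1 - out)           # out * (1 - out) form of the derivative

# central finite difference approximation of the derivative
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
```

The two values agree to high precision, which is why the implementation later in this post can compute the derivative from the activation alone.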
\frac{\partial net_{o1}}{\partial w_5} = out_{h1}
Hence
\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} = (out_{o1} - gt_{o1}) \times out_{o1}(1-out_{o1}) \times out_{h1}
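This closed-form gradient can be verified against a finite-difference gradient of E_total with respect to w_5. The hidden activations, weights, and target below are assumed placeholder values:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# assumed example values (placeholders, not from the figure)
out_h1, out_h2 = 0.59, 0.60
w5, w6, b2 = 0.40, 0.45, 0.60
gt_o1 = 0.01

def out_o1_of(w5_val):
    return sigmoid(out_h1 * w5_val + out_h2 * w6 + b2 * 1)

def E_of(w5_val):
    # only the o1 term of E_total depends on w5
    return 0.5 * (gt_o1 - out_o1_of(w5_val)) ** 2

# analytic gradient from the chain rule above
out_o1 = out_o1_of(w5)
analytic = (out_o1 - gt_o1) * out_o1 * (1 - out_o1) * out_h1

# central finite difference on E_total
eps = 1e-6
numeric = (E_of(w5 + eps) - E_of(w5 - eps)) / (2 * eps)
```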
where gt_{o1} denotes the ground truth of o1.
Updating the weights
For w_5:
w_5^+ = w_5 - \eta \times \frac{\partial E_{total}}{\partial w_5}
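In code, the update rule is a single subtraction; the learning rate and gradient value below are hypothetical placeholders:

```python
eta = 0.5                      # hypothetical learning rate
w5 = 0.40                      # current weight (placeholder value)
grad_w5 = 0.08                 # placeholder for dE_total/dw5
w5_new = w5 - eta * grad_w5    # w5^+ = w5 - eta * dE_total/dw5
```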
Python code
Below is a simple neural network written in Python:
import numpy as np

# sigmoid activation; when deriv=True, x is assumed to already be
# the sigmoid output, so the derivative is simply x * (1 - x)
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# training inputs (3 features per sample)
x = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1],
              [0, 0, 1]])

# training targets
y = np.array([[0],
              [1],
              [1],
              [0],
              [0]])

np.random.seed(22)
w0 = 2 * np.random.random((3, 4)) - 1  # input -> hidden weights, in [-1, 1)
w1 = 2 * np.random.random((4, 1)) - 1  # hidden -> output weights, in [-1, 1)

for j in range(60000):
    # forward propagation
    l0 = x
    l1 = sigmoid(np.dot(l0, w0))
    l2 = sigmoid(np.dot(l1, w1))

    # backward propagation
    l2_error = y - l2
    if (j % 5000) == 0:
        print('Error: ' + str(np.mean(np.abs(l2_error))))
    l2_delta = l2_error * sigmoid(l2, deriv=True)
    l1_error = l2_delta.dot(w1.T)
    l1_delta = l1_error * sigmoid(l1, deriv=True)

    # weight updates (learning rate implicitly 1; += because
    # l2_error = y - l2 already carries the negative gradient sign)
    w1 += l1.T.dot(l2_delta)
    w0 += l0.T.dot(l1_delta)
Finally
This is my first public blog post on CSDN; comments and corrections are very welcome.
References:
1. https://blog.csdn.net/cc514981717/article/details/73832119
2. Tang Yudi's introductory deep learning course