This article is for study purposes only.
This chapter assumes prior familiarity with basic machine learning.
CV Deep Learning Fundamentals Ch02: BP Neural Networks
1. Review of Machine Learning Algorithms
1.1 Gradient Descent Review
Gradient descent (GD) is an iterative algorithm commonly used to find the minimum of an unconstrained convex function. Because a convex function has only one extremum, the local minimum it finds is also the function's global minimum.
$$J(\theta)=\frac{1}{2m}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}$$
$$\theta^{*}=\underset{\theta}{\arg\min}\,J(\theta)$$
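To make the iteration concrete, here is a minimal gradient-descent sketch in Python; the quadratic objective and the step size alpha=0.1 are assumptions chosen for illustration:

```python
def gradient_descent(grad, theta0, alpha=0.1, n_iters=100):
    """Repeatedly step against the gradient: theta <- theta - alpha * grad(theta)."""
    theta = theta0
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)
    return theta

# Example: J(theta) = (theta - 3)^2 is convex, so GD converges to its
# unique minimizer theta* = 3 regardless of the starting point.
print(gradient_descent(grad=lambda t: 2.0 * (t - 3.0), theta0=0.0))  # ~3.0
```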
1.2 Linear Regression Review
Linear regression is the most basic machine learning algorithm, commonly used to fit models with linear relationships:
$$\begin{cases} y^{(i)}=\theta^{T}x^{(i)}+\varepsilon^{(i)} \\ J(\theta)=\frac{1}{2}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2} \\ \theta'=\theta-\alpha\frac{\partial J(\theta)}{\partial\theta} \end{cases}$$
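A minimal sketch of these three equations in code; the synthetic data and the learning rate are assumptions for illustration, not part of the original text:

```python
import numpy as np

# Synthetic data from y = 1 + 2*x plus noise (assumed for illustration).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(0, 1, 100)])  # bias term + one feature
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100)       # epsilon ~ N(0, 0.1^2)

theta = np.zeros(2)
alpha = 0.5
for _ in range(2000):
    grad = X.T @ (X @ theta - y) / len(y)  # dJ/dtheta, averaged over the m samples
    theta = theta - alpha * grad           # theta' = theta - alpha * dJ/dtheta
print(theta)  # close to [1.0, 2.0]
```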
1.3 Logistic Regression Review
A classification algorithm built by extending linear regression:
$$\begin{cases} p=h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}},\quad g(z)=\frac{1}{1+e^{-z}} \\ loss=-\ell(\theta)=-\sum\limits_{i=1}^{m}\left(y^{(i)}\ln h_{\theta}(x^{(i)})+(1-y^{(i)})\ln(1-h_{\theta}(x^{(i)}))\right) \\ \theta_{j}'=\theta_{j}+\alpha\sum\limits_{i=1}^{m}\left(y^{(i)}-h_{\theta}(x^{(i)})\right)x_{j}^{(i)} \\ g'(z)=g(z)(1-g(z)) \end{cases}$$
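The same pattern for logistic regression; again the data are synthetic and assumed only for illustration:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid, with g'(z) = g(z)(1 - g(z))

# Synthetic, noisy binary labels that depend on two features (assumed).
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(float)

theta = np.zeros(2)
alpha = 0.1
for _ in range(1000):
    h = g(X @ theta)                                # p = h_theta(x) = g(theta^T x)
    theta = theta + alpha * X.T @ (y - h) / len(y)  # descent step on loss = -l(theta)
loss = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
print(theta, loss)
```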
2. Neural Networks
2.1 The BP Algorithm for Neural Networks
BP is an algorithm for solving for a neural network's weights W and biases B. It consists of forward propagation (FP), which propagates the signal and computes the loss, and backpropagation (BP), which sends the error back; each layer's weights are adjusted according to the error, and the process iterates.
The BP algorithm is also called the δ algorithm. Take a three-layer perceptron network as an example (assume for now that the hidden layer and the output layer use the same type of activation function):
Error at the output layer:
$$E=\frac{1}{2}(d-O)^{2}=\frac{1}{2}\sum\limits_{k=1}^{\ell}(d_{k}-O_{k})^{2}$$
Error expressed at the hidden layer:
$$E=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f(net_{k})\right)^{2}=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f\left(\sum\limits_{j=1}^{m}w_{jk}y_{j}\right)\right)^{2}$$
Error expressed at the input layer:
$$E=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f\left(\sum\limits_{j=0}^{m}w_{jk}f(net_{j})\right)\right)^{2}=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f\left(\sum\limits_{j=0}^{m}w_{jk}f\left(\sum\limits_{i=1}^{m}v_{ij}x_{i}\right)\right)\right)^{2}$$
2.2 SGD for Neural Networks
With the error E in hand, stochastic gradient descent can be used to solve for w and v so that the error keeps decreasing, i.e., to find the w and v that minimize E.
A worked BP example follows. The network has two inputs $x=(5,10)$, three hidden units, and two outputs with targets $(0.01,\ 0.99)$; every unit uses a sigmoid activation, and the learning rate is $\eta=0.5$ (the same setup as the code in section 3).
2.2.1 The FP Pass
$$b=(0.35,\ 0.65)$$
$$w=\begin{pmatrix} 0.1 & 0.15 & 0.2 & 0.25 & 0.3 & 0.35 \\ 0.4 & 0.45 & 0.5 & 0.55 & 0.6 & 0.65 \end{pmatrix}$$
With $net_{h1}=w_{1}x_{1}+w_{2}x_{2}+b_{1}\cdot 1=0.1\cdot 5+0.15\cdot 10+0.35=2.35$:
$$out_{h1}=\frac{1}{1+e^{-net_{h1}}}=\frac{1}{1+e^{-2.35}}=0.912934\Rightarrow out_{h2}=0.979164,\ out_{h3}=0.995275$$
$$net_{o1}=w_{7}out_{h1}+w_{9}out_{h2}+w_{11}out_{h3}+b_{2}\cdot 1=0.4\cdot 0.912934+0.5\cdot 0.979164+0.6\cdot 0.995275+0.65\cdot 1=2.1019206$$
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}}=\frac{1}{1+e^{-2.1019206}}=0.891090\Rightarrow out_{o2}=0.904330$$
$$E_{o1}=\frac{1}{2}(\text{target}_{o1}-out_{o1})^{2}$$
$$E_{total}=E_{o1}+E_{o2}=\frac{1}{2}(0.01-0.891090)^{2}+\frac{1}{2}(0.99-0.904330)^{2}=0.391829$$
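These forward-pass values are easy to verify in a few lines of Python (numbers copied from the example above; the variable names are just for this check):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, b = [5, 10], [0.35, 0.65]
w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]

h1 = sigmoid(w[0]*x[0] + w[1]*x[1] + b[0])         # out_h1 = 0.912934
h2 = sigmoid(w[2]*x[0] + w[3]*x[1] + b[0])         # out_h2 = 0.979164
h3 = sigmoid(w[4]*x[0] + w[5]*x[1] + b[0])         # out_h3 = 0.995275
o1 = sigmoid(w[6]*h1 + w[8]*h2 + w[10]*h3 + b[1])  # out_o1 = 0.891090
o2 = sigmoid(w[7]*h1 + w[9]*h2 + w[11]*h3 + b[1])  # out_o2 = 0.904330
print(0.5*(0.01 - o1)**2 + 0.5*(0.99 - o2)**2)     # E_total = 0.391829
```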
2.2.2 The BP Pass
Taking $w_7$ as an example:
$$\frac{\partial E_{total}}{\partial w_{7}}=\frac{\partial E_{total}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial w_{7}}$$
$$\frac{\partial E_{total}}{\partial out_{o1}}=2\cdot\frac{1}{2}(\text{target}_{o1}-out_{o1})^{2-1}\cdot(-1)=-(0.01-0.891090)=0.88109$$
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}},\quad\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})=0.891090(1-0.891090)=0.097049$$
$$net_{o1}=w_{7}out_{h1}+w_{9}out_{h2}+w_{11}out_{h3}+b_{2}\cdot 1,\quad\frac{\partial net_{o1}}{\partial w_{7}}=out_{h1}=0.912934$$
$$\frac{\partial E_{total}}{\partial w_{7}}=-(\text{target}_{o1}-out_{o1})\cdot out_{o1}(1-out_{o1})\cdot out_{h1}=0.88109\cdot 0.097049\cdot 0.912934=0.078064$$
$$w_{7}^{+}=w_{7}+\Delta w_{7}=w_{7}-\eta\frac{\partial E_{total}}{\partial w_{7}}=0.4-0.5\cdot 0.078064=0.360968$$
$$\Rightarrow w_{8}^{+}=0.453383,\ w_{9}^{+}=0.458137,\ w_{10}^{+}=0.553629,\ w_{11}^{+}=0.557448,\ w_{12}^{+}=0.653688$$
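The same check works for the $w_7$ update; this sketch is self-contained and recomputes the forward values it needs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
h1 = sigmoid(0.1*5 + 0.15*10 + 0.35)           # out_h1 = 0.912934
h2 = sigmoid(0.2*5 + 0.25*10 + 0.35)           # out_h2 = 0.979164
h3 = sigmoid(0.3*5 + 0.35*10 + 0.35)           # out_h3 = 0.995275
o1 = sigmoid(0.4*h1 + 0.5*h2 + 0.6*h3 + 0.65)  # out_o1 = 0.891090
grad_w7 = -(0.01 - o1) * o1 * (1 - o1) * h1    # the three chain-rule factors = 0.078064
print(0.4 - eta * grad_w7)                     # w7+ = 0.360968
```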
Taking $w_1$ as an example:
$$\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_{1}}=\left(\frac{\partial E_{o1}}{\partial out_{h1}}+\frac{\partial E_{o2}}{\partial out_{h1}}\right)\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_{1}}$$
$$\frac{\partial E_{o1}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial out_{h1}}=-(\text{target}_{o1}-out_{o1})\cdot out_{o1}(1-out_{o1})\cdot w_{7}=-(0.01-0.891090)\cdot 0.891090\cdot(1-0.891090)\cdot 0.360968=0.030866$$
(Note that this step plugs in the already-updated value $w_{7}^{+}=0.360968$ for $w_{7}$.)
$$\frac{\partial E_{o2}}{\partial out_{h1}}=\frac{\partial E_{o2}}{\partial out_{o2}}\cdot\frac{\partial out_{o2}}{\partial net_{o2}}\cdot\frac{\partial net_{o2}}{\partial out_{h1}}=-(\text{target}_{o2}-out_{o2})\cdot out_{o2}(1-out_{o2})\cdot w_{8}$$
With $\frac{\partial E_{o2}}{\partial out_{h1}}=-0.003361$ (plugging in $w_{8}^{+}=0.453383$), $\frac{\partial out_{h1}}{\partial net_{h1}}=out_{h1}(1-out_{h1})=0.079485$, and $\frac{\partial net_{h1}}{\partial w_{1}}=x_{1}=5$:
$$\frac{\partial E_{total}}{\partial w_{1}}=(0.030866-0.003361)\cdot 0.079485\cdot 5=0.010932$$
$$w_{1}^{+}=w_{1}+\Delta w_{1}=w_{1}-\eta\frac{\partial E_{total}}{\partial w_{1}}=0.1-0.5\cdot 0.010932=0.094534$$
The weights after this first iteration, and the biases (which this example never updates):
$$w^{1}=\begin{pmatrix} 0.094534 & 0.139069 & 0.198211 & 0.246422 & 0.299497 & 0.348993 \\ 0.360968 & 0.453383 & 0.458137 & 0.553629 & 0.557448 & 0.653688 \end{pmatrix}$$
$$b^{0}=\begin{pmatrix} 0.35 \\ 0.65 \end{pmatrix}$$
2.2.3 Results over Multiple Iterations
Output after the 10th iteration:
$$O=(0.662866,\ 0.908195)$$
Output after the 100th iteration:
$$O=(0.073889,\ 0.945864)$$
Output after the 1000th iteration:
$$O=(0.022971,\ 0.977675)$$
$$w^{1000}=\begin{pmatrix} 0.214925 & 0.379850 & 0.262855 & 0.375711 & 0.323201 & 0.396402 \\ -1.48972 & 0.941715 & -1.50182 & 1.049019 & -1.42756 & 1.151881 \end{pmatrix}$$
3. Implementation
The BP process:
import numpy as np

_w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
_b = [0.35, 0.65]
_x = [5, 10]
_y = [0.01, 0.99]
lr = 0.5

# 1-based accessors so the code reads like the formulas above (w(1) is w1, etc.).
def w(index):
    return _w[index - 1]

def x(index):
    return _x[index - 1]

def b(index):
    return _b[index - 1]

def y(index):
    return _y[index - 1]

def set_w(index, gd):
    # Gradient-descent update for a single weight: w <- w - lr * dE/dw
    _w[index - 1] = _w[index - 1] - lr * gd

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def training():
    # 1. Forward pass: compute the loss.
    h1 = sigmoid(w(1)*x(1) + w(2)*x(2) + b(1))
    h2 = sigmoid(w(3)*x(1) + w(4)*x(2) + b(1))
    h3 = sigmoid(w(5)*x(1) + w(6)*x(2) + b(1))
    o1 = sigmoid(w(7)*h1 + w(9)*h2 + w(11)*h3 + b(2))
    o2 = sigmoid(w(8)*h1 + w(10)*h2 + w(12)*h3 + b(2))
    # MSE loss
    loss = 0.5 * (y(1) - o1)**2 + 0.5 * (y(2) - o2) ** 2
    # Cross-entropy loss (y can only take the values 0 or 1).
    # NOTE: based on the definition of cross-entropy; feel free to swap it in and rerun.
    # loss = -(y(1)*np.log(o1) + (1-y(1))*np.log(1 - o1)) - (y(2)*np.log(o2) + (1-y(2))*np.log(1 - o2))
    # 2. Backward pass: compute the gradients from the loss, then update the parameters.
    # NOTE: t1 and t2, like the gd values passed to set_w, are derivatives/gradients of the loss.
    t1 = -1.0 * (y(1) - o1) * o1 * (1-o1)   # dloss/dnet_o1
    t2 = -1.0 * (y(2) - o2) * o2 * (1-o2)   # dloss/dnet_o2
    # The hidden-layer weights are updated first, so they use the old w7..w12.
    set_w(1, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(1))
    set_w(2, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(2))
    set_w(3, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(1))
    set_w(4, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(2))
    set_w(5, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(1))
    set_w(6, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(2))
    set_w(7, t1 * h1)
    set_w(8, t2 * h1)
    set_w(9, t1 * h2)
    set_w(10, t2 * h2)
    set_w(11, t1 * h3)
    set_w(12, t2 * h3)
    # Moving the set_w(1..6) calls here instead would update the hidden-layer weights
    # with the already-updated w7..w12, as the worked example in section 2.2.2 does:
    #set_w(1, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(1))
    #set_w(2, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(2))
    #set_w(3, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(1))
    #set_w(4, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(2))
    #set_w(5, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(1))
    #set_w(6, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(2))
    return loss

def training2():
    # 1. Forward pass: compute the loss and, at the same time, each local gradient.
    h1 = sigmoid(w(1)*x(1) + w(2)*x(2) + b(1))
    h1_gd_w1 = h1 * (1 - h1) * x(1)
    h1_gd_w2 = h1 * (1 - h1) * x(2)
    h1_gd_x1 = h1 * (1 - h1) * w(1)
    h1_gd_x2 = h1 * (1 - h1) * w(2)
    h1_gd_b = h1 * (1 - h1)
    h2 = sigmoid(w(3)*x(1) + w(4)*x(2) + b(1))
    h2_gd_w3 = h2 * (1 - h2) * x(1)
    h2_gd_w4 = h2 * (1 - h2) * x(2)
    h2_gd_x1 = h2 * (1 - h2) * w(3)
    h2_gd_x2 = h2 * (1 - h2) * w(4)
    h2_gd_b = h2 * (1 - h2)
    h3 = sigmoid(w(5)*x(1) + w(6)*x(2) + b(1))
    h3_gd_w5 = h3 * (1 - h3) * x(1)
    h3_gd_w6 = h3 * (1 - h3) * x(2)
    h3_gd_x1 = h3 * (1 - h3) * w(5)
    h3_gd_x2 = h3 * (1 - h3) * w(6)
    h3_gd_b = h3 * (1 - h3)
    o1 = sigmoid(w(7)*h1 + w(9)*h2 + w(11)*h3 + b(2))
    o1_gd_w7 = o1 * (1 - o1) * h1
    o1_gd_w9 = o1 * (1 - o1) * h2
    o1_gd_w11 = o1 * (1 - o1) * h3
    o1_gd_h1 = o1 * (1 - o1) * w(7)
    o1_gd_h2 = o1 * (1 - o1) * w(9)
    o1_gd_h3 = o1 * (1 - o1) * w(11)
    o1_gd_b = o1 * (1 - o1)
    o2 = sigmoid(w(8)*h1 + w(10)*h2 + w(12)*h3 + b(2))
    o2_gd_w8 = o2 * (1 - o2) * h1
    o2_gd_w10 = o2 * (1 - o2) * h2
    o2_gd_w12 = o2 * (1 - o2) * h3
    o2_gd_h1 = o2 * (1 - o2) * w(8)
    o2_gd_h2 = o2 * (1 - o2) * w(10)
    o2_gd_h3 = o2 * (1 - o2) * w(12)
    o2_gd_b = o2 * (1 - o2)
    # MSE loss
    loss = 0.5 * (y(1) - o1)**2 + 0.5 * (y(2) - o2) ** 2
    loss_gd_o1 = -1.0 * (y(1) - o1)
    loss_gd_o2 = -1.0 * (y(2) - o2)
    # Cross-entropy loss (y can only take the values 0 or 1).
    # NOTE: based on the definition of cross-entropy; feel free to swap it in and rerun.
    # loss = -(y(1)*np.log(o1) + (1-y(1))*np.log(1 - o1)) - (y(2)*np.log(o2) + (1-y(2))*np.log(1 - o2))
    # 2. Backward pass: chain the local gradients into dloss/dw, then update the parameters.
    set_w(1, (loss_gd_o1 * o1_gd_h1 + loss_gd_o2 * o2_gd_h1) * h1_gd_w1)
    set_w(2, (loss_gd_o1 * o1_gd_h1 + loss_gd_o2 * o2_gd_h1) * h1_gd_w2)
    set_w(3, (loss_gd_o1 * o1_gd_h2 + loss_gd_o2 * o2_gd_h2) * h2_gd_w3)
    set_w(4, (loss_gd_o1 * o1_gd_h2 + loss_gd_o2 * o2_gd_h2) * h2_gd_w4)
    set_w(5, (loss_gd_o1 * o1_gd_h3 + loss_gd_o2 * o2_gd_h3) * h3_gd_w5)
    set_w(6, (loss_gd_o1 * o1_gd_h3 + loss_gd_o2 * o2_gd_h3) * h3_gd_w6)
    set_w(7, loss_gd_o1 * o1_gd_w7)
    set_w(8, loss_gd_o2 * o2_gd_w8)
    set_w(9, loss_gd_o1 * o1_gd_w9)
    set_w(10, loss_gd_o2 * o2_gd_w10)
    set_w(11, loss_gd_o1 * o1_gd_w11)
    set_w(12, loss_gd_o2 * o2_gd_w12)
    return loss

if __name__ == '__main__':
    for i in range(1000):
        _loss = training2()
    print(_w)
    print(_loss)
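For comparison, here is a vectorized sketch of the same network using NumPy matrices. The reshaped weight matrices and the loop below are my own rewrite under the same setup (same initial weights, MSE loss, η = 0.5), not part of the original code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.1, 0.2, 0.3],
               [0.15, 0.25, 0.35]])          # input -> hidden (w1..w6)
W2 = np.array([[0.4, 0.45],
               [0.5, 0.55],
               [0.6, 0.65]])                 # hidden -> output (w7..w12)
b1, b2 = 0.35, 0.65
x = np.array([5.0, 10.0])
y = np.array([0.01, 0.99])
eta = 0.5

for _ in range(1000):
    h = sigmoid(x @ W1 + b1)                 # forward: hidden activations
    o = sigmoid(h @ W2 + b2)                 # forward: outputs
    delta_o = (o - y) * o * (1 - o)          # dE/dnet_o for MSE + sigmoid
    delta_h = (W2 @ delta_o) * h * (1 - h)   # error pushed back to the hidden layer
    W2 -= eta * np.outer(h, delta_o)         # dE/dW2
    W1 -= eta * np.outer(x, delta_h)         # dE/dW1
print(o)  # approaches the targets, cf. the iteration results in section 2.2.3
```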