[Deep Learning] Part 3 CV Deep Learning Fundamentals, Ch02: BP Neural Networks — [DeepBlue Study Notes]


This article is intended for learning purposes only.
This chapter assumes prior knowledge of basic machine learning.


1. Review of Machine Learning Algorithms

1.1 Gradient Descent Review

Gradient Descent (GD) is an iterative algorithm commonly used to find the minimum of an unconstrained convex function. Since a convex function has only one extremum, the local minimum the algorithm finds is also the function's global minimum.
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}$$
$$\theta^{*}=\underset{\theta}{\arg\min}\;J(\theta)$$
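
As a quick illustration, here is a minimal sketch of gradient descent on a one-dimensional convex function (the function, step size, and iteration count are illustrative choices, not from the original notes):

# Minimal gradient-descent sketch on the convex function J(theta) = (theta - 3)^2.
def gradient_descent(lr=0.1, n_iters=100):
    theta = 0.0                      # initial guess
    for _ in range(n_iters):
        grad = 2.0 * (theta - 3.0)   # dJ/dtheta
        theta = theta - lr * grad    # theta' = theta - lr * dJ/dtheta
    return theta

print(gradient_descent())            # converges to the unique minimum theta = 3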

1.2 Linear Regression Review

Linear regression is one of the most basic machine learning algorithms and is commonly used to fit models of linear relationships:
$$\left\{\begin{matrix} y^{(i)}=\theta^{T}x^{(i)}+\varepsilon^{(i)} \\ J(\theta)=\frac{1}{2}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2} \\ \theta'=\theta-\alpha\dfrac{\partial J(\theta)}{\partial\theta} \end{matrix}\right.$$
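
A minimal NumPy sketch of these three pieces (model, squared-error loss, gradient update) on synthetic data; the data, learning rate, and iteration count are illustrative assumptions, not from the original notes:

import numpy as np

# Illustrative data: y is roughly 1 + 2*x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 100)
X = np.c_[np.ones(100), x]               # prepend a bias column
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 100)

theta = np.zeros(2)
alpha = 0.01
for _ in range(5000):
    err = X @ theta - y                  # h_theta(x^(i)) - y^(i)
    theta -= alpha * X.T @ err / len(y)  # theta' = theta - alpha * dJ/dtheta (averaged)
print(theta)                             # close to [1, 2]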

1.3 Logistic Regression Review

A classification algorithm built as an extension of linear regression:
$$\left\{\begin{matrix} p=h_{\theta}(x)=g(\theta^{T}x)=\dfrac{1}{1+e^{-\theta^{T}x}},\quad g(z)=\dfrac{1}{1+e^{-z}} \\ loss=-\ell(\theta)=-\sum\limits_{i=1}^{m}\left(y^{(i)}\ln h_{\theta}(x^{(i)})+(1-y^{(i)})\ln\big(1-h_{\theta}(x^{(i)})\big)\right) \\ \theta_{j}'=\theta_{j}-\alpha\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)} \\ g'(z)=g(z)\big(1-g(z)\big) \end{matrix}\right.$$
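
And a matching sketch for logistic regression (again with illustrative synthetic data; the gradient is averaged over the batch for stability, which only rescales the learning rate):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: the label depends noisily on the sign of x.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
X = np.c_[np.ones(200), x]                   # prepend a bias column
y = (x + rng.normal(0.0, 0.5, 200) > 0).astype(float)

theta = np.zeros(2)
alpha = 0.1
for _ in range(2000):
    p = sigmoid(X @ theta)                   # h_theta(x^(i))
    theta -= alpha * X.T @ (p - y) / len(y)  # gradient of -l(theta), averaged
print(theta)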

2. Neural Networks

2.1 Neural Networks: The BP Algorithm

BP is an algorithm for solving the weights W and biases B of a neural network. It consists of a forward pass (FP) that propagates the signal and computes the loss, and a backward pass (BP) that propagates the error back through the network; the weights of each layer are adjusted according to the error, and the process is repeated iteratively.
The BP algorithm is also known as the δ algorithm. Take a three-layer perceptron as an example (assume for now that the hidden layer and the output layer use the same type of activation function):

(Figure: a three-layer perceptron with inputs x_i, input-to-hidden weights υ_ij, hidden outputs y_j, hidden-to-output weights w_jk, and outputs O_k.)
Output-layer error:
$$E=\frac{1}{2}(d-O)^{2}=\frac{1}{2}\sum_{k=1}^{\ell}(d_{k}-O_{k})^{2}$$
Error expanded to the hidden layer:
$$E=\frac{1}{2}\sum_{k=1}^{\ell}\big(d_{k}-f(net_{k})\big)^{2}=\frac{1}{2}\sum_{k=1}^{\ell}\Big(d_{k}-f\Big(\sum_{j=1}^{m}w_{jk}\,y_{j}\Big)\Big)^{2}$$
Error expanded to the input layer:
$$E=\frac{1}{2}\sum_{k=1}^{\ell}\Big(d_{k}-f\Big(\sum_{j=0}^{m}w_{jk}\,f(net_{j})\Big)\Big)^{2}=\frac{1}{2}\sum_{k=1}^{\ell}\Big(d_{k}-f\Big(\sum_{j=0}^{m}w_{jk}\,f\Big(\sum_{i=1}^{m}\upsilon_{ij}\,x_{i}\Big)\Big)\Big)^{2}$$

2.2 Neural Networks: SGD

Now that we have the error E, we can make it smaller and smaller by solving for w and υ with (stochastic) gradient descent, i.e., finding the w and υ that minimize E.
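
In the notation of Section 2.1, the resulting gradient-descent updates take the classical δ-rule form (stated here for reference; the full chain-rule derivation for a concrete network follows in Section 2.2.2):
$$\Delta w_{jk}=-\eta\frac{\partial E}{\partial w_{jk}}=\eta\,(d_{k}-O_{k})\,f'(net_{k})\,y_{j}=\eta\,\delta_{k}^{o}\,y_{j}$$
$$\Delta \upsilon_{ij}=-\eta\frac{\partial E}{\partial \upsilon_{ij}}=\eta\Big(\sum_{k=1}^{\ell}\delta_{k}^{o}\,w_{jk}\Big)f'(net_{j})\,x_{i}=\eta\,\delta_{j}^{y}\,x_{i}$$
where $\delta_{k}^{o}=(d_{k}-O_{k})\,f'(net_{k})$ and $\delta_{j}^{y}=\big(\sum_{k}\delta_{k}^{o}\,w_{jk}\big)\,f'(net_{j})$.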

Worked example of the BP algorithm
(Figure: the 2-3-2 example network with inputs x1 = 5 and x2 = 10, hidden units h1, h2, h3, outputs o1 and o2, weights w1 through w12, biases b1 = 0.35 and b2 = 0.65, and targets 0.01 and 0.99.)

2.2.1 The FP (forward) pass

$$b=(0.35,\;0.65)$$
$$w=\left(\begin{matrix} 0.1 & 0.15 & 0.2 & 0.25 & 0.3 & 0.35 \\ 0.4 & 0.45 & 0.5 & 0.55 & 0.6 & 0.65 \end{matrix}\right)$$
$$E=\frac{1}{2}\sum_{k=1}^{\ell}\Big(d_{k}-f\Big(\sum_{j=0}^{m}w_{jk}\,f(net_{j})\Big)\Big)^{2}=\frac{1}{2}\sum_{k=1}^{\ell}\Big(d_{k}-f\Big(\sum_{j=0}^{m}w_{jk}\,f\Big(\sum_{i=1}^{m}\upsilon_{ij}\,x_{i}\Big)\Big)\Big)^{2}$$
$$net_{h1}=w_{1}x_{1}+w_{2}x_{2}+b_{1}\cdot 1=0.1\cdot 5+0.15\cdot 10+0.35=2.35$$
$$out_{h1}=\frac{1}{1+e^{-net_{h1}}}=\frac{1}{1+e^{-2.35}}=0.912934\;\Rightarrow\; out_{h2}=0.979164,\; out_{h3}=0.995275$$

$$net_{o1}=w_{7}\,out_{h1}+w_{9}\,out_{h2}+w_{11}\,out_{h3}+b_{2}\cdot 1=0.4\cdot 0.912934+0.5\cdot 0.979164+0.6\cdot 0.995275+0.65\cdot 1=2.1019206$$
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}}=\frac{1}{1+e^{-2.1019206}}=0.891090\;\Rightarrow\; out_{o2}=0.904330$$

$$E_{o1}=\frac{1}{2}\left(target_{o1}-out_{o1}\right)^{2}$$
$$E_{total}=E_{o1}+E_{o2}=\frac{1}{2}(0.01-0.891090)^{2}+\frac{1}{2}(0.99-0.904330)^{2}=0.391829$$
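
These forward-pass numbers can be double-checked with a few lines of NumPy (a verification sketch, not part of the original notes):

import numpy as np

# Forward pass of the 2-3-2 example network; reproduces the values above.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x1, x2 = 5.0, 10.0
b1, b2 = 0.35, 0.65
w = [None, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]  # 1-indexed

h1 = sigmoid(w[1]*x1 + w[2]*x2 + b1)               # 0.912934
h2 = sigmoid(w[3]*x1 + w[4]*x2 + b1)               # 0.979164
h3 = sigmoid(w[5]*x1 + w[6]*x2 + b1)               # 0.995275
o1 = sigmoid(w[7]*h1 + w[9]*h2 + w[11]*h3 + b2)    # 0.891090
o2 = sigmoid(w[8]*h1 + w[10]*h2 + w[12]*h3 + b2)   # 0.904330
E_total = 0.5*(0.01 - o1)**2 + 0.5*(0.99 - o2)**2  # 0.391829
print(h1, h2, h3, o1, o2, E_total)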

2.2.2 The BP (backward) pass

Take w7 as an example:
$$\frac{\partial E_{total}}{\partial w_{7}}=\frac{\partial E_{total}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial w_{7}}$$
$$\frac{\partial E_{total}}{\partial out_{o1}}=2\cdot\frac{1}{2}\left(target_{o1}-out_{o1}\right)^{2-1}\cdot(-1)=-(0.01-0.891090)=0.88109$$
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}},\qquad \frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})=0.891090\,(1-0.891090)=0.097049$$
$$net_{o1}=w_{7}\,out_{h1}+w_{9}\,out_{h2}+w_{11}\,out_{h3}+b_{2}\cdot 1,\qquad \frac{\partial net_{o1}}{\partial w_{7}}=out_{h1}=0.912934$$
$$\frac{\partial E_{total}}{\partial w_{7}}=-(target_{o1}-out_{o1})\cdot out_{o1}\cdot(1-out_{o1})\cdot out_{h1}=0.88109\cdot 0.097049\cdot 0.912934=0.078064$$
$$w_{7}^{+}=w_{7}+\Delta w_{7}=w_{7}-\eta\,\frac{\partial E_{total}}{\partial w_{7}}=0.4-0.5\cdot 0.078064=0.360968$$

$$\Rightarrow\; w_{8}^{+}=0.453383,\; w_{9}^{+}=0.458137,\; w_{10}^{+}=0.553629,\; w_{11}^{+}=0.557448,\; w_{12}^{+}=0.653688$$
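
Continuing the verification sketch above (reusing h1, h2, h3, o1, o2 and the 1-indexed list w from the forward-pass snippet), the output-layer updates come out to the same values:

# dE_total/dnet_o1 and dE_total/dnet_o2
eta = 0.5
t1 = -(0.01 - o1) * o1 * (1 - o1)
t2 = -(0.99 - o2) * o2 * (1 - o2)
w7_new,  w8_new  = w[7]  - eta * t1 * h1, w[8]  - eta * t2 * h1   # 0.360968, 0.453383
w9_new,  w10_new = w[9]  - eta * t1 * h2, w[10] - eta * t2 * h2   # 0.458137, 0.553629
w11_new, w12_new = w[11] - eta * t1 * h3, w[12] - eta * t2 * h3   # 0.557448, 0.653688
print(w7_new, w8_new, w9_new, w10_new, w11_new, w12_new)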

Take w1 as an example:
$$\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_{1}}=\left(\frac{\partial E_{o1}}{\partial out_{h1}}+\frac{\partial E_{o2}}{\partial out_{h1}}\right)\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_{1}}$$
$$\frac{\partial E_{o1}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial out_{h1}}=-(target_{o1}-out_{o1})\cdot out_{o1}\cdot(1-out_{o1})\cdot w_{7}=-(0.01-0.891090)\cdot 0.891090\cdot(1-0.891090)\cdot 0.360968=0.030866$$
$$\frac{\partial E_{o2}}{\partial out_{h1}}=\frac{\partial E_{o2}}{\partial out_{o2}}\cdot\frac{\partial out_{o2}}{\partial net_{o2}}\cdot\frac{\partial net_{o2}}{\partial out_{h1}}=-(target_{o2}-out_{o2})\cdot out_{o2}\cdot(1-out_{o2})\cdot w_{8}$$

$$\frac{\partial E_{total}}{\partial w_{1}}=0.010931$$

$$w_{1}^{+}=w_{1}+\Delta w_{1}=w_{1}-\eta\,\frac{\partial E_{total}}{\partial w_{1}}=0.1-0.5\cdot 0.010931=0.094534$$
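
Note that the worked numbers above plug in the already-updated w7 and w8 (0.360968 and 0.453383) when back-propagating to w1; in other words, the output-layer weights are updated before the hidden-layer gradients are computed. Continuing the verification sketch (reusing x1, h1, eta, t1, t2, w, w7_new and w8_new from the earlier snippets) reproduces the stated result:

dE_o1_dh1 = t1 * w7_new                                 # dE_o1/dout_h1 = 0.030866
dE_o2_dh1 = t2 * w8_new                                 # dE_o2/dout_h1
dE_dw1 = (dE_o1_dh1 + dE_o2_dh1) * h1 * (1 - h1) * x1   # about 0.010931
w1_new = w[1] - eta * dE_dw1                            # 0.094534
print(dE_dw1, w1_new)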

$$w^{1}=\left(\begin{matrix} 0.094534 & 0.139069 & 0.198211 & 0.246422 & 0.299497 & 0.348993 \\ 0.360968 & 0.453383 & 0.458137 & 0.553629 & 0.557448 & 0.653688 \end{matrix}\right)$$

$$b^{0}=\left(\begin{matrix} 0.35 \\ 0.65 \end{matrix}\right)$$

2.2.3 Effect of multiple iterations

After 10 iterations: $O=(0.662866,\;0.908195)$
After 100 iterations: $O=(0.073889,\;0.945864)$
After 1000 iterations: $O=(0.022971,\;0.977675)$
$$w^{1000}=\left(\begin{matrix} 0.214925 & 0.379850 & 0.262855 & 0.375711 & 0.323201 & 0.396402 \\ -1.48972 & 0.941715 & -1.50182 & 1.049019 & -1.42756 & 1.151881 \end{matrix}\right)$$

3. Algorithm Implementation

The BP procedure:

import numpy as np

_w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
_b = [0.35, 0.65]
_x = [5, 10]
_y = [0.01, 0.99]
lr = 0.5

def w(index):
    return _w[index - 1]

def x(index):
    return _x[index - 1]

def b(index):
    return _b[index - 1]

def y(index):
    return _y[index - 1]

def set_w(index, gd):
    _w[index - 1] = _w[index - 1] - lr * gd

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def training():
    # 1. Forward pass: compute the loss
    h1 = sigmoid(w(1)*x(1) + w(2)*x(2) + b(1))
    h2 = sigmoid(w(3)*x(1) + w(4)*x(2) + b(1))
    h3 = sigmoid(w(5)*x(1) + w(6)*x(2) + b(1))
    o1 = sigmoid(w(7)*h1 + w(9)*h2 + w(11)*h3 + b(2))
    o2 = sigmoid(w(8)*h1 + w(10)*h2 + w(12)*h3 + b(2))
    # MSE loss
    loss = 0.5 * (y(1) - o1)**2 + 0.5 * (y(2) - o2) ** 2
    # Cross-entropy loss (here y would only take the values 0 or 1)
    # NOTE: written from the definition of the cross-entropy loss; feel free to swap it in and rerun
    # loss = -(y(1)*np.log(o1) + (1-y(1))*np.log(1 - o1)) - (y(2)*np.log(o2) + (1-y(2))*np.log(1 - o2))
    
    # 2. Backward pass: compute gradients of the loss, then update the parameters
    # NOTE: t1, t2 and every gd passed to set_w are derivatives/gradient values of the loss
    t1 = -1.0 * (y(1) - o1) * o1 * (1-o1)   # dE_total/dnet_o1
    t2 = -1.0 * (y(2) - o2) * o2 * (1-o2)   # dE_total/dnet_o2
    
    set_w(1, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(1))
    set_w(2, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(2))
    set_w(3, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(1))
    set_w(4, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(2))
    set_w(5, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(1))
    set_w(6, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(2))

    set_w(7, t1 * h1)
    set_w(8, t2 * h1)
    set_w(9, t1 * h2)
    set_w(10, t2 * h2)
    set_w(11, t1 * h3)
    set_w(12, t2 * h3)

    # Alternative ordering (this is what the worked example in Section 2.2.2 effectively does:
    # it uses the already-updated w7/w8 when back-propagating to w1): update w7..w12 first,
    # then w1..w6 here. If you uncomment these lines, remove the earlier set_w(1..6) calls.
    #set_w(1, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(1))
    #set_w(2, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(2))
    #set_w(3, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(1))
    #set_w(4, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(2))
    #set_w(5, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(1))
    #set_w(6, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(2))
    return loss


def training2():
    # 1. Forward pass: compute the loss and, along the way, the local partial derivatives
    #    (e.g. h1_gd_w1 is the derivative of h1 with respect to w1)
    h1 = sigmoid(w(1)*x(1) + w(2)*x(2) + b(1))
    h1_gd_w1 = h1 * (1 - h1) * x(1)
    h1_gd_w2 = h1 * (1 - h1) * x(2)
    h1_gd_x1 = h1 * (1 - h1) * w(1)
    h1_gd_x2 = h1 * (1 - h1) * w(2)
    h1_gd_b = h1 * (1 - h1)
    
    h2 = sigmoid(w(3)*x(1) + w(4)*x(2) + b(1))
    h2_gd_w3 = h2 * (1 - h2) * x(1)
    h2_gd_w4 = h2 * (1 - h2) * x(2)
    h2_gd_x1 = h2 * (1 - h2) * w(3)
    h2_gd_x2 = h2 * (1 - h2) * w(4)
    h2_gd_b = h2 * (1 - h2)
    
    h3 = sigmoid(w(5)*x(1) + w(6)*x(2) + b(1))
    h3_gd_w5 = h3 * (1 - h3) * x(1)
    h3_gd_w6 = h3 * (1 - h3) * x(2)
    h3_gd_x1 = h3 * (1 - h3) * w(5)
    h3_gd_x2 = h3 * (1 - h3) * w(6)
    h3_gd_b = h3 * (1 - h3)
    
    o1 = sigmoid(w(7)*h1 + w(9)*h2 + w(11)*h3 + b(2))
    o1_gd_w7 = o1 * (1 - o1) * h1
    o1_gd_w9 = o1 * (1 - o1) * h2
    o1_gd_w11 = o1 * (1 - o1) * h3
    o1_gd_h1 = o1 * (1 - o1) * w(7)
    o1_gd_h2 = o1 * (1 - o1) * w(9)
    o1_gd_h3 = o1 * (1 - o1) * w(11)
    o1_gd_b = o1 * (1 - o1)
    
    o2 = sigmoid(w(8)*h1 + w(10)*h2 + w(12)*h3 + b(2))
    o2_gd_w8 = o2 * (1 - o2) * h1
    o2_gd_w10 = o2 * (1 - o2) * h2
    o2_gd_w12 = o2 * (1 - o2) * h3
    o2_gd_h1 = o2 * (1 - o2) * w(8)
    o2_gd_h2 = o2 * (1 - o2) * w(10)
    o2_gd_h3 = o2 * (1 - o2) * w(12)
    o2_gd_b = o2 * (1 - o2)

    # MSE loss
    loss = 0.5 * (y(1) - o1)**2 + 0.5 * (y(2) - o2) ** 2
    loss_gd_o1 = -1.0 * (y(1) - o1)
    loss_gd_o2 = -1.0 * (y(2) - o2)
    # Cross-entropy loss (here y would only take the values 0 or 1)
    # NOTE: written from the definition of the cross-entropy loss; feel free to swap it in and rerun
    # loss = -(y(1)*np.log(o1) + (1-y(1))*np.log(1 - o1)) - (y(2)*np.log(o2) + (1-y(2))*np.log(1 - o2))
    
    # 2. Backward pass: chain the local gradients together, then update the parameters
    # NOTE: each set_w call below passes the gradient of the loss with respect to that weight
    set_w(1, (loss_gd_o1 * o1_gd_h1 + loss_gd_o2 * o2_gd_h1) * h1_gd_w1)
    set_w(2, (loss_gd_o1 * o1_gd_h1 + loss_gd_o2 * o2_gd_h1) * h1_gd_w2)
    set_w(3, (loss_gd_o1 * o1_gd_h2 + loss_gd_o2 * o2_gd_h2) * h2_gd_w3)
    set_w(4, (loss_gd_o1 * o1_gd_h2 + loss_gd_o2 * o2_gd_h2) * h2_gd_w4)
    set_w(5, (loss_gd_o1 * o1_gd_h3 + loss_gd_o2 * o2_gd_h3) * h3_gd_w5)
    set_w(6, (loss_gd_o1 * o1_gd_h3 + loss_gd_o2 * o2_gd_h3) * h3_gd_w6)

    set_w(7, loss_gd_o1 * o1_gd_w7)
    set_w(8, loss_gd_o2 * o2_gd_w8)
    set_w(9, loss_gd_o1 * o1_gd_w9)
    set_w(10, loss_gd_o2 * o2_gd_w10)
    set_w(11, loss_gd_o1 * o1_gd_w11)
    set_w(12, loss_gd_o2 * o2_gd_w12)

    return loss

if __name__ == '__main__':
    # Run 1000 training iterations, then print the learned weights and the final loss.
    for i in range(1000):
        _loss = training2()
    print(_w)
    print(_loss)
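
For comparison, here is a compact vectorized version of the same 2-3-2 network in NumPy. This is a hypothetical refactoring of the loop above (the matrix layout and variable names are my own); because both weight matrices are updated from their pre-update values, and the biases are kept fixed as in the scalar code, the numbers after many iterations may differ slightly from those in Section 2.2.3:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Vectorized 2-3-2 network: W1 maps inputs to hidden units, W2 maps hidden units to outputs.
x = np.array([5.0, 10.0])                               # inputs
t = np.array([0.01, 0.99])                              # targets
W1 = np.array([[0.1, 0.15], [0.2, 0.25], [0.3, 0.35]])  # rows: h1, h2, h3
W2 = np.array([[0.4, 0.5, 0.6], [0.45, 0.55, 0.65]])    # rows: o1, o2
b1, b2 = 0.35, 0.65
lr = 0.5

for step in range(1000):
    # forward pass
    h = sigmoid(W1 @ x + b1)                  # hidden activations, shape (3,)
    o = sigmoid(W2 @ h + b2)                  # outputs, shape (2,)
    loss = 0.5 * np.sum((t - o) ** 2)
    # backward pass (MSE loss + sigmoid activations)
    delta_o = -(t - o) * o * (1 - o)          # dLoss/dnet_o, shape (2,)
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # dLoss/dnet_h, shape (3,)
    W2 -= lr * np.outer(delta_o, h)           # dLoss/dW2
    W1 -= lr * np.outer(delta_h, x)           # dLoss/dW1

print(o, loss)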