This article is for study purposes only.
This chapter assumes prior familiarity with basic machine learning.
CV Deep Learning Fundamentals Ch02: BP Neural Networks
1. Review of Machine Learning Algorithms
1.1 Gradient Descent Review
Gradient descent (GD) is an iterative algorithm commonly used to find the minimum of an unconstrained convex function. Because a convex function has only one extremum, the local minimum it finds is also the function's global minimum.
$$J(\theta)=\frac{1}{2m}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}$$
$$\theta^{*}=\underset{\theta}{\arg\min}\,J(\theta)$$
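To make the iteration concrete, here is a minimal gradient-descent sketch in Python; the quadratic objective and the step size alpha=0.1 are assumptions chosen for illustration:

```python
def gradient_descent(grad, theta0, alpha=0.1, n_iters=100):
    """Repeatedly step against the gradient: theta <- theta - alpha * grad(theta)."""
    theta = theta0
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)
    return theta

# Example: J(theta) = (theta - 3)^2 is convex, so GD converges to its
# unique minimizer theta* = 3 regardless of the starting point.
print(gradient_descent(grad=lambda t: 2.0 * (t - 3.0), theta0=0.0))  # ~3.0
```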
1.2 Linear Regression Review
Linear regression is the most basic machine learning algorithm, commonly used to fit models with linear relationships:
$$\begin{cases} y^{(i)}=\theta^{T}x^{(i)}+\varepsilon^{(i)} \\ J(\theta)=\frac{1}{2}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2} \\ \theta'=\theta-\alpha\frac{\partial J(\theta)}{\partial\theta} \end{cases}$$
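A minimal sketch of these three equations in code; the synthetic data and the learning rate are assumptions for illustration, not part of the original text:

```python
import numpy as np

# Synthetic data from y = 1 + 2*x plus noise (assumed for illustration).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(0, 1, 100)])  # bias term + one feature
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100)       # epsilon ~ N(0, 0.1^2)

theta = np.zeros(2)
alpha = 0.5
for _ in range(2000):
    grad = X.T @ (X @ theta - y) / len(y)  # dJ/dtheta, averaged over the m samples
    theta = theta - alpha * grad           # theta' = theta - alpha * dJ/dtheta
print(theta)  # close to [1.0, 2.0]
```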
1.3 Logistic Regression Review
A classification algorithm built by extending linear regression:
$$\begin{cases} p=h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}},\quad g(z)=\frac{1}{1+e^{-z}} \\ loss=-\ell(\theta)=-\sum\limits_{i=1}^{m}\left(y^{(i)}\ln h_{\theta}(x^{(i)})+(1-y^{(i)})\ln(1-h_{\theta}(x^{(i)}))\right) \\ \theta_{j}'=\theta_{j}+\alpha\sum\limits_{i=1}^{m}\left(y^{(i)}-h_{\theta}(x^{(i)})\right)x_{j}^{(i)} \\ g'(z)=g(z)(1-g(z)) \end{cases}$$
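The same pattern for logistic regression; again the data are synthetic and assumed only for illustration:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid, with g'(z) = g(z)(1 - g(z))

# Synthetic, noisy binary labels that depend on two features (assumed).
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(float)

theta = np.zeros(2)
alpha = 0.1
for _ in range(1000):
    h = g(X @ theta)                                # p = h_theta(x) = g(theta^T x)
    theta = theta + alpha * X.T @ (y - h) / len(y)  # descent step on loss = -l(theta)
loss = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
print(theta, loss)
```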
2. Neural Networks
2.1 The BP Algorithm for Neural Networks
BP is an algorithm for solving for a neural network's weights W and biases B. It consists of forward propagation (FP), which propagates the signal and computes the loss, and backpropagation (BP), which sends the error back; each layer's weights are adjusted according to the error, and the process iterates.
The BP algorithm is also called the δ algorithm. Take a three-layer perceptron network as an example (assume for now that the hidden layer and the output layer use the same type of activation function):
Error at the output layer:
$$E=\frac{1}{2}(d-O)^{2}=\frac{1}{2}\sum\limits_{k=1}^{\ell}(d_{k}-O_{k})^{2}$$
Error expressed at the hidden layer:
$$E=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f(net_{k})\right)^{2}=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f\left(\sum\limits_{j=1}^{m}w_{jk}y_{j}\right)\right)^{2}$$
Error expressed at the input layer:
$$E=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f\left(\sum\limits_{j=0}^{m}w_{jk}f(net_{j})\right)\right)^{2}=\frac{1}{2}\sum\limits_{k=1}^{\ell}\left(d_{k}-f\left(\sum\limits_{j=0}^{m}w_{jk}f\left(\sum\limits_{i=1}^{m}v_{ij}x_{i}\right)\right)\right)^{2}$$
2.2 SGD for Neural Networks
With the error E in hand, stochastic gradient descent can be used to solve for w and v so that the error keeps decreasing, i.e., to find the w and v that minimize E.
A worked BP example follows. The network has two inputs $x=(5,10)$, three hidden units, and two outputs with targets $(0.01,\ 0.99)$; every unit uses a sigmoid activation, and the learning rate is $\eta=0.5$ (the same setup as the code in section 3).
2.2.1 The FP Pass
$$b=(0.35,\ 0.65)$$
$$w=\begin{pmatrix} 0.1 & 0.15 & 0.2 & 0.25 & 0.3 & 0.35 \\ 0.4 & 0.45 & 0.5 & 0.55 & 0.6 & 0.65 \end{pmatrix}$$
With $net_{h1}=w_{1}x_{1}+w_{2}x_{2}+b_{1}\cdot 1=0.1\cdot 5+0.15\cdot 10+0.35=2.35$:
$$out_{h1}=\frac{1}{1+e^{-net_{h1}}}=\frac{1}{1+e^{-2.35}}=0.912934\Rightarrow out_{h2}=0.979164,\ out_{h3}=0.995275$$
$$net_{o1}=w_{7}out_{h1}+w_{9}out_{h2}+w_{11}out_{h3}+b_{2}\cdot 1=0.4\cdot 0.912934+0.5\cdot 0.979164+0.6\cdot 0.995275+0.65\cdot 1=2.1019206$$
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}}=\frac{1}{1+e^{-2.1019206}}=0.891090\Rightarrow out_{o2}=0.904330$$
$$E_{o1}=\frac{1}{2}(\text{target}_{o1}-out_{o1})^{2}$$
$$E_{total}=E_{o1}+E_{o2}=\frac{1}{2}(0.01-0.891090)^{2}+\frac{1}{2}(0.99-0.904330)^{2}=0.391829$$
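These forward-pass values are easy to verify in a few lines of Python (numbers copied from the example above; the variable names are just for this check):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, b = [5, 10], [0.35, 0.65]
w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]

h1 = sigmoid(w[0]*x[0] + w[1]*x[1] + b[0])         # out_h1 = 0.912934
h2 = sigmoid(w[2]*x[0] + w[3]*x[1] + b[0])         # out_h2 = 0.979164
h3 = sigmoid(w[4]*x[0] + w[5]*x[1] + b[0])         # out_h3 = 0.995275
o1 = sigmoid(w[6]*h1 + w[8]*h2 + w[10]*h3 + b[1])  # out_o1 = 0.891090
o2 = sigmoid(w[7]*h1 + w[9]*h2 + w[11]*h3 + b[1])  # out_o2 = 0.904330
print(0.5*(0.01 - o1)**2 + 0.5*(0.99 - o2)**2)     # E_total = 0.391829
```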
2.2.2 The BP Pass
Taking $w_7$ as an example:
$$\frac{\partial E_{total}}{\partial w_{7}}=\frac{\partial E_{total}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial w_{7}}$$
$$\frac{\partial E_{total}}{\partial out_{o1}}=2\cdot\frac{1}{2}(\text{target}_{o1}-out_{o1})^{2-1}\cdot(-1)=-(0.01-0.891090)=0.88109$$
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}},\quad\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})=0.891090(1-0.891090)=0.097049$$
$$net_{o1}=w_{7}out_{h1}+w_{9}out_{h2}+w_{11}out_{h3}+b_{2}\cdot 1,\quad\frac{\partial net_{o1}}{\partial w_{7}}=out_{h1}=0.912934$$
$$\frac{\partial E_{total}}{\partial w_{7}}=-(\text{target}_{o1}-out_{o1})\cdot out_{o1}(1-out_{o1})\cdot out_{h1}=0.88109\cdot 0.097049\cdot 0.912934=0.078064$$
$$w_{7}^{+}=w_{7}+\Delta w_{7}=w_{7}-\eta\frac{\partial E_{total}}{\partial w_{7}}=0.4-0.5\cdot 0.078064=0.360968$$
$$\Rightarrow w_{8}^{+}=0.453383,\ w_{9}^{+}=0.458137,\ w_{10}^{+}=0.553629,\ w_{11}^{+}=0.557448,\ w_{12}^{+}=0.653688$$
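The same check works for the $w_7$ update; this sketch is self-contained and recomputes the forward values it needs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
h1 = sigmoid(0.1*5 + 0.15*10 + 0.35)           # out_h1 = 0.912934
h2 = sigmoid(0.2*5 + 0.25*10 + 0.35)           # out_h2 = 0.979164
h3 = sigmoid(0.3*5 + 0.35*10 + 0.35)           # out_h3 = 0.995275
o1 = sigmoid(0.4*h1 + 0.5*h2 + 0.6*h3 + 0.65)  # out_o1 = 0.891090
grad_w7 = -(0.01 - o1) * o1 * (1 - o1) * h1    # the three chain-rule factors = 0.078064
print(0.4 - eta * grad_w7)                     # w7+ = 0.360968
```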
Taking $w_1$ as an example:
$$\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_{1}}=\left(\frac{\partial E_{o1}}{\partial out_{h1}}+\frac{\partial E_{o2}}{\partial out_{h1}}\right)\cdot\frac{\partial out_{h1}}{\partial net_{h1}}\cdot\frac{\partial net_{h1}}{\partial w_{1}}$$
$$\frac{\partial E_{o1}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial out_{o1}}\cdot\frac{\partial out_{o1}}{\partial net_{o1}}\cdot\frac{\partial net_{o1}}{\partial out_{h1}}=-(\text{target}_{o1}-out_{o1})\cdot out_{o1}(1-out_{o1})\cdot w_{7}=-(0.01-0.891090)\cdot 0.891090\cdot(1-0.891090)\cdot 0.360968=0.030866$$
(Note that this step plugs in the already-updated value $w_{7}^{+}=0.360968$ for $w_{7}$.)
$$\frac{\partial E_{o2}}{\partial out_{h1}}=\frac{\partial E_{o2}}{\partial out_{o2}}\cdot\frac{\partial out_{o2}}{\partial net_{o2}}\cdot\frac{\partial net_{o2}}{\partial out_{h1}}=-(\text{target}_{o2}-out_{o2})\cdot out_{o2}(1-out_{o2})\cdot w_{8}$$
With $\frac{\partial E_{o2}}{\partial out_{h1}}=-0.003361$ (plugging in $w_{8}^{+}=0.453383$), $\frac{\partial out_{h1}}{\partial net_{h1}}=out_{h1}(1-out_{h1})=0.079485$, and $\frac{\partial net_{h1}}{\partial w_{1}}=x_{1}=5$:
$$\frac{\partial E_{total}}{\partial w_{1}}=(0.030866-0.003361)\cdot 0.079485\cdot 5=0.010932$$
$$w_{1}^{+}=w_{1}+\Delta w_{1}=w_{1}-\eta\frac{\partial E_{total}}{\partial w_{1}}=0.1-0.5\cdot 0.010932=0.094534$$
The weights after this first iteration, and the biases (which this example never updates):
$$w^{1}=\begin{pmatrix} 0.094534 & 0.139069 & 0.198211 & 0.246422 & 0.299497 & 0.348993 \\ 0.360968 & 0.453383 & 0.458137 & 0.553629 & 0.557448 & 0.653688 \end{pmatrix}$$
$$b^{0}=\begin{pmatrix} 0.35 \\ 0.65 \end{pmatrix}$$
2.2.3 Results over Multiple Iterations
Output after the 10th iteration:
$$O=(0.662866,\ 0.908195)$$
Output after the 100th iteration:
$$O=(0.073889,\ 0.945864)$$
Output after the 1000th iteration:
$$O=(0.022971,\ 0.977675)$$
$$w^{1000}=\begin{pmatrix} 0.214925 & 0.379850 & 0.262855 & 0.375711 & 0.323201 & 0.396402 \\ -1.48972 & 0.941715 & -1.50182 & 1.049019 & -1.42756 & 1.151881 \end{pmatrix}$$
3. Implementation
The BP process:
import numpy as np

_w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
_b = [0.35, 0.65]
_x = [5, 10]
_y = [0.01, 0.99]
lr = 0.5

# 1-based accessors so the code reads like the formulas above (w(1) is w1, etc.).
def w(index):
    return _w[index - 1]

def x(index):
    return _x[index - 1]

def b(index):
    return _b[index - 1]

def y(index):
    return _y[index - 1]

def set_w(index, gd):
    # Gradient-descent update for a single weight: w <- w - lr * dE/dw
    _w[index - 1] = _w[index - 1] - lr * gd

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def training():
    # 1. Forward pass: compute the loss.
    h1 = sigmoid(w(1)*x(1) + w(2)*x(2) + b(1))
    h2 = sigmoid(w(3)*x(1) + w(4)*x(2) + b(1))
    h3 = sigmoid(w(5)*x(1) + w(6)*x(2) + b(1))
    o1 = sigmoid(w(7)*h1 + w(9)*h2 + w(11)*h3 + b(2))
    o2 = sigmoid(w(8)*h1 + w(10)*h2 + w(12)*h3 + b(2))
    # MSE loss
    loss = 0.5 * (y(1) - o1)**2 + 0.5 * (y(2) - o2) ** 2
    # Cross-entropy loss (y can only take the values 0 or 1).
    # NOTE: based on the definition of cross-entropy; feel free to swap it in and rerun.
    # loss = -(y(1)*np.log(o1) + (1-y(1))*np.log(1 - o1)) - (y(2)*np.log(o2) + (1-y(2))*np.log(1 - o2))
    # 2. Backward pass: compute the gradients from the loss, then update the parameters.
    # NOTE: t1 and t2, like the gd values passed to set_w, are derivatives/gradients of the loss.
    t1 = -1.0 * (y(1) - o1) * o1 * (1-o1)   # dloss/dnet_o1
    t2 = -1.0 * (y(2) - o2) * o2 * (1-o2)   # dloss/dnet_o2
    # The hidden-layer weights are updated first, so they use the old w7..w12.
    set_w(1, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(1))
    set_w(2, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(2))
    set_w(3, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(1))
    set_w(4, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(2))
    set_w(5, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(1))
    set_w(6, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(2))
    set_w(7, t1 * h1)
    set_w(8, t2 * h1)
    set_w(9, t1 * h2)
    set_w(10, t2 * h2)
    set_w(11, t1 * h3)
    set_w(12, t2 * h3)
    # Moving the set_w(1..6) calls here instead would update the hidden-layer weights
    # with the already-updated w7..w12, as the worked example in section 2.2.2 does:
    #set_w(1, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(1))
    #set_w(2, (t1 * w(7) + t2 * w(8)) * h1 * (1-h1) * x(2))
    #set_w(3, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(1))
    #set_w(4, (t1 * w(9) + t2 * w(10)) * h2 * (1-h2) * x(2))
    #set_w(5, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(1))
    #set_w(6, (t1 * w(11) + t2 * w(12)) * h3 * (1-h3) * x(2))
    return loss

def training2():
    # 1. Forward pass: compute the loss and, at the same time, each local gradient.
    h1 = sigmoid(w(1)*x(1) + w(2)*x(2) + b(1))
    h1_gd_w1 = h1 * (1 - h1) * x(1)
    h1_gd_w2 = h1 * (1 - h1) * x(2)
    h1_gd_x1 = h1 * (1 - h1) * w(1)
    h1_gd_x2 = h1 * (1 - h1) * w(2)
    h1_gd_b = h1 * (1 - h1)
    h2 = sigmoid(w(3)*x(1) + w(4)*x(2) + b(1))
    h2_gd_w3 = h2 * (1 - h2) * x(1)
    h2_gd_w4 = h2 * (1 - h2) * x(2)
    h2_gd_x1 = h2 * (1 - h2) * w(3)
    h2_gd_x2 = h2 * (1 - h2) * w(4)
    h2_gd_b = h2 * (1 - h2)
    h3 = sigmoid(w(5)*x(1) + w(6)*x(2) + b(1))
    h3_gd_w5 = h3 * (1 - h3) * x(1)
    h3_gd_w6 = h3 * (1 - h3) * x(2)
    h3_gd_x1 = h3 * (1 - h3) * w(5)
    h3_gd_x2 = h3 * (1 - h3) * w(6)
    h3_gd_b = h3 * (1 - h3)
    o1 = sigmoid(w(7)*h1 + w(9)*h2 + w(11)*h3 + b(2))
    o1_gd_w7 = o1 * (1 - o1) * h1
    o1_gd_w9 = o1 * (1 - o1) * h2
    o1_gd_w11 = o1 * (1 - o1) * h3
    o1_gd_h1 = o1 * (1 - o1) * w(7)
    o1_gd_h2 = o1 * (1 - o1) * w(9)
    o1_gd_h3 = o1 * (1 - o1) * w(11)
    o1_gd_b = o1 * (1 - o1)
    o2 = sigmoid(w(8)*h1 + w(10)*h2 + w(12)*h3 + b(2))
    o2_gd_w8 = o2 * (1 - o2) * h1
    o2_gd_w10 = o2 * (1 - o2) * h2
    o2_gd_w12 = o2 * (1 - o2) * h3
    o2_gd_h1 = o2 * (1 - o2) * w(8)
    o2_gd_h2 = o2 * (1 - o2) * w(10)
    o2_gd_h3 = o2 * (1 - o2) * w(12)
    o2_gd_b = o2 * (1 - o2)
    # MSE loss
    loss = 0.5 * (y(1) - o1)**2 + 0.5 * (y(2) - o2) ** 2
    loss_gd_o1 = -1.0 * (y(1) - o1)
    loss_gd_o2 = -1.0 * (y(2) - o2)
    # Cross-entropy loss (y can only take the values 0 or 1).
    # NOTE: based on the definition of cross-entropy; feel free to swap it in and rerun.
    # loss = -(y(1)*np.log(o1) + (1-y(1))*np.log(1 - o1)) - (y(2)*np.log(o2) + (1-y(2))*np.log(1 - o2))
    # 2. Backward pass: chain the local gradients into dloss/dw, then update the parameters.
    set_w(1, (loss_gd_o1 * o1_gd_h1 + loss_gd_o2 * o2_gd_h1) * h1_gd_w1)
    set_w(2, (loss_gd_o1 * o1_gd_h1 + loss_gd_o2 * o2_gd_h1) * h1_gd_w2)
    set_w(3, (loss_gd_o1 * o1_gd_h2 + loss_gd_o2 * o2_gd_h2) * h2_gd_w3)
    set_w(4, (loss_gd_o1 * o1_gd_h2 + loss_gd_o2 * o2_gd_h2) * h2_gd_w4)
    set_w(5, (loss_gd_o1 * o1_gd_h3 + loss_gd_o2 * o2_gd_h3) * h3_gd_w5)
    set_w(6, (loss_gd_o1 * o1_gd_h3 + loss_gd_o2 * o2_gd_h3) * h3_gd_w6)
    set_w(7, loss_gd_o1 * o1_gd_w7)
    set_w(8, loss_gd_o2 * o2_gd_w8)
    set_w(9, loss_gd_o1 * o1_gd_w9)
    set_w(10, loss_gd_o2 * o2_gd_w10)
    set_w(11, loss_gd_o1 * o1_gd_w11)
    set_w(12, loss_gd_o2 * o2_gd_w12)
    return loss

if __name__ == '__main__':
    for i in range(1000):
        _loss = training2()
    print(_w)
    print(_loss)
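For comparison, here is a vectorized sketch of the same network using NumPy matrices. The reshaped weight matrices and the loop below are my own rewrite under the same setup (same initial weights, MSE loss, η = 0.5), not part of the original code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.1, 0.2, 0.3],
               [0.15, 0.25, 0.35]])          # input -> hidden (w1..w6)
W2 = np.array([[0.4, 0.45],
               [0.5, 0.55],
               [0.6, 0.65]])                 # hidden -> output (w7..w12)
b1, b2 = 0.35, 0.65
x = np.array([5.0, 10.0])
y = np.array([0.01, 0.99])
eta = 0.5

for _ in range(1000):
    h = sigmoid(x @ W1 + b1)                 # forward: hidden activations
    o = sigmoid(h @ W2 + b2)                 # forward: outputs
    delta_o = (o - y) * o * (1 - o)          # dE/dnet_o for MSE + sigmoid
    delta_h = (W2 @ delta_o) * h * (1 - h)   # error pushed back to the hidden layer
    W2 -= eta * np.outer(h, delta_o)         # dE/dW2
    W1 -= eta * np.outer(x, delta_h)         # dE/dW1
print(o)  # approaches the targets, cf. the iteration results in section 2.2.3
```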