Deep Learning: The BP Algorithm

Backpropagation is an algorithm for solving for the weights W of a neural network. Each step splits into a forward pass (FP) that computes the loss and a backward pass (BP) that propagates the error back through the network; every layer's weights are adjusted according to that error, and the process is iterated.

The BP algorithm is also known as the δ (delta) rule. Take a three-layer perceptron as an example (assume for now that the hidden layer and the output layer use the same type of activation function):

  • Hidden layer: y = f(x * v)
  • Output layer: o = f(y * w)
  • Output-layer error: E=\frac{1}{2}(d-O)^{2}=\frac{1}{2}\sum_{k=1}^{l}(d_k-O_k)^2
  • Error expanded to the hidden layer: E=\frac{1}{2}\sum_{k=1}^{l}(d_k-f(net_k))^2=\frac{1}{2}\sum_{k=1}^{l}\left(d_k-f\left(\sum_{j=1}^{m}w_{jk}y_j\right)\right)^2
  • Error expanded to the input layer: E=\frac{1}{2}\sum_{k=1}^{l}\left(d_k-f\left[\sum_{j=1}^{m}w_{jk}f(net_j)\right]\right)^2=\frac{1}{2}\sum_{k=1}^{l}\left(d_k-f\left[\sum_{j=1}^{m}w_{jk}f\left(\sum_{i=1}^{n}v_{ij}x_i\right)\right]\right)^2

Now that we have the error E, we can use stochastic gradient descent to solve for w and v, i.e. find the w and v that minimize E.
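
Each weight is then moved a small step against its gradient. With learning rate η, the generic gradient-descent updates (instantiated weight by weight in the example below) are:

w_{jk} \leftarrow w_{jk} - \eta \frac{\partial E}{\partial w_{jk}}, \qquad v_{ij} \leftarrow v_{ij} - \eta \frac{\partial E}{\partial v_{ij}}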

A worked example of the BP algorithm

[Figure: a fully connected 2-3-2 network with inputs l_1, l_2, hidden units h_1, h_2, h_3, outputs o_1, o_2, weights w_1 through w_12, and biases b_1, b_2]

  • Initial weights: w = (0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65), biases b = (0.35, 0.65)
  • Inputs: l = (5, 10)
  • Target output: O = (0.01, 0.99)
  • Learning rate: η = 0.5
  • Both the hidden layer and the output layer use the sigmoid activation function

1. The FP pass

First compute the unit outputs:

net_{h_1} = w_1 * l_1 + w_2 * l_2 + b_1 * 1 = 0.1 * 5 + 0.15*10+0.35*1=2.35
out_{h_1} = \frac{1}{1+e^{-net_{h_1}}} = \frac{1}{1+e^{-2.35}} = 0.912934

Similarly:

out_{h_2} =0.979164
out_{h_3} =0.995275

o_1 = w_7 * out_{h_1}+w_9 * out_{h_2}+w_{11} * out_{h_3}+b_2*1

out_{o_1} = \frac{1}{1+e^{-o_1}} = \frac{1}{1+e^{-2.1019206}}=0.891090

Similarly:

out_{o_2} =0.904330

The output-layer error is then:

E = \frac{1}{2}(d - O)^2 = \frac{1}{2}\sum_{k=1}^l(d_k - O_k)^2
E_{o_1} = \frac{1}{2}(target_{o_1} - out_{o_1})^2

E_{total} = E_{o_1}+E_{o_2}= \frac{1}{2}(0.01 - 0.891090)^2 + \frac{1}{2}(0.99 -0.904330 )^2 = 0.391829
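
As a quick sanity check, here is a minimal NumPy sketch of this forward pass and the total error (the variable names are mine; it should reproduce the numbers above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
b = [0.35, 0.65]
l = [5, 10]
target = [0.01, 0.99]

# Hidden layer
h1 = sigmoid(w[0] * l[0] + w[1] * l[1] + b[0])  # 0.912934
h2 = sigmoid(w[2] * l[0] + w[3] * l[1] + b[0])  # 0.979164
h3 = sigmoid(w[4] * l[0] + w[5] * l[1] + b[0])  # 0.995275

# Output layer
o1 = sigmoid(w[6] * h1 + w[8] * h2 + w[10] * h3 + b[1])  # 0.891090
o2 = sigmoid(w[7] * h1 + w[9] * h2 + w[11] * h3 + b[1])  # 0.904330

# Total squared error: 0.391829
E = 0.5 * (target[0] - o1) ** 2 + 0.5 * (target[1] - o2) ** 2
print(h1, h2, h3, o1, o2, E)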

2. The BP pass

From the output layer back to the hidden layer, taking w_7 as an example:

\frac{\partial E_{total}}{\partial w_7} =\frac{\partial E_{total}}{\partial out_{o_1}}*\frac{\partial out_{o_1}}{\partial o_1}*\frac{\partial o_1}{\partial w_7}

We now work out the three factors on the right-hand side. The first factor:

E_{o_1} = \frac{1}{2}(target_{o_1} - out_{o_1})^2

E_{total} = E_{o_1}+E_{o_2}=\frac{1}{2}(target_{o_1} - out_{o_1})^2+\frac{1}{2}(target_{o_2} - out_{o_2})^2

\frac{\partial E_{total}}{\partial out_{o_1}} = 2 * \frac{1}{2}(target_{o_1}-out_{o_1}) * (-1) + 0 = -(0.01 - 0.891090) = 0.88109

For the second factor, since:

out_{o_1} = \frac{1}{1+e^{-o_1}}

\begin{align*} {out_{o_1}}'=\frac{e^{-o_1}}{(1+e^{-o_1})^2}=\frac{1+e^{-o_1}-1}{(1+e^{-o_1})^2}=\frac{1}{1+e^{-o_1}}-\frac{1}{(1+e^{-o_1})^2}=out_{o_1}(1- out_{o_1}) \end{align*}

\frac{\partial out_{o_1}}{\partial o_1} = out_{o_1}(1 - out_{o_1}) = 0.891090(1 - 0.891090) = 0.097049

For the third factor, since:

o_1 = w_7 * out_{h_1}+w_9 * out_{h_2}+w_{11} * out_{h_3}+b_2*1

\frac{\partial o_1}{\partial w_7} = out_{h_1} + 0 + 0+0=0.912934

Combining the three factors:

\frac{\partial E_{total}}{\partial w_7} =0.88109*0.097049*0.912934=0.078064

Update w_7:

\hat{w_7} = w_7 + \Delta w_7 = w_7 - \eta \frac{\partial E_{total}}{\partial w_7} =0.4 - 0.5 * 0.078064=0.360968
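
The same three factors in code, a small sketch with my own variable names that reuses the forward-pass values from above:

# Values from the forward pass above
out_o1, out_h1 = 0.891090, 0.912934
target_o1, eta = 0.01, 0.5

dE_dout = -(target_o1 - out_o1)       # first factor:  0.88109
dout_do = out_o1 * (1 - out_o1)       # second factor: 0.097049
do_dw7 = out_h1                       # third factor:  0.912934

grad_w7 = dE_dout * dout_do * do_dw7  # 0.078064
w7_new = 0.4 - eta * grad_w7          # 0.360968
print(grad_w7, w7_new)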

Similarly we obtain:

\hat{w_8} = 0.453383

\hat{w_9} = 0.458137

\hat{w_{10}} = 0.553629 

\hat{w_{11}} = 0.557448

\hat{w_{12}} = 0.653688
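
If you want an independent check of any of these analytic gradients, a standard finite-difference probe works well. This is my addition, not part of the original derivation, and total_error is a helper defined here:

import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def total_error(w, b=(0.35, 0.65), l=(5, 10), target=(0.01, 0.99)):
    # Full forward pass for a given weight vector
    h = [sigmoid(w[2 * i] * l[0] + w[2 * i + 1] * l[1] + b[0]) for i in range(3)]
    o1 = sigmoid(w[6] * h[0] + w[8] * h[1] + w[10] * h[2] + b[1])
    o2 = sigmoid(w[7] * h[0] + w[9] * h[1] + w[11] * h[2] + b[1])
    return 0.5 * (target[0] - o1) ** 2 + 0.5 * (target[1] - o2) ** 2

w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
eps = 1e-6
w_plus, w_minus = list(w), list(w)
w_plus[6] += eps
w_minus[6] -= eps
# Central difference approximates dE_total/dw7; prints ~0.078064
print((total_error(w_plus) - total_error(w_minus)) / (2 * eps))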

From the hidden layer back to the input layer, taking w_1 as an example:

\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h_1}}* \frac{\partial out_{h_1}}{\partial h_1}* \frac{\partial h_1}{\partial w_1}

\frac{\partial E_{total}}{\partial w_1}=\Big(\frac{\partial E_{o_1}}{\partial out_{h_1}} + \frac{\partial E_{o_2}}{\partial out_{h_1}}\Big)* \frac{\partial out_{h_1}}{\partial h_1}* \frac{\partial h_1}{\partial w_1}

\frac{\partial E_{o_1}}{\partial out_{h_1}}=\frac{\partial E_{o_1}}{\partial out_{o_1}} * \frac{\partial out_{o_1}}{\partial o_1}*\frac{\partial o_1}{\partial out_{h_1}}

We now compute each factor in turn. The first factor:

\frac{\partial E_{o_1}}{\partial out_{h_1}}=\frac{\partial E_{o_1}}{\partial out_{o_1}} * \frac{\partial out_{o_1}}{\partial o_1}*\frac{\partial o_1}{\partial out_{h_1}}

where:

E_{o_1} = \frac{1}{2}(target_{o_1} - out_{o_1})^2

out_{o_1} = \frac{1}{1+e^{-o_1}}

o_1 = w_7 * out_{h_1}+w_9 * out_{h_2}+w_{11} * out_{h_3}+b_2*1

\frac{\partial E_{o_1}}{\partial out_{h_1}}=-(target_{o_1} - out_{o_1})*out_{o_1}*(1- out_{o_1})*\hat{w_7}

Note: because this example updates the weights layer by layer during the backward pass, the already-updated value of w_7 is used here (classic BP would compute all gradients with the pre-update weights).

\begin{align*}\frac{\partial E_{o_1}}{\partial out_{h_1}}=-(0.01 - 0.891090)*0.891090*(1-0.891090)*0.360968=0.030866 \end{align*}

Similarly:

\frac{\partial E_{o_2}}{\partial out_{h_1}} = \frac{\partial E_{o_2}}{\partial out_{o_2}} * \frac{\partial out_{o_2}}{\partial o_2}*\frac{\partial o_2}{\partial out_{h_1}}

\begin{align*} \frac{\partial E_{o_2}}{\partial out_{h_1}} &= -(target_{o_2} -out_{o_2})*out_{o_2}(1-out_{o_2})*\hat{w_8} \\ & =-(0.99-0.904330)*0.904330*(1-0.904330)*0.453383\\ &=-0.003360 \end{align*}

Next, the second factor:

\frac{\partial out_{h_1}}{\partial h_1}=out_{h_1}*(1-out_{h_1}) =0.912934*(1-0.912934)=0.079486

And the third factor:

\frac{\partial h_1}{\partial w_1} = l_1 = 5

Putting everything together:

\frac{\partial E_{total}}{\partial w_1} = (0.030866 + (-0.003360))*0.079486 *5=0.010932

Update w_1:

\hat{w_1} = w_1 + \Delta w_1 = w_1 - \eta \frac{\partial E_{total}}{\partial w_1} = 0.1 - 0.5 *0.010932 =0.094534
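
The hidden-layer step in code, again a sketch with my own names; note that it deliberately uses the already-updated w_7 and w_8, exactly as the text does:

# Values carried over from the steps above
out_o1, out_o2 = 0.891090, 0.904330
out_h1, l1, eta = 0.912934, 5, 0.5
w7_new, w8_new = 0.360968, 0.453383

dEo1_dh1 = -(0.01 - out_o1) * out_o1 * (1 - out_o1) * w7_new  # 0.030866
dEo2_dh1 = -(0.99 - out_o2) * out_o2 * (1 - out_o2) * w8_new  # -0.003360
dh1_dnet = out_h1 * (1 - out_h1)                              # 0.079486

grad_w1 = (dEo1_dh1 + dEo2_dh1) * dh1_dnet * l1               # 0.010932
w1_new = 0.1 - eta * grad_w1                                  # 0.094534
print(grad_w1, w1_new)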

The same procedure yields w_2, w_3, w_4, w_5, w_6.

This completes the first iteration. With more iterations, the error keeps decreasing.

After 1000 iterations the output reaches O = (0.022971, 0.977675), which is already close to the target O = (0.01, 0.99).

Python code

https://github.com/flepeng/code/blob/master/DL/bp_demo.py

import numpy as np

# Initial weights
w = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65]
# Biases; b is not updated in this demo
b = [0.35, 0.65]

# Inputs
l = [5, 10]

# Sigmoid activation function
def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))


def f1(w, b, l):
    # Forward pass: compute the unit outputs
    h1 = sigmoid(w[0] * l[0] + w[1] * l[1] + b[0])
    h2 = sigmoid(w[2] * l[0] + w[3] * l[1] + b[0])
    h3 = sigmoid(w[4] * l[0] + w[5] * l[1] + b[0])

    o1 = sigmoid(w[6] * h1 + w[8] * h2 + w[10] * h3 + b[1])
    o2 = sigmoid(w[7] * h1 + w[9] * h2 + w[11] * h3 + b[1])

    # Backward pass: update w
    # Output layer: the first two chain-rule factors for o1,
    # i.e. -(0.01 - o1) and o1 * (1 - o1)
    t1 = -(0.01 - o1) * o1 * (1 - o1)
    # The same two factors for o2
    t2 = -(0.99 - o2) * o2 * (1 - o2)
    # Multiplying t1/t2 by the third factor (a hidden output) gives the
    # gradient of each output-layer weight
    w[6] = w[6] - 0.5 * (t1 * h1)
    w[8] = w[8] - 0.5 * (t1 * h2)
    w[10] = w[10] - 0.5 * (t1 * h3)
    w[7] = w[7] - 0.5 * (t2 * h1)
    w[9] = w[9] - 0.5 * (t2 * h2)
    w[11] = w[11] - 0.5 * (t2 * h3)

    # (t1 * w[6] + t2 * w[7]) corresponds to the bracketed sum in the formula,
    # h1 * (1 - h1) to the second factor, and l[0] (or l[1]) to the third
    w[0] = w[0] - 0.5 * (t1 * w[6] + t2 * w[7]) * h1 * (1 - h1) * l[0]
    w[1] = w[1] - 0.5 * (t1 * w[6] + t2 * w[7]) * h1 * (1 - h1) * l[1]
    w[2] = w[2] - 0.5 * (t1 * w[8] + t2 * w[9]) * h2 * (1 - h2) * l[0]
    w[3] = w[3] - 0.5 * (t1 * w[8] + t2 * w[9]) * h2 * (1 - h2) * l[1]
    w[4] = w[4] - 0.5 * (t1 * w[10] + t2 * w[11]) * h3 * (1 - h3) * l[0]
    w[5] = w[5] - 0.5 * (t1 * w[10] + t2 * w[11]) * h3 * (1 - h3) * l[1]

    return o1, o2, w


for i in range(1000):
    r1, r2, w = f1(w, b, l)
    print("第{}次迭代后,结果值为:({},{}),权重更新为:{}".format(i+1, r1, r2, w))

 
