This article is my own reorganized summary based on that source; if anything is unclear, please refer to the original.
1 Preliminaries
1.1 Derivative of the Sigmoid activation function
The Sigmoid function is defined as:
$$\sigma(x)=\frac{1}{1+e^{-x}}$$
Its derivative is:
$$\begin{aligned} \frac{d\sigma(x)}{dx}&=\frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right)\\ &=\frac{e^{-x}}{\left(1+e^{-x}\right)^2}=\frac{\left(1+e^{-x}\right)-1}{\left(1+e^{-x}\right)^2}\\ &=\frac{1+e^{-x}}{\left(1+e^{-x}\right)^2}-\left(\frac{1}{1+e^{-x}}\right)^2\\ &=\sigma(x)-\sigma(x)^2=\sigma(x)\left(1-\sigma(x)\right) \end{aligned}$$
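The identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$ can be checked numerically; a minimal sketch in Python (the helper names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Uses the identity derived above: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Sanity check against a central finite difference at an arbitrary point
x, h = 0.3775, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
assert abs(sigmoid_prime(x) - numeric) < 1e-8
```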
2 Network structure diagram
$i$ is the input; $h$ and $o$ are two fully connected layers.
The network's weights $w$ and biases $b$ are initialized as shown in the figure below:
3 Forward propagation
1. For the $h$ layer:
1. Compute the total input to node $h_1$:
$$\begin{aligned} net_{h1}&=w_1\times i_1+w_2\times i_2+b_1\times 1\\ &=0.15\times 0.05+0.2\times 0.1+0.35\times 1\\ &=0.3775 \end{aligned}$$
2. Compute the output of node $h_1$. The formula for node $h_1$ is $out_{h1}=\sigma\left(wx+b\right)$, where $x$ is the node's input (here, $i$), $w$ is the weight, $b$ is the bias, and $\sigma$ is the activation function (here, the Sigmoid function from Section 1.1). The output of node $h_1$ is then:
$$\begin{aligned} out_{h1}&=\sigma\left(wx+b\right)=\sigma\left(net_{h1}\right)\\ &=\frac{1}{1+e^{-net_{h1}}}=\frac{1}{1+e^{-0.3775}}\\ &=0.593269992 \end{aligned}$$
3. By the same method, $out_{h2}=0.596884378$.
2. For the $o$ layer:
1. Repeat the process above for the $o$ layer:
$$\begin{aligned} net_{o1}&=w_5\times out_{h1}+w_6\times out_{h2}+b_2\times 1\\ &=0.4\times 0.593269992+0.45\times 0.596884378+0.6\\ &=1.105905967 \end{aligned}$$
The output is then:
$$out_{o1}=\frac{1}{1+e^{-net_{o1}}}=0.75136507$$
Similarly, $out_{o2}=0.772928465$ (the value is fixed here to be consistent with the error $E_{o2}$ computed below).
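The forward pass above can be reproduced in a few lines of Python. Note that $w_3$, $w_4$, $w_7$, and $w_8$ come from the initialization figure (not shown here); the values below are assumptions chosen to be consistent with the reported outputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights and biases from the text; w3, w4, w7, w8 are assumed
# from the (missing) initialization figure.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

# Hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1 * 1      # 0.3775
out_h1 = sigmoid(net_h1)                 # ~ 0.593269992
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h2 = sigmoid(net_h2)                 # ~ 0.596884378

# Output layer
net_o1 = w5 * out_h1 + w6 * out_h2 + b2  # ~ 1.105905967
out_o1 = sigmoid(net_o1)                 # ~ 0.75136507
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o2 = sigmoid(net_o2)
```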
4 Computing the error (Loss)
Here we use the mean squared error loss, defined as: $$E_{total}=\frac{1}{2}\sum_{k=1}^K\left(y_k-o_k\right)^2$$ where $y_k$ is the true (expected) value and $o_k$ is the output value.
As shown in Figure 2 above, the true value for node $o_1$ is 0.01, while the forward pass produced an output of 0.75136507, so its error is:
$$\begin{aligned} E_{o1}&=\frac{1}{2}\left(target-output\right)^2=\frac{1}{2}\times\left(0.01-0.75136507\right)^2\\ &=0.274811 \end{aligned}$$
Similarly, $E_{o2}=0.023560026$.
Putting these together, the total error is: $$E_{total}=E_{o1}+E_{o2}=0.274811+0.023560026=0.298371$$
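As a quick check, the loss computation in Python (the target 0.99 for $o_2$ is an assumption from the missing figure; it is consistent with the reported $E_{o2}$):

```python
# MSE loss as defined above: E_total = 1/2 * sum_k (y_k - o_k)^2
targets = [0.01, 0.99]   # target for o2 assumed from the figure
outputs = [0.75136507, 0.772928465]

errors = [0.5 * (y - o) ** 2 for y, o in zip(targets, outputs)]
E_o1, E_o2 = errors      # ~ 0.274811, ~ 0.023560
E_total = sum(errors)    # ~ 0.298371
```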
5 Backpropagation
1. For the output layer ($o$ layer)
For $w_5$, we want to know how much changing it affects the total error, so we need to compute $\frac{\partial E_{total}}{\partial w_5}$.
By the chain rule: $$\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}}\times\frac{\partial out_{o1}}{\partial net_{o1}}\times\frac{\partial net_{o1}}{\partial w_5}$$
1. For $\frac{\partial E_{total}}{\partial out_{o1}}$:
$$\begin{aligned} E_{total}&=\frac{1}{2}\left(target_{o1}-out_{o1}\right)^2+\frac{1}{2}\left(target_{o2}-out_{o2}\right)^2\\ \frac{\partial E_{total}}{\partial out_{o1}}&=2\times\frac{1}{2}\left(target_{o1}-out_{o1}\right)^{2-1}\times\left(-1\right)+0\\ &=-\left(target_{o1}-out_{o1}\right)\\ &=-\left(0.01-0.75136507\right)=0.741365 \end{aligned}$$
2. For $\frac{\partial out_{o1}}{\partial net_{o1}}$:
$$\begin{aligned} out_{o1}&=\frac{1}{1+e^{-net_{o1}}}\\ \frac{\partial out_{o1}}{\partial net_{o1}}&=out_{o1}\left(1-out_{o1}\right)=0.186815602 \end{aligned}$$
3. For $\frac{\partial net_{o1}}{\partial w_5}$:
$$\begin{aligned} net_{o1}&=w_5\times out_{h1}+w_6\times out_{h2}+b_2\times 1\\ \frac{\partial net_{o1}}{\partial w_5}&=1\times out_{h1}\times w_5^{\left(1-1\right)}+0+0=out_{h1}=0.593269992 \end{aligned}$$
Putting these together:
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_5}&=\frac{\partial E_{total}}{\partial out_{o1}}\times\frac{\partial out_{o1}}{\partial net_{o1}}\times\frac{\partial net_{o1}}{\partial w_5}\\ &=0.741365\times 0.186815602\times 0.593269992\\ &=0.082167 \end{aligned}$$
Next, an optimizer uses this value to adjust the weight $w_5$ (for more on optimizers, see Link). Here we use the most basic form, standard gradient descent (GD), with learning rate $\eta=0.5$:
$$w_5^+=w_5-\eta\times\frac{\partial E_{total}}{\partial w_5}=0.4-0.5\times 0.082167041=0.358916$$
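The chain-rule computation and the gradient descent update for $w_5$ can be sketched as follows (the variable names are my own):

```python
# Values carried over from the forward pass and loss sections
target_o1 = 0.01
out_o1 = 0.75136507
out_h1 = 0.593269992
w5, eta = 0.40, 0.5

# Chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout   = -(target_o1 - out_o1)         # ~ 0.741365
dout_dnet = out_o1 * (1.0 - out_o1)       # ~ 0.186815602
dnet_dw5  = out_h1                        # ~ 0.593269992
grad_w5 = dE_dout * dout_dnet * dnet_dw5  # ~ 0.082167

# Standard gradient descent step
w5_new = w5 - eta * grad_w5               # ~ 0.358916
```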
This gives the updated value $w_5^+$ of the weight $w_5$.
Repeating the same steps yields $w_6^+$, $w_7^+$, and $w_8^+$.
2. For the hidden layer ($h$ layer)
The procedure for the $h$ layer is similar to that for the output layer; for $w_1$ we need to compute: $$\frac{\partial E_{total}}{\partial w_1}=\frac{\partial E_{total}}{\partial out_{h1}}\times\frac{\partial out_{h1}}{\partial net_{h1}}\times\frac{\partial net_{h1}}{\partial w_1}$$ The one difference is that $\frac{\partial E_{total}}{\partial out_{h1}}$ must sum the contributions from both output nodes, since $out_{h1}$ feeds into both $o_1$ and $o_2$.
The remaining steps are the same as before, giving $w_1^+$, $w_2^+$, $w_3^+$, and $w_4^+$.
In summary, by repeating the operations above, the parameters of every layer in the network can be updated, which is what training does.
During training, the weights are updated via these steps after every batch; after many epochs the network converges (the error becomes very small), the forward pass outputs are very close to the expected values, and training of the network is complete.
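Putting everything together, the whole procedure can be sketched as a small training loop. This is a minimal illustration under stated assumptions, not a general implementation: $w_3=0.25$, $w_4=0.30$, $w_7=0.50$, $w_8=0.55$, and $target_{o2}=0.99$ are assumed from the (missing) initialization figure, chosen to match the reported numbers, and the biases are kept fixed for simplicity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and targets from the text; w3, w4, w7, w8 and target_o2 are
# assumed from the (missing) figure, consistent with the reported values.
i = [0.05, 0.10]
targets = [0.01, 0.99]
b1, b2 = 0.35, 0.60
eta = 0.5

def forward(w):
    """One forward pass; returns hidden and output activations."""
    out_h = [sigmoid(w["w1"] * i[0] + w["w2"] * i[1] + b1),
             sigmoid(w["w3"] * i[0] + w["w4"] * i[1] + b1)]
    out_o = [sigmoid(w["w5"] * out_h[0] + w["w6"] * out_h[1] + b2),
             sigmoid(w["w7"] * out_h[0] + w["w8"] * out_h[1] + b2)]
    return out_h, out_o

def step(w):
    """One forward pass plus one gradient descent update of all 8 weights."""
    out_h, out_o = forward(w)
    # Output deltas: dE/dnet_ok = -(target_k - out_ok) * out_ok * (1 - out_ok)
    d_o = [-(t - o) * o * (1 - o) for t, o in zip(targets, out_o)]
    # Hidden deltas sum the contributions from both output nodes
    d_h = [(d_o[0] * w["w5"] + d_o[1] * w["w7"]) * out_h[0] * (1 - out_h[0]),
           (d_o[0] * w["w6"] + d_o[1] * w["w8"]) * out_h[1] * (1 - out_h[1])]
    return {
        "w1": w["w1"] - eta * d_h[0] * i[0],
        "w2": w["w2"] - eta * d_h[0] * i[1],
        "w3": w["w3"] - eta * d_h[1] * i[0],
        "w4": w["w4"] - eta * d_h[1] * i[1],
        "w5": w["w5"] - eta * d_o[0] * out_h[0],
        "w6": w["w6"] - eta * d_o[0] * out_h[1],
        "w7": w["w7"] - eta * d_o[1] * out_h[0],
        "w8": w["w8"] - eta * d_o[1] * out_h[1],
    }

w = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,
     "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
w_after_one = step(w)   # w_after_one["w5"] ~ 0.358916, as computed by hand

for _ in range(1000):
    w = step(w)

_, out_o = forward(w)
E_total = sum(0.5 * (t - o) ** 2 for t, o in zip(targets, out_o))
```

With these assumptions the first update reproduces $w_5^+\approx 0.358916$ from the hand calculation above, and the loss decreases steadily from its initial value of about 0.298371 as the iterations proceed.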