A Worked Example of the Backpropagation Algorithm
Scenario:
Suppose we have a three-layer neural network as shown in the figure below. The first layer is the input layer, with two neurons $i_1, i_2$ and a bias term $b_1$; the second layer is the hidden layer, with two neurons $h_1, h_2$ and a bias term $b_2$; the third layer is the output layer, with two outputs $o_1, o_2$. Each connection between layers is labeled with a weight $w_i$. In this example the activation function is taken to be the Sigmoid function.
Next, we assign initial values to the network, as shown in the figure below:
The initial values are as follows:
Input data (raw inputs): $i_1=0.05,\ i_2=0.10$;
Output data (desired outputs): $o_1=0.01,\ o_2=0.99$;
Initial weights: $w_1=0.15,\ w_2=0.20,\ w_3=0.25,\ w_4=0.30$;
$\quad\quad\quad\quad w_5=0.40,\ w_6=0.45,\ w_7=0.50,\ w_8=0.55$;
Bias terms: $b_1=0.35,\ b_2=0.60$;
Goal: given the inputs $i_1, i_2$ (0.05 and 0.10), use backpropagation to correct the weights so that the actual outputs come as close as possible to the desired outputs $o_1, o_2$ (0.01 and 0.99), i.e. so that the error between the actual and desired outputs is minimized.
Basic idea of the algorithm:
The procedure consists of two main steps:
(1) Forward pass: compute the total error $\Delta_0$ between the actual and desired outputs in the initial state;
(2) Backward pass: use the chain rule to correct the weights of the output layer and the hidden layer, thereby reducing the total error between the actual and desired outputs.
Working through the example:
Step 1: Forward pass
1. Input layer → hidden layer:
The pre-activation of node $j$ in layer $l$, computed from the outputs of the nodes $k$ in layer $l-1$, is
$$z_j^l=\sum_k w_{jk}^l a_k^{l-1}+b_j^l$$
In this example, the weighted input sum $net_{h_1}$ of neuron $h_1$ is therefore
$$\begin{aligned} net_{h_1}&=w_1*i_1+w_2*i_2+b_1*1\\ &=0.15*0.05+0.2*0.1+0.35*1\\ &=0.3775 \end{aligned}$$
Applying the Sigmoid activation function, the output $out_{h_1}$ of neuron $h_1$ is
$$\begin{aligned} out_{h_1}&=\frac{1}{1+e^{-net_{h_1}}}\\ &=\frac{1}{1+e^{-0.3775}}\\ &=0.59327 \end{aligned}$$
In the same way, the output $out_{h_2}$ of neuron $h_2$ is
$$out_{h_2}=0.59688$$
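As an illustration only (this code is not from the original article), the input-to-hidden computation above can be written in a few lines of NumPy; the names sigmoid, W_ih and b1 are my own choices:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation used throughout the example."""
    return 1.0 / (1.0 + np.exp(-z))

# Inputs and input->hidden parameters from the example
i = np.array([0.05, 0.10])          # i_1, i_2
W_ih = np.array([[0.15, 0.20],      # weights into h_1 (w_1, w_2)
                 [0.25, 0.30]])     # weights into h_2 (w_3, w_4)
b1 = 0.35

net_h = W_ih @ i + b1    # pre-activations z: approx [0.3775, 0.3925]
out_h = sigmoid(net_h)   # hidden outputs:    approx [0.59327, 0.59688]
print(net_h, out_h)
```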
2. Hidden layer → output layer:
The actual outputs of the output-layer neurons $o_1$ and $o_2$ are computed in the same way:
$$\begin{aligned} net_{o_1}&=w_5*out_{h_1}+w_6*out_{h_2}+b_2*1\\ &=0.40*0.59327+0.45*0.59688+0.60*1\\ &=1.105904 \end{aligned}$$
Applying the Sigmoid activation function, the output $out_{o_1}$ of neuron $o_1$ is
$$\begin{aligned} out_{o_1}&=\frac{1}{1+e^{-net_{o_1}}}\\ &=\frac{1}{1+e^{-1.105904}}\\ &=0.75136 \end{aligned}$$
Likewise, the output $out_{o_2}$ of neuron $o_2$ is
$$out_{o_2}=0.772928$$
This completes the forward-pass computation of the network's actual output. The result, $[0.75136, 0.772928]$, is still far from the desired output $[0.01, 0.99]$. We therefore need error backpropagation to update the weights and recompute the output, so as to reduce the total error between the actual and desired outputs.
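A similarly minimal sketch of the hidden-to-output step that completes the forward pass (again illustrative only; the hidden outputs are the values just computed, and names like W_ho are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

out_h = np.array([0.593269, 0.596884])   # out_h1, out_h2 from the previous step
W_ho = np.array([[0.40, 0.45],           # weights into o_1 (w_5, w_6)
                 [0.50, 0.55]])          # weights into o_2 (w_7, w_8)
b2 = 0.60

net_o = W_ho @ out_h + b2   # approx [1.105906, 1.224921]
out_o = sigmoid(net_o)      # approx [0.751365, 0.772928], matching the hand calculation
print(out_o)
```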
Step 2: Backward pass
1. Compute the total error:
In this example the total error between the actual and desired outputs is measured by the (one-half) squared error:
$$E_{total}=\sum_{i=1}^n \frac{1}{2}(target-output)^2$$
Note: $n$ is the number of output neurons, $target$ is the desired output, and $output$ is the actual output.
The network here has two output neurons, so $n=2$ and the total error is the sum of the two terms:
$$\begin{aligned} E_{o_1}&=\frac{1}{2}(target_{o_1}-out_{o_1})^2\\ &=\frac{1}{2}(0.01-0.75136)^2\\ &=0.2748 \end{aligned}$$
Likewise, $E_{o_2}$ works out to
$$E_{o_2}=0.02356$$
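A quick arithmetic check of this error computation, with the outputs and targets taken from the example (variable names are illustrative):

```python
import numpy as np

out_o  = np.array([0.751365, 0.772928])   # actual outputs from the forward pass
target = np.array([0.01, 0.99])           # desired outputs

E = 0.5 * (target - out_o) ** 2   # per-output errors: approx [0.274811, 0.023560]
E_total = E.sum()                 # approx 0.298371
print(E, E_total)
```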
2. Output layer → hidden layer weight updates:
Core idea: starting from the total error $E_{total}$ of the whole network, use the chain rule to take the partial derivative with respect to each weight $w_i$ in turn; this tells us how much each weight $w_i$ contributed to the total error.
Take the partial derivative of the total error $E_{total}$ with respect to weight $w_5$ as an example; the backward flow of the error is illustrated in the figure:
By the chain rule,
$$\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial{out_{o_1}}}*\frac{\partial{out_{o_1}}}{\partial net_{o_1}}*\frac{{\partial{net_{o_1}}}}{\partial w_5}$$
Next, each partial derivative is computed in turn:
(1) Compute $\frac{\partial E_{total}}{\partial{out_{o_1}}}$:
$$E_{total}=\frac{1}{2}(target_{o_1}-out_{o_1})^2+\frac{1}{2}(target_{o_2}-out_{o_2})^2$$
$$\begin{aligned} \frac{\partial E_{total}}{\partial{out_{o_1}}}&=2*\frac{1}{2}(target_{o_1}-out_{o_1})^{2-1}*(-1)+0\\ &=-(target_{o_1}-out_{o_1})\\ &=-(0.01-0.75136)\\ &=0.74136 \end{aligned}$$
(2) Compute $\frac{\partial{out_{o_1}}}{\partial net_{o_1}}$:
$$out_{o_1}=\frac{1}{1+e^{-net_{o_1}}}$$
Note: this step is simply differentiating the Sigmoid function.
$$\begin{aligned} \frac{\partial{out_{o_1}}}{\partial net_{o_1}}&=out_{o_1}(1-out_{o_1})\\ &=0.75136*(1-0.75136)\\ &=0.1868 \end{aligned}$$
(3) Compute $\frac{{\partial{net_{o_1}}}}{\partial w_5}$:
$$net_{o_1}=w_5*out_{h_1}+w_6*out_{h_2}+b_2*1$$
$$\begin{aligned} \frac{{\partial{net_{o_1}}}}{\partial w_5}&=out_{h_1}+0+0\\ &=out_{h_1}\\ &=0.59327 \end{aligned}$$
Finally, by the chain rule, multiplying the three factors gives the result:
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_5}&=\frac{\partial E_{total}}{\partial{out_{o_1}}}*\frac{\partial{out_{o_1}}}{\partial net_{o_1}}*\frac{{\partial{net_{o_1}}}}{\partial w_5}\\ &=0.74136*0.1868*0.59327\\ &=0.08216 \end{aligned}$$
This completes the full computation of the partial derivative of the total error $E_{total}$ with respect to $w_5$.
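The three factors and their product can be checked in a few lines of plain Python (a sketch; the names are mine, not the article's):

```python
# Quantities from steps (1)-(3) above
target_o1, out_o1, out_h1 = 0.01, 0.751365, 0.593269

dE_dout_o1   = -(target_o1 - out_o1)    # approx 0.74136
dout_dnet_o1 = out_o1 * (1 - out_o1)    # Sigmoid derivative, approx 0.1868
dnet_dw5     = out_h1                   # approx 0.59327

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5   # approx 0.08216
print(dE_dw5)
```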
Looking back at the derivation above, we see that
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_5}&=\frac{\partial E_{total}}{\partial{out_{o_1}}}*\frac{\partial{out_{o_1}}}{\partial net_{o_1}}*\frac{{\partial{net_{o_1}}}}{\partial w_5}\\ &=-(target_{o_1}-out_{o_1})*out_{o_1}(1-out_{o_1})*out_{h_1} \end{aligned}$$
For convenience, let $\delta_{o_1}$ denote the error of the output layer:
$$\begin{aligned} \delta_{o_1}&=\frac{\partial E_{total}}{\partial{out_{o_1}}}*\frac{\partial{out_{o_1}}}{\partial net_{o_1}}\\ &=-(target_{o_1}-out_{o_1})*out_{o_1}(1-out_{o_1}) \end{aligned}$$
The partial derivative of the total error $E_{total}$ with respect to $w_5$ can therefore be rewritten as:
$$\frac{\partial E_{total}}{\partial w_5}=\delta_{o_1}*out_{h_1}$$
Note: if the output-layer error is instead defined with the opposite sign, i.e. $\delta_{o_1}=(target_{o_1}-out_{o_1})*out_{o_1}(1-out_{o_1})$, the result above is written as:
$$\frac{\partial E_{total}}{\partial w_5}=-\delta_{o_1}*out_{h_1}$$
Finally, update the weights:
In this example, again taking the update of $w_5$ as the illustration:
$$\begin{aligned} w_5^+&=w_5-\eta*\frac{\partial E_{total}}{\partial w_5}\\ &=0.4-0.5*0.08216\\ &=0.35892 \end{aligned}$$
Note: here $\eta$ is the learning rate, assumed to be $\eta=0.5$.
Likewise, the updated values of $w_6, w_7, w_8$ are:
$$\begin{aligned} & w_6^+=0.40866\\ & w_7^+=0.51130\\ & w_8^+=0.56137 \end{aligned}$$
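A minimal sketch of these four updates, assuming the same learning rate $\eta=0.5$ and forming all output-layer gradients at once with an outer product (names such as delta_o and W_ho are illustrative):

```python
import numpy as np

eta    = 0.5                                 # learning rate assumed in the example
target = np.array([0.01, 0.99])
out_o  = np.array([0.751365, 0.772928])      # forward-pass outputs
out_h  = np.array([0.593269, 0.596884])      # hidden-layer outputs
W_ho   = np.array([[0.40, 0.45],             # w_5, w_6
                   [0.50, 0.55]])            # w_7, w_8

# Output-layer errors delta_o, one per output neuron
delta_o = -(target - out_o) * out_o * (1 - out_o)   # approx [0.13850, -0.03810]

# dE_total/dW_ho is the outer product of delta_o and the hidden activations
W_ho_new = W_ho - eta * np.outer(delta_o, out_h)
print(W_ho_new)   # approx [[0.35892, 0.40866], [0.51130, 0.56137]]
```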
3. Hidden layer → input layer weight updates:
Core principle: the hidden-to-input weight updates are computed in much the same way as the output-to-hidden updates above. The difference is that when we differentiated the total error with respect to $w_5$, the path was $out_{o_1} → net_{o_1} → w_5$, whereas for the hidden-layer weights the path is $out_{h_1} → net_{h_1} → w_1$; and $out_{h_1}$ receives error contributions from both $E_{o_1}$ and $E_{o_2}$, so here $E_{out_{h_1}}$ is the sum of the two. A diagram of the process of computing the updated weight $w_1$ between hidden neuron $h_1$ and input $i_1$ is shown below:
Computation:
By the chain rule, the partial derivative of the total error with respect to $w_1$ through hidden unit $h_1$ is
$$\frac{\partial E_{total}}{\partial w_1}=\frac{\partial E_{total}}{\partial{out_{h_1}}}*\frac{\partial{out_{h_1}}}{\partial net_{h_1}}*\frac{{\partial{net_{h_1}}}}{\partial w_1},\qquad \text{where}\quad \frac{\partial E_{total}}{\partial{out_{h_1}}}=\frac{\partial E_{o_1}}{\partial{out_{h_1}}}+\frac{\partial E_{o_2}}{\partial{out_{h_1}}}$$
(1) Compute $\frac{\partial E_{total}}{\partial{out_{h_1}}}$:
We compute $\frac{\partial E_{o_1}}{\partial{out_{h_1}}}$ and $\frac{\partial E_{o_2}}{\partial{out_{h_1}}}$ separately, using
$$\frac{\partial E_{o_1}}{\partial{out_{h_1}}}=\frac{\partial E_{o_1}}{\partial{out_{o_1}}}*\frac{\partial{out_{o_1}}}{\partial{net_{o_1}}}*\frac{\partial net_{o_1}}{\partial{out_{h_1}}}$$
where:
- First factor: since
$$E_{o_1}=\frac{1}{2}(target_{o_1}-out_{o_1})^2$$
we have
$$\begin{aligned} \frac{\partial E_{o_1}}{\partial out_{o_1}}&=-(target_{o_1}-out_{o_1})\\ &=-(0.01-0.75136)\\ &=0.74136 \end{aligned}$$
- Second factor: since
$$out_{o_1}=\frac{1}{1+e^{-net_{o_1}}}$$
we have
$$\begin{aligned} \frac{\partial out_{o_1}}{\partial net_{o_1}}&=out_{o_1}(1-out_{o_1})\\ &=0.75136*(1-0.75136)\\ &=0.1868 \end{aligned}$$
- Third factor: since
$$net_{o_1}=w_5*out_{h_1}+w_6*out_{h_2}+b_2*1$$
we have
$$\frac{\partial net_{o_1}}{\partial out_{h_1}}=w_5=0.40$$
- Finally:
$$\begin{aligned} \frac{\partial E_{o_1}}{\partial{out_{h_1}}} &=\frac{\partial E_{o_1}}{\partial{out_{o_1}}}*\frac{\partial{out_{o_1}}}{\partial{net_{o_1}}}*\frac{\partial net_{o_1}}{\partial{out_{h_1}}}\\ &=0.74136*0.1868*0.40\\&=0.055399\end{aligned}$$
Likewise, $\frac{\partial E_{o_2}}{\partial{out_{h_1}}}$ works out to
$$\frac{\partial E_{o_2}}{\partial{out_{h_1}}}=-0.019049$$
Finally, adding the two contributions gives the total:
$$\begin{aligned} \frac{\partial E_{total}}{\partial{out_{h_1}}}&=\frac{\partial E_{o_1}}{\partial{out_{h_1}}}+\frac{\partial E_{o_2}}{\partial{out_{h_1}}}\\ &=0.055399+(-0.019049)\\ &=0.03635 \end{aligned}$$
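This summation of contributions from both output neurons is exactly where the hidden-layer case differs from the output-layer case; a short check in plain Python (the $\delta$ values are those implied by the calculations above, and the variable names are mine):

```python
# Output-layer errors delta_o1, delta_o2 and the weights leaving h_1
delta_o1, delta_o2 = 0.138498, -0.038098   # delta_o = -(target - out) * out * (1 - out)
w5, w7 = 0.40, 0.50                        # h_1 -> o_1 and h_1 -> o_2

# out_h1 receives error from both outputs, so the two contributions are summed
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7   # approx 0.055399 - 0.019049 = 0.036350
print(dE_dout_h1)
```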
(2) Compute $\frac{\partial{out_{h_1}}}{\partial{net_{h_1}}}$:
Since
(a)
$$\begin{aligned} net_{h_1}&=w_1*i_1+w_2*i_2+b_1*1\\ &=0.15*0.05+0.2*0.1+0.35*1\\ &=0.3775 \end{aligned}$$
(b)
$$out_{h_1}=\frac{1}{1+e^{-net_{h_1}}}=0.59327$$
it follows that
$$\begin{aligned} \frac{\partial out_{h_1}}{\partial net_{h_1}}&=out_{h_1}(1-out_{h_1})\\ &=0.59327*(1-0.59327)\\ &=0.2413 \end{aligned}$$
(3) Compute $\frac{\partial{net_{h_1}}}{\partial w_1}$:
Since
$$net_{h_1}=w_1*i_1+w_2*i_2+b_1*1$$
we have
$$\frac{\partial{net_{h_1}}}{\partial w_1}=i_1=0.05$$
Therefore,
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_1}&=\frac{\partial E_{total}}{\partial{out_{h_1}}}*\frac{\partial{out_{h_1}}}{\partial net_{h_1}}*\frac{{\partial{net_{h_1}}}}{\partial w_1}\\ &=0.03635*0.2413*0.05\\ &=0.000438 \end{aligned}$$
Note: to simplify the formula, $\delta_{h_1}$ can be used to denote the error of hidden unit $h_1$:
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_1}&=\bigg(\sum_o\frac{\partial E_{total}}{\partial{out_{o}}}*\frac{\partial out_{o}}{\partial net_{o}}*\frac{\partial net_{o}}{\partial out_{h_1}}\bigg)*\frac{\partial{out_{h_1}}}{\partial net_{h_1}}*\frac{{\partial{net_{h_1}}}}{\partial w_1}\\ &=\Big(\sum_o\delta_{o}*w_{ho}\Big)*out_{h_1}(1-out_{h_1})*i_1\\ &=\delta_{h_1}*i_1 \end{aligned}$$
Finally, the weight $w_1$ is updated as:
$$\begin{aligned} w_1^+&= w_1-\eta*\frac{\partial E_{total}}{\partial w_1}\\ &=0.15-0.5*0.000438\\ &=0.14978 \end{aligned}$$
Likewise, the updated values of $w_2, w_3, w_4$ are:
$$\begin{aligned} w_2^+&= 0.19956\\ w_3^+&= 0.24975\\ w_4^+&= 0.29950 \end{aligned}$$
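Putting the hidden-layer step together, a minimal NumPy sketch of the $w_1$–$w_4$ updates under the same assumptions as the earlier snippets (all names are illustrative):

```python
import numpy as np

eta     = 0.5
i       = np.array([0.05, 0.10])              # inputs i_1, i_2
out_h   = np.array([0.593269, 0.596884])      # hidden outputs
W_ih    = np.array([[0.15, 0.20],             # w_1, w_2
                    [0.25, 0.30]])            # w_3, w_4
W_ho    = np.array([[0.40, 0.45],             # w_5, w_6
                    [0.50, 0.55]])            # w_7, w_8
delta_o = np.array([0.138498, -0.038098])     # output-layer errors from Step 2

# Backpropagate the output-layer errors through W_ho to the hidden layer
dE_dout_h = W_ho.T @ delta_o                  # approx [0.036350, 0.041370]
delta_h   = dE_dout_h * out_h * (1 - out_h)   # hidden-layer errors delta_h1, delta_h2

# Gradient w.r.t. the input->hidden weights, then the gradient-descent update
W_ih_new = W_ih - eta * np.outer(delta_h, i)
print(W_ih_new)   # approx [[0.14978, 0.19956], [0.24975, 0.29950]]
```

Re-running the forward pass with the updated weight matrices and repeating the two steps iteratively is what gradually shrinks the total error, which is the goal stated at the start of the example.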