1. The BP Neural Network
A BP (back-propagation) neural network is a multi-layer feed-forward network trained with the error back-propagation algorithm. It is one of the simplest and most widely used neural network models.
1.1 Structure of a BP Neural Network
- The structure of the network is shown in the figure below.
- Here $x_1, x_2$ are the inputs and $\varphi$ is the activation function. A BP neural network is a multi-layer feed-forward network trained by back-propagating the error; its training algorithm is called the BP algorithm. The basic idea is gradient descent: a gradient search is used to minimize the mean squared error between the network's actual outputs and the desired outputs. The mean squared error is
$$F = e_1^2 + e_2^2 = (d_1 - y_1)^2 + (d_2 - y_2)^2 \tag{1}$$
where $e_1, e_2$ are the errors, $d_1, d_2$ are the true labels of the sample, and $y_1, y_2$ are the predicted outputs.
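As a quick numerical check of the error formula above (all values below are illustrative, not taken from the article):

```python
# Mean squared error F = e1^2 + e2^2 = (d1 - y1)^2 + (d2 - y2)^2.
# d1, d2, y1, y2 are illustrative values, not from the article.
d1, d2 = 1.0, 0.0          # true labels of the sample
y1, y2 = 0.8, 0.3          # predicted outputs
e1, e2 = d1 - y1, d2 - y2  # per-output errors
F = e1 ** 2 + e2 ** 2
print(round(F, 2))         # 0.04 + 0.09 = 0.13
```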
1.2 Algorithm Flow of a BP Neural Network
- A BP neural network involves both forward propagation and back-propagation; its algorithm flow is shown in the figure below.
First, the model's inputs and outputs, i.e., the training set, are given. Next, the model's actual output is computed. Then the error is calculated: when the error meets the specified requirement (typically corresponding to an accuracy of 90%–95%), or when the number of iterations reaches the preset limit, training stops. Otherwise, the error gradient is computed, the weights and thresholds are updated, and the actual output is computed again.
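The loop described above can be sketched with a deliberately tiny one-weight model $y = w \cdot x$, so each step of the flow chart stays visible (the sample, learning rate, and stopping threshold are all illustrative assumptions):

```python
# Minimal sketch of the training loop from the flow chart, using a
# one-weight linear model y = w * x. All numbers are illustrative.
x, d = 2.0, 1.0            # training sample: input and desired output
w = 0.0                    # initial weight
eta = 0.05                 # learning rate
max_epochs, tol = 1000, 1e-6

for epoch in range(max_epochs):
    y = w * x              # 1. compute the actual output
    e = d - y              # 2. compute the error
    if e * e < tol:        # 3. stop once the error is small enough
        break
    grad = -2 * e * x      # 4. error gradient dF/dw for F = (d - y)^2
    w = w - eta * grad     # 5. update the weight, then loop back to 1

print(w)                   # converges toward d / x = 0.5
```

With these values the loop converges geometrically, so it stops long before `max_epochs` is reached.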
1.3 Forward Propagation
The input signal acts on the output nodes through the hidden layer; after a nonlinear transformation, the output signal is produced. If the actual output does not match the desired output, the process switches to error back-propagation. Taking the path through $y_1$ as an example,
the forward-propagation formula is
$$y_1^{(1)} = \varphi\bigl(x_1 \cdot w_{11}^{(1)} + x_2 \cdot w_{12}^{(1)}\bigr) \tag{2}$$
Letting $v_1^{(1)} = x_1 \cdot w_{11}^{(1)} + x_2 \cdot w_{12}^{(1)}$, the equation above becomes
$$y_1^{(1)} = \varphi\bigl(v_1^{(1)}\bigr) \tag{3}$$
The final result of forward propagation is
$$y_1 = \varphi\bigl(y_1^{(1)} \cdot w_{11}^{(2)} + y_2^{(1)} \cdot w_{12}^{(2)}\bigr) \tag{4}$$
and the final error is
$$e_1^2 = \bigl[d_1 - \varphi\bigl(y_1^{(1)} \cdot w_{11}^{(2)} + y_2^{(1)} \cdot w_{12}^{(2)}\bigr)\bigr]^2 \tag{5}$$
Letting $v_1^{(2)} = y_1^{(1)} \cdot w_{11}^{(2)} + y_2^{(1)} \cdot w_{12}^{(2)}$, we get
$$e_1^2 = \bigl[d_1 - \varphi\bigl(v_1^{(2)}\bigr)\bigr]^2 \tag{6}$$
This completes forward propagation.
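Equations (2)–(6) can be checked numerically. The article leaves $\varphi$ unspecified, so a sigmoid is assumed here, and all inputs and weights are illustrative values:

```python
import math

def phi(v):
    # Sigmoid activation; an assumption, since the article leaves phi unspecified.
    return 1.0 / (1.0 + math.exp(-v))

# Illustrative inputs and weights (not from the article).
x1, x2 = 1.0, 0.5
w11_1, w12_1 = 0.4, -0.2     # w^{(1)} weights feeding hidden unit 1
w21_1, w22_1 = 0.3, 0.6      # w^{(1)} weights feeding hidden unit 2
w11_2, w12_2 = 0.7, -0.5     # w^{(2)} weights feeding output y1

# Hidden layer: v^{(1)}, then y^{(1)} = phi(v^{(1)})  (Eqs. 2-3)
v1_1 = x1 * w11_1 + x2 * w12_1
v2_1 = x1 * w21_1 + x2 * w22_1
y1_1, y2_1 = phi(v1_1), phi(v2_1)

# Output layer: v^{(2)}, then y1 = phi(v^{(2)})  (Eq. 4)
v1_2 = y1_1 * w11_2 + y2_1 * w12_2
y1 = phi(v1_2)
print(y1)
```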
1.4 Back-Propagation
The output error is propagated back through the hidden layer toward the input layer, layer by layer, and apportioned to all units in each layer; the error signal obtained at each layer serves as the basis for adjusting that layer's weights. By adjusting the connection strengths between the input nodes and the hidden nodes, the connection strengths between the hidden nodes and the output nodes, and the thresholds, the error is driven down along the gradient direction. After repeated training, the network parameters (weights and thresholds) corresponding to the minimum error are found and training stops. Updating a weight requires its partial derivative:
$$\frac{\partial F}{\partial w_{11}^{(2)}} = \frac{\partial e_1^2}{\partial w_{11}^{(2)}} + \frac{\partial e_2^2}{\partial w_{11}^{(2)}} \tag{7}$$
Taking $\frac{\partial e_1^2}{\partial w_{11}^{(2)}}$ as an example, substituting $e_1^2$ gives
$$\begin{align}
\frac{\partial e_1^2}{\partial w_{11}^{(2)}} &= -2\bigl[d_1 - \varphi\bigl(y_1^{(1)} w_{11}^{(2)} + y_2^{(1)} w_{12}^{(2)}\bigr)\bigr]\,\varphi'\bigl(y_1^{(1)} w_{11}^{(2)} + y_2^{(1)} w_{12}^{(2)}\bigr)\, y_1^{(1)} \tag{8}\\
&= -2\, e_1\, \varphi'\bigl(v_1^{(2)}\bigr)\, y_1^{(1)} \tag{9}
\end{align}$$
Similarly,
$$\begin{align}
\frac{\partial e_1^2}{\partial w_{12}^{(2)}} &= -2\, e_1\, \varphi'\bigl(v_1^{(2)}\bigr)\, y_2^{(1)} \tag{10}\\
\frac{\partial e_2^2}{\partial w_{21}^{(2)}} &= -2\, e_2\, \varphi'\bigl(v_2^{(2)}\bigr)\, y_1^{(1)} \tag{11}\\
\frac{\partial e_2^2}{\partial w_{22}^{(2)}} &= -2\, e_2\, \varphi'\bigl(v_2^{(2)}\bigr)\, y_2^{(1)} \tag{12}
\end{align}$$
The weights can then be updated by gradient descent:
$$w_{\text{new}} = w_{\text{old}} - \eta \cdot \frac{\partial F}{\partial w} \tag{13}$$
where $\eta$ is the learning rate. In matrix form, the final result is
$$\begin{bmatrix} w_{11}^{(2)}(k+1) & w_{12}^{(2)}(k+1)\\ w_{21}^{(2)}(k+1) & w_{22}^{(2)}(k+1) \end{bmatrix} = \begin{bmatrix} w_{11}^{(2)}(k) & w_{12}^{(2)}(k)\\ w_{21}^{(2)}(k) & w_{22}^{(2)}(k) \end{bmatrix} + 2\eta \begin{bmatrix} e_1\, \varphi'\bigl(v_1^{(2)}\bigr)\\ e_2\, \varphi'\bigl(v_2^{(2)}\bigr) \end{bmatrix} \begin{bmatrix} y_1^{(1)} & y_2^{(1)} \end{bmatrix} \tag{14}$$
This completes the update of the parameters $w^{(2)}$. Next, $w^{(1)}$ is updated, which uses the chain rule:
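The output-layer gradient and update can be sketched in code. The sigmoid activation and all numeric values below are illustrative assumptions; the analytic gradient from the derivation above is cross-checked against a central finite difference:

```python
import math

def phi(v):
    # Sigmoid activation; an assumption, since the article leaves phi unspecified.
    return 1.0 / (1.0 + math.exp(-v))

def dphi(v):
    s = phi(v)
    return s * (1.0 - s)         # derivative of the sigmoid

# Illustrative values (not from the article): hidden outputs, label, weights.
y1_1, y2_1 = 0.6, 0.4            # hidden-layer outputs y^{(1)}
d1 = 1.0                         # desired output for y1
w11_2, w12_2 = 0.7, -0.5         # output-layer weights feeding y1
eta = 0.1                        # learning rate

v1_2 = y1_1 * w11_2 + y2_1 * w12_2
e1 = d1 - phi(v1_2)

# Analytic gradient from the derivation: d(e1^2)/dw11^(2) = -2 e1 phi'(v) y1^(1)
grad = -2 * e1 * dphi(v1_2) * y1_1

# Central finite-difference check of the same derivative
h = 1e-6
e_plus  = (d1 - phi(y1_1 * (w11_2 + h) + y2_1 * w12_2)) ** 2
e_minus = (d1 - phi(y1_1 * (w11_2 - h) + y2_1 * w12_2)) ** 2
num_grad = (e_plus - e_minus) / (2 * h)

# Gradient-descent step: w(new) = w(old) - eta * dF/dw
w11_2_new = w11_2 - eta * grad
```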
$$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial v} \cdot \frac{\partial v}{\partial x} \tag{15}$$
from which the weights of the previous layer can finally be computed. The derivation is not shown here.
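As a sketch of the omitted step (my own expansion via the chain rule above, not part of the original article), the gradient of $e_1^2$ with respect to a first-layer weight expands along the path $w_{11}^{(1)} \to v_1^{(1)} \to y_1^{(1)} \to v_1^{(2)} \to e_1^2$:

```latex
\[
\frac{\partial e_1^2}{\partial w_{11}^{(1)}}
  = \frac{\partial e_1^2}{\partial v_1^{(2)}}
    \cdot \frac{\partial v_1^{(2)}}{\partial y_1^{(1)}}
    \cdot \frac{\partial y_1^{(1)}}{\partial v_1^{(1)}}
    \cdot \frac{\partial v_1^{(1)}}{\partial w_{11}^{(1)}}
  = -2\, e_1\, \varphi'\bigl(v_1^{(2)}\bigr)\, w_{11}^{(2)}\,
     \varphi'\bigl(v_1^{(1)}\bigr)\, x_1
\]
```

Each factor comes directly from the definitions used earlier: $\partial e_1^2 / \partial v_1^{(2)} = -2 e_1 \varphi'(v_1^{(2)})$ from Eq. (6), $\partial v_1^{(2)} / \partial y_1^{(1)} = w_{11}^{(2)}$, $\partial y_1^{(1)} / \partial v_1^{(1)} = \varphi'(v_1^{(1)})$ from Eq. (3), and $\partial v_1^{(1)} / \partial w_{11}^{(1)} = x_1$.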