The Forward and Backward Propagation Algorithms

This article contains my homework notes from studying *DEEP LEARNING* through the 深度之眼 course.

The simplified neural network structure used in the derivation is shown in Figure 1.

Figure 1: Simplified neural network diagram

Assume X is an N*M matrix (i.e., the input contains N samples, each with M features).

The dimension of h_{1} and Z_{1} is m_{1}\rightarrow \omega _{1} is an M*m_{1} matrix, b_{1}\in R^{m_{1}}.

Note: there are generally two ways to handle the bias unit in a neural network. In the first, the bias itself has value 1 but the weights connecting it to the neurons differ, i.e. the whole network has a single bias with multiple distinct weights (the number of weights equals the number of neurons in the hidden and output layers). In the second, all weights connecting the bias are 1 but the bias values themselves differ, i.e. there are multiple biases, all connected with weight 1 (the number of biases equals the number of neurons in the hidden and output layers). This article uses the second convention, i.e. network parameters in (\omega ,b) form.

The dimension of h_{2} and Z_{2} is m_{2}\rightarrow \omega _{2} is an m_{1}*m_{2} matrix, b_{2}\in R^{m_{2}}.

……

The dimension of h_{L} and Z_{L} is m_{L}\rightarrow \omega _{L} is an m_{L-1}*m_{L} matrix, b_{L}\in R^{m_{L}}.
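As a quick sanity check on these shapes, here is a minimal NumPy sketch that allocates parameters in the (\omega ,b) form described above; the toy sizes N, M and the layer widths are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 4, 5            # assumed toy sizes: N samples, M input features
dims = [M, 3, 2, 2]    # assumed widths: m1=3, m2=2, plus an n=2 output layer

# One (omega, b) pair per layer: omega_l is m_{l-1}*m_l, b_l lives in R^{m_l}
omegas = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(dims) - 1)]
biases = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]

for i, (w, b) in enumerate(zip(omegas, biases), start=1):
    print(f"omega_{i}: {w.shape}, b_{i}: {b.shape}")
```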

a. Deriving the forward propagation algorithm

h_{1}=X\omega _{1}+\tilde{b}_{1},Z_{1}=f_{1}(h_{1}), where \tilde{b}_{1} is b_{1}^{T} replicated N times along the row direction, and f(\cdot ) is a nonlinear transformation.

h_{2}=Z_{1}\omega _{2}+\tilde{b}_{2},Z_{2}=f_{2}(h_{2})

……

h_{L}=Z_{L-1}\omega _{L}+\tilde{b}_{L},Z_{L}=f_{L}(h_{L})

out=Z_{L}\omega _{L+1}+\tilde{b}_{L+1}

Assume the output is n-dimensional; then out is an N*n matrix. J and \frac{\partial J}{\partial out} are computed according to the MSE or cross-entropy (CE) criterion.
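Putting the recurrence together, here is a minimal NumPy sketch of the forward pass, reusing the omegas/biases lists from the sketch above; the function names and the choice of MSE as the criterion are assumptions for illustration, and NumPy broadcasting of b over the N rows plays the role of \tilde{b}.

```python
import numpy as np

def forward(X, omegas, biases, activations):
    """Forward pass: h_l = Z_{l-1} @ omega_l + b_l, Z_l = f_l(h_l).
    activations holds one f_l per layer; None marks the final linear layer."""
    Z = X
    for w, b, f in zip(omegas, biases, activations):
        h = Z @ w + b                      # broadcasting b adds it to every row, like b-tilde
        Z = f(h) if f is not None else h   # no nonlinearity on the output layer
    return Z                               # the N*n matrix `out`

def mse(out, y):
    """J under the MSE criterion and its gradient dJ/dout."""
    J = 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))
    dout = (out - y) / out.shape[0]
    return J, dout
```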

b. Deriving the gradient of the loss with respect to the linear layer

out=Z_{L}\omega _{L+1}+\tilde{b}_{L+1}

As a concrete example, take N=2 samples, m_{L}=3 and n=2:

Z_{L}=\begin{pmatrix} z_{11} &z_{12} &z_{13} \\ z_{21} &z_{22} &z_{23} \end{pmatrix},W_{L+1}=\begin{pmatrix} w_{11} &w_{12}\\ w_{21}&w_{22}\\ w_{31}&w_{32} \end{pmatrix},\tilde{b}_{L+1}=\begin{pmatrix} b_{1} &b_{2} \\ b_{1} &b_{2} \end{pmatrix},out=\begin{pmatrix} O_{11} &O_{12} \\ O_{21}& O_{22} \end{pmatrix}

Z_{L}\cdot W_{L+1}=\begin{pmatrix} z_{11}w_{11} +z_{12}w_{21}+z_{13}w_{31}&z_{11}w_{12}+z_{12}w_{22}+z_{13}w_{32} \\ z_{21}w_{11} +z_{22}w_{21}+z_{23}w_{31} & z_{21}w_{12}+z_{22}w_{22}+z_{23}w_{32} \end{pmatrix}

O_{11}=z_{11}w_{11} +z_{12}w_{21}+z_{13}w_{31}+b_{1}

O_{12}=z_{11}w_{12}+z_{12}w_{22}+z_{13}w_{32}+b_{2}

O_{21}=z_{21}w_{11} +z_{22}w_{21}+z_{23}w_{31} +b_{1}

O_{22}=z_{21}w_{12}+z_{22}w_{22}+z_{23}w_{32}+b_{2}

\frac{\partial J}{\partial w_{11}}=\frac{\partial J}{\partial O_{11}}z_{11}+\frac{\partial J}{\partial O_{21}}z_{21},\frac{\partial J}{\partial w_{12}}=\frac{\partial J}{\partial O_{12}}z_{11}+\frac{\partial J}{\partial O_{22}}z_{21}

\frac{\partial J}{\partial w_{21}}=\frac{\partial J}{\partial O_{11}}z_{12}+\frac{\partial J}{\partial O_{21}}z_{22},\frac{\partial J}{\partial w_{22}}=\frac{\partial J}{\partial O_{12}}z_{12}+\frac{\partial J}{\partial O_{22}}z_{22}

\frac{\partial J}{\partial w_{31}}=\frac{\partial J}{\partial O_{11}}z_{13}+\frac{\partial J}{\partial O_{21}}z_{23},\frac{\partial J}{\partial w_{32}}=\frac{\partial J}{\partial O_{12}}z_{13}+\frac{\partial J}{\partial O_{22}}z_{23}

\begin{pmatrix} \frac{\partial J}{\partial w_{11}} & \frac{\partial J}{\partial w_{12}} \\ \frac{\partial J}{\partial w_{21}} & \frac{\partial J}{\partial w_{22}} \\ \frac{\partial J}{\partial w_{31}} & \frac{\partial J}{\partial w_{32}} \end{pmatrix}=\begin{pmatrix} z_{11} &z_{21} \\ z_{12} & z_{22} \\ z_{13} & z_{23} \end{pmatrix}\begin{pmatrix} \frac{\partial J}{\partial O_{11}} &\frac{\partial J}{\partial O_{12}} \\ \frac{\partial J}{\partial O_{21}}& \frac{\partial J}{\partial O_{22}} \end{pmatrix}

From the above derivation it is easy to see that \frac{\partial J}{\partial W_{L+1}}=Z_{L}^{T}\frac{\partial J}{\partial out}

\left\{\begin{matrix} \frac{\partial J}{\partial b_{1}}=\frac{\partial J}{\partial O_{11}}+\frac{\partial J}{\partial O_{21}}\\ \frac{\partial J}{\partial b_{2}}=\frac{\partial J}{\partial O_{12}}+\frac{\partial J}{\partial O_{22}} \end{matrix}\right.\Rightarrow \begin{pmatrix} \frac{\partial J}{\partial b} \end{pmatrix}^{T}=\begin{pmatrix} \frac{\partial J}{\partial b_{1}} & \frac{\partial J}{\partial b_{2}} \end{pmatrix}=\begin{pmatrix} \frac{\partial J}{\partial O_{11}}+\frac{\partial J}{\partial O_{21}} & \frac{\partial J}{\partial O_{12}}+\frac{\partial J}{\partial O_{22}} \end{pmatrix}, i.e. the rows of \frac{\partial J}{\partial out} are summed up (one term per sample).

Figure 2: Backpropagation diagram

As the backpropagation diagram in Figure 2 shows, computing the parameter gradients requires \frac{\partial J}{\partial h_{L}}, and solving for \frac{\partial J}{\partial h_{L}} in turn requires \frac{\partial J}{\partial Z_{L}}:

\frac{\partial J}{\partial Z_{L}}=\frac{\partial J}{\partial out}W_{L+1}^{T}
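These three results translate directly into code. A minimal sketch (the function name is an assumption), taking `dout`, `Z_L`, and `W` as NumPy arrays:

```python
def linear_backward(dout, Z_L, W):
    """Backward pass of out = Z_L @ W + b_tilde, matching the derivation above:
    dJ/dW = Z_L^T @ dJ/dout, dJ/db sums the rows of dJ/dout (one per sample),
    and dJ/dZ_L = dJ/dout @ W^T."""
    dW = Z_L.T @ dout
    db = dout.sum(axis=0)
    dZ = dout @ W.T
    return dW, db, dZ
```

Note that the shapes come out right automatically: dW is m_{L}*n like W, db has n entries like b, and dZ is N*m_{L} like Z_{L}.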

c. Deriving the gradient of the loss with respect to the nonlinear layer

The nonlinear layer f_{L} is usually one of three functions: the sigmoid function, the tanh function, or the ReLU function. (Element-wise implementations of all three gradients are sketched after this list.)

  1. When f_{L} is the sigmoid function,

            Z_{L}=\frac{1}{1+e^{-h_{L}}}

            \frac{\partial J}{\partial h_{L}}=\frac{\partial J}{\partial Z_{L}}\frac{dZ_{L}}{dh_{L}}=\frac{\partial J}{\partial Z_{L}}\frac{e^{-h_{L}}}{(1+e^{-h_{L}})^{2}}=\frac{\partial J}{\partial Z_{L}}\frac{1}{1+e^{-h_{L}}}\frac{e^{-h_{L}}}{1+e^{-h_{L}}}=\frac{\partial J}{\partial Z_{L}}Z_{L}(1-Z_{L})

  2. When f_{L} is the tanh function,

              Z_{L}=\frac{e^{h_{L}}-e^{-h_{L}}}{e^{h_{L}}+e^{-h_{L}}}

               \frac{\partial J}{\partial h_{L}}=\frac{\partial J}{\partial Z_{L}}\frac{dZ_{L}}{dh_{L}}=\frac{\partial J}{\partial Z_{L}}\frac{4}{(e^{h_{L}}+e^{-h_{L}})^{2}}=\frac{\partial J}{\partial Z_{L}}[1-(\frac{e^{h_{L}}-e^{-h_{L}}}{e^{h_{L}}+e^{-h_{L}}})^{2}]=\frac{\partial J}{\partial Z_{L}}[1-Z_{L}^{2}]    

  3. When f_{L} is the ReLU function,

            Z_{L}=relu(h_{L})=\left\{\begin{matrix} 0, &h_{L}\leq 0 \\ h_{L},& h_{L}> 0 \end{matrix}\right.

               \frac{\partial J}{\partial h_{L}}=\frac{\partial J}{\partial Z_{L}}\frac{dZ_{L}}{dh_{L}}=\left\{\begin{matrix} 0, & h_{L}\leq 0\\ \frac{\partial J}{\partial Z_{L}},& h_{L}> 0 \end{matrix}\right.
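Each of these derivatives acts element-wise, so the code is one line per case. A minimal sketch (function names are assumptions), where `dZ` stands for \frac{\partial J}{\partial Z_{L}} and `Z`/`h` are the values cached during the forward pass:

```python
def sigmoid_backward(dZ, Z):
    # dJ/dh = dJ/dZ * Z * (1 - Z), elementwise, reusing the forward output Z
    return dZ * Z * (1.0 - Z)

def tanh_backward(dZ, Z):
    # dJ/dh = dJ/dZ * (1 - Z^2), elementwise
    return dZ * (1.0 - Z ** 2)

def relu_backward(dZ, h):
    # dJ/dh = dJ/dZ where h > 0, and 0 elsewhere
    return dZ * (h > 0)
```

Note that sigmoid and tanh only need the cached output Z, while ReLU needs the pre-activation h (or equivalently a mask of where it was positive).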

 
