Deep Learning Fundamentals (2): The Backpropagation Algorithm

We know that neural networks have forward propagation, so it is natural to ask whether there is also backpropagation; it turns out there is.

Forward propagation is just a single evaluation of the (initialized) network, but such a network has not yet been trained; the Backpropagation Algorithm is what we use to train it.

[Figure: Network331.png, the example neural network]

Suppose we have a training set of m examples:

\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}

The cost functions are defined as follows.

For a single training example (x, y):

\begin{align}J(W,b; x,y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2.\end{align}

For the whole network, averaged over all m examples and with a weight-decay (regularization) term:

\begin{align}J(W,b)&= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right]                       + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \\&= \left[ \frac{1}{m} \sum_{i=1}^m \left( \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right]                       + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2\end{align}
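To make this concrete, here is a minimal NumPy sketch (not part of the original post) that evaluates J(W,b) for a small two-layer network; the sigmoid activation and the names W1, b1, W2, b2, lam are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2, X, Y, lam):
    """J(W,b): mean squared error over the m examples plus L2 weight decay.

    X is (n_in, m), Y is (n_out, m); b1, b2 are column vectors so they broadcast."""
    m = X.shape[1]
    A2 = sigmoid(W1 @ X + b1)                     # hidden-layer activations
    H = sigmoid(W2 @ A2 + b2)                     # network outputs h_{W,b}(x)
    data_term = 0.5 * np.sum((H - Y) ** 2) / m    # (1/m) * sum_i (1/2)||h(x_i) - y_i||^2
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # weight decay; biases excluded
    return data_term + decay
```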

We now minimize this cost with classic gradient descent:

\begin{align}W_{ij}^{(l)} &= W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) \\b_{i}^{(l)} &= b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(W,b)\end{align}

where

\begin{align}\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) &=\left[ \frac{1}{m} \sum_{i=1}^m \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b; x^{(i)}, y^{(i)}) \right] + \lambda W_{ij}^{(l)} \\\frac{\partial}{\partial b_{i}^{(l)}} J(W,b) &=\frac{1}{m}\sum_{i=1}^m \frac{\partial}{\partial b_{i}^{(l)}} J(W,b; x^{(i)}, y^{(i)})\end{align}
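In code, one such update is straightforward; the sketch below assumes grad_W[l] and grad_b[l] already hold the averaged per-example gradients computed by backpropagation (the variable names are placeholders, not from the original post).

```python
def gradient_descent_step(W, b, grad_W, grad_b, alpha, lam):
    """One update of all layers; grad_W[l], grad_b[l] are the averaged
    per-example gradients (1/m) * sum_i dJ(W,b; x_i, y_i)/d(.)."""
    for l in range(len(W)):
        W[l] -= alpha * (grad_W[l] + lam * W[l])   # weight-decay term applies only to W
        b[l] -= alpha * grad_b[l]
    return W, b
```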

We introduce an error term \delta^{(l)}_i that measures how much unit i in layer l contributes to the error of the network's output.

Recall from earlier that

\textstyle z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i  

a^{(l)}_i = f(z^{(l)}_i)
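A minimal NumPy sketch of this forward pass (an illustration, assuming sigmoid activations and lists Ws, bs of per-layer weights and biases):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Ws, bs, x):
    """Compute z^{(l)} and a^{(l)} = f(z^{(l)}) layer by layer for one input x."""
    a, zs, activations = x, [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b          # z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}
        a = sigmoid(z)         # a^{(l+1)} = f(z^{(l+1)})
        zs.append(z)
        activations.append(a)
    return zs, activations
```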



Backpropagation algorithm:

  1. Perform a feedforward pass, computing the activations for layers L_2, L_3, and so on up to the output layer L_{n_l}.
  2. For each output unit i in layer n_l (the output layer), set
    \begin{align}\delta^{(n_l)}_i= \frac{\partial}{\partial z^{(n_l)}_i} \;\;        \frac{1}{2} \left\|y - h_{W,b}(x)\right\|^2 = - (y_i - a^{(n_l)}_i) \cdot f'(z^{(n_l)}_i)\end{align}
  3. For l = n_l-1, n_l-2, n_l-3, \ldots, 2
    For each node  i in layer  l, set
    \delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) f'(z^{(l)}_i)
  4. Compute the desired partial derivatives, which are given as:
    \begin{align}\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b; x, y) &= a^{(l)}_j \delta_i^{(l+1)} \\\frac{\partial}{\partial b_{i}^{(l)}} J(W,b; x, y) &= \delta_i^{(l+1)}.\end{align}

The algorithm can then be written:

  1. Perform a feedforward pass, computing the activations for layers \textstyle L_2, L_3, up to the output layer \textstyle L_{n_l}, using the equations defining the forward propagation steps
  2. For the output layer (layer \textstyle n_l), set
    \begin{align}\delta^{(n_l)}= - (y - a^{(n_l)}) \bullet f'(z^{(n_l)})\end{align}
  3. For \textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2
    Set
    \begin{align}                 \delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)})                 \end{align}
  4. Compute the desired partial derivatives:
    \begin{align}\nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T, \\\nabla_{b^{(l)}} J(W,b;x,y) &= \delta^{(l+1)}.\end{align}
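These four vectorized steps translate almost line for line into NumPy. The sketch below is one possible implementation for a fully connected network with sigmoid activations on every layer (an assumption; the algorithm itself leaves f generic), returning the per-example gradients for a single training pair (x, y).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(Ws, bs, x, y):
    """Return dJ/dW^{(l)}, dJ/db^{(l)} for one example (x, y),
    with J = (1/2)||h_{W,b}(x) - y||^2, following steps 1-4 above."""
    # Step 1: feedforward pass, keeping every z^{(l)} and a^{(l)}
    a, zs, activations = x, [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    # Step 2: output-layer error  delta^{(n_l)} = -(y - a^{(n_l)}) . f'(z^{(n_l)})
    delta = -(y - activations[-1]) * sigmoid_prime(zs[-1])
    grads_W = [None] * len(Ws)
    grads_b = [None] * len(Ws)
    grads_W[-1] = np.outer(delta, activations[-2])   # delta^{(l+1)} (a^{(l)})^T
    grads_b[-1] = delta
    # Step 3: propagate the error backwards through the hidden layers
    for l in range(len(Ws) - 2, -1, -1):
        delta = (Ws[l + 1].T @ delta) * sigmoid_prime(zs[l])
        # Step 4: gradients for layer l
        grads_W[l] = np.outer(delta, activations[l])
        grads_b[l] = delta
    return grads_W, grads_b
```

In a full training loop these per-example gradients would be averaged over the m examples, the weight-decay term \lambda W^{(l)} added, and the result plugged into the gradient-descent update shown earlier.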


Another important part of the BP algorithm is the derivation of the error terms for the hidden layers, which we now work through in detail.

Assume the cost function is

E = \tfrac{1}{2}(t - y)^2,

where

t is the target output given by the training set,
y is the actual output of the network.

The output of each unit is:

o_{j} = \varphi(\mbox{net}_{j}) = \varphi\left(\sum_{k=1}^{n}w_{kj}x_k\right)

\varphi(z) = \frac{1}{1+e^{-z}}

We now want to compute

\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial\mathrm{net_j}} \frac{\partial \mathrm{net_j}}{\partial w_{ij}}

where

\frac{\partial \mathrm{net_j}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left(\sum_{k=1}^{n}w_{kj}x_k\right) = x_i

\frac{\partial o_j}{\partial\mathrm{net_j}} = \frac {\partial}{\partial \mathrm{net_j}}\varphi(\mathrm{net_j}) = \varphi(\mathrm{net_j})(1-\varphi(\mathrm{net_j}))

\frac{\partial E}{\partial o_j} = \frac{\partial E}{\partial y} = \frac{\partial}{\partial y} \frac{1}{2}(t - y)^2 = y - t

In particular, if j is a hidden unit whose output feeds the set L of units in the next layer, then

\frac{\partial E(o_j)}{\partial o_j} = \frac{\partial E(\mathrm{net}_u, \mathrm{net}_v, \dots, \mathrm{net}_w)}{\partial o_j}

\frac{\partial E}{\partial o_j} = \sum_{l \in L} \left(\frac{\partial E}{\partial \mathrm{net}_l}\frac{\partial \mathrm{net}_l}{\partial o_j}\right) = \sum_{l \in L} \left(\frac{\partial E}{\partial o_{l}}\frac{\partial o_{l}}{\partial \mathrm{net}_l}w_{jl}\right)

Combining the above, we obtain:

\dfrac{\partial E}{\partial w_{ij}} = \delta_{j} x_{i}
\delta_{j} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial\mathrm{net_j}} = \begin{cases}(o_{j}-t_{j})\varphi(\mbox{net}_{j})(1-\varphi(\mbox{net}_{j})) & \mbox{if } j \mbox{ is an output neuron,}\\(\sum_{l\in L} \delta_{l} w_{jl})\varphi(\mbox{net}_{j})(1-\varphi(\mbox{net}_{j}))  & \mbox{if } j \mbox{ is an inner neuron.}\end{cases}
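
As a quick numerical illustration of the two cases (with made-up values for net_j, t_j, and the outgoing weights; none of these numbers come from the original derivation):

```python
import numpy as np

def phi(z):                          # logistic activation
    return 1.0 / (1.0 + np.exp(-z))

# output neuron j: delta_j = (o_j - t_j) * phi(net_j) * (1 - phi(net_j))
net_j, t_j = 0.5, 1.0                # made-up values
o_j = phi(net_j)
delta_out = (o_j - t_j) * o_j * (1.0 - o_j)

# hidden neuron j: delta_j = (sum_{l in L} delta_l * w_{jl}) * phi(net_j) * (1 - phi(net_j))
deltas_next = np.array([delta_out])  # errors of the units in L that j feeds into
w_j_next = np.array([0.3])           # weights w_{jl} from j to those units
net_h = -0.2
o_h = phi(net_h)
delta_hidden = float(deltas_next @ w_j_next) * o_h * (1.0 - o_h)
```
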
As we can see, in the BP algorithm the error is accumulated and propagated from the output back toward the input. This means that if the error term of some intermediate layer is computed incorrectly, the final converged result will be affected, so it is best to perform a gradient check during the implementation: pick two points near each parameter and approximate the partial derivative with a finite difference; if the finite-difference result differs too much from the analytic partial derivative, the computation is very likely wrong.
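A minimal sketch of such a gradient check, assuming a callable cost(theta) that evaluates J at a flattened parameter vector and an analytic_grad produced by backpropagation (both names are placeholders introduced for the example):

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Central-difference approximation of dJ/dtheta_k for every parameter."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        grad[k] = (cost(theta + step) - cost(theta - step)) / (2.0 * eps)
    return grad

def gradient_check(cost, theta, analytic_grad, tol=1e-7):
    """Flag a likely bug when the analytic gradient disagrees with the
    finite-difference estimate; the tolerance is a rough heuristic."""
    num = numerical_gradient(cost, theta)
    diff = np.linalg.norm(num - analytic_grad) / np.linalg.norm(num + analytic_grad)
    return diff < tol
```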

References:

http://en.wikipedia.org/wiki/Backpropagation

http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
