# Backpropagation Derivation for BP Networks

COURSERA: derivation of backpropagation for multi-layer neural networks

### Derivation of Backpropagation for Multi-Layer Neural Networks

A BP (backpropagation) network is essentially a composition of multiple logistic models; for the gradient-descent backpropagation of a single logistic model, see the previous post.

$a^l = \sigma(z^l) = \sigma(W^la^{l-1} + b^l)$
$a^l$ is the output of layer $l$, a column vector; $W^l$ is the weight matrix in front of that layer; $b^l$ is the layer's bias, a column vector.
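As a minimal sketch, the forward pass $a^l = \sigma(W^l a^{l-1} + b^l)$ can be written with NumPy; the layer sizes (3 inputs, 4 hidden units, 2 outputs) and random weights below are illustrative assumptions, not from the original post:

```python
import numpy as np

def sigmoid(z):
    # logistic activation: sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical 2-layer network: 3 inputs -> 4 hidden -> 2 outputs
# index 0 is unused so that W[l], b[l] match the math's 1-based layer index
rng = np.random.default_rng(0)
W = [None, rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
b = [None, np.zeros((4, 1)), np.zeros((2, 1))]

def forward(x):
    # a^l = sigma(W^l a^{l-1} + b^l); cache every z^l and a^l for backprop
    a, zs, activations = x, [], [x]
    for l in (1, 2):
        z = W[l] @ a + b[l]
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    return zs, activations
```

Caching `zs` and `activations` during the forward pass is what later makes the backward-pass formulas directly computable.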

#### Loss Function

$J(W,b,x,y) = \frac{1}{2}||a^L-y||_2^2$

#### Gradient of the Output Layer $L$

$a^L = \sigma(z^L) = \sigma(W^La^{L-1} + b^L)$

$J(W,b,x,y) = \frac{1}{2}||a^L-y||_2^2 = \frac{1}{2}|| \sigma(W^La^{L-1} + b^L)-y||_2^2$

$\frac{\partial J(W,b,x,y)}{\partial W^L} =\frac{\partial J(W,b,x,y)}{\partial z^L}\frac{\partial z^L}{\partial W^L} = [(a^L-y) \odot \sigma^{'}(z^L)](a^{L-1})^T$

$\frac{\partial J(W,b,x,y)}{\partial b^L} =(a^L-y)\odot \sigma^{'}(z^L)$
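The two output-layer gradients above translate directly to code. A small NumPy sketch with made-up toy values for $z^L$, $a^{L-1}$, and $y$ (the numbers are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the logistic function: sigma'(z) = sigma(z)(1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# toy output-layer quantities (all column vectors, as in the derivation)
zL = np.array([[0.5], [-0.2]])            # z^L
aL = sigmoid(zL)                          # a^L = sigma(z^L)
a_prev = np.array([[1.0], [0.0], [2.0]])  # a^{L-1}
y = np.array([[1.0], [0.0]])              # target

delta_L = (aL - y) * sigmoid_prime(zL)    # (a^L - y) ⊙ sigma'(z^L)
dW_L = delta_L @ a_prev.T                 # dJ/dW^L = delta^L (a^{L-1})^T
db_L = delta_L                            # dJ/db^L = delta^L
```

Note how the outer product with $(a^{L-1})^T$ turns the column vector $\delta^L$ into a gradient with the same shape as $W^L$.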

#### Gradient of an Arbitrary Hidden Layer $l$

$\delta^L = \frac{\partial J(W,b,x,y)}{\partial z^L} = (a^L-y)\odot \sigma^{'}(z^L)$

$\delta^l =\frac{\partial J(W,b,x,y)}{\partial z^l} = \frac{\partial J(W,b,x,y)}{\partial z^L}(\frac{\partial z^L}{\partial z^{L-1}}\frac{\partial z^{L-1}}{\partial z^{L-2}}...\frac{\partial z^{l+1}}{\partial z^{l}})$

$\delta^{l} = \frac{\partial J(W,b,x,y)}{\partial z^l} = \frac{\partial J(W,b,x,y)}{\partial z^{l+1}} \frac{\partial z^{l+1}}{\partial z^{l}} =\delta^{l+1}\frac{\partial z^{l+1}}{\partial z^{l}}$

$\frac{\partial z^{l+1}}{\partial z^{l}} = W^{l+1}\operatorname{diag}(\sigma^{'}(z^l))$

$\delta^{l} = (\frac{\partial z^{l+1}}{\partial z^{l}})^T\frac{\partial J(W,b,x,y)}{\partial z^{l+1}} =(W^{l+1})^T\delta^{l+1}\odot \sigma^{'}(z^l)$

$\frac{\partial J(W,b,x,y)}{\partial W^l} = \frac{\partial J(W,b,x,y)}{\partial z^l}\frac{\partial z^l}{\partial W^l} = \delta^{l}(a^{l-1})^T$

$\frac{\partial J(W,b,x,y)}{\partial b^l} = \frac{\partial J(W,b,x,y)}{\partial z^l}\frac{\partial z^l}{\partial b^l} = \delta^{l}$
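The recursion $\delta^{l} = (W^{l+1})^T\delta^{l+1}\odot \sigma'(z^l)$, together with $\partial J/\partial W^l = \delta^l (a^{l-1})^T$ and $\partial J/\partial b^l = \delta^l$, gives the whole backward pass. A sketch, assuming the forward pass has already cached `zs[l-1] = z^l` and `activations[l] = a^l` (with `activations[0]` the input $x$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backward(W, zs, activations, y):
    """Return gradient lists dW[l], db[l] via the delta recursion."""
    L = len(zs)                 # number of layers
    dW = [None] * (L + 1)       # 1-based, matching the math's layer index
    db = [None] * (L + 1)
    # output layer: delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    for l in range(L, 0, -1):
        dW[l] = delta @ activations[l - 1].T  # dJ/dW^l = delta^l (a^{l-1})^T
        db[l] = delta                         # dJ/db^l = delta^l
        if l > 1:
            # delta^{l-1} = (W^l)^T delta^l ⊙ sigma'(z^{l-1})
            delta = (W[l].T @ delta) * sigmoid_prime(zs[l - 2])
    return dW, db
```

The loop propagates $\delta$ from layer $L$ down to layer 1, so each layer's gradients cost one matrix-vector product plus one outer product.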

#### Parameter Update

$W^l = W^l - \alpha\frac{\partial J(W,b,x,y)}{\partial W^l}$

$b^l = b^l - \alpha\frac{\partial J(W,b,x,y)}{\partial b^l}$
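The update itself is a single gradient-descent step per layer. A self-contained sketch for one layer, with an illustrative learning rate and made-up toy gradients:

```python
import numpy as np

alpha = 0.1  # learning rate (illustrative value)

# toy parameters and gradients for a single layer l
W_l = np.array([[0.2, -0.1], [0.4, 0.3]])
b_l = np.array([[0.0], [0.1]])
dW_l = np.array([[0.05, 0.02], [-0.01, 0.03]])
db_l = np.array([[0.01], [-0.02]])

# gradient-descent step: W^l <- W^l - alpha * dJ/dW^l, b^l <- b^l - alpha * dJ/db^l
W_l = W_l - alpha * dW_l
b_l = b_l - alpha * db_l
```

In a full training loop this step would run for every layer $l$ after each backward pass.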

