# Andrew Ng's Machine Learning: Neural Networks | The Backpropagation Algorithm

### Cost Function

$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + \left(1 - y_k^{(i)}\right) \log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2$
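The cost above can be sketched directly in NumPy. This is a minimal illustration, not the course's official code: `nn_cost`, `h`, `Y`, and `thetas` are hypothetical names, the output activations `h` are assumed to be already computed by forward propagation, and `thetas` is assumed to exclude the bias terms (which are not regularized).

```python
import numpy as np

def nn_cost(h, Y, thetas, lam):
    """Regularized cross-entropy cost.

    h: (m, K) output activations, Y: (m, K) one-hot labels,
    thetas: list of weight matrices with bias columns excluded,
    lam: regularization strength lambda.
    """
    m = Y.shape[0]
    # double sum over examples i and output units k
    cross_entropy = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # sum of squared weights over all layers
    reg = lam / (2 * m) * sum(np.sum(T ** 2) for T in thetas)
    return cross_entropy + reg
```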

### Gradient Computation

$\frac{\partial J(\theta)}{\partial \theta_j} \approx \frac{J(\theta_1,\cdots,\theta_j+\epsilon,\cdots,\theta_n) - J(\theta_1,\cdots,\theta_j-\epsilon,\cdots,\theta_n)}{2\epsilon}$
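This two-sided approximation is what gradient checking uses to verify a backpropagation implementation. A minimal sketch (the helper name `numerical_gradient` is mine, not from the course):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite-difference approximation of dJ/dtheta_j
    for every component j of theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[j] = eps  # perturb only component j
        grad.flat[j] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad
```

For example, checking it against $J(\theta) = \sum_j \theta_j^2$, whose exact gradient is $2\theta$, should agree to within roughly $\epsilon^2$.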

### The Chain Rule

For $z = f(u, v)$ with $u = u(x, y)$ and $v = v(x, y)$:

$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial x}$

$\frac{\partial z}{\partial y} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial y}$

With a further layer of composition ($z$ a function of $p, q$; $p, q$ functions of $u, v$; $u, v$ functions of $x$), the rule sums over every path from $x$ to $z$:

$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial p}\frac{\partial p}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial z}{\partial p}\frac{\partial p}{\partial v}\frac{\partial v}{\partial x} + \frac{\partial z}{\partial q}\frac{\partial q}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial z}{\partial q}\frac{\partial q}{\partial v}\frac{\partial v}{\partial x}$


In general, when $y$ depends on $t$ through intermediate variables $z_1, \dots, z_n$:

$\frac{\partial y}{\partial t} = \sum_{i=1}^{n} \frac{\partial y}{\partial z_i}\frac{\partial z_i}{\partial t}$
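As a sanity check, the chain rule can be verified numerically on a made-up example: $z = uv$ with $u = x^2$ and $v = 3x$, so that $z = 3x^3$ and $dz/dx = 9x^2$.

```python
def dz_dx_chain(x):
    """dz/dx via the chain rule for z = u*v, u = x**2, v = 3*x."""
    u, v = x ** 2, 3 * x
    dz_du, dz_dv = v, u           # partials of z(u, v) = u*v
    du_dx, dv_dx = 2 * x, 3       # derivatives of the inner functions
    return dz_du * du_dx + dz_dv * dv_dx

def dz_dx_numeric(x, eps=1e-6):
    """Two-sided finite difference on the composed function z = 3*x**3."""
    z = lambda t: (t ** 2) * (3 * t)
    return (z(x + eps) - z(x - eps)) / (2 * eps)
```

Both agree with $9x^2$, e.g. both give 36 at $x = 2$.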

### Derivation

Define the error of unit $j$ in layer $l$ as the sensitivity of the cost to its weighted input:

$\delta_j^{l} = \frac{\partial J}{\partial z_j^{l}}$

For the output layer $L$:

$\delta_j^{L} = \frac{\partial J}{\partial z_j^{L}} = \sum_{k=1}^{K}\frac{\partial J}{\partial a_k^{L}}\frac{\partial a_k^{L}}{\partial z_j^{L}}$

Only $a_j^{L}$ depends on $z_j^{L}$, so the sum collapses to the $k = j$ term. For a sigmoid output with the cross-entropy cost:

$\delta_j^{L} = \frac{\partial J}{\partial a_j^{L}}\frac{\partial a_j^{L}}{\partial z_j^{L}} = \frac{\partial J}{\partial a_j^{L}}\,g'(z_j^{L}) = a_j^{L} - y_j^{L} \tag{1}$

For a hidden layer $l$, propagate the error back from layer $l+1$:

$\delta_j^{l} = \frac{\partial J}{\partial z_j^{l}} = \sum_{k=1}^{s_{l+1}}\frac{\partial J}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}} = \sum_{k=1}^{s_{l+1}}\delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}$

Since

$z_k^{l+1} = \sum_{p=1}^{s_l}\Theta_{kp}^{l}\,g(z_p^{l}) + b_k^{l}$

we have

$\frac{\partial z_k^{l+1}}{\partial z_j^{l}} = \Theta_{kj}^{l}\,g'(z_j^{l})$

and therefore, using the sigmoid identity $g'(z) = g(z)\left(1 - g(z)\right)$:

$\delta_j^{l} = \sum_{k=1}^{s_{l+1}}\Theta_{kj}^{l}\delta_k^{l+1}g'(z_j^{l}) = \sum_{k=1}^{s_{l+1}}\Theta_{kj}^{l}\delta_k^{l+1}a_j^{l}\left(1 - a_j^{l}\right) \tag{2}$
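Equation (2) relies on the sigmoid identity $g'(z) = g(z)(1 - g(z))$, which is easy to confirm numerically:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
a = sigmoid(z)
eps = 1e-6
# two-sided finite difference of the sigmoid
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# matches a * (1 - a) at every test point
assert np.allclose(numeric, a * (1 - a), atol=1e-8)
```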

For the weights themselves, note from the expression for $z_k^{l+1}$ above that only $z_i^{l+1}$ depends on $\Theta_{ij}^{l}$, so the sum collapses to the $k = i$ term:

$\frac{\partial J}{\partial \Theta_{ij}^{l}} = \sum_{k=1}^{s_{l+1}}\frac{\partial J}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial \Theta_{ij}^{l}} = \sum_{k=1}^{s_{l+1}}\delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial \Theta_{ij}^{l}} = g(z_j^{l})\,\delta_i^{l+1} = a_j^{l}\,\delta_i^{l+1} \tag{3}$

### Backpropagation

1. Initialize $\Delta_{ij}^{l} = 0$ for all $l, i, j$.
2. For the $m$ training examples, let $k$ run from $1$ to $m$:
   - Set $a^{1} = x^{(k)}$
   - Forward-propagate to compute the activation vector $a^{l}$ of every layer
   - Compute the output-layer error $\delta^{L}$ with equation (1)
   - Compute the errors of the remaining layers, $\delta^{L-1}, \delta^{L-2}, \dots, \delta^{2}$, with equation (2)
   - Accumulate $\Delta_{ij}^{l}$ per equation (3): $\Delta_{ij}^{l} := \Delta_{ij}^{l} + a_j^{l}\delta_i^{l+1}$
3. Compute the gradient matrices: $D_{ij}^{l} = \frac{1}{m}\Delta_{ij}^{l} + \frac{\lambda}{m}\Theta_{ij}^{l}$ for $j \neq 0$, and $D_{ij}^{l} = \frac{1}{m}\Delta_{ij}^{l}$ for the bias terms $j = 0$
4. Update the weights by gradient descent: $\Theta^{l} := \Theta^{l} - \alpha D^{l}$
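The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the course's official code: sigmoid activations throughout, one-hot labels, and weight matrices whose first column holds the bias, so `thetas[l]` has shape $(s_{l+1},\, s_l + 1)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, thetas, lam):
    """One pass of steps 1-3: accumulate Delta and return the gradient
    matrices D^l. X: (m, n) inputs, Y: (m, K) one-hot labels."""
    m = X.shape[0]
    deltas_acc = [np.zeros_like(T) for T in thetas]   # step 1
    for x, y in zip(X, Y):                            # step 2
        # forward pass, storing each activation with a bias unit prepended
        activations = [np.concatenate(([1.0], x))]
        for T in thetas:
            a = sigmoid(T @ activations[-1])
            activations.append(np.concatenate(([1.0], a)))
        # output-layer error, equation (1)
        delta = activations[-1][1:] - y
        # walk backwards: accumulate equation (3), then apply equation (2)
        for l in range(len(thetas) - 1, -1, -1):
            deltas_acc[l] += np.outer(delta, activations[l])
            if l > 0:
                a = activations[l][1:]
                delta = (thetas[l][:, 1:].T @ delta) * a * (1 - a)
    # step 3: average and regularize (bias column left unregularized)
    grads = []
    for T, acc in zip(thetas, deltas_acc):
        D = acc / m
        D[:, 1:] += (lam / m) * T[:, 1:]
        grads.append(D)
    return grads
```

The returned matrices can be gradient-checked against the numerical approximation from the Gradient Computation section before being used in the step-4 update.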

### Weight Initialization
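The course's recommendation here is to break symmetry by initializing each $\Theta^{(l)}$ uniformly at random in a small interval $[-\epsilon, \epsilon]$: if all weights started out equal, every hidden unit in a layer would compute the same function and receive the same updates. A minimal sketch (the function name and default $\epsilon$ are illustrative):

```python
import numpy as np

def random_init(s_in, s_out, epsilon=0.12):
    """Initialize a layer's weight matrix (bias column included)
    uniformly in [-epsilon, epsilon] to break symmetry."""
    return np.random.uniform(-epsilon, epsilon, size=(s_out, s_in + 1))
```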

So~, that wraps up this week's study of neural networks and their learning algorithm. Don't you find it amazing?