[Introduction to Deep Learning] NNDL Study Notes (2)

 

Chapter 2: How the backpropagation algorithm works

Backpropagation: a fast algorithm for computing the gradient of the cost function.

Warm up: a fast matrix-based approach to computing the output from a neural network

                                                      a^l_j = \sigma(\sum_kw^l_{jk}a^{l-1}_k+b^l_j)\Rightarrow a^l=\sigma(w^l\cdot a^{l-1}+b^l)

weighted input: z^l = w^l\cdot a^{l-1}+b^l

elementwise multiplication: s \odot t
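In NumPy the Hadamard product s \odot t is simply s * t, and the matrix-based forward pass above is a short loop. A minimal sketch (the names sigmoid, feedforward, weights, and biases are my own, not taken from the book's code):

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(weights, biases, a):
    """Compute a^l = sigma(w^l a^{l-1} + b^l) layer by layer.

    weights[l] has shape (n_l, n_{l-1}), biases[l] has shape (n_l, 1),
    and a is a column vector of input activations for layer 1."""
    for w, b in zip(weights, biases):
        z = np.dot(w, a) + b    # weighted input z^l
        a = sigmoid(z)          # activation a^l
    return a
```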

The two assumptions we need about the cost function

1. The cost function can be written as an average C=\frac{1}{n}\sum_x C_x over cost functions C_x for individual training examples x.

2. The cost for a single example can be written as a function of the output activations from the neural network: C_x = C_x(a^L)
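For example, the quadratic cost satisfies both assumptions:

                                                      C = \frac{1}{2n}\sum_x \|y(x)-a^L(x)\|^2 = \frac{1}{n}\sum_x C_x, \qquad C_x = \frac{1}{2}\|y-a^L\|^2

It is an average over per-example costs C_x, and each C_x depends (for a fixed training input x, so y is a constant) only on the output activations a^L.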

The four fundamental equations behind backpropagation

The error  in the jth neuron in the lth layer: \delta^l_j \equiv\frac{\partial C}{\partial z^l_j}

The four equations below hold for any activation function σ(·), not just the sigmoid, and for any cost function satisfying the two assumptions above:

Equation1: the error in the output layer \delta ^L


                                                                     \delta^L_j=\frac{\partial C}{\partial a^L_j}\sigma'(z^L_j),

                                                            matrix form: \delta^L = \nabla_a C \odot \sigma' (z^L)


It is easily computed: 

Since \sigma(z) = \frac{1}{1+e^{-z}}, once z^L_j is known from the forward pass, \sigma'(z^L_j) = \sigma(z^L_j)(1-\sigma(z^L_j)) is easy to compute.

If we're using the quadratic cost function, then C=\frac{1}{2}\sum_j(y_j-a_j^L)^2, so \frac{\partial C}{\partial a^L_j}=(a^L_j-y_j).

matrix form: \delta^L = (a^L-y) \odot \sigma'(z^L)
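A small sketch of BP1 for sigmoid neurons and the quadratic cost (a_L, y, z_L stand for a^L, y, z^L; the names are illustrative, not from the book's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    return sigmoid(z) * (1.0 - sigmoid(z))

def output_error(a_L, y, z_L):
    """BP1 for the quadratic cost: delta^L = (a^L - y) ⊙ sigma'(z^L)."""
    return (a_L - y) * sigmoid_prime(z_L)
```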

Equation2: the error \delta^l in terms of the next layer, \delta^{l+1}


                                                                 \delta^l = ((w^{l+1})^T\delta^{l+1})\odot \sigma'(z^l),

This follows from the chain rule, using z^{l+1}_k = \sum_j w^{l+1}_{kj}\sigma(z^l_j) + b^{l+1}_k:

                                                                      \delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum_k \frac{\partial C}{\partial z^{l+1}_k}\frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k w^{l+1}_{kj}\,\delta^{l+1}_k\,\sigma'(z^l_j)
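In code, BP2 is a single backward step from layer l+1 to layer l. A sketch (w_next, delta_next, z stand for w^{l+1}, \delta^{l+1}, z^l; illustrative names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop_error(w_next, delta_next, z):
    """BP2: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)."""
    return np.dot(w_next.T, delta_next) * sigmoid_prime(z)
```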


Equation3: An equation for the rate of change of the cost with respect to any bias in the network.


                                                                 \frac{\partial C}{\partial b^l_j}=\delta ^l_j


Equation4: An equation for the rate of change of the cost with respect to any weight in the network


                                                          \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\delta^l_j,     \frac{\partial C}{\partial w}=a_{in}\delta_{out}
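Collecting BP3 and BP4 over a whole layer (my rewriting, consistent with the component forms above), the bias gradient is just \delta^l and the weight gradient is an outer product:

                                                          \nabla_{b^l} C = \delta^l, \qquad \nabla_{w^l} C = \delta^l\,(a^{l-1})^T

With column vectors in NumPy this outer product is np.dot(delta, a_prev.T), where delta and a_prev are illustrative names for \delta^l and a^{l-1}.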


A weight or bias feeding into the final layer will learn slowly if the output neuron has either low activation (≈0) or high activation (≈1), because the sigmoid has then saturated and \sigma'(z^L_j) \approx 0.

More generally, by BP4 a weight will learn slowly if its input neuron has low activation (a^{l-1}_k \approx 0) or if its output neuron has saturated (\sigma'(z^l_j) \approx 0, so \delta^l_j is small).

The backpropagation algorithm
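A minimal sketch of the full algorithm for a single training example, combining the feedforward pass with BP1-BP4 for sigmoid neurons and the quadratic cost (names are my own, loosely modeled on the style of the book's network.py rather than copied from it):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop(weights, biases, x, y):
    """Return (nabla_b, nabla_w): layer-by-layer gradients of the quadratic
    cost for one training example (x, y)."""
    # 1. Input: set the activation a^1 of the input layer.
    activation = x
    activations = [x]   # a^1, a^2, ..., a^L
    zs = []             # z^2, ..., z^L
    # 2. Feedforward: z^l = w^l a^{l-1} + b^l, a^l = sigma(z^l).
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # 3. Output error, BP1 (quadratic cost): delta^L = (a^L - y) ⊙ sigma'(z^L).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta                                     # BP3
    nabla_w[-1] = np.dot(delta, activations[-2].T)          # BP4
    # 4. Backpropagate the error, BP2: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l).
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta                                 # BP3
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)  # BP4
    # 5. Output: the gradient of the cost, given by BP3 and BP4.
    return nabla_b, nabla_w
```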

Exercise

1. Suppose we modify a single neuron in a feedforward network so that the output from the neuron is given by f(\sum_j w_j x_j + b), where f is some function other than the sigmoid. How should we modify the backpropagation algorithm in this case?

In BP1 and BP2, replace \sigma' with f' (so \delta^L_j = \frac{\partial C}{\partial a^L_j} f'(z^L_j) and \delta^l = ((w^{l+1})^T\delta^{l+1})\odot f'(z^l)); BP3 and BP4 do not change.

2. Linear neurons: Suppose we replace the usual non-linear σ function with σ(z)=z throughout the network. Rewrite the backpropagation algorithm for this case.

Since \sigma(z)=z, we have \sigma'(z)=1 everywhere, so the \odot\,\sigma' factors drop out: BP1 becomes \delta^L = \nabla_a C (for the quadratic cost, \delta^L = a^L - y) and BP2 becomes \delta^l = (w^{l+1})^T\delta^{l+1}; BP3 and BP4 are unchanged.

The code for backpropagation

For a mini-batch of size m, we apply backpropagation to each training example x in the mini-batch and then average the resulting gradients in the gradient-descent update of the weights and biases.

Fully matrix-based approach: it's possible to modify the backpropagation algorithm so that it computes the gradients for all training examples in a mini-batch simultaneously, taking full advantage of modern libraries for linear algebra.
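A sketch of what "fully matrix-based" means in practice, assuming the m examples of a mini-batch are stacked as the columns of an input matrix X and a target matrix Y (these names and the averaging convention are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop_batch(weights, biases, X, Y):
    """Backpropagation for a whole mini-batch at once.

    X has shape (n_in, m), Y has shape (n_out, m): one column per training
    example. Returns gradients already averaged over the m examples."""
    m = X.shape[1]
    activation = X
    activations = [X]
    zs = []
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b   # b broadcasts across the m columns
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # BP1 (quadratic cost) applied to all m columns simultaneously.
    delta = (activations[-1] - Y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta.sum(axis=1, keepdims=True) / m
    nabla_w[-1] = np.dot(delta, activations[-2].T) / m   # sums the per-example outer products
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta.sum(axis=1, keepdims=True) / m
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T) / m
    return nabla_b, nabla_w
```

The speedup comes from replacing m small matrix-vector products per layer with one large matrix-matrix product, which linear-algebra libraries handle much more efficiently.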

The big picture

                                                                     \Delta C\approx\frac{\partial C}{\partial w^l_{jk}} \Delta w^l_{jk}

 

This suggests that a possible approach to computing \partial C/\partial w^l_{jk} is to carefully track how a small change in w^l_{jk} propagates to cause a small change in C.

The change \Delta w^l_{jk} first causes a change in the activation of the jth neuron in layer l:

                                                                    \Delta a^l_j \approx \frac{\partial a^l_j}{\partial w^l_{jk}} \Delta w^l_{jk}

For a single neuron in the next layer:

                                                                   \Delta a^{l+1}_q \approx \frac{\partial a^{l+1}_q}{\partial a^l_j}\Delta a^l_j

Imagine a single path from the weight w^l_{jk} through the network to the cost C. Following the change along that path gives

                                                                   \Delta C \approx \frac{\partial C}{\partial a^L_m}\frac{\partial a^L_m}{\partial a^{L-1}_n}\cdots\frac{\partial a^{l+1}_q}{\partial a^l_j}\frac{\partial a^l_j}{\partial w^l_{jk}}\Delta w^l_{jk}

There are many such paths from w^l_{jk} to C, and to compute the total change in C we sum over all of them:

                                                                   \Delta C \approx \sum_{mnp\ldots q}\frac{\partial C}{\partial a^L_m}\frac{\partial a^L_m}{\partial a^{L-1}_n}\cdots\frac{\partial a^{l+1}_q}{\partial a^l_j}\frac{\partial a^l_j}{\partial w^l_{jk}}\Delta w^l_{jk}

What the equation tells us is that every edge between two neurons in the network is associated with a rate factor which is just the partial derivative of one neuron's activation with respect to the other neuron's activation.
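The picture of a small change in w^l_{jk} propagating through to C also suggests a simple sanity check: the gradient produced by BP1-BP4 should agree with the numerical estimate (C(w+\epsilon)-C(w-\epsilon))/2\epsilon. A hedged sketch on a tiny 2-3-1 network (the sizes, seed, and names are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

# A tiny 2-3-1 network with fixed random parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal((3, 1)), rng.standard_normal((1, 1))]
x = rng.standard_normal((2, 1))
y = np.array([[0.5]])

def cost(ws):
    """Quadratic cost C = 0.5 * ||y - a^L||^2 for the single example (x, y)."""
    a = x
    for w, b in zip(ws, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((y - a) ** 2)

# Backprop gradient of C with respect to the first weight matrix (BP1, BP2, BP4).
z1 = np.dot(weights[0], x) + biases[0]
a1 = sigmoid(z1)
z2 = np.dot(weights[1], a1) + biases[1]
a2 = sigmoid(z2)
delta2 = (a2 - y) * sigmoid_prime(z2)                      # BP1
delta1 = np.dot(weights[1].T, delta2) * sigmoid_prime(z1)  # BP2
grad_w0 = np.dot(delta1, x.T)                              # BP4: a_in * delta_out

# Numerical estimate of the same partial derivative for one entry of that matrix.
eps = 1e-6
w_plus = [weights[0].copy(), weights[1]]
w_minus = [weights[0].copy(), weights[1]]
w_plus[0][0, 0] += eps
w_minus[0][0, 0] -= eps
numeric = (cost(w_plus) - cost(w_minus)) / (2 * eps)

print(grad_w0[0, 0], numeric)   # the two values should agree to several decimal places
```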

 

 

 

 
