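Note: the matrix multiplication convention I use in this chapter is the opposite of the one the book uses later. The products in my answers correspond to the book's products with the factors swapped and transposed.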
Ch 01 Using neural nets to recognize handwritten digits
Online book: http://neuralnetworksanddeeplearning.com/chap1.html
Sigmoid neurons
- Sigmoid neurons simulating perceptrons, part I:
Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, $c > 0$. Show that the behaviour of the network doesn't change.
**Proof:** By equation (2) in the text,

$$
\text{output} = \begin{cases} 0 & \text{if } \sum_i w_i x_i + b \leq 0 \\ 1 & \text{if } \sum_i w_i x_i + b > 0 \end{cases}
$$

So if the weights $w$ and the bias $b$ of any perceptron are multiplied by $c > 0$, the output of that perceptron becomes

$$
\text{output} = \begin{cases} 0 & \text{if } \sum_i c \cdot w_i x_i + c \cdot b \leq 0 \\ 1 & \text{if } \sum_i c \cdot w_i x_i + c \cdot b > 0 \end{cases}
$$

Since $c > 0$ and $\sum_i c \cdot w_i x_i + c \cdot b = c \cdot (\sum_i w_i x_i + b)$, if $\sum_i w_i x_i + b > 0$ then $c \cdot (\sum_i w_i x_i + b) > 0$, and if $\sum_i w_i x_i + b \leq 0$ then $c \cdot (\sum_i w_i x_i + b) \leq 0$. Every perceptron therefore produces the same output as before for the same inputs, so each layer passes unchanged values to the next and the behaviour of the whole network does not change.
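The argument above can be checked numerically with a minimal sketch. The helper `perceptron` and the sample weights, bias, and scale factors are hypothetical values chosen only for illustration; they do not come from the book. The input $(1, 1, 0)$ lands exactly on the boundary $w \cdot x + b = 0$, so the $\leq 0$ case is exercised as well.

```python
import itertools


def perceptron(weights, bias, inputs):
    """Return 1 if w.x + b > 0, else 0 (the rule in equation (2))."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Hypothetical example values, not taken from the book.
weights = [2.0, -3.0, 0.5]
bias = 1.0
scales = [0.25, 1.0, 7.0, 1000.0]  # several positive constants c

# Compare the original and the scaled perceptron on every binary input.
all_same = all(
    perceptron(weights, bias, x)
    == perceptron([c * w for w in weights], c * bias, x)
    for x in itertools.product([0, 1], repeat=3)
    for c in scales
)
print(all_same)
```

This only spot-checks a single perceptron on a few values of $c$; the proof is what covers every $c > 0$ and every network.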
- Sigmoid neurons simulating perceptrons, part II
Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won't need the actual input value, we just need the input to have been fixed. Suppose the weights and biases are such that $w \cdot x + b \neq 0$ for the input $x$ to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant $c > 0$. Show that in the limit as $c \rightarrow \infty$ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when $w \cdot x + b = 0$ for one of the perceptrons?
**Proof:** Each perceptron outputs

$$
\text{output} = \begin{cases} 0 & \text{if } \sum_i w_i x_i + b \leq 0 \\ 1 & \text{if } \sum_i w_i x_i + b > 0 \end{cases}
$$

After the perceptrons are replaced by sigmoid neurons and the weights and biases are multiplied by $c$, each neuron outputs

$$
\sigma\Big(c \cdot \big(\sum_i w_i x_i + b\big)\Big) = \frac{1}{1+e^{-c \cdot (\sum_i w_i x_i + b)}}
$$

Now let $c \rightarrow \infty$. When $\sum_i w_i x_i + b < 0$, we have $c \cdot (\sum_i w_i x_i + b) \rightarrow -\infty$, so the output tends to $0$. Likewise, when $\sum_i w_i x_i + b > 0$, we have $c \cdot (\sum_i w_i x_i + b) \rightarrow +\infty$, so the output tends to $1$. Both cases match the perceptron output.

When $\sum_i w_i x_i + b = 0$, however, $c \cdot (\sum_i w_i x_i + b) = 0$ for every $c$, so the output is $\sigma(0) = \frac{1}{2}$ no matter how large $c$ is. The perceptron outputs $0$ in this case, so the two networks no longer agree.
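The limit can be illustrated with a short sketch. The function names, weights, bias, and inputs below are hypothetical values picked for illustration, and `sigmoid` is written in a numerically stable form so that large values of $c$ do not overflow `math.exp`.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scaled_sigmoid_neuron(weights, bias, inputs, c):
    """Sigmoid neuron whose weights and bias are both multiplied by c."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(c * z)


# Hypothetical example values, not taken from the book.
weights = [1.0, -2.0]
bias = 0.5
positive_case = (1.0, 0.0)   # w.x + b = 1.5
negative_case = (0.0, 1.0)   # w.x + b = -1.5
boundary_case = (1.0, 0.75)  # w.x + b = 0 exactly

for c in (1.0, 10.0, 100.0, 1000.0):
    print(
        c,
        scaled_sigmoid_neuron(weights, bias, positive_case, c),
        scaled_sigmoid_neuron(weights, bias, negative_case, c),
        scaled_sigmoid_neuron(weights, bias, boundary_case, c),
    )
```

As $c$ grows, the first two columns approach $1$ and $0$, while the boundary column stays at $0.5$ for every $c$.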
A simple network to classify handwritten digits
- There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first $3$ layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least $0.99$, and incorrect outputs have activation less than $0.01$.
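**Solution:** The last layer uses sigmoid outputs, so each neuron's output is approximately $0$ or $1$. We read the $4$ neurons of the last layer as a binary vector, as shown in the table below.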
| Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Code | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 |
The expected outputs of the old output layer are shown in the following table.
| Digit | Code |
| :-: | :-: |
| 0 | 1000000000 |
| 1 | 0100000000 |
| 2 | 0010000000 |
| 3 | 0001000000 |
| 4 | 0000100000 |
| 5 | 0000010000 |
| 6 | 0000001000 |
| 7 | 0000000100 |
| 8 | 0000000010 |
| 9 | 0000000001 |
Therefore

$$
W = \begin{bmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 1\\
0 & 0 & 1 & 0\\
0 & 0 & 1 & 1\\
0 & 1 & 0 & 0\\
0 & 1 & 0 & 1\\
0 & 1 & 1 & 0\\
0 & 1 & 1 & 1\\
1 & 0 & 0 & 0\\
1 & 0 & 0 & 1
\end{bmatrix}
$$
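As a rough check of the matrix $W$ above, the sketch below feeds worst-case activations from the old output layer through a new 4-neuron sigmoid layer. The text gives only the $0/1$ pattern of $W$; the scale factor `SCALE = 20` and the bias `BIAS = -10` are assumptions of this sketch, chosen so that a weighted sum of at least $0.99$ saturates towards $1$ and a weighted sum below $0.05$ saturates towards $0$. The function names are hypothetical as well.

```python
import math

# W as in the text: row d holds the 4-bit binary code of digit d (10 x 4).
W = [[(d >> k) & 1 for k in (3, 2, 1, 0)] for d in range(10)]

# Assumptions of this sketch, not values given in the text.
SCALE = 20.0
BIAS = -10.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bitwise_layer(activations):
    """Map the 10 old-output activations to 4 bit activations."""
    outputs = []
    for j in range(4):
        weighted_sum = sum(a * W[d][j] for d, a in enumerate(activations))
        outputs.append(sigmoid(SCALE * weighted_sum + BIAS))
    return outputs


def decode(digit):
    """Threshold the bit activations for a worst-case input for `digit`."""
    # Correct neuron at 0.99, every other neuron just below 0.01.
    activations = [0.99 if d == digit else 0.0099 for d in range(10)]
    return "".join("1" if o > 0.5 else "0" for o in bitwise_layer(activations))


for digit in range(10):
    print(digit, decode(digit))
```

Each input is treated as a row vector multiplied by the $10 \times 4$ matrix $W$, which matches the transposed convention noted at the top of this post. Other choices of scale and bias would work too, as long as the threshold sits between the largest "off" sum and the smallest "on" sum.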