NN&DL
[.org](http://neuralnetworksanddeeplearning.com/chap3.html)
Using Neural Networks to Identify Handwritten Digits
- Perceptron : binary inputs and a binary output
bias b ≡ −threshold ; w & x : vectors of weights and inputs
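A minimal sketch of the perceptron rule with the bias form above, using the book's NAND example (weights −2, −2 and bias 3); the function name is my own:

```python
import numpy as np

def perceptron(w, x, b):
    """Perceptron rule: output 1 if w . x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# A NAND gate: w = (-2, -2), b = 3
w, b = np.array([-2, -2]), 3
print(perceptron(w, np.array([0, 0]), b))  # 1
print(perceptron(w, np.array([1, 1]), b))  # 0
```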
Sigmoid neurons : inputs from 0 to 1, output = σ(w · x + b), where σ is the sigmoid function σ(z) ≡ 1/(1 + e^(−z))
When z = w · x + b is large and positive, the output of the sigmoid neuron is approximately 1; when z is a large negative number, exp(−z) → ∞, so σ(z) ≈ 0.
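A quick sketch of the sigmoid and its limiting behavior described above:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive z -> output near 1; large negative z -> exp(-z) blows up, output near 0
print(sigmoid(10))   # ~0.99995
print(sigmoid(-10))  # ~0.000045
print(sigmoid(0))    # 0.5
```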
- the architecture of neural networks
multilayer perceptrons or MLPs
feedforward neural networks : the output of one layer is used as the input to the next layer
recurrent neural networks
- cost function
The quadratic cost : C(w, b) ≡ (1/2n) Σ_x ‖y(x) − a‖², where a is the network's actual output for input x, y(x) is the desired output, and n is the number of training inputs.
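A small sketch of the quadratic cost over a list of (output, target) pairs; the function name is my own:

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C = (1/2n) * sum_x ||y(x) - a||^2 over the n training examples."""
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, targets)) / (2 * n)

# Two toy examples: the cost is 0 only when every output matches its target
outputs = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(quadratic_cost(outputs, targets))  # 0.125
```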
Why introduce the quadratic cost? Isn't this a rather ad hoc choice?
Find the w, b which make C(w, b) minimal. - the gradient descent algorithm
∇C is the gradient vector; v is a vector with many components
∇C relates changes in v to changes in C, just as we'd expect something called a gradient to do.
When ∆v = −η∇C, where η is a small, positive parameter (known as the learning rate), each step decreases C. - try Python
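The update rule v → v − η∇C can be sketched on a toy function C(v) = v₁² + v₂², whose minimum is at the origin (the function and values here are my own example):

```python
import numpy as np

def grad_C(v):
    """Gradient of C(v) = v1^2 + v2^2."""
    return 2 * v

v = np.array([3.0, 4.0])
eta = 0.1  # the learning rate
for _ in range(100):
    v = v - eta * grad_C(v)  # repeatedly apply v -> v - eta * grad_C(v)

print(np.linalg.norm(v))  # close to 0: gradient descent has rolled down to the minimum
```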
The training_data is a list of tuples (x, y) representing the training inputs and corresponding desired outputs. The variables epochs and mini_batch_size are what you'd expect - the number of epochs to train for, and the size of the mini-batches to use when sampling. eta is the learning rate, η.
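A hedged sketch of the mini-batch loop that signature implies: shuffle each epoch, slice into mini-batches, and apply a gradient step per batch. Here `update_mini_batch` is passed in by the caller rather than defined, since its body depends on the network:

```python
import random

def sgd(training_data, epochs, mini_batch_size, eta, update_mini_batch):
    """Stochastic gradient descent skeleton: shuffle the data each epoch,
    split it into mini-batches, and apply one gradient step per batch."""
    n = len(training_data)
    for epoch in range(epochs):
        random.shuffle(training_data)
        mini_batches = [training_data[k:k + mini_batch_size]
                        for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            update_mini_batch(mini_batch, eta)
```

With 10 examples and mini_batch_size = 3, each epoch yields four batches (3, 3, 3, 1), so every training input is still used exactly once per epoch.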
epochs, mini_batch_size, eta : hyper-parameters
- backpropagation
- Hadamard product : s ⊙ t, where (s ⊙ t)_j = s_j t_j.
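In NumPy the Hadamard product is just the `*` operator on arrays, matching the book's example [1, 2] ⊙ [3, 4] = [3, 8]:

```python
import numpy as np

s = np.array([1, 2])
t = np.array([3, 4])
print(s * t)  # elementwise (Hadamard) product: [3 8]
```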
The four fundamental equations behind backpropagation
The backpropagation algorithm:
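A sketch of the algorithm as it follows from the four equations, for a fully connected sigmoid network with the quadratic cost; `weights`/`biases` are lists of per-layer matrices and column vectors (my naming, not the book's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def backprop(weights, biases, x, y):
    """One backward pass for the quadratic cost, following the four equations:
    BP1: delta^L = (a^L - y) * sigma'(z^L)
    BP2: delta^l = (W^{l+1}.T @ delta^{l+1}) * sigma'(z^l)
    BP3: dC/db^l = delta^l
    BP4: dC/dW^l = delta^l @ a^{l-1}.T
    """
    # forward pass, storing all activations and weighted inputs z
    activation, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    nabla_w = [None] * len(weights)
    nabla_b = [None] * len(biases)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])            # BP1
    nabla_b[-1] = delta                                              # BP3
    nabla_w[-1] = delta @ activations[-2].T                          # BP4
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])  # BP2
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T
    return nabla_w, nabla_b
```

Each returned gradient has the same shape as the weight matrix or bias vector it corresponds to, which is what lets the gradient-descent update subtract them directly.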
Improving the way neural networks learn
the cross-entropy cost function
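A sketch of the cross-entropy cost for one example, C = −Σ[y ln a + (1 − y) ln(1 − a)]; the `nan_to_num` guard for the 0 · log 0 case is a common implementation detail:

```python
import numpy as np

def cross_entropy_cost(a, y):
    """C = -sum( y*ln(a) + (1-y)*ln(1-a) ); nan_to_num guards the 0*log(0) case."""
    return np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))

# Cost is small when the output a is near the target y, and large when it is badly wrong
print(cross_entropy_cost(np.array([0.99]), np.array([1.0])))  # ~0.01
print(cross_entropy_cost(np.array([0.01]), np.array([1.0])))  # ~4.6
```

Unlike the quadratic cost, its gradient does not pick up a σ′(z) factor, which is why it avoids the learning slowdown when a neuron saturates.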
Overfitting and regularization
technique 1 : weight decay (L2 regularization)
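With L2 regularization the cost gains a (λ/2n) Σ w² term, so the gradient-descent update becomes w → (1 − ηλ/n)w − η ∂C₀/∂w: the weights are rescaled toward zero each step, hence "weight decay". A minimal sketch of that update (function name and values are my own):

```python
import numpy as np

def l2_update(w, grad_w, eta, lam, n):
    """Weight-decay step: w -> (1 - eta*lam/n) * w - eta * grad_w."""
    return (1 - eta * lam / n) * w - eta * grad_w

# With a zero cost gradient, the update just shrinks the weights toward 0
w = np.array([1.0, -2.0])
print(l2_update(w, np.zeros(2), eta=0.5, lam=0.1, n=10))  # [ 0.995 -1.99 ]
```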