1. Linear model complexity
The logistic (linear) model is defined as: y = X*W + b
The parameters W and b are determined by an optimization method.
X is 1 by 784. 784 = 28*28
W is 784 by 10
b is 1 by 10
So the number of parameters is 784*10 + 10 = 7850.
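The parameter count above can be checked with a small NumPy sketch. The shapes come from the notes; the all-zero arrays are only placeholders for illustration, not real data or trained weights.

```python
import numpy as np

# Shapes taken from the notes: X is 1x784, W is 784x10, b is 1x10.
n_inputs, n_classes = 28 * 28, 10

X = np.zeros((1, n_inputs))          # placeholder input row (illustration only)
W = np.zeros((n_inputs, n_classes))  # weight matrix
b = np.zeros((1, n_classes))         # bias row

y = X @ W + b                        # linear model output, shape (1, 10)
n_params = W.size + b.size           # 784*10 + 10

print(y.shape, n_params)
```

Almost all of the parameters sit in W; the bias adds only 10.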
2. Rectified Linear Unit (ReLU) and neural networks
ReLU, defined as max(0, x), is another activation function; it resembles the brain's activation signal more closely than the sigmoid does.
The picture below shows a two-layer neural network.
1. The first layer effectively consists of the set of weights and biases applied to X and passed through ReLUs. The output of this layer is fed to the next one, but is not observable outside the network, hence it is known as a hidden layer.
2. The second layer consists of the weights and biases applied to these intermediate outputs, followed by the softmax function to generate probabilities.
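The two layers described above can be sketched in NumPy as follows. This is a minimal sketch under assumptions: the hidden size of 64, the random initial weights, and the function names are hypothetical choices for illustration and are not given in the notes.

```python
import numpy as np

def relu(z):
    # ReLU: keep positive values, set negative values to zero.
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def two_layer_forward(X, W1, b1, W2, b2):
    hidden = relu(X @ W1 + b1)         # layer 1: weights, biases, ReLU (hidden layer)
    probs = softmax(hidden @ W2 + b2)  # layer 2: weights, biases, softmax
    return hidden, probs

rng = np.random.default_rng(0)
n_hidden = 64                          # assumed hidden size, not from the notes
X = rng.normal(size=(1, 784))          # random stand-in for one input row
W1 = rng.normal(scale=0.01, size=(784, n_hidden))
b1 = np.zeros((1, n_hidden))
W2 = rng.normal(scale=0.01, size=(n_hidden, 10))
b2 = np.zeros((1, 10))

hidden, probs = two_layer_forward(X, W1, b1, W2, b2)
print(hidden.shape, probs.shape)
```

Only `probs` would normally be returned to the caller; `hidden` is returned here just to make the hidden layer visible.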
3. Chain rule
The chain rule is a concept from calculus that gives the derivative of a composite function, i.e. a function whose input is itself a function: [g(f(x))]' = g'(f(x)) * f'(x).
Applied to a network, it can be computed as an efficient data pipeline with lots of data reuse.
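A small worked example of the chain rule, checked numerically. The functions f(x) = 3x + 1 and g(u) = u^2 are arbitrary choices made for this illustration. Note that the value f(x) computed on the way forward is reused when computing the derivative, which is the kind of data reuse mentioned above.

```python
def f(x):
    return 3.0 * x + 1.0       # inner function, f'(x) = 3

def g(u):
    return u ** 2              # outer function, g'(u) = 2u

def chain_rule_derivative(x):
    u = f(x)                   # forward value, reused below
    dg_du = 2.0 * u            # g'(f(x))
    df_dx = 3.0                # f'(x)
    return dg_du * df_dx       # chain rule: g'(f(x)) * f'(x)

def numeric_derivative(x, h=1e-5):
    # Central finite difference of the composite g(f(x)).
    return (g(f(x + h)) - g(f(x - h))) / (2.0 * h)

x0 = 2.0
analytic = chain_rule_derivative(x0)
numeric = numeric_derivative(x0)
print(analytic, numeric)
```

At x0 = 2 the analytic value is 2 * 7 * 3 = 42, and the finite-difference estimate should agree up to rounding error.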
4. Back propagation
Forward propagation computes the output y.
Back propagation computes the derivatives of the loss with respect to all weight matrices.
We can then update each weight by new_weight = weight - alpha*derivative_weight, where alpha is the learning rate.
Back propagation needs about twice the memory and computation of forward propagation.
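One forward pass, one backward pass, and one weight update for the linear model from section 1 can be sketched as below. Assumptions for this sketch: a softmax cross-entropy loss (the notes do not name the loss), random stand-in inputs and labels, a batch of 4 rows, and alpha = 0.001. All function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W, b):
    # Forward propagation: compute the output probabilities.
    return softmax(X @ W + b)

def cross_entropy(probs, labels):
    # Assumed loss: mean negative log-probability of the true class.
    n = labels.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

def backward(X, probs, labels):
    # Back propagation: derivatives of the loss w.r.t. W and b.
    n = labels.shape[0]
    d_logits = probs.copy()
    d_logits[np.arange(n), labels] -= 1.0  # softmax + cross-entropy gradient
    d_logits /= n
    dW = X.T @ d_logits                    # chain rule through X @ W
    db = d_logits.sum(axis=0, keepdims=True)
    return dW, db

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))              # random stand-in inputs
labels = rng.integers(0, 10, size=4)       # random stand-in class labels
W = np.zeros((784, 10))
b = np.zeros((1, 10))
alpha = 0.001                              # assumed learning rate

probs = forward(X, W, b)
loss_before = cross_entropy(probs, labels)
dW, db = backward(X, probs, labels)

W = W - alpha * dW                         # new_weight = weight - alpha*derivative_weight
b = b - alpha * db
loss_after = cross_entropy(forward(X, W, b), labels)
print(loss_before, loss_after)
```

The backward pass reuses `X` and `probs` from the forward pass, which is why those values have to be kept in memory until the gradients are computed.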
5. Deep learning networks
6. Early termination
When the accuracy on the validation data reaches its peak, stop training promptly to avoid overfitting.
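A minimal sketch of early termination, assuming the validation accuracy is measured once per epoch. The `patience` parameter (how many epochs without improvement to tolerate before stopping) and the function name are assumptions added for this illustration, and the accuracy values are made up, not measured results.

```python
def early_termination(val_accuracy_per_epoch, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation accuracies."""
    best_acc = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, acc in enumerate(val_accuracy_per_epoch):
        if acc > best_acc:
            # New peak: remember it and reset the counter.
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch   # stop training here

    # Never triggered: training ran through all epochs.
    return best_epoch, len(val_accuracy_per_epoch) - 1

# Made-up validation accuracies: rising, peaking at epoch 2, then falling.
made_up_accuracies = [0.80, 0.85, 0.88, 0.87, 0.86, 0.84]
best_epoch, stopped_at = early_termination(made_up_accuracies, patience=2)
print(best_epoch, stopped_at)
```

In practice the weights from `best_epoch` would be the ones to keep, since later epochs scored lower on the validation data.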
7. Regularization
8. Dropout