Define a function
Goodness of function
Pick the best function
Deeper usually does not imply better
Vanishing gradient problem
With sigmoid activations, changing the parameters near the input layer has little effect on the output, so the earlier layers get small gradients and learn very slowly
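A toy NumPy sketch (not from the lecture; the depth and random weights are my own arbitrary choices) that makes this concrete: each sigmoid layer multiplies the gradient by w · σ′(z), and σ′(z) ≤ 0.25, so the derivative at the input shrinks roughly geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Chain `depth` scalar sigmoid layers: a_k = sigmoid(w_k * a_{k-1}).
# d(output)/d(input) is the product of w_k * sigma'(z_k) over all layers.
np.random.seed(0)
depth = 10
weights = np.random.randn(depth)

a = 1.0      # input
grad = 1.0   # running d(output)/d(input)
for w in weights:
    a = sigmoid(w * a)
    grad *= w * a * (1.0 - a)   # sigma'(z) = sigma(z) * (1 - sigma(z))

print(f"gradient at the input after {depth} layers: {grad:.1e}")
```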
Improvement: use the Rectified Linear Unit (ReLU)
Reasons:
- Fast to compute
- Biological reason
- Infinite sigmoid with different biases
- Addresses the vanishing gradient problem
For a given input, the active ReLU units form a thinner linear network
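A small NumPy sketch of this "thinner linear network" view (illustrative; the layer sizes are arbitrary): for a fixed input, the inactive ReLU units output 0, and the remaining network is exactly a linear map.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

np.random.seed(0)
W1 = np.random.randn(4, 3)   # layer sizes are arbitrary
W2 = np.random.randn(2, 4)

x = np.random.randn(3)
h = relu(W1 @ x)             # some units output 0 for this particular x
y = W2 @ h

# Mask of active units: restricted to them, the network is purely linear.
mask = (W1 @ x > 0).astype(float)
y_linear = (W2 * mask) @ (W1 @ x)   # equivalent thinner linear network
assert np.allclose(y, y_linear)
print("active units:", mask)
```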
Maxout: ReLU is a special case of Maxout
Learnable activation function - the activation function in a maxout network can be any piecewise linear convex function
- The number of pieces depends on how many elements are in a group
Maxout training - Given a training example x, we know which element in each group would be the max
- Train this thin and linear network
- Different inputs select different max elements, so across training examples every parameter still gets trained (see the sketch below)
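A minimal maxout sketch (the helper `maxout` and all sizes are my own, not from the lecture) showing the layer and how ReLU falls out as a special case when each group pairs a learned linear piece with a constant 0.

```python
import numpy as np

def maxout(x, W, b, group_size):
    """Maxout: compute every linear piece, then take the max within each group."""
    z = W @ x + b                   # all pieces, shape (num_units * group_size,)
    return z.reshape(-1, group_size).max(axis=1)

np.random.seed(0)
x = np.random.randn(3)

# ReLU as a special case: pair each learned piece w.x + b with a constant 0.
w = np.random.randn(3)
W = np.vstack([w, np.zeros(3)])     # the group's second piece is always 0
b = np.array([0.5, 0.0])
out = maxout(x, W, b, group_size=2)
assert np.isclose(out[0], max(w @ x + 0.5, 0.0))   # same as ReLU(w.x + 0.5)
print(out)
```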
RMSProp differs from Adagrad: instead of summing all past squared gradients, it keeps an exponentially decaying average of them, so old gradients gradually fade away
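A side-by-side sketch of the two update rules (the learning rate, the decay factor α, and the toy objective f(w) = w² are arbitrary choices of mine, only the statistic of squared gradients differs):

```python
import numpy as np

# Toy objective f(w) = w^2, gradient 2w; hyperparameters are arbitrary.
def grad(w):
    return 2.0 * w

lr, alpha, eps = 0.1, 0.9, 1e-8
w_ada, s_ada = 5.0, 0.0    # Adagrad weight and accumulated sum
w_rms, s_rms = 5.0, 0.0    # RMSProp weight and decaying average

for _ in range(100):
    g = grad(w_ada)
    s_ada += g ** 2                                # sum of ALL past squared grads
    w_ada -= lr * g / (np.sqrt(s_ada) + eps)

    g = grad(w_rms)
    s_rms = alpha * s_rms + (1 - alpha) * g ** 2   # decaying average: old grads fade
    w_rms -= lr * g / (np.sqrt(s_rms) + eps)

print(f"Adagrad w: {w_ada:.4f}, RMSProp w: {w_rms:.4f}")
```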
Momentum
Movement: λ times the movement of the last step, minus η times the gradient at the present position; the parameters then move by this amount
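A minimal momentum sketch on the same toy objective f(w) = w² (the values of η and λ are arbitrary): the movement v remembers the previous step, damped by λ, and subtracts the current gradient scaled by η.

```python
# Gradient descent with momentum on f(w) = w^2.
def grad(w):
    return 2.0 * w

w, v = 5.0, 0.0
lr, lam = 0.1, 0.9   # eta (learning rate) and lambda (momentum)

for step in range(50):
    v = lam * v - lr * grad(w)   # movement = lambda * last movement - eta * gradient
    w = w + v                    # move by the accumulated movement

print(f"w after 50 steps: {w:.4f}")
```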