Big picture on why we need randomness in stochastic algorithms
- randomness during initialization: as the structure of the search space is unknown
- randomness during the progression of the search: avoid getting stuck in local optima
- This is achieved by the random sampling of training examples in SGD or mini-batch gradient descent
Note: examples of stochastic algorithms - stochastic gradient descent, genetic algorithms, simulated annealing
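The randomness during the search mentioned above can be sketched with a toy mini-batch sampler (the dataset and batch size here are made up for illustration): each epoch visits the examples in a fresh random order.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)  # toy dataset: 10 examples, 2 features

# Randomness during the search: each epoch, mini-batch gradient descent
# shuffles the data and splits it into batches.
order = rng.permutation(len(X))
batches = [X[order[i:i + 5]] for i in range(0, len(X), 5)]
print(len(batches))  # 2 batches of 5 examples each
```

Because the shuffle differs every epoch, the gradient at each step is a noisy estimate of the full-batch gradient, and that noise is what helps the search escape shallow local optima.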
Principle and methods for random initialization
Principle: initialize the weights of a neural network to small random values close to zero, e.g. drawn from [0, 0.1].
Methods: see https://keras.io/initializers/.
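A minimal sketch of the principle above, using plain NumPy rather than a Keras initializer (the layer shape 4×3 is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight matrix for a layer with 4 inputs and 3 units: each weight is
# drawn uniformly from [0, 0.1], i.e. random but close to zero.
W = rng.uniform(low=0.0, high=0.1, size=(4, 3))
b = np.zeros(3)  # biases are commonly initialized to zero

print(W.shape)                               # (4, 3)
print(W.min() >= 0.0 and W.max() <= 0.1)     # True
```

The Keras initializers linked above (e.g. `RandomUniform`, `glorot_uniform`) follow the same idea but scale the range to the layer's fan-in/fan-out.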
Reason for effectiveness of random initialization
If two hidden units with the same activation function are connected to the same inputs and initialized with identical weights, they receive identical gradient updates and therefore remain identical forever, so the network wastes capacity. Random initialization breaks this symmetry so the units can learn different features.
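This symmetry effect can be demonstrated numerically with a tiny two-unit hidden layer (the network, loss, and numbers below are illustrative assumptions, not from the source): with all-equal weights both units get the same gradient, while random weights give different gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_grads(W, v, x, y):
    # Gradient of a squared-error loss w.r.t. each hidden unit's weight
    # vector, for a 2-unit sigmoid hidden layer with a linear output.
    h = sigmoid(W @ x)        # hidden activations, shape (2,)
    err = (v @ h) - y         # scalar prediction error
    return np.outer(err * v * h * (1 - h), x)  # one row per hidden unit

x, y = np.array([1.0, -2.0]), 0.5

# Symmetric initialization: both rows of the gradient are identical,
# so after any number of updates the two units stay exact copies.
W_sym = np.full((2, 2), 0.05)
g_sym = hidden_grads(W_sym, np.array([0.05, 0.05]), x, y)
print(np.allclose(g_sym[0], g_sym[1]))  # True

# Random initialization breaks the symmetry: the gradients differ.
rng = np.random.default_rng(0)
W_rnd = rng.uniform(0.0, 0.1, size=(2, 2))
g_rnd = hidden_grads(W_rnd, rng.uniform(0.0, 0.1, size=2), x, y)
print(np.allclose(g_rnd[0], g_rnd[1]))  # False
```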