Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks." arXiv preprint arXiv:1312.6120 (2013). [Citations: 97].
[Timescale of Learning]
• Deep network learning time depends on the optimal (largest stable) learning rate.
• The optimal learning rate can be estimated as the inverse of the maximal eigenvalue of the Hessian over the region of interest.
• The optimal learning rate scales as O(1/L), where L is the number of layers.
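The estimate above can be sketched numerically: a minimal NumPy example that finds the largest Hessian eigenvalue by power iteration and sets the learning rate to its inverse. The toy matrix H stands in for the Hessian of a deep network's loss (an assumption for illustration; in practice the Hessian-vector product would come from autodiff rather than an explicit matrix).

```python
import numpy as np

# Toy stand-in for the Hessian of a loss over the region of interest:
# H = A A^T is symmetric positive semidefinite.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
H = A @ A.T

def hessian_vector_product(v):
    # For an explicit matrix this is just H @ v; for a real network one
    # would compute Hv with autodiff (Pearlmutter's trick) instead.
    return H @ v

def max_eigenvalue(hvp, dim, iters=100):
    # Power iteration: repeatedly apply the Hessian and renormalize,
    # converging to the eigenvector of the largest eigenvalue.
    v = rng.standard_normal(dim)
    for _ in range(iters):
        v = hvp(v)
        v /= np.linalg.norm(v)
    # Rayleigh quotient gives the corresponding eigenvalue.
    return v @ hvp(v)

lam_max = max_eigenvalue(hessian_vector_product, 10)
lr = 1.0 / lam_max  # learning-rate estimate: inverse of the max eigenvalue
print(lam_max, lr)
```

For plain gradient descent on a quadratic, step sizes up to 2/lam_max are stable; 1/lam_max is the conservative choice the note describes.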
[Motivations] Unsupervised pretraining speeds up optimization and acts as a special regularizer, steering solutions toward better generalization performance.
• Unsupervised pretraining finds a special class of orthogonalized, decoupled initial conditions that allow for rapid supervised learning.
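Such orthogonalized initial conditions can also be produced directly with random orthogonal weight matrices, the scaled orthogonal initialization Saxe et al. propose for deep linear nets. A minimal NumPy sketch, assuming QR decomposition of a Gaussian matrix as the sampling scheme:

```python
import numpy as np

def orthogonal_init(rows, cols, rng):
    # QR decomposition of a Gaussian matrix yields a random matrix with
    # orthonormal columns; fixing the signs via diag(R) removes the
    # sign ambiguity of the factorization.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))
    # Transpose when the requested shape is wide rather than tall.
    return q[:rows, :cols] if rows >= cols else q[:cols, :rows].T

rng = np.random.default_rng(0)
W = orthogonal_init(6, 4, rng)
# Columns are orthonormal: W^T W = I, so layers start out decoupled.
print(np.allclose(W.T @ W, np.eye(4)))
```

Initializing every layer this way preserves norms through depth, mimicking the decoupled dynamics the paper attributes to pretraining.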