Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks." arXiv preprint arXiv:1312.6120 (2013). [Citations: 97].
[Timescale of Learning]
• Deep network learning time depends on the optimal (largest stable) learning rate.
• The optimal learning rate can be estimated as the inverse of the maximal eigenvalue of the Hessian over the region of interest.
• The optimal learning rate scales as O(1/L), where L is the number of layers.
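The estimate above can be sketched numerically: a minimal NumPy example that finds the largest Hessian eigenvalue by power iteration and sets the learning rate to its inverse. The toy matrix H stands in for the Hessian of a deep network's loss (an assumption for illustration; in practice the Hessian-vector product would come from autodiff rather than an explicit matrix).

```python
import numpy as np

# Toy stand-in for the Hessian of a loss over the region of interest:
# H = A A^T is symmetric positive semidefinite.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
H = A @ A.T

def hessian_vector_product(v):
    # For an explicit matrix this is just H @ v; for a real network one
    # would compute Hv with autodiff (Pearlmutter's trick) instead.
    return H @ v

def max_eigenvalue(hvp, dim, iters=100):
    # Power iteration: repeatedly apply the Hessian and renormalize,
    # converging to the eigenvector of the largest eigenvalue.
    v = rng.standard_normal(dim)
    for _ in range(iters):
        v = hvp(v)
        v /= np.linalg.norm(v)
    # Rayleigh quotient gives the corresponding eigenvalue.
    return v @ hvp(v)

lam_max = max_eigenvalue(hessian_vector_product, 10)
lr = 1.0 / lam_max  # learning-rate estimate: inverse of the max eigenvalue
print(lam_max, lr)
```

For plain gradient descent on a quadratic, step sizes up to 2/lam_max are stable; 1/lam_max is the conservative choice the note describes.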
[Motivations] Unsupervised pretraining speeds up optimization and acts as a special regularizer, steering solutions toward better generalization performance.
• Unsupervised pretraining finds a special class of orthogonalized, decoupled initial conditions that allow for rapid supervised learning.
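Such orthogonalized initial conditions can also be produced directly with random orthogonal weight matrices, the scaled orthogonal initialization Saxe et al. propose for deep linear nets. A minimal NumPy sketch, assuming QR decomposition of a Gaussian matrix as the sampling scheme:

```python
import numpy as np

def orthogonal_init(rows, cols, rng):
    # QR decomposition of a Gaussian matrix yields a random matrix with
    # orthonormal columns; fixing the signs via diag(R) removes the
    # sign ambiguity of the factorization.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))
    # Transpose when the requested shape is wide rather than tall.
    return q[:rows, :cols] if rows >= cols else q[:cols, :rows].T

rng = np.random.default_rng(0)
W = orthogonal_init(6, 4, rng)
# Columns are orthonormal: W^T W = I, so layers start out decoupled.
print(np.allclose(W.T @ W, np.eye(4)))
```

Initializing every layer this way preserves norms through depth, mimicking the decoupled dynamics the paper attributes to pretraining.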