[Deep Learning Paper Notes][Weight Initialization] Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

This paper examines how weight initialization shapes the learning dynamics of deep networks. It shows that the optimal learning rate scales inversely with depth, and proposes initializing with random orthogonal matrices, which preserve statistics across layers and speed up learning. Unlike Gaussian matrices, orthogonal matrices preserve the norm of every vector exactly, so learning does not stall. In the nonlinear case, a good initialization requires the singular values of the input-output Jacobian to concentrate around 1.
Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks." arXiv preprint arXiv:1312.6120 (2013). [Citations: 97].


1 General Learning Dynamics of Gradient Descent
[Timescale of Learning]
• Deep-net learning time depends on the optimal (largest stable) learning rate.
• The optimal learning rate can be estimated as the inverse of the maximal eigenvalue of the Hessian over the region of interest.

• Optimal learning rate scales as O(1/L), where L is # of layers.
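A minimal sketch of the estimate above: the largest stable learning rate is roughly the inverse of the Hessian's top eigenvalue, which can be found by power iteration using only Hessian-vector products. The quadratic loss here is a hypothetical stand-in for a real network's loss surface; for a real net the `hvp` would come from autodiff.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
H = A.T @ A / n  # Hessian of the toy quadratic loss L(w) = 0.5 * w^T H w

def hvp(v):
    """Hessian-vector product; a real network would compute this via autodiff."""
    return H @ v

# Power iteration: repeatedly apply the Hessian and renormalize, so the
# iterate converges to the eigenvector of the maximal eigenvalue.
v = rng.standard_normal(n)
for _ in range(200):
    v = hvp(v)
    v /= np.linalg.norm(v)
lam_max = v @ hvp(v)          # Rayleigh quotient ≈ maximal eigenvalue

opt_lr = 1.0 / lam_max        # estimate of the largest stable step size
print(lam_max, opt_lr)
```

The same Hessian-vector trick is what makes this usable at scale: it never materializes the full Hessian, only `hvp` calls.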


2 Finding Good Weight Initializations
[Motivations] Unsupervised pretraining speeds up optimization and acts as a special regularizer, biasing toward solutions with better generalization performance.
• Unsupervised pretraining finds a special class of orthogonalized, decoupled initial conditions that allow for rapid supervised learning.
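A minimal sketch of the orthogonal initialization the paper advocates: draw a Gaussian matrix, take its QR decomposition, and use the orthogonal factor as the weight matrix. Unlike a Gaussian matrix, the orthogonal one preserves the norm of every input vector exactly, and all of its singular values equal 1 (the ideal Jacobian spectrum noted above). The sizes and sign fix here are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256

# Plain scaled-Gaussian init for comparison.
gauss = rng.standard_normal((n, n)) / np.sqrt(n)

# Orthogonal init: QR of a Gaussian matrix; multiplying each column of Q by
# the sign of R's diagonal makes the distribution uniform (Haar) over
# orthogonal matrices.
q, r = np.linalg.qr(rng.standard_normal((n, n)))
orth = q * np.sign(np.diag(r))

x = rng.standard_normal(n)
print(np.linalg.norm(orth @ x) / np.linalg.norm(x))   # exactly 1
print(np.linalg.norm(gauss @ x) / np.linalg.norm(x))  # fluctuates around 1

# Every singular value of an orthogonal matrix is 1, so repeated products
# across layers neither shrink nor blow up signal norms.
print(np.linalg.svd(orth, compute_uv=False))
```

Because norms are preserved layer after layer, stacking such matrices keeps forward activations and backpropagated gradients well conditioned at any depth, which is why learning is not impeded.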