Optimization and Deep Learning
BarryZhao000
3. Adaptive Learning Rate - Variants of SGD
When we discussed the momentum method, we saw that oscillation appears when the gradients along different directions differ greatly in magnitude. Momentum relies on an exponentially weighted moving average to make the update direction of the parameters more consistent, reducing the risk of loss divergence. Other algorithms instead adapt the learning rate itself, such as AdaGrad, Adam, and RMSProp.
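As a minimal sketch of the adaptive-learning-rate idea (my illustration, not code from the post), the AdaGrad update below divides each coordinate's step by the square root of its accumulated squared gradients. The toy objective, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Toy objective f(x) = 0.1 * x1^2 + 2 * x2^2: the gradient along x2 is
# 20x larger than along x1, the setting where plain SGD oscillates.
def grad(x):
    return np.array([0.2 * x[0], 4.0 * x[1]])

x = np.array([-5.0, -2.0])
s = np.zeros_like(x)      # per-coordinate sum of squared gradients
eta, eps = 0.4, 1e-6      # illustrative hyperparameters

for _ in range(100):
    g = grad(x)
    s += g * g                        # accumulate squared gradients
    x -= eta * g / np.sqrt(s + eps)   # steep coordinates get smaller steps

print(x)  # both coordinates move steadily toward the minimum at (0, 0)
```

Note that on the very first step both coordinates move by exactly eta even though their raw gradients differ by a factor of 20; this per-coordinate normalization is the idea that AdaGrad, RMSProp, and Adam share.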
2. Gradient Clipping for Gradient Exploding
Gradient Exploding: when the parameters approach a cliff region of the loss surface, a single gradient update step can move the learner to a very bad configuration (divergence). Gradient Clipping: to address the presence of such cliffs, the gradient is rescaled whenever its norm exceeds a threshold, bounding the size of each update step while preserving its direction.
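A concrete sketch of norm clipping (my illustration, not code from the post), assuming the common rule g ← g · θ/‖g‖ whenever ‖g‖ > θ:

```python
import numpy as np

def clip_gradient(g, theta):
    """Rescale g so its L2 norm is at most theta; direction is preserved."""
    norm = np.linalg.norm(g)
    if norm > theta:
        g = g * (theta / norm)
    return g

g = np.array([30.0, -40.0])            # an "exploded" gradient, ||g|| = 50
print(clip_gradient(g, theta=10.0))    # -> [ 6. -8.], norm capped at 10
```

Deep learning frameworks ship the same operation built in, e.g. torch.nn.utils.clip_grad_norm_ in PyTorch.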
1. SGD + Momentum for Oscillation and Plateau Problem
SGD has trouble navigating areas where the curvature of the loss surface is much steeper in one dimension than in the others. It ends up oscillating across the steep slopes while making only slow progress along the flat direction, i.e. the plateau problem; adding momentum damps the oscillation, as sketched below.
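A minimal sketch of the momentum update (illustrative, not the post's code), reusing the same ill-conditioned quadratic as above; the velocity v is the exponentially weighted average that damps oscillation across the steep direction:

```python
import numpy as np

def grad(x):
    return np.array([0.2 * x[0], 4.0 * x[1]])   # f(x) = 0.1*x1^2 + 2*x2^2

x = np.array([-5.0, -2.0])
v = np.zeros_like(x)
eta, beta = 0.4, 0.9      # illustrative hyperparameters

for _ in range(100):
    v = beta * v + eta * grad(x)   # velocity: running average of gradients
    x -= v                         # oscillating components of g cancel in v

print(x)  # ends close to the minimum at (0, 0)
```

At this step size, plain SGD flips the sign of the steep coordinate on every iteration (update factor 1 - 0.4 · 4 = -0.6), while the averaged velocity cancels the alternating gradients and lets the consistent direction accumulate speed.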