- The SGD method has trouble navigating regions where the curvature is much steeper in one dimension than in another.
- It ends up oscillating along the steep slope and makes slow progress toward the optimum.
We can see that, at the same position, the absolute value of the objective function's slope in the vertical direction is much larger than its slope in the horizontal direction. For a given learning rate, SGD therefore moves the variable too far in the vertical direction at each iteration, possibly even overshooting the optimum. This repeated oscillation slows progress toward the optimum.
We would like the path toward the optimum to be smoother, with faster movement in the horizontal direction: Momentum.
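A minimal sketch of this behavior, assuming a toy objective $f(x_1, x_2) = 0.1x_1^2 + 2x_2^2$ (flat along $x_1$, steep along $x_2$) and an illustrative learning rate:

```python
import numpy as np

# Toy objective f(x1, x2) = 0.1*x1**2 + 2*x2**2: flat along x1, steep along x2.
def grad(x):
    return np.array([0.2 * x[0], 4.0 * x[1]])

def sgd(x0, lr, steps):
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        x -= lr * grad(x)          # x_t = x_{t-1} - eta * g_t
        path.append(x.copy())
    return np.array(path)

path = sgd(x0=(-5.0, -2.0), lr=0.4, steps=20)
# x2 flips sign every step (oscillation) while x1 shrinks only slowly.
print(path[:5])
print(path[-1])
```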
Incorporation of Momentum
$v_t = \beta v_{t-1} + \eta_t g_t$, where $\beta \in [0, 1)$, $v_0 = 0$, and $g_t$ is the gradient at the current $x$
$x_t = x_{t-1} - v_t$
- if $\beta = 0$, it's the normal SGD method
- $v_t = \beta v_{t-1} + (1-\beta)\frac{\eta_t g_t}{1-\beta} = \beta^t v_0 + (1-\beta)\sum_{i=0}^{t-1}\beta^{i}\frac{\eta_{t-i} g_{t-i}}{1-\beta}$
- The weight on $\frac{\eta_{t-i} g_{t-i}}{1-\beta}$ is $(1-\beta)\beta^i$, which decreases exponentially as $i$ increases.
- $v_t$ is the exponentially weighted moving average of the past terms $\frac{\eta_{t-i} g_{t-i}}{1-\beta}$
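A minimal sketch of these two update rules, using the same toy gradient as above (the learning rate and $\beta$ here are illustrative assumptions); setting $\beta = 0$ recovers plain SGD:

```python
import numpy as np

def grad(x):
    # Same toy objective: f(x1, x2) = 0.1*x1**2 + 2*x2**2
    return np.array([0.2 * x[0], 4.0 * x[1]])

def sgd_momentum(x0, lr, beta, steps):
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)               # v_0 = 0
    for _ in range(steps):
        v = beta * v + lr * grad(x)    # v_t = beta * v_{t-1} + eta_t * g_t
        x = x - v                      # x_t = x_{t-1} - v_t
    return x

print(sgd_momentum((-5.0, -2.0), lr=0.4, beta=0.5, steps=20))  # smoother path
print(sgd_momentum((-5.0, -2.0), lr=0.4, beta=0.0, steps=20))  # beta=0: plain SGD
```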
- Let $n = \frac{1}{1-\beta}$, so $\beta = 1 - \frac{1}{n}$
- $\lim_{n\to\infty}\left(1-\frac{1}{n}\right)^{n} = \frac{1}{e} \approx 0.3679$
- If we treat this number as small, then we can ignore all terms containing $\beta^{\frac{1}{1-\beta}}$ and higher powers when $\beta \rightarrow 1$, i.e. $\beta \in [0.9, 1)$.
For example: if $\beta = 0.95$, then $\frac{1}{1-\beta} = 20$ and $v_t \approx (1-\beta)\sum_{i=0}^{19}\beta^{i}\frac{\eta_{t-i} g_{t-i}}{1-\beta}$
- For momentum, the update applied to $x$ is approximately equal to the exponentially weighted moving average of the previous $\frac{1}{1-\beta}$ update terms ($\eta_{t-i} g_{t-i}$), divided by $1-\beta$
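A quick numerical check of this reasoning, showing that $\beta^{\frac{1}{1-\beta}}$ stays close to $\frac{1}{e} \approx 0.3679$ for $\beta \in [0.9, 1)$, so the terms older than the most recent $\frac{1}{1-\beta}$ steps carry little weight:

```python
for beta in (0.9, 0.95, 0.99):
    n = round(1 / (1 - beta))
    # Weight left on terms older than the most recent n steps is of order beta**n.
    print(f"beta={beta}, n={n}, beta**n={beta ** n:.4f}")  # all close to 1/e
```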
Momentum
- Reduces updates along directions where the gradient changes frequently
- Increases updates along directions where the gradients are consistent
- Dampens oscillations
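For reference, deep learning libraries expose momentum as an optimizer option. A small usage sketch with PyTorch's torch.optim.SGD and an illustrative model (note that PyTorch's formulation applies the learning rate to the whole velocity, i.e. $v_t = \beta v_{t-1} + g_t$ and $x_t = x_{t-1} - \eta v_t$, rather than scaling each gradient inside $v_t$ as in the equations above):

```python
import torch

model = torch.nn.Linear(2, 1)   # illustrative model
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x = torch.randn(8, 2)
loss = model(x).pow(2).mean()   # dummy loss
loss.backward()
opt.step()                      # one SGD-with-momentum step
opt.zero_grad()
```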
Figure from Zhihu: https://zhuanlan.zhihu.com/p/34240246