July 24 Deep Learning Notes: Optimization


Preface

This post contains the deep learning notes for July 24, organized into five sections:

  • Towards Improving Adam: AMSGrad, AdaBound;
  • Towards Improving SGDM: Cyclical LR, SGDR, One-cycle LR;
  • Combination of Adam and SGDM: SWATS, RAdam;
  • Summary;
  • References.

一、Towards Improving Adam

Characteristics of Adam:

  • fast training;
  • large generalization gap;
  • unstable.

1、AMSGrad

Reddi et al. show that Adam's convergence issues can be fixed by endowing such algorithms with "long-term memory" of past gradients, and they propose new variants of the ADAM algorithm which not only fix the convergence issues but often also lead to improved empirical performance. [Reddi2019]
$$
\hat v_t = \max(\hat v_{t-1},\, v_t), \qquad
\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat v_t} + \varepsilon}\, m_t
$$
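If you want to try AMSGrad in practice, PyTorch exposes it as a flag on `torch.optim.Adam`. Below is a minimal sketch with a toy linear model and random data (both are placeholders, not from the notes):

```python
import torch

model = torch.nn.Linear(10, 1)                      # toy model for illustration
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)      # synthetic data

# amsgrad=True keeps the running maximum of v_t, i.e. v_hat_t = max(v_hat_{t-1}, v_t)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```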

2、AdaBound

In this paper, Luo et al. demonstrate that extreme learning rates can lead to poor performance. They provide new variants of ADAM and AMSGRAD, called ADABOUND and AMSBOUND respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD, and they give a theoretical proof of convergence. [Luo2019]
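To make the dynamic-bound idea concrete, here is a minimal NumPy sketch of one AdaBound-style update. It omits bias correction, and the bound schedules and the `final_lr`/`gamma` hyperparameter names are assumptions made for illustration rather than a faithful reproduction of the official implementation:

```python
import numpy as np

def adabound_step(theta, m, v, g, t, lr=1e-3, final_lr=0.1,
                  beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """One AdaBound-style update on a NumPy parameter vector (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g          # first moment, as in Adam
    v = beta2 * v + (1 - beta2) * g ** 2     # second moment, as in Adam
    # Lower/upper bounds both converge to final_lr, so the per-coordinate
    # adaptive step size gradually collapses to a single SGD-like learning rate.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    step_size = np.clip(lr / (np.sqrt(v) + eps), lower, upper)
    return theta - step_size * m, m, v
```

Early in training the bounds are loose and the update behaves like Adam; as t grows they tighten around `final_lr` and the update behaves like SGD, which is exactly the smooth transition the paper describes.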


二、Towards Improving SGDM

Characteristics of SGDM:

  • stable;
  • little generalization gap;
  • better convergence.

1、Cyclical LR

This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values. Training with cyclical learning rates instead of fixed values achieves improved classification accuracy without a need to tune and often in fewer iterations. [Smith2015]
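PyTorch ships this schedule as `torch.optim.lr_scheduler.CyclicLR`. A minimal sketch, with the boundary values, step size, and toy model chosen arbitrarily for illustration:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# LR varies linearly between base_lr and max_lr; one half-cycle is step_size_up batches.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-1, step_size_up=2000, mode="triangular")

x, y = torch.randn(64, 10), torch.randn(64, 1)       # synthetic data
for step in range(4000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()        # advance the cyclical schedule once per batch
```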

2、SGDR

In this paper, Loshchilov and Hutter propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. [Loshchilov2016]
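In PyTorch the warm-restart schedule is available as `CosineAnnealingWarmRestarts`; the restart period `T_0` and multiplier `T_mult` below are arbitrary example values:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Cosine-anneal the LR over T_0 epochs, then restart it at the initial value;
# each subsequent period is T_mult times longer than the previous one.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2)
# Call scheduler.step() once per epoch (or per batch), exactly as in the
# CyclicLR training loop above.
```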

3、One-cycle LR

In this paper, Smith and Topin describe a phenomenon, which they name "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. One of the key elements of super-convergence is training with one learning-rate cycle and a large maximum learning rate. [Smith2017]

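PyTorch implements this as `torch.optim.lr_scheduler.OneCycleLR`; the `max_lr` and `total_steps` values below are illustrative only:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
total_steps = 1000                              # number of batches in the whole run
# The LR ramps up to max_lr and then anneals back down over a single cycle.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.5, total_steps=total_steps)
# scheduler.step() must be called once per batch, total_steps times in all.
```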

三、Combination of Adam and SGDM

1、SWATS

SWATS is a simple strategy that switches from Adam to SGD when a triggering condition is satisfied. The condition Keskar and Socher propose relates to the projection of the Adam steps onto the gradient subspace. [Keskar2017]
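The switching idea can be sketched as follows. The trigger here is a fixed step count purely to keep the example short; the actual SWATS criterion monitors the projection of the Adam step onto the gradient direction and switches once the implied SGD learning rate stabilizes, and all hyperparameters below are placeholders:

```python
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # start adaptive
switched = False

for step in range(10000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

    # Simplified trigger: switch to SGD with momentum after a fixed number of
    # steps. SWATS instead derives the SGD learning rate from the projection
    # of the Adam step onto the gradient and switches when that estimate
    # converges.
    if not switched and step == 5000:
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        switched = True
```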

2、RAdam

Liu et al. propose Rectified Adam (RAdam), a variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate. [Liu2019]
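Recent PyTorch releases (1.10+) include RAdam as `torch.optim.RAdam`, so it can be swapped in as a drop-in replacement for Adam; the toy model and hyperparameters below are placeholders:

```python
import torch

model = torch.nn.Linear(10, 1)
# The variance-rectification term removes the need for a hand-tuned warmup
# during the first updates, when the second-moment estimate is still noisy.
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

x, y = torch.randn(64, 10), torch.randn(64, 1)
for step in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```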

四、Summary

In short: Adam-style adaptive methods train fast but can be unstable and tend to show a larger generalization gap, while SGDM is stable, generalizes better, and often converges to better solutions. AMSGrad and AdaBound patch Adam's convergence and extreme-learning-rate problems; cyclical, warm-restart, and one-cycle schedules make SGDM easier to tune and faster to train; SWATS and RAdam try to combine the strengths of both families.

五、References

Keskar, N. S., & Socher, R. (2017). Improving generalization performance by switching from Adam to SGD. arXiv. Retrieved from https://arxiv.org/abs/1712.07628 doi: 10.48550/ARXIV.1712.07628

Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., & Han, J. (2019). On the variance of the adaptive learning rate and beyond. arXiv. Retrieved from https://arxiv.org/abs/1908.03265 doi: 10.48550/ARXIV.1908.03265

Loshchilov, I., & Hutter, F. (2016). SGDR: Stochastic gradient descent with warm restarts. arXiv. Retrieved from https://arxiv.org/abs/1608.03983 doi: 10.48550/ARXIV.1608.03983

Luo, L., Xiong, Y., Liu, Y., & Sun, X. (2019). Adaptive gradient methods with dynamic bound of learning rate. arXiv. Retrieved from https://arxiv.org/abs/1902.09843 doi: 10.48550/ARXIV.1902.09843

Reddi, S. J., Kale, S., & Kumar, S. (2019). On the convergence of Adam and beyond. arXiv. Retrieved from https://arxiv.org/abs/1904.09237 doi: 10.48550/ARXIV.1904.09237

Smith, L. N. (2015). Cyclical learning rates for training neural networks. arXiv. Retrieved from https://arxiv.org/abs/1506.01186 doi: 10.48550/ARXIV.1506.01186

Smith, L. N., & Topin, N. (2017). Super-convergence: Very fast training of neural networks using large learning rates. arXiv. Retrieved from https://arxiv.org/abs/1708.07120 doi: 10.48550/ARXIV.1708.07120

