July 24 Deep Learning Notes: Optimization

Ashen_0nee

Published 2022-07-24 10:35:51

Tags: deep learning, artificial intelligence, machine learning

Article link: https://blog.csdn.net/Ashen_0nee/article/details/125951488

Contents

Preface
I. Towards Improving Adam
- 1. AMSGrad
- 2. AdaBound
II. Towards Improving SGDM
- 1. Cyclical LR
- 2. SGDR
- 3. One-cycle LR
III. Combination of Adam and SGDM
- 1. SWATS
- 2. RAdam
IV. Summary
V. References

Preface

These are my deep learning notes for July 24, organized into five sections:

Towards Improving Adam: AMSGrad, AdaBound;
Towards Improving SGDM: Cyclical LR, SGDR, One-cycle LR;
Combination of Adam and SGDM: SWATS, RAdam;
Summary;
References.

I. Towards Improving Adam

Characteristics of Adam:

fast training;
large generalization gap;
unstable.

1. AMSGrad

Reddi et al. show that the convergence issues of Adam-style algorithms can be fixed by endowing them with “long-term memory” of past gradients, and propose new variants of the Adam algorithm that not only fix the convergence issues but often also lead to improved empirical performance. ^[Reddi2019]

$$
\begin{aligned}
\hat v_t &= \max(\hat v_{t-1}, v_t) \\
\theta_t &= \theta_{t-1}-\frac{\eta}{\sqrt{\hat v_t}+\varepsilon}m_t
\end{aligned}
$$
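The update above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes that $m_t$ and $v_t$ are Adam's usual exponential moving averages of the gradient and the squared gradient (the notes do not define them), it omits bias correction to match the formula as written, and the helper names `init_state` and `amsgrad_step` as well as the default hyperparameters are hypothetical.

```python
import math


def init_state(n):
    """Create zeroed buffers for n parameters (hypothetical helper)."""
    return {"m": [0.0] * n, "v": [0.0] * n, "v_hat": [0.0] * n}


def amsgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one AMSGrad update to a list of floats and return the new list."""
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        # Assumed Adam-style moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Long-term memory: v_hat is a running maximum, so it never decreases.
        state["v_hat"][i] = max(state["v_hat"][i], state["v"][i])
        step = lr / (math.sqrt(state["v_hat"][i]) + eps) * state["m"][i]
        new_theta.append(p - step)
    return new_theta


# Usage: a large gradient followed by a small one; v shrinks but v_hat does not.
state = init_state(1)
theta = amsgrad_step([1.0], [10.0], state)
theta = amsgrad_step(theta, [0.01], state)
```

Because `v_hat` only grows, the effective step size $\eta / (\sqrt{\hat v_t} + \varepsilon)$ cannot increase after a run of small gradients, which is the "long-term memory" the quoted passage refers to.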

2. AdaBound

In this paper, Luo et al. demonstrate that extreme learning rates can lead to poor performance. They provide new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD, and they give a theoretical proof of convergence. ^[Luo2019]
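The idea of dynamic bounds can be illustrated with a short sketch. The specific bound functions below (a lower bound rising from near 0 and an upper bound falling from a very large value, both converging to `final_lr`) are an assumption chosen for illustration and are not taken from these notes; `adabound_bounds`, `clipped_step_size`, `final_lr`, and `gamma` are hypothetical names.

```python
def adabound_bounds(t, final_lr=0.1, gamma=1e-3):
    """Return (lower, upper) bounds on the per-parameter step size at step t >= 1.

    Assumed form: both bounds converge to final_lr as t grows.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    return lower, upper


def clipped_step_size(adaptive_lr, t, final_lr=0.1, gamma=1e-3):
    """Clip an Adam-style step size (lr / sqrt(v)) into the bounds at step t."""
    lower, upper = adabound_bounds(t, final_lr, gamma)
    return min(max(adaptive_lr, lower), upper)


# Early in training the bounds are loose, so the adaptive value passes through.
early = clipped_step_size(0.05, t=1)
# Late in training the bounds are tight, so an extreme value is pulled to ~final_lr.
late = clipped_step_size(50.0, t=10**9)
```

Early on the method behaves like Adam because almost nothing is clipped; late in training every parameter uses nearly the same step size, which is how the transition towards SGD comes about.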

II. Towards Improving SGDM

Characteristics of SGDM:

stable;
small generalization gap;
better convergence.

1. Cyclical LR

This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values. Training with cyclical learning rates instead of fixed values achieves improved classification accuracy without a need to tune and often in fewer iterations. ^[Smith2015]
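A minimal sketch of the triangular policy, in which the learning rate rises linearly from a lower bound to an upper bound and back again. The function name `triangular_lr` and the example values are illustrative assumptions; `step_size` is the number of iterations in half a cycle.

```python
import math


def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate for a given iteration (0-based)."""
    # Index of the current cycle; each cycle spans 2 * step_size iterations.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Usage: one full cycle with step_size=4 goes base -> max -> base.
schedule = [triangular_lr(i, 4, 0.001, 0.006) for i in range(9)]
```

The learning rate starts at `base_lr`, peaks at `max_lr` after `step_size` iterations, and returns to `base_lr` after `2 * step_size` iterations, then repeats.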

2. SGDR

In this paper, Loshchilov and Hutter propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. ^[Loshchilov2016]
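Warm restarts are commonly paired with cosine annealing: within each run the learning rate decays along a half cosine from `eta_max` to `eta_min`, then jumps back to `eta_max` at the restart. The sketch below assumes that schedule; the name `sgdr_lr` and the default values are illustrative, with `t_0` the length of the first run in epochs and `t_mult` the factor by which each subsequent run is lengthened.

```python
import math


def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (epoch may be fractional)."""
    t_i, t_cur = t_0, epoch
    # Skip over completed runs to find the position inside the current run.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Usage: restarts occur at epochs 10 and 30 with the defaults above.
schedule = [sgdr_lr(e) for e in range(31)]
```

With `t_mult=2` the runs last 10, 20, 40, ... epochs, so restarts become less frequent as training progresses.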

3. One-cycle LR

In this paper, Smith and Topin describe a phenomenon, which they named “super-convergence”, where neural networks can be trained an order of magnitude faster than with standard training methods. One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate. ^[Smith2017]
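One way to sketch a single learning rate cycle is a linear ramp up to a large maximum, a linear ramp back down, and a short tail in which the rate drops well below its starting value. The phase lengths and the final divisor below are assumptions made for illustration, not values from the paper or these notes; `one_cycle_lr`, `cycle_frac`, and `final_div` are hypothetical names.

```python
def one_cycle_lr(iteration, total_iters, base_lr=0.01, max_lr=0.1,
                 cycle_frac=0.9, final_div=100.0):
    """Learning rate for a single-cycle schedule over total_iters iterations."""
    cycle_len = cycle_frac * total_iters  # assumed: the cycle covers 90% of training
    half = cycle_len / 2
    if iteration <= half:
        # Phase 1: linear warm-up from base_lr to max_lr.
        return base_lr + (max_lr - base_lr) * iteration / half
    if iteration <= cycle_len:
        # Phase 2: linear decay from max_lr back to base_lr.
        return max_lr - (max_lr - base_lr) * (iteration - half) / half
    # Phase 3: final decay from base_lr to base_lr / final_div.
    tail = (iteration - cycle_len) / (total_iters - cycle_len)
    final_lr = base_lr / final_div
    return base_lr - (base_lr - final_lr) * tail


# Usage: 100 iterations; the peak is reached at iteration 45.
schedule = [one_cycle_lr(i, 100) for i in range(101)]
```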

III. Combination of Adam and SGDM

1. SWATS

Keskar and Socher propose SWATS, a simple strategy that switches from Adam to SGD when a triggering condition is satisfied. The condition they propose relates to the projection of Adam steps onto the gradient subspace. ^[Keskar2017]
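The triggering condition can be sketched as follows, under my reading of the method: at each step, find the SGD learning rate whose step, projected onto the gradient direction, matches the Adam step; keep a bias-corrected moving average of that estimate; and switch once the average and the current estimate agree. All function names, the tolerance `eps`, and the reuse of `beta2` for the averaging are assumptions for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def projected_sgd_lr(adam_step, grad):
    """Estimate gamma such that an SGD step -gamma * grad matches the Adam step
    when projected onto the gradient. Returns None if the projection is zero."""
    denom = -dot(adam_step, grad)
    if denom == 0:
        return None
    return dot(adam_step, adam_step) / denom


def swats_check(lam_prev, gamma, k, beta2=0.999, eps=1e-9):
    """Update the moving average of gamma at step k (1-based) and test the trigger.

    Returns (new_average, switch_now, bias_corrected_average).
    """
    lam = beta2 * lam_prev + (1 - beta2) * gamma
    corrected = lam / (1 - beta2 ** k)
    switch = k > 1 and abs(corrected - gamma) < eps
    return lam, switch, corrected


# Usage: if the Adam step is exactly -0.1 * grad, the estimate is 0.1.
gamma = projected_sgd_lr([-0.1, -0.2], [1.0, 2.0])
```

Once the switch fires, the bias-corrected average would serve as the learning rate for the SGD phase.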

2. RAdam

Liu et al. further propose Rectified Adam (RAdam), a novel variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate.^[Liu2019]
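The rectification term can be computed from the step count and $\beta_2$ alone. The sketch below follows the formula as I understand it from the paper: when the approximated degrees of freedom $\rho_t$ are too small, the adaptive term is skipped entirely; otherwise the adaptive step is scaled by a factor $r_t \le 1$ that approaches 1 as training proceeds. The function name `radam_rectification` and the threshold of 4 are assumptions stated here rather than taken from these notes.

```python
import math


def radam_rectification(t, beta2=0.999):
    """Return the rectification factor r_t at step t >= 1, or None when the
    variance of the adaptive learning rate is treated as intractable."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Too few effective samples: fall back to an un-adapted momentum step.
        return None
    numerator = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    denominator = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(numerator / denominator)


# Usage: the factor is undefined at the first step and grows towards 1 later.
factors = [radam_rectification(t) for t in (1, 100, 1000, 10000)]
```

In effect this acts as an automatic warm-up: early steps are damped or use plain momentum, and the full adaptive step is only used once enough gradients have been observed.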

IV. Summary

V. References

Keskar, N. S., & Socher, R. (2017). Improving generalization performance by switching from Adam to SGD. arXiv. Retrieved from https://arxiv.org/abs/1712.07628 doi: 10.48550/ARXIV.1712.07628

Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., & Han, J. (2019). On the variance of the adaptive learning rate and beyond. arXiv. Retrieved from https://arxiv.org/abs/1908.03265 doi: 10.48550/ARXIV.19

Loshchilov, I., & Hutter, F. (2016). SGDR: Stochastic gradient descent with warm restarts. arXiv. Retrieved from https://arxiv.org/abs/1608.03983 doi: 10.48550/ARXIV.1608.03983

Luo, L., Xiong, Y., Liu, Y., & Sun, X. (2019). Adaptive gradient methods with dynamic bound of learning rate. arXiv. Retrieved from https://arxiv.org/abs/1902.09843 doi: 10.48550/ARXIV.1902.09843

Reddi, S. J., Kale, S., & Kumar, S. (2019). On the convergence of Adam and beyond. arXiv. Retrieved from https://arxiv.org/abs/1904.09237 doi: 10.48550/ARXIV.1904.09237

Smith, L. N. (2015). Cyclical learning rates for training neural networks. arXiv. Retrieved from https://arxiv.org/abs/1506.01186 doi: 10.48550/ARXIV.1506.01186

Smith, L. N., & Topin, N. (2017). Super-convergence: Very fast training of neural networks using large learning rates. arXiv. Retrieved from https://arxiv.org/abs/1708.07120 doi: 10.48550/ARXIV.17
