A Brief Look at Optimization Methods in Deep Learning

Latest recommended article published 2022-02-19 15:46:44

何雷

Categories: DNN, Optimization Methods in Machine Learning. Tags: deep learning, SGD, Momentum, Adagrad, Adam

Article link: https://blog.csdn.net/helei001/article/details/54379446

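Deep learning is currently at the frontier of machine learning and is typically applied to learning problems with large amounts of data. Its optimization methods derive from the basic optimization methods of machine learning, but they differ in some respects.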

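In summary, these methods are all built on stochastic gradient descent (SGD), with the following improvements to make learning more adaptive:

1. Because each iteration trains on different data, the gradient of the objective function can change sharply from one step to the next. To address this, the gradient from previous iterations is combined with the current gradient, weighted by exponential decay, so that the update changes more smoothly.

2. As the iterate approaches the optimum, the learning rate (i.e., the step size) needs to become smaller and smaller so that the iterate can close in on the optimum; otherwise the iterate oscillates and the network fails to converge. To address this, a mechanism that adaptively decreases the learning rate is introduced.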
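The two ideas above can be sketched on a toy problem. The following is a minimal illustration, not the implementation used by any particular library. It assumes a one-dimensional objective f(w) = 0.5 * w^2 with Gaussian noise added to the gradient to imitate mini-batch variation. The function names (`noisy_grad`, `sgd_momentum`, `adagrad`) and all hyperparameter values are hypothetical choices made for this example. The momentum update uses the exponential-moving-average form, which is one common variant, and Adagrad stands in as one example of an adaptively decreasing learning rate.

```python
import math
import random


def noisy_grad(w, rng, noise=0.5):
    """Gradient of the toy objective f(w) = 0.5 * w**2, plus Gaussian
    noise that imitates the variation between mini-batches (assumption)."""
    return w + rng.gauss(0.0, noise)


def sgd_momentum(w, steps, lr=0.1, beta=0.9, seed=0):
    """Idea 1: blend past gradients with the current one using
    exponential decay, so the update changes smoothly between steps."""
    rng = random.Random(seed)
    v = 0.0  # exponentially decayed average of past gradients
    for _ in range(steps):
        g = noisy_grad(w, rng)
        v = beta * v + (1.0 - beta) * g
        w -= lr * v
    return w


def adagrad(w, steps, lr=0.5, eps=1e-8, seed=0):
    """Idea 2: shrink the effective step size as training proceeds by
    dividing by the root of the accumulated squared gradients."""
    rng = random.Random(seed)
    g2_sum = 0.0      # running sum of squared gradients
    step_sizes = []   # effective learning rate used at each step
    for _ in range(steps):
        g = noisy_grad(w, rng)
        g2_sum += g * g
        step = lr / (math.sqrt(g2_sum) + eps)
        step_sizes.append(step)
        w -= step * g
    return w, step_sizes


# Usage: start both optimizers far from the optimum at w = 0.
w_momentum = sgd_momentum(5.0, steps=200)
w_adagrad, sizes = adagrad(5.0, steps=200)
print("momentum final w:", w_momentum)
print("adagrad final w:", w_adagrad)
print("adagrad first/last step size:", sizes[0], sizes[-1])
```

In this sketch the Adagrad step size can only shrink, because the accumulated sum of squared gradients never decreases. That monotone decay is the motivation for later methods in the reference list below, such as RMSProp, Adadelta, and Adam, which replace the running sum with a decayed average.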

References:

¹⁾ Ruder, An overview of gradient descent optimization algorithms http://sebastianruder.com/optimizing-gradient-descent/index.html#gradientdescentoptimizationalgorithms

²⁾ https://climin.readthedocs.org/en/latest/#optimizer-overview

³⁾ Schaul, Antonoglou, Silver, Unit Tests for Stochastic Optimization

⁴⁾ Sutskever, Martens, Dahl, and Hinton, “On the importance of initialization and momentum in deep learning” (ICML 2013)

⁵⁾ Dyer, “Notes on AdaGrad”

⁶⁾ Duchi, Hazan, and Singer, “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (COLT 2010)

⁷⁾ Hinton, Srivastava, and Swersky, “rmsprop: Divide the gradient by a running average of its recent magnitude”

⁸⁾ Dauphin, de Vries, Chung, and Bengio, “RMSProp and equilibrated adaptive learning rates for non-convex optimization”

⁹⁾ Graves, “Generating Sequences with Recurrent Neural Networks”

¹⁰⁾ Zeiler, “Adadelta: An Adaptive Learning Rate Method”

¹¹⁾ Kingma and Ba, “Adam: A Method for Stochastic Optimization”

¹²⁾ http://colinraffel.com/wiki/stochastic_optimization_techniques