STOCHASTIC GRADIENT/MIRROR DESCENT: MINIMAX OPTIMALITY AND IMPLICIT REGULARIZATION (ICLR 2019)

Abstract:

Useful phrases:

1. SGD has become increasingly popular in optimization

2. it is now widely recognized that ...

3. play a key role in reaching "good" solutions that ...

4. In an attempt to shed some light on why this is the case

5. we also argue

6. provide insights into

Vocabulary:

1. generalize: perform well beyond the training data
2. re-visit
3. properties
4. holds for
5. sufficiently
6. namely convergence
7. implicit regularization: regularization induced by the algorithm itself rather than by an explicit penalty term
8. over-parameterized: having more parameters than training data points
9. interpolating regime: the regime in which the model can fit all training data points exactly

Summary of the abstract: We revisit some minimax properties of SGD and extend them to general stochastic mirror descent (SMD) algorithms. In particular, we show that there is a fundamental identity which holds for SMD (and SGD), and which implies the minimax optimality of SMD (and SGD) for sufficiently small step size, and for a general class of loss functions and general nonlinear models. We further show that this identity can be used to naturally establish other properties of SMD (and SGD). We also argue how this identity can be used in the so-called "highly over-parameterized" nonlinear setting.

The paper makes three main points: 1. it introduces the fundamental identity; 2. the identity can be used to naturally establish the convergence and implicit-regularization properties of SMD (and SGD); 3. the identity applies in the "highly over-parameterized" nonlinear setting.
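To make the algorithm under discussion concrete, here is a minimal NumPy sketch of one stochastic mirror descent step. This is my own illustration, not the paper's code; the names `smd_step`, `grad_psi`, and `grad_psi_inv` are hypothetical. SMD applies the gradient step in the "mirror" domain defined by a potential function psi, and with the quadratic potential it reduces to plain SGD.

```python
import numpy as np

def smd_step(w, grad, lr, grad_psi, grad_psi_inv):
    # Stochastic mirror descent: the update happens in the mirror
    # domain defined by the potential psi:
    #   grad_psi(w_new) = grad_psi(w) - lr * grad
    return grad_psi_inv(grad_psi(w) - lr * grad)

# With the quadratic potential psi(w) = 0.5 * ||w||^2, grad_psi is the
# identity map, so SMD reduces to the familiar SGD step w - lr * grad.
identity = lambda w: w
w = np.array([1.0, -2.0, 3.0])
g = np.array([0.5, 0.5, -1.0])
sgd_step = smd_step(w, g, lr=0.1, grad_psi=identity, grad_psi_inv=identity)

# A q-norm potential psi(w) = (1/q) * ||w||_q^q gives a genuinely
# different update (shown here for q = 3).
q = 3
grad_psi_q = lambda w: np.sign(w) * np.abs(w) ** (q - 1)
grad_psi_q_inv = lambda z: np.sign(z) * np.abs(z) ** (1.0 / (q - 1))
smd_q_step = smd_step(w, g, lr=0.1, grad_psi=grad_psi_q, grad_psi_inv=grad_psi_q_inv)
```

The choice of potential is exactly what distinguishes the members of the SMD family from one another; the identity in the paper is stated for this general update.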

1 Introduction

Vocabulary:

1. good generalization properties
2. initially
3. interpolate
4. uncountably infinitely many
5. drastically
6. trivial
7. non-convexity
8. saddle points
9. discrepancy

Useful phrases:

1. arguably, remains somewhat of a mystery to this day
2. it has been observed that
3. in other words
4. in particular
5. even though it may seem at first that
6. what is even more interesting is that
7. in the absence of
8. which again highlights the important role of the optimization algorithm in generalization
9. despite this recent progress
10. would be of great interest

Paragraph 1: (SGD and its variants) Optimization algorithms used to train these models play a key role in learning parameters that generalize well.

Paragraph 2: Which minimum among all the possible minima we pick in practice is determined by the optimization algorithm used to train the model. There is a discrepancy between the solutions reached by different algorithms, which again highlights the important role of the optimization algorithm in generalization.

Paragraph 3: Most results explaining the behavior of optimization algorithms are limited to linear or very simplistic models. Therefore, a general characterization of the behavior of stochastic descent algorithms for more general models would be of great interest.

1.1 OUR CONTRIBUTION

Vocabulary: 1. We do so by ...  2. speculative arguments

Useful phrases:

1. we also use the theory developed in this paper to provide some speculative arguments into why ...

2. common to deep learning

3. in an attempt to make the paper easier to follow

4. we demonstrate some implications of this theory

5. we finally conclude with some remarks

Summary: In this paper, the authors present the stochastic mirror descent (SMD) family of algorithms.

Main contributions:

1. Equations 2 and 5 define the algorithm.

2. SMD is shown to be the optimal solution to a minimax filtering problem.

3. This generalizes several results in the literature.

4. The identity recovers known optimization/learning properties from the literature, namely the implicit regularization of SMD.

5. It yields new convergence results for SMD.

6. In the nonlinear setting, SMD is shown to exhibit convergence and implicit-regularization behavior similar to that observed in deep learning.
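The implicit-regularization property in points 4 and 6 can be stated as a formula. Under the interpolating assumption (the model can fit every data point), the result says that SMD with potential $\psi$ converges to the interpolating solution closest to the initialization $w_0$ in Bregman divergence; the notation below follows standard mirror-descent conventions and is my paraphrase, not a quote from the paper:

```latex
w_\infty \;=\; \operatorname*{arg\,min}_{w \,:\, y_i = f(x_i, w)\ \forall i} \; D_\psi(w, w_0),
\qquad
D_\psi(w, w') \;=\; \psi(w) - \psi(w') - \nabla\psi(w')^{\top}(w - w').
```

For SGD, $\psi(w) = \tfrac{1}{2}\|w\|^2$ and $D_\psi(w, w_0) = \tfrac{1}{2}\|w - w_0\|^2$, so SGD picks the interpolating solution closest to the initialization in Euclidean distance.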

Roadmap: Section 3 describes the main ideas and results in a simple setting: SGD on the square loss of linear models.

Section 4 covers the general theory (related to H-infinity estimation), with full results for SMD under a general class of loss functions and general nonlinear models.

Section 5 proves related properties of the theory: deterministic convergence and implicit regularization.

Section 6 concludes with some remarks.

2 PRELIMINARIES  

Vocabulary: 1. denote  2. stationarity  3. regime  4. so-called
5. uncountably infinitely many  6. manifold  7. empirical
8. renders  9. i.e.

Useful phrases: 1. The noise can be due to actual measurement error  2. it can be a combination of both

3. we remark that


3 WARM-UP: REVISITING SGD ON SQUARE LOSS OF LINEAR MODELS

Vocabulary: 1. remark  2. recursions  3. converging  4. instantaneous  5. non-vanishing  6. estimate

7. pertain  8. prior

Useful phrases: 1. The update in this case can be expressed as

2. we should point out that

3. the reason being that

4. if this is not the case

... (description of the related algorithms)
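The linear/square-loss special case of this section can be checked numerically. The sketch below is my own illustration (assuming the over-parameterized interpolating regime described in the paper): it runs plain SGD, w ← w + η(y_i − x_iᵀw)x_i, from w_0 = 0 on a random underdetermined linear system and compares the result with the minimum-norm interpolating solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                  # over-parameterized: more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# SGD on the square loss of a linear model, cycling over the data:
#   w <- w + lr * (y_i - x_i^T w) * x_i
w = np.zeros(d)
lr = 0.01
for _ in range(5000):
    for i in range(n):
        w += lr * (y[i] - X[i] @ w) * X[i]

# Among all interpolating solutions (X w = y), the one closest to
# w_0 = 0 in Euclidean distance is the minimum-norm solution.
w_min_norm = np.linalg.pinv(X) @ y
```

If the step size is small enough for convergence, `w` interpolates the data and coincides with `w_min_norm`: this is the implicit regularization of SGD for linear models with square loss, which the paper then generalizes to SMD and nonlinear models.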

[Figure: curves comparing the final results]