STOCHASTIC GRADIENT/MIRROR DESCENT: MINIMAX OPTIMALITY AND IMPLICIT REGULARIZATION (ICLR 2019)

Abstract:

Useful phrases:

1. SGD has become increasingly popular in optimization

2. it is now widely recognized that ...

3. play a key role in reaching "good" solutions that ...

4. In an attempt to shed some light on why this is the case

5. we also argue

6. provide insights into

Vocabulary:

1. generalize: perform well beyond the training data
2. re-visit
3. properties
4. holds for
5. sufficiently
6. namely convergence
7. implicit regularization: regularization induced by the algorithm itself rather than by an explicit penalty term
8. over-parameterized: having more parameters than training data points
9. interpolating regime: the regime in which the model can fit all training data points exactly

Summary of the abstract: We revisit some minimax properties of SGD and extend them to general stochastic mirror descent (SMD) algorithms. In particular, we show that there is a fundamental identity which holds for SMD (and SGD), and which implies the minimax optimality of SMD (and SGD) for sufficiently small step size, and for a general class of loss functions and general nonlinear models. We further show that this identity can be used to naturally establish other properties of SMD (and SGD). We also argue how this identity can be used in the so-called "highly over-parameterized" nonlinear setting.

The paper makes three main points: 1. it introduces the fundamental identity; 2. the identity can be used to naturally establish the convergence and implicit-regularization properties of SMD (and SGD); 3. the identity applies in the "highly over-parameterized" nonlinear setting.
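To make the algorithm under discussion concrete, here is a minimal NumPy sketch of one stochastic mirror descent step. This is my own illustration, not the paper's code; the names `smd_step`, `grad_psi`, and `grad_psi_inv` are hypothetical. SMD applies the gradient step in the "mirror" domain defined by a potential function psi, and with the quadratic potential it reduces to plain SGD.

```python
import numpy as np

def smd_step(w, grad, lr, grad_psi, grad_psi_inv):
    # Stochastic mirror descent: the update happens in the mirror
    # domain defined by the potential psi:
    #   grad_psi(w_new) = grad_psi(w) - lr * grad
    return grad_psi_inv(grad_psi(w) - lr * grad)

# With the quadratic potential psi(w) = 0.5 * ||w||^2, grad_psi is the
# identity map, so SMD reduces to the familiar SGD step w - lr * grad.
identity = lambda w: w
w = np.array([1.0, -2.0, 3.0])
g = np.array([0.5, 0.5, -1.0])
sgd_step = smd_step(w, g, lr=0.1, grad_psi=identity, grad_psi_inv=identity)

# A q-norm potential psi(w) = (1/q) * ||w||_q^q gives a genuinely
# different update (shown here for q = 3).
q = 3
grad_psi_q = lambda w: np.sign(w) * np.abs(w) ** (q - 1)
grad_psi_q_inv = lambda z: np.sign(z) * np.abs(z) ** (1.0 / (q - 1))
smd_q_step = smd_step(w, g, lr=0.1, grad_psi=grad_psi_q, grad_psi_inv=grad_psi_q_inv)
```

The choice of potential is exactly what distinguishes the members of the SMD family from one another; the identity in the paper is stated for this general update.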

1 Introduction

Vocabulary:

1. good generalization properties
2. initially
3. interpolate
4. uncountably infinitely many
5. drastically
6. trivial
7. non-convexity
8. saddle points
9. discrepancy

Useful phrases:

1. arguably, remains somewhat of a mystery to this day
2. it has been observed that
3. in other words
4. in particular
5. even though it may seem at first that
6. what is even more interesting is that
7. in the absence of
8. which again highlights the important role of the optimization algorithm in generalization
9. despite this recent progress
10. would be of great interest

Paragraph 1: (SGD and its variants) Optimization algorithms used to train these models play a key role in learning parameters that generalize well.

Paragraph 2: Which minimum among all the possible minima we pick in practice is determined by the optimization algorithm used to train the model. There is a discrepancy between the solutions reached by different algorithms, which again highlights the important role of the optimization algorithm in generalization.

Paragraph 3: Most results explaining the behavior of optimization algorithms are limited to linear or very simplistic models. Therefore, a general characterization of the behavior of stochastic descent algorithms for more general models would be of great interest.

1.1 OUR CONTRIBUTION

Vocabulary: 1. We do so by ...  2. speculative arguments

Useful phrases:

1. we also use the theory developed in this paper to provide some speculative arguments into why ...

2. common to deep learning

3. in an attempt to make the paper easier to follow

4. we demonstrate some implications of this theory

5. we finally conclude with some remarks

Summary: In this paper, the authors present the stochastic mirror descent (SMD) family of algorithms.

Main contributions:

1. Equations 2 and 5 define the algorithm.

2. SMD is shown to be the optimal solution to a minimax filtering problem.

3. This generalizes several results in the literature.

4. The identity recovers known optimization/learning properties from the literature, namely the implicit regularization of SMD.

5. It yields new convergence results for SMD.

6. In the nonlinear setting, SMD is shown to exhibit convergence and implicit-regularization behavior similar to that observed in deep learning.
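The implicit-regularization property in points 4 and 6 can be stated as a formula. Under the interpolating assumption (the model can fit every data point), the result says that SMD with potential $\psi$ converges to the interpolating solution closest to the initialization $w_0$ in Bregman divergence; the notation below follows standard mirror-descent conventions and is my paraphrase, not a quote from the paper:

```latex
w_\infty \;=\; \operatorname*{arg\,min}_{w \,:\, y_i = f(x_i, w)\ \forall i} \; D_\psi(w, w_0),
\qquad
D_\psi(w, w') \;=\; \psi(w) - \psi(w') - \nabla\psi(w')^{\top}(w - w').
```

For SGD, $\psi(w) = \tfrac{1}{2}\|w\|^2$ and $D_\psi(w, w_0) = \tfrac{1}{2}\|w - w_0\|^2$, so SGD picks the interpolating solution closest to the initialization in Euclidean distance.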

Roadmap: Section 3 describes the main ideas and results in a simple setting: SGD on the square loss of linear models.

Section 4 covers the general theory (related to H-infinity estimation), with full results for SMD under a general class of loss functions and general nonlinear models.

Section 5 proves related properties of the theory: deterministic convergence and implicit regularization.

Section 6 concludes with some remarks.

2 PRELIMINARIES  

Vocabulary: 1. denote  2. stationarity  3. regime  4. so-called
5. uncountably infinitely many  6. manifold  7. empirical
8. renders  9. i.e.

Useful phrases: 1. The noise can be due to actual measurement error  2. it can be a combination of both

3. we remark that


3 WARM-UP: REVISITING SGD ON SQUARE LOSS OF LINEAR MODELS

Vocabulary: 1. remark  2. recursions  3. converging  4. instantaneous  5. non-vanishing  6. estimate

7. pertain  8. prior

Useful phrases: 1. The update in this case can be expressed as

2. we should point out that

3. the reason being that

4. if this is not the case

... (description of the related algorithms)
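The linear/square-loss special case of this section can be checked numerically. The sketch below is my own illustration (assuming the over-parameterized interpolating regime described in the paper): it runs plain SGD, w ← w + η(y_i − x_iᵀw)x_i, from w_0 = 0 on a random underdetermined linear system and compares the result with the minimum-norm interpolating solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                  # over-parameterized: more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# SGD on the square loss of a linear model, cycling over the data:
#   w <- w + lr * (y_i - x_i^T w) * x_i
w = np.zeros(d)
lr = 0.01
for _ in range(5000):
    for i in range(n):
        w += lr * (y[i] - X[i] @ w) * X[i]

# Among all interpolating solutions (X w = y), the one closest to
# w_0 = 0 in Euclidean distance is the minimum-norm solution.
w_min_norm = np.linalg.pinv(X) @ y
```

If the step size is small enough for convergence, `w` interpolates the data and coincides with `w_min_norm`: this is the implicit regularization of SGD for linear models with square loss, which the paper then generalizes to SMD and nonlinear models.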

[Figure: curves comparing the final results]