Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

 

Brief Introduction and Background

In industrial-scale online learning, stochastic gradient descent (SGD) is a common method for solving optimization problems. SGD is popular in part for its satisfactory convergence rate, known to be O(log(T)/T) for strongly convex objectives: given a convex loss function and a training set of T examples, SGD generates a sequence of T point predictors {w1,…,wT}. However, when the loss function F is strongly convex, Hazan & Kale (2011) argued that O(log(T)/T) is not the optimal convergence rate; an O(1/T) rate can be obtained, but with a more complex and computationally heavier algorithm. In other words, the O(log(T)/T) bound might simply be too loose an analysis for the stochastic setting. This paper, in contrast, shows (with supporting experiments) that SGD is still an optimal algorithm in the stochastic setting, using a direct analysis rather than an online regret analysis followed by an online-to-batch conversion. The authors show that for smooth problems SGD already reaches an O(1/T) convergence rate, while for non-smooth problems the O(log(T)/T) rate of SGD with averaging is in fact tight.

Definition

Under the standard setting of convex stochastic optimization, the convex function F over a convex domain W is not known directly; instead, given a query point w, a stochastic gradient oracle returns a vector ĝ whose expectation is a subgradient of F at w. The goal of the optimization algorithm is to find a predictor w whose expected loss F(w) is close to the optimum F(w*).

F is λ-strongly convex if, for all w, w' ∈ W and any subgradient g of F at w,

F(w') ≥ F(w) + ⟨g, w' − w⟩ + (λ/2)‖w' − w‖².

The algorithm generates a sequence of points w1,…,wT and finally returns a single point (for example the last iterate wT, or some average of the iterates) as its prediction.

With Π_W denoting the projection operator onto W and ĝ_t the stochastic gradient obtained at w_t, the next iterate is obtained by w_{t+1} = Π_W(w_t − η_t ĝ_t).

Instead of the standard step size η_t = 1/(λt) used in the analysis of λ-strongly convex stochastic optimization, this paper considers the more general step size η_t = c/(λt) for a constant c.
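The following is a minimal sketch of this projected SGD procedure, assuming a caller-supplied stochastic gradient oracle and, purely for illustration, a Euclidean ball as the domain W; the function and parameter names are not from the paper.

```python
import numpy as np

def projected_sgd(grad_oracle, w0, lam, c, T, radius):
    """Projected SGD with step size eta_t = c / (lam * t).

    grad_oracle(w, t) returns a stochastic (sub)gradient estimate at w.
    W is taken to be the Euclidean ball of the given radius (illustrative choice).
    Returns the whole iterate sequence so that different point-selection
    strategies (last point, full average, suffix average) can be compared.
    """
    w = np.asarray(w0, dtype=float)
    iterates = [w.copy()]
    for t in range(1, T + 1):
        g = grad_oracle(w, t)
        w = w - (c / (lam * t)) * g      # gradient step with eta_t = c/(lam*t)
        norm = np.linalg.norm(w)
        if norm > radius:                # Euclidean projection back onto W
            w = w * (radius / norm)
        iterates.append(w.copy())
    return np.array(iterates)
```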

Smooth functions and non-smooth functions

The authors investigate the optimality of the SGD algorithm under two different conditions: smooth objectives and non-smooth objectives. First, suppose the convex function F(·) is both λ-strongly convex and μ-smooth with respect to the optimum w*, i.e. F(w) − F(w*) ≤ (μ/2)‖w − w*‖² for all w ∈ W.

Picking η_t = c/(λt) for a constant c > 1/2, the expected squared distance from w_T to w* is on the order of O(1/T), and by smoothness the expected suboptimality E[F(w_T)] − F(w*) is O(1/T) as well. For non-smooth functions, the more general case, the authors show that the intuition that the O(log(T)/T) rate for SGD with averaging is merely an artifact of a loose analysis is incorrect: SGD with full averaging has a matching Ω(log(T)/T) lower bound, so the log(T)/T rate is in fact tight. Nevertheless, a simple modification, α-suffix averaging (averaging only the last α-fraction of the iterates), recovers the optimal O(1/T) rate.
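A sketch of the α-suffix averaging modification, assuming the full iterate sequence from the SGD run above is available (names are illustrative):

```python
import numpy as np

def alpha_suffix_average(iterates, alpha=0.5):
    """Average only the last alpha-fraction of the SGD iterates.

    With alpha = 1 this reduces to ordinary full averaging; the paper's
    point is that a fixed alpha in (0, 1) recovers an O(1/T) rate for
    strongly convex non-smooth problems.
    """
    iterates = np.asarray(iterates)
    T = len(iterates)
    start = int(np.floor((1.0 - alpha) * T))   # first index of the suffix
    return iterates[start:].mean(axis=0)
```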

 

Experiment results

The paper reports three experiments. The first uses a strongly convex and smooth F; the second a strongly convex but non-smooth F; the third, unlike the first two, uses three real binary classification data sets. The same algorithms are compared in all three experiments: three SGD variants using the different point-selection strategies described above, and the strongly convex stochastic optimization algorithm Epoch-GD (Hazan & Kale, 2011). The results of all three experiments show that, among these algorithms, the SGD variant that returns the last iterate performs worst, while the α-suffix averaging SGD variant performs best.
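As a toy illustration of this comparison (not the paper's actual data sets or tuning), the snippet below reuses the projected_sgd and alpha_suffix_average sketches above on a synthetic L2-regularized hinge-loss problem, which is strongly convex and non-smooth, and prints the objective value reached by each point-selection strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, d = 0.1, 10_000, 5
X = rng.normal(size=(T, d))        # synthetic features (illustrative only)
y = np.sign(X @ np.ones(d))        # synthetic labels

def hinge_grad(w, t):
    """Stochastic subgradient of the L2-regularized hinge loss at example t."""
    x_t, y_t = X[t - 1], y[t - 1]
    g = lam * w
    if y_t * (x_t @ w) < 1.0:
        g = g - y_t * x_t
    return g

iterates = projected_sgd(hinge_grad, np.zeros(d), lam, c=1.0, T=T, radius=10.0)

candidates = {
    "last point": iterates[-1],
    "full average": iterates.mean(axis=0),
    "alpha-suffix (0.5)": alpha_suffix_average(iterates, alpha=0.5),
}
for name, w in candidates.items():
    obj = np.mean(np.maximum(0.0, 1.0 - y * (X @ w))) + 0.5 * lam * (w @ w)
    print(f"{name:20s} objective = {obj:.4f}")
```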

Discussion

This paper proves the optimality of the SGD algorithm in the stochastic setting when compared with the more complex O(1/T) algorithm of Hazan & Kale (2011). The simple and most popular method performs efficiently under both smooth and non-smooth conditions, and the paper shows that SGD can also reach O(1/T) in the non-smooth case with a simple modification of the averaging step. However, the authors also leave questions for future research: the O(1/T) rate in the non-smooth case still requires (suffix) averaging, and whether it can be achieved without averaging remains open.

 

Reference

[1] Hazan, E. and Kale, S. Beyond the regret minimization barrier: An optimal algorithm for stochastic strongly-convex optimization. In COLT, 2011.
