Review of the Paper "Dual Averaging Method for Regularized Stochastic Learning and Online Optimization"

RDA (Regularized Dual Averaging) is an algorithm that efficiently solves regularized stochastic learning and regularized online optimization problems. The paper demonstrates RDA's effectiveness for sparse online learning with L1-regularization.

Traditional online algorithms such as SGD have limited capability to exploit problem structure when solving regularized learning problems: their low accuracy often makes it hard to obtain the desired regularization effects. To address this, the author introduces an auxiliary strongly convex function whose minimizer is also a minimizer of the regularization function, and exploits this strong convexity to make the iterates converge faster and produce better solutions.

The algorithm is shown below, followed by some explanation:
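In outline (my paraphrase of the paper's Algorithm 1; the exact conditions on the inputs are in the paper):

  Input: a strongly convex function h(w) and a nonnegative, nondecreasing sequence \{\beta_t\}.
  Initialize: w_1 = \arg\min_w h(w), \bar{g}_0 = 0.
  For t = 1, 2, 3, \ldots:
    1. Compute a subgradient g_t \in \partial f_t(w_t).
    2. Update the average: \bar{g}_t = \frac{t-1}{t}\bar{g}_{t-1} + \frac{1}{t}g_t.
    3. Compute w_{t+1} = \arg\min_w \left\{ \langle \bar{g}_t, w \rangle + \Psi(w) + \frac{\beta_t}{t} h(w) \right\}.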

Let's take a look at the averaged subgradient: \bar{g}_t = \frac{t-1}{t}\bar{g}_{t-1} + \frac{1}{t}g_t, where g_t \in \partial f_t(w_t) is a subgradient of the loss at step t. Unrolling the recursion gives \bar{g}_t = \frac{1}{t}\sum_{\tau=1}^{t} g_\tau, so as samples are added, the average carries not only the new step's information but that of all previous steps; any truncation then acts on averaged information, giving an effect similar to batch processing. The weights are updated as w_{t+1} = \arg\min_{w} \left\{ \langle \bar{g}_t, w \rangle + \Psi(w) + \frac{\beta_t}{t} h(w) \right\}. Some may ask about the complexity of this optimization problem; fortunately, it is indeed simple for many important learning problems in practice.
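As a concrete sketch of steps 2 and 3, here is one RDA iteration for L1-regularization, \Psi(w) = \lambda\|w\|_1, assuming h(w) = \frac{1}{2}\|w\|_2^2 and \beta_t = \gamma\sqrt{t} as in (my reading of) the paper's ℓ1-RDA example; the function names and the toy loss in the usage example are my own:

```python
import numpy as np

def rda_l1_step(g_bar, g_t, t, lam, gamma):
    """One RDA iteration for Psi(w) = lam * ||w||_1, h(w) = 0.5 * ||w||_2^2,
    beta_t = gamma * sqrt(t). Returns the new averaged subgradient and weights."""
    # Step 2: running average of all subgradients seen so far.
    g_bar = ((t - 1) / t) * g_bar + (1.0 / t) * g_t
    # Step 3: the argmin decouples per coordinate into a soft threshold --
    # coordinates whose averaged gradient stays within [-lam, lam] are set
    # exactly to zero, which is what produces sparse iterates.
    shrink = np.maximum(np.abs(g_bar) - lam, 0.0)
    w_next = -(np.sqrt(t) / gamma) * np.sign(g_bar) * shrink
    return g_bar, w_next

# Toy usage: squared loss f_t(w) = 0.5 * ||w - x_t||^2, whose gradient at w is w - x_t.
rng = np.random.default_rng(0)
w, g_bar = np.zeros(5), np.zeros(5)
signal = np.array([2.0, -2.0, 0.0, 0.0, 0.0])
for t in range(1, 101):
    x_t = signal + rng.normal(size=5)
    g_bar, w = rda_l1_step(g_bar, w - x_t, t, lam=0.5, gamma=5.0)
print(w)  # the two signal coordinates stay active; the noise coordinates are truncated to 0
```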

Compare this form with the SGD method: w_{t+1} = w_t - \alpha_t (g_t + \xi_t), where \xi_t \in \partial \Psi(w_t). We can see that the two methods should, in theory, arrive at the same solution, which is precisely the idea behind dual averaging. This form is also interesting, and here is my thinking on its motivation: the regularization function may have a gentle slope along which convergence is very slow. If we introduce a strongly convex function that shares its minimizer with the regularization function, the minimizer is unchanged while the iterates converge much more readily.
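To make the comparison concrete, consider the unregularized case \Psi(w) \equiv 0 with h(w) = \frac{1}{2}\|w\|_2^2 (my simplification, not a case worked out in this review). The RDA minimization is then an unconstrained quadratic with the closed-form solution

w_{t+1} = \arg\min_{w} \left\{ \langle \bar{g}_t, w \rangle + \frac{\beta_t}{t} \cdot \frac{1}{2}\|w\|_2^2 \right\} = -\frac{t}{\beta_t}\,\bar{g}_t,

i.e., a single gradient step from the origin along the averaged gradient with step size t/\beta_t, which plays the same role as the accumulated SGD steps built from the individual gradients.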

This review does not include the complete proofs of the regret bounds and convergence; only some simple results are stated. For a regularization function whose convexity parameter is 0, setting the parameters properly yields a regret bound of O(\sqrt{t}); for strongly convex regularization functions, the bound improves to O(\ln t). Besides, for the optimization subproblem mentioned before, the author uses several examples to derive simple closed-form solutions for commonly used regularization functions. As a result, this state-of-the-art RDA algorithm is well suited to regularized online optimization problems.
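As one more illustration of how simple these solutions can be (this ℓ2 case is my own example, not taken from the paper), take \Psi(w) = \frac{\sigma}{2}\|w\|_2^2 with h(w) = \frac{1}{2}\|w\|_2^2. Setting the gradient of the objective to zero gives

w_{t+1} = \arg\min_{w} \left\{ \langle \bar{g}_t, w \rangle + \frac{\sigma}{2}\|w\|_2^2 + \frac{\beta_t}{t} \cdot \frac{1}{2}\|w\|_2^2 \right\} = -\frac{\bar{g}_t}{\sigma + \beta_t / t},

so each iteration costs no more than computing the averaged gradient itself.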

Furthermore, this algorithm also performs well on regularized stochastic learning problems. As that is not the main topic of this course, I only list the key results here:
1. The computational complexity per iteration is O(n), the same as for the SGD method.
2. It converges to the optimal solution at the optimal rate O(1/\sqrt{t}). If the regularization function \Psi(w) is strongly convex, the better rate O(\ln t / t) holds.
