Review Report of the paper "Online Convex Programming and Generalized Infinitesimal Gradient Ascent"

The paper contains two main parts. Part A is online convex programming (OCP): the author describes a Greedy Projection algorithm and analyzes its performance, proving that its average regret converges to zero. Part B is an application of OCP: the author first formulates the repeated game as an online convex program and then proposes a Generalized Infinitesimal Gradient Ascent (GIGA) algorithm, which is quite similar to Greedy Projection. All in all, the main contribution of this paper is that Greedy Projection, i.e., the projected gradient descent method, gives a general algorithm for a large variety of problems. Moreover, this simple algorithm achieves state-of-the-art performance.

First, an online convex program is defined as follows: at each step t, the cost function is revealed only after we choose a point from the feasible set. Because of this uncertainty, online convex optimization clearly cannot find the optimum of the problem, but we can bound the performance of a given algorithm under certain rules. Regret is the difference between the total online cost and the cost of a particular kind of offline algorithm. Here "offline" means the cost functions are known in advance and the same point of the feasible set is chosen at every step, so the offline problem is simply an ordinary convex optimization problem.
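Under the definitions above, the regret after T rounds can be written as (notation is my own paraphrase, with c^t the cost function at round t, x^t the online algorithm's choice, and F the feasible set):

```latex
R_T \;=\; \sum_{t=1}^{T} c^t(x^t) \;-\; \min_{x \in F} \sum_{t=1}^{T} c^t(x)
```

The paper's main theorem states that Greedy Projection achieves R_T = O(\sqrt{T}), so the average regret R_T / T converges to zero.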

To cope with this online convex problem, the Greedy Projection algorithm is proposed in this paper. It alternates gradient descent with projection onto the feasible set to obtain a sequence of actions: at step t, take one gradient-descent step using the current cost function and action, then project the result back onto the feasible set via the argmin (nearest-point) projection. It is proved that the average regret of Greedy Projection approaches zero. Moreover, in Section 2.3 the author proposes another algorithm, Lazy Projection, which performs surprisingly well.
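The gradient-step-then-project loop can be sketched as follows; this is an illustrative sketch, not the paper's exact pseudocode. It assumes the feasible set is a Euclidean ball (so the projection has a closed form) and a decaying step size η_t = η/√t, which is the schedule used in the paper's analysis.

```python
import numpy as np

def project_ball(y, radius=1.0):
    # Nearest-point (argmin) projection onto the ball {x : ||x|| <= radius}
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def greedy_projection(grads, dim, eta=1.0):
    # grads: one gradient function per round, revealed only after we commit to x
    x = np.zeros(dim)
    plays = []
    for t, grad in enumerate(grads, start=1):
        plays.append(x.copy())                              # commit to x_t first
        x = project_ball(x - (eta / np.sqrt(t)) * grad(x))  # descend, then project
    return plays
```

For example, with quadratic costs c^t(x) = ||x - z_t||^2 the gradient is 2(x - z_t), and the iterates stay inside the feasible ball by construction.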

In Part B, to deal with repeated games, the author proposes the Generalized Infinitesimal Gradient Ascent (GIGA) algorithm, which demonstrates an application of online convex optimization. Since utility should be as large as possible, gradient ascent is used instead of descent. The regret of a repeated game is slightly different: it is the maximum regret of not playing a particular action, and both the player's and the environment's behaviors are distributions over actions. Moreover, instead of a gradient, the update actually uses the utility vector itself. With the learning rate set to η_t = 1/√t, GIGA is universally consistent.
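The GIGA update (ascend along the utility vector, then project the mixed strategy back onto the probability simplex) can be sketched as below. This is an illustrative sketch under my own assumptions: the simplex projection uses the standard sort-based method, which is not spelled out in the paper, and the opponent is reduced to a fixed sequence of pure actions.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based method)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def giga(payoff, opponent_actions, eta=1.0):
    # payoff[i, j]: row player's utility for action i against opponent action j
    n = payoff.shape[0]
    x = np.full(n, 1.0 / n)  # start from the uniform mixed strategy
    for t, j in enumerate(opponent_actions, start=1):
        u = payoff[:, j]                                  # this round's utility vector
        x = project_simplex(x + (eta / np.sqrt(t)) * u)   # ascent step, then project
    return x
```

In rock-paper-scissors against an opponent who always plays rock, the returned mixed strategy concentrates on paper, the best response.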

This paper is a useful guideline for work on projected gradient descent methods in the online learning setting. It also suggests, as future work, extending these results to a reinforcement learning framework.
