$\underline{\text{Aim}}$
In this paper, the concept of online convex programming is introduced. To solve the online convex programming problem, an algorithm for general convex functions based on gradient descent is also presented.
$\underline{\text{Background}}$
Online convex programming appears widely in fields such as factory production, farm production, and many other industrial optimization problems where one does not know the value of the items produced until they have already been constructed. One of the core rules of online convex programming is that, although the convex set from which you make your choice is known in advance, the cost function for a step can be seen only after you make your choice in that step.
To handle an online convex programming problem, there exists a widely used class of algorithms known as experts algorithms. Compared to an experts algorithm, the gradient-descent-based algorithm in this paper has two advantages: first, it can handle an arbitrary sequence of convex functions, a problem that had not previously been solved; second, it can in some circumstances perform better than an experts algorithm on online linear programs.
$\underline{\text{Brief Project Description}}$
Definition 4 gives the definition of an online convex programming problem: an online convex programming problem consists of a feasible set $\mathcal{F}\subseteq\mathbb{R}^n$ and an infinite sequence $\{c^1,c^2,\cdots\}$ where each $c^t:\mathcal{F}\rightarrow\mathbb{R}$ is a convex function. At each time step $t$, an online convex programming algorithm selects a vector $\bm{x}^t\in\mathcal{F}$. Only after the vector is selected does it receive the cost function $c^t$.
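For intuition, the interaction protocol of this definition can be sketched as follows. This is an illustration, not code from the paper; `ConstantPlayer`, `play`, and the regret bookkeeping are hypothetical names introduced here:

```python
class ConstantPlayer:
    """Hypothetical toy strategy: always play the same feasible point."""
    def __init__(self, x):
        self.x = x

    def choose(self, t):
        return self.x   # commit to x^t before seeing c^t

    def observe(self, c):
        pass            # c^t is revealed only after the choice


def play(algorithm, cost_fns, best_fixed):
    """Run the online protocol; return regret against a fixed point."""
    total, best_total = 0.0, 0.0
    for t, c in enumerate(cost_fns, start=1):
        x = algorithm.choose(t)   # select x^t in F
        algorithm.observe(c)      # then the cost function c^t is revealed
        total += c(x)
        best_total += c(best_fixed)
    return total - best_total
```

The key constraint is visible in the loop: `choose` runs before the current cost function is passed to `observe`.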
Two algorithms, Greedy Projection and Lazy Projection, are proposed for this problem. For the former, upper bounds on both the regret (Theorem 1) and the dynamic regret (Theorem 2) are derived. It can be seen that the average regret approaches 0 as $T\rightarrow\infty$. For Lazy Projection, only an upper bound on the regret is derived.
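A minimal sketch of the Greedy Projection update $x^{t+1} = P(x^t - \eta_t \nabla c^t(x^t))$ with $\eta_t = t^{-1/2}$. The Euclidean ball as feasible set and the quadratic costs in the usage note are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto F = {x : ||x|| <= radius} (example set)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def greedy_projection(x0, grad_fns, project=project_to_ball):
    """Greedy Projection: step against the revealed gradient, then
    project back onto the feasible set F."""
    x = np.asarray(x0, dtype=float)
    xs = [x.copy()]
    for t, grad_c in enumerate(grad_fns, start=1):
        eta = 1.0 / np.sqrt(t)            # learning rate eta_t = t^(-1/2)
        x = project(x - eta * grad_c(x))  # gradient step, then projection
        xs.append(x.copy())
    return xs
```

For example, against the repeated cost $c^t(x)=\|x-0.5\|^2$ (gradient $2(x-0.5)$), the iterates approach the fixed minimizer $0.5$.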
It is established in this paper that repeated games are online linear programming problems. A repeated game is thus formulated as an online linear program, and an algorithm (GIGA, Generalized Infinitesimal Gradient Ascent) for this online linear program is proposed. An upper bound on the expected regret of GIGA is derived in Theorem 4. Observe that GIGA is essentially just a method of constructing a behavior, and a behavior can be generated from it by suitable simulations. Since the strategy of GIGA at the current time step can be computed given $x^1$ (a constant) and the past actions of the environment, GIGA is self-oblivious. Moreover, GIGA is universally consistent.
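For intuition, one GIGA update on a mixed strategy can be sketched as gradient ascent along the payoff vector followed by Euclidean projection onto the probability simplex. The sort-based projection routine below is a standard construction assumed here, not spelled out in the paper:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                    # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def giga_step(x, payoff, t):
    """One GIGA update: ascend along the payoff vector of the pure
    actions, then project back onto the simplex; eta_t = 1/sqrt(t)."""
    return project_simplex(x + payoff / np.sqrt(t))
```

The projection keeps the iterate a valid mixed strategy: nonnegative entries summing to one.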
This paper also shows how to translate 1) algorithms for mixing experts into algorithms for online linear programs, and 2) online linear programming algorithms into algorithms for online convex programs. Algorithm 5 presents how to achieve the former translation. For the latter, two issues must be handled: one is that the feasible region of an online convex program may be impossible to describe as the convex hull of any finite number of points; the other is that a convex function may not achieve its optimal value at the boundary of the feasible region. For the former issue, the paper simply assumes that the OLPA (online linear programming algorithm) can handle the feasible region of the online convex programming problem. The latter is handled by converting the cost function to a linear one. Algorithm 6 and Algorithm 7 are the Exact and Approx versions of the conversion from an OLPA to an online convex programming algorithm. The worst-case difference between Approx and Exact is bounded in the paper.
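The linearization step behind this conversion can be illustrated as follows: the convex cost $c^t$ is replaced by the linear cost $x \mapsto \nabla c^t(x^t)\cdot x$, and by convexity ($c^t(x)-c^t(y)\le\nabla c^t(x)\cdot(x-y)$) regret against the linear costs upper-bounds regret against the convex ones. A sketch with hypothetical names, not the paper's Algorithm 6/7 verbatim:

```python
import numpy as np

def linearize(grad_c, x_t):
    """Replace a convex cost c^t by its linearization at the played
    point x^t: the linear cost x -> grad c^t(x^t) . x. This is the
    cost an OLPA can consume."""
    g = np.asarray(grad_c(x_t), dtype=float)
    return lambda x: float(g @ np.asarray(x, dtype=float))
```

For instance, linearizing $c(x)=\|x\|^2$ (gradient $2x$) at the point $(1,1)$ yields the linear cost $2x_1+2x_2$.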
$\underline{\text{Significance of Paper}}$
This paper defines the online convex programming problem and establishes that gradient descent is a very effective technique for it. The work was motivated by trying to better understand the infinitesimal gradient ascent algorithm, and the techniques developed were applied to that problem to establish an extension of infinitesimal gradient ascent that is universally consistent.
The definition of the Online Convex Programming Problem is first given here, making the paper a pioneer in this field. The algorithms proposed are all based on gradient descent, one of the simplest and most natural algorithms in wide use. And the analytical upper bounds on the different kinds of regret render the algorithms usable, rather than just intuitively understandable.
${\text{\Large Reference}}$
[1] Zinkevich, Martin. "Online Convex Programming and Generalized Infinitesimal Gradient Ascent." Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003.