Introduction
The projection step is a computational bottleneck when applying online learning to massive datasets, and it often limits practical application. This paper presents a projection-free algorithm for online learning, Online Frank-Wolfe (OFW), with three advantages: it is computationally efficient; it attains better regret bounds in settings such as stochastic online smooth convex optimization, where it is also parameter-free; and it produces sparse decisions. The traditional online gradient descent method requires a projection step onto the feasible set, and in Euclidean space this projection amounts to solving a convex quadratic program over a convex set. A series of regret bounds is derived for different settings, and experiments on an online collaborative filtering application with standard datasets demonstrate the improvement.
Motivation
Projection is usually the bottleneck of the algorithm because it requires solving a norm-based quadratic program over the feasible set. In many practical cases this quadratic program cannot be solved efficiently, whereas linear optimization over the same set can. This paper proposes an efficient online learning algorithm that replaces the projection step with a linear optimization step, under various settings.
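To make the contrast concrete, here is a small sketch (not from the paper) using the probability simplex as the feasible set: Euclidean projection is a small quadratic program (solved below by a standard sorting routine), while linear optimization simply returns the vertex with the smallest gradient coordinate.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the probability simplex:
    a small quadratic program, solved here via sorting."""
    u = np.sort(y)[::-1]                       # sort coordinates descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(y - theta, 0.0)

def linear_minimizer_simplex(g):
    """Linear optimization over the simplex: argmin_{x in simplex} <g, x>
    is the vertex at the smallest coordinate of g -- O(n) work."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v
```

For more complex sets (e.g. a trace-norm ball of matrices), the gap widens: projection needs a full SVD, while linear optimization needs only the top singular vector pair.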
This approach offers several appealing advantages over existing methods for online learning.
Preliminaries
In the setting of online convex optimization, we assume the decision set K has diameter bounded by D, so that for any two points x, y ∈ K we have ‖x − y‖ ≤ D, and that a linear function can be minimized over K efficiently.
A function f is β-smooth if, for all x, y,
f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (β/2)‖x − y‖².
A feature of this algorithm is that it predicts with sparse solutions, where sparsity is defined in the following manner.
Algorithm
The algorithm produces a t-sparse prediction at iteration t with respect to the underlying decision set, since the iterate is maintained as a convex combination of at most t vertices returned by the linear optimization oracle. The regret bounds follow from a general theorem.
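A minimal sketch of a projection-free, Frank-Wolfe-style online update, assuming a linear optimization oracle `lin_opt` over the decision set and a hypothetical step size σ_t = 1/√t (the paper's exact surrogate objective and step sizes may differ):

```python
import numpy as np

def online_frank_wolfe(grad_fn, lin_opt, x0, T):
    """Projection-free online updates: each round takes one linear
    optimization step instead of a projection.

    grad_fn : t -> (x -> gradient of the round-t loss at x)
    lin_opt : g -> argmin_{v in K} <g, v>   (linear optimization oracle)
    """
    x = x0.copy()
    grad_sum = np.zeros_like(x0)          # running sum of observed gradients
    iterates = [x.copy()]
    for t in range(1, T + 1):
        grad_sum += grad_fn(t)(x)         # observe gradient of f_t at x_t
        v = lin_opt(grad_sum)             # vertex minimizing the linear surrogate
        sigma = 1.0 / np.sqrt(t)          # hypothetical step-size schedule
        x = (1 - sigma) * x + sigma * v   # convex combination keeps x in K
        iterates.append(x.copy())
    return iterates
```

Because each update is a convex combination of the previous iterate and one oracle vertex, every iterate stays feasible without any projection, and after t rounds the iterate is supported on at most t vertices, which is the source of the sparsity discussed above.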
Discussion
To compare the performance of OFW and OGD, the paper uses online collaborative filtering as a simple test application. The results show that OFW is significantly faster than OGD, completing 100,000 iterations in the time OGD needs for far fewer. In addition, OFW also reduces the average squared loss faster than OGD in terms of running time. This is reasonable because computing a full singular value decomposition takes much more effort than computing only the top singular vector pair: OGD requires the singular value decomposition of the matrix in each iteration, while OFW requires only the top singular vector pair.
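As an illustration (not the paper's code), the per-iteration linear algebra can be contrasted directly: a projection step needs a full SVD, while a linear optimization step needs only the top singular vector pair, which a few power iterations on AᵀA can approximate cheaply.

```python
import numpy as np

def top_singular_pair(A, n_iter=50):
    """Approximate the top singular triplet (u, s, v) of A by power
    iteration on A^T A -- the only linear algebra a linear
    optimization step over a trace-norm ball requires."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = A.T @ (A @ v)                 # one power-iteration step
        v /= np.linalg.norm(v)
    u = A @ v
    s = np.linalg.norm(u)                 # top singular value estimate
    return u / s, s, v

A = np.arange(12.0).reshape(3, 4)
u, s, v = top_singular_pair(A)
# Full SVD -- what a projection step would need every iteration:
s_full = np.linalg.svd(A, compute_uv=False)
```

The full SVD costs roughly O(mn·min(m, n)) per call, while each power-iteration step is just two matrix-vector products, which also exploit sparsity of A when present.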