Projection-free Online Learning

Introduction

The projection step is a computational bottleneck that restricts the application of online learning to massive datasets. This paper provides a projection-free algorithm, Online Frank-Wolfe (OFW), with three advantages: first, it is computationally efficient, replacing projections with linear optimization; second, it attains better regret bounds in settings such as stochastic online smooth convex optimization, and is parameter-free in the stochastic case; third, it produces sparse decisions. The traditional online gradient descent (OGD) method requires a projection step onto the feasible set; in Euclidean space, this projection amounts to solving a convex quadratic program over a convex set. A series of regret bounds is derived for different settings, and experiments on an online collaborative filtering application with standard datasets show the improvement of the proposed method.
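To make the cost contrast concrete, here is a minimal sketch (my own illustration, not code from the paper) comparing Euclidean projection onto the probability simplex, which requires solving a quadratic program, with linear minimization over the same set, which only requires finding the smallest coordinate:

    import numpy as np

    def project_simplex(y):
        # Euclidean projection onto the probability simplex:
        # solves the QP  min_x ||x - y||^2  s.t.  x >= 0, sum(x) = 1
        u = np.sort(y)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(y) + 1) > css - 1)[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1)
        return np.maximum(y - theta, 0.0)

    def linear_min_simplex(g):
        # Linear minimization oracle: argmin_x <g, x> over the simplex.
        # The minimizer is always a vertex, i.e. a standard basis vector.
        v = np.zeros_like(g)
        v[np.argmin(g)] = 1.0
        return v

    y = np.array([0.4, 1.2, -0.3])
    print(project_simplex(y))     # [0.1 0.9 0. ] -- a dense QP solution
    print(linear_min_simplex(y))  # [0. 0. 1.]    -- a 1-sparse vertex

For the simplex both operations happen to be cheap, but for sets such as the trace-norm ball used later in collaborative filtering, the projection becomes a full SVD while the linear step remains a single top-singular-vector computation.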

 

Motivation

Projection is usually the bottleneck of the algorithm because, in the Euclidean norm, it amounts to solving a quadratic program. In many practical cases it is infeasible to solve this convex quadratic program, whereas linear optimization over the same set can be carried out efficiently. This paper proposes an efficient online learning algorithm in which a linear optimization step replaces the projection step under various settings.

This approach entails several appealing advantages over the existing methods for online learning.

 

Preliminaries

In the setting of online convex optimization, we assume that the decision set $\mathcal{K}$ has diameter bounded by $D$ and that it is possible to efficiently minimize a linear function over it; that is, for any two points $\mathbf{x}, \mathbf{y} \in \mathcal{K}$, we have $\|\mathbf{x} - \mathbf{y}\| \le D$.

Then, a function $f$ is called $\beta$-smooth if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{K}$, $f(\mathbf{y}) \le f(\mathbf{x}) + \nabla f(\mathbf{x})^\top (\mathbf{y} - \mathbf{x}) + \frac{\beta}{2}\|\mathbf{y} - \mathbf{x}\|^2$.
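As a quick sanity check of this inequality (my own illustration, not from the paper), the quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top A \mathbf{x}$ is $\beta$-smooth with $\beta$ equal to the largest eigenvalue of $A$:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T                        # symmetric positive semidefinite
    beta = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue = smoothness constant

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x

    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = f(y)
    rhs = f(x) + grad(x) @ (y - x) + 0.5 * beta * np.sum((y - x) ** 2)
    print(lhs <= rhs)  # True: the smoothness upper bound holds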

A distinctive feature of this algorithm is that it predicts with sparse solutions, where a point is called $t$-sparse with respect to the decision set $\mathcal{K}$ if it can be written as a convex combination of at most $t$ extreme points (vertices) of $\mathcal{K}$.

 

Algorithm

 

The algorithm produces a $t$-sparse prediction at iteration $t$ with respect to the underlying decision set $\mathcal{K}$. It maintains the surrogate objective $F_t(\mathbf{x}) = \eta \sum_{\tau=1}^{t} \nabla f_\tau(\mathbf{x}_\tau)^\top \mathbf{x} + \|\mathbf{x} - \mathbf{x}_1\|^2$ and, instead of projecting, takes a Frank-Wolfe step: $\mathbf{v}_t = \arg\min_{\mathbf{x} \in \mathcal{K}} \nabla F_t(\mathbf{x}_t)^\top \mathbf{x}$ followed by $\mathbf{x}_{t+1} = \mathbf{x}_t + \sigma_t(\mathbf{v}_t - \mathbf{x}_t)$. The regret bounds follow from a general theorem instantiated for each setting.
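A minimal sketch of this template in Python follows. The simplex oracle, the step-size schedule sigma(t) = 1/sqrt(t), and eta = 1 are placeholder assumptions for illustration; the paper tunes these constants separately for each setting:

    import numpy as np

    def lmo_simplex(g):
        # Linear minimization oracle over the probability simplex:
        # argmin_x <g, x> is the basis vector at the smallest coordinate of g.
        v = np.zeros_like(g)
        v[np.argmin(g)] = 1.0
        return v

    def ofw(grad_stream, x1, eta, sigma, T, lmo=lmo_simplex):
        # Online Frank-Wolfe template: at round t, build the surrogate
        #   F_t(x) = eta * sum_{tau<=t} <g_tau, x> + ||x - x1||^2
        # and move toward the linear minimizer of grad F_t(x_t).
        x = x1.copy()
        g_sum = np.zeros_like(x1)
        for t in range(1, T + 1):
            g_sum += grad_stream(t, x)       # gradient of the round-t loss at x_t
            grad_F = eta * g_sum + 2.0 * (x - x1)
            v = lmo(grad_F)                  # one linear optimization, no projection
            x = x + sigma(t) * (v - x)       # stays feasible: a convex combination
        return x

    # Toy run with a fixed linear loss over the 4-dimensional simplex.
    g_fixed = np.array([0.3, 0.1, 0.5, 0.2])
    x = ofw(lambda t, x: g_fixed, np.full(4, 0.25),
            eta=1.0, sigma=lambda t: 1.0 / np.sqrt(t), T=1000)
    print(x)  # mass concentrates on coordinate 1, the smallest loss

Note that each iterate is a convex combination of $\mathbf{x}_1$ and at most $t$ oracle vertices, so the $t$-sparsity property above falls out of the update itself.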

 

Discussion

In order to compare the performance of OFW and OGD, the paper uses online collaborative filtering as a simple test application. The results show that OFW is significantly faster than OGD: OFW completes 100,000 iterations before OGD finishes far fewer. In addition, OFW also reduces the average squared loss faster than OGD as a function of running time. This is reasonable because computing the singular value decomposition of a dense matrix takes far more effort than extracting the top singular vector pair of a sparse one: OGD requires a full singular value decomposition of the matrix in each iteration (for the projection), while OFW only requires the top singular vector pair (for the linear optimization step).
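The following sketch (my own illustration, using a synthetic matrix with a pronounced spectral gap) contrasts the two per-iteration costs: a full SVD, which the trace-norm projection in OGD needs, versus the top singular vector pair from a few power iterations, which is all OFW's linear step needs:

    import numpy as np

    def top_singular_pair(A, iters=50):
        # Power iteration on A^T A recovers the top right singular vector;
        # one extra multiply gives the top left vector and singular value.
        v = np.random.default_rng(0).standard_normal(A.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = A.T @ (A @ v)
            v /= np.linalg.norm(v)
        u = A @ v
        s = np.linalg.norm(u)
        return u / s, s, v

    rng = np.random.default_rng(1)
    # Rank-one signal plus noise: the large spectral gap typical of
    # low-rank collaborative filtering makes power iteration converge fast.
    A = (5.0 * np.outer(rng.standard_normal(500), rng.standard_normal(300))
         + rng.standard_normal((500, 300)))

    s_full = np.linalg.svd(A, compute_uv=False)[0]  # full SVD: OGD's projection cost
    _, s_top, _ = top_singular_pair(A)              # top pair only: OFW's oracle cost
    print(np.isclose(s_full, s_top))                # True: same leading singular value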

 

 
