vowpal_wabbit, a very high-performance single-machine machine learning library: comparing online learning and batch learning

vowpal_wabbit is a machine learning library that runs very fast on a single machine.

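The underlying reason is that vowpal_wabbit uses online learning, that is, it optimizes with stochastic gradient descent. Compared with batch gradient descent, online learning is faster, but its results may not be as good as those of batch learning.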

I came across a discussion of online learning versus batch learning in vowpal_wabbit on a blog. It struck me as a classic, so I am excerpting it here.

One cannot compare batch learning with online learning.

vowpal_wabbit is not a batch learner. It is an online learner. Online learners learn by looking at examples one at a time and slightly adjusting the weights of the model as they go.
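To make "one example at a time" concrete, here is a minimal sketch of a single online update for logistic loss with labels in {-1, 1}. It is an illustration only, not vowpal wabbit's actual update rule, which is more elaborate; the function name `sgd_step`, the feature names, and the learning rate are all assumptions made for this example.

```python
import math


def sgd_step(weights, features, label, learning_rate=0.1):
    """Apply one stochastic-gradient update for logistic loss.

    weights  : dict feature name -> weight (updated in place)
    features : dict feature name -> value (sparse example)
    label    : -1 or 1
    """
    dot = sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Clamp the margin so math.exp cannot overflow on extreme values.
    margin = max(-30.0, min(30.0, label * dot))
    # Gradient of log(1 + exp(-margin)) w.r.t. each weight is
    # -label * value / (1 + exp(margin)); we step against it.
    scale = label / (1.0 + math.exp(margin))
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + learning_rate * scale * value
    return weights


# Hypothetical two-example stream: the model is nudged after each example.
weights = {}
stream = [({"bias": 1.0, "x": 2.0}, 1), ({"bias": 1.0, "x": -1.5}, -1)]
for features, label in stream:
    sgd_step(weights, features, label)
print(weights)
```

Each call touches only the features present in the current example, which is why the cost per example stays small no matter how many examples have already been seen.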

There are advantages and disadvantages to online learning. The downside is that convergence to the final model is slow/gradual. The learner doesn't do a "perfect" job at extracting information from each example, because the process is iterative. Convergence on a final result is deliberately restrained/slow. This can make online learners appear weak on tiny data-sets like the above. (In short: online learning converges slowly and performs poorly on small data sets.)

There are several upsides though:

· Online learners don't need to load the full data into memory (they work by examining one example at a time and adjusting the model based on the real-time observed per-example loss) so they can scale easily to billions of examples. A 2011 paper by 4 Yahoo! researchers describes how vowpal wabbit was used to learn from a tera (10^12) feature data-set in 1 hour on 1k nodes. Users regularly use vw to learn from data-sets of billions of examples on their desktops and laptops. (Online learning does not need to load the whole data set, so even a single machine can handle massive data, because it processes the examples one by one.)

· Online learning is adaptive and can track changes in conditions over time, so it can learn from non-stationary data, like learning against an adaptive adversary.

· Learning introspection: one can observe loss convergence rates while training and identify specific issues, and even gain significant insights from specific data-set examples or features. (Online learning lets you watch convergence during training.)

· Online learners can learn in an incremental fashion so users can intermix labeled and unlabeled examples to keep learning while predicting at the same time.

· The estimated error is test-like (no need to split the data into train and test subsets).
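Two of the points above, streaming without loading the data and a test-like error estimate, can be sketched together. The idea is to predict on each example before learning from it and to average those losses, so every loss is measured on an example the model has not yet trained on. The code below is a toy sketch under stated assumptions: a one-weight squared-loss model, a synthetic generator `example_stream` with a made-up target `y = 3x + noise`, and a fixed learning rate. None of this is vowpal wabbit's own code.

```python
import random


def example_stream(n, seed=0):
    """Lazily yield (x, y) pairs so the full data set never sits in memory."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + rng.gauss(0.0, 0.1)  # hypothetical target: y = 3x + noise
        yield x, y


def progressive_validation(stream, learning_rate=0.1):
    """Return (final weight, average loss of predictions made before each update)."""
    w = 0.0
    total_loss = 0.0
    count = 0
    for x, y in stream:
        error = w * x - y            # predict first, using the current model
        total_loss += error * error  # this loss is on a not-yet-seen example
        count += 1
        w -= learning_rate * 2.0 * error * x  # only then learn from the example
    return w, total_loss / count


w, avg_loss = progressive_validation(example_stream(5000))
print(w, avg_loss)
```

Because the loss is accumulated before each update, printing the running average during training also gives the kind of convergence introspection mentioned in the list.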

Online learners are very sensitive to example order. The worst possible order for an online learner is when classes are clustered together (all, or almost all, -1s appear first, followed by all 1s), as in the example above. So the first thing to do to get better results from an online learner like vowpal wabbit is to uniformly shuffle the 1s and -1s (or simply order by time, as the examples typically appear in real life). (Online learners are sensitive to example order: for instance, in a click-prediction data set where the clicked examples are concentrated at the front and the non-clicked examples at the back, learning will work poorly.)
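A minimal sketch of the shuffling advice follows. It builds the worst-case ordering described above (all -1s, then all 1s), shuffles a copy uniformly, and uses the longest run of identical labels as a rough check of how clustered the classes are. The helper names `longest_run` and `shuffled_copy` and the toy data are assumptions made for illustration.

```python
import random


def longest_run(labels):
    """Length of the longest stretch of identical consecutive labels."""
    best = current = 0
    previous = None
    for label in labels:
        current = current + 1 if label == previous else 1
        best = max(best, current)
        previous = label
    return best


def shuffled_copy(examples, seed=0):
    """Return a uniformly shuffled copy; the input list is left untouched."""
    result = list(examples)
    random.Random(seed).shuffle(result)
    return result


# Worst case for an online learner: all -1s first, followed by all 1s.
clustered = [(-1, f"example_{i}") for i in range(50)]
clustered += [(1, f"example_{i}") for i in range(50, 100)]
mixed = shuffled_copy(clustered)

print(longest_run([label for label, _ in clustered]))
print(longest_run([label for label, _ in mixed]))
```

This in-memory shuffle only suits data that fits in RAM; a data set too large for that would need an external shuffle, or the time ordering the text suggests.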



