Comparing the LR implementations in Spark MLlib and LIBLINEAR

修鹏李

Category: Recommender Systems

Original link: https://blog.csdn.net/map_lixiupeng/article/details/51814827

I compared the logistic regression (LR) implementations in Spark MLlib and LIBLINEAR:

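LIBLINEAR solves LR with TRON (a trust region Newton method), whereas MLlib's LR comes in two implementations: one based on L-BFGS and one based on SGD.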
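All three solvers minimize the same kind of L2-regularized LR objective; they differ in what they ask of it. The sketch below is a minimal pure-Python illustration, assuming LIBLINEAR's primal form f(w) = 0.5·||w||² + C·Σᵢ log(1 + exp(−yᵢ·w·xᵢ)) with labels yᵢ in {−1, +1}. TRON needs f, its gradient, and Hessian-vector products (its conjugate-gradient inner loop never forms the Hessian explicitly); L-BFGS needs only f and the gradient; SGD needs one term of the gradient sum at a time. The function names are illustrative only, not LIBLINEAR or MLlib APIs.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1pexp(z):
    # stable log(1 + exp(z))
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def objective(w, X, y, C):
    """f(w) = 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i))"""
    return 0.5 * dot(w, w) + C * sum(log1pexp(-yi * dot(w, xi)) for xi, yi in zip(X, y))

def gradient(w, X, y, C):
    """grad f(w) = w + C * sum_i (sigmoid(y_i * w.x_i) - 1) * y_i * x_i"""
    g = list(w)
    for xi, yi in zip(X, y):
        coef = C * (sigmoid(yi * dot(w, xi)) - 1.0) * yi
        for j, xij in enumerate(xi):
            g[j] += coef * xij
    return g

def hessian_vector(w, v, X, y, C):
    """H v = v + C * X^T D X v, with D_ii = s_i * (1 - s_i), s_i = sigmoid(y_i * w.x_i).
    This product is all a TRON-style CG inner loop needs; H itself is never built."""
    hv = list(v)
    for xi, yi in zip(X, y):
        s = sigmoid(yi * dot(w, xi))
        coef = C * s * (1.0 - s) * dot(xi, v)
        for j, xij in enumerate(xi):
            hv[j] += coef * xij
    return hv
```

Because the Hessian is only touched through `hessian_vector`, a Newton-type solver costs a few passes over the data per inner iteration, much like a gradient evaluation, which is what lets TRON scale to large sparse problems.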

http://spark.apache.org/docs/latest/mllib-linear-methods.html

According to the MLlib documentation, the algorithms are all implemented in Scala.

https://www.csie.ntu.edu.tw/~cjlin/liblinear/

LIBLINEAR is a linear classifier for data with millions of instances and features. It supports

- L2-regularized classifiers: L2-loss linear SVM, L1-loss linear SVM, and logistic regression (LR)
- L1-regularized classifiers (after version 1.4): L2-loss linear SVM and logistic regression (LR)
- L2-regularized support vector regression (after version 1.9): L2-loss linear SVR and L1-loss linear SVR.
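To make the model list concrete, here is a minimal sketch of the per-example losses and the two regularizers behind those names. Each LIBLINEAR model minimizes R(w) + C·Σᵢ loss, where R(w) is 0.5·||w||² for the L2-regularized models and ||w||₁ for the L1-regularized ones. The function names are hypothetical, and the `eps=0.1` default for the SVR losses is only an illustrative value.

```python
import math

# For classifiers, m = y * w.x is the margin (y in {-1, +1});
# for regression, r = y - w.x is the residual.

def l1_loss_svm(m):
    # hinge loss: max(0, 1 - m)
    return max(0.0, 1.0 - m)

def l2_loss_svm(m):
    # squared hinge loss: max(0, 1 - m)^2
    return max(0.0, 1.0 - m) ** 2

def logistic_loss(m):
    # log(1 + exp(-m)), computed without overflow for large |m|
    return math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))

def l1_loss_svr(r, eps=0.1):
    # epsilon-insensitive loss: max(0, |r| - eps)
    return max(0.0, abs(r) - eps)

def l2_loss_svr(r, eps=0.1):
    # squared epsilon-insensitive loss: max(0, |r| - eps)^2
    return max(0.0, abs(r) - eps) ** 2

def l2_reg(w):
    # regularizer for the L2-regularized models: 0.5 * ||w||^2
    return 0.5 * sum(wj * wj for wj in w)

def l1_reg(w):
    # regularizer for the L1-regularized models: ||w||_1
    return sum(abs(wj) for wj in w)
```

Note that logistic regression is the only smooth loss in the list, which is why Newton-type methods such as TRON apply to it directly.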

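Our current single-machine version runs its experiments with SGD. In theory L-BFGS should do better than SGD, but given the characteristics of our sample features, SGD currently performs better. Other companies have run into the same problem,

and the common guess is that the features produced by the word2vec mapping are redundant.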
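For contrast with L-BFGS, which uses the full gradient plus a curvature estimate at every iteration, SGD updates the weights from one example at a time. The sketch below is a minimal illustration on dense Python lists. It assumes an averaged objective (lam/2)·||w||² + (1/n)·Σ log(1 + exp(−yᵢ·w·xᵢ)), which is LIBLINEAR's form rescaled with C = 1/(n·lam), and a decaying step size eta0/sqrt(t). The function name and default values are illustrative; this is not MLlib's `LogisticRegressionWithSGD`.

```python
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sgd_logistic_regression(X, y, lam=0.01, eta0=0.5, epochs=20, seed=0):
    """Plain SGD for L2-regularized LR with labels in {-1, +1}:
        min_w (lam/2)*||w||^2 + (1/n) * sum_i log(1 + exp(-y_i * w.x_i))
    Each update uses a single example; the step size decays as eta0 / sqrt(t)."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for i in order:
            t += 1
            eta = eta0 / math.sqrt(t)
            xi, yi = X[i], y[i]
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            s = sigmoid(-margin)  # per-example gradient is lam*w - s*y_i*x_i
            for j in range(d):
                w[j] -= eta * (lam * w[j] - s * yi * xi[j])
    return w
```

Each update is cheap, but the result depends on `eta0` and the decay schedule, two knobs that L-BFGS, with its line search, does not need.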

We also tested MLlib's L-BFGS-based LR against LIBLINEAR's TRON-based LR and found that the two optimizers perform very similarly.
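Close results are what one would expect: the L2-regularized LR objective is strictly convex, so any solver that converges reaches the same unique minimizer. The minimal pure-Python sketch below runs two solvers on one toy objective. `lbfgs` is a textbook L-BFGS (two-loop recursion plus Armijo backtracking). `newton_cg` is a truncated Newton method that solves H·d = −g inexactly by conjugate gradient using only Hessian-vector products. That is the core of TRON, but here it is a simplified stand-in that uses a line search instead of a trust region. All names, tolerances, and the toy data are assumptions for illustration, not MLlib or LIBLINEAR code.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def axpy(alpha, x, y):
    # returns alpha * x + y as a new list
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def f_grad(w, X, y, C):
    # f(w) = 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i w.x_i)), and its gradient
    f, g = 0.5 * dot(w, w), list(w)
    for xi, yi in zip(X, y):
        m = yi * dot(w, xi)
        f += C * (math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m)))
        g = axpy(C * (sigmoid(m) - 1.0) * yi, xi, g)
    return f, g

def hess_vec(w, v, X, y, C):
    # H v = v + C * sum_i s_i (1 - s_i) (x_i . v) x_i
    hv = list(v)
    for xi, yi in zip(X, y):
        s = sigmoid(yi * dot(w, xi))
        hv = axpy(C * s * (1.0 - s) * dot(xi, v), xi, hv)
    return hv

def line_search(w, d, f, g, X, y, C):
    # Armijo backtracking along the descent direction d
    step, slope = 1.0, dot(g, d)
    while True:
        w_new = axpy(step, d, w)
        f_new, g_new = f_grad(w_new, X, y, C)
        if f_new <= f + 1e-4 * step * slope or step < 1e-10:
            return w_new, f_new, g_new
        step *= 0.5

def lbfgs(X, y, C, mem=5, iters=200, tol=1e-6):
    w = [0.0] * len(X[0])
    f, g = f_grad(w, X, y, C)
    S, Y = [], []
    for _ in range(iters):
        if math.sqrt(dot(g, g)) < tol:
            break
        # two-loop recursion: r approximates (inverse Hessian) * g
        q, alphas = list(g), []
        for s, yv in reversed(list(zip(S, Y))):
            a = dot(s, q) / dot(yv, s)
            alphas.append(a)
            q = axpy(-a, yv, q)
        gamma = dot(S[-1], Y[-1]) / dot(Y[-1], Y[-1]) if S else 1.0
        r = [gamma * qi for qi in q]
        for (s, yv), a in zip(zip(S, Y), reversed(alphas)):
            r = axpy(a - dot(yv, r) / dot(yv, s), s, r)
        w_new, f, g_new = line_search(w, [-ri for ri in r], f, g, X, y, C)
        s_new = [u - v for u, v in zip(w_new, w)]
        y_new = [u - v for u, v in zip(g_new, g)]
        if dot(s_new, y_new) > 1e-12:  # keep only pairs with positive curvature
            S.append(s_new)
            Y.append(y_new)
            if len(S) > mem:
                S.pop(0)
                Y.pop(0)
        w, g = w_new, g_new
    return w, f

def newton_cg(X, y, C, iters=50, tol=1e-6):
    # Truncated Newton: inexact CG solve of H d = -g using only hess_vec.
    w = [0.0] * len(X[0])
    f, g = f_grad(w, X, y, C)
    for _ in range(iters):
        gnorm = math.sqrt(dot(g, g))
        if gnorm < tol:
            break
        d, r = [0.0] * len(w), [-gi for gi in g]
        p, rr = list(r), dot(r, r)
        for _ in range(2 * len(w)):
            Hp = hess_vec(w, p, X, y, C)
            alpha = rr / dot(p, Hp)
            d = axpy(alpha, p, d)
            r = axpy(-alpha, Hp, r)
            rr_new = dot(r, r)
            if math.sqrt(rr_new) <= 0.1 * gnorm:  # inexact (truncated) solve
                break
            p, rr = axpy(rr_new / rr, p, r), rr_new
        w, f, g = line_search(w, d, f, g, X, y, C)
    return w, f
```

On a strictly convex objective, the two should agree to within their tolerances. Any remaining practical difference lies in speed and memory, not in the model they produce.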

MLlib already provides an online mode: it implements online SGD on top of Spark Streaming. Even so, we lean toward FTRL.
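For reference, here is a minimal sketch of the per-coordinate FTRL-Proximal update for logistic regression, following Algorithm 1 of McMahan et al., "Ad Click Prediction: a View from the Trenches" (2013). The class name is hypothetical and not part of MLlib. The parameters `alpha`, `beta`, `l1`, and `l2` mirror the paper's α, β, λ1, λ2, and their default values here are arbitrary. Labels are 0/1, and examples are sparse dicts mapping feature index to value.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for online logistic regression."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # per-coordinate adjusted gradient sums
        self.n = defaultdict(float)  # per-coordinate sums of squared gradients

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 term keeps this coordinate exactly zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        wx = sum(self._weight(i) * v for i, v in x.items())
        wx = max(min(wx, 35.0), -35.0)  # clip to keep exp() in range
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, x, y):
        """One online step on example (x, y); returns the prediction made before the update."""
        w = {i: self._weight(i) for i in x}
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss w.r.t. w_i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p
```

Compared with plain online SGD, each coordinate gets its own adaptive learning rate alpha / (beta + sqrt(n_i)), and the L1 term yields exact zeros. Both properties matter for high-dimensional sparse features.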