spark如何进行聚类可视化_Spark MLBase分布式机器学习系统入门:以MLlib实现Kmeans聚类算法...

1.什么是MLBaseMLBase是Spark生态圈的一部分,专注于机器学习,包含三个组件:MLlib、MLI、ML Optimizer。ML Optimizer: This layer aims to automating the task of ML pipeline construction. The optimizer solves a search problem over featur...
摘要由CSDN通过智能技术生成

1.什么是MLBase

MLBase是Spark生态圈的一部分,专注于机器学习,包含三个组件:MLlib、MLI、ML Optimizer。

ML Optimizer: This layer aims to automating the task of ML pipeline construction. The optimizer solves a search problem over feature extractors and ML algorithms included inMLI and MLlib. The ML Optimizer is currently under active development.

MLI: An experimental API for feature extraction and algorithm development that introduces high-level ML programming abstractions. A prototype of MLI has been implemented against Spark, and serves as a testbed for MLlib.

MLlib: Apache Spark's distributed ML library. MLlib was initially developed as part of the MLbase project, and the library is currently supported by the Spark community. Many features in MLlib have been borrowed from ML Optimizer and MLI, e.g., the model and algorithm APIs, multimodel training, sparse data support, design of local / distributed matrices, etc.

2.MLbase机器学习算法的流程

用户可以容易地使用MLbase这个工具来处理自己的数据。大部分的机器学习算法都包含训练以及预测两个部分,训练出模型,然后对未知样本进行预测。Spark中的机器学习包也是如此。

Spark将机器学习算法都分成了两个模块:

训练模块:通过训练样本输出模型参数

预测模块:利用模型参数初始化,预测测试样本,输出与测值。

MLbase提供了函数式编程语言Scala,利用MLlib可以很方便的实现机器学习的常用算法。

比如说,我们要做分类,只需要写如下scala代码:

1 var X = load("some_data", 2 to 10)2 var y = load("some_data", 1)3 var (fn-model, summary) = doClassify(X, y)

代码解释:X是需要分类的数据集,y

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值