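How to visualize clustering in Spark_An introduction to the Spark MLbase distributed machine learning system: implementing the K-means clustering algorithm with MLlib...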

Latest recommended article published on 2023-07-19 11:37:31

weixin_39710991

Tags: how to visualize clustering in Spark

Article link: https://blog.csdn.net/weixin_39710991/article/details/111908544

1. What is MLbase

MLbase is part of the Spark ecosystem and focuses on machine learning. It has three components: MLlib, MLI, and ML Optimizer.

ML Optimizer: This layer aims to automate the task of ML pipeline construction. The optimizer solves a search problem over feature extractors and ML algorithms included in MLI and MLlib. The ML Optimizer is currently under active development.

MLI: An experimental API for feature extraction and algorithm development that introduces high-level ML programming abstractions. A prototype of MLI has been implemented against Spark, and serves as a testbed for MLlib.

MLlib: Apache Spark's distributed ML library. MLlib was initially developed as part of the MLbase project, and the library is currently supported by the Spark community. Many features in MLlib have been borrowed from ML Optimizer and MLI, e.g., the model and algorithm APIs, multimodel training, sparse data support, design of local / distributed matrices, etc.

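2. The workflow of machine learning algorithms in MLbase

MLbase makes it easy for users to process their own data. Most machine learning algorithms have two parts, training and prediction: a model is trained first and is then used to predict unseen samples. The machine learning packages in Spark follow the same pattern.

Spark splits every machine learning algorithm into two modules:

Training module: takes the training samples and outputs the model parameters.

Prediction module: is initialized with the model parameters, predicts the test samples, and outputs the predicted values.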
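The two modules map fairly directly onto MLlib's RDD-based K-means API. Below is a minimal sketch, assuming a local Spark master and a made-up six-point dataset. The object name `KMeansTrainPredict`, the helper `train`, and the sample points are illustrative and do not come from the original article.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object KMeansTrainPredict {
  // Training module: samples in, model parameters (cluster centers) out.
  def train(sc: SparkContext): KMeansModel = {
    // Toy data (assumption): two well-separated groups of 2-D points.
    val training = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2), Vectors.dense(0.2, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 8.9), Vectors.dense(8.9, 9.2)
    )).cache()
    // k = 2 clusters, at most 20 iterations.
    KMeans.train(training, 2, 20)
  }

  def main(args: Array[String]): Unit = {
    // Local mode and the app name are assumptions for illustration.
    val conf = new SparkConf().setAppName("KMeansTrainPredict").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val model = train(sc)
      model.clusterCenters.foreach(c => println(s"center: $c"))

      // Prediction module: the fitted model assigns unseen samples to clusters.
      val nearOrigin = model.predict(Vectors.dense(0.05, 0.05))
      val farAway = model.predict(Vectors.dense(9.05, 9.05))
      println(s"cluster ids: $nearOrigin, $farAway")
    } finally {
      sc.stop()
    }
  }
}
```

Here `KMeans.train` plays the role of the training module and returns a `KMeansModel`, while `model.predict` is the prediction module that reuses the learned centers.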

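MLbase is programmed in Scala, a functional programming language, and with MLlib the common machine learning algorithms are easy to implement.

For example, to do classification we only need to write the following Scala code: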

var X = load("some_data", 2 to 10)
var y = load("some_data", 1)
var (fn-model, summary) = doClassify(X, y)

Code explanation: X is the dataset to be classified, y
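The `load` and `doClassify` calls above belong to MLbase's declarative interface, not to the MLlib library that ships with Spark. A rough MLlib analogue might look like the sketch below. It assumes that column 1 of each row holds a binary label and columns 2 to 10 hold the features, and it swaps in logistic regression as the classifier. The object `ClassifySketch`, its helpers `parse` and `fit`, and the sample rows are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object ClassifySketch {
  // Assumed row layout: column 1 is the label, columns 2 to 10 are features.
  def parse(line: String): LabeledPoint = {
    val cols = line.split(',').map(_.trim.toDouble)
    LabeledPoint(cols.head, Vectors.dense(cols.slice(1, 10)))
  }

  // Training step: labeled rows in, a fitted binary classifier out.
  def fit(rows: RDD[String]): LogisticRegressionModel = {
    val training = rows.map(parse).cache()
    new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
  }

  def main(args: Array[String]): Unit = {
    // Local mode and the app name are assumptions for illustration.
    val conf = new SparkConf().setAppName("ClassifySketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up rows: a label followed by nine feature values.
      val rows = sc.parallelize(Seq(
        "0,0.1,0.2,0.1,0.0,0.3,0.2,0.1,0.0,0.2",
        "0,0.2,0.1,0.0,0.1,0.2,0.3,0.0,0.1,0.1",
        "1,0.9,0.8,1.0,0.9,0.7,0.8,1.0,0.9,0.8",
        "1,0.8,0.9,0.9,1.0,0.8,0.7,0.9,1.0,0.9"
      ))
      val model = fit(rows)
      // Prediction step: classify an unseen nine-feature sample.
      val label = model.predict(Vectors.dense(Array.fill(9)(0.95)))
      println(s"predicted label: $label")
    } finally {
      sc.stop()
    }
  }
}
```

Unlike `doClassify`, which leaves the choice of algorithm to the ML Optimizer, this version fixes the algorithm by hand. That is the main difference between calling MLlib directly and using the declarative layer.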
