Spark ML Official Documentation - ML Pipelines

Latest recommended article published 2024-04-17 18:49:48

Arlison ^O^

Category: Machine Learning

Article link: https://blog.csdn.net/arlison/article/details/103730807

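In Spark MLlib, a Transformer covers both feature transformers and learned models, and implements a transform() method that converts one DataFrame into another. An Estimator represents a learning algorithm and implements a fit() method that trains a model on a DataFrame. A Pipeline is a sequence of Transformer and Estimator stages that run in order to transform data and train models. A PipelineModel is the result of fitting a Pipeline; it is used at test time so that training and test data go through the same processing steps.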

Summary generated automatically by CSDN.

Transformer: A Transformer is an abstraction that includes feature transformers and learned models. Technically, a Transformer implements a method transform(), which converts one DataFrame into another, generally by appending one or more columns.
Estimator: An Estimator abstracts the concept of a learning algorithm or any algorithm that fits or trains on data. Technically, an Estimator implements a method fit(), which accepts a DataFrame and produces a Model, which is a Transformer.
Pipeline: A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. For Transformer stages, the transform() method is called on the DataFrame. For Estimator stages, the fit() method is called to produce a Transformer, and that Transformer’s transform() method is called on the DataFrame.
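
The three definitions above can be illustrated with a small, self-contained sketch of the fit()/transform() contract. This is not Spark code and not how Spark implements pipelines: a "DataFrame" is modeled as a plain list of dict rows, and the class names AddLength, MeanCenter, and MeanCenterModel are hypothetical, made up for illustration. Detecting an Estimator by the presence of a fit() method is likewise a simplifying assumption of this toy.

```python
# Toy model of the Transformer / Estimator / Pipeline contract.
# Assumption: a "DataFrame" is a list of dict rows. This is NOT the Spark API.


class AddLength:
    """Transformer: appends a 'length' column computed from 'text'."""

    def transform(self, rows):
        return [{**r, "length": len(r["text"])} for r in rows]


class MeanCenterModel:
    """Fitted model (itself a Transformer) produced by MeanCenter.fit()."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [{**r, "centered": r["length"] - self.mean} for r in rows]


class MeanCenter:
    """Estimator: learns the mean of 'length' and returns a fitted model."""

    def fit(self, rows):
        mean = sum(r["length"] for r in rows) / len(rows)
        return MeanCenterModel(mean)


class PipelineModel:
    """Result of fitting a Pipeline: an ordered list of Transformers."""

    def __init__(self, transformers):
        self.transformers = transformers

    def transform(self, rows):
        for t in self.transformers:
            rows = t.transform(rows)
        return rows


class Pipeline:
    """Runs stages in order; Estimator stages are fitted first."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                # Estimator stage: fit() yields a Transformer.
                stage = stage.fit(rows)
            # Either way, transform the data before the next stage sees it.
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


train = [{"text": "a b"}, {"text": "spark"}]
test = [{"text": "ml"}]

model = Pipeline([AddLength(), MeanCenter()]).fit(train)
result = model.transform(test)
print(result)
```

The mean is learned once from the training rows and then reused on the test rows, so both go through the same processing steps. In PySpark the corresponding roles are played by pyspark.ml.Pipeline and the PipelineModel it returns from fit(), operating on real DataFrames.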