Which machine learning algorithms does Spark provide?


License: CC 4.0 BY-SA


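This article lists the machine learning algorithms and utilities available in Apache Spark MLlib, including classification, regression, clustering, and recommendation, and outlines key components such as feature transformation, model evaluation, and hyperparameter tuning.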

MLlib contains many algorithms and utilities, including:

Classification: logistic regression, naive Bayes,...
Regression: generalized linear regression, isotonic regression,...
Decision trees, random forests, and gradient-boosted trees
Recommendation: alternating least squares (ALS)
Clustering: K-means, Gaussian mixtures (GMMs),...
Topic modeling: latent Dirichlet allocation (LDA)
Feature transformations: standardization, normalization, hashing,...
Model evaluation and hyper-parameter tuning
ML Pipeline construction
ML persistence: saving and loading models and Pipelines
Survival analysis: accelerated failure time model
Frequent itemset and sequential pattern mining: FP-growth, association rules, PrefixSpan
Distributed linear algebra: singular value decomposition (SVD), principal component analysis (PCA),...
Statistics: summary statistics, hypothesis testing,...
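
The three feature transformations named in the list (standardization, normalization, hashing) are easy to confuse, so here is a minimal plain-Python sketch of what each one does. It is an illustration only and not MLlib's implementation: the function names `standardize`, `normalize`, and `hash_features` are hypothetical, and `zlib.crc32` is an assumed stand-in for whatever hash function the library uses. In MLlib the corresponding transformers are `StandardScaler`, `Normalizer`, and `HashingTF` in `pyspark.ml.feature`.

```python
import math
import statistics
import zlib


def standardize(column):
    """Rescale one feature column to zero mean and unit sample standard deviation."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)  # sample standard deviation (n - 1 denominator)
    return [(x - mean) / std for x in column]


def normalize(row):
    """Rescale one feature vector (a single row) to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else list(row)


def hash_features(tokens, num_features=16):
    """Map tokens to a fixed-length count vector using the hashing trick."""
    vec = [0] * num_features
    for tok in tokens:
        # crc32 is an assumption made for this sketch, chosen because it is
        # deterministic across runs; it is not claimed to match MLlib.
        index = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[index] += 1
    return vec


print(standardize([1.0, 2.0, 3.0]))   # [-1.0, 0.0, 1.0]
print(normalize([3.0, 4.0]))          # [0.6, 0.8]
print(hash_features(["spark", "mllib", "spark"]))
```

Standardization works per column across all rows, normalization works per row, and hashing turns a variable-length list of tokens into a fixed-length vector without building a vocabulary.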