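I have been studying Hadoop lately but never sorted out what I came across along the way. Here is a classification of the common algorithms: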
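Recommender systems (recommendation engines):
User-based collaborative filtering (UserCF): a nearest-neighbor algorithm, easy to implement
Item-based collaborative filtering (ItemCF): fast, easy to run as a distributed computation
SlopeOne: @Deprecated at mahout 0.8
KNN linear interpolation item-based recommender: a nearest-neighbor algorithm, @Deprecated at mahout 0.8
SVD recommender: singular value decomposition, involves dimensionality reduction and heavy preprocessing
Tree cluster-based recommender: tree clustering, heavy preprocessing, @Deprecated at mahout 0.8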
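To make the ItemCF entry above a bit more concrete, here is a minimal single-machine sketch in plain Python. It is not Mahout's implementation. The toy `ratings` data, the use of cosine similarity, and the function names are all assumptions made for illustration. An unseen item is scored by the similarity-weighted average of the ratings the user has already given.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "D": 5.0},
    "carol": {"B": 4.0, "C": 5.0, "D": 3.0},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(v1) & set(v2)
    if not common:
        return 0.0
    dot = sum(v1[u] * v2[u] for u in common)
    norm1 = sqrt(sum(x * x for x in v1.values()))
    norm2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2)

def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by a similarity-weighted average."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        num = den = 0.0
        for item, r in seen.items():
            sim = cosine(vec, items[item])
            num += sim * r
            den += sim
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

if __name__ == "__main__":
    print(recommend("alice", ratings))
```

The item-to-item similarities depend only on pairs of items, so they can be computed independently of each other. That is one reason ItemCF is usually described as easy to distribute.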
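Classification algorithms:
Support vector machine (SVM)
Logistic regression (LR)
Stochastic gradient descent (SGD)
Neural networks
Random forest (RF): often used in the Tmall recommendation algorithm competition (RF + GBDT), can be parallelized with MapReduce
Naive Bayes: there is also a complementary variant, Complementary Naive Bayes (cbayes), which usually performs better than plain Naive Bayes, can be parallelized with MapReduce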
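Two of the entries above, logistic regression and SGD, fit together naturally: SGD is one common way to train a logistic regression model. The sketch below is a toy version in plain Python, not Mahout code. The data set, learning rate, epoch count, and function names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical, linearly separable toy data
samples = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [3.0, 3.0], [3.0, 4.0], [4.0, 3.0], [4.0, 4.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_sgd(samples, labels, lr=0.1, epochs=200, seed=0):
    """Fit weights and bias by updating on one shuffled sample at a time."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss w.r.t. the linear score
            for j in range(n_features):
                weights[j] -= lr * error * x[j]
            bias -= lr * error
    return weights, bias

def predict(weights, bias, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0

if __name__ == "__main__":
    w, b = train_logistic_sgd(samples, labels)
    print([predict(w, b, x) for x in samples])
```

Each update touches only one sample, which is why this style of training suits streaming or very large data sets.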
Clustering algorithms:
canopy clustering
k-means clustering
Hierarchical clustering
Frequent pattern mining
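As a rough illustration of the k-means entry above, here is a small sketch in plain Python. It is not Mahout's implementation. The toy points and the function names are assumptions, and the initial centroids are passed in explicitly. In practice they might come from a cheaper first pass such as canopy clustering, but that step is not shown here.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                dim = len(old)
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

if __name__ == "__main__":
    # Hypothetical toy points forming two obvious groups
    pts = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
    print(kmeans(pts, [(1.0, 1.0), (8.0, 8.0)]))
```

The assignment step is independent per point and the update step is a per-cluster average, which maps naturally onto a map phase and a reduce phase.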
Common algorithms supported by the latest Mahout release (0.9):
Latest release version 0.9 has
User and Item based recommenders
Matrix factorization based recommenders
K-Means, Fuzzy K-Means clustering
Latent Dirichlet Allocation
Singular Value Decomposition
Logistic regression classifier
(Complementary) Naive Bayes classifier
Random forest classifier
High performance java collections
A vibrant community
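Also, note the announcement on the Mahout website: Mahout no longer accepts new MapReduce algorithm implementations, so keep an eye on Spark going forward.
Original text: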
Mahout News
25 April 2014 - Goodbye MapReduce
The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them.
We are building our future implementations on top of a DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark.
Furthermore, there is an experimental contribution undergoing which aims to integrate the H2O platform into Mahout.
Reposted from: https://blog.51cto.com/now51jq/1548312