Collaborative Filtering
- User-Based Collaborative Filtering - single machine
- Item-Based Collaborative Filtering - single machine / MapReduce [several similarity measures can be used]
- Matrix Factorization with Alternating Least Squares - single machine / MapReduce
- Matrix Factorization with Alternating Least Squares on Implicit Feedback - single machine / MapReduce
- Weighted Matrix Factorization, SVD++, Parallel SGD - single machine
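
Item-based collaborative filtering can be run with different similarity measures. The sketch below illustrates that idea in plain Python, with the similarity function passed in as a parameter. It is not Mahout's API: the function names (`cosine`, `jaccard`, `item_similarities`) and the toy ratings are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {user: rating}."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a, b):
    """Jaccard similarity on the sets of users who rated each item."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def item_similarities(ratings, similarity):
    """ratings: {user: {item: rating}}. Returns {(item_i, item_j): sim} for i < j."""
    # Invert the user -> item map into item -> user vectors.
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    names = sorted(items)
    return {(i, j): similarity(items[i], items[j])
            for idx, i in enumerate(names) for j in names[idx + 1:]}

# Hypothetical toy data: three users, three items.
ratings = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "C": 2.0},
    "u3": {"B": 1.0, "C": 5.0},
}
sims_cos = item_similarities(ratings, cosine)
sims_jac = item_similarities(ratings, jaccard)
```

Swapping the measure changes only the function argument; the pairing logic stays the same.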
Classification
- Logistic Regression - trained via SGD - single machine [SGD: stochastic gradient descent]
- Naive Bayes / Complementary Naive Bayes - MapReduce
- Random Forest - MapReduce
- Hidden Markov Models - single machine
- Multilayer Perceptron - single machine
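
As an illustration of "logistic regression trained via SGD", here is a minimal single-machine sketch in plain Python. It is not Mahout's implementation: the function names, learning rate, epoch count, and toy data are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg_sgd(data, lr=0.1, epochs=200, seed=0):
    """data: list of (features, label) with label in {0, 1}. Returns (weights, bias)."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in random order each epoch
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            # One stochastic step per example.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return the predicted class (0 or 1) for feature vector x."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical, linearly separable toy data with one feature.
data = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
w, b = train_logreg_sgd(data)
```

Each update uses a single example, which is what distinguishes SGD from batch gradient descent.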
Clustering
- Canopy Clustering - single machine / MapReduce (deprecated, will be removed once Streaming k-Means is stable enough)
- k-Means Clustering - single machine / MapReduce
- Fuzzy k-Means - single machine / MapReduce
- Streaming k-Means - single machine / MapReduce
- Spectral Clustering - MapReduce
Dimensionality Reduction
- Singular Value Decomposition - single machine
- Lanczos Algorithm - single machine / MapReduce
- Stochastic SVD - single machine / MapReduce / Spark
- Principal Component Analysis (via Stochastic SVD) - single machine / MapReduce
Topic Models
- Latent Dirichlet Allocation - single machine / MapReduce
Miscellaneous
- Frequent Pattern Mining - MapReduce
- RowSimilarityJob - compute pairwise similarities between the rows of a matrix - MapReduce
- ConcatMatrices - combine 2 matrices or vectors into a single matrix - MapReduce
- Collocations - find co-locations of tokens in text - MapReduce
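
To make the Collocations entry concrete, the sketch below counts adjacent token pairs (bigrams) in a text. It is a simplification written in plain Python, not the Mahout job: the whitespace tokenizer, the function name `bigram_counts`, and the sample sentence are assumptions, and no statistical scoring of the pairs is applied.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent token pairs after lowercasing and splitting on whitespace."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical sample text.
counts = bigram_counts("new york is big and new york is busy")
```

Pairs that occur more often than chance would suggest, such as ("new", "york") here, are the candidates a collocation finder would then rank.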