1. About Recommenders
- GitHub: https://github.com/recommenders-team/recommenders
- Website: https://recommenders-team.github.io/recommenders/intro.html
- Slack
Introduction
The goal of Recommenders is to help researchers, developers and enthusiasts prototype, experiment with and bring to production a range of classic and state-of-the-art recommendation systems.
Recommenders is a project under the Linux Foundation of AI and Data.
This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:
- Prepare Data: Preparing and loading data for each recommendation algorithm.
- Model: Building models using various classical and deep learning recommendation algorithms, such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
- Evaluate: Evaluating algorithms with offline metrics.
- Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
- Operationalize: Operationalizing models in a production environment on Azure.
Several utilities are provided in Recommenders to support common tasks such as loading datasets in the format expected by each algorithm, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and for customization in your own applications. See the Recommenders documentation.
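To give a flavor of these utilities, here is a minimal sketch, assuming the core recommenders package is installed, that loads MovieLens 100k into a pandas DataFrame and splits it 75/25 into train/test sets (the column names are passed explicitly, as the quick-start notebooks do):

```python
# Minimal sketch of the dataset and splitting utilities described above.
# Assumes `pip install recommenders` has already been run.
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_random_split

# Download MovieLens 100k and load it as a pandas DataFrame.
data = movielens.load_pandas_df(
    size="100k",
    header=["userID", "itemID", "rating", "timestamp"],
)

# Random 75/25 split into train and test sets.
train, test = python_random_split(data, ratio=0.75)
print(len(train), len(test))
```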
For a more detailed overview of the repository, please see the documents on the wiki page.
For some practical scenarios where recommendation systems have been applied, see the scenarios.
2. Getting Started
We recommend conda for environment management and VS Code for development. To install the recommenders package and run an example notebook on Linux/WSL:
# 1. Install gcc if it is not installed already. On Ubuntu, this could be done by using the command
# sudo apt install gcc
# 2. Create and activate a new conda environment
conda create -n <environment_name> python=3.9
conda activate <environment_name>
# 3. Install the core recommenders package. It can run all the CPU notebooks.
pip install recommenders
# 4. create a Jupyter kernel
python -m ipykernel install --user --name <environment_name> --display-name <kernel_name>
# 5. Clone this repo within VSCode or using command line:
git clone https://github.com/recommenders-team/recommenders.git
# 6. Within VSCode:
# a. Open a notebook, e.g., examples/00_quick_start/sar_movielens.ipynb;
# b. Select Jupyter kernel <kernel_name>;
# c. Run the notebook.
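After step 6, a quick way to confirm that the selected kernel can see the package installed in step 3 is to run a cell like the following (a simple sanity check of our own, assuming the package exposes a `__version__` attribute, which recent releases do):

```python
# Sanity check inside the notebook: the Jupyter kernel should resolve the
# `recommenders` package installed in the conda environment.
import recommenders
print(recommenders.__version__)
```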
For more information about setup on other platforms (e.g., Windows and macOS) and different configurations (e.g., GPU, Spark and experimental features), see the Setup Guide.
In addition to the core package, several extras are also provided, including:
- [gpu]: Needed for running GPU models.
- [spark]: Needed for running Spark models.
- [dev]: Needed for development of the repo.
- [all]: [gpu] | [spark] | [dev]
- [experimental]: Models that are not thoroughly tested and/or may require additional steps in installation.
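Extras follow the usual pip syntax, e.g. `pip install recommenders[gpu]`. As a rough check of which optional backends actually ended up in the environment (a sketch only; it assumes the [gpu] extra pulls in TensorFlow/PyTorch and the [spark] extra pulls in PySpark), one could run:

```python
# Rough check of which optional dependencies are importable in this environment.
# The package names below are assumptions about what each extra installs.
import importlib.util

for extra, pkg in [("[gpu]", "tensorflow"), ("[gpu]", "torch"), ("[spark]", "pyspark")]:
    status = "available" if importlib.util.find_spec(pkg) else "missing"
    print(f"{extra:10s} {pkg:12s} {status}")
```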
3. Algorithms
The table below lists the recommendation algorithms currently available in the repository. Notebooks are linked under the Example column as Quick start, showcasing an easy-to-run example of the algorithm, or as Deep dive, explaining in detail the math and implementation of the algorithm.
Algorithm | Type | Description | Example |
---|---|---|---|
Alternating Least Squares (ALS) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized for scalability and distributed computing capability. It works in the PySpark environment. | Quick start / Deep dive |
Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism. It works in the CPU/GPU environment. | Quick start |
Cornac/Bayesian Personalized Ranking (BPR) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback. It works in the CPU environment. | Deep dive |
Cornac/Bilateral Variational Autoencoder (BiVAE) | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment. | Deep dive |
Convolutional Sequence Embedding Recommendation (Caser) | Collaborative Filtering | Algorithm based on convolutions that aim to capture both user’s general preferences and sequential patterns. It works in the CPU/GPU environment. | Quick start |
Deep Knowledge-Aware Network (DKN)* | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment. | Quick start / Deep dive |
Extreme Deep Factorization Machine (xDeepFM)* | Collaborative Filtering | Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment. | Quick start |
FastAI Embedding Dot Bias (FAST) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment. | Quick start |
LightFM/Factorization Machine | Collaborative Filtering | Factorization Machine algorithm for both implicit and explicit feedbacks. It works in the CPU environment. | Quick start |
LightGBM/Gradient Boosting Tree* | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. | Quick start in CPU / Deep dive in PySpark |
LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | Deep dive |
GeoIMC* | Collaborative Filtering | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradient optimization and follows a geometric approach. It works in the CPU environment. | Quick start |
GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | Quick start |
Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. | Quick start |
Neural Recommendation with Attentive Multi-View Learning (NAML)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with attentive multi-view learning. It works in the CPU/GPU environment. | Quick start |
Neural Collaborative Filtering (NCF) | Collaborative Filtering | Deep learning algorithm with enhanced performance for user/item implicit feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
Neural Recommendation with Personalized Attention (NPA)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with personalized attention network. It works in the CPU/GPU environment. | Quick start |
Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with multi-head self-attention. It works in the CPU/GPU environment. | Quick start |
Next Item Recommendation (NextItNet) | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns. It considers both user/item interactions and features. It works in the CPU/GPU environment. | Quick start |
Restricted Boltzmann Machines (RBM) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit user/item feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
Riemannian Low-rank Matrix Completion (RLRMC)* | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption to predict user/item interactions. It works in the CPU environment. | Quick start |
Simple Algorithm for Recommendation (SAR)* | Collaborative Filtering | Similarity-based algorithm for implicit user/item feedback. It works in the CPU environment. | Quick start / Deep dive |
Self-Attentive Sequential Recommendation (SASRec) | Collaborative Filtering | Transformer based algorithm for sequential recommendation. It works in the CPU/GPU environment. | Quick start |
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller. It works in the CPU/GPU environment. | Quick start |
Multi-Interest-Aware Sequential User Modeling (SUM)* | Collaborative Filtering | An enhanced memory network-based sequential user model which aims to capture users’ multiple interests. It works in the CPU/GPU environment. | Quick start |
Sequential Recommendation Via Personalized Transformer (SSEPT) | Collaborative Filtering | Transformer based algorithm for sequential recommendation with User embedding. It works in the CPU/GPU environment. | Quick start |
Standard VAE | Collaborative Filtering | Generative Model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
Surprise/Singular Value Decomposition (SVD) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment. | Deep dive |
Term Frequency - Inverse Document Frequency (TF-IDF) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment. | Quick start |
Vowpal Wabbit (VW)* | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing. It uses the CPU for online learning. | Deep dive |
Wide and Deep | Collaborative Filtering | Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment. | Quick start |
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Collaborative Filtering | Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment. | Deep dive |
Note: * indicates algorithms invented/contributed by Microsoft.
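To give a sense of how these algorithms are driven from Python, below is a hedged sketch modeled on the SAR quick-start notebook (examples/00_quick_start/sar_movielens.ipynb); it reuses the train/test split from the earlier snippet, and the parameter values are illustrative rather than tuned:

```python
# Sketch of training SAR and producing top-10 recommendations, loosely following
# the SAR quick-start notebook; parameter values are illustrative.
from recommenders.models.sar import SAR

model = SAR(
    col_user="userID",
    col_item="itemID",
    col_rating="rating",
    col_timestamp="timestamp",
    similarity_type="jaccard",     # item-item similarity measure
    time_decay_coefficient=30,     # half-life in days for the time decay
    timedecay_formula=True,
)

model.fit(train)  # `train` comes from the split shown in the introduction

# Recommend the top 10 unseen items for each user in the test set.
top_k = model.recommend_k_items(test, top_k=10, remove_seen=True)
print(top_k.head())
```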
Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.
Algorithm | Type | Description | Example |
---|---|---|---|
SARplus* | Collaborative Filtering | Optimized implementation of SAR for Spark | Quick start |
Algorithm Comparison
We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained with each of the collaborative filtering algorithms below, using empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k with the algorithms trained for 15 epochs.
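The ranking metrics in the table are exposed under recommenders.evaluation.python_evaluation; a minimal sketch, assuming `test` and `top_k` come from the SAR snippet above, would be:

```python
# Sketch of computing the k=10 ranking metrics reported below, assuming `test`
# holds ground-truth interactions and `top_k` holds the top-10 recommendations.
from recommenders.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
)

cols = dict(col_user="userID", col_item="itemID", col_prediction="prediction", k=10)
print("MAP@10      ", map_at_k(test, top_k, **cols))
print("nDCG@10     ", ndcg_at_k(test, top_k, **cols))
print("Precision@10", precision_at_k(test, top_k, **cols))
print("Recall@10   ", recall_at_k(test, top_k, **cols))
```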
Algorithm | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
---|---|---|---|---|---|---|---|---|
ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.7443 | ||
SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
2025-01-23 (Thu)