1. About Recommenders
- GitHub: https://github.com/recommenders-team/recommenders
- Website: https://recommenders-team.github.io/recommenders/intro.html
- Slack
Introduction
The goal of Recommenders is to help researchers, developers and enthusiasts prototype, experiment with and bring to production a range of classic and state-of-the-art recommendation systems.
Recommenders is a project under the Linux Foundation of AI and Data.
This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:
- Prepare Data: Preparing and loading data for each recommendation algorithm.
- Model: Building models using various classical and deep learning recommendation algorithms, such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
- Evaluate: Evaluating algorithms with offline metrics.
- Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
- Operationalize: Operationalizing models in a production environment on Azure.
Several utilities are provided in Recommenders to support common tasks such as loading datasets in the format expected by each algorithm, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and for customization in your own applications. See the Recommenders documentation.
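To give a flavor of these utilities, here is a minimal sketch, assuming the core recommenders package is installed, that loads MovieLens 100k into a pandas DataFrame and splits it 75/25 into train/test sets (the column names are passed explicitly, as the quick-start notebooks do):

```python
# Minimal sketch of the dataset and splitting utilities described above.
# Assumes `pip install recommenders` has already been run.
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_random_split

# Download MovieLens 100k and load it as a pandas DataFrame.
data = movielens.load_pandas_df(
    size="100k",
    header=["userID", "itemID", "rating", "timestamp"],
)

# Random 75/25 split into train and test sets.
train, test = python_random_split(data, ratio=0.75)
print(len(train), len(test))
```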
For a more detailed overview of the repository, please see the documents on the wiki page.
For some practical scenarios where recommendation systems have been applied, see the scenarios.
2. Getting Started
We recommend conda for environment management and VS Code for development. To install the recommenders package and run an example notebook on Linux/WSL:
# 1. Install gcc if it is not installed already. On Ubuntu, this could be done by using the command
# sudo apt install gcc
# 2. Create and activate a new conda environment
conda create -n <environment_name> python=3.9
conda activate <environment_name>
# 3. Install the core recommenders package. It can run all the CPU notebooks.
pip install recommenders
# 4. create a Jupyter kernel
python -m ipykernel install --user --name <environment_name> --display-name <kernel_name>
# 5. Clone this repo within VSCode or using command line:
git clone https://github.com/recommenders-team/recommenders.git
# 6. Within VSCode:
# a. Open a notebook, e.g., examples/00_quick_start/sar_movielens.ipynb;
# b. Select Jupyter kernel <kernel_name>;
# c. Run the notebook.
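After step 6, a quick way to confirm that the selected kernel can see the package installed in step 3 is to run a cell like the following (a simple sanity check of our own, assuming the package exposes a `__version__` attribute, which recent releases do):

```python
# Sanity check inside the notebook: the Jupyter kernel should resolve the
# `recommenders` package installed in the conda environment.
import recommenders
print(recommenders.__version__)
```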
For more information about setup on other platforms (e.g., Windows and macOS) and different configurations (e.g., GPU, Spark and experimental features), see the Setup Guide.
In addition to the core package, several extras are also provided, including:
- [gpu]: Needed for running GPU models.
- [spark]: Needed for running Spark models.
- [dev]: Needed for development of the repo.
- [all]: [gpu] | [spark] | [dev]
- [experimental]: Models that are not thoroughly tested and/or may require additional steps in installation.
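Extras follow the usual pip syntax, e.g. `pip install recommenders[gpu]`. As a rough check of which optional backends actually ended up in the environment (a sketch only; it assumes the [gpu] extra pulls in TensorFlow/PyTorch and the [spark] extra pulls in PySpark), one could run:

```python
# Rough check of which optional dependencies are importable in this environment.
# The package names below are assumptions about what each extra installs.
import importlib.util

for extra, pkg in [("[gpu]", "tensorflow"), ("[gpu]", "torch"), ("[spark]", "pyspark")]:
    status = "available" if importlib.util.find_spec(pkg) else "missing"
    print(f"{extra:10s} {pkg:12s} {status}")
```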
3. Algorithms
The table below lists the recommendation algorithms currently available in the repository. Notebooks are linked under the Example column as Quick start, showcasing an easy-to-run example of the algorithm, or as Deep dive, explaining in detail the math and implementation of the algorithm.
Algorithm | Type | Description | Example |
---|---|---|---|
Alternating Least Squares (ALS) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized for scalability and distributed computing capability. It works in the PySpark environment. | Quick start / Deep dive |
Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism. It works in the CPU/GPU environment. | Quick start |
Cornac/Bayesian Personalized Ranking (BPR) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback. It works in the CPU environment. | Deep dive |
Cornac/Bilateral Variational Autoencoder (BiVAE) | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment. | Deep dive |
Convolutional Sequence Embedding Recommendation (Caser) | Collaborative Filtering | Algorithm based on convolutions that aim to capture both user’s general preferences and sequential patterns. It works in the CPU/GPU environment. | Quick start |
Deep Knowledge-Aware Network (DKN)* | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment. | Quick start / Deep dive |
Extreme Deep Factorization Machine (xDeepFM)* | Collaborative Filtering | Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment. | Quick start |
FastAI Embedding Dot Bias (FAST) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment. | Quick start |
LightFM/Factorization Machine | Collaborative Filtering | Factorization Machine algorithm for both implicit and explicit feedbacks. It works in the CPU environment. | Quick start |
LightGBM/Gradient Boosting Tree* | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. | Quick start in CPU / Deep dive in PySpark |
LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | Deep dive |
GeoIMC* | Collaborative Filtering | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradient optimization and follows a geometric approach. It works in the CPU environment. | Quick start |
GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | Quick start |
Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. | Quick start |
Neural Recommendation with Attentive Multi-View Learning (NAML)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with attentive multi-view learning. It works in the CPU/GPU environment. | Quick start |
Neural Collaborative Filtering (NCF) | Collaborative Filtering | Deep learning algorithm with enhanced performance for user/item implicit feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
Neural Recommendation with Personalized Attention (NPA)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with personalized attention network. It works in the CPU/GPU environment. | Quick start |
Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with multi-head self-attention. It works in the CPU/GPU environment. | Quick start |
Next Item Recommendation (NextItNet) | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns. It considers both user/item interactions and features. It works in the CPU/GPU environment. | Quick start |
Restricted Boltzmann Machines (RBM) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit user/item feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
Riemannian Low-rank Matrix Completion (RLRMC)* | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption to predict user/item interactions. It works in the CPU environment. | Quick start |
Simple Algorithm for Recommendation (SAR)* | Collaborative Filtering | Similarity-based algorithm for implicit user/item feedback. It works in the CPU environment. | Quick start / Deep dive |
Self-Attentive Sequential Recommendation (SASRec) | Collaborative Filtering | Transformer based algorithm for sequential recommendation. It works in the CPU/GPU environment. | Quick start |
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller. It works in the CPU/GPU environment. | Quick start |
Multi-Interest-Aware Sequential User Modeling (SUM)* | Collaborative Filtering | An enhanced memory network-based sequential user model which aims to capture users’ multiple interests. It works in the CPU/GPU environment. | Quick start |
Sequential Recommendation Via Personalized Transformer (SSEPT) | Collaborative Filtering | Transformer based algorithm for sequential recommendation with User embedding. It works in the CPU/GPU environment. | Quick start |
Standard VAE | Collaborative Filtering | Generative Model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
Surprise/Singular Value Decomposition (SVD) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment. | Deep dive |
Term Frequency - Inverse Document Frequency (TF-IDF) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment. | Quick start |
Vowpal Wabbit (VW)* | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing. It uses the CPU for online learning. | Deep dive |
Wide and Deep | Collaborative Filtering | Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment. | Quick start |
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Collaborative Filtering | Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment. | Deep dive |
Note: * indicates algorithms invented/contributed by Microsoft.
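To give a sense of how these algorithms are driven from Python, below is a hedged sketch modeled on the SAR quick-start notebook (examples/00_quick_start/sar_movielens.ipynb); it reuses the train/test split from the earlier snippet, and the parameter values are illustrative rather than tuned:

```python
# Sketch of training SAR and producing top-10 recommendations, loosely following
# the SAR quick-start notebook; parameter values are illustrative.
from recommenders.models.sar import SAR

model = SAR(
    col_user="userID",
    col_item="itemID",
    col_rating="rating",
    col_timestamp="timestamp",
    similarity_type="jaccard",     # item-item similarity measure
    time_decay_coefficient=30,     # half-life in days for the time decay
    timedecay_formula=True,
)

model.fit(train)  # `train` comes from the split shown in the introduction

# Recommend the top 10 unseen items for each user in the test set.
top_k = model.recommend_k_items(test, top_k=10, remove_seen=True)
print(top_k.head())
```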
Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.
Algorithm | Type | Description | Example |
---|---|---|---|
SARplus* | Collaborative Filtering | Optimized implementation of SAR for Spark | Quick start |
Algorithm Comparison
We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained with each of the collaborative filtering algorithms below, using empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k with the algorithms trained for 15 epochs.
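The ranking metrics in the table are exposed under recommenders.evaluation.python_evaluation; a minimal sketch, assuming `test` and `top_k` come from the SAR snippet above, would be:

```python
# Sketch of computing the k=10 ranking metrics reported below, assuming `test`
# holds ground-truth interactions and `top_k` holds the top-10 recommendations.
from recommenders.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
)

cols = dict(col_user="userID", col_item="itemID", col_prediction="prediction", k=10)
print("MAP@10      ", map_at_k(test, top_k, **cols))
print("nDCG@10     ", ndcg_at_k(test, top_k, **cols))
print("Precision@10", precision_at_k(test, top_k, **cols))
print("Recall@10   ", recall_at_k(test, top_k, **cols))
```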
Algorithm | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
---|---|---|---|---|---|---|---|---|
ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.7443 | ||
SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
2025-01-23 (Thu)