Recommenders - Best Practices for Recommendation Systems




1. About Recommenders


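Introduction

Recommenders aims to help researchers, developers, and enthusiasts prototype, experiment with, and bring to production a range of classic and state-of-the-art recommendation systems.

Recommenders is a project under the Linux Foundation of AI and Data.

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks: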

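  • Prepare Data: Preparing and loading data for each recommendation algorithm.
  • Model: Building models using various classical and deep learning recommendation algorithms, such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
  • Evaluate: Evaluating algorithms with offline metrics.
  • Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
  • Operationalize: Operationalizing models in a production environment on Azure.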
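
As an illustration of the evaluation task, the following is a minimal standalone sketch of four common offline rating metrics: RMSE, MAE, R2, and explained variance. It uses only the Python standard library. The function name `rating_metrics` and the sample ratings are made up for this example, and the code is not the package's own evaluation implementation.

```python
import math


def rating_metrics(y_true, y_pred):
    """Return RMSE, MAE, R2 and explained variance for paired rating lists.

    Hypothetical helper for illustration; not part of the recommenders package.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Error magnitude metrics.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Variance-based metrics need some spread in the true ratings.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("true ratings are constant; R2 is undefined")

    # R2 compares squared error against a predict-the-mean baseline.
    r2 = 1.0 - ss_res / ss_tot

    # Explained variance ignores a constant bias in the errors.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    explained_variance = 1.0 - var_err / (ss_tot / n)

    return {
        "rmse": rmse,
        "mae": mae,
        "r2": r2,
        "explained_variance": explained_variance,
    }


if __name__ == "__main__":
    # Made-up ratings, for demonstration only.
    print(rating_metrics([4, 3, 5, 2], [3.5, 3, 4, 2.5]))
```

R2 and explained variance differ only when the prediction errors have a non-zero mean, which is why the two can be close but not identical for the same model.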

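Several utilities are provided in recommenders to support common tasks, such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the Recommenders documentation.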
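
To make the train/test splitting task concrete, here is a minimal sketch of a per-user stratified split written with the standard library only. The function `stratified_split`, its parameters, and the toy interactions are assumptions made for this example. The package ships its own splitting utilities, and this is not their implementation.

```python
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so each user keeps about `ratio`
    of their rows in the training set.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group rows by user so the split ratio is applied per user.
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = list(by_user[user])
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


if __name__ == "__main__":
    # Made-up interactions: two users with four ratings each.
    data = [(u, f"item{i}", 3.0) for u in ("u1", "u2") for i in range(4)]
    train, test = stratified_split(data, ratio=0.75)
    print(len(train), "train rows,", len(test), "test rows")
```

Splitting within each user, rather than across the whole table, keeps every user represented in both sets when they have enough interactions.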

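For a more detailed overview of the repository, see the documents on the wiki page.

For some of the practical scenarios where recommendation systems have been applied, see scenarios.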


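2. Getting Started

We recommend conda for environment management and VS Code for development. To install the recommenders package and run an example notebook on Linux/WSL: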

# 1. Install gcc if it is not installed already. On Ubuntu, this can be done with the command
# sudo apt install gcc

# 2. Create and activate a new conda environment
conda create -n <environment_name> python=3.9
conda activate <environment_name>

# 3. Install the core recommenders package. It can run all the CPU notebooks.
pip install recommenders

# 4. Create a Jupyter kernel
python -m ipykernel install --user --name <environment_name> --display-name <kernel_name>

# 5. Clone this repo within VSCode or using command line:
git clone https://github.com/recommenders-team/recommenders.git

# 6. Within VSCode:
#   a. Open a notebook, e.g., examples/00_quick_start/sar_movielens.ipynb;  
#   b. Select Jupyter kernel <kernel_name>;
#   c. Run the notebook.
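
After selecting the kernel, a quick check from a notebook cell can confirm that the package is visible to that environment. The sketch below relies only on the standard library modules `importlib.util` and `importlib.metadata`. The helper name `package_status` is made up for this example.

```python
import importlib.util
from importlib import metadata


def package_status(name):
    """Return the installed version of `name`, "unknown" if it is importable
    but has no distribution metadata, or None if it cannot be found.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "unknown"


if __name__ == "__main__":
    print("recommenders:", package_status("recommenders") or "not installed")
```

If this prints "not installed", the notebook is most likely running on a different kernel than the conda environment created in step 2.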

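For more information about setup on other platforms (e.g., Windows and macOS) and different configurations (e.g., GPU, Spark, and experimental features), see the Setup Guide.

In addition to the core package, several extras are also provided, including: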

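  • [gpu]: Needed for running GPU models.
  • [spark]: Needed for running Spark models.
  • [dev]: Needed for development of the repo.
  • [all]: [gpu]|[spark]|[dev]
  • [experimental]: Models that are not thoroughly tested and/or may require additional installation steps.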

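3. Algorithms

The table below lists the recommendation algorithms currently available in the repository. Notebooks are linked under the Example column as Quick start, showcasing an easy-to-run example of the algorithm, or as Deep dive, explaining in detail the math and implementation of the algorithm.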

| Algorithm | Type | Description | Example |
|---|---|---|---|
| Alternating Least Squares (ALS) | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized for scalability and distributed computing capability. It works in the PySpark environment. | Quick start / Deep dive |
| Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism. It works in the CPU/GPU environment. | Quick start |
| Cornac/Bayesian Personalized Ranking (BPR) | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback. It works in the CPU environment. | Deep dive |
| Cornac/Bilateral Variational Autoencoder (BiVAE) | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment. | Deep dive |
| Convolutional Sequence Embedding Recommendation (Caser) | Collaborative Filtering | Algorithm based on convolutions that aims to capture both the user's general preferences and sequential patterns. It works in the CPU/GPU environment. | Quick start |
| Deep Knowledge-Aware Network (DKN)* | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment. | Quick start / Deep dive |
| Extreme Deep Factorization Machine (xDeepFM)* | Collaborative Filtering | Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment. | Quick start |
| FastAI Embedding Dot Bias (FAST) | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment. | Quick start |
| LightFM/Factorization Machine | Collaborative Filtering | Factorization Machine algorithm for both implicit and explicit feedback. It works in the CPU environment. | Quick start |
| LightGBM/Gradient Boosting Tree* | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. | Quick start in CPU / Deep dive in PySpark |
| LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | Deep dive |
| GeoIMC* | Collaborative Filtering | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradient optimization and follows a geometric approach. It works in the CPU environment. | Quick start |
| GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | Quick start |
| Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
| Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. | Quick start |
| Neural Recommendation with Attentive Multi-View Learning (NAML)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with attentive multi-view learning. It works in the CPU/GPU environment. | Quick start |
| Neural Collaborative Filtering (NCF) | Collaborative Filtering | Deep learning algorithm with enhanced performance for user/item implicit feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
| Neural Recommendation with Personalized Attention (NPA)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with personalized attention network. It works in the CPU/GPU environment. | Quick start |
| Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with multi-head self-attention. It works in the CPU/GPU environment. | Quick start |
| Next Item Recommendation (NextItNet) | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns. It considers both user/item interactions and features. It works in the CPU/GPU environment. | Quick start |
| Restricted Boltzmann Machines (RBM) | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit user/item feedback. It works in the CPU/GPU environment. | Quick start / Deep dive |
| Riemannian Low-rank Matrix Completion (RLRMC)* | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption to predict user/item interactions. It works in the CPU environment. | Quick start |
| Simple Algorithm for Recommendation (SAR)* | Collaborative Filtering | Similarity-based algorithm for implicit user/item feedback. It works in the CPU environment. | Quick start / Deep dive |
| Self-Attentive Sequential Recommendation (SASRec) | Collaborative Filtering | Transformer based algorithm for sequential recommendation. It works in the CPU/GPU environment. | Quick start |
| Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller. It works in the CPU/GPU environment. | Quick start |
| Multi-Interest-Aware Sequential User Modeling (SUM)* | Collaborative Filtering | An enhanced memory network-based sequential user model which aims to capture users' multiple interests. It works in the CPU/GPU environment. | Quick start |
| Sequential Recommendation Via Personalized Transformer (SSEPT) | Collaborative Filtering | Transformer based algorithm for sequential recommendation with user embedding. It works in the CPU/GPU environment. | Quick start |
| Standard VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | Deep dive |
| Surprise/Singular Value Decomposition (SVD) | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment. | Deep dive |
| Term Frequency - Inverse Document Frequency (TF-IDF) | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment. | Quick start |
| Vowpal Wabbit (VW)* | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing. It uses the CPU for online learning. | Deep dive |
| Wide and Deep | Collaborative Filtering | Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment. | Quick start |
| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Collaborative Filtering | Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment. | Deep dive |

Note: * indicates algorithms invented/contributed by Microsoft.
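
To give a feel for what a similarity-based collaborative filtering approach on implicit feedback looks like, here is a deliberately simplified sketch that scores unseen items by their Jaccard co-occurrence similarity to the items a user already interacted with. It is a generic illustration, not the repository's SAR implementation or any other algorithm in the table. The function names and the toy data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(user_items):
    """Build item-to-item Jaccard similarity from {user: set_of_items}.

    Hypothetical helper for illustration only.
    """
    item_count = defaultdict(int)   # number of users who touched each item
    cooccur = defaultdict(int)      # number of users who touched both items

    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            cooccur[(a, b)] += 1

    sim = defaultdict(dict)
    for (a, b), both in cooccur.items():
        # Jaccard: |users(a) & users(b)| / |users(a) | users(b)|
        score = both / (item_count[a] + item_count[b] - both)
        sim[a][b] = score
        sim[b][a] = score
    return sim


def recommend(user, user_items, sim, top_k=10):
    """Rank unseen items by summed similarity to the user's seen items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Sort by score descending, then by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Made-up implicit feedback: which items each user interacted with.
    feedback = {
        "u1": {"A", "B"},
        "u2": {"A", "B", "C"},
        "u3": {"B", "C"},
        "u4": {"A"},
    }
    similarity = jaccard_item_similarity(feedback)
    print(recommend("u4", feedback, similarity))
```

The notebooks linked in the Example column remain the reference for how each listed algorithm is actually configured and run.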

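Independent or incubating algorithms and utilities are candidates for the contrib folder. It houses contributions that may not fit easily into the core repository, or that need time to refactor or mature the code and add the necessary tests.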

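| Algorithm | Type | Description | Example |
|---|---|---|---|
| SARplus* | Collaborative Filtering | Optimized implementation of SAR for Spark | Quick start |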

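Algorithm Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below. We use empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k, running the algorithms for 15 epochs.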

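| Algorithm | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
|---|---|---|---|---|---|---|---|---|
| ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
| BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
| BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
| FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.7443 | | |
| SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |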
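
The ranking columns in the table are all computed over the top k recommended items. As a rough sketch of what such metrics measure, the code below computes precision, recall, nDCG, and average precision at k for a single user with binary relevance. Exact definitions vary between libraries (for example, in how average precision is normalized), so these formulas are one common choice and not necessarily the ones used by the benchmark notebook. The function names and sample data are made up for this example.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k slots filled with relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance nDCG: hits are discounted by log2 of their rank."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def average_precision_at_k(recommended, relevant, k=10):
    """Mean of precision values at each hit, normalized by min(|relevant|, k).

    Assumption: this normalization is one common convention among several.
    """
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up single-user example.
    recs = ["a", "b", "c", "d"]
    truth = {"a", "c", "e"}
    print(precision_at_k(recs, truth, k=4), recall_at_k(recs, truth, k=4))
    print(ndcg_at_k(recs, truth, k=4), average_precision_at_k(recs, truth, k=4))
```

A dataset-level score such as MAP is then typically obtained by averaging the per-user values over all users in the test set.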

2025-01-23 (Thu)
