2023-08-01

Exploring the Upper Limits of Text-Based Collaborative Filtering Using Large Language Models: Discoveries and Insights

dataset
Due to memory constraints in some end-to-end (E2E) training experiments, we construct the interaction sequence for each user from their latest 23 items. We remove users with fewer than 5 interactions, simply because we do not consider the cold-user setting. After this basic pre-processing, we randomly select 200,000 users (and their interactions) from each of the MIND and HM datasets, and 50,000 users from Bili.
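A minimal pandas sketch of this pre-processing, assuming an interaction log with `user_id`, `item_id`, and `timestamp` columns (the schema and function name are assumptions, not the paper's code):

```python
import pandas as pd

def preprocess(interactions: pd.DataFrame, n_users: int,
               max_len: int = 23, min_inter: int = 5) -> pd.DataFrame:
    # Drop cold users with fewer than 5 interactions
    counts = interactions.groupby("user_id")["item_id"].transform("count")
    interactions = interactions[counts >= min_inter]

    # Keep each user's latest 23 interactions, ordered by time
    interactions = (interactions.sort_values(["user_id", "timestamp"])
                                .groupby("user_id")
                                .tail(max_len))

    # Randomly sample users: 200,000 for MIND/HM, 50,000 for Bili
    users = pd.Series(interactions["user_id"].unique())
    sampled = set(users.sample(n=n_users, random_state=42))
    return interactions[interactions["user_id"].isin(sampled)]
```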


Q1: How does the recommender system's performance respond to the continuous increase in the item encoder's size? Is its performance limit attainable at the scale of hundreds of billions of parameters?
All LMs are frozen in this study.
(answer to Q1) The TCF model with a 175B-parameter LM may not have reached its performance ceiling.
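In TCF the frozen LM only supplies item representations, which the recommender backbone consumes. A minimal PyTorch sketch of a frozen-LM item encoder, using a small public OPT checkpoint as a stand-in for the 175B model (the checkpoint choice and mean-pooling are my assumptions):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in for the 175B LM; all LM weights stay frozen.
tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
lm = AutoModel.from_pretrained("facebook/opt-1.3b").eval()
for p in lm.parameters():
    p.requires_grad = False

@torch.no_grad()
def encode_items(texts):
    # Encode item texts once and cache the result; only the recommender
    # backbone (e.g., SASRec or DSSM) on top of these vectors is trained.
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = lm(**batch).last_hidden_state        # [B, T, d]
    mask = batch["attention_mask"].unsqueeze(-1)  # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)   # mean-pooled [B, d] item embeddings
```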

Q2: Can super-large LMs, such as GPT-3 with 175 billion parameters, generate universal text representations?

(answer to Q2) Even the item representation learned by an extremely large LM (e.g., GPT-3) may not yield a universal representation, at least not for the text recommendation task.

Q3: Can recommender models with a 175-billion-parameter LM as the item encoder easily beat the simplest ID-embedding-based models (IDCF), especially for warm item recommendation?
This is a significant advance, as no previous study has explicitly shown that TCF with a frozen NLP encoder can attain performance comparable to its IDCF counterpart for warm (popular) item recommendation.

The answer to Q3 is that, for text-centric recommendation, TCF with a SASRec backbone and a 175B-parameter frozen LM can achieve performance similar to standard IDCF, even for popular item recommendation. However, even with a retrained super-large LM item encoder, TCF with a DSSM backbone has little chance of competing with its IDCF counterpart. Simple IDCF remains a highly competitive approach in the warm item recommendation setting.

Q4: How close is the TCF paradigm to a universal recommender model?

We first pre-train a SASRec-based TCF model with the 175B-parameter frozen LM as the item encoder on a large-scale text recommendation dataset. We then directly evaluate the pre-trained model on the test sets of MIND, HM, and QB.
(answer to Q4) While TCF models with large LMs do exhibit a certain degree of transfer-learning capability, they still fall significantly short of being the universal recommender model we initially envisioned.

Q5: Will the classic TCF paradigm be replaced by a recent prompt-engineering-based recommendation method that utilizes ChatGPT (called ChatGPT4Rec)?

We randomly selected 1,024 users from the test sets of MIND, HM, and Bili, and created two tasks for ChatGPT. In the first task (Task 1 in Table 6), ChatGPT was asked to select the most preferred item from four candidates (one ground truth and three randomly selected items), given the user's historical interactions as a condition. The second task (Task 2 in Table 6) asked ChatGPT to rank the top-10 preferred items from 100 candidates (one ground truth and 99 randomly selected items, excluding all historical interactions), again with the user's historical interactions provided as input.
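A sketch of how such evaluation prompts could be assembled; the wording is a paraphrase for illustration, not the paper's exact templates:

```python
def task1_prompt(history: list[str], candidates: list[str]) -> str:
    # Task 1: pick the single preferred item out of 4 candidates
    return ("A user has interacted with the following items:\n"
            + "\n".join(f"- {t}" for t in history)
            + "\nWhich ONE of these 4 candidate items will the user prefer?\n"
            + "\n".join(f"{i + 1}. {t}" for i, t in enumerate(candidates)))

def task2_prompt(history: list[str], candidates: list[str]) -> str:
    # Task 2: rank the top-10 preferred items out of 100 candidates
    return ("A user has interacted with the following items:\n"
            + "\n".join(f"- {t}" for t in history)
            + "\nFrom the 100 candidate items below, rank the 10 items the user "
              "is most likely to prefer (output item numbers only):\n"
            + "\n".join(f"{i + 1}. {t}" for i, t in enumerate(candidates)))
```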
(answer to Q5) Based on its current performance and limitations, ChatGPT is unable to substitute for the classical TCF paradigm.

Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)


VIP5: Towards Multimodal Foundation Models for Recommendation

The multimodal version of P5.
task

  1. sequential recommendation
  2. direct recommendation
  3. explanation
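
P5 and VIP5 cast all of these tasks as text-to-text prompts (VIP5 additionally interleaves visual tokens for item images). The templates below are illustrative stand-ins, not the papers' exact prompts:

```python
# Hypothetical P5/VIP5-style prompt templates for the three task families.
templates = {
    "sequential": "User_{user_id} has interacted with {history}. "
                  "What will the user interact with next?",
    "direct": "Will user_{user_id} likely enjoy item_{item_id}? Answer yes or no.",
    "explanation": "Explain why user_{user_id} would enjoy item_{item_id}, "
                   "titled '{title}'.",
}

def build_prompt(task: str, **fields) -> str:
    return templates[task].format(**fields)

print(build_prompt("sequential", user_id=23,
                   history="item_77, item_105, item_391"))
```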
framework

experiment

dataset

Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models

We posit that instead of solely learning from narrowly defined data in closed systems, recommender systems should be open-world systems that can proactively acquire knowledge from the external world.

open-world knowledge for recommendation

  1. reasoning knowledge: inferred from the user's behavior history, it enables a more comprehensive understanding of the user.
  2. factual knowledge: provides valuable common-sense information about candidate items and thereby improves recommendation quality.

shortcomings of LLMs as recommenders:

  1. Predictive accuracy: LLMs are generally outperformed by classical recommenders.
  2. Inference latency.
  3. Compositional gap: requiring direct recommendation results from LLMs is currently beyond their capability and cannot fully exploit the open-world knowledge encoded in them.

framework

Knowledge Reasoning and Generation

two challenges:

  1. compositional gap: a user's clicks on items are motivated by multiple key aspects, and user interests are diverse and multifaceted, so preference inference involves multiple reasoning steps.
  2. the generated factual knowledge may be correct but useless if it does not align with the inferred user preferences.

factorization prompting

  1. preference reasoning prompt
  2. item factual prompt
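
A sketch of the two prompt types; the factor list and wording are illustrative assumptions for a movie scenario (the paper derives scenario-specific factors), not its exact prompts:

```python
# Factorization prompting: decompose the preference question into scenario-specific
# factors instead of asking one monolithic question.
FACTORS = ["genre", "director", "actors", "era", "mood"]

def preference_reasoning_prompt(history: list[str]) -> str:
    # Infer the user's preference along each factor from the behavior history
    return (f"Given the user's watch history: {', '.join(history)},\n"
            f"analyze the user's preferences on each of these factors: "
            f"{', '.join(FACTORS)}.")

def item_factual_prompt(item_title: str) -> str:
    # Ask for factual knowledge about the candidate item, organized along the
    # same factors so it aligns with the inferred preferences
    return (f"Introduce the movie '{item_title}' and describe its "
            f"{', '.join(FACTORS)}.")
```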

Knowledge Adaptation

new challenges:

  1. The knowledge generated by LLMs is usually in the form of text, which cannot be directly leveraged by traditional RSs that typically process categorical features.
  2. Even if some LLMs are open-sourced, their output representations are usually large dense vectors (e.g., 4,096 dimensions per token) that lie in a semantic space differing significantly from the recommendation space.
  3. The generated knowledge may contain noise or unreliable information.
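
A simplified sketch of such an adaptation step: a small mixture-of-experts projection that compresses the high-dimensional LLM knowledge vector into the recommendation space (the paper's hybrid-expert adaptor is more elaborate; the dimensions and expert count here are assumptions):

```python
import torch
import torch.nn as nn

class KnowledgeAdaptor(nn.Module):
    def __init__(self, llm_dim: int = 4096, rec_dim: int = 32, n_experts: int = 4):
        super().__init__()
        # Each expert maps the LLM semantic space into the recommendation space
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(llm_dim, 128), nn.ReLU(), nn.Linear(128, rec_dim))
            for _ in range(n_experts))
        # The gate softly weights experts, giving some robustness to noisy knowledge
        self.gate = nn.Linear(llm_dim, n_experts)

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        # k: [B, llm_dim] encoding of the generated textual knowledge
        w = torch.softmax(self.gate(k), dim=-1)             # [B, n_experts]
        e = torch.stack([ex(k) for ex in self.experts], 1)  # [B, n_experts, rec_dim]
        return (w.unsqueeze(-1) * e).sum(1)                 # [B, rec_dim] augmented feature
```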

Experiments
