A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems (20230816 arxiv)
motivation
- Existing work on fine-tuning LLMs for recommendation only handles scenarios with a small candidate set (CTR prediction or negative sampling), ignoring the model's ability to rank the full item corpus (all-rank recommendation sorting).
- To achieve this goal, the model needs three capabilities: (1) it must be efficient; (2) it must generate meaningful items; (3) it must generate items that actually exist in the recommender system.
method
- Grounding Language Space to Recommendation Space: instruction tuning plus recommendation-specific instruction tuning.
- Grounding Recommendation Space to Actual Item Space: match each generated description to a real item (see the sketch below).
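The second grounding step is, in essence, a nearest-neighbor match between the generated text and the real item corpus. A minimal sketch, assuming sentence-transformers embeddings and a toy item list (BIGRec itself matches in the LLM's own representation space, so the encoder choice and names here are illustrative):

```python
# Sketch of step 2: grounding a generated item description to a real item.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

item_titles = ["The Legend of Zelda", "Elden Ring", "Stardew Valley"]  # toy corpus
item_emb = encoder.encode(item_titles, normalize_embeddings=True)     # (n_items, d)

def ground(generated_text: str, k: int = 3) -> list[str]:
    """Map the LLM's free-form output to the k nearest actual items."""
    q = encoder.encode([generated_text], normalize_embeddings=True)   # (1, d)
    scores = item_emb @ q[0]                                          # cosine similarity
    top = np.argsort(-scores)[:k]
    return [item_titles[i] for i in top]

print(ground("an open-world fantasy action RPG"))
```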
Experiments
conclusion:
- When training data is limited, the conventional sequential baselines (GRU4Rec, Caser, SASRec) perform significantly worse than the LLM-based BIGRec. Moreover, BIGRec's improvement over these baselines is much larger at the top-ranked positions, indicating that BIGRec tends to rank items of interest to users higher.
- While GPT4Rec-LLaMA is also LLM-based, it performs poorly compared to BIGRec. The authors attribute this to BM25 (which can also be seen as a way to ground LLM outputs to actual items) being ill-suited to the recommendation task: BM25 is a retrieval tool designed for document-level text.
- The improvement of BIGRec over conventional models is significantly larger on the Game dataset than on the Movie dataset, possibly due to the datasets' differing popularity-bias properties: conventional methods tend to capture popularity bias, while BIGRec is less affected by it.
conclusion:
- These findings highlight the practicality of using LLMs for recommender systems. The results also suggest that BIGRec's performance could be further enhanced by increasing the size of the training set and expanding the range of items it encounters during grounding.
- BIGRec performs exceptionally well when data is scarce and works effectively with less reliance on popularity bias.
- Incorporating collaborative information into both BIGRec and traditional models improves performance;
- the gain from incorporating collaborative information is larger for BIGRec than for the conventional models.
Leveraging Large Language Models for Pre-trained Recommender Systems (20230821 arxiv)
An LLM version of P5.
Data Phase
- Behavioral sequences are sampled as a mix of long-term preferences (10%), medium-term preferences (30%), and short-term preferences (60%).
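A minimal sketch of that 10/30/60 sampling mix; the window boundaries for "long/medium/short-term" below are assumptions, since the notes do not record them:

```python
import random

def sample_training_sequences(seq: list[str], n_samples: int) -> list[list[str]]:
    """Sample sub-sequences with a 10% / 30% / 60% long/medium/short-term mix.

    Assumption: 'long-term' uses the whole history, 'medium-term' the last
    half, 'short-term' the last quarter; the paper's exact windows may differ.
    """
    windows = {
        "long":   seq,
        "medium": seq[len(seq) // 2:],
        "short":  seq[3 * len(seq) // 4:],
    }
    names = random.choices(["long", "medium", "short"],
                           weights=[0.1, 0.3, 0.6], k=n_samples)
    return [windows[name] for name in names]

history = [f"item_{i}" for i in range(20)]
print(sample_training_sequences(history, n_samples=5))
```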
Training Phase
- autoregressive blank infilling
- Mask Mechanism
- Positional Encoding
Inference Phase
- dynamic position mechanism
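These mechanisms follow GLM-style autoregressive blank infilling. A minimal sketch of how one masked span becomes a training input with the two-level positional encoding (token strings and the span are invented for illustration):

```python
def blank_infilling_example(tokens: list[str], span: tuple[int, int]):
    """GLM-style autoregressive blank infilling for one masked span.

    Part A (the corrupted context) attends bidirectionally; Part B (the span,
    moved to the end) is predicted left-to-right. Position-1 is the location
    in the corrupted text, position-2 counts tokens inside the span.
    """
    s, e = span
    part_a = tokens[:s] + ["[MASK]"] + tokens[e:]
    part_b = ["[START]"] + tokens[s:e]          # teacher-forced targets follow
    inp = part_a + part_b
    pos1 = list(range(len(part_a))) + [s] * len(part_b)  # span keeps [MASK]'s slot
    pos2 = [0] * len(part_a) + list(range(1, len(part_b) + 1))
    return inp, pos1, pos2

inp, pos1, pos2 = blank_infilling_example(
    ["user", "clicked", "item_42", "then", "item_7"], span=(2, 3))
print(inp)   # ['user', 'clicked', '[MASK]', 'then', 'item_7', '[START]', 'item_42']
print(pos1)  # [0, 1, 2, 3, 4, 2, 2]
print(pos2)  # [0, 0, 0, 0, 0, 1, 2]
```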
Experiments
- tasks: rating, sequential recommendation, explanation, review, and direct recommendation
- real-world dataset (Alipay)
LLMRec: Benchmarking Large Language Models on Recommendation Task (arxiv 20230823)
An LLM version of P5, but each task is trained separately.
Furthermore, ChatGLM, LLaMA, and Alpaca were unable to produce standard results following the prompt requirement and hence were not directly applicable to accuracy-based tasks.
Comparison with P5
Their performance is still far from satisfactory compared with previous methods like P5. The paper offers three explanations:
- First, the compared LLM methods have fewer fine-tuning parameters. Taking ChatGLM-6B as an example, the authors fine-tune it with P-tuning v2 (Liu et al. 2022), which yields only 13% of the trainable parameters of P5-B; this greatly limits the model's potential to bridge the gap between pretrained NLP tasks and fine-tuned recommendation tasks.
- Second, the amount of training data. Due to limited computing resources, each compared LLM method could only be fine-tuned with one type of prompt for 10 epochs per recommendation task.
- Third, the diversity of the training data. P5's training data contains different types of task samples using multiple kinds of prompts, which greatly improves data diversity and strengthens the generalization ability of the resulting models. The multi-task training strategy in P5's setting may also improve performance to some extent, as demonstrated in (Ruder 2017).
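For reference, P-tuning v2 is essentially deep prefix tuning. A minimal sketch of setting it up with Hugging Face peft; the base model and prefix length are placeholders, not the paper's exact ChatGLM-6B configuration:

```python
# Sketch of P-tuning v2 (deep prefix tuning) via Hugging Face peft.
from transformers import AutoModelForCausalLM
from peft import get_peft_model, PrefixTuningConfig, TaskType

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for ChatGLM-6B
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,          # learnable prefix length per layer
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prefix encoder is trainable
```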
Large Language Models as Zero-Shot Conversational Recommenders (CIKM2023)
These findings reveal the unique importance of the superior content/context knowledge in LLMs for CRS tasks, suggesting great potential for LLMs as an effective approach to CRS.
Finding 1 - LLMs outperform fine-tuned CRS models in a zero-shot setting.
Finding 2 - GPT-based models achieve superior performance compared to open-source LLMs.
Finding 3 - LLMs may generate out-of-dataset item titles, but few hallucinated recommendations.
Finding 4 - LLMs mainly rely on content/context knowledge to make recommendations.
Finding 5 - GPT-based LLMs possess better content/context knowledge than existing CRS.
Finding 6 - LLMs generally possess weaker collaborative knowledge than existing CRS.
Finding 7 - Reddit provides more content/context information than the other two CRS datasets.
Finding 8 - Collaborative information is insufficient for satisfactory recommendations, given the current models.
Finding 9 - Collaborative information can be dataset- or platform-dependent.
Finding 10 - LLM recommendations suffer from popularity bias in CRS.
Finding 11 - Recommendation performance of LLMs is sensitive to geographical regions.
Enhancing Recommender Systems with Large Language Model Reasoning Graphs (arxiv 20230821)
motivation
- The paper argues that graph-based recommendation lacks the capacity for complex reasoning and therefore cannot fully understand user interests.
- LLM reasoning can effectively understand user interests.
framework
- Chained Graph Reasoning: infer the reason behind each item click and assemble the reasons into a graph (a sketch of the first two steps follows this list).
- Divergent Extension: extend the reasoning chains.
- Self-verification and Scoring: mask-based verification.
- Knowledge Base Self-improving: cache the generated reasoning.
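A minimal sketch of the first two steps, assuming a generic `llm(prompt)` callable; the prompts, the stub reply, and the graph schema are all invented for illustration:

```python
import networkx as nx

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion call; returns canned text here."""
    return "likes cozy farming games\nenjoys pixel art"

def build_reason_graph(clicked_items: list[str]) -> nx.DiGraph:
    """Chained Graph Reasoning + Divergent Extension, heavily simplified:
    ask why each click happened, link reason -> item, then extend each
    reason with related interests."""
    g = nx.DiGraph()
    for item in clicked_items:
        reason = llm(f"In a few words, why might a user click '{item}'?").splitlines()[0]
        g.add_edge(reason, item, relation="explains")
        for ext in llm(f"List two interests related to: {reason}").splitlines():
            g.add_edge(reason, ext.strip(), relation="extends")
    return g

g = build_reason_graph(["Stardew Valley"])
print(list(g.edges(data=True)))
```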
Experiments
datasets:
Amazon Beauty; Amazon Clothing; MovieLens
Uses the LLM to construct reason graphs, converting the data into structured data that carries the LLM's knowledge, which in turn helps the model acquire higher-quality knowledge from the graph.
RecMind: Large Language Model Powered Agent For Recommendation (arxiv 20230828)
motivation
- Existing DNN-based models and pretrained-language-model approaches cannot fully capture textual knowledge.
- Existing LLM-based RS mainly rely on the LLM itself, ignoring the LLM's ability to operate tools and failing to fully exploit its reasoning ability.
Architecture
- planning
- personalized memory (individualized user information) / world knowledge (item metadata; real-time information from web search)
- tools (database tool; search tool; text summarization tool)
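A minimal sketch of the tool layer as a name-to-callable registry the planner can dispatch to; the tool stubs are assumptions, not RecMind's implementations:

```python
from typing import Callable

# Placeholder tools; RecMind's actual tools hit a SQL database, a web
# search API, and a text-summarization model.
def database_tool(sql: str) -> str:
    return f"(rows for: {sql})"

def search_tool(query: str) -> str:
    return f"(web results for: {query})"

def summarize_tool(text: str) -> str:
    return text[:50] + "..."

TOOLS: dict[str, Callable[[str], str]] = {
    "database": database_tool,
    "search": search_tool,
    "summarize": summarize_tool,
}

def execute_step(tool_name: str, tool_input: str) -> str:
    """One step of the plan: route the LLM-chosen action to the named tool."""
    return TOOLS[tool_name](tool_input)

print(execute_step("database", "SELECT title FROM items LIMIT 3"))
```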
Experiments
following P5
ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
Prompt Distillation for Efficient LLM-based Recommendation
Not really an LLM approach; a small improvement over P5: 1. adds a task-specific continuous prompt; 2. to speed things up, every sample in a batch comes from the same task.
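A minimal sketch of such a task-specific continuous prompt, assuming it is simply prepended to the token embeddings (all names and sizes are invented):

```python
import torch
import torch.nn as nn

class TaskPrompt(nn.Module):
    """Learnable continuous prompt prepended per task (sketch)."""
    def __init__(self, n_tasks: int, prompt_len: int, d_model: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_tasks, prompt_len, d_model) * 0.02)

    def forward(self, token_emb: torch.Tensor, task_id: int) -> torch.Tensor:
        # token_emb: (batch, seq, d_model). The whole batch shares one task,
        # which is what makes the single-task batching speedup possible.
        batch = token_emb.size(0)
        prompt = self.prompts[task_id].unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_emb], dim=1)

emb = torch.zeros(4, 16, 32)                 # (batch, seq, d_model)
tp = TaskPrompt(n_tasks=5, prompt_len=8, d_model=32)
print(tp(emb, task_id=2).shape)              # torch.Size([4, 24, 32])
```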
SGPT: GPT Sentence Embeddings for Semantic Search
motivation
- Existing retrieval methods mostly rely on encoder-only BERT, yet scaling decoder-only models can achieve stronger capabilities than encoder-only ones.
Experiments
SGPT-CE (Cross-Encoder)
asymmetric search
conclusion:
- A BERT-based cross-encoder [44] is state of the art.
- From the results, performance scales both as more documents are re-ranked and as model size increases.
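SGPT-CE re-ranks by the log probability the decoder assigns to the query tokens given a prompt containing the document. A minimal sketch with GPT-2 standing in for the larger models; the prompt template is simplified from the paper's:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def ce_score(document: str, query: str) -> float:
    """Mean log-probability of the query tokens given the document context."""
    prompt = (
        "Documents are searched to find matches with the same content. "
        f'The document "{document}" is a good search result for "'
    )
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    query_ids = tok(query, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, query_ids], dim=1)
    logits = model(ids).logits.log_softmax(-1)
    # log P(query token i | everything before it), query positions only
    n_prompt = prompt_ids.size(1)
    lp = logits[0, n_prompt - 1 : ids.size(1) - 1]
    token_lp = lp.gather(1, query_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

# Re-rank: score every candidate document against the query, sort descending.
print(ce_score("A guide to baking sourdough bread.", "how to bake bread"))
```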
SGPT-BE (Bi-Encoder)
- position-weighted mean pooling that emphasizes the embeddings of later tokens (see the sketch after these notes)
- only bias parameters are updated during fine-tuning
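A minimal sketch of both tricks: position-weighted mean pooling (later tokens, which have attended to more context, get larger weights) and BitFit-style freezing of everything except bias terms:

```python
import torch

def weighted_mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """SGPT pooling: token i gets weight proportional to its position i+1.
    hidden: (batch, seq, d); mask: (batch, seq) with 1 for real tokens."""
    weights = torch.arange(1, hidden.size(1) + 1, device=hidden.device)
    weights = (weights * mask).float()              # zero out padding
    weights = weights / weights.sum(dim=1, keepdim=True)
    return (hidden * weights.unsqueeze(-1)).sum(dim=1)

def freeze_all_but_biases(model: torch.nn.Module) -> None:
    """BitFit-style fine-tuning: only bias tensors stay trainable."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias")

h = torch.randn(2, 5, 8)
m = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(weighted_mean_pool(h, m).shape)  # torch.Size([2, 8])
```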
conclusion
- we find that in the unsupervised setting, decoder transformers (GPT) strongly underperform encoders (BERT).
- after fine-tuning on the same dataset with the same hyperparameters, decoders (SGPT) with 125M parameters closely trail the 110M-parameter encoder (SBERT) at the 12th layer.
- Weighted mean pooling outperforms mean and last token pooling for SGPT 125M.
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
motivation
- LLMs lack domain-specific knowledge, while recommender models cannot handle diverse tasks; the paper combines the strengths of both via an agent framework for recommendation.
method
tool set:
- Information query: efficiently retrieves detailed item information from the backend item database using Structured Query Language (SQL) expressions.
- Item retrieval: hard conditions -> SQL; soft conditions -> embedding similarity (sketched after this list).
- Item ranking:
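A minimal sketch of the retrieval tool's two paths, assuming a SQLite item table and precomputed item embeddings; the schema, column names, and example condition are invented:

```python
import sqlite3
import numpy as np

def retrieve_hard(conn: sqlite3.Connection, max_price: float) -> list[str]:
    """Hard conditions compile to SQL over the item table."""
    rows = conn.execute(
        "SELECT title FROM items WHERE price <= ? ORDER BY popularity DESC LIMIT 10",
        (max_price,),
    )
    return [r[0] for r in rows]

def retrieve_soft(query_emb: np.ndarray, item_embs: np.ndarray,
                  titles: list[str], k: int = 10) -> list[str]:
    """Soft conditions ('something relaxing') match by embedding similarity."""
    sims = item_embs @ query_emb / (
        np.linalg.norm(item_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    return [titles[i] for i in np.argsort(-sims)[:k]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT, price REAL, popularity INT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [("Catan", 40.0, 9), ("Chess Set", 15.0, 7)])
print(retrieve_hard(conn, max_price=20))  # ['Chess Set']
```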
mechanism:
- candidate memory bus: stores the current item candidates, so they no longer need to be appended to the prompt.
- plan-first execution with dynamic demonstrations: make the full plan first, then execute it.
- Reflection: the model directly judges once more whether its own output is reasonable.
Experiments
Semantic-enhanced Contrastive Learning for Session-based Recommendation
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Fine-tunes the model on caption-derived QA pairs with explanations, so that the model better understands image-text information and hallucinates less.
Controllable Natural Language Generation with Contrastive Prefixes
Explainable Information Retrieval: A Survey
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Task-aware Retrieval with Instructions
Introduces instructions into the retrieval task.