Notes, 2023-08-29

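The article looks at how large language models (LLMs), for example via BIGRec, can improve ranking and the generation of real items in recommender systems. With limited data, LLMs outperform conventional methods and are less affected by popularity bias. The papers also discuss the potential of LLMs for pre-trained recommender systems, and strategies for improving performance across different task types and diverse data.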

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems (20230816 arxiv)

motivation

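  1. Existing work on fine-tuning LLMs only handles settings with a limited candidate set (CTR prediction and negative sampling) and ignores the model's ability to rank the full item set, i.e. all-rank recommendation.
    To achieve this, the model needs three capabilities: (1) it must be efficient; (2) it must generate meaningful items; (3) it must generate items that actually exist in the recommender system.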

method

  1. Grounding Language Space to Recommendation Space. instruction-tuning and recommendation-specific instruction-tuning.
  2. Grounding Recommendation Space to Actual Item Space.
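
The second grounding step can be pictured as a nearest-neighbour lookup: embed the text the LLM generated, embed every real item title, and rank the items by distance. The sketch below is an illustration under assumptions, not the paper's implementation: it assumes L2 distance over precomputed embeddings, and `ground_to_items` and the toy vectors are hypothetical.

```python
import math

def ground_to_items(generated_vec, item_vecs, k=2):
    """Rank real items by L2 distance to the embedding of the generated text.

    generated_vec: embedding of the LLM output (assumed precomputed).
    item_vecs: dict mapping item title -> embedding of that title.
    Returns the k closest item titles, nearest first.
    """
    scored = sorted(item_vecs.items(), key=lambda kv: math.dist(generated_vec, kv[1]))
    return [title for title, _ in scored[:k]]

# Toy 2-d embeddings, made up for illustration only.
items = {
    "Zelda": [1.0, 0.0],
    "Mario": [0.9, 0.1],
    "Excel": [0.0, 1.0],
}
top = ground_to_items([1.0, 0.05], items, k=2)
```

Because every returned title comes from `item_vecs`, the output is always a real item, which is the third capability listed under motivation.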

Experiments
conclusion:

  1. When training data is limited, the conventional sequential baselines (GRU4Rec, Caser, SASRec) exhibit significantly worse performance than BIGRec implemented with LLM. Furthermore, the improvement of BIGRec over these baselines in the top-ranked positions is much higher, indicating that BIGRec may tend to rank items of interest to users in higher positions.
  2. While GPT4Rec-LLaMA is also LLM-based, it exhibits poor performance compared to BIGRec. We attribute this to the fact that BM25, which can also be thought of as a method to ground the LLM outputs into actual items, is not suitable for the recommendation task. BM25 is a retrieval tool designed for document-level text.
  3. The improvement of BIGRec over conventional models is significantly higher for the Game dataset compared to the Movie dataset. This difference is possibly due to the varying properties of popularity bias between the two datasets. Conventional methods tend to capture popularity bias, while BIGRec is less affected by it.

conclusion:

  1. These findings highlight the practicality of utilizing LLMs for recommendation systems. Meanwhile, the results also suggest that there is potential to further enhance BIGRec’s performance by increasing the size of the training set and expanding the range of items it has encountered during grounding.
  2. BIGRec performs exceptionally well in scenarios where data is scarce and works effectively with less reliance on popularity bias.

  1. Incorporating collaborative information into both BIGRec and traditional models could improve model performance;
  2. Incorporating collaborative information into BIGRec yields a more significant enhancement than incorporating it into a conventional model.

Leveraging Large Language Models for Pre-trained Recommender Systems (20230821 arxiv)

An LLM version of P5.
data phase

  1. Therefore, we sample behavioral sequences in long-term preferences (10%), medium-term preferences (30%), and short-term preferences (60%).
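
As a rough illustration of the 10% / 30% / 60% split, the sketch below draws a preference horizon for each training sequence and keeps the most recent window of the history. The window lengths in `HORIZON_WINDOW` and the function name are assumptions; the notes only give the sampling ratios.

```python
import random

# Sampling ratios from the notes; the window lengths are hypothetical.
HORIZON_WEIGHTS = {"long": 0.1, "medium": 0.3, "short": 0.6}
HORIZON_WINDOW = {"long": 100, "medium": 30, "short": 10}

def sample_sequence(history, rng):
    """Pick a horizon by the 10/30/60 ratios and keep the most recent window."""
    horizon = rng.choices(
        list(HORIZON_WEIGHTS), weights=list(HORIZON_WEIGHTS.values()), k=1
    )[0]
    return horizon, history[-HORIZON_WINDOW[horizon]:]

rng = random.Random(0)
history = list(range(200))  # toy item ids, oldest first
samples = [sample_sequence(history, rng) for _ in range(10_000)]
share_short = sum(h == "short" for h, _ in samples) / len(samples)
```

Over many draws `share_short` should land close to 0.6, with the remaining mass split between the medium and long horizons.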

Training Phase
autoregressive blank infilling;
Mask Mechanism.
Positional Encoding.
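
Autoregressive blank infilling, as popularized by GLM, splits a sequence into a corrupted context (Part A) and the masked spans (Part B) that the model generates left to right. A minimal sketch of building one training example follows. The token names `[MASK]`, `[START]`, `[END]` and the helper are illustrative assumptions, and the paper's own mask mechanism and positional encoding are not reproduced here.

```python
def make_blank_infilling_example(tokens, spans):
    """Build Part A (corrupted context) and Part B (spans to generate).

    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    """
    part_a, part_b_in, part_b_out = [], [], []
    cursor = 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        part_a.append("[MASK]")                # one mask token per span
        span = tokens[start:end]
        part_b_in.extend(["[START]"] + span)   # decoder input, shifted right
        part_b_out.extend(span + ["[END]"])    # prediction targets
        cursor = end
    part_a.extend(tokens[cursor:])
    return part_a, part_b_in, part_b_out

tokens = ["user", "bought", "phone", "case", "then", "charger"]
part_a, part_b_in, part_b_out = make_blank_infilling_example(tokens, [(2, 4)])
```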
Inference phase
dynamic position mechanism

Experiments
rating, sequential recommendation, explanation, review, and direct recommendation

real-world dataset (Alipay)

LLMRec: Benchmarking Large Language Models on Recommendation Task (arxiv 20230823)

An LLM version of P5, but each task is trained separately.
Furthermore, ChatGLM, LLaMA, and Alpaca were unable to produce standard results following the prompt requirement, hence, were not directly applicable to accuracy-based tasks.
Comparison with P5
Their performance is still far from satisfactory when compared with previous methods like P5. First, the compared LLM methods have fewer fine-tuned parameters. Take ChatGLM-6B as an example: we use the P-tuning V2 (Liu et al. 2022) method to fine-tune the model, leading to 13% of total trainable parameters compared to P5-B, which greatly limits the model’s potential to bridge the gap between pretrained NLP tasks and fine-tuned recommendation tasks. Second, the amount of training data. Due to limited computing resources, we could only fine-tune all compared LLM methods using one type of prompt for 10 epochs for each kind of recommendation task. Third, the diversity in training data. As aforementioned, there are different types of task samples using multiple kinds of prompts in P5’s training data, which greatly improves the diversity of the training data and further strengthens the generalization ability of the corresponding models. The multi-task training strategy in P5’s setting could also improve the model’s performance to some extent, which is also demonstrated in (Ruder 2017).

Large Language Models as Zero-Shot Conversational Recommenders (CIKM2023)

These findings reveal the unique importance of the superior content/context knowledge in LLMs for CRS tasks, offering great potential to LLMs as an effective approach in CRS.
Finding 1 - LLMs outperform fine-tuned CRS models in a zero-shot setting.
Finding 2 - GPT-based models achieve better performance than open-sourced LLMs.
Finding 3 - LLMs may generate out-of-dataset item titles, but few hallucinated recommendations.

Finding 4 - LLMs mainly rely on content/context knowledge to make recommendations.

Finding 5 - GPT-based LLMs possess better content/context knowledge than existing CRS.

Finding 6 - LLMs generally possess weaker collaborative knowledge than existing CRS.

Finding 7 - Reddit provides more content/context information than the other two CRS datasets.

Finding 8 - Collaborative information is insufficient for satisfactory recommendations, given the current models.

Finding 9 - Collaborative information can be dataset- or platform-dependent.

Finding 10 - LLM recommendations suffer from popularity bias in CRS.

Finding 11 - Recommendation performance of LLMs is sensitive to geographical regions.

Enhancing Recommender Systems with Large Language Model Reasoning Graphs (arxiv 20230821)

motivation

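  1. The paper claims that graph-based recommendation lacks the capacity for complex reasoning and therefore cannot fully understand user interests.
  2. Reasoning with an LLM can capture user interests effectively.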

framework

  1. Chained Graph Reasoning. Infer the reason behind each item click and link the reasons into a graph.
  2. Divergent Extension. Extend the chains.
  3. Self-verification and Scoring. Verify by masking.
  4. Knowledge Base Self-improving. Cache the results.
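
The steps above can be pictured with a toy pipeline: each clicked item gets a reason conditioned on the previous reason, the reasons are chained into a graph, and a cache plays the role of the knowledge base. This is only a sketch under assumptions: `fake_llm_reason` stands in for a real LLM call, and the divergent extension and masked self-verification steps are omitted.

```python
def fake_llm_reason(prev_reason, item):
    """Stand-in for an LLM call; a real system would prompt a model here."""
    return f"likes {item}" if prev_reason is None else f"{prev_reason} -> {item}"

def build_reasoning_chain(clicks, knowledge_base, llm=fake_llm_reason):
    """Chain one reason per clicked item; reuse cached reasons when available."""
    edges, prev, llm_calls = [], None, 0
    for item in clicks:
        key = (prev, item)
        if key not in knowledge_base:      # cache miss: ask the (fake) LLM
            knowledge_base[key] = llm(prev, item)
            llm_calls += 1
        reason = knowledge_base[key]
        if prev is not None:
            edges.append((prev, reason))   # directed edge between reasons
        prev = reason
    return edges, llm_calls

kb = {}  # hypothetical knowledge base, shared across users and sessions
edges_1, calls_1 = build_reasoning_chain(["lipstick", "mascara"], kb)
edges_2, calls_2 = build_reasoning_chain(["lipstick", "mascara"], kb)
```

The second call hits the cache for every step, so it builds the same edges without calling the LLM again.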

Experiments
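datasets:
Amazon Beauty; Amazon Clothing; MovieLens
The LLM is used to build a graph of reasons, turning the raw data into structured data that carries LLM knowledge, which in turn helps the model obtain higher-quality knowledge from the graph.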

RecMind: Large Language Model Powered Agent For Recommendation (arxiv 20230828)

motivation

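  1. Existing DNN-based models and pretrained-language-model methods cannot fully capture textual knowledge.
  2. Existing LLM-based recommender systems rely mainly on the LLM itself and ignore its ability to operate tools. They also do not make full use of the LLM's reasoning ability.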

Architecture

  1. planning
  2. personalized memory (individualized user information) / world knowledge (item metadata; real-time information from web search)
  3. tools (database tool; search tool; text summarization tool)

experiments
following P5

ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

Prompt Distillation for Efficient LLM-based Recommendation

Not a real LLM; a small improvement over P5: 1. adds a task-specific continuous prompt. 2. for speed, each batch contains samples from a single task only.
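
The second change, filling each batch with samples from one task, can be sketched as a simple grouping step. The sample format and the function name below are assumptions for illustration, not the paper's data loader.

```python
from collections import defaultdict

def single_task_batches(samples, batch_size):
    """Group (task, example) pairs so that every batch holds one task only."""
    by_task = defaultdict(list)
    for task, example in samples:
        by_task[task].append(example)
    batches = []
    for task, examples in by_task.items():
        for i in range(0, len(examples), batch_size):
            batches.append((task, examples[i:i + batch_size]))
    return batches

# Toy samples; the task names are placeholders.
samples = [("rating", "r1"), ("explanation", "e1"), ("rating", "r2"),
           ("rating", "r3"), ("explanation", "e2")]
batches = single_task_batches(samples, batch_size=2)
```

A training loop would then shuffle or alternate these batches across tasks.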

SGPT: GPT Sentence Embeddings for Semantic Search

motivation: Existing retrieval methods mostly rely on encoder-only BERT, yet scaled-up decoder-only models can reach stronger capability than encoder-only ones.

Experiments
SGPT-CE
asymmetric search

conclusion:

  1. The BERT-based cross-encoder [44] is SOTA.
  2. From the results, we infer that performance scales both as we re-rank more documents or increase model size.

SGPT-BE
Weighted mean pooling that emphasizes the later token embeddings.
Only the bias parameters are updated.
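
The position-weighted mean pooling mentioned above gives token i (1-indexed) a weight proportional to i, so later tokens, which have attended to more context in a causal decoder, count more. Below is a pure-Python sketch on toy vectors; real hidden states would come from the model, and the function name is hypothetical.

```python
def weighted_mean_pool(hidden_states):
    """Position-weighted mean: token i (1-indexed) gets weight i / sum(1..n)."""
    n = len(hidden_states)
    total = n * (n + 1) / 2
    dim = len(hidden_states[0])
    pooled = [0.0] * dim
    for i, vec in enumerate(hidden_states, start=1):
        for d in range(dim):
            pooled[d] += (i / total) * vec[d]
    return pooled

# Three toy token embeddings of dimension 2.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = weighted_mean_pool(states)
```

With three tokens the weights are 1/6, 2/6 and 3/6, so the last token contributes half of the sentence embedding.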

conclusion

  1. we find that in the unsupervised setting, decoder transformers (GPT) strongly underperform encoders (BERT).
  2. after fine-tuning on the same dataset with the same hyperparameters, decoders (SGPT) with 125M parameters closely trail the 110M parameter encoder (SBERT) for the 12th layer.
  3. Weighted mean pooling outperforms mean and last token pooling for SGPT 125M.

Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations

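motivation: LLMs lack domain knowledge, while recommender models cannot handle diverse tasks. The paper combines the strengths of both by using an agent to do recommendation.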

method
tool set:

  1. information query: This module can efficiently retrieve detailed item information from the backend item information database using Structured Query Language (SQL) expressions.
  2. Item retrieval: hard condition -> SQL; soft condition->embedding similarity
  3. Item ranking:
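
The split between hard and soft conditions in item retrieval can be illustrated with an in-memory SQLite table for the hard filter and cosine similarity for the soft match. The schema, the toy embeddings, and the helper names are all hypothetical; the notes do not give the paper's actual tool interfaces.

```python
import math
import sqlite3

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical item table, built in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT, genre TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("Game A", "puzzle", 9.0), ("Game B", "puzzle", 40.0), ("Game C", "racing", 5.0)],
)

# Toy title embeddings; a real system would use a trained encoder.
EMB = {"Game A": [1.0, 0.0], "Game B": [0.8, 0.6], "Game C": [0.0, 1.0]}

def hard_filter(genre, max_price):
    """Hard condition -> parameterized SQL query."""
    rows = conn.execute(
        "SELECT title FROM items WHERE genre = ? AND price <= ? ORDER BY title",
        (genre, max_price),
    )
    return [r[0] for r in rows]

def soft_rank(candidates, query_vec):
    """Soft condition -> rank candidates by embedding similarity."""
    return sorted(candidates, key=lambda t: cosine(EMB[t], query_vec), reverse=True)

cheap_puzzles = hard_filter("puzzle", 50.0)
ranked = soft_rank(cheap_puzzles, [0.6, 0.8])
```

The hard filter removes items that violate an explicit constraint, and the soft step only reorders what is left.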

mechanism:

  1. candidate memory bus: stores the current item candidates, so items no longer need to be appended to the prompt.
  2. plan-first execution with dynamic demonstrations: plan, then execute.
  3. Reflection. Simply ask the model to judge once more whether its output is reasonable.
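
One way to picture the candidate memory bus is a shared object that tools read from and write to, while the prompt carries only a short summary. The class below is a hypothetical sketch of that idea, not the paper's interface.

```python
class CandidateBus:
    """Shared store for the current candidate items (hypothetical sketch)."""

    def __init__(self, all_items):
        self.candidates = list(all_items)
        self.log = []  # which tool changed the candidates, and to how many

    def update(self, tool_name, new_candidates):
        """Replace the candidate list after a tool has filtered or ranked it."""
        self.candidates = list(new_candidates)
        self.log.append((tool_name, len(self.candidates)))

    def summary(self):
        """Short text for the prompt instead of the full item list."""
        return f"{len(self.candidates)} candidates in memory"

bus = CandidateBus(["a", "b", "c", "d"])
bus.update("filter", [c for c in bus.candidates if c != "d"])
bus.update("rank", sorted(bus.candidates, reverse=True)[:2])
```

Each tool sees the output of the previous one through the bus, so the item list itself never has to pass through the LLM context.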

experiments

Semantic-enhanced Contrastive Learning for Session-based Recommendation

CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning

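Fine-tunes the model on QA pairs with explanations, constructed from captions, so that the model understands image-text information better and hallucinates less.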

Controllable Natural Language Generation with Contrastive Prefixes

Explainable Information Retrieval: A Survey

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Task-aware Retrieval with Instructions

Introduces instructions into the retrieval task.
