Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Function Vectors in Large Language Models
Mainly studies how to encode the form of a task as a vector.
Generative Representational Instruction Tuning
Contrastive Multiview Coding
Integrating text and image: Determining multimodal document intent in instagram posts
Studies the relationship between image and text, and the intent behind the post.
Large Language Model based Long-tail Query Rewriting in Taobao Search
Uses partial-order relations from offline evaluation for contrastive learning.
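The partial-order idea can be sketched as a two-candidate contrastive loss: if offline evaluation says rewrite `pos` beats rewrite `neg` for a query, push the query embedding toward `pos`. A minimal sketch with illustrative names, not the paper's exact objective:

```python
import numpy as np

def pairwise_contrastive_loss(query, pos, neg, temperature=0.07):
    # Treat the offline preference "pos beats neg" as a positive/negative
    # pair and apply softmax cross-entropy over the two candidates.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    s_pos = cos(query, pos) / temperature
    s_neg = cos(query, neg) / temperature
    m = max(s_pos, s_neg)  # stabilize the softmax
    return -(s_pos - m) + np.log(np.exp(s_pos - m) + np.exp(s_neg - m))
```

The loss (-log p of the preferred candidate) shrinks as the query moves toward the offline winner.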
Representation Learning with Large Language Models for Recommendation
Problems:
1. Scalability issues in practical recommenders
2. Limitations stemming from text-only reliance
Drawbacks of LLM reranking: 1. hallucination; 2. context-length limits prevent capturing global user collaborative signals; 3. slow inference.
Theoretical interpretation: the method is effectively contrastive learning.
Problems with raw text: 1. missing attributes; 2. noisy text data, e.g., review data. The remedy is to use an LLM to generate profiles.
Similar to Huawei's work: distills embeddings from semantic information.
ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
Targets the "lifelong sequential behavior incomprehension" problem.
Retrieves and filters the items a user has interacted with, so that irrelevant history does not pollute the prompt.
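The retrieval step can be sketched as top-k semantic search over the user's history given the target item (function and variable names are illustrative, assuming precomputed embeddings):

```python
import numpy as np

def retrieve_relevant(history_emb, target_emb, k=3):
    # Cosine similarity between each history item and the target item,
    # then keep only the k most similar history items for the prompt.
    h = history_emb / np.linalg.norm(history_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb)
    sims = h @ t
    return np.argsort(-sims)[:k]
```

Only the returned indices' items are verbalized into the LLM context, keeping the prompt short and on-topic.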
Collaborative Large Language Model for Recommender Systems
Sequence prediction plus regularization, combining ID and semantic embeddings.
Nomic Embed: Training a Reproducible Long Context Text Embedder
A text-embedder work: extends context length with a small parameter count.
Trains a BERT-style encoder via:
- masked language modeling
- unsupervised contrastive pretraining
- supervised contrastive finetuning
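The contrastive stages above typically use an in-batch InfoNCE objective, where each query is paired with its own document and every other document in the batch acts as a negative. A minimal numpy sketch (illustrative, not Nomic's training code):

```python
import numpy as np

def in_batch_infonce(q, d, temperature=0.05):
    # Row i of q is paired with row i of d; every other row in the
    # batch serves as an in-batch negative for that query.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = q @ d.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))  # mean -log p(correct pair)
```

Larger batches give more negatives per query, which is why such pretraining favors big batch sizes.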
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Modifies the image input so that CLIP focuses on a specified region.
Link-Context Learning for Multimodal LLMs
Learns entirely new concepts from in-context examples.
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
most hallucinations are closely tied to the knowledge aggregation patterns manifested in the self-attention matrix, i.e., MLLMs tend to generate new tokens by focusing on a few summary tokens, but not all the previous tokens.
'Aggregation pattern' seems to be inherent to LLMs: shallow layers aggregate information into anchor tokens, and deep layers predict the next token based on those anchor tokens.
'Aggregation pattern' leads to hallucination in current MLLMs: the model over-trusts summary tokens and neglects concrete image information.
Mitigates hallucination during decoding by down-weighting summary tokens and up-weighting concrete image information.
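The over-trust detection can be sketched as a penalty on candidate scores: within a recent window of attention rows, a column that dominates every row marks a candidate summary token. A toy sketch with illustrative names and a simplified formula, not OPERA's exact scoring:

```python
import numpy as np

def over_trust_penalty(attn, window=4, alpha=1.0):
    # attn: (T, prev) self-attention weights of the last T generated tokens.
    # Column-wise geometric mean over a recent window is large only when
    # one column (a candidate summary token) dominates every row.
    local = attn[-window:]
    col_strength = local.prod(axis=0) ** (1.0 / local.shape[0])
    return alpha * float(col_strength.max())
```

During beam search this penalty would be subtracted from a candidate's score, steering decoding away from continuations that attend only to summary tokens.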
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Trains MLLMs with scene-graph data to address relation hallucination.
Meta-Task Prompting Elicits Embedding from Large Language Models
In this paper, we empirically show that simply averaging different embeddings derived from multiple meta-tasks can achieve superior performance on both intrinsic and downstream evaluation benchmarks.
An ensemble over multiple meta-task prompts.
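The ensemble step is just mean-pooling: embed the same text under each meta-task prompt, average, and renormalize. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def meta_task_embedding(task_embeddings):
    # task_embeddings: (num_meta_tasks, dim) — one embedding of the same
    # text per meta-task prompt. Mean-pool, then L2-normalize.
    avg = np.mean(task_embeddings, axis=0)
    return avg / np.linalg.norm(avg)
```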
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
A comprehensive empirical study of:
what are the key design decisions that influence VLM capabilities and downstream use?
To probe the key design choices, the paper first proposes an evaluation suite. All designs are studied against LLaVA 1.5.
- optimization procedure:
    - The first (alignment-only) stage can be dropped: train the projection and the LLM directly.
    - Freezing the ViT works better.
- choice of visual representation: CLIP-style image-text contrastive models are basically the best.