Abstract
1. Prior approaches:
(1)retrieve the required knowledge from explicit knowledge bases (KBs), which often introduces information irrelevant to the question;
(2)use an implicit knowledge engine to acquire the necessary knowledge for answering.
2. This paper's framework:
Prophet, a conceptually simple framework.
3. Approach:
(1)first train a vanilla VQA model on a specific knowledge-based VQA dataset, without external knowledge;
(2)extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples;
(3)encode the two types of answer heuristics into prompts so that GPT-3 can better comprehend the task.
1. Introduction
1. Prior work:
(1)Early knowledge-based VQA benchmarks additionally provide structured knowledge bases (KBs) and annotate the required knowledge facts for all questions;
(2)retrieve knowledge entries from explicit KBs (e.g., Wikipedia and ConceptNet).
Limitations:
① the required knowledge may not be successfully retrieved from the KBs;
② plenty of irrelevant knowledge is inevitably introduced.
(3)use pretrained large language models, e.g., GPT-3 [3], as implicit knowledge engines for knowledge acquisition.
Limitations:
① the generated captions cannot cover all the necessary information in the image;
② GPT-3 employs a few-shot learning paradigm that requires a few in-context examples to adapt to new tasks.
2. This paper's method:
(1)introduces two types of answer heuristics: answer candidates (a list of promising answers to the testing input, where each answer is associated with a confidence score) and answer-aware examples (a list of in-context examples, where each example has an answer similar to that of the testing input).
2. Related Work
Visual Question Answering (VQA)
Knowledge-based VQA
In-context learning
3. The Prophet Framework
3.1 Preliminaries
GPT-3: an autoregressive language model pretrained on a massive text corpus.
In-context learning formulates a new downstream task as a text sequence generation task on the frozen GPT-3 model.
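As a minimal illustration of this formulation (the function name and template are ours, not the paper's), a few-shot prompt simply concatenates in-context examples before the test query and lets the frozen model continue the text:

```python
# Illustrative sketch: cast a downstream task as text generation by
# concatenating in-context examples ahead of the test input.
def build_prompt(instruction: str, examples: list, test_input: str) -> str:
    """Format a few-shot prompt: instruction, example Q/A pairs, then the query."""
    parts = [instruction]
    for ex in examples:
        parts.append(f"Q: {ex['question']}\nA: {ex['answer']}")
    parts.append(f"Q: {test_input}\nA:")  # the model completes the final "A:"
    return "\n\n".join(parts)

prompt = build_prompt(
    "Answer each question.",
    [{"question": "2+2?", "answer": "4"}],
    "3+3?",
)
```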
3.2 Stage-1. Answer Heuristics Generation
answer candidates: answers predicted by the VQA model from the image and question, each with a prediction score.
answer-aware examples: training examples whose answers are similar to that of the testing input; intuitively, this amounts to finding questions with related answers.
Answer candidates
Select the top-K answers with the highest prediction scores.
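A minimal sketch of this step (our reconstruction, not the authors' code): softmax-normalize the VQA model's answer-vocabulary logits and keep the K answers with the highest confidence:

```python
import math

def top_k_candidates(logits, vocab, k=3):
    """Return the k (answer, confidence) pairs with highest softmax probability."""
    m = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# e.g. logits over a 4-answer vocabulary
cands = top_k_candidates([2.0, 0.5, 1.0, -1.0], ["dog", "cat", "bird", "car"], k=2)
```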
Answer-aware examples
Here the authors assume that the fused feature of question q and image i lies in a latent answer space: if the fused feature of a testing question-image pair is very similar to the fused feature of a question-image pair in the training set, the two pairs are likely to have closely related answers.
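This retrieval step can be sketched as a nearest-neighbor search in the fused-feature space (a simplified reading, assuming cosine similarity; function names are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def answer_aware_examples(test_feat, train_feats, train_ids, n=4):
    """Return ids of the n training examples closest in fused-feature space."""
    sims = [(cosine(test_feat, f), i) for f, i in zip(train_feats, train_ids)]
    sims.sort(reverse=True)                       # most similar first
    return [i for _, i in sims[:n]]
```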
3.3 Stage-2. Heuristics-enhanced Prompting
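A hedged sketch of how both heuristics might be encoded into the prompt: each in-context example carries a caption, question, candidate list with scores, and its answer, while the test entry ends with an open "Answer:" for GPT-3 to complete. The field names and layout are illustrative, not the paper's exact template:

```python
def format_entry(caption, question, candidates, answer=None):
    """Format one prompt entry; omit the answer for the testing input."""
    cand_str = ", ".join(f"{a} ({s:.2f})" for a, s in candidates)
    lines = [
        f"Context: {caption}",
        f"Question: {question}",
        f"Candidates: {cand_str}",
        f"Answer: {answer if answer is not None else ''}".rstrip(),
    ]
    return "\n".join(lines)

def build_prophet_prompt(examples, test):
    """Concatenate answer-aware examples, then the test entry, into one prompt."""
    blocks = [format_entry(**ex) for ex in examples]
    blocks.append(format_entry(**test))
    return "\n\n".join(blocks)
```

GPT-3 can then either pick one of the listed candidates or generate a different answer, using the candidates and their confidences as soft guidance.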
4. Experiments
4.1 Datasets
4.2 Implementation Details
Image model: grid-based features extracted from CLIP's visual encoder with a RN50×64 backbone
Language model: BERT-large
Base VQA model: MCAN-large