RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

宇宙飞船冲上月球

Published on 2024-04-23 10:15:29


Tags: Artificial Intelligence

Article link: https://blog.csdn.net/yzsjwd/article/details/138113519


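RAT uses retrieved information and thought steps to correct an LLM's reasoning on long-horizon tasks, reducing hallucination through iterative revision. Evaluations show that it outperforms conventional methods across several capabilities, and that it is simple to implement and easy to integrate.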


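RAT (Retrieval Augmented Thoughts) is a prompting strategy for context-aware reasoning in long-horizon generation tasks. Its core idea is to generate an initial zero-shot chain of thought (CoT) first, then revise each thought step one by one, using retrieved information relevant to the task query and to the current and past thought steps.

By iteratively revising the chain of thought, the method significantly improves an LLM's reasoning and generation on long-horizon tasks while substantially reducing hallucination.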


Pipeline of RAT


Suppose the task prompt I is passed through LLM + CoT to produce a sequence of thoughts T1, ..., Tn:

{user} 
##Question:
{question}
##Instruction: 
Try to answer this question/instruction with step-by-step thoughts and make the answer more structural. Use /n/n to split the answer into several paragraphs. Just respond to the instruction directly. DO NOT add additional explanations or introducement in the answer unless you are asked to. 

{assistant} 
...
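The drafting step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `build_draft_prompt` and `split_thoughts` are hypothetical, and the sketch assumes the model separates paragraphs with real blank lines (the prompt text itself writes the separator as /n/n).

```python
from typing import List

# Instruction text copied from the prompt shown above.
DRAFT_INSTRUCTION = (
    "Try to answer this question/instruction with step-by-step thoughts "
    "and make the answer more structural. Use /n/n to split the answer "
    "into several paragraphs. Just respond to the instruction directly. "
    "DO NOT add additional explanations or introducement in the answer "
    "unless you are asked to."
)


def build_draft_prompt(question: str) -> str:
    """Fill the zero-shot CoT template with the task question (hypothetical helper)."""
    return f"##Question:\n{question}\n##Instruction:\n{DRAFT_INSTRUCTION}"


def split_thoughts(draft: str) -> List[str]:
    """Split the model's draft into thought steps T1..Tn.

    Assumption: paragraphs are separated by blank lines; empty pieces are dropped.
    """
    return [part.strip() for part in draft.split("\n\n") if part.strip()]
```

The list returned by `split_thoughts` plays the role of T1, ..., Tn in the rest of the pipeline.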

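Because of potential hallucination, any step Ti in the thoughts may be wrong. To augment the whole CoT process with RAG, the first i thought steps T1, ..., Ti are combined with the task prompt I and rewritten into a new query Qi: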

{user} 
##Question:
{question}
##Content:
{answer}
##Instruction:
I want to verify the content correctness of the given question, especially the last sentences.
Please summarize the content with the corresponding question.
This summarization will be used as a query to search with Bing search engine.
The query should be short but need to be specific to promise Bing can find related knowledge or pages.
You can also use search syntax to make the query short and clear enough for the search engine to find relevant language
data.
Try to make the query as relevant as possible to the last few sentences in the content.
**IMPORTANT**
Just output the query directly. DO NOT add additional explanations or introducement in the answer unless you are
asked to.
{assistant}
...
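As a rough sketch of how this template might be filled for step i: the helper name `build_query_prompt` and its `instruction` parameter are assumptions made for illustration, and the instruction text itself is passed in rather than repeated here.

```python
from typing import List


def build_query_prompt(question: str, thoughts: List[str], i: int, instruction: str) -> str:
    """Build the prompt asking the LLM for search query Qi (hypothetical helper).

    `thoughts[:i]` holds the steps T1..Ti (i is 1-based), which fill {answer};
    `instruction` is the ##Instruction text from the template above.
    """
    if not 1 <= i <= len(thoughts):
        raise ValueError("i must be between 1 and len(thoughts)")
    content = "\n\n".join(thoughts[:i])
    return (
        f"##Question:\n{question}\n"
        f"##Content:\n{content}\n"
        f"##Instruction:\n{instruction}"
    )
```

Only the prefix up to step i is included, so the generated query stays focused on the most recent thought rather than on steps that have not been checked yet.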

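The rewritten query Qi is used for RAG retrieval, yielding the retrieved results Ri.

Finally, a revised step Ti* is generated from T1, ..., Ti-1 and Ri. It replaces the original Ti in the context, and the next round of RAT generation proceeds from there: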

{user}
##Existing Text in Wiki Web:
{content}
##Question:
{question}
##Answer:
{answer}
##Instruction:
I want to revise the answer according to retrieved related text of the question in WIKI pages.
You need to check whether the answer is correct.
If you find some errors in the answer, revise the answer to make it better.
If you find some necessary details are ignored, add it to make the answer more plausible according to the related text.
If you find the answer is right and do not need to add more details, just output the original answer directly.
**IMPORTANT**
Try to keep the structure (multiple paragraphs with its subtitles) in the revised answer and make it more structural
for understanding. Split the paragraphs with /n/n characters. Just output the revised answer directly. DO NOT add
additional explanations or annoucement in the revised answer unless you are asked to.
{assistant}
...
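Putting the three prompts together, the revision loop might look roughly like the sketch below. It follows the description in this post (query from the prefix, retrieve, revise step i, carry the revised step forward). The LLM and the search engine are replaced by toy stand-in functions, and every name here (`rat_revise`, `toy_make_query`, `toy_retrieve`, `toy_revise`, `FACTS`) is hypothetical.

```python
from typing import Callable, List


def rat_revise(
    question: str,
    draft_thoughts: List[str],
    make_query: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
    revise: Callable[[str, List[str], str, List[str]], str],
) -> List[str]:
    """Revise draft thoughts T1..Tn one step at a time."""
    revised: List[str] = []
    for thought in draft_thoughts:
        # Qi is built from the already revised prefix plus the current draft step.
        query = make_query(question, revised + [thought])
        docs = retrieve(query)  # Ri
        # Ti* replaces Ti and becomes context for the following steps.
        revised.append(revise(question, revised, thought, docs))
    return revised


# Toy stand-ins for the LLM calls and the search engine (assumptions, demo only).
FACTS = {"capital of France": "Paris is the capital of France."}


def toy_make_query(question: str, thoughts: List[str]) -> str:
    return "capital of France" if "France" in thoughts[-1] else thoughts[-1]


def toy_retrieve(query: str) -> List[str]:
    return [FACTS[query]] if query in FACTS else []


def toy_revise(question: str, previous: List[str], thought: str, docs: List[str]) -> str:
    # Keep the draft step unless retrieval returned something to correct it with.
    return docs[0] if docs else thought


draft = ["The capital of France is Lyon.", "It lies on the Seine."]
result = rat_revise("What is the capital of France?", draft,
                    toy_make_query, toy_retrieve, toy_revise)
print(result)
```

In a real setup the three callables would wrap the query-generation prompt, a search or vector-store lookup, and the revision prompt shown above.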

Evaluation

Four capabilities are evaluated:

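Embodied Planning: task-planning ability. 100 task goals of varying difficulty are designed in Minecraft, and the final task success rate is used as the evaluation metric. The task prompt is as follows: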
Give you nothing in the inventory, generate a step-by-step plan for the task of obtaining a {placeholder:acacia_boat} in Minecraft survival mode, and describe the object Minecraft item and its number at every step. For every step, start with ’STEP’ as start.
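Code Generation: coding ability, tested on the HumanEval, HumanEval+, MBPP, and MBPP+ datasets, which cover a wide range of programming problems from simple function implementations to more complex algorithmic challenges. The standard pass@k pass rate is used as the evaluation metric.
Mathematical Reasoning: mathematical reasoning ability, tested on GSM8K and GSM-HARD, which include problems that require many reasoning steps. Accuracy is used as the evaluation metric.
Creative Writing: writing ability, tested on a set of text-generation tasks and evaluated with human Elo preference ranking. The task prompts are as follows: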
a. Write a survey paper to summarize the xxxx
b. Describe of xxxxx

The results clearly show that RAT has a substantial advantage on these challenging long-horizon reasoning and generation tasks.