A Question-Rewriting Prompt That Improves Retrieval for Multi-Hop Questions, with the User Input Placed at the End of the Prompt

jieshenai

Last modified 2025-05-16 12:59:52

Category: LLM Applications. Tags: natural language processing, artificial intelligence

First published 2025-05-16 10:47:10

Article link: https://blog.csdn.net/sjxgghg/article/details/148001223

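When using large language models on multi-hop questions, we often hit a problem: the original question may not be specific enough or may lack key entity information, so a semantic search system struggles to retrieve the relevant passages. A common remedy is question rewriting, which surfaces knowledge that sits one hop deeper. Below is a prompt dedicated to the question-rewriting stage; it helps the model generate new questions that are clearer and more entity-oriented.

I have tested this prompt in practice, and it works well.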

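Experimental results

The question-rewriting model is qwen-2.5-7B.
The test uses the hotpot dataset, with 1,000 records used to build the vector database:

Retrieving from the vector database with the user's question directly gives a TopK@10 hit_rate of about 82%.
With question rewriting, TopK@10 + TopK@10 (presumably the top 10 for the original question plus the top 10 for the rewritten question) gives a hit_rate of about 91%.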
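
The two hit rates above could be computed along the following lines. This is a minimal sketch under two assumptions that the post does not spell out: a query counts as a hit only when all of its gold supporting documents appear in the retrieved list, and "TopK@10 + TopK@10" means the union of the two top-10 lists. The function names and the toy document IDs are hypothetical.

```python
from typing import Iterable, Sequence


def merge_results(first: Sequence[str], second: Sequence[str]) -> list[str]:
    """Union of two ranked ID lists, keeping first-seen order and dropping duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for doc_id in list(first) + list(second):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


def hit_rate(retrieved: Sequence[Sequence[str]], gold: Sequence[Iterable[str]]) -> float:
    """Fraction of queries whose gold IDs are ALL in the retrieved IDs (assumed definition)."""
    if not retrieved:
        return 0.0
    hits = sum(1 for got, want in zip(retrieved, gold) if set(want) <= set(got))
    return hits / len(retrieved)


# Hypothetical toy data: two queries, each needing two supporting documents.
gold = [{"d1", "d2"}, {"d3", "d4"}]
original_top_k = [["d1", "d9"], ["d3", "d4"]]   # retrieval with the original question
rewritten_top_k = [["d2", "d8"], ["d7", "d3"]]  # retrieval with the rewritten question

baseline = hit_rate(original_top_k, gold)
combined = hit_rate(
    [merge_results(a, b) for a, b in zip(original_top_k, rewritten_top_k)], gold
)
print(baseline, combined)
```

In this toy data the first query only becomes a hit once the second retrieval contributes the missing document, which is the effect the rewriting step is meant to produce.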

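In fact, qwen-2.5-7B is not strong at question rewriting. Without the prompt below, many rewrites fail and yield nothing usable for the next hop. Larger, stronger models such as gpt-4o handle this well.

With the prompt below, the rewriting results are comparable to those of gpt-4o!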

I. Prompt design in detail

Below is the prompt template I use in my project to guide the model toward high-quality question rewriting:

query_rewrite = """
You are given the following two elements:

1. **Original Question**
2. **Relevant Supporting Text(s)**

Your task is to **create a new, better question** that would help a semantic search system (like vector-based retrieval) find relevant information more accurately.

### 🔍 Follow These Clear Steps:

**Step 1: Understand the original question.**
Identify what the question is asking — focus on the key person, object, or event it refers to.

**Step 2: Extract the key detail from the supporting text.**
Look carefully at the relevant text and **find the most important new information** — especially **names**, dates, roles, or titles.
👉 **You must include this key information in the new question.**

**Step 3: Create a natural follow-up question.**
Now, think of a new question that:

* Focuses on the subject identified from the relevant text (e.g., a person).
* Moves the conversation toward what the original question was looking for (but in a clearer or more direct way).

**Step 4: Write the new question clearly and completely.**
Your final question must:

* **Include the key entity or name (e.g., a person) from the relevant text.**
* Be directly connected to the original topic.
* Make it easier for a search system to retrieve the right answer.

### 🚫 Do Not:

* Leave out key names or details that were introduced in the relevant text.
* Repeat the original question exactly.

### ✅ Example (Just for Reference - Do Not Copy):

If the original question was:

> "Which team does the quarterback picked first in the 2010 draft play for?"

And the relevant text tells us:

> "Sam Bradford was taken first in the 2010 draft."

Then your new question **must include 'Sam Bradford'** and could be something like:

> "Which NFL team did Sam Bradford play for during the early 2010s?"

### 🎯 Output Format:
1. A multi-step, logically coherent explanation showing your reasoning process.
2. A json block at the end containing the final inferred question.

{{
  "new_question": "Your clearly written, specific, entity-rich question goes here."
}}

### Input Format:
- Question: {user_question}
- Relevant Texts: {relevant_texts}
""".lstrip()

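This prompt's design has several key points:

Clear step-by-step guidance: from understanding the original question, through extracting the key information, to constructing the new question, every step has explicit instructions.
Emphasis on entity information: the new question must include the key entities extracted from the supporting text, such as names of people and places.
Standardized output format: the result is returned as JSON, which makes it easy to parse and integrate into a system.
Example to aid understanding: a concrete example helps the model grasp the goal of the task.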
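
As the title says, the user input goes at the very end of the prompt. The doubled braces around the JSON example suggest the template is filled with Python's `str.format`, where `{{` and `}}` produce literal braces. A minimal sketch of filling it follows; `demo_template` is a shortened hypothetical stand-in for `query_rewrite`, and `build_prompt` and the one-passage-per-line layout are assumptions rather than the author's code.

```python
# Shortened stand-in for the full template; same placeholders, same doubled braces.
demo_template = """
Rewrite the question using the relevant texts.

{{
  "new_question": "..."
}}

### Input Format:
- Question: {user_question}
- Relevant Texts: {relevant_texts}
""".lstrip()


def build_prompt(template: str, user_question: str, relevant_texts: list[str]) -> str:
    """Fill the two placeholders; retrieved passages are listed one per line."""
    joined = "\n".join(f"  - {text}" for text in relevant_texts)
    return template.format(user_question=user_question, relevant_texts="\n" + joined)


prompt = build_prompt(
    demo_template,
    "Which team does the quarterback picked first in the 2010 draft play for?",
    ["Sam Bradford was taken first in the 2010 draft."],
)
print(prompt)
```

Passing the real `query_rewrite` string as `template` should work the same way, since it uses the same two placeholder names.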

II. Why is this prompt so effective?

From what I have observed in practice, the following points are the core reasons it works:

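1. Stronger entity extraction

Multi-hop questions usually require the model to reason across several documents, and the original question is typically vague. Requiring the model to extract key entities (names, dates, places, and so on) from the supporting text makes the question markedly more specific.

For example, take the question "Which team does the quarterback picked first in the 2010 draft play for?" If the supporting text says "Sam Bradford was taken first in the 2010 draft", the new question can be rewritten as "Which NFL team does Sam Bradford play for?" That is clearly easier for a search engine to match and locate, or to retrieve from a vector database.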

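2. Consistent output format that is easy to process automatically

The prompt asks for the final question in a JSON block at the end of the response, so downstream code can parse it reliably. It also gives one example showing the model how to use the information in the relevant text to replace the unknown part of the original question.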
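
Since the model is asked to write its reasoning first and the JSON block last, the parser has to pick the JSON out of free text. A minimal sketch, assuming the JSON object is flat (no nested braces, as in the prompt's output format); `extract_new_question` and the sample response are hypothetical.

```python
import json
import re
from typing import Optional


def extract_new_question(response: str) -> Optional[str]:
    """Return the "new_question" value from the last valid JSON object, or None."""
    # Match flat {...} spans; the prompt's output format has no nested braces.
    candidates = re.findall(r"\{[^{}]*\}", response)
    for raw in reversed(candidates):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON, try the previous candidate
        question = data.get("new_question") if isinstance(data, dict) else None
        if isinstance(question, str) and question.strip():
            return question.strip()
    return None


sample_response = (
    "Step 1: the question asks which team a quarterback plays for.\n"
    "Step 2: the text names Sam Bradford as the first pick.\n"
    '{\n  "new_question": "Which NFL team did Sam Bradford play for during the early 2010s?"\n}'
)
print(extract_new_question(sample_response))
```

When the function returns `None`, a reasonable fallback is to keep searching with the original question.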

III. A worked example

Let's look at a concrete example to see what this prompt does in practice:

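Original question:
"What other books did the author of One Hundred Years of Solitude write?"

Relevant text:
"Gabriel García Márquez is a famous Colombian writer; he wrote One Hundred Years of Solitude."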

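Following the steps in the prompt, the model reasons as follows and generates a new question:

The original question asks what other books the author of One Hundred Years of Solitude wrote.
The supporting text supplies the key information: "Gabriel García Márquez wrote this book."
The new question is: "Which famous novels did Gabriel García Márquez write?" or, pointing more directly at the topic: "What are Gabriel García Márquez's best-known works?"

The rewritten question contains the key entity and is better suited to retrieval, which greatly improves the chance of finding the correct answer.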
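
The whole flow for this example can be sketched end to end. This is only an illustration under stated assumptions: `retrieve` is a toy word-overlap ranker standing in for vector search, `stub_rewrite` is a hypothetical stand-in for the LLM call plus JSON parsing, the three-passage corpus is invented, and merging both result lists is one reading of "TopK@10 + TopK@10".

```python
from typing import Callable, Optional


def tokenize(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!\"'").lower() for w in text.split()} - {""}


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Toy stand-in for vector search: rank passages by number of shared words."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def two_hop_retrieve(
    question: str,
    corpus: list[str],
    rewrite: Callable[[str, list[str]], Optional[str]],
    k: int,
) -> list[str]:
    first = retrieve(question, corpus, k)
    new_question = rewrite(question, first) or question  # fall back if rewriting fails
    second = retrieve(new_question, corpus, k)
    return list(dict.fromkeys(first + second))  # merge, drop duplicates, keep order


def stub_rewrite(question: str, texts: list[str]) -> Optional[str]:
    """Hypothetical stand-in for prompting the model and parsing its JSON output."""
    if any("Gabriel García Márquez" in t for t in texts):
        return "Which famous novels did Gabriel García Márquez write?"
    return None


corpus = [
    "Gabriel García Márquez wrote One Hundred Years of Solitude.",
    "Love in the Time of Cholera is a novel by Gabriel García Márquez.",
    "The author of The Old Man and the Sea is Ernest Hemingway.",
]
question = "What other books did the author of One Hundred Years of Solitude write?"

first_hop = retrieve(question, corpus, k=2)
merged = two_hop_retrieve(question, corpus, stub_rewrite, k=2)
print(first_hop)
print(merged)
```

With this toy corpus, the original question alone misses the "Love in the Time of Cholera" passage because it shares few words with the question; the rewritten question names the author and brings that passage in.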