Answering Complex Queries with the Sub Question Query Engine

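In this article, we show how to use the Sub Question Query Engine to answer a complex question over multiple data sources. The engine first breaks the complex query into sub-questions, one for each relevant data source, then gathers the intermediate responses and synthesizes a final response.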
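The decompose, answer, synthesize flow described above can be sketched in plain Python. This is only an illustration of the idea, not LlamaIndex's implementation: the names `SubQuestion`, `answer_complex_query`, `toy_decompose`, and `toy_synthesize` are hypothetical, and the two "data sources" are stub functions. In the real engine, the decomposition and synthesis steps are performed by an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubQuestion:
    tool_name: str  # which data source should answer this sub-question
    text: str       # the sub-question itself


def answer_complex_query(
    query: str,
    tools: Dict[str, Callable[[str], str]],
    decompose: Callable[[str, List[str]], List[SubQuestion]],
    synthesize: Callable[[str, List[Tuple[SubQuestion, str]]], str],
) -> str:
    # 1. Break the complex query into sub-questions, each aimed at one tool.
    sub_questions = decompose(query, list(tools))
    # 2. Collect the intermediate answer for every sub-question.
    qa_pairs = [(sq, tools[sq.tool_name](sq.text)) for sq in sub_questions]
    # 3. Combine the intermediate answers into one final response.
    return synthesize(query, qa_pairs)


# Hypothetical stand-ins for the LLM-driven steps.
def toy_decompose(query: str, tool_names: List[str]) -> List[SubQuestion]:
    return [SubQuestion(name, f"{query} (according to {name})") for name in tool_names]


def toy_synthesize(query: str, qa_pairs: List[Tuple[SubQuestion, str]]) -> str:
    return " | ".join(f"{sq.tool_name}: {answer}" for sq, answer in qa_pairs)


# Two stub data sources; real ones would be query engines over indexed documents.
tools = {
    "essay_a": lambda question: "answer from A",
    "essay_b": lambda question: "answer from B",
}

final = answer_complex_query("What changed?", tools, toy_decompose, toy_synthesize)
print(final)
```

Passing the decomposition and synthesis steps in as functions keeps the sketch independent of any particular model or library.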

Setup

If you are opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

!pip install llama-index

import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own API key

import nest_asyncio
nest_asyncio.apply()

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

# Use LlamaDebugHandler to print the trace of the sub-questions
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
Settings.callback_manager = callback_manager

Download the Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt' 

# Load the data
pg_essay = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()

# Build the index and the query engine
vector_query_engine = VectorStoreIndex.from_documents(
    pg_essay,
    use_async=True,
).as_query_engine()

**********
Trace: index_construction
    |_CBEventType.NODE_PARSING ->  0.112481 seconds
      |_CBEventType.CHUNKING ->  0.105627 seconds
    |_CBEventType.EMBEDDING ->  0.959998 seconds
**********

Set Up the Sub Question Query Engine

# Wrap the base query engine as a tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="pg_essay",
            description="Paul Graham essay on What I Worked On",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

Run the Query

response = query_engine.query(
    "How was Paul Grahams life different before, during, and after YC?"
)

Generated 3 sub questions.
[pg_essay] Q: What did Paul Graham work on before YC?
[pg_essay] Q: What did Paul Graham work on during YC?
[pg_essay] Q: What did Paul Graham work on after YC?
[pg_essay] A: After YC, Paul Graham worked on starting his own investment firm with Jessica.
[pg_essay] A: During his time at YC, Paul Graham worked on various projects. He wrote all of YC's internal software in Arc and also worked on Hacker News (HN), which was a news aggregator initially meant for startup founders but later changed to engage intellectual curiosity. Additionally, he wrote essays and worked on helping the startups in the YC program with their problems.
[pg_essay] A: Paul Graham worked on writing essays and working on YC before YC.
**********
Trace: query
    |_CBEventType.QUERY ->  66.492657 seconds
      |_CBEventType.LLM ->  2.226621 seconds
      |_CBEventType.SUB_QUESTION ->  62.387177 seconds
        |_CBEventType.QUERY ->  62.386864 seconds
          |_CBEventType.RETRIEVE ->  0.271039 seconds
            |_CBEventType.EMBEDDING ->  0.269134 seconds
          |_CBEventType.SYNTHESIZE ->  62.115674 seconds
            |_CBEventType.TEMPLATING ->  2.8e-05 seconds
            |_CBEventType.LLM ->  62.108522 seconds
      |_CBEventType.SUB_QUESTION ->  2.421552 seconds
        |_CBEventType.QUERY ->  2.421303 seconds
          |_CBEventType.RETRIEVE ->  0.227773 seconds
            |_CBEventType.EMBEDDING ->  0.224198 seconds
          |_CBEventType.SYNTHESIZE ->  2.193355 seconds
            |_CBEventType.TEMPLATING ->  4.2e-05 seconds
            |_CBEventType.LLM ->  2.183101 seconds
      |_CBEventType.SUB_QUESTION ->  1.530997 seconds
        |_CBEventType.QUERY ->  1.530781 seconds
          |_CBEventType.RETRIEVE ->  0.25523 seconds
            |_CBEventType.EMBEDDING ->  0.252898 seconds
          |_CBEventType.SYNTHESIZE ->  1.275401 seconds
            |_CBEventType.TEMPLATING ->  3.2e-05 seconds
            |_CBEventType.LLM ->  1.26685 seconds
      |_CBEventType.SYNTHESIZE ->  1.877223 seconds
        |_CBEventType.TEMPLATING ->  1.6e-05 seconds
        |_CBEventType.LLM ->  1.875031 seconds
**********

print(response)

Output:

Paul Graham's life was different before, during, and after YC. Before YC, he focused on writing essays and working on YC. During his time at YC, he worked on various projects, including writing software, developing Hacker News, and providing support to startups in the YC program. After YC, he started his own investment firm with Jessica. These different phases in his life involved different areas of focus and responsibilities.

Iterate Over the Sub-Questions

from llama_index.core.callbacks import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print("Sub Question " + str(i) + ": " + qa_pair.sub_q.sub_question.strip())
    print("Answer: " + qa_pair.answer.strip())
    print("====================================")

Output:

Sub Question 0: What did Paul Graham work on before YC?
Answer: Paul Graham worked on writing essays and working on YC before YC.
====================================
Sub Question 1: What did Paul Graham work on during YC?
Answer: During his time at YC, Paul Graham worked on various projects. He wrote all of YC's internal software in Arc and also worked on Hacker News (HN), which was a news aggregator initially meant for startup founders but later changed to engage intellectual curiosity. Additionally, he wrote essays and worked on helping the startups in the YC program with their problems.
====================================
Sub Question 2: What did Paul Graham work on after YC?
Answer: After YC, Paul Graham worked on starting his own investment firm with Jessica.
====================================
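In the query log, the answers arrive in the order after, during, before, while the loop above lists them in the order the sub-questions were generated. The trace also suggests the sub-questions overlapped in time: the top-level QUERY took about 66.5 seconds, roughly the first LLM call (2.2 s) plus the slowest sub-question (62.4 s) plus the final synthesis (1.9 s), rather than the sum of all three sub-questions. A minimal sketch of that behaviour with `asyncio.gather` follows. It is not LlamaIndex's internal code; `answer_sub_question` is a hypothetical stand-in, and the sleep durations are made up to mimic one slow and two fast calls.

```python
import asyncio
from typing import List, Tuple


async def answer_sub_question(label: str, delay: float, finished: List[str]) -> str:
    # Stand-in for retrieval plus an LLM call; the delay is invented for the demo.
    await asyncio.sleep(delay)
    finished.append(label)  # record the order in which tasks actually complete
    return f"answer for {label}"


async def main() -> Tuple[List[str], List[str]]:
    finished: List[str] = []
    # gather runs the coroutines concurrently and returns results in input order.
    answers = await asyncio.gather(
        answer_sub_question("before", 0.3, finished),
        answer_sub_question("during", 0.2, finished),
        answer_sub_question("after", 0.1, finished),
    )
    return finished, list(answers)


finished, answers = asyncio.run(main())
print(finished)  # completion order: fastest task first
print(answers)   # result order: same as the order passed to gather
```

Inside a notebook, where an event loop is already running, `asyncio.run` normally raises an error; the `nest_asyncio.apply()` call in the setup is what allows nested use there.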

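Errors You May Encounter

  1. API key missing or invalid:

    • Make sure the OPENAI_API_KEY environment variable is set correctly before the code runs.
  2. Network problems:

    • Downloading the data can fail on an unstable connection; make sure your network connection is stable.
  3. Library installation failure:

    • Installing the llama-index library can run into version or compatibility problems; make sure your Python environment is compatible with the library.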
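The first item in the list above can be caught before any query is sent. The helper below is a small sketch: `check_openai_key` is a hypothetical name, and it only checks that the variable is present and is not the `sk-...` placeholder from the setup code. It does not contact OpenAI, so it cannot tell whether a key is actually valid.

```python
import os
from typing import Mapping, Optional


def check_openai_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a description of the problem, or None if the key looks usable."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    if key == "sk-...":
        # The placeholder from the setup code was never replaced.
        return "OPENAI_API_KEY still holds the placeholder value"
    return None


problem = check_openai_key()
if problem is not None:
    print("Warning: " + problem)
```

Accepting an optional mapping instead of always reading `os.environ` makes the check easy to try out without changing the real environment.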
