How to Handle Complex Queries with SubQuestionQueryEngine

Latest recommended article published on 2024-10-03 09:02:12

llzwxh888

Tags: python

Article link: https://blog.csdn.net/ppoojjj/article/details/140320299

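In this article, we show how to use SubQuestionQueryEngine to handle complex queries: how it breaks such a query into several sub-questions, pulls information from multiple data sources, and synthesizes the final answer.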
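To make the decompose, route, and synthesize flow concrete, here is a minimal pure-Python sketch of the pattern. It is only an illustration, not LlamaIndex's implementation: `answer_complex_query`, `toy_generate`, `toy_tool`, and `toy_synthesize` are hypothetical names, and the toy functions stand in for the LLM and query-engine calls a real engine would make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubQuestion:
    sub_question: str  # the narrower question to ask
    tool_name: str     # which data source should answer it


def answer_complex_query(
    query: str,
    generate: Callable[[str], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[Tuple[str, str]]], str],
) -> str:
    # 1. Split the complex query into sub-questions.
    sub_questions = generate(query)
    # 2. Route each sub-question to the tool it names and collect the answers.
    qa_pairs = [
        (sq.sub_question, tools[sq.tool_name](sq.sub_question))
        for sq in sub_questions
    ]
    # 3. Combine the partial answers into one final response.
    return synthesize(query, qa_pairs)


# Toy stand-ins (assumptions): a real engine would call an LLM for
# `generate` and `synthesize`, and a query engine for each tool.
def toy_generate(query: str) -> List[SubQuestion]:
    return [
        SubQuestion(f"What did Paul Graham work on {phase} YC?", "pg_essay")
        for phase in ("before", "during", "after")
    ]


def toy_tool(question: str) -> str:
    return f"(answer to: {question})"


def toy_synthesize(query: str, qa_pairs: List[Tuple[str, str]]) -> str:
    return "\n".join(f"{q} -> {a}" for q, a in qa_pairs)


final_answer = answer_complex_query(
    "How was Paul Grahams life different before, during, and after YC?",
    toy_generate,
    {"pg_essay": toy_tool},
    toy_synthesize,
)
print(final_answer)
```

Keeping the three stages as separate callables mirrors the idea that each one can be swapped independently, for example a different question generator or additional tools for other data sources.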

Setup

First, if you are opening this notebook in Colab, you will probably need to install LlamaIndex.

!pip install llama-index
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own OpenAI API key
import nest_asyncio
nest_asyncio.apply()
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

# Use LlamaDebugHandler to print a trace of the captured sub-questions
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
Settings.callback_manager = callback_manager
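Pasting the API key into the notebook works, but it is easy to leak it that way. As one possible alternative, the sketch below reads the key from the environment and prompts for it only when it is missing. The helper name `ensure_openai_key` is hypothetical and not part of LlamaIndex or the OpenAI client.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the input so the key is not echoed into the notebook.
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key
    return key
```

Calling `ensure_openai_key()` once before building the index would leave `OPENAI_API_KEY` set for the rest of the session.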

Download the data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load the data

# Load the essay from the data directory
pg_essay = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()

# Build the index and the query engine
vector_query_engine = VectorStoreIndex.from_documents(
    pg_essay,
    use_async=True,
).as_query_engine()

Set up the SubQuestionQueryEngine

# Expose the base query engine as a tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="pg_essay",
            description="Paul Graham essay on What I Worked On",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)
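The `use_async=True` flag asks the engine to answer the sub-questions concurrently rather than one after another. The sketch below illustrates that general idea with plain `asyncio`. It is not LlamaIndex's internal code, and `answer_sub_question` is a hypothetical stand-in for a slow LLM or retrieval call.

```python
import asyncio
import time
from typing import List


async def answer_sub_question(question: str) -> str:
    # Stand-in for a slow, I/O-bound LLM or retrieval call (assumption).
    await asyncio.sleep(0.1)
    return f"(answer to: {question})"


async def answer_all(questions: List[str]) -> List[str]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(answer_sub_question(q) for q in questions))


questions = [
    f"What did Paul Graham work on {phase} YC?"
    for phase in ("before", "during", "after")
]
start = time.perf_counter()
answers = asyncio.run(answer_all(questions))
elapsed = time.perf_counter() - start
print(answers)
print(f"elapsed: {elapsed:.2f}s")
```

Because the three calls overlap, the total time should be close to that of the slowest call instead of the sum of all three. In a notebook an event loop is already running, so `asyncio.run` only works after `nest_asyncio.apply()`, which is why the setup code calls it.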

Run the query

response = query_engine.query(
    "How was Paul Grahams life different before, during, and after YC?"
)
print(response)

Capture the sub-questions

# Iterate over the sub-question items captured in SUB_QUESTION events
from llama_index.core.callbacks import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print("Sub Question " + str(i) + ": " + qa_pair.sub_q.sub_question.strip())
    print("Answer: " + qa_pair.answer.strip())
    print("====================================")

Sample output

Sub Question 0: What did Paul Graham work on before YC?
Answer: Paul Graham worked on writing essays and working on YC before YC.
====================================
Sub Question 1: What did Paul Graham work on during YC?
Answer: During his time at YC, Paul Graham worked on various projects. He wrote all of YC's internal software in Arc and also worked on Hacker News (HN), which was a news aggregator initially meant for startup founders but later changed to engage intellectual curiosity. Additionally, he wrote essays and worked on helping the startups in the YC program with their problems.
====================================
Sub Question 2: What did Paul Graham work on after YC?
Answer: After YC, Paul Graham worked on starting his own investment firm with Jessica.
====================================
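Printing is enough for a quick look, but it can be handy to keep the question and answer pairs as plain records for logging or later inspection. The sketch below assumes only the two attributes used in the loop above, `sub_q.sub_question` and `answer`. `FakeSubQuestion` and `FakeQAPair` are hypothetical stand-ins so the example runs without an API key, and `qa_pairs_to_records` is an illustrative helper, not a LlamaIndex function.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class FakeSubQuestion:
    sub_question: str


@dataclass
class FakeQAPair:
    sub_q: FakeSubQuestion
    answer: Optional[str]


def qa_pairs_to_records(qa_pairs: Iterable[Any]) -> List[Dict[str, Any]]:
    """Turn objects exposing .sub_q.sub_question and .answer into dicts."""
    records = []
    for i, pair in enumerate(qa_pairs):
        records.append({
            "index": i,
            "question": pair.sub_q.sub_question.strip(),
            # Fall back to an empty string if an answer is missing.
            "answer": (pair.answer or "").strip(),
        })
    return records


# Placeholder data, not real model output.
demo = [
    FakeQAPair(
        FakeSubQuestion(" What did Paul Graham work on before YC? "),
        " placeholder answer ",
    ),
    FakeQAPair(FakeSubQuestion("What did Paul Graham work on after YC?"), None),
]
records = qa_pairs_to_records(demo)
print(records)
```

With the real engine, the same helper could be fed the `end_event.payload[EventPayload.SUB_QUESTION]` objects gathered from `llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)`.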