LlamaIndex Multi-Step Query Engine: Decomposing Complex Queries

Article link: https://blog.csdn.net/xycxycooo/article/details/141353029


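In this guide, we show how to set up a multi-step query engine that decomposes a complex query into a sequence of sub-questions. If you are opening this notebook on Colab, you will probably need to install LlamaIndex first.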
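To make the idea of sequential sub-questions concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's implementation: `multi_step_answer`, `toy_decompose`, and `toy_answer` are hypothetical names, and the two toy functions stand in for the LLM and the index. It only illustrates how each step can use the answers gathered so far.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]


def multi_step_answer(
    question: str,
    decompose: Callable[[str, History], Optional[str]],
    answer: Callable[[str], str],
    max_steps: int = 3,
) -> History:
    """Ask sub-questions one at a time, feeding earlier answers into the next step."""
    history: History = []
    for _ in range(max_steps):
        sub_question = decompose(question, history)
        if sub_question is None:  # nothing left to ask
            break
        history.append((sub_question, answer(sub_question)))
    return history


# Hypothetical stand-in for the LLM that proposes the next sub-question.
def toy_decompose(question: str, history: History) -> Optional[str]:
    if len(history) == 0:
        return "Who is the author?"
    if len(history) == 1:
        author = history[0][1]
        return f"Who was in the first batch of the program started by {author}?"
    return None


# Hypothetical stand-in for querying the index.
def toy_answer(sub_question: str) -> str:
    canned = {"Who is the author?": "Paul Graham"}
    return canned.get(sub_question, "(answer retrieved from the index)")


steps = multi_step_answer(
    "Who was in the first batch of the accelerator program the author started?",
    toy_decompose,
    toy_answer,
)
for sub_question, sub_answer in steps:
    print(f"{sub_question} -> {sub_answer}")
```

The second sub-question can only be phrased once the first answer is known, which is the reason for running the steps in sequence rather than in parallel.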

Install LlamaIndex

%pip install llama-index-llms-openai
!pip install llama-index

Download the data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load the documents and build the VectorStoreIndex

import os

os.environ["OPENAI_API_KEY"] = "sk-..."
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display

# LLM (gpt-3.5)
gpt35 = OpenAI(temperature=0, model="gpt-3.5-turbo")

# LLM (gpt-4)
gpt4 = OpenAI(temperature=0, model="gpt-4")

# Load the documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)
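The code above assigns a placeholder key directly to `OPENAI_API_KEY`. One possible alternative, sketched here with a hypothetical helper named `ensure_api_key`, is to reuse a key that is already in the environment and prompt for it only when it is missing, so the real key never appears in the notebook.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY", env=os.environ, prompt=getpass):
    """Return the key from the environment, prompting once if it is missing."""
    key = env.get(var_name)
    if not key:
        key = prompt(f"Enter {var_name}: ")
        env[var_name] = key  # make it visible to code that reads the variable later
    return key
```

Calling `ensure_api_key()` before the `OpenAI(...)` objects are created would take the place of the hardcoded assignment. The `env` and `prompt` parameters exist only so the helper can be exercised without a real key.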

Query the index

from llama_index.core.indices.query.query_transform.base import (
    StepDecomposeQueryTransform,
)

# gpt-4
step_decompose_transform = StepDecomposeQueryTransform(llm=gpt4, verbose=True)

# gpt-3.5
step_decompose_transform_gpt3 = StepDecomposeQueryTransform(
    llm=gpt35, verbose=True
)

index_summary = "Used to answer questions about the author"

# Optional: set logging to DEBUG for more detailed output
from llama_index.core.query_engine import MultiStepQueryEngine

query_engine = index.as_query_engine(llm=gpt4)
query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform,
    index_summary=index_summary,
)

response_gpt4 = query_engine.query(
    "Who was in the first batch of the accelerator program the author started?",
)

display(Markdown(f"<b>{response_gpt4}</b>"))

sub_qa = response_gpt4.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)

Example output

[('Who is the author of the accelerator program?', 'The author of the accelerator program is Paul Graham.'), ('Who was in the first batch of the accelerator program started by Paul Graham?', 'The first batch of the accelerator program started by Paul Graham included the founders of Reddit, Justin Kan and Emmett Shear who later founded Twitch, Aaron Swartz who had helped write the RSS spec and later became a martyr for open access, and Sam Altman who later became the second president of YC.')]
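The raw list of tuples is hard to read. As a small illustration, the hypothetical helper `format_sub_qa` below prints each sub-question with its answer on separate lines. It assumes only what the list comprehension above already relies on: each item is a pair whose second element has a `.response` attribute. A `SimpleNamespace` stands in for the real response objects so the sketch runs without an API call.

```python
from types import SimpleNamespace


def format_sub_qa(sub_qa):
    """Turn (sub_question, response) pairs into numbered, readable text."""
    lines = []
    for step, (sub_question, sub_response) in enumerate(sub_qa, start=1):
        lines.append(f"Step {step}: {sub_question}")
        lines.append(f"  Answer: {sub_response.response}")
    return "\n".join(lines)


# Stand-in for response_gpt4.metadata["sub_qa"], for illustration only.
demo_sub_qa = [
    (
        "Who is the author of the accelerator program?",
        SimpleNamespace(response="The author of the accelerator program is Paul Graham."),
    ),
]
formatted = format_sub_qa(demo_sub_qa)
print(formatted)
```

In the notebook, `format_sub_qa(response_gpt4.metadata["sub_qa"])` would be the corresponding call.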

Ask in which city the author founded his first company, Viaweb

response_gpt4 = query_engine.query(
    "In which city did the author found his first company, Viaweb?",
)

print(response_gpt4)

Query with gpt-3.5

query_engine = index.as_query_engine(llm=gpt35)
query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform_gpt3,
    index_summary=index_summary,
)

response_gpt3 = query_engine.query(
    "In which city did the author found his first company, Viaweb?",
)

print(response_gpt3)