[AI OpenAI-doc] File Search (Beta)

License: CC 4.0 BY-SA

Tags: #artificial-intelligence #openai

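File Search augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. OpenAI automatically parses and chunks your documents, creates and stores the embeddings, and uses both vector and keyword search to retrieve relevant content to answer user queries.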
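
To make the chunking step concrete, here is a minimal sketch of splitting a document into overlapping chunks. It is an illustration only: the hosted service chunks by tokens with its own sizes, while this toy version splits on whitespace. The function name `chunk_words` and its default sizes are hypothetical and not part of the OpenAI API.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 4) -> List[str]:
    """Split text into overlapping word-based chunks.

    Toy stand-in for token-based chunking: each chunk holds up to
    `chunk_size` words and shares `overlap` words with the previous chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the usual motivation for overlapping windows in retrieval pipelines.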

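Quickstart

In this example, we'll create an assistant that can help answer questions about companies' financial statements.

Step 1: Create a new assistant with file search enabled

Create a new assistant with file_search enabled in the assistant's tools parameter.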

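from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

assistant = client.beta.assistants.create(
  name="Financial Analyst Assistant",
  instructions="You are an expert financial analyst. Use your knowledge base to answer questions about audited financial statements.",
  model="gpt-4-turbo",
  tools=[{"type": "file_search"}],
)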

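Once the file_search tool is enabled, the model decides when to retrieve content based on user messages.

Step 2: Upload files and add them to a vector store

To access your files, the file search tool uses the Vector Store object. Upload your files and create a Vector Store to contain them. Once the Vector Store is created, poll its status until no file is still in the "in_progress" state, which ensures that all content has finished processing. The SDK provides helpers that upload and poll in a single call.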

# Create a vector store called "Financial Statements"
vector_store = client.beta.vector_stores.create(name="Financial Statements")
 
# Ready the files for upload to OpenAI
file_paths = ["edgar/goog-10k.pdf", "edgar/brka-10k.txt"]
file_streams = [open(path, "rb") for path in file_paths]
 
# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
  vector_store_id=vector_store.id, files=file_streams
)
 
# You can print the status and the file counts of the batch to see the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)
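
The upload_and_poll helper hides the polling loop. If you need to poll by hand, the loop might look roughly like the sketch below. The helper name `poll_until_done`, its parameters, and the timeout values are hypothetical. The status is fetched through a callable you supply, so the loop itself does not depend on any particular SDK call.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
) -> str:
    """Call fetch_status() until it stops returning "in_progress".

    Returns the first non-"in_progress" status (for example "completed"
    or "failed"), or raises TimeoutError once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status != "in_progress":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("still in_progress after %.1f seconds" % timeout_s)
        time.sleep(interval_s)


# Assumed wiring with the SDK objects from the example above (not executed here;
# check the retrieve signature against your installed SDK version):
#
#   final_status = poll_until_done(
#       lambda: client.beta.vector_stores.file_batches.retrieve(
#           file_batch.id, vector_store_id=vector_store.id
#       ).status
#   )
```

Returning the final status instead of a boolean lets the caller tell a completed batch from a failed or cancelled one.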

Step 3: Update the assistant to use the new vector store

To make the files accessible to your assistant, update the assistant's tool_resources with the new vector_store id.

assistant = client.beta.assistants.update(
  assistant_id=assistant.id,
  tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

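Step 4: Create a thread

You can also attach files as message attachments on your thread. Doing so creates another vector store associated with the thread or, if a vector store is already attached to this thread, adds the new files to the existing thread vector store. When you create a run on this thread, the file search tool queries both the assistant's vector store and the thread's vector store.

In this example, the user attached a copy of Apple's latest 10-K filing.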

# Upload the user-provided file to OpenAI
message_file = client.files.create(
  file=open("edgar/aapl-10k.pdf", "rb"), purpose="assistants"
)

# Create a thread and attach the file to the message
thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",