In this article, we'll walk through using Weaviate with LlamaIndex to build and query a vector store index. This approach makes it practical to manage and query large collections of documents. A working demo is included, showing how to implement each step.
Creating a Weaviate Client
First, install the required packages:
%pip install llama-index-vector-stores-weaviate
!pip install llama-index
Then, create a Weaviate client:
import os
import openai
# Set the OpenAI API key via an environment variable
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"
openai.api_key = os.environ["OPENAI_API_KEY"]
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
import weaviate
# Connect to a cloud-hosted Weaviate instance
resource_owner_config = weaviate.AuthClientPassword(
username="<username>",
password="<password>",
)
client = weaviate.Client(
"https://llama-test-ezjahb4m.weaviate.network",
auth_client_secret=resource_owner_config,
)
# For a local instance, connect like this instead:
# client = weaviate.Client("http://localhost:8080")
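The cloud and local connection options above can be unified behind a small helper that reads the endpoint from an environment variable, so the same script works in both environments. A minimal sketch (the `WEAVIATE_URL` variable name is our own convention, not part of either library):

```python
import os

def resolve_weaviate_url(default: str = "http://localhost:8080") -> str:
    """Return the Weaviate endpoint, preferring the WEAVIATE_URL env var."""
    return os.environ.get("WEAVIATE_URL", default)

# Falls back to the local instance when no cloud URL is configured:
# client = weaviate.Client(resolve_weaviate_url())
```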
Loading Documents and Building the Vector Store Index
Download the data and load the documents:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
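If `wget` is unavailable (for example, on Windows), the same download can be done with the Python standard library. A sketch that creates the same directory and file path used by the notebook above, skipping the download when the file already exists:

```python
from pathlib import Path
from urllib.request import urlretrieve

ESSAY_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
)

def fetch_essay(data_dir: str = "data/paul_graham") -> Path:
    """Create the data directory and download the essay if it is missing."""
    target = Path(data_dir) / "paul_graham_essay.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(ESSAY_URL, target)
    return target
```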
Build the vector store index:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from IPython.display import Markdown, display
# Load the documents from the data directory
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
# Initialize a storage context backed by Weaviate
from llama_index.core import StorageContext
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="LlamaIndex")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Build the vector store index
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
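Conceptually, `from_documents` splits each document into chunks, embeds each chunk, and writes the resulting vectors to Weaviate. The pipeline can be sketched in plain Python; this is an illustration only — the character-based chunker and the toy embedding function below are our own stand-ins, not LlamaIndex internals (LlamaIndex uses token-aware splitting and a real embedding model):

```python
def chunk_text(text: str, chunk_size: int = 20) -> list[str]:
    """Split text into fixed-size character chunks (a crude stand-in for a node parser)."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

def toy_embed(chunk: str) -> list[float]:
    """Stand-in for a real embedding model: a tiny vowel-frequency vector."""
    return [chunk.count(c) / max(len(chunk), 1) for c in "aeiou"]

def build_index(docs: list[str]) -> list[dict]:
    """Chunk, embed, and collect records as a vector store would persist them."""
    records = []
    for doc_id, text in enumerate(docs):
        for chunk in chunk_text(text):
            records.append(
                {"doc_id": doc_id, "text": chunk, "vector": toy_embed(chunk)}
            )
    return records
```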
Querying the Index
# Create a query engine and run a query
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
# Render the response as bold Markdown
display(Markdown(f"<b>{response}</b>"))
Loading a Previously Created Index
resource_owner_config = weaviate.AuthClientPassword(
username="<username>",
password="<password>",
)
client = weaviate.Client(
"https://llama-test-ezjahb4m.weaviate.network",
auth_client_secret=resource_owner_config,
)
vector_store = WeaviateVectorStore(
weaviate_client=client, index_name="LlamaIndex"
)
# Load the index from the existing vector store
loaded_index = VectorStoreIndex.from_vector_store(vector_store)
# Query the loaded index
query_engine = loaded_index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))
Querying with Metadata Filters
from llama_index.core import Document
# Insert a sample document
doc = Document.example()
print(doc.metadata)
print("-----")
print(doc.text[:100])
loaded_index.insert(doc)
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
# Build a metadata filter: filename must equal "README.md"
filters = MetadataFilters(
filters=[ExactMatchFilter(key="filename", value="README.md")]
)
query_engine = loaded_index.as_query_engine(filters=filters)
response = query_engine.query("What is the name of the file?")
display(Markdown(f"<b>{response}</b>"))
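Conceptually, an `ExactMatchFilter` restricts retrieval to stored nodes whose metadata key equals the given value. The selection step can be sketched in plain Python (illustrative only; in practice the filtering happens inside Weaviate alongside the vector search):

```python
def exact_match_filter(nodes: list[dict], key: str, value: str) -> list[dict]:
    """Keep only nodes whose metadata[key] equals value, as an exact-match filter would."""
    return [n for n in nodes if n.get("metadata", {}).get(key) == value]

nodes = [
    {"text": "readme text", "metadata": {"filename": "README.md"}},
    {"text": "essay text", "metadata": {"filename": "paul_graham_essay.txt"}},
]
matches = exact_match_filter(nodes, "filename", "README.md")
```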
Possible Errors
- Network connectivity: connecting to the Weaviate server may fail. Check your network connection and the server URL.
- API key issues: if the API key is reported invalid, make sure the environment variable is set correctly and the key itself is valid.
- Data format issues: when loading documents, make sure they are well-formed, otherwise parsing may fail.
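For the transient network failures above, a simple retry wrapper with exponential backoff can help. A generic sketch (the `connect` callable, attempt count, and delays are placeholders to adapt to your setup):

```python
import time

def with_retries(connect, attempts: int = 3, base_delay: float = 1.0):
    """Call connect(), retrying with exponential backoff on ConnectionError."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical):
# client = with_retries(lambda: weaviate.Client("http://localhost:8080"))
```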
If you found this article helpful, please like it and follow my blog. Thanks!