Vector Storage and Querying with LlamaIndex and Chroma

llzwxh888

Tags: python

Article link: https://blog.csdn.net/ppoojjj/article/details/140142835

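In this article, we show how to use LlamaIndex and Chroma to store and query vectors. LlamaIndex is a data framework for LLM applications that helps you build and manage indexes over your data, and Chroma is an open-source vector store that efficiently stores and queries embedding vectors.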
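To make "querying embedding vectors" concrete, here is a toy sketch in plain Python, not Chroma's implementation: each stored item is a vector, and a query returns the items whose vectors are most similar to the query vector under cosine similarity. The vectors and the helper names cosine and top_k are made up for illustration; real vector stores such as Chroma use an approximate nearest-neighbour index instead of scanning every stored vector.

```python
import math

# Toy sketch (not Chroma's implementation): a "vector store" is a dict of
# id -> vector, and a query ranks stored vectors by cosine similarity.
# The vectors below are made up; a real setup gets them from an embedding model.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def top_k(store, query_vec, k=2):
    """Return the k (id, score) pairs most similar to query_vec, best first."""
    scored = [(item_id, cosine(vec, query_vec)) for item_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
results = top_k(store, [1.0, 0.05, 0.0], k=2)
print(results)
```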

Persisting the index to disk

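To avoid re-indexing the data every time, you can persist the index to disk and load it back later. The example below saves an existing index to a directory and then reloads it from there: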

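from llama_index.core import StorageContext, load_index_from_storage

# Save an existing index (e.g. one built with VectorStoreIndex.from_documents) to disk
index.storage_context.persist(persist_dir="<persist_dir>")

# Later, in a new session: rebuild the storage context from that directory
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# Load the index without re-embedding the documents
index = load_index_from_storage(storage_context)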
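The persist/load round trip can be sketched with the standard library. This is a toy under stated assumptions: the helpers persist and load and the file name toy_index.json are hypothetical, and LlamaIndex writes its own set of files into persist_dir. What it shows is that the stored embeddings come back unchanged, so nothing has to be re-embedded on reload.

```python
import json
import os
import tempfile

# Toy sketch of persisting an index and loading it back. The helpers and the
# file name "toy_index.json" are hypothetical; LlamaIndex uses its own files.

def persist(index, persist_dir):
    """Write the index (text + embedding per node) into persist_dir."""
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "toy_index.json"), "w", encoding="utf-8") as f:
        json.dump(index, f)

def load(persist_dir):
    """Rebuild the index from persist_dir without re-embedding anything."""
    with open(os.path.join(persist_dir, "toy_index.json"), encoding="utf-8") as f:
        return json.load(f)

index = {
    "node-1": {"text": "Chroma is a vector store.", "embedding": [0.1, 0.2, 0.3]},
    "node-2": {"text": "LlamaIndex builds indexes.", "embedding": [0.3, 0.1, 0.0]},
}

with tempfile.TemporaryDirectory() as tmp:
    persist_dir = os.path.join(tmp, "storage")
    persist(index, persist_dir)
    reloaded = load(persist_dir)

print(reloaded == index)  # True: the stored embeddings come back unchanged
```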

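Using a vector store

Keeping embeddings in a dedicated vector store also avoids repeated re-indexing: the embeddings are computed once and stay in the store. The examples below use Chroma as the vector store.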

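Installing Chroma

First, install Chroma together with the LlamaIndex Chroma integration, which provides the llama_index.vector_stores.chroma module used below:

pip install chromadb llama-index-vector-stores-chroma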

Storing embeddings in Chroma

The following code embeds a set of documents, stores the vectors in Chroma, and queries them:

import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load some documents from the ./data directory
documents = SimpleDirectoryReader("./data").load_data()

# Initialize a persistent client; the data is saved under ./chroma_db
db = chromadb.PersistentClient(path="./chroma_db")

# Create the collection, or reuse it if it already exists
chroma_collection = db.get_or_create_collection("quickstart")

# Use Chroma as the vector store in the storage context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index: the documents are embedded and the vectors written to Chroma
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Create a query engine and run a query
query_engine = index.as_query_engine()
response = query_engine.query("What is the meaning of life?")
print(response)  # print the query result

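Note: adjust the paths in the code above to your environment. By default LlamaIndex uses OpenAI models both for embedding and for answering, so an OPENAI_API_KEY must be set unless you configure other models.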
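The flow of the example above (load documents, embed them once, store the vectors in a collection, then query) can be sketched without any external library. This is a toy, assuming a hypothetical bag-of-words function toy_embed in place of a real embedding model, and it covers only the retrieval step: a LlamaIndex query engine would additionally send the retrieved text to an LLM to compose the answer.

```python
import math
import re

# Toy sketch of build-then-query. toy_embed is a hypothetical bag-of-words
# "embedding"; a real pipeline calls an embedding model, and a query engine
# would also pass the retrieved text to an LLM. Only retrieval is shown here.

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_embed(text, vocab):
    words = tokens(text)
    return [float(words.count(term)) for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Chroma stores embeddings on disk.",
    "LlamaIndex builds a query engine over an index.",
    "The meaning of life is a philosophical question.",
]
vocab = sorted({w for d in documents for w in tokens(d)})

# "Build the index": embed every document once and keep it in a collection
collection = [(doc, toy_embed(doc, vocab)) for doc in documents]

def retrieve(question, k=1):
    """Return the k stored documents most similar to the question."""
    q = toy_embed(question, vocab)
    ranked = sorted(collection, key=lambda item: cosine(item[1], q), reverse=True)
    return [doc for doc, _ in ranked[:k]]

best = retrieve("What is the meaning of life?")
print(best)
```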

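Loading stored embeddings directly

If the embeddings have already been created and stored, you can load them directly instead of embedding the documents again: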

import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Initialize the client on the same path used when the data was stored
db = chromadb.PersistentClient(path="./chroma_db")

# Get the stored collection (note: get_or_create_collection returns a new,
# empty collection if the name does not match the one used when storing)
chroma_collection = db.get_or_create_collection("quickstart")

# Wrap the Chroma collection as a LlamaIndex vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Load the index from the stored vectors; the documents are not re-embedded
index = VectorStoreIndex.from_vector_store(vector_store)

# Create a query engine and run a query
query_engine = index.as_query_engine()
response = query_engine.query("What is llama2?")
print(response)  # print the query result

Note: path must point to the directory used when the embeddings were stored, and the collection name must match the one used there.
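Why loading is cheaper than rebuilding can be sketched with a toy persistent client. ToyClient, save_collection and toy_embed are hypothetical stand-ins that borrow Chroma's naming but are not the chromadb API. The sketch counts embedding calls: they happen only when the data is first stored, and a fresh client on the same path just reads the vectors back. It also shows a pitfall of get-or-create semantics: a misspelled collection name quietly yields an empty collection instead of an error.

```python
import json
import os
import tempfile

# Toy sketch: embeddings are computed once when data is stored; a later session
# reads them back. ToyClient and its methods are hypothetical, not chromadb.

EMBED_CALLS = 0

def toy_embed(text):
    """Hypothetical embedding function that counts how often it is called."""
    global EMBED_CALLS
    EMBED_CALLS += 1
    return [float(len(text)), float(text.count(" "))]

class ToyClient:
    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, name):
        return os.path.join(self.path, name + ".json")

    def get_or_create_collection(self, name):
        # Return the stored records, or an empty collection if none exist yet
        if os.path.exists(self._file(name)):
            with open(self._file(name), encoding="utf-8") as f:
                return json.load(f)
        return {}

    def save_collection(self, name, records):
        with open(self._file(name), "w", encoding="utf-8") as f:
            json.dump(records, f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chroma_db")

    # Session 1: embed and store the documents
    client = ToyClient(path)
    docs = {"d1": "Llama 2 is a family of LLMs.", "d2": "Chroma is a vector store."}
    records = {i: {"text": t, "embedding": toy_embed(t)} for i, t in docs.items()}
    client.save_collection("quickstart", records)
    calls_after_build = EMBED_CALLS

    # Session 2: a fresh client on the same path loads the vectors directly
    reloaded = ToyClient(path).get_or_create_collection("quickstart")
    calls_after_load = EMBED_CALLS

    # Pitfall: a typo in the collection name silently gives an empty collection
    missing = ToyClient(path).get_or_create_collection("quickstar")

print(calls_after_build, calls_after_load)  # 2 2: loading embedded nothing new
```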

Inserting documents or nodes

Once an index exists, you can add new documents to it with its insert method instead of rebuilding it:

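from llama_index.core import VectorStoreIndex

# Start from an empty index (in memory by default; pass
# storage_context=storage_context to write into Chroma instead)
index = VectorStoreIndex([])

# documents: the list loaded earlier with SimpleDirectoryReader
for doc in documents:
    index.insert(doc)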
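Incremental insertion can be sketched as a toy index whose insert embeds only the new document, so existing entries are never recomputed. ToyVectorIndex and toy_embed are hypothetical and only mimic the shape of VectorStoreIndex([]) followed by index.insert(doc); they are not LlamaIndex code.

```python
import math

# Toy sketch of incremental insertion: each insert embeds only the new
# document. ToyVectorIndex and toy_embed are hypothetical, not LlamaIndex.

def toy_embed(text):
    """Hypothetical 3-d embedding: counts of a few marker words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("chroma", "llama", "index")]

class ToyVectorIndex:
    def __init__(self, docs):
        self.entries = []
        for doc in docs:
            self.insert(doc)

    def insert(self, doc):
        # Embed just this document and append it; older entries are untouched
        self.entries.append((doc, toy_embed(doc)))

    def query(self, question):
        """Return the stored document most similar to the question."""
        q = toy_embed(question)

        def score(entry):
            v = entry[1]
            dot = sum(a * b for a, b in zip(v, q))
            norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in q))
            return dot / norm if norm else 0.0

        return max(self.entries, key=score)[0]

index = ToyVectorIndex([])
for doc in ["chroma stores vectors", "llama models generate text"]:
    index.insert(doc)

print(len(index.entries))
print(index.query("what is llama"))
```

In LlamaIndex itself, nodes that you have already parsed can be added with index.insert_nodes(nodes).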

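Common errors

Wrong path: make sure the paths used when initializing the client or loading data are correct. A persistent client pointed at a mistyped path typically starts a new, empty store rather than raising an error.
Collection does not exist: make sure the collection has been created and that its name matches the one used when storing. get_or_create_collection silently creates an empty collection for an unknown name, whereas get_collection raises an error.
No query results: make sure the content you are asking about has actually been indexed; chroma_collection.count() shows how many entries the collection holds.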
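The checks above can be run before querying. This is a toy sketch: the store is a hypothetical dict mapping collection names to stored texts, and check_before_query is an illustrative helper, not part of Chroma or LlamaIndex. With a real Chroma collection, the emptiness check corresponds to chroma_collection.count() returning 0.

```python
import os
import tempfile

# Toy sketch of pre-query checks. The store layout (dict of collection name ->
# list of stored texts) and check_before_query are hypothetical.

def check_before_query(path, collections, name):
    """Return a list of human-readable problems; an empty list means OK to query."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"path does not exist: {path}")
    if name not in collections:
        problems.append(f"collection not found: {name}")
    elif not collections[name]:
        problems.append(f"collection is empty: {name}")
    return problems

with tempfile.TemporaryDirectory() as tmp:
    collections = {"quickstart": ["Chroma is a vector store."], "drafts": []}
    ok = check_before_query(tmp, collections, "quickstart")
    typo = check_before_query(tmp, collections, "quickstar")
    empty = check_before_query(tmp, collections, "drafts")
    bad_path = check_before_query(os.path.join(tmp, "missing"), collections, "quickstart")

print(ok, typo, empty, bad_path, sep="\n")
```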

If you found this article helpful, please like it and follow my blog. Thanks!
