LlamaIndex --- Storing

需要重新演唱

Last modified: 2024-07-25 15:59:13


First published: 2024-07-25 15:59:02

Article link: https://blog.csdn.net/xycxycooo/article/details/140692491


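About Storing

Concepts

Storing: After loading and indexing your data, you will usually want to store it to avoid the time and cost of re-indexing. By default, indexed data is held only in memory.

Persisting to disk: Writing the index data to disk so that it can be kept long term and restored quickly.

Vector Stores: Storage systems that hold vector embeddings, so the embeddings do not have to be recomputed by re-indexing on every run.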
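To make the vector store idea concrete, here is a toy in-memory sketch. It is not how LlamaIndex or any real vector store is implemented; the class ToyVectorStore, its methods, and the two-dimensional embeddings are all made up for illustration. It only shows the core contract: store an embedding per node, then return the stored entries most similar to a query embedding.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: node id -> (embedding, text)."""

    def __init__(self):
        self._items = {}

    def add(self, node_id, embedding, text):
        # Keep a copy of the embedding so later changes by the caller do not leak in.
        self._items[node_id] = (list(embedding), text)

    def query(self, embedding, top_k=1):
        # Score every stored vector against the query and keep the best top_k.
        scored = [
            (cosine_similarity(embedding, vec), node_id, text)
            for node_id, (vec, text) in self._items.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.add("a", [1.0, 0.0], "about cats")
store.add("b", [0.0, 1.0], "about dogs")

# The query vector points mostly along the first axis, so node "a" ranks first.
best = store.query([0.9, 0.1], top_k=1)
```

Because this toy store lives only in memory, its contents are lost when the process exits, which is the problem the persistence options in the rest of this article address.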

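Persisting to disk

The simplest way to store indexed data is the built-in .persist() method, called on the index's storage_context, which writes all the data to disk at the specified location. This works for any type of index.

Example code:

# Persist the index data to disk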
index.storage_context.persist(persist_dir="<persist_dir>")

For a Composable Graph, persist the storage context of the root index:

# Persist the root index data of the composable graph to disk
graph.root_index.storage_context.persist(persist_dir="<persist_dir>")

You can then avoid reloading and re-indexing the data by loading the persisted index:

from llama_index.core import StorageContext, load_index_from_storage

# Rebuild the storage context
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# Load the index from storage
index = load_index_from_storage(storage_context)

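Tip: If the index was initialized with custom transformations, a custom embedding model, or similar options, you need to pass the same options to load_index_from_storage, or set them as global settings.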
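A common way to combine persisting and loading is a "load if present, otherwise build and persist" helper. The sketch below is one possible shape for it, not a LlamaIndex API: load_or_build and its build, load, and persist parameters are hypothetical names, and the demo uses plain JSON stand-ins so that it runs without LlamaIndex installed. It treats any non-empty directory as a persisted index, which is a simplifying assumption.

```python
import json
import os
import tempfile


def load_or_build(persist_dir, build, load, persist):
    """Return the persisted index if persist_dir has content, else build and persist one."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    index = build()
    os.makedirs(persist_dir, exist_ok=True)
    persist(index, persist_dir)
    return index


# Stand-ins for the real build / persist / load steps (hypothetical, for the demo only).
build_calls = []


def fake_build():
    build_calls.append(1)  # record how often the expensive step runs
    return {"nodes": ["hello"]}


def fake_persist(index, persist_dir):
    with open(os.path.join(persist_dir, "index.json"), "w", encoding="utf-8") as f:
        json.dump(index, f)


def fake_load(persist_dir):
    with open(os.path.join(persist_dir, "index.json"), encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "storage")
    first = load_or_build(target, fake_build, fake_load, fake_persist)   # builds and persists
    second = load_or_build(target, fake_build, fake_load, fake_persist)  # loads from disk
```

With LlamaIndex, the three callables would wrap the calls shown earlier: build would call VectorStoreIndex.from_documents(documents), persist would call index.storage_context.persist(persist_dir=...), and load would call load_index_from_storage(StorageContext.from_defaults(persist_dir=...)).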

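Using vector stores

As covered in the Indexing section, one of the most common index types is VectorStoreIndex. The embedding API calls made while building a VectorStoreIndex can be expensive in both time and money, so you will want to store the embeddings to avoid re-indexing repeatedly.

LlamaIndex supports many vector stores, which differ in architecture, complexity, and cost. This example uses Chroma, an open-source vector store.

First, install Chroma. The ChromaVectorStore import used below comes from a separate integration package, llama-index-vector-stores-chroma, which also needs to be installed if it is not already present:

pip install chromadb llama-index-vector-stores-chroma

To store the embeddings from a VectorStoreIndex in Chroma, you need to:

Initialize the Chroma client
Create a Collection to store the data in
Assign Chroma as the vector_store in a StorageContext
Initialize the VectorStoreIndex with that StorageContext

Example code: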

import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load some documents
documents = SimpleDirectoryReader("./data").load_data()

# Initialize the client and set the path where data is saved
db = chromadb.PersistentClient(path="./chroma_db")

# Create the collection
chroma_collection = db.get_or_create_collection("quickstart")

# Assign Chroma as the vector_store in the storage context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the index; the embeddings are written to the Chroma collection
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Create a query engine and run a query
query_engine = index.as_query_engine()
response = query_engine.query("What is the meaning of life?")
print(response)

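If you have already created and stored your embeddings, you can load them directly without reloading the documents or rebuilding the VectorStoreIndex from them: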

import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Initialize the client, pointing at the same path as before
db = chromadb.PersistentClient(path="./chroma_db")

# Get the existing collection
chroma_collection = db.get_or_create_collection("quickstart")

# Assign Chroma as the vector_store in the storage context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load the index from the stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)

# Create a query engine and run a query
query_engine = index.as_query_engine()
response = query_engine.query("What is llama2?")
print(response)

Tip: For a deeper look at using Chroma, see the more detailed example.
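Because get_or_create_collection returns an empty collection when none exists yet, it can be useful to check whether the collection already holds vectors before deciding between the two code paths above. The sketch below assumes the collection object exposes a count() method returning the number of stored entries, as Chroma collections do; needs_indexing and FakeCollection are hypothetical names, and the fake class stands in for a real collection so the example runs without Chroma.

```python
def needs_indexing(collection):
    """Return True when the collection holds no entries yet, so documents must be embedded."""
    return collection.count() == 0


class FakeCollection:
    """Stand-in for a Chroma collection; only the count() method is modeled."""

    def __init__(self, size):
        self._size = size

    def count(self):
        return self._size


empty_result = needs_indexing(FakeCollection(0))       # nothing stored: build the index
populated_result = needs_indexing(FakeCollection(42))  # vectors present: load from the store
```

In the Chroma examples above, passing chroma_collection to such a check would select between VectorStoreIndex.from_documents (empty collection) and VectorStoreIndex.from_vector_store (populated collection).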

Inserting documents or nodes

If you have already created an index, you can add new documents to it with the insert method:

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex([])
for doc in documents:
    index.insert(doc)

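For more details on document management and an example notebook, see the document management guide.

Summary

This lesson covered the concept of Storing and how it applies in LlamaIndex. It showed how to persist index data to disk and how to use a vector store to save and load embeddings. Together these should help readers better understand and apply LlamaIndex's data storage features.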
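When inserting repeatedly into a persisted index, the same document can easily be embedded twice. The sketch below shows one simple guard: skip documents whose ID has already been inserted. It is an illustration and not the approach from the document management guide. The function insert_new_documents and the seen_ids set are hypothetical, the code assumes each document exposes a doc_id attribute, and FakeDocument and FakeIndex are stand-ins so the example runs without LlamaIndex.

```python
from dataclasses import dataclass, field


def insert_new_documents(index, documents, seen_ids):
    """Insert only documents whose doc_id is not in seen_ids; return how many were inserted."""
    inserted = 0
    for doc in documents:
        if doc.doc_id in seen_ids:
            continue  # already indexed, so avoid paying for its embeddings again
        index.insert(doc)
        seen_ids.add(doc.doc_id)
        inserted += 1
    return inserted


@dataclass
class FakeDocument:
    """Stand-in for a document; only doc_id and text are modeled."""
    doc_id: str
    text: str


@dataclass
class FakeIndex:
    """Stand-in for an index; insert() just records the document."""
    docs: list = field(default_factory=list)

    def insert(self, doc):
        self.docs.append(doc)


index = FakeIndex()
seen = set()

# The duplicate "a" in the first batch is skipped.
first_batch = [FakeDocument("a", "one"), FakeDocument("b", "two"), FakeDocument("a", "one")]
first_count = insert_new_documents(index, first_batch, seen)

# In the second batch only "c" is new.
second_batch = [FakeDocument("b", "two"), FakeDocument("c", "three")]
second_count = insert_new_documents(index, second_batch, seen)
```

For the guard to survive across runs, the set of seen IDs would itself have to be saved and reloaded alongside the index.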
