Add Intelligent Search to Your Application with Llama-Index

黑金IT

Published 2024-09-04 18:19:22

Article link: https://blog.csdn.net/ylong52/article/details/141900595

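Llama-Index is an indexing and retrieval framework built around large language models (LLMs). It lets you quickly search and use large amounts of your own text data. To install it you need a working Python environment. Resource requirements depend mostly on which models you use: calling a hosted API such as OpenAI needs little local memory or compute, while running embedding models or LLMs locally needs considerably more.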

General steps for installing Llama-Index:

Make sure Python and pip are installed. You can check their versions by running:

python --version
pip --version

Then install the package:

pip install llama-index
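To confirm from inside Python that the interpreter is recent enough and that the package is importable, a quick check like the following can be run after installation. The minimum version of 3.8 is an assumption for illustration only; check the Llama-Index documentation for the version it currently requires. `check_environment` is a hypothetical helper, not part of Llama-Index.

```python
import importlib.util
import sys

# Assumed minimum Python version (illustrative); see the Llama-Index docs for the real requirement
MIN_PYTHON = (3, 8)


def check_environment() -> dict:
    """Report the Python version and whether the llama_index package can be found."""
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    # find_spec returns None when a top-level package is not installed
    llama_installed = importlib.util.find_spec("llama_index") is not None
    return {
        "python": sys.version.split()[0],
        "python_ok": python_ok,
        "llama_index_installed": llama_installed,
    }


if __name__ == "__main__":
    print(check_environment())
```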

Run Llama-Index's tests or examples. The official documentation and the GitHub repository contain example code you can use to confirm that it works on your system.

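Note that pip installs the core package's required dependencies automatically. Optional integrations are separate packages that you add as needed; for example, the Hugging Face embedding integration pulls in libraries such as transformers and PyTorch. If you run into dependency conflicts during installation, you may need to install or upgrade those libraries first.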

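To index documents with Llama-Index, you first need some documents in a folder. The following complete example shows how to create documents, read them, build an index, and query that index.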

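The steps for loading the data and building the index are:

Create documents: first you need some documents. In this example we create a few simple text files.
Read documents: use Llama-Index's SimpleDirectoryReader to load them.
Create the index: build an index from the loaded documents.
Query the index: use the index to answer queries.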

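By default the code uses OpenAI's embedding model (and OpenAI's LLM to generate answers), so you need to provide an OpenAI API key. This is usually done by setting the OPENAI_API_KEY environment variable, or you can pass the key directly in code: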

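from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Set the API key directly in code (the OPENAI_API_KEY environment variable is safer)
api_key = 'your_openai_api_key_here'
embed_model = OpenAIEmbedding(api_key=api_key)
# Register the model globally so indexes built later use it
Settings.embed_model = embed_model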
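If you prefer the environment-variable approach but still want a clear error when no key is set, a small helper like the one below can sit in front of the constructor. `resolve_api_key` is a hypothetical name, not part of Llama-Index; it returns an explicitly passed key if one is given, otherwise the value of `OPENAI_API_KEY`.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "OPENAI_API_KEY") -> str:
    """Return the explicit key if given, else the key from the environment.

    Hypothetical helper for illustration; not part of Llama-Index.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of a later authentication error
        raise RuntimeError(f"No API key found: set {env_var} or pass a key explicitly")
    return key
```

The result can then be passed as `OpenAIEmbedding(api_key=resolve_api_key())`, which keeps the key itself out of the source file.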

The following Python script works from a folder named documents in your working directory (it creates the folder if it does not exist yet):

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from IPython.display import Markdown, display  # renders output in Jupyter

# Create the documents folder (no error if it already exists)
document_path = "documents"
os.makedirs(document_path, exist_ok=True)

# Write a few sample documents
sample_docs = [
    ("doc1.txt", "The capital of France is Paris."),
    ("doc2.txt", "The capital of Italy is Rome."),
    ("doc3.txt", "Paris is also known as the City of Light."),
]

for file_name, content in sample_docs:
    with open(os.path.join(document_path, file_name), "w", encoding="utf-8") as f:
        f.write(content)

# Load the documents
documents = SimpleDirectoryReader(document_path).load_data()

# Build an index from the documents
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

# Run a query
response = query_engine.query("What is the capital of France?")

# Display the result (outside Jupyter, use print(response) instead)
display(Markdown(f"<b>{response}</b>"))

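In this script:

We first create the documents folder if it does not already exist.
We then write three short text files into it.
SimpleDirectoryReader loads these files as documents.
VectorStoreIndex builds an index from the documents.
query_engine runs the query, and IPython's display function renders the result.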

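After running the script, you will see the answer to the question about the capital of France. The example covers the whole path: starting from an empty folder, creating documents, indexing them, and querying the index.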
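To make the index-and-query steps less of a black box, here is a deliberately simplified, standard-library-only sketch of the same idea applied to the three sample sentences. It is not how Llama-Index is implemented: the "embedding" here is just a bag-of-words count vector, whereas Llama-Index calls a real embedding model (OpenAI by default) and then has an LLM write the answer from the retrieved text. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (real models produce learned dense vectors)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: Dict[str, str]) -> Dict[str, Counter]:
    """Index step: embed every document once and keep the vectors."""
    return {name: embed(text) for name, text in docs.items()}


def query(index: Dict[str, Counter], question: str, top_k: int = 2) -> List[Tuple[str, float]]:
    """Query step: embed the question and return the top_k most similar documents."""
    q = embed(question)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


docs = {
    "doc1.txt": "The capital of France is Paris.",
    "doc2.txt": "The capital of Italy is Rome.",
    "doc3.txt": "Paris is also known as the City of Light.",
}
index = build_index(docs)
for name, score in query(index, "What is the capital of France?"):
    print(f"{name}: {score:.3f}")  # prints doc1.txt: 0.833, then doc2.txt: 0.667
```

With this toy scoring, doc2.txt ranks second only because it shares generic words such as "the", "capital" and "of" with the question. Learned embedding models capture meaning rather than raw word overlap, which is why real indexes use them.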

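**Using a Hugging Face model or another compatible embedding model**
Llama-Index is designed to work with a variety of large language models (LLMs), including those offered by OpenAI. If you would rather not use OpenAI's services, Llama-Index also supports embedding models and LLMs from other sources.

To avoid OpenAI for embeddings, you can use a model from Hugging Face or another compatible embedding model. Here is how to use a Hugging Face embedding model: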

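First, install the Hugging Face embedding integration for Llama-Index together with sentence-transformers, a library commonly used to generate text embeddings:

pip install llama-index-embeddings-huggingface sentence-transformers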

Then configure Llama-Index to use a Hugging Face embedding model:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Choose a Hugging Face model
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Create the embedding model instance (the model is downloaded on first use)
embed_model = HuggingFaceEmbedding(model_name=model_name)

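In this example we use the sentence-transformers/all-MiniLM-L6-v2 model to generate text embeddings. It is downloaded from the Hugging Face Hub the first time it is used and then runs locally. You can substitute any other Hugging Face model that sentence-transformers can load.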

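Next, you can use this embedding model to build the index, so generating embeddings no longer requires an OpenAI API key. Note that the query engine still uses an LLM to write the final answer, and that LLM is OpenAI by default. To avoid OpenAI entirely, configure a different LLM (for example through Settings.llm), or use index.as_retriever() to return the matching passages without generating an answer: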

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Assumes you already have some documents in this folder
documents = SimpleDirectoryReader("path_to_your_documents").load_data()

# Build the index with the Hugging Face embedding model
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

# Create a query engine (answers are still generated by the configured LLM, OpenAI by default)
query_engine = index.as_query_engine()

# Run a query
response = query_engine.query("Your query goes here")

# Print the response
print(response)
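Swapping the embedding model only changes how documents and queries are turned into vectors. Conceptually, answering a query has two stages: retrieval, which needs only the embedding model, and answer generation, which needs an LLM. The sketch below models that split with plain functions so the dependency is easy to see. `build_prompt`, `answer`, the prompt template, and both stubs are hypothetical; Llama-Index's own response synthesis uses its own prompts and components.

```python
from typing import Callable, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the question into one prompt (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str], str]) -> str:
    # Stage 1: retrieval, driven by the embedding model (no LLM involved)
    passages = retrieve(question)
    # Stage 2: generation, where an LLM turns the passages into an answer
    return generate(build_prompt(question, passages))


# Stubs standing in for a real retriever and a real LLM
def fake_retrieve(question: str) -> List[str]:
    return ["The capital of France is Paris."]


def echo_llm(prompt: str) -> str:
    # Stand-in "LLM": returns the first context line without the "- " bullet
    return prompt.splitlines()[1].lstrip("- ")


print(answer("What is the capital of France?", fake_retrieve, echo_llm))
# prints: The capital of France is Paris.
```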