A Practical Tutorial on Vector Storage with Elasticsearch

Latest recommended article published on 2024-07-30 16:28:13

llzwxh888

Tags: elasticsearch jenkins big data python

Article link: https://blog.csdn.net/ppoojjj/article/details/140538433

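In modern AI applications, storing and processing vectorized data is central to efficient querying and retrieval. Elasticsearch (ES) is a distributed search engine that excels at full-text search and analytics. Combined with vector storage, it can also serve similarity search efficiently. This article shows how to use Elasticsearch as a vector store, covering setup and basic usage.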

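Installing the Elasticsearch integration

First, install the LlamaIndex vector store integration for Elasticsearch. This command installs the Python integration package, not the Elasticsearch server itself, which must already be running locally or on Elastic Cloud: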

pip install llama-index-vector-stores-elasticsearch
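As a quick sanity check after installation, the sketch below tests whether the relevant top-level modules can be found without importing them. The helper name `is_installed` is hypothetical, and the module names `llama_index` and `elasticsearch` are assumed to be the import names provided by the integration and its client dependency.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed top-level import names; adjust them if your environment differs.
for name in ("llama_index", "elasticsearch"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, rerunning the pip command in the same virtual environment is usually the first thing to try.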

Setting up ElasticsearchStore

Next, we configure the ElasticsearchStore class. The examples below show several ways to connect.

Local connection

from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Elasticsearch index name and URL
index_name = "my_index"
es_url = "http://localhost:9200"

# Connect to a local Elasticsearch instance
es_local = ElasticsearchStore(
    index_name=index_name,
    es_url=es_url,
)

Connecting to Elastic Cloud with a username and password

from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Connection parameters
index_name = "my_index"
es_cloud_id = "<cloud-id>"  # Found on the deployment page
es_user = "elastic"
es_password = "<password>"  # Provided when the deployment is created; can be reset

# Connect to Elasticsearch with a username and password
es_cloud_user_pass = ElasticsearchStore(
    index_name=index_name,
    es_cloud_id=es_cloud_id,
    es_user=es_user,
    es_password=es_password,
)

Connecting to Elastic Cloud with an API key

from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Connection parameters
index_name = "my_index"
es_cloud_id = "<cloud-id>"
es_api_key = "<api-key>"  # Create the API key in the security section of Kibana

# Connect to Elasticsearch with an API key
es_cloud_api_key = ElasticsearchStore(
    index_name=index_name,
    es_cloud_id=es_cloud_id,
    es_api_key=es_api_key,
)
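The three connection modes differ only in which keyword arguments are passed. The sketch below selects them from environment-style settings so that credentials stay out of the source code. The helper `es_connection_kwargs` and the variable names `ES_URL`, `ES_CLOUD_ID`, `ES_API_KEY`, `ES_USER`, `ES_PASSWORD`, and `ES_INDEX` are hypothetical; only the resulting keyword names come from the examples in this article.

```python
import os
from typing import Dict, Mapping, Optional


def es_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build ElasticsearchStore keyword arguments from environment-style settings.

    The variable names read here are hypothetical conventions for this sketch.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, str] = {"index_name": env.get("ES_INDEX", "my_index")}

    if env.get("ES_CLOUD_ID"):
        # Elastic Cloud: prefer an API key, fall back to username and password.
        kwargs["es_cloud_id"] = env["ES_CLOUD_ID"]
        if env.get("ES_API_KEY"):
            kwargs["es_api_key"] = env["ES_API_KEY"]
        elif env.get("ES_USER") and env.get("ES_PASSWORD"):
            kwargs["es_user"] = env["ES_USER"]
            kwargs["es_password"] = env["ES_PASSWORD"]
        else:
            raise ValueError("Elastic Cloud needs ES_API_KEY, or ES_USER and ES_PASSWORD")
    elif env.get("ES_URL"):
        # Self-hosted or local instance.
        kwargs["es_url"] = env["ES_URL"]
    else:
        raise ValueError("Set ES_URL or ES_CLOUD_ID")
    return kwargs


# Intended use (requires the integration package and a reachable cluster):
# store = ElasticsearchStore(**es_connection_kwargs())
```

Failing early with a ValueError when no endpoint is configured keeps the error close to the configuration mistake rather than inside the client.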

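Adding vector data to Elasticsearch

The add and async_add methods write nodes, together with their embeddings, to the Elasticsearch index:

import asyncio

from llama_index.core.schema import TextNode

# Create a dummy node (BaseNode is abstract, so a concrete TextNode is used).
# es_local is the store created in the local connection example.
node = TextNode(id_="1", text="This is a test node", embedding=[0.1, 0.2, 0.3])

# Add the node to the Elasticsearch index synchronously
es_local.add(nodes=[node])

# Add the node to the Elasticsearch index asynchronously
asyncio.run(es_local.async_add(nodes=[node]))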
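Elasticsearch stores embeddings in a dense_vector field with a fixed number of dimensions, so every vector written to one index needs the same length. The sketch below checks this before calling add. The helper `find_dimension_mismatches` and the sample data are hypothetical and use plain lists instead of node objects so the example runs without a cluster.

```python
from typing import List, Mapping, Optional, Sequence


def find_dimension_mismatches(
    embeddings: Mapping[str, Optional[Sequence[float]]],
    expected_dims: int,
) -> List[str]:
    """Return the IDs whose embedding is missing or has the wrong length."""
    bad_ids = []
    for node_id, vector in embeddings.items():
        if vector is None or len(vector) != expected_dims:
            bad_ids.append(node_id)
    return bad_ids


# Illustrative data: node "3" has only two components.
sample = {"1": [0.1, 0.2, 0.3], "2": [0.3, 0.2, 0.1], "3": [0.5, 0.5]}
print(find_dimension_mismatches(sample, expected_dims=3))  # prints ['3']
```

With real nodes, the same mapping could be built from each node's ID and embedding before the batch is sent.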

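Querying vector data

The query and aquery methods retrieve the nodes most similar to a query vector:

import asyncio

from llama_index.core.vector_stores.types import VectorStoreQuery

# Build a query vector with the same number of dimensions as the indexed embeddings
query_embedding = [0.1, 0.2, 0.3]
query = VectorStoreQuery(query_embedding=query_embedding)

# Synchronous query
result = es_local.query(query=query)

# Asynchronous query
async def main():
    result = await es_local.aquery(query=query)
    print(result)

asyncio.run(main())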
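To make the idea of a similarity query concrete, the sketch below ranks a handful of vectors against a query by brute force. It assumes cosine similarity as the metric and is only a conceptual illustration: Elasticsearch typically answers vector queries with an approximate nearest-neighbor index rather than an exhaustive scan, and the function names and sample vectors here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative data: "a" matches the query exactly, "c" points the opposite way.
stored = {"a": [0.1, 0.2, 0.3], "b": [0.3, 0.2, 0.1], "c": [-0.1, -0.2, -0.3]}
for vec_id, score in top_k([0.1, 0.2, 0.3], stored, k=2):
    print(vec_id, round(score, 3))
```

The exhaustive scan costs time proportional to the number of stored vectors, which is the cost a dedicated vector index is meant to avoid.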

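Errors you may encounter

ConnectionError:
- Raised when the AsyncElasticsearch client cannot connect to the Elasticsearch server. Check that the server address and port are correct and that the Elasticsearch service is running.
ValueError:
- Raised when none of es_client, es_url, or es_cloud_id is provided. Supply at least one of them to establish a connection.
ImportError:
- Raised when add or async_add is called without the elasticsearch[async] Python package installed. Make sure the required dependency is installed.
BulkIndexError:
- Raised when a bulk indexing operation fails. Check that the supplied data and the index configuration are correct.
Exception:
- Raised when the Elasticsearch delete_by_query call fails. Check that the document ID exists and that the query is correct.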
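Connection failures are often transient, for example while the server is still starting. The sketch below retries a call with exponential backoff. The helper `call_with_retry` and its parameters are hypothetical, and the exception types to retry are passed in explicitly because the client library's ConnectionError may not derive from Python's built-in ConnectionError.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Intended use (assumes a store and query like the ones defined earlier):
# result = call_with_retry(lambda: es_local.query(query=query))
```

Only errors that are plausibly temporary should be retried; a ValueError caused by missing connection settings will fail the same way every time.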