PyMilvus Tutorial

Latest recommended article published on 2024-08-21 08:42:22

吉小雨

Article link: https://blog.csdn.net/jixiaoyu0209/article/details/140444906

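PyMilvus is the Python client library for Milvus, an open-source vector database designed to store and search large volumes of vector data. Milvus is widely used in image retrieval, recommender systems, and natural language processing. With PyMilvus you can connect to a Milvus server, create collections, insert data, and run vector searches.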

Official documentation link

PyMilvus official documentation

Installing PyMilvus

First, make sure the PyMilvus library is installed. If it is not, install it with pip:

pip install pymilvus

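Quick Start

The following quick-start example shows how to connect to a Milvus server with PyMilvus, create a collection, insert data, and run a vector search.

1. Connect to the Milvus server

from pymilvus import connections

# Connect to the Milvus server
connections.connect("default", host="localhost", port="19530")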

2. Create a collection

from pymilvus import FieldSchema, CollectionSchema, DataType, Collection

# Define the fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
]

# Define the collection schema
schema = CollectionSchema(fields, description="test collection")

# Create the collection
collection = Collection(name="example_collection", schema=schema)

3. Insert data

import random

# Generate random vector data
vectors = [[random.random() for _ in range(128)] for _ in range(10)]

# Insert the data (one inner list per field; "id" is auto-generated)
collection.insert([vectors])

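4. Run a vector search

# Build an index on the vector field; recent Milvus 2.x releases
# require one before load() succeeds
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "params": {"nlist": 100}, "metric_type": "L2"},
)

# Load the collection into memory
collection.load()

# Define the search parameters
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

# Run the search
results = collection.search(
    data=[vectors[0]],  # query vectors
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr=None
)

# Print the results (one entry per query vector)
for result in results:
    print(result)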

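Detailed Breakdown

1. Connect to the Milvus server

from pymilvus import connections

# Connect to the Milvus server
connections.connect("default", host="localhost", port="19530")

Use connections.connect to connect to the Milvus server. "default" is the alias of the connection, and host and port give the server's address and port. 19530 is the port a default Milvus deployment listens on.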
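The host and port above are hard-coded. A minimal sketch of reading them from the environment instead is shown here; the variable names MILVUS_HOST and MILVUS_PORT and the helper milvus_connection_kwargs are assumptions made for illustration and are not part of PyMilvus.

```python
import os


def milvus_connection_kwargs(env=None):
    """Build keyword arguments for connections.connect.

    MILVUS_HOST and MILVUS_PORT are hypothetical variable names
    chosen for this sketch; PyMilvus itself does not define them.
    """
    env = os.environ if env is None else env
    return {
        "alias": "default",
        "host": env.get("MILVUS_HOST", "localhost"),
        "port": env.get("MILVUS_PORT", "19530"),
    }


# Example with an explicit mapping instead of the real environment
kwargs = milvus_connection_kwargs({"MILVUS_HOST": "10.0.0.5"})
print(kwargs)
```

The resulting dictionary can then be passed on as connections.connect(**kwargs), which keeps deployment-specific addresses out of the source code.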

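2. Create a collection

from pymilvus import FieldSchema, CollectionSchema, DataType, Collection

# Define the fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
]

# Define the collection schema
schema = CollectionSchema(fields, description="test collection")

# Create the collection
collection = Collection(name="example_collection", schema=schema)

Use FieldSchema to define each field in the collection: its name, data type, and other attributes. Here id is an INT64 primary key that Milvus generates automatically (auto_id=True), and embedding is a 128-dimensional float vector.
Use CollectionSchema to define the collection's schema from the fields and a description.
Use Collection to create the collection.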

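3. Insert data

import random

# Generate random vector data
vectors = [[random.random() for _ in range(128)] for _ in range(10)]

# Insert the data (one inner list per field; "id" is auto-generated)
collection.insert([vectors])

Generate random vector data to insert into the collection. Each vector has 128 dimensions, matching the dim of the embedding field.
Use collection.insert to insert the data. The argument is a list of columns, one per field; because id is generated automatically, only the embedding column is passed.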
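Since every vector must match the dimension declared in the schema, it can help to check the data before calling insert. The following is a minimal sketch of such a check; check_vectors is a hypothetical helper written for this tutorial, not a PyMilvus function.

```python
import random

DIM = 128  # must match dim= on the embedding FieldSchema


def check_vectors(vectors, dim=DIM):
    """Raise ValueError unless every vector has exactly `dim` numeric components."""
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has {len(vec)} dimensions, expected {dim}")
        if not all(isinstance(x, (int, float)) for x in vec):
            raise ValueError(f"vector {i} contains a non-numeric component")
    return vectors


vectors = [[random.random() for _ in range(DIM)] for _ in range(10)]

# Column layout passed to insert: one inner list per non-auto-generated field
columns = [check_vectors(vectors)]
print(len(columns), len(columns[0]), len(columns[0][0]))
```

The validated columns list has the same shape as the [vectors] argument used above, so it could be handed to collection.insert directly.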

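4. Run a vector search

# Build an index on the vector field; recent Milvus 2.x releases
# require one before load() succeeds
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "params": {"nlist": 100}, "metric_type": "L2"},
)

# Load the collection into memory
collection.load()

# Define the search parameters
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

# Run the search
results = collection.search(
    data=[vectors[0]],  # query vectors
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr=None
)

# Print the results (one entry per query vector)
for result in results:
    print(result)

Use collection.load to load the collection into memory so that it can be searched. Recent Milvus 2.x releases refuse to load a collection whose vector field has no index, so an index is built first; index creation is covered under Advanced Features.
Define the search parameters: the distance metric (L2, the Euclidean distance) and index-specific parameters such as nprobe, the number of clusters to probe.
Use collection.search to run the vector search. data is the list of query vectors, anns_field is the vector field to search, param holds the search parameters, limit is the maximum number of results returned per query, and expr is an optional filter expression.
Print the search results. results contains one entry per query vector, each holding up to limit hits.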
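To make the roles of the L2 metric and limit concrete, the sketch below performs the same kind of search by brute force in pure Python. It only illustrates the idea and is not how Milvus is implemented; it ranks by squared Euclidean distance, which yields the same ordering as the Euclidean distance itself.

```python
import heapq
import random


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(query, vectors, limit):
    """Return the `limit` closest (index, distance) pairs, nearest first."""
    scored = ((i, l2_squared(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[1])


random.seed(0)
vectors = [[random.random() for _ in range(128)] for _ in range(10)]

# Searching with a stored vector as the query returns that vector first
hits = brute_force_search(vectors[0], vectors, limit=5)
for idx, dist in hits:
    print(idx, round(dist, 4))
```

Because the query is itself one of the stored vectors, the first hit is index 0 with distance 0, which mirrors what the tutorial's search with vectors[0] is expected to return.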

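Advanced Features

Beyond the basic operations, PyMilvus also supports index management, collection management, and data manipulation.

1. Create an index

index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 100},
    "metric_type": "L2"
}

# Create the index
collection.create_index(field_name="embedding", index_params=index_params)

Use create_index to build an index on the vector field. index_params specifies the index type, its build parameters (for IVF_FLAT, nlist is the number of clusters the vectors are partitioned into), and the distance metric.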
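The interplay between nlist (at build time) and nprobe (at search time) can be illustrated with a toy inverted-file index in pure Python. This is a simplified sketch and not Milvus's implementation: in particular, it samples centroids at random, whereas real IVF indexes train them with k-means. The function names build_ivf and ivf_search are made up for this example.

```python
import random


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, nlist, rng):
    """Partition vectors into nlist buckets around randomly sampled centroids.

    Assumption: random sampling stands in for the k-means training
    that a real IVF index performs.
    """
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2_squared(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def ivf_search(query, vectors, centroids, buckets, nprobe, limit):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2_squared(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2_squared(query, vectors[i]))
    return candidates[:limit]


rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_ivf(data, nlist=10, rng=rng)

query = data[0]
approx = ivf_search(query, data, centroids, buckets, nprobe=2, limit=5)
exact = ivf_search(query, data, centroids, buckets, nprobe=10, limit=5)
print("nprobe=2: ", approx)
print("nprobe=10:", exact)
```

With nprobe equal to nlist every bucket is scanned, so the result matches an exhaustive search; smaller nprobe values scan fewer vectors and may miss some true neighbors, which is the speed and recall trade-off these two parameters control.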

2. Drop a collection

# Drop the collection
collection.drop()

Use the drop method to delete the collection together with all of its data.

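3. Delete data

# Delete the entities that match the expression
expr = "id in [1, 2, 3]"
collection.delete(expr)

Use the delete method to remove the entities that match a condition. expr is a filter expression. Because this collection uses auto_id=True, the primary keys are generated by Milvus, so the values 1, 2, 3 are placeholders; in practice, use the IDs reported by insert (the primary_keys attribute of its return value).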
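When the IDs come from a variable rather than a literal, the expression string has to be assembled first. A small sketch of that step follows; build_id_filter is a hypothetical helper for this tutorial, not part of PyMilvus.

```python
def build_id_filter(ids, field="id"):
    """Return a filter such as 'id in [1, 2, 3]' for integer primary keys."""
    ids = list(ids)
    if not ids:
        raise ValueError("at least one primary key is required")
    if not all(isinstance(i, int) and not isinstance(i, bool) for i in ids):
        raise TypeError("primary keys must be integers for an INT64 field")
    return f"{field} in [{', '.join(str(i) for i in ids)}]"


expr = build_id_filter([1, 2, 3])
print(expr)  # id in [1, 2, 3]
```

The resulting string has the same form as the expr used above and could be passed to collection.delete unchanged. Rejecting non-integer values up front also avoids building a malformed expression from unexpected input.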

Example Application

The following complete example shows how to use PyMilvus for both the basic operations and the advanced features:

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
import random

# Connect to the Milvus server
connections.connect("default", host="localhost", port="19530")

# Define the fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
]

# Define the collection schema
schema = CollectionSchema(fields, description="test collection")

# Create the collection
collection = Collection(name="example_collection", schema=schema)

# Generate random vector data
vectors = [[random.random() for _ in range(128)] for _ in range(10)]

# Insert the data and keep the result, which carries the generated IDs
insert_result = collection.insert([vectors])

# Flush so the inserted data is persisted
collection.flush()

# Create the index
index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 100},
    "metric_type": "L2"
}
collection.create_index(field_name="embedding", index_params=index_params)

# Load the collection into memory
collection.load()

# Define the search parameters
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

# Run the search
results = collection.search(
    data=[vectors[0]],  # query vectors
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr=None
)

# Print the results (one entry per query vector)
for result in results:
    print(result)

# Delete the first three inserted entities by their auto-generated IDs
ids_to_delete = insert_result.primary_keys[:3]
expr = f"id in {ids_to_delete}"
collection.delete(expr)

# Drop the collection
collection.drop()

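Summary

This tutorial walked through the basic operations and advanced features of PyMilvus: connecting to a Milvus server, creating a collection, inserting data, running a vector search, creating an index, and deleting data and collections. For more details and examples, see the official PyMilvus documentation.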