milvus插入数据时,明明不超长,但总是报长度错误?

在处理插入milvus数据时,设置了字段长度为512. 明明考虑了预留,插入的数据中没有这么长的,但还是会有报错 类似:MilvusException: (code=0, message=the length (564) of 78th string exceeds max length (512)
查找max(len(x) for x in temp_list)之类  都没有超过512过,也没超过256过,不知道哪里的数据有问题..
反复截段文本等测试后发现,例如用len(x)看到的字符串长度是10,但保存进milus的长度,并不是..

举例,把数据库长度设为一个小值16:
FieldSchema(name="question", dtype=DataType.VARCHAR, auto_id=False, max_length=16)

再把数据缩到只有一行 测试结果插入成功:

line contents is : 你好呀你好 and length is 5
Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.02s/it]
index handle result: Status(code=0, message=)
insert result: (insert count: 1, delete count: 0, upsert count: 0, timestamp: 449735609509740549, success count: 1, err count: 0)

再增加一点文字长度 就报错了:

line contents is : 你好呀你好呀 and length is 6
Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.03it/s]
index handle result: Status(code=0, message=)
[2024-05-13 20:59:27,915 decorators.py:134                              ERROR] RPC error: [batch_insert], <MilvusException: (code=0, message=the length (18) of 0th string exceeds max length (16))>, <Time:{'RPC start': '2024-05-13 20:59:27.912751', 'RPC error': '2024-05-13 20:59:27.915058'}>
Traceback (most recent call last):
  File "/root/temp_dir/run_task.py", line 55, in <module>
    XXX().create_insert_vector_db()
  File "/root/temp_dir/app/service/vector_db/xx_pre_handle.py", line 63, in create_insert_vector_db
    ).get_or_create_db(fields, description, "possible_question_embeddings", entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 23, in get_or_create_db
    return self.create_and_insert(fields, description, index_field_name, entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 28, in create_and_insert
    self.insert_db(entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 40, in insert_db
    insert_result = self.collection.insert(entities)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 497, in insert
    res = conn.batch_insert(
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 135, in handler
    raise e from e
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 131, in handler
    return func(*args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 170, in handler
    return func(self, *args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 110, in handler
    raise e from e
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 74, in handler
    return func(*args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 566, in batch_insert
    raise err from err
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 560, in batch_insert
    check_status(response.status)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/utils.py", line 54, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=0, message=the length (18) of 0th string exceeds max length (16))>


所以,可能是因为UTF-8或其他编码的原因,一些非ASCII字符可能被编码成多个字节 以保存进milvus。
所以,解决方案是 建表时FieldSchema中把max_length 设置为4倍或其他倍数于预期的最大长度。


 

Milvus 是一个开源的高性能向量数据库,专为大规模的向量数据(如深度学习中的特征表示)提供存储和搜索服务。在 Python 中,你可以使用 Milvus SDK 来方便地对数据进行向量化操作,以下是一些基本步骤: 1. **安装 Milvus**:首先,你需要从 Milvus 的 GitHub 仓库或 PyPI(Python Package Index)安装 Milvus SDK,例如使用 pip: ``` pip install milvus ``` 2. **连接 Milvus**:创建 Milvus 接口对象并连接到服务器,如果本地运行,通常是 localhost 和默认端口(19530): ```python from milvus import Milvus milvus = Milvus(host="localhost", port=19530) ``` 3. **加载数据**:将 Python 列表或数组转换为向量数据,通常是 numpy 数组,然后构建索引: ```python import numpy as np vectors = np.random.rand(100, 128) # 假设我们有100个128维向量 collection_name = "my_collection" vector_field_name = "vector_field" if not milvus.has_collection(collection_name): # 创建集合和向量字段 schema = {"fields": [{"name": vector_field_name, "type": "FLOAT_VECTOR", "dim": 128}]} milvus.create_collection(schema, collection_name) # 插入数据 milvus.insert(collection_name, vectors) ``` 4. **向量化搜索**:使用查询向量执行相似度搜索,例如使用 `IVF` + `FLAT` 或 `HNSW` 等搜索方法: ```python query_vector = np.random.rand(128) top_k = 10 params = {"nprobe": 32} results = milvus.search(collection_name, query_vector, top_k, params) ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值