Learning qdrant: Configuration File Parameters

二人天

Last modified: 2024-07-24 14:13:27

Tags: learning, server, python, rust

First published: 2024-07-24 14:10:57

Article link: https://blog.csdn.net/huy123444/article/details/140661706

Contents of the qdrant configuration file:
{
  "params": {
    "vectors": {
      "text-dense": {
        "size": 768,
        "distance": "Cosine",
        "hnsw_config": {"m": 32, "ef_construct": 128}
      }
    },
    "shard_number": 2,
    "replication_factor": 1,
    "write_consistency_factor": 1,
    "on_disk_payload": true,
    "sparse_vectors": {
      "text-sparse": {"index": {}}
    }
  },
  "hnsw_config": {
    "m": 16,
    "ef_construct": 100,
    "full_scan_threshold": 10000,
    "max_indexing_threads": 0,
    "on_disk": false
  },
  "optimizer_config": {
    "deleted_threshold": 0.2,
    "vacuum_min_vector_number": 1000,
    "default_segment_number": 0,
    "max_segment_size": null,
    "memmap_threshold": null,
    "indexing_threshold": 20000,
    "flush_interval_sec": 5,
    "max_optimization_threads": 1
  },
  "wal_config": {"wal_capacity_mb": 32, "wal_segments_ahead": 0},
  "quantization_config": null
}
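To make the nesting easier to follow, here is a minimal Python sketch that loads a trimmed copy of the configuration with the standard `json` module and reads a few values. The helper `effective_hnsw` is a hypothetical name introduced only for illustration; it assumes that the per-vector `hnsw_config` (under `text-dense`) overrides the collection-level `hnsw_config` key by key.

```python
import json

# Trimmed copy of the configuration above (illustrative subset only).
CONFIG_JSON = """
{
  "params": {
    "vectors": {
      "text-dense": {
        "size": 768,
        "distance": "Cosine",
        "hnsw_config": {"m": 32, "ef_construct": 128}
      }
    },
    "shard_number": 2,
    "replication_factor": 1,
    "write_consistency_factor": 1,
    "on_disk_payload": true
  },
  "hnsw_config": {"m": 16, "ef_construct": 100, "full_scan_threshold": 10000},
  "optimizer_config": {"deleted_threshold": 0.2, "indexing_threshold": 20000}
}
"""

config = json.loads(CONFIG_JSON)
dense = config["params"]["vectors"]["text-dense"]


def effective_hnsw(collection_cfg: dict, vector_cfg: dict) -> dict:
    """Hypothetical helper: start from the collection-level HNSW settings
    and let the per-vector settings override them (assumed behaviour)."""
    merged = dict(collection_cfg.get("hnsw_config", {}))
    merged.update(vector_cfg.get("hnsw_config") or {})
    return merged


hnsw = effective_hnsw(config, dense)
print(dense["size"], dense["distance"])
print(hnsw)
```

Under that assumption the `text-dense` vectors would be indexed with m=32 and ef_construct=128, while full_scan_threshold is inherited from the collection-level block.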

Some important parameters and how to read them
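size: the dimensionality of the stored vectors. The knowledge base is already largely built, so this will not be changed unless a serious problem comes up.
hnsw_config.m: the maximum number of edges (neighbors) that a newly inserted vector forms with the existing graph. A larger value makes search more accurate but increases resource consumption.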
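hnsw_config.ef_construct: when the graph is built, each insertion searches for nearest neighbors and then creates edges to them. This parameter sets how many nodes are examined during that search (the "finding neighbors" step).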
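The roles of m and ef_construct can be illustrated with a toy, single-layer neighbor graph. This is only a sketch under simplifying assumptions: real HNSW is multi-layered and uses a neighbor-selection heuristic, and none of the function names below come from qdrant. Here ef_construct is the width of the candidate search on insertion, and m caps the number of edges per node.

```python
import heapq
import math
import random


def dist(a, b):
    return math.dist(a, b)


def search_layer(points, neighbors, query, entry, ef):
    """Greedy best-first search that keeps the ef closest nodes found."""
    visited = {entry}
    d0 = dist(query, points[entry])
    candidates = [(d0, entry)]   # min-heap: nodes still to expand
    best = [(-d0, entry)]        # max-heap (negated): ef closest so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break                # nothing closer can be reached
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(query, points[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-d, n) for d, n in best)


def build_graph(points, m, ef_construct):
    """Insert points one by one; each new point links to at most m neighbors."""
    neighbors = {0: []}
    for i in range(1, len(points)):
        found = search_layer(points, neighbors, points[i], 0, ef_construct)
        chosen = [n for _, n in found[:m]]
        neighbors[i] = list(chosen)
        for j in chosen:
            neighbors[j].append(i)
            if len(neighbors[j]) > m:
                # Over the cap: drop the farthest edge of node j.
                neighbors[j].sort(key=lambda k: dist(points[j], points[k]))
                neighbors[j].pop()
    return neighbors


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(50)]
graph = build_graph(pts, m=4, ef_construct=10)
print(max(len(edges) for edges in graph.values()))
```

Raising m lets every node keep more edges (more memory, better recall), while raising ef_construct widens the search done for each insertion (slower build, better-quality edges).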
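shard_number: the number of shards in the collection. Shards let qdrant spread the data across storage, and using more shards can improve query parallelism. Shards can also isolate the data within a collection from each other, which provides a degree of safety.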
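replication_factor: the number of copies kept of each shard. With the current value of 1 there is only a single copy and no redundant replica, which is dangerous: if the data is corrupted it cannot be recovered. Additional replicas can also improve retrieval efficiency, at the cost of more resources.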
write_consistency_factor: the number of replicas a write must reach before the operation is considered successful.
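The arithmetic behind these three settings can be sketched in a few lines of Python. The function names are hypothetical and the logic is a simplified model, not qdrant's implementation: it only counts shard copies and checks whether enough replicas acknowledged a write.

```python
def total_shard_copies(shard_number: int, replication_factor: int) -> int:
    """Physical shard copies the cluster has to host."""
    return shard_number * replication_factor


def redundant_copies_per_shard(replication_factor: int) -> int:
    """Copies of a shard that can be lost before its data is gone."""
    return replication_factor - 1


def write_accepted(acks: int, write_consistency_factor: int) -> bool:
    """A write counts as successful once enough replicas acknowledged it."""
    return acks >= write_consistency_factor


# Values from the configuration in this article.
print(total_shard_copies(shard_number=2, replication_factor=1))
print(redundant_copies_per_shard(replication_factor=1))
print(write_accepted(acks=1, write_consistency_factor=1))
```

With shard_number=2 and replication_factor=1 there are two shard copies in total and no spare copy of either shard, which is the risk described above.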
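hnsw_config.full_scan_threshold: below a certain size, no index-based search algorithm beats a brute-force full scan, so when the data to be searched does not exceed this value qdrant chooses a full scan. (qdrant's documentation expresses this threshold as a vector size in kilobytes.)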
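As a rough illustration of what a full scan means, the sketch below scores every vector against the query with cosine similarity and picks a strategy by comparing a candidate size against the threshold. `choose_strategy` is a hypothetical stand-in for the query planner, and the unit of `candidate_size` is deliberately left abstract; it is assumed to be whatever unit the threshold uses.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def full_scan(query, vectors, top_k):
    """Brute force: score every vector, then keep the top_k best."""
    scored = [(cosine_similarity(query, v), idx) for idx, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return scored[:top_k]


def choose_strategy(candidate_size, full_scan_threshold=10000):
    """Hypothetical planner rule: small candidate sets skip the index."""
    return "full_scan" if candidate_size < full_scan_threshold else "hnsw"


vectors = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
hits = full_scan((1.0, 0.1), vectors, top_k=2)
print([idx for _, idx in hits])
print(choose_strategy(500), choose_strategy(50000))
```

A full scan is exact but its cost grows linearly with the number of vectors, which is why it only pays off below the threshold.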
hnsw_config.on_disk: whether the HNSW index is stored on disk.
optimizer_config.deleted_threshold: for a given segment, once the share of deleted vectors reaches this threshold (20% here), qdrant optimizes that segment, possibly by merging it?
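optimizer_config.vacuum_min_vector_number: vacuum optimization is only triggered once a segment holds more vectors than this number. Deleted vectors are merely marked as deleted, which reduces I/O.
optimizer_config.max_segment_number: when the number of segments exceeds this value, the smallest segments are merged automatically. (The configuration shown above does not contain this key; it lists default_segment_number and max_segment_size.)
optimizer_config.memmap_threshold: vectors exceeding this size (measured in kilobytes) are stored on disk as memory-mapped files.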
optimizer_config.indexing_threshold: once a segment exceeds this value, vector indexing is used for it.
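The optimizer thresholds can be read as simple per-segment predicates. The sketch below is a simplified model under stated assumptions, not qdrant's actual optimizer: the function names are hypothetical, and it assumes that vacuum requires both conditions (enough vectors and a high enough deleted share) at the same time.

```python
def needs_vacuum(total_vectors, deleted_vectors,
                 deleted_threshold=0.2, vacuum_min_vector_number=1000):
    """Assumed rule: the segment is large enough AND has too many deletions."""
    if total_vectors == 0:
        return False
    deleted_ratio = deleted_vectors / total_vectors
    return (total_vectors > vacuum_min_vector_number
            and deleted_ratio > deleted_threshold)


def needs_vector_index(segment_size, indexing_threshold=20000):
    """Segments above the threshold get a vector index built for them."""
    return segment_size > indexing_threshold


print(needs_vacuum(total_vectors=5000, deleted_vectors=1500))  # 30% deleted
print(needs_vacuum(total_vectors=500, deleted_vectors=400))    # too small
print(needs_vector_index(25000), needs_vector_index(15000))
```

Small segments are left alone even when most of their vectors are deleted, since rebuilding them saves little.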