A Guide to Vector Search and Faiss

九城风雪

Published 2024-01-07 20:52:13

Category: Machine Learning Algorithms. Tag: faiss

Original link: https://zhuanlan.zhihu.com/p/595249861

Faiss is a vector search library released by Facebook that provides high-performance tools for vector search.

Sources

James Briggs, Faiss: The Missing Manual

(This post is essentially a simplified walk-through of that blog.)

[IVF] D. Baranchuk et al., Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors (2018), ECCV

[PQ1] Y. Matsui et al., A Survey of Product Quantization (2018), ITE Trans. on MTA

[OPQ] T. Ge et al., Optimized Product Quantization (2014), TPAMI

[PQ2] H. Jégou et al., Product quantization for nearest neighbor search (2010), TPAMI

[IMI] A. Babenko, V. Lempitsky, The Inverted Multi-Index (2012), CVPR

[ReRank] H. Jégou et al., Searching in One Billion Vectors: Re-rank with Source Coding (2011), ICASSP

[HNSW1] Y. Malkov, D. Yashunin, Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (2016), IEEE Transactions on Pattern Analysis and Machine Intelligence

[HNSW2] Y. Malkov et al., Approximate Nearest Neighbor Search Small World Approach (2011), International Conference on Information and Communication Technologies & Applications

[HNSW3] Y. Malkov et al., Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces (2012), Similarity Search and Applications, pp. 132-147

[HNSW4] Y. Malkov et al., Approximate nearest neighbor algorithm based on navigable small world graphs (2014), Information Systems, vol. 45, pp. 61-68

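What is vector search?

Given a vector database $Y := \{y_1, y_2, \dots, y_N\}$ and a query vector $x_q \in \mathbb{R}^d$, find the $k$ vectors among the $N$ stored vectors that are closest to $x_q$ under some distance (for example, the L2 distance).

Applications include:

Finding the sentence in a corpus that is most similar to a given sentence.
Finding the image in an image library that is most similar to a given image.
Searching other kinds of data as well, such as video, audio, GIFs, gene sequences, and search entries.

All of these objects (images, words, sentences, videos, and so on) can be represented as vectors.

Representing words as vectors

This looks simple, but once the database becomes very large (say, hundreds of millions of vectors) it becomes hard. That is why vector search deserves dedicated study.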

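What is Faiss?

Faiss is a library developed by Facebook AI for efficient nearest-neighbor search.

The basic workflow is: given a vector database, first build a Faiss index over it. Then, for any query vector, the index can be used to look up its nearest neighbors.

Why are there different methods?

Some of the algorithms are approximate, and the different methods strike different balances between 1) accuracy, 2) query speed, and 3) memory usage.

Methodologically, there are two ways to speed up search:

Shrink the vectors: reduce their dimensionality, use fewer bits to represent each vector, and so on.
Shrink the search scope: cluster the vectors, organize them into a tree, and so on, so that only a small number of nearby samples need to be examined.

The sections below introduce several of these methods.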

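I. The most direct method: IndexFlatL2

As the figure below shows, the most direct method is to compute the distance between the query vector xq and every vector y in the dataset, then sort and take the k nearest.

With Faiss this is implemented as follows: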

import faiss
import numpy as np

index = faiss.IndexFlatL2(d)  # d is the vector dimensionality
index.add(X)  # X is an N x d float32 matrix
D, I = index.search(xq, k)  # xq is an M x d float32 matrix holding M query vectors

# we return k vectors per query, so we initialize a zero array to hold them
vecs = np.zeros((k, d))
# then iterate through each ID in the first query's row of I and store the reconstructed vector
for i, val in enumerate(I[0].tolist()):
    vecs[i, :] = index.reconstruct(val)

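The returned D and I hold the distances and the indices of the k nearest vectors, respectively. The indices can then be passed to reconstruct to retrieve the values of those nearest-neighbor vectors.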
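To make the shape of D and I concrete, here is a minimal pure-Python sketch of exhaustive search. It is not how Faiss implements it; the function name flat_l2_search and the toy data are made up for illustration. It reports squared L2 distances, and a plain list lookup stands in for reconstruct.

```python
def flat_l2_search(database, queries, k):
    """Exhaustive search: return (D, I) with squared L2 distances and row ids."""
    D, I = [], []
    for q in queries:
        # squared L2 distance from q to every stored vector
        scored = [
            (sum((a - b) ** 2 for a, b in zip(q, x)), idx)
            for idx, x in enumerate(database)
        ]
        scored.sort()  # ascending distance; ties are broken by id
        top = scored[:k]
        D.append([dist for dist, _ in top])
        I.append([idx for _, idx in top])
    return D, I

# hypothetical toy database of four 2-d vectors and a single query
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
xq = [[0.9, 0.1]]
D, I = flat_l2_search(X, xq, k=2)
nearest_vectors = [X[i] for i in I[0]]  # stands in for index.reconstruct
print(I, nearest_vectors)
```

Each row of D and I belongs to one query, which is why the ids of the first query are read from I[0].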

L2 distance calculation between a query vector xq and our indexed vectors (shown as y)

The nearest neighbors found this way are exact, but the time per query grows linearly with the number of vectors in the database.

Euclidean (L2) and Inner Product (IP) flat index search times using faiss-cpu on an M1 chip. Both using vector dimensionality of 100. IndexFlatIP is shown to be slightly faster than IndexFlatL2.

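IndexFlatIP is not covered here, but it is also an exhaustive search; it simply looks for the vectors with the largest inner product. IP stands for inner product.

This family of methods suits scenarios where:

Accuracy requirements are very high.
Search time is not a major concern, or the dataset is small (< 10k vectors).

Summary:

This method gives the best search quality, but it is slow, and because every vector is stored uncompressed its memory usage grows with the size of the database.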

Flat indexes come with perfect search-quality at the cost of slow search speeds. Memory utilization of flat indexes is reasonable.

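II. A simple speed-up: Inverted File Index (IVF)

A simple way to speed things up is to first partition the database vectors into a number of cells (Voronoi cells), each with a center point (centroid). To find the nearest neighbors of a vector, first find the nearest centroid, then search for the nearest vectors inside the corresponding cell.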

We can imagine our vectors as each being contained within a Voronoi cell — when we introduce a new query vector, we first measure its distance between centroids, then restrict our search scope to that centroid’s cell.

The implementation is as follows:

nlist = 50  # how many cells
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(X)
index.add(X)
D, I = index.search(xq, k)

index.make_direct_map()
index.reconstruct(int(I[0, 0]))  # reconstruct takes a single ID: here, the top hit of the first query

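Note the additional training step. During training, nlist cells are created and the centroid of each cell is computed. Because every original vector is first assigned to a cell, make_direct_map must be run before the original vectors can be retrieved from the returned indices.

The search works roughly like this: given a query vector, find the nearest among the cell centroids, then look for neighbors among the vectors in that cell.

Of course, this method is approximate. When the query vector lies near the edge of a cell, the result becomes less accurate. As the figure below shows, the query only searches the vectors in the magenta cell; even though it is also close to some vectors in the teal cell, those are never searched, so the true nearest point may be missed.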

Our query vector xq lands on the edge of the magenta cell. Despite being closer to datapoints in the teal cell, we will not compare these if nprobe == 1 — as this means we would restrict search scope to the magenta cell only.

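One remedy is to search not only the cell with the nearest centroid but also some neighboring cells. This is controlled by the nprobe parameter. For example, setting nprobe = 8 below searches the 8 nearest cells.

The implementation is as follows:

nlist = 50  # how many cells
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(X)
index.add(X)
index.nprobe = 8  # search the 8 nearest cells
D, I = index.search(xq, k)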

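The two most important parameters of this method are:

nprobe: how many of the nearest cells to search at query time.
nlist: how many cells to create when building the index.

A larger nlist means more centroids to compare against, but fewer vectors in each cell; in general this speeds up queries.

A larger nprobe, of course, costs more query time and gives higher search quality. Overall it is still much faster than plain IndexFlatL2.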
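The edge effect and the role of nprobe can be reproduced with a toy pure-Python sketch. This is only an illustration under simplifying assumptions: the centroids are fixed by hand (in Faiss they come from training), and build_ivf and ivf_search are hypothetical helper names.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every vector id to the cell of its nearest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        cells[nearest].append(idx)
    return cells

def ivf_search(vectors, centroids, cells, q, k, nprobe):
    # rank cells by centroid distance and scan only the nprobe closest ones
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(q, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in cells[c]]
    candidates.sort(key=lambda idx: sq_l2(q, vectors[idx]))
    return candidates[:k]

# hypothetical hand-picked centroids and data
centroids = [[0.0, 0.0], [10.0, 0.0]]
vectors = [[0.0, 0.0], [1.0, 0.0], [5.5, 0.0], [10.0, 0.0]]
cells = build_ivf(vectors, centroids)

q = [4.9, 0.0]  # sits just on the cell-0 side of the boundary
hit_nprobe_1 = ivf_search(vectors, centroids, cells, q, k=1, nprobe=1)
hit_nprobe_2 = ivf_search(vectors, centroids, cells, q, k=1, nprobe=2)
print(hit_nprobe_1, hit_nprobe_2)
```

With nprobe=1 only cell 0 is scanned, so the search returns vector 1 and misses vector 2, which lies just across the boundary and is the true nearest neighbor. Probing both cells recovers it.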

Search-time and recall for IVF using different nprobe and nlist values.

Query time / number of vectors for the IVFFlat index with different nprobe values — 1, 5, 10, and 20

As for memory, nprobe has no effect on memory usage at all, and nlist has only a very small effect.

Memory usage of the index is affected only by the nlist parameter. However, for our Sift1M dataset, the index size changed only very slightly.

In summary, IVF gives up some search quality in exchange for good speed and reasonable memory usage.

IVF — great search-quality, good search-speed, and reasonable memory usage. The ‘half-filled’ segments of the bars represent the range in performance encountered while modifying index parameters.

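III. An optimization that reduces memory usage: Product Quantization (PQ)

Sometimes the memory footprint of the index matters too, and storing every full vector may be too expensive. In that case PQ can be used to shrink what is stored per vector. The procedure is roughly as shown in the figure below:

Three steps of product quantization

First, each d-dimensional vector is split into m = 5 sub-vectors.
Then, for each sub-vector position, the sub-vectors of all samples are clustered into $2^{\text{bits}} = 2^8 = 256$ clusters.
Finally, each vector only needs to store the IDs of its sub-vectors' centroids.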
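The three steps can be sketched in a few lines of pure Python. This is a simplified illustration, not Faiss's implementation: the codebooks are written by hand (real PQ learns 2**bits centroids per sub-space by clustering), and pq_encode and pq_decode are hypothetical names.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Replace each sub-vector by the id of its nearest centroid."""
    m = len(codebooks)
    sub_dim = len(vector) // m
    code = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        code.append(min(range(len(codebook)), key=lambda c: sq_l2(sub, codebook[c])))
    return code

def pq_decode(code, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    return [value for j, c in enumerate(code) for value in codebooks[j][c]]

# hypothetical codebooks: m = 2 sub-spaces with 2 centroids each
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
x = [0.9, 0.8, 0.1, 0.7]
code = pq_encode(x, codebooks)
approx = pq_decode(code, codebooks)

# storage per vector for d = 128, m = 8, bits = 8
raw_bytes = 128 * 4      # float32 values
pq_bytes = 8 * 8 // 8    # m centroid ids of `bits` bits each
print(code, approx, raw_bytes, pq_bytes)
```

The last two lines show where the memory saving comes from: under these settings a 512-byte vector is stored as an 8-byte code, at the price of only being able to reconstruct it approximately.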

The Faiss implementation is as follows:

m = 8  # number of sub-vectors, i.e. centroid IDs in each compressed vector (d must be divisible by m)
bits = 8  # number of bits per centroid ID

quantizer = faiss.IndexFlatL2(d)  # we keep the same L2 distance flat index
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)

index.train(X)
index.add(X)
index.nprobe = 10
D, I = index.search(xq, k)

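With PQ added, the running time goes down, but accuracy suffers as well. The speed benefit of PQ becomes more pronounced as the vector database grows.

The figure below shows how query time changes with dataset size.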

Query time / number of vectors for our three indexes

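IV. A method suited to low-dimensional vectors: Locality Sensitive Hashing (LSH)

In general, finding the nearest neighbors of a query vector xq costs O(N), where N is the size of the vector database, because xq has to be compared with almost every vector. With LSH, the lookup itself can in principle be done in O(1).

The O(N) cost arises because xq is compared against every vector in the database. That is unnecessary: it is enough to compare it with the few vectors that could plausibly be its nearest neighbors, the candidate pairs.

Before getting to LSH, a quick recap of hashing, which most readers will have come across. In Python, for example, a dict can be viewed as a hash table. When building a hash table we normally want to minimize collisions, that is, the probability that two different keys are mapped to the same hash bucket. That way each key corresponds uniquely to the hash bucket that stores its value.

In LSH for nearest-neighbor search, however, the goal is not to avoid collisions: we want similar vectors to be mapped to the same hash bucket. We maintain several such hash functions. Given a query vector xq, we only need to look at the other vectors that any one of the hash functions maps to the same bucket as xq, which narrows the search and finds the nearest neighbors faster.

How is the nearest-neighbor lookup actually implemented? There are many ways.

Here we cover one implementation: shingling, MinHashing, then banding. (This part is fairly detailed; skip it if you are not interested.)

The overall flow is as follows; each step is explained in turn below: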

A high-level view of the LSH process we will be working through in this article.

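1. k-Shingling

Given a piece of text and a sliding-window length k, take every run of k adjacent characters (k = 2 in the example below) to form a set of shingles.

k-Shingling consists of moving through a string and adding k characters at a time to a ‘shingle set’.

In Python this looks like: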

def shingle(text: str, k: int):
    shingle_set = []
    for i in range(len(text) - k+1):
        shingle_set.append(text[i:i+k])
    return set(shingle_set)

k = 2  # shingle length
a = "flying fish flew by the space station"
a = shingle(a, k)

Output (set order is arbitrary): {'y ', 'pa', 'ng', 'yi', 'st', 'sp', 'ew', 'ce', 'th', 'sh', 'le', 'e ', 'ta', 'fl', ' b', 'in', 'w ', ' s', ' t', 'he', ' f', 'ti', 'fi', 'is', 'on', 'ly', 'g ', 'at', 'by', 'h ', 'ac', 'io'}

Next, the union of the shingles of every text in the corpus is taken as the vocab.

Once a query xq is given, the vocab can be used to build a one-hot representation of it. (Personally I think this term is inaccurate; it should be called a 0-1 vector.)

vocab = list(a.union(b).union(c))  # a, b, c, ... are the shingle sets of the corpus texts
xq = "some sentence"
xq = shingle(xq, k)
xq_1hot = [1 if x in xq else 0 for x in vocab]

2. MinHashing

We now have a sparse 0-1 vector. The next step is to turn it into a dense vector, called a signature. The process is hard to describe in words, so here is the code.

from random import shuffle

def create_hash_func(size: int):
    # function for creating the hash vector/function
    hash_ex = list(range(1, size+1))
    shuffle(hash_ex)
    return hash_ex

def build_minhash_func(vocab_size: int, nbits: int):
    # function for building multiple minhash vectors
    hashes = []
    for _ in range(nbits):
        hashes.append(create_hash_func(vocab_size))
    return hashes

def create_hash(vector: list):
    # use this function for creating our signatures (eg the matching)
    signature = []
    for func in minhash_func:
        for i in range(1, len(vocab)+1):
            idx = func.index(i)
            signature_val = vector[idx]
            if signature_val == 1:
                signature.append(idx)
                break
    return signature

# create 20 minhash vectors
minhash_func = build_minhash_func(len(vocab), 20)
# generate signature
xq_sig = create_hash(xq_1hot)

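The figure below shows how a shingled sparse vector (1, 0, 0, 1, 0, 1) and a set of hash functions are turned into a signature.

At this point we can map a sparse vector to a short integer vector with one entry per hash function (20 here); each entry is a position in the vocab, an integer between 0 and len(vocab) - 1.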
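Why are such signatures useful? A standard property of MinHash is that two sets agree on a given signature position with probability equal to their Jaccard similarity. The sketch below checks this empirically. It uses a compact variant (the minimum rank under a random permutation, stored as a dict) rather than the list-based functions above; the second sentence, the number of permutations (400) and the seed are arbitrary choices for illustration.

```python
import random

def shingle(text, k):
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def minhash_signature(shingles, permutations):
    # for each random ordering of the vocab, keep the smallest rank among the set's shingles
    return [min(perm[s] for s in shingles) for perm in permutations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
a = shingle("flying fish flew by the space station", 2)
b = shingle("flying fish flew over the space station", 2)
vocab = sorted(a | b)

# each permutation maps every shingle in the vocab to a random rank
permutations = []
for _ in range(400):
    ranks = list(range(len(vocab)))
    rng.shuffle(ranks)
    permutations.append(dict(zip(vocab, ranks)))

sig_a = minhash_signature(a, permutations)
sig_b = minhash_signature(b, permutations)
agreement = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
true_jaccard = jaccard(a, b)
print(f"signature agreement: {agreement:.3f}, Jaccard similarity: {true_jaccard:.3f}")
```

The two printed numbers should be close, and the gap shrinks as more permutations are used. This is what lets short signatures stand in for the much longer sparse vectors.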

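3. Banding

To find vectors that are similar to the query (rather than identical), we cut the signature into short segments (bands) and use another hash function to map each band to a hash bucket. We then compare vectors band by band: any vector with at least one band that matches the query's corresponding band becomes a candidate for the final nearest-neighbor selection.

The implementation is as follows: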

def split_vector(signature, b):
    assert len(signature) % b == 0
    r = len(signature) // b  # rows per band
    # split the signature into b parts of r values each
    subvecs = []
    for i in range(0, len(signature), r):
        subvecs.append(signature[i : i+r])
    return subvecs

# a_sig is the signature of text a, built with create_hash in the same way as xq_sig
band_a = split_vector(a_sig, 10)
band_xq = split_vector(xq_sig, 10)

for a_rows, xq_rows in zip(band_a, band_xq):
    if a_rows == xq_rows:
        print(f"Candidate pair: {a_rows} == {xq_rows}")
        # we only need one band to match
        break

We split our signature into b sub-vectors, each is processed through a hash function (we can use a single hash function, or b hash functions) and mapped to a hash bucket.

We split the signatures into subvectors. Each equivalent subvector across all signatures must be processed through the same hash function. However, it is not necessary to use different hash functions for each subvector (we can use just one hash function for them all).

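In the end, the query vector xq has to be compared in this way against the vectors in the database to find those that match, and the nearest neighbors are then selected from the matching vectors. In practice the hashing of the database vectors (left side of the figure above) can be done ahead of time, so that finding the candidates for one query is roughly an O(1) bucket lookup.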
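The precomputation can be sketched with a Python dict playing the role of the hash buckets. This is a minimal illustration, not Faiss code: the signatures are made up, and bands, build_buckets and candidates are hypothetical helper names.

```python
from collections import defaultdict

def bands(signature, b):
    assert len(signature) % b == 0
    r = len(signature) // b
    return [tuple(signature[i:i + r]) for i in range(0, len(signature), r)]

def build_buckets(signatures, b):
    """Precompute (band position, band contents) -> ids, once for the whole database."""
    buckets = defaultdict(set)
    for vec_id, sig in signatures.items():
        for pos, band in enumerate(bands(sig, b)):
            buckets[(pos, band)].add(vec_id)
    return buckets

def candidates(query_sig, buckets, b):
    """Collect every id that shares at least one band with the query."""
    found = set()
    for pos, band in enumerate(bands(query_sig, b)):
        found |= buckets.get((pos, band), set())
    return found

# hypothetical 6-value signatures, split into b = 3 bands of 2 values
signatures = {
    "a": [3, 7, 1, 4, 9, 2],
    "b": [5, 5, 1, 4, 0, 0],
    "c": [8, 8, 8, 8, 8, 8],
}
buckets = build_buckets(signatures, b=3)
xq_sig = [3, 7, 6, 6, 0, 0]
result = candidates(xq_sig, buckets, b=3)
print(sorted(result))
```

At query time only b dictionary lookups are needed, regardless of how many signatures were indexed. The exact distance computation is then limited to the returned candidates.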

4. Results

How well does this work? One way to check is to see whether pairs with high similarity are all identified as matches.

Chart showing the distribution of candidate-pairs (1s) and non-candidates (0s) against the cosine similarity of pair signatures.

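The chart shows that essentially all of the most similar pairs are matched.

We can also tune the parameter b to balance how many candidates are recalled against how many true matches are missed. The probability that a sample is recalled follows the formula below, where s is the similarity of the pair, b is the number of bands, and r is the number of rows per band:

$$P = 1 - (1 - s^r)^b$$

The curve looks roughly like this:

The larger b is, the more candidates are recalled, but the higher the cost of the subsequent computation.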

Calculated probability P against similarity s for different b values. Note that r will be len(signature) / b (in this case len(signature) == 100).

This curve in effect controls the trade-off between false negatives (FN) and false positives (FP).

Increasing b (shifting left) increases FPs while decreasing FNs.
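The effect of b can be tabulated with the standard banding formula P = 1 - (1 - s^r)^b, where s is the similarity of a pair and r = len(signature) / b. The sketch below assumes a signature length of 100, as in the caption above; the function name candidate_probability is made up.

```python
def candidate_probability(s, r, b):
    """Probability that two signatures with similarity s share at least one of b bands of r rows."""
    return 1 - (1 - s ** r) ** b

signature_length = 100
for b in (10, 20, 25, 50):
    r = signature_length // b
    row = [round(candidate_probability(s, r, b), 3) for s in (0.2, 0.5, 0.8)]
    print(f"b={b:>2}, r={r:>2}: P at s=0.2, 0.5, 0.8 -> {row}")
```

For a fixed signature length, a larger b means shorter bands, which are easier to match. The probabilities rise for every similarity level, so fewer similar pairs are missed and more dissimilar pairs slip through.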

Finally, how is this kind of nearest-neighbor search done in Faiss? As follows:

nbits = d*4  # resolution of bucketed vectors
# initialize index and add vectors
index = faiss.IndexLSH(d, nbits)
index.add(X)
# and search
D, I = index.search(xq, k)

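I have not fully worked out how nbits here maps onto the quantities above. My guess is that nbits plays a role similar to b (proportional to it), so to keep enough accuracy nbits needs to scale with d. When nbits is too large, the computation becomes very slow.

To summarize:

This method can cover the range of performance shown below, but it is only suitable for vectors with dimensionality < 100.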

LSH — a wide range of performances heavily dependent on the parameters set. Good quality results in slower search, and fast search results in worse quality. Poor performance for high-dimensional data. The ‘half-filled’ segments of the bars represent the range in performance encountered while modifying index parameters.

V. A high-accuracy, fast method: Hierarchical Navigable Small Worlds (HNSW)

First, some background on small worlds: through a small number of intermediaries, each person can reach almost anyone else in the world. For example, Facebook had 1.59 billion users in 2016, yet on average a user needed only 3.57 steps to reach any other user (each step is a direct friendship, so only 2.57 intermediate friends are needed to connect them).

HNSW organizes the database vectors into several layers; the search in each layer brings the query one intermediate hop closer to its nearest neighbor.

With HNSW, we break networks into several layers, which are traversed during the search.
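The benefit of long-range links can be seen in a toy greedy search over a graph. This is only a single-layer illustration with a hand-built graph, not HNSW's actual layered construction; greedy_search and the data are made up for the example.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(points, neighbours, entry, q):
    """Walk the graph, always moving to the neighbour closest to q, until no neighbour improves."""
    current = entry
    hops = 0
    while True:
        best = min(neighbours[current], key=lambda n: sq_l2(points[n], q))
        if sq_l2(points[best], q) >= sq_l2(points[current], q):
            return current, hops
        current = best
        hops += 1

# hypothetical toy graph: 10 points on a line, each linked to its immediate neighbours
points = [[float(i)] for i in range(10)]
neighbours = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
short_only = {i: list(v) for i, v in neighbours.items()}
# a few long-range links, loosely playing the role of HNSW's sparse upper layers
for a, b in [(0, 5), (5, 9)]:
    neighbours[a].append(b)
    neighbours[b].append(a)

q = [8.2]
without_long_links = greedy_search(points, short_only, 0, q)
with_long_links = greedy_search(points, neighbours, 0, q)
print(without_long_links, with_long_links)
```

Both searches end at the same nearest point, but the graph with long-range links gets there in far fewer hops. HNSW pursues the same idea by searching sparse layers first and dense layers last.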

The Faiss implementation is as follows:

# set HNSW index parameters
M = 64  # number of connections each vertex will have
ef_search = 32  # depth of layers explored during search
ef_construction = 64  # depth of layers explored during index construction

# initialize index (d == 128)
index = faiss.IndexHNSWFlat(d, M)
# set efConstruction and efSearch parameters
index.hnsw.efConstruction = ef_construction
index.hnsw.efSearch = ef_search
# add data to index
index.add(X)

# search
D, I = index.search(xq, k)

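The three most important parameters are:

M: how many nearest neighbors each vector is connected to when the HNSW graph is built.
efSearch: how many points are examined per layer during search.
efConstruction: how many points are examined per layer while building the graph.

Naturally, the larger these three parameters are, the better the search quality.

Larger M and efSearch make searches slower, while a larger efConstruction makes index construction slower.

HNSW also has a drawback: when M is large, it consumes a lot of memory.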

Index memory usage for different M values on the Sift1M dataset.
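A back-of-the-envelope estimate shows why M drives memory. The formula below is an assumption, not a measured figure: it counts d float32 values per vector plus about 2*M 4-byte neighbor ids at the base layer, and ignores upper-layer links and bookkeeping. The function name is made up.

```python
def hnsw_flat_bytes_per_vector(d, M):
    # assumption: d float32 values plus about 2*M int32 neighbour ids at the base layer;
    # upper-layer links and bookkeeping are ignored
    return 4 * d + 2 * M * 4

n = 1_000_000  # Sift1M-sized dataset
for M in (16, 32, 64):
    mb = n * hnsw_flat_bytes_per_vector(128, M) / 1e6
    print(f"M={M}: roughly {mb:.0f} MB")
```

Under this rough model the raw vectors take a fixed 512 bytes each at d = 128, while the graph links grow linearly with M and match the vector data in size at M = 64.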

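efSearch and efConstruction, however, do not affect memory usage. These two parameters can be tuned to balance search speed and search quality.

To summarize, this method is excellent in both speed and search quality, at the cost of a large index. It is close to the best method currently available for searching high-dimensional vectors.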

HNSW — great search-quality, good search-speed, but substantial index sizes. The ‘half-filled’ segments of the bars represent the range in performance encountered while modifying index parameters.

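VI. Combining these methods in Faiss

In Faiss, different methods can be chained together like building blocks to form a nearest-neighbor search tailored to a particular task. The result is called a composite index.

In general, a composite index consists of these parts:

Vector transform: pre-process the vectors before the index is built.
Coarse quantizer: cluster or layer the database vectors to narrow the search scope.
Fine quantizer: split each vector into smaller parts to reduce memory usage.
Refinement: at the end, recompute distances in the original space and re-rank; other options are available for this step as well.

Each part can use a different method, including the ones covered above (Flat, IVF, HNSW, PQ, LSH) as well as some not mentioned here.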
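Of the four parts, the refinement step is the easiest to sketch in isolation: take the shortlist returned by a compressed index and re-score it with exact distances on the original vectors. The code below is a minimal illustration with made-up data; rerank is a hypothetical helper, not a Faiss function.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rerank(query, candidate_ids, original_vectors, k):
    """Refinement step: re-score a shortlist with exact distances on the uncompressed vectors."""
    ranked = sorted(candidate_ids, key=lambda i: sq_l2(query, original_vectors[i]))
    return ranked[:k]

original_vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
query = [1.9, 2.1]
# hypothetical shortlist from a compressed index, in its (imperfect) approximate order
approx_shortlist = [1, 2, 3]
refined = rerank(query, approx_shortlist, original_vectors, k=2)
print(refined)
```

The shortlist is usually larger than k, so that a true neighbor ranked slightly too low by the compressed distances still has a chance to be promoted.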

Faiss provides these methods as components that can be connected together:

d = xb.shape[1]
m = 32
nbits = 8
nlist = 256

# we initialize our OPQ and coarse+fine quantizer steps separately
opq = faiss.OPQMatrix(d, m)
# d now refers to shape of rotated vectors from OPQ (which are equal)
vecs = faiss.IndexFlatL2(d)
sub_index = faiss.IndexIVFPQ(vecs, d, nlist, m, nbits)
# now we merge the preprocessing, coarse, and fine quantization steps
q = faiss.IndexPreTransform(opq, sub_index)
# we will add all of the previous steps to our final refinement step
index = faiss.IndexRefineFlat(q)

# train the index, and index vectors
index.train(xb)
index.add(xb)

Faiss also provides index_factory for building such an index more quickly:

d = xb.shape[1]
# in index string, m==32, nlist==256, nbits is 8 by default

index = faiss.index_factory(d, "OPQ32,IVF256,PQ32,RFlat")

# train and index vectors
index.train(xb)
index.add(xb)

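Here are a few composite indexes that work well.

1. IVFADC (IVF256,PQ32x8)

IVF256,PQ32 has mostly been covered above; it is essentially the process shown in the figure below.

It also uses a technique called Asymmetric Distance Computation (ADC).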

With symmetric distance computation (SDC, left) we quantize xq before comparing it to our previously quantized xb vectors. ADC (right) skips the quantization of xq and compares it directly to the quantized xb vectors.
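The difference between SDC and ADC can be sketched with a tiny hand-written PQ codebook. This is an illustration under simplifying assumptions, not Faiss's implementation: the codebooks, codes and function names are made up. In practice the distance table would be built once per query and reused for every stored code.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def adc_distance(query, code, codebooks):
    """Asymmetric: the query stays uncompressed; only the database vector is a PQ code."""
    sub_dim = len(query) // len(codebooks)
    # distance table: one row per sub-space, one entry per centroid
    table = [
        [sq_l2(query[j * sub_dim:(j + 1) * sub_dim], centroid) for centroid in codebook]
        for j, codebook in enumerate(codebooks)
    ]
    return sum(table[j][c] for j, c in enumerate(code))

def sdc_distance(query_code, code, codebooks):
    """Symmetric: both sides are replaced by their centroids before comparing."""
    return sum(
        sq_l2(codebooks[j][qc], codebooks[j][c])
        for j, (qc, c) in enumerate(zip(query_code, code))
    )

# hypothetical codebooks: 2 sub-spaces with 2 centroids each
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
xb_code = [1, 0]            # stored vector, already quantized
xq = [0.4, 0.4, 0.0, 1.0]   # raw query
xq_code = [0, 0]            # what quantizing xq with these codebooks gives

adc = adc_distance(xq, xb_code, codebooks)
sdc = sdc_distance(xq_code, xb_code, codebooks)
print(adc, sdc)
```

ADC compares the raw query with the reconstructed database vector, so only one side carries quantization error. SDC rounds the query to its centroids as well, which distorts the distance further in this example.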

Usage in Faiss is as follows:

index = faiss.index_factory(d, "IVF256,PQ32x8")
index.train(xb)
index.add(xb)
index.nprobe = 8
D, I = index.search(xq, k)

This uses 256 IVF cells (Voronoi cells); PQ splits each vector into m = 32 sub-vectors with nbits = 8 (each sub-vector is compressed to an 8-bit centroid ID).

The results are as follows:

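2. Optimized Product Quantization

OPQ is a pre-processing technique: it first rotates the vectors so that, when they are later split into sub-vectors, the values are distributed more evenly across the sub-vectors.

Usage is as follows: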

# we can add pre-processing vector rotation to
# improve distribution for the PQ step using OPQ
index = faiss.index_factory(d, "OPQ32,IVF256,PQ32x8")
index.train(xb)
index.add(xb)

ivf = faiss.extract_index_ivf(index)
ivf.nprobe = 13

D, I = index.search(xq, k)

The 32 after OPQ corresponds to the 32 in PQ: both mean that the vector is split into 32 sub-vectors.

The results for PQ and OPQ are as follows:

Search time (top) and recall (bottom) for various nprobe values. We have included "IVF256,Flat" for comparison. The flat index has much higher memory usage at 520MB.

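3. Multi-D-ADC

The full name is multi-dimensional indexing, alongside a PQ step which produces an asymmetric distance computation at search time.

The coarse quantizer in this method is IMI (inverted multi-index). It is an extension of IVF that does better on both recall and speed, but uses more memory. The idea is similar to IVF: partition the space into many cells, find the nearest cells first, then search inside them. The only difference is that several sets of cells are created, each defined on a different subspace of the vector space.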

Voronoi cells split across multiple vector subspaces. Given a query vector xq, we would compare each xq subvector to its respective subspace cells.
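The way IMI multiplies the number of cells can be sketched as follows. This is a toy illustration with hand-written one-dimensional codebooks; imi_cell is a hypothetical helper. The last line reads the factory string "IMI2x8" as 2 sub-spaces with 8 bits each.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def imi_cell(vector, codebook_left, codebook_right):
    """Cell id = (nearest centroid for the first half, nearest centroid for the second half)."""
    half = len(vector) // 2
    left, right = vector[:half], vector[half:]
    i = min(range(len(codebook_left)), key=lambda c: sq_l2(left, codebook_left[c]))
    j = min(range(len(codebook_right)), key=lambda c: sq_l2(right, codebook_right[c]))
    return i, j

# hypothetical codebooks with 3 centroids per half
codebook_left = [[0.0], [5.0], [10.0]]
codebook_right = [[0.0], [5.0], [10.0]]
cell = imi_cell([4.0, 9.0], codebook_left, codebook_right)

n_cells_toy = len(codebook_left) * len(codebook_right)
n_cells_imi2x8 = (2 ** 8) ** 2  # 2 sub-spaces, 2**8 centroids each
print(cell, n_cells_toy, n_cells_imi2x8)
```

Only 2 x 256 centroids have to be trained and compared against, yet they define 65536 cells. This gives a much finer partition than an IVF index with the same number of centroids.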

Usage in Faiss is as follows:

index = faiss.index_factory(d, "IMI2x8,PQ32")
index.train(xb)  # index construction time is large for IMI
index.add(xb)
imi = faiss.extract_index_ivf(index)  # access nprobe
imi.nprobe = 620
D, I = index.search(xq, k)

The experimental results are as follows:

Search time (top) and recall (bottom) for various nprobe values. We have included "IMI2x8,Flat" for comparison. The flat index has much higher memory usage at 520MB.

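4. HNSW+IVF

We have covered IVF and HNSW separately. Can the two be combined?

Yes, they can!

We build an HNSW graph over the cell centroids found by IVF. Where we previously had to scan all centroids to find the nearest one, HNSW now finds the approximately nearest centroid.

To get the most out of this method, the original "few centroids, large cells" layout should be changed to "many centroids, small cells".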

HNSW can be used to quickly find the approximate nearest neighbor using IVF cell centroids.

Usage is as follows:

index = faiss.index_factory(d, "IVF4096_HNSW32,Flat")
index.train(xb)
index.add(xb)
index.nprobe = 146
D, I = index.search(xq, k)

The results are as follows:

Search time (top) and recall (bottom) for various nprobe values. At the cost of longer search times, we can increase recall by decreasing nlist.

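VII. Summary

On the Sift1M dataset, running on an M1 chip with an 8-core CPU and 8GB of unified memory gives the results below.

Vector dimensionality d = 128, dataset size = 1M, searching for the k = 10 nearest neighbors.

Index	Memory (MB)	Query Time (ms)	Recall	Notes
Flat	~500	18	1.0	Suitable for small datasets where query time is not critical
IVF	~520	1 - 9	0.7 - 0.95	A solution that scales well
LSH	20 - 600	1.7 - 30	0.4 - 0.85	The best option for low-dimensional vectors
HNSW	600 - 1600	0.6 - 2.1	0.5 - 0.95	Suitable when both accuracy and speed matter, at the cost of memory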