Faiss technical notes

Latest recommended article published on 2025-04-08 08:30:20

杨树_

Category: ANN. Tags: faiss

Article link: https://blog.csdn.net/xiaoxu2050/article/details/84982478

Faiss tutorial: Getting started https://www.cnblogs.com/houkai/p/9316129.html

Faiss tutorial: Basics https://www.cnblogs.com/houkai/p/9316136.html

Faiss tutorial: GPU https://www.cnblogs.com/houkai/p/9316176.html

Faiss tutorial: Indexes (1) https://www.cnblogs.com/houkai/p/9316155.html

Source: https://waltyou.github.io/Faiss-In-Project/

1. Build an IDMap index: "IDMap,Flat"

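2. Using a GPU index

import faiss  # requires a GPU build of Faiss

# d (vector dimension), xb (database vectors) and xq (query vectors) are assumed
# to be defined already; xb and xq are float32 arrays of shape (n, d)

# use a single GPU
res = faiss.StandardGpuResources()
# build a CPU Flat index
index_flat = faiss.IndexFlatL2(d)
# copy the CPU index to the GPU; the second argument selects the GPU device
gpu_index_flat = faiss.index_cpu_to_gpu(res, 0, index_flat)

# adding data works the same way as on the CPU
gpu_index_flat.add(xb)
print(gpu_index_flat.ntotal)

# search
k = 4
D, I = gpu_index_flat.search(xq, k)
print(I[:5])   # neighbors of the first 5 queries
print(I[-5:])  # neighbors of the last 5 queries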

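Note: a single GPU resources object (StandardGpuResources) can be shared by several GPU indexes, as long as they do not issue concurrent queries.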

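3. Using multiple GPUs

import faiss  # requires a GPU build of Faiss

# d, xb and xq are assumed to be defined as in the single-GPU example

ngpus = faiss.get_num_gpus()
print("number of GPUs:", ngpus)

cpu_index = faiss.IndexFlatL2(d)
# copy the CPU index to all available GPUs
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)

gpu_index.add(xb)
print(gpu_index.ntotal)

k = 4
D, I = gpu_index.search(xq, k)
print(I[:5])
print(I[-5:])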

# Example for item 1: build an "IDMap,Flat" index and add SIFT descriptors with custom IDs
import faiss
import numpy as np

# USE_GPU, image_list, sift and calc_sift() are defined elsewhere in the source project

# prepare index
dimensions = 128
INDEX_KEY = "IDMap,Flat"
index = faiss.index_factory(dimensions, INDEX_KEY)
if USE_GPU:
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)

image_id = 0
index_dict = {}
for file_name in image_list:
    ret, sift_feature = calc_sift(sift, file_name)
    if ret == 0 and sift_feature.any():
        # record id and path
        index_dict[image_id] = (file_name, sift_feature)
        # every descriptor of this image shares the same ID
        ids_list = np.full(sift_feature.shape[0], image_id, dtype="int64")
        index.add_with_ids(sift_feature, ids_list)
        image_id += 1


Pre and post processing

https://github.com/facebookresearch/faiss/wiki/Pre--and-post-processing

Custom IDs

Data preprocessing

index_factory

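index_factory creates an index from a string made up of three parts: preprocessing, inverted file, and encoding.
Supported preprocessing:

PCA: PCA64 reduces the vectors to 64 dimensions with PCA (implemented by PCAMatrix); PCAR64 applies a random rotation after the PCA.
OPQ: OPQ16 is OPQ preprocessing (implemented by OPQMatrix) intended for a 16-byte PQ encoding. It works well with PQ indexes but makes training slower.

Supported inverted files:

IVF: IVF4096 partitions the data into 4096 cells using an IndexFlatL2 coarse quantizer.
IMI: IMI2x8 uses a multi-index (MultiIndexQuantizer) with 2x8 bits, which yields 2^(2*8) inverted lists.
IDMap: if you do not use an inverted file but need add_with_ids, wrap the index in IndexIDMap to attach IDs.

Supported encodings:

Flat: stores the raw vectors, implemented by IndexFlat or IndexIVFFlat.
PQ: PQ16 encodes each vector in 16 bytes, implemented by IndexPQ or IndexIVFPQ.
PQ8+16: 8 bytes of PQ, plus 16 bytes of PQ applied to the residual error of the first-level quantization, implemented by IndexIVFPQR.

For example:
index = index_factory(128, "OPQ16_64,IMI2x8,PQ8+16") handles 128-dimensional vectors. OPQ preprocesses the data, where 16 is the number of blocks OPQ works on and 64 is the output dimension after OPQ; a multi-index builds 65536 (2^16) inverted lists; the encoding is an 8-byte PQ with a 16-byte refinement used for re-ranking.

OPQ is very effective unless the raw data already has a block-wise structure, as SIFT does.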
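
To make the three-part structure of a factory string concrete, here is a small sketch that groups the components of a string by role and computes the number of inverted lists of an IMI component. The helper names (describe_factory_string, imi_num_lists) are made up for illustration and only know the prefixes listed above; this is not how Faiss itself parses the string.

```python
import re

# Prefixes taken from the lists above; this is NOT Faiss's own parser.
PREPROCESSING = ("PCA", "OPQ")
INVERTED = ("IVF", "IMI", "IDMap")
ENCODING = ("Flat", "PQ")


def describe_factory_string(spec):
    """Group the comma-separated parts of a factory string by role."""
    roles = {"preprocessing": [], "inverted": [], "encoding": []}
    for part in spec.split(","):
        part = part.strip()
        # "OPQ..." is tested before "PQ..." because preprocessing comes first
        if part.startswith(PREPROCESSING):
            roles["preprocessing"].append(part)
        elif part.startswith(INVERTED):
            roles["inverted"].append(part)
        elif part.startswith(ENCODING):
            roles["encoding"].append(part)
        else:
            raise ValueError(f"unrecognized component: {part!r}")
    return roles


def imi_num_lists(component):
    """Number of inverted lists for an 'IMI<n>x<bits>' component: 2^(n*bits)."""
    match = re.fullmatch(r"IMI(\d+)x(\d+)", component)
    if match is None:
        raise ValueError(f"not an IMI component: {component!r}")
    n, bits = int(match.group(1)), int(match.group(2))
    return 2 ** (n * bits)


roles = describe_factory_string("OPQ16_64,IMI2x8,PQ8+16")
print(roles)
print(imi_num_lists(roles["inverted"][0]))  # 2^(2*8) = 65536
```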

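How to choose an index

I. Do you need exact results?

Yes: "Flat"

Only IndexFlatL2 and IndexFlatIP are guaranteed to return exact results. They typically serve as the baseline against which other indexes are evaluated. "Flat" does not compress the vectors and adds no overhead. It does not support add_with_ids, only sequential adds; if you need add_with_ids, use "IDMap,Flat". Flat indexes need no training and have no parameters.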

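II. Is memory a concern?

1. Not at all: "HNSWx"

If the dataset is small or you have plenty of memory, the graph-based HNSW method is the best choice: it is both fast and accurate. x is the number of links per vector and ranges from 4 to 64; larger x gives more accurate results but uses more memory. The efSearch parameter trades speed against accuracy. Each vector takes (d*4 + x*2*4) bytes of memory.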

Supported on GPU: no
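
As a quick sanity check of the memory formula above, the following sketch evaluates (d*4 + x*2*4) bytes per vector and scales it to a whole dataset. The function names and the sample values (d=128, x=32, one million vectors) are chosen here for illustration only.

```python
def hnsw_bytes_per_vector(d, x):
    """Per-vector memory from the formula above: d*4 + x*2*4 bytes."""
    return d * 4 + x * 2 * 4


def hnsw_total_gib(n_vectors, d, x):
    """Approximate total memory in GiB for n_vectors vectors."""
    return n_vectors * hnsw_bytes_per_vector(d, x) / 2**30


print(hnsw_bytes_per_vector(128, 32))  # 128*4 + 32*2*4 = 768
print(f"{hnsw_total_gib(1_000_000, 128, 32):.2f} GiB for 1M vectors")
```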

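2. Somewhat: `"...,Flat"`

"..." means the dataset must first be clustered. After clustering, "Flat" simply assigns the vectors to their buckets without compressing them, so the memory used equals the size of the original data. The nprobe parameter sets how many buckets are probed and trades search speed against accuracy.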

Supported on GPU: yes (but see below, the clustering method must be supported as well)

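3. Quite important: `"PCARx,...,SQ8"`

If storing the full vectors is too expensive, this index applies two operations:

PCA to reduce the dimension, where x is the dimension after reduction
Scalar quantization, which maps each component of the vector to 1 byte

Each stored vector therefore takes x bytes of memory.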

Supported on GPU: no

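4. Very important: `"OPQx_y,...,PQx"`

PQx compresses the vectors with a product quantizer that outputs x-byte codes. x is usually at most 64; larger x is more accurate, and for codes larger than that SQ is usually as accurate and faster.

OPQ is a linear transformation applied to the vectors beforehand to make them easier to compress. y is the output dimension and must satisfy:

y must be a multiple of x
y should preferably be at most d, the dimension of the input vectors
y should preferably be at most 4*x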

Supported on GPU: yes (note: the OPQ transform is done in software, but it is not performance critical)
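
The three rules for y can be checked mechanically before building the index. The sketch below is a hypothetical helper (check_opq_pq is not a Faiss function) that returns the violated rules for a given d, x and y, treating the two "preferably" rules as warnings of the same kind as the hard one.

```python
def check_opq_pq(d, x, y):
    """Return the rules violated by "OPQx_y,...,PQx" for input dimension d."""
    problems = []
    if y % x != 0:
        problems.append("y must be a multiple of x")
    if y > d:
        problems.append("y should preferably be at most d")
    if y > 4 * x:
        problems.append("y should preferably be at most 4*x")
    return problems


# d=128 input vectors, 16-byte PQ codes, OPQ output dimension 64: all rules hold
print(check_opq_pq(d=128, x=16, y=64))
# y=72 is not a multiple of 16 and also exceeds 4*16
print(check_opq_pq(d=128, x=16, y=72))
```

An empty list means the combination follows all three rules.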

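III. How large is the dataset to index?

The answer determines the clustering method (the "..." part above). The dataset is clustered into buckets, and a search visits only a fraction of them. Clustering is usually run on a representative sample of the dataset rather than on all of it; the recommended sample sizes are given below.

Fewer than 1M vectors: `"...,IVFx,..."`

x is the number of centroids and should lie between 4*sqrt(N) and 16*sqrt(N), where N is the dataset size. This index simply clusters the vectors with k-means and needs between 30*x and 256*x training vectors, the more the better. It is supported on GPU.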
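
The two ranges above are easy to turn into numbers. A minimal sketch, with helper names invented for this note, that computes the suggested range for x from N and the suggested training-set size from x:

```python
import math


def ivf_nlist_range(n):
    """Suggested range for x: between 4*sqrt(N) and 16*sqrt(N)."""
    root = math.sqrt(n)
    return math.ceil(4 * root), math.floor(16 * root)


def ivf_training_range(nlist):
    """Suggested number of training vectors: between 30*x and 256*x."""
    return 30 * nlist, 256 * nlist


print(ivf_nlist_range(1_000_000))  # (4000, 16000)
print(ivf_training_range(4096))    # (122880, 1048576)
```

For N = 1M, a value such as x = 4096 falls inside the range, which would give the factory component "IVF4096".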

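1M - 10M: `"...,IMI2x10,..."`

IMI also runs k-means, producing 2^10 centroids, but it clusters the first half and the second half of each vector separately. This raises the number of buckets to 2^(2*10). Training needs roughly 64*2^10 vectors. This index is not supported on GPU.

10M - 100M: `"...,IMI2x12,..."`

Same as above, with 10 replaced by 12.

100M - 1B: `"...,IMI2x14,..."`

Same as above, with 10 replaced by 14.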
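
The size brackets above can be summarized in a small lookup. This is only a sketch: the function names are invented, and assigning the exact boundary values (1M, 10M, 100M) to the larger bracket is an assumption, since the notes do not say which side they belong to.

```python
def clustering_for_size(n):
    """Pick the clustering component of the factory string from the dataset size."""
    if n < 1_000_000:
        return "IVFx"  # x still has to be chosen between 4*sqrt(N) and 16*sqrt(N)
    if n < 10_000_000:
        return "IMI2x10"
    if n < 100_000_000:
        return "IMI2x12"
    if n <= 1_000_000_000:
        return "IMI2x14"
    raise ValueError("sizes above 1B are not covered by these notes")


def imi_stats(bits):
    """Buckets and approximate training-set size for an IMI2x<bits> index."""
    return {"buckets": 2 ** (2 * bits), "training_vectors": 64 * 2 ** bits}


print(clustering_for_size(5_000_000))  # IMI2x10
print(imi_stats(10))  # {'buckets': 1048576, 'training_vectors': 65536}
```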

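FAQ

1. The GPU IVF indexes (the GPU counterparts of IndexIVFFlat and IndexIVFScalarQuantizer) expose an API for changing the number of probed partitions:

index.setNumProbes(128)
nprob = index.getNumProbes()

For an index created through index_factory or loaded with faiss.read_index(), for example 'OPQ64_256,IVF4096,PQ64', how can the same property be changed?

Answer: with code like the following: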

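nprob = 10

# CPU index
faiss.ParameterSpace().set_index_parameter(index, "nprobe", nprob)

# GPU index
faiss.GpuParameterSpace().set_index_parameter(gpu_index, "nprobe", nprob)

# or, when index is a pre-transform wrapper (e.g. built with an OPQ prefix),
# set nprobe directly on the wrapped IVF index
faiss.downcast_index(index.index).nprobe = 123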

2. How to save and load an index?

Faiss saves and loads indexes with the write_index / read_index API:

# create the index
d = 1024
index = faiss.index_factory(d, "IVF1024,SQ8")

# save the index
faiss.write_index(index, 'IVF1024_SQ8.index')

# load the index
new_index = faiss.read_index('IVF1024_SQ8.index')

With a GPU index it takes a little more work:

# create the index
d = 1024
index = faiss.index_factory(d, "IVF1024,SQ8")

# transfer it to the GPU
gpu_id = 0
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, gpu_id, index)

# train the index
# .....

# search
# ....

# before saving, transfer the index back to the CPU
index_cpu = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(index_cpu, 'IVF1024_SQ8.index')

# load the index
new_index = faiss.read_index('IVF1024_SQ8.index')

# to use it on the GPU, transfer the loaded index to the device again
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, gpu_id, new_index)

# search
# ....

3. By default StandardGpuResources uses 18% of the GPU memory. How can the amount be changed?

res = faiss.StandardGpuResources()
res.setTempMemory(512 * 1024 * 1024)  # temporary memory in bytes (512 MiB here)

4. Multithreading with OpenMP

Add the following before searching:

faiss.omp_set_num_threads(8)

The speedup obtained from parallelism is bounded by Amdahl's law. Reference:

https://zh.wikipedia.org/wiki/%E9%98%BF%E5%A7%86%E8%BE%BE%E5%B0%94%E5%AE%9A%E5%BE%8B
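
To see what Amdahl's law implies for the thread count, here is a minimal sketch of the formula speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that runs in parallel and n the number of threads. The value p = 0.9 is an assumed figure for illustration, not a measurement of Faiss.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Theoretical speedup for a workload whose parallel share is parallel_fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)


# assumed: 90% of the search time is parallelizable
for threads in (1, 2, 4, 8, 16):
    print(threads, round(amdahl_speedup(0.9, threads), 2))
```

With p = 0.9 the speedup can never exceed 10x no matter how many threads are used, so raising the argument of omp_set_num_threads gives diminishing returns.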

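5. Does PQ encode each sub-vector into more than one byte?

It depends on the class:

ProductQuantizer supports any code length from nbits=1 to nbits=16. Codes of 1 to 8 bits are stored in 1 byte and codes of 9 to 16 bits in 2 bytes, so memory is wasted unless nbits is exactly 8 or 16.
IndexPQ supports the same range as ProductQuantizer.
IndexIVFPQ supports only 8-bit codes.
MultiIndexQuantizer supports codes of up to 16 bits.

In 'IVF1000,PQ8', PQ8 means each original vector is compressed to 8 bytes. With the PQx notation, every sub-vector is encoded in 8 bits, i.e. one byte.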
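
The storage rule described above (1 byte for 1 to 8 bits, 2 bytes for 9 to 16 bits) can be written down directly. This sketch only mirrors the rule as stated in these notes; the function names are invented and other Faiss versions may pack codes differently.

```python
def bytes_per_subcode(nbits):
    """Bytes used to store one sub-vector code of nbits bits."""
    if not 1 <= nbits <= 16:
        raise ValueError("nbits must be between 1 and 16")
    return 1 if nbits <= 8 else 2


def wasted_bits(nbits):
    """Unused bits per sub-vector code under the rule above."""
    return bytes_per_subcode(nbits) * 8 - nbits


def pq_code_bytes(m, nbits=8):
    """Total code size of a PQ with m sub-quantizers."""
    return m * bytes_per_subcode(nbits)


print(pq_code_bytes(8))   # "PQ8": 8 sub-vectors * 1 byte = 8
print(wasted_bits(8))     # 0
print(wasted_bits(10))    # 6
```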

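6. With d = 1000, the index 'IVF1000,PQ16' fails with "RuntimeError: Error in void faiss::ProductQuantizer::set_derived_values() at ProductQuantizer.cpp:164: Error: 'd % M == 0' failed". Why?

The vector dimension d must be divisible by M, the number of PQ sub-quantizers. Here M = 16 and 1000 % 16 = 8, so the check fails.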
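
The failing condition in the message is 'd % M == 0', so one way to pick a workable PQ size is to list the values of M that divide d. A minimal sketch (valid_pq_sizes is a helper invented here; it checks only this divisibility condition, not any other constraint an index type may impose):

```python
def valid_pq_sizes(d):
    """All M for which d % M == 0, i.e. PQ sizes that pass this particular check."""
    return [m for m in range(1, d + 1) if d % m == 0]


d = 1000
print(d % 16)  # 8, so "PQ16" fails the d % M == 0 check
print(valid_pq_sizes(d))
```

For d = 1000 the list contains 8, 10, 20 and 25 among others, but not 16.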

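7. With d = 1000, the index "IVF1024,PQ250" fails with the following error:

RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() const at GpuIndexIVFPQ.cu:438: Error: 'IVFPQ::isSupportedPQCodeLength(subQuantizers_)' failed: Number of bytes per encoded vector / sub-quantizers (250) is not supported

Despite the "/", the message does not describe a ratio: "number of bytes per encoded vector" and "sub-quantizers" name the same value, 250. The GPU IVFPQ implementation accepts only a fixed set of sub-quantizer counts, and 250 is not one of them.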

8. "OPQ16_512,IVF1024,PQ64" fails with the following error:

RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() const at GpuIndexIVFPQ.cu:462: Error: 'requiredSmemSize <= getMaxSharedMemPerBlock(device_)' failed: Device 0 has 49152 bytes of shared memory, while 8 bits per code and 64 sub-quantizers requires 65536 bytes. Consider useFloat16LookupTables and/or reduce parameters
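
The numbers in this message are consistent with a lookup table of (bytes per entry) * (sub-quantizers) * 2^(bits per code) bytes: 4 * 64 * 256 = 65536 with float32 entries. The sketch below assumes that formula, which is inferred from the message rather than taken from the Faiss documentation, and shows why the float16 lookup tables suggested by the message, or fewer sub-quantizers, would fit into 49152 bytes.

```python
SHARED_MEM_BYTES = 49152  # per-block shared memory reported in the error message


def pq_lookup_table_bytes(sub_quantizers, bits_per_code=8, float16=False):
    """Assumed size of the PQ lookup table: entry size * M * 2^bits."""
    bytes_per_entry = 2 if float16 else 4
    return bytes_per_entry * sub_quantizers * 2 ** bits_per_code


for m, use_f16 in ((64, False), (64, True), (32, False)):
    need = pq_lookup_table_bytes(m, float16=use_f16)
    print(f"PQ{m} float16={use_f16}: {need} bytes, fits={need <= SHARED_MEM_BYTES}")
```

Under this assumption PQ64 with float32 tables needs 65536 bytes and does not fit, while PQ64 with float16 tables or PQ32 with float32 tables needs 32768 bytes and does.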