faiss-3: 基础模块

faiss中的索引基于几个基础算法构建,只不过在faiss中是一种高效的实现。分别是

  • k-means聚类
  • PCA降维
  • PQ编码
  • 解码

k-means聚类

关于k-means聚类算法在之前的博客《OpenCV中的Kmeans聚类》中有介绍过,是一种无监督算法,自动的基于设定的聚类中心数进行聚类。

# 导入faiss
import sys
import faiss
import numpy as np 

# 构建数据
d = 512  # 维数
n_data = 2000   
np.random.seed(0) 
data = []
mu = 3
sigma = 0.1
for i in range(n_data):
    data.append(np.random.normal(mu, sigma, d))

data = np.array(data).astype('float32')

# 聚类
ncentroids = 100  # 设定聚类中心100个
niter = 20
verbose = True
d = data.shape[1]
kmeans = faiss.Kmeans(d, ncentroids, niter=niter, verbose=verbose)
kmeans.train(data)

# 输出聚类中心
print(kmeans.centroids)
print(kmeans.centroids.shape)  # 聚类中心维度
[[3.0195723 3.0102541 3.0154822 ... 3.0012543 2.983653  2.9833684]
 [2.963544  2.9899907 2.9942687 ... 2.989136  2.9432213 2.9964879]
 [3.038704  2.9829412 3.0110557 ... 2.9955654 3.0334764 3.0621648]
 ...
 [3.1335988 3.0283628 3.0185556 ... 2.9817123 2.8893135 2.846763 ]
 [3.0277164 2.9488263 2.9712245 ... 3.046851  3.0523338 2.9560134]
 [2.9731872 2.9688954 3.0551343 ... 3.0077536 3.038112  3.0422602]]
(100, 512)

结果保存在kmeans对象中:

  • kmeans.centroids:聚类中心
  • kmeans.obj:迭代时候目标函数的值(总方差)

更多参数选择:

  • nredo:运行此次数的群集,并保持最佳质心(根据聚类目标选择)

  • verbose:使聚类更详细

  • spherical:执行球形K-means,每次迭代后,质心为L2标准化

  • INT_CORDROID:圆形心质坐标取整

  • update_index:每次迭代后重新训练索引?

  • min_points_per_centroid / max_points_per_centroid:以上训练集是分支的

  • seed:随机数发生器的种子

计算某个向量属于哪一个子类,返回聚类中心次序和L2距离

D, I = kmeans.index.search(data[:5], 1)
print(D)
print(I)
[[5.027096 ]
 [4.712036 ]
 [4.7764983]
 [4.5557623]
 [4.403679 ]]
 
[[61]
 [44]
 [61]
 [49]
 [83]]
# 返回距离每个聚类中心最近的5个向量
index = faiss.IndexFlatL2 (d)
index.add(data)
D, I = index.search (kmeans.centroids, 5)
print(D)
print(I)
[[3.8203125e+00 4.3466797e+00 4.3652344e+00 4.3691406e+00 4.3789062e+00]
 [3.6269531e+00 4.0800781e+00 4.1953125e+00 4.2304688e+00 4.2705078e+00]
 [3.7167969e+00 4.0781250e+00 4.1015625e+00 4.1796875e+00 4.2685547e+00]
 [3.4667969e+00 3.8759766e+00 4.1923828e+00 4.4365234e+00 4.5742188e+00]
 [0.0000000e+00 9.0585938e+00 9.0957031e+00 9.1376953e+00 9.1484375e+00]
 [3.7412109e+00 4.3515625e+00 4.4199219e+00 4.5517578e+00 4.5849609e+00]
 [3.8154297e+00 4.2207031e+00 4.4433594e+00 4.5625000e+00 4.5986328e+00]
 [3.5927734e+00 3.9169922e+00 4.3535156e+00 4.4375000e+00 4.4453125e+00]
 [3.9101562e+00 4.3496094e+00 4.3554688e+00 4.4189453e+00 4.4384766e+00]
 [0.0000000e+00 8.6308594e+00 8.8496094e+00 8.9980469e+00 9.0107422e+00]
 [3.8427734e+00 4.2568359e+00 4.3339844e+00 4.3779297e+00 4.3789062e+00]
 [3.5175781e+00 4.0654297e+00 4.1015625e+00 4.2148438e+00 4.2558594e+00]
 [2.3457031e+00 2.3486328e+00 6.8046875e+00 6.8427734e+00 6.9658203e+00]
 [9.7656250e-04 8.8378906e+00 8.9287109e+00 8.9941406e+00 9.0283203e+00]
 [3.7519531e+00 4.3740234e+00 4.5185547e+00 4.5371094e+00 4.6191406e+00]
 [0.0000000e+00 8.5371094e+00 8.6289062e+00 8.6933594e+00 8.9248047e+00]
 [3.5009766e+00 4.2011719e+00 4.2431641e+00 4.3730469e+00 4.4199219e+00]
 [3.3779297e+00 3.6240234e+00 3.6523438e+00 3.7822266e+00 3.9042969e+00]
 [3.7656250e+00 4.2851562e+00 4.3125000e+00 4.5205078e+00 4.5849609e+00]
 [3.7421875e+00 4.2714844e+00 4.4677734e+00 4.5693359e+00 4.5761719e+00]
 [3.8076172e+00 4.3037109e+00 4.3144531e+00 4.3457031e+00 4.3574219e+00]
 [0.0000000e+00 8.8740234e+00 8.9199219e+00 8.9599609e+00 8.9638672e+00]
 [3.3349609e+00 3.7226562e+00 3.7910156e+00 4.0488281e+00 4.3642578e+00]
 [2.9335938e+00 3.1669922e+00 3.4218750e+00 5.9628906e+00 6.0009766e+00]
 [2.2109375e+00 2.2128906e+00 6.5175781e+00 6.7275391e+00 6.8320312e+00]
 [2.0791016e+00 2.0800781e+00 6.6953125e+00 6.7089844e+00 6.7255859e+00]
 [0.0000000e+00 8.6171875e+00 8.6777344e+00 9.0263672e+00 9.0664062e+00]
 [9.7656250e-04 8.8359375e+00 8.8808594e+00 8.9208984e+00 8.9423828e+00]
 [2.4062500e+00 2.4082031e+00 6.8105469e+00 6.9472656e+00 7.0068359e+00]
 [9.7656250e-04 8.6279297e+00 8.7968750e+00 8.9531250e+00 8.9550781e+00]
 [0.0000000e+00 8.9794922e+00 9.0263672e+00 9.0439453e+00 9.0703125e+00]
 [3.2343750e+00 3.5175781e+00 3.5957031e+00 3.9169922e+00 5.3691406e+00]
 [2.2929688e+00 2.2949219e+00 6.6669922e+00 6.6992188e+00 6.7089844e+00]
 [3.8671875e+00 4.3222656e+00 4.3359375e+00 4.3564453e+00 4.4472656e+00]
 [3.7900391e+00 4.3183594e+00 4.3662109e+00 4.3964844e+00 4.5839844e+00]
 [3.7949219e+00 4.3251953e+00 4.3828125e+00 4.3964844e+00 4.4072266e+00]
 [0.0000000e+00 8.5292969e+00 8.8564453e+00 8.8642578e+00 8.8750000e+00]
 [2.8613281e+00 3.0986328e+00 3.3378906e+00 6.0507812e+00 6.0644531e+00]
 [2.9062500e+00 2.9882812e+00 3.3222656e+00 5.8398438e+00 6.0195312e+00]
 [3.7910156e+00 4.1953125e+00 4.2392578e+00 4.3154297e+00 4.3330078e+00]
 [3.8828125e+00 4.2753906e+00 4.3330078e+00 4.4433594e+00 4.4492188e+00]
 [3.4697266e+00 4.0683594e+00 4.3691406e+00 4.4130859e+00 4.4150391e+00]
 [3.8027344e+00 4.3623047e+00 4.4492188e+00 4.4511719e+00 4.5117188e+00]
 [3.8144531e+00 4.4375000e+00 4.5009766e+00 4.5097656e+00 4.5244141e+00]
 [3.7324219e+00 4.3505859e+00 4.3671875e+00 4.3886719e+00 4.4052734e+00]
 [3.5830078e+00 4.1464844e+00 4.2402344e+00 4.2773438e+00 4.4667969e+00]
 [3.3808594e+00 3.9414062e+00 4.0000000e+00 4.1132812e+00 4.1464844e+00]
 [2.9091797e+00 3.1630859e+00 3.4140625e+00 5.7685547e+00 5.8164062e+00]
 [0.0000000e+00 9.0380859e+00 9.0937500e+00 9.0957031e+00 9.1191406e+00]
 [3.6074219e+00 4.2148438e+00 4.2587891e+00 4.3574219e+00 4.3632812e+00]
 [9.7656250e-04 8.8525391e+00 8.9111328e+00 9.0576172e+00 9.1298828e+00]
 [2.3740234e+00 2.3750000e+00 6.7568359e+00 6.9619141e+00 7.0683594e+00]
 [2.2177734e+00 2.2197266e+00 6.6367188e+00 6.7744141e+00 6.8906250e+00]
 [1.9531250e-03 8.6640625e+00 9.1230469e+00 9.1894531e+00 9.2167969e+00]
 [0.0000000e+00 8.7958984e+00 9.0039062e+00 9.0644531e+00 9.0917969e+00]
 [2.8320312e+00 3.1093750e+00 3.2480469e+00 5.9785156e+00 6.0566406e+00]
 [3.8183594e+00 4.3847656e+00 4.4121094e+00 4.4179688e+00 4.4355469e+00]
 [2.4335938e+00 2.4355469e+00 6.8212891e+00 6.8222656e+00 6.8789062e+00]
 [0.0000000e+00 8.7382812e+00 8.8437500e+00 8.9365234e+00 9.0361328e+00]
 [3.8437500e+00 4.3125000e+00 4.3320312e+00 4.3818359e+00 4.4121094e+00]
 [3.3281250e+00 3.5781250e+00 3.9941406e+00 4.0322266e+00 4.4687500e+00]
 [3.7285156e+00 4.1611328e+00 4.2333984e+00 4.2851562e+00 4.3378906e+00]
 [3.7382812e+00 4.1406250e+00 4.3603516e+00 4.4218750e+00 4.4697266e+00]
 [2.2978516e+00 2.2998047e+00 6.7763672e+00 6.7939453e+00 6.7998047e+00]
 [3.7988281e+00 4.3564453e+00 4.4707031e+00 4.5048828e+00 4.5126953e+00]
 [3.7900391e+00 4.3642578e+00 4.4472656e+00 4.4902344e+00 4.4931641e+00]
 [2.1865234e+00 2.1894531e+00 6.5410156e+00 6.6201172e+00 6.7519531e+00]
 [2.2285156e+00 2.2285156e+00 6.6171875e+00 6.6210938e+00 6.6757812e+00]
 [3.8369141e+00 4.2773438e+00 4.3076172e+00 4.4609375e+00 4.4980469e+00]
 [2.9296875e-03 8.6015625e+00 8.6787109e+00 8.8613281e+00 8.9052734e+00]
 [3.8437500e+00 4.3750000e+00 4.4257812e+00 4.4609375e+00 4.4843750e+00]
 [3.8623047e+00 4.2587891e+00 4.2812500e+00 4.3554688e+00 4.3710938e+00]
 [3.6523438e+00 3.9287109e+00 3.9990234e+00 4.1103516e+00 4.1933594e+00]
 [0.0000000e+00 8.9931641e+00 9.1396484e+00 9.1943359e+00 9.2109375e+00]
 [2.2480469e+00 2.2480469e+00 6.8378906e+00 6.9013672e+00 6.9023438e+00]
 [3.7441406e+00 4.4804688e+00 4.5458984e+00 4.5761719e+00 4.5996094e+00]
 [3.6289062e+00 4.0683594e+00 4.3623047e+00 4.4296875e+00 4.4677734e+00]
 [3.7363281e+00 4.3417969e+00 4.3515625e+00 4.4511719e+00 4.4960938e+00]
 [0.0000000e+00 8.7128906e+00 8.9687500e+00 9.0869141e+00 9.1064453e+00]
 [3.4726562e+00 3.6914062e+00 4.0546875e+00 4.0927734e+00 4.4355469e+00]
 [3.5927734e+00 4.3212891e+00 4.3271484e+00 4.4785156e+00 4.5058594e+00]
 [3.1914062e+00 3.4531250e+00 3.5703125e+00 3.5722656e+00 5.5263672e+00]
 [3.7792969e+00 4.2929688e+00 4.3574219e+00 4.3710938e+00 4.3955078e+00]
 [3.7207031e+00 4.0117188e+00 4.1591797e+00 4.4082031e+00 4.4091797e+00]
 [2.3359375e+00 2.3359375e+00 6.6621094e+00 6.9716797e+00 7.1054688e+00]
 [3.6054688e+00 4.1835938e+00 4.1933594e+00 4.3076172e+00 4.3876953e+00]
 [3.0058594e+00 3.3808594e+00 3.4121094e+00 5.8144531e+00 5.9804688e+00]
 [3.6806641e+00 4.0449219e+00 4.0625000e+00 4.2500000e+00 4.3759766e+00]
 [2.2041016e+00 2.2070312e+00 6.7275391e+00 7.0244141e+00 7.0302734e+00]
 [2.8291016e+00 3.2236328e+00 3.3925781e+00 5.7548828e+00 5.9121094e+00]
 [3.8447266e+00 4.2128906e+00 4.3095703e+00 4.3681641e+00 4.3886719e+00]
 [3.0585938e+00 3.7578125e+00 3.8242188e+00 3.8652344e+00 5.5468750e+00]
 [3.5449219e+00 3.8789062e+00 4.0400391e+00 4.1953125e+00 4.2910156e+00]
 [3.5341797e+00 4.1005859e+00 4.3300781e+00 4.3359375e+00 4.3554688e+00]
 [3.6171875e+00 4.1240234e+00 4.1435547e+00 4.3378906e+00 4.3505859e+00]
 [3.7167969e+00 4.3369141e+00 4.4003906e+00 4.4765625e+00 4.5156250e+00]
 [3.7050781e+00 4.2529297e+00 4.2763672e+00 4.3886719e+00 4.4355469e+00]
 [3.2031250e+00 3.4453125e+00 3.5498047e+00 3.7802734e+00 5.4550781e+00]
 [3.3593750e+00 3.8710938e+00 4.1171875e+00 4.1445312e+00 4.5146484e+00]
 [3.6142578e+00 3.8007812e+00 4.0078125e+00 4.2519531e+00 4.3789062e+00]]
[[1083  277 1624 1555  981]
 [1411  198  620  414 1664]
 [ 140   28  317 1915 1597]
 [1704 1252 1745 1906  126]
 [ 867 1055  308 1483  788]
 [1882 1937 1568  981 1695]
 [1885 1826  481   37  779]
 [1725 1417   19 1356 1785]
 [1801  981 1624  145 1524]
 [1011  835  700  650  148]
 [1496  650  681 1624  518]
 [ 780  545  881   61  101]
 [1428 1042 1489  389  511]
 [ 454  259  329 1336  444]
 [1273 1727  106  981 1438]
 [1027 1686  106  121  835]
 [1374  287 1825  637   35]
 [1342  288 1146 1958 1348]
 [ 528 1678 1167  592  145]
 [ 440  981  337  402 1168]
 [1174  773 1686  981  519]
 [ 294   30  685  655  133]
 [ 607 1512  623 1262 1960]
 [1191  668 1140 1100  724]
 [1854 1851  121  681  276]
 [ 781 1927  704  610 1417]
 [1708  896 1707 1172 1331]
 [ 908 1558 1116  519 1686]
 [  88 1848 1624  277  773]
 [ 111  981 1417 1666  464]
 [1961 1116 1664 1185  228]
 [ 368 1905  890 1642  604]
 [1607 1376  163   48 1893]
 [1653  981  427 1523  655]
 [1087 1353   31 1120  832]
 [1800  981  413 1624  106]
 [1347  182  740 1987 1618]
 [  58 1404  105  981  604]
 [1455   81  170  198  685]
 [ 685  277  981  145  879]
 [1850 1334  981  277 1886]
 [1758  922 1600 1506 1192]
 [ 269  981 1483 1162  277]
 [1125 1624  106 1907 1401]
 [1504  511  981  106   28]
 [  62 1377  222 1839  128]
 [1205  423  939  862 1824]
 [1124  216 1104  145 1353]
 [ 435  586 1660  456  269]
 [ 740 1181 1578 1743  155]
 [1809 1571  608  497 1956]
 [1050 1203 1624 1536 1666]
 [1171  307  156  906  790]
 [ 658 1241 1337  804 1540]
 [ 425  629 1318  277  427]
 [ 783 1449  419  121  650]
 [  66  981 1459  277  315]
 [1856 1114 1055  940 1558]
 [  87 1374  916 1707 1057]
 [ 712  981  182  277  106]
 [1574  329  136  197  971]
 [ 879  981 1624  277  106]
 [ 568  389 1241  981  560]
 [1528 1697 1008  725 1523]
 [1255  835 1384  981 1115]
 [1912  981  548   94  277]
 [1004 1048 1558  574  182]
 [1299 1418 1624  121 1470]
 [1070 1384  981  145 1624]
 [ 514  879 1949 1540  329]
 [1876  981  277 1459  650]
 [ 627 1008  981  963  529]
 [1699 1132 1158 1431 1837]
 [1843  902  402 1575 1304]
 [ 340 1190  842  504 1558]
 [ 453  457  648 1079  234]
 [ 344  410   18  470 1877]
 [ 124  277  981 1624 1555]
 [ 275 1237 1381 1439  875]
 [ 793 1249  142  575  664]
 [ 711  952 1751 1229  619]
 [1722 1141 1672  917 1654]
 [1540 1707  277  879  103]
 [1176  574 1459    4  256]
 [ 584 1092  829  681  291]
 [ 687  874 1644  646  535]
 [1577 1715 1239 1623  259]
 [1897 1130 1179 1553 1587]
 [ 479  841  122   28 1561]
 [1921 1343  109  518 1686]
 [  78  981  277  896  145]
 [1736  229 1453 1157  542]
 [1858  610  616 1144    9]
 [ 741 1895 1400  131  836]
 [1774  586 1470   56  172]
 [ 249 1554  277  613  981]
 [ 159 1970  255  605  471]
 [1392  909  228  104  121]
 [ 666  540  403 1804  372]
 [ 244   99 1705 1331  231]]

PCA降维

mat = faiss.PCAMatrix (512, 64)  # 从512维降为64维
mat.train(data)
print(data.shape)
assert mat.is_trained
tr = mat.apply_py(data)
print(tr.shape)
(2000, 512)
(2000, 64)

可以看到维度从512维度降低到64维度

PQ编码/解码

d = 512  # 数据维度
cs = 4  # code size (bytes)

# 训练数据集
xt = data

# dataset to encode (could be same as train)
x = data

pq = faiss.ProductQuantizer(d, cs, 8)
pq.train(xt)  # 训练

# encode编码 
codes = pq.compute_codes(x)

# decode解码
x2 = pq.decode(codes)

# 编码-解码后与原始数据的差
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()
print(avg_relative_error)
0.0008764966

标量量化器(scalar quantizer)与之类似

d = 512  # 数据维度

# 训练集
xt = data

# dataset to encode (could be same as train)
x = data

# QT_8bit allocates 8 bits per dimension (QT_4bit also works)
sq = faiss.ScalarQuantizer(d, faiss.ScalarQuantizer.QT_8bit)
sq.train(xt)

# encode 编码
codes = sq.compute_codes(x)

# decode 解码
x2 = sq.decode(codes)

# 计算编码-解码后与原始数据的差
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()
print(avg_relative_error)
6.7287445e-08

参考

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
Go语言(也称为Golang)是由Google开发的一种静态强类型、编译型的编程语言。它旨在成为一门简单、高效、安全和并发的编程语言,特别适用于构建高性能的服务器和分布式系统。以下是Go语言的一些主要特点和优势: 简洁性:Go语言的语法简单直观,易于学习和使用。它避免了复杂的语法特性,如继承、重载等,转而采用组合和接口来实现代码的复用和扩展。 高性能:Go语言具有出色的性能,可以媲美C和C++。它使用静态类型系统和编译型语言的优势,能够生成高效的机器码。 并发性:Go语言内置了对并发的支持,通过轻量级的goroutine和channel机制,可以轻松实现并发编程。这使得Go语言在构建高性能的服务器和分布式系统时具有天然的优势。 安全性:Go语言具有强大的类型系统和内存管理机制,能够减少运行时错误和内存泄漏等问题。它还支持编译时检查,可以在编译阶段就发现潜在的问题。 标准库:Go语言的标准库非常丰富,包含了大量的实用功能和工具,如网络编程、文件操作、加密解密等。这使得开发者可以更加专注于业务逻辑的实现,而无需花费太多时间在底层功能的实现上。 跨平台:Go语言支持多种操作系统和平台,包括Windows、Linux、macOS等。它使用统一的构建系统(如Go Modules),可以轻松地跨平台编译和运行代码。 开源和社区支持:Go语言是开源的,具有庞大的社区支持和丰富的资源。开发者可以通过社区获取帮助、分享经验和学习资料。 总之,Go语言是一种简单、高效、安全、并发的编程语言,特别适用于构建高性能的服务器和分布式系统。如果你正在寻找一种易于学习和使用的编程语言,并且需要处理大量的并发请求和数据,那么Go语言可能是一个不错的选择。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

uncle_ll

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值