原理
Product quantization,国内有人直译为乘积量化,这里的乘积是指笛卡尔积(Cartesian product),意思是指把原来的向量空间分解为若干个低维向量空间的笛卡尔积,并对分解得到的低维向量空间分别做量化(quantization)。这样每个向量就能由多个低维空间的量化code组合表示。算法如下图所示。
PQ算法把D维向量分成m组, 每组进行Kmeans聚类算法.
- m组子向量的Kmeans算法可以并行求解
2)可以将D维的特征压缩成m维,压缩率D/M
代码
训练
def fit(self, vecs, iter=20, seed=123):
"""Given training vectors, run k-means for each sub-space and create
codewords for each sub-space.
This function should be run once first of all.
Args:
vecs (np.ndarray): Training vectors with shape=(N, D) and dtype=np.float32.
iter (int): The number of iteration for k-means
seed (int): The seed for random process
Returns:
object: self
"""
assert vecs.dtype == np.float32
assert vecs.ndim == 2
N, D = vecs.shape
assert self.Ks < N, "the number of training vector should be more than Ks"
assert D % self.M == 0, "input dimension must be dividable by M"
self.Ds = int(D / self.M)
np.random.seed