Dimensionality Reduction by Learning an Invariant Mapping
R. Hadsell, S. Chopra, Y. Lecun, Dimensionality Reduction by Learning an Invariant Mapping, CVPR (2006)
Abstract
Dimensionality reduction: mapping a set of high-dimensional input points onto a low-dimensional manifold so that "similar" points in input space are mapped to nearby points on the manifold.
Drawbacks of existing methods: (1) most of them depend on a meaningful and computable distance metric in the input space; (2) they do not compute a "function" that can accurately map new input samples whose relationship to the training data is unknown.
This paper proposes Dimensionality Reduction by Learning an Invariant Mapping (DrLIM): learning a globally coherent non-linear function that maps the data evenly to the output manifold. The learning relies solely on neighborhood relationships and does not require any distance measure in the input space.
1 Introduction
Locally Linear Embedding (LLE): represents each input vector as a linear combination of its neighbors; it cannot handle data whose relationship to the training samples is unknown.
Out-of-sample extensions: provide a consistent embedding for new samples, but only under the assumption that there exists a computable kernel function that can be used to generate the neighborhood matrix.
Moreover, in the output space these methods tend to cluster samples too densely, producing degenerate solutions; what is needed instead is a manifold that the samples cover evenly.
Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) learns a globally coherent non-linear function that maps the data to the output manifold:

- only neighborhood relationships between training samples are required;
- the mapping is invariant to complicated non-linear transformations of the inputs such as lighting changes and geometric distortions;
- it can map new samples not seen during training, with no prior knowledge;
- the mapping generated by the function is in some sense "smooth" and coherent in the output space.
Contrastive loss: by learning the parameters $\mathbf{W}$ of a mapping function $G_{\mathbf{W}}$, neighboring samples in the original high-dimensional space are pulled together on the low-dimensional manifold, while non-neighbors are pushed apart. The distance metric on the low-dimensional manifold is the Euclidean distance:
$$D_\mathbf{W}(\mathbf{x}_1, \mathbf{x}_2) = \left\| G_\mathbf{W}(\mathbf{x}_1) - G_\mathbf{W}(\mathbf{x}_2) \right\|_2$$
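The distance computation above can be sketched in a few lines. Here $G_{\mathbf{W}}$ is stood in for by a fixed random linear projection, purely to make the computation concrete; in the paper $G_{\mathbf{W}}$ is a learned, generally non-linear network.

```python
import numpy as np

def g_w(W, x):
    """Toy stand-in for G_W: map input x to the low-dimensional output space.

    A linear projection is an illustrative assumption, not the paper's model.
    """
    return W @ x

def d_w(W, x1, x2):
    """Euclidean distance D_W(x1, x2) between the two mapped points."""
    return np.linalg.norm(g_w(W, x1) - g_w(W, x2))

# Example: project from D = 4 down to d = 2.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))
x1 = rng.standard_normal(4)
x2 = rng.standard_normal(4)

print(d_w(W, x1, x2))  # a non-negative scalar
print(d_w(W, x1, x1))  # 0.0: identical inputs map to the same point
```

Because the distance is computed in the output space, it is well defined for any pair of inputs, including ones never seen during training.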
Given a set of neighborhood relationships, $D_\mathbf{W}(\mathbf{x}_1, \mathbf{x}_2)$ approximates the "semantic similarity" of the inputs in input space.
1.1 Previous Work
Linear embeddings: Principal Component Analysis (PCA), Multi-Dimensional Scaling (MDS).
Non-linear spectral methods: ISOMAP, Locally Linear Embedding (LLE), Laplacian Eigenmaps. These methods share three steps: (1) identify a list of neighbors for each sample; (2) construct a Gram matrix; (3) solve an eigenvalue problem on that matrix.
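The three-step recipe can be sketched with a minimal Laplacian-Eigenmaps-style embedding. The toy data, the neighbor count `k = 2`, and the target dimension `d = 1` are illustrative assumptions; real spectral methods differ in how they weight the affinity matrix and which eigenproblem they solve.

```python
import numpy as np

def spectral_embed(X, k=2, d=1):
    """Minimal spectral embedding: neighbors -> affinity matrix -> eigenvectors."""
    n = len(X)
    # (1) k-nearest-neighbor lists from pairwise Euclidean distances
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip self (column 0)
    # (2) symmetric 0/1 affinity matrix over the neighbor graph
    A = np.zeros((n, n))
    for i, nb in enumerate(neighbors):
        A[i, nb] = 1.0
    A = np.maximum(A, A.T)
    # (3) eigendecomposition of the graph Laplacian L = Deg - A;
    # the smallest non-trivial eigenvectors give the embedding coordinates
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:d + 1]         # drop the trivial constant eigenvector

# Two well-separated clusters on a line: samples in the same cluster
# receive (near-)identical 1-D coordinates.
X = np.array([[0.0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]])
Y = spectral_embed(X)
```

Note the limitation the paper points out: the embedding exists only for the training samples; embedding a new point requires re-solving the eigenproblem or an out-of-sample extension.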
Kernel PCA
2 Learning the Low Dimensional Mapping
Problem statement: given neighborhood relationships between samples in the input space, find a function that maps high-dimensional input patterns to lower-dimensional outputs.
Input vector set: $\mathcal{I} = \{\mathbf{x}_1, \cdots, \mathbf{x}_P\}$, $\mathbf{x}_i \in \mathbb{R}^D, \forall i = 1, 2, \cdots, P$;
Parametric mapping: $G_{\mathbf{W}} : \mathbb{R}^D \rightarrow \mathbb{R}^d$, $d \ll D$, satisfying:
(1) the distance metric in the output space approximates the neighborhood relationships in the input space;
(2) the mapping is invariant to complicated transformations of the input samples;
(3) the mapping is faithful even for samples whose neighborhood relationships are unknown.
2.1 The Contrastive Loss Function
High-dimensional training set: $\mathcal{I} = \{\mathbf{x}_i\}$; for each sample $\mathbf{x}_i \in \mathcal{I}$, $\mathcal{S}_{\mathbf{x}_i}$ denotes the set of samples similar to $\mathbf{x}_i$; the pair label $y = 0$ indicates that $\mathbf{x}_1$ and $\mathbf{x}_2$ are deemed similar, and $y = 1$ that they are dissimilar.
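The contrastive loss the paper goes on to define penalizes similar pairs ($y = 0$) by their squared distance and dissimilar pairs ($y = 1$) only when they fall inside a margin $m$. A minimal sketch, with $m = 1.0$ as an illustrative choice:

```python
def contrastive_loss(d, y, m=1.0):
    """Contrastive loss for one pair.

    d: distance D_W(x1, x2) in the output space
    y: 0 for a similar pair, 1 for a dissimilar pair
    m: margin beyond which dissimilar pairs contribute no loss
    """
    similar_term = (1 - y) * 0.5 * d ** 2                # pull similar pairs together
    dissimilar_term = y * 0.5 * max(0.0, m - d) ** 2     # push dissimilar pairs out to the margin
    return similar_term + dissimilar_term

print(contrastive_loss(2.0, 0))  # 2.0   -- similar pair far apart: large loss
print(contrastive_loss(2.0, 1))  # 0.0   -- dissimilar pair beyond the margin: no loss
print(contrastive_loss(0.5, 1))  # 0.125 -- dissimilar pair inside the margin: penalized
```

The margin is what prevents the degenerate solution mentioned in the introduction: without the dissimilar term, mapping every input to a single point would drive the loss to zero.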