Nonlinear Dimensionality Reduction by Locally Linear Embedding

Latest recommended article published on 2020-08-20 08:33:38

大数据机器学习实验室 (Big Data Machine Learning Lab)

Article link: https://blog.csdn.net/diaokui2312/article/details/107869295

Abstract:
Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.
Main text:
How do we judge similarity? Our mental representations of the world are formed by processing large numbers of sensory inputs, including, for example, the pixel intensities of images, the power spectra of sounds, and the joint angles of articulated bodies. While complex stimuli of this form can be represented by points in a high-dimensional vector space, they typically have a much more compact description. Coherent structure in the world leads to strong correlations between inputs (such as between neighboring pixels in images), generating observations that lie on or close to a smooth low-dimensional manifold. To compare and classify such observations (in effect, to reason about the world) depends crucially on modeling the nonlinear geometry of these low-dimensional manifolds.
Scientists interested in exploratory analysis or visualization of multivariate data (1) face a similar problem in dimensionality reduction. The problem, as illustrated in Fig. 1, involves mapping high-dimensional inputs into a low-dimensional “description” space with as many coordinates as observed modes of variability.
Previous approaches to this problem, based on multidimensional scaling (MDS) (2), have computed embeddings that attempt to preserve pairwise distances [or generalized disparities (3)] between data points; these distances are measured along straight lines or, in more sophisticated usages of MDS such as Isomap (4), along shortest paths confined to the manifold of observed inputs. Here, we take a different approach, called locally linear embedding (LLE), that eliminates the need to estimate pairwise distances between widely separated data points. Unlike previous methods, LLE recovers global nonlinear structure from locally linear fits.
The LLE algorithm, summarized in Fig. 2, is based on simple geometric intuitions. Suppose the data consist of N real-valued vectors Xi, each of dimensionality D, sampled from some underlying manifold. Provided there is sufficient data (such that the manifold is well-sampled), we expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold. We characterize the local geometry of these patches by linear coefficients that reconstruct each data point from its neighbors. Reconstruction errors are measured by the cost function
$$\varepsilon(W) = \sum_i \Big| X_i - \sum_j W_{ij} X_j \Big|^2 \qquad (1)$$
which adds up the squared distances between all the data points and their reconstructions. The weights Wij summarize the contribution of the jth data point to the ith reconstruction. To compute the weights Wij, we minimize the cost function subject to two constraints: first, that each data point Xi is reconstructed only from its neighbors (5), enforcing Wij = 0 if Xj does not belong to the set of neighbors of Xi; second, that the rows of the weight matrix sum to one: ∑j Wij = 1. The optimal weights Wij subject to these constraints (6) are found by solving a least-squares problem (7).
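The constrained least-squares step for a single data point can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual closed form in which one solves a linear system with the local Gram matrix of the neighbors and then rescales the solution to sum to one. The function name `reconstruction_weights` and the small `reg` regularizer (which keeps the system solvable when there are more neighbors than input dimensions) are assumptions introduced here.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Weights that best reconstruct x from the rows of `neighbors`, summing to one."""
    # Shift the neighbors so that x sits at the origin.
    Z = neighbors - x                          # shape (K, D)
    # Local Gram matrix of the shifted neighbors.
    C = Z @ Z.T                                # shape (K, K)
    # Assumed regularizer: keeps C invertible when K > D.
    C = C + reg * np.trace(C) * np.eye(len(C))
    # Solve C w = 1, then rescale to enforce the sum-to-one constraint.
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()

# Toy usage: a point in the plane and three neighbors around it.
x = np.array([0.5, 0.5])
nbrs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = reconstruction_weights(x, nbrs)
reconstruction = nbrs.T @ w                    # close to x
```

Entries of Wij for non-neighbors stay at zero, so a full weight matrix would be filled row by row with the output of this function.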
The constrained weights that minimize these reconstruction errors obey an important symmetry: for any particular data point, they are invariant to rotations, rescalings, and translations of that data point and its neighbors. By symmetry, it follows that the reconstruction weights characterize intrinsic geometric properties of each neighborhood, as opposed to properties that depend on a particular frame of reference (8). Note that the invariance to translations is specifically enforced by the sum-to-one constraint on the rows of the weight matrix.
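The symmetry described above can be checked numerically. The sketch below is illustrative only: it reuses an assumed closed-form weight solver (with a hypothetical trace-scaled regularizer `reg`), applies a random orthogonal transformation, a rescaling, and a translation to a point and its neighbors, and compares the weights before and after.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    # Assumed closed-form solver: Gram matrix, solve C w = 1, normalize.
    Z = neighbors - x
    C = Z @ Z.T
    C = C + reg * np.trace(C) * np.eye(len(C))
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=3)
nbrs = x + 0.1 * rng.normal(size=(4, 3))       # four nearby points in 3-D

# Random orthogonal matrix (a rotation, possibly with a reflection),
# plus an arbitrary scale factor and shift.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
scale, shift = 2.5, rng.normal(size=3)

def transform(points):
    return scale * (points @ Q.T) + shift

w_before = reconstruction_weights(x, nbrs)
w_after = reconstruction_weights(transform(x), transform(nbrs))
weights_match = np.allclose(w_before, w_after)
```

The translation cancels when the neighbors are shifted relative to the point, the orthogonal map leaves the Gram matrix unchanged, and the rescaling multiplies it by a constant that the final normalization removes.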
Suppose the data lie on or near a smooth nonlinear manifold of lower dimensionality d << D. To a good approximation then, there exists a linear mapping, consisting of a translation, rotation, and rescaling, that maps the high-dimensional coordinates of each neighborhood to global internal coordinates on the manifold. By design, the reconstruction weights Wij reflect intrinsic geometric properties of the data that are invariant to exactly such transformations. We therefore expect their characterization of local geometry in the original data space to be equally valid for local patches on the manifold. In particular, the same weights Wij that reconstruct the ith data point in D dimensions should also reconstruct its embedded manifold coordinates in d dimensions.
LLE constructs a neighborhood-preserving mapping based on the above idea. In the final step of the algorithm, each high-dimensional observation Xi is mapped to a low-dimensional vector Yi representing global internal coordinates on the manifold. This is done by choosing d-dimensional coordinates Yi to minimize the embedding cost function
$$\Phi(Y) = \sum_i \Big| Y_i - \sum_j W_{ij} Y_j \Big|^2 \qquad (2)$$
This cost function, like the previous one, is based on locally linear reconstruction errors, but here we fix the weights Wij while optimizing the coordinates Yi. The embedding cost in Eq. 2 defines a quadratic form in the vectors Yi. Subject to constraints that make the problem well-posed, it can be minimized by solving a sparse N × N eigenvalue problem (9), whose bottom d nonzero eigenvectors provide an ordered set of orthogonal coordinates centered on the origin.
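The eigenvalue step can be sketched as follows, under a few stated assumptions. With the weights fixed, the embedding cost is a quadratic form in the matrix M = (I − W)ᵀ(I − W); the sketch uses a dense eigensolver for brevity (a real implementation would exploit sparsity), discards the bottom eigenvector (the constant vector, with eigenvalue zero), and scales the coordinates to unit covariance. The function name `embed` and the ring-shaped toy weight matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def embed(W, d):
    """Return an (N, d) array of coordinates minimizing the embedding cost for fixed W."""
    N = W.shape[0]
    I_minus_W = np.eye(N) - W
    # Quadratic form of the embedding cost.
    M = I_minus_W.T @ I_minus_W
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(M)
    # Skip the bottom (constant) eigenvector and keep the next d.
    Y = eigvecs[:, 1:d + 1]
    # Scale so that the embedding has unit covariance.
    return np.sqrt(N) * Y

# Toy usage: 20 points on a ring, each reconstructed from its two ring neighbors.
N = 20
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5
Y = embed(W, d=2)
radii = np.linalg.norm(Y, axis=1)              # all equal: the ring embeds as a circle
```

In this toy case the two retained eigenvectors span the lowest sine and cosine modes of the ring, so the embedded points lie on a circle centered on the origin.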
Implementation of the algorithm is straightforward. In our experiments, data points were reconstructed from their K nearest neighbors, as measured by Euclidean distance or normalized dot products. For such implementations of LLE, the algorithm has only one free parameter: the number of neighbors, K. Once neighbors are chosen, the optimal weights Wij and coordinates Yi are computed by standard methods in linear algebra. The algorithm involves a single pass through the three steps in Fig. 2 and finds global minima of the reconstruction and embedding costs in Eqs. 1 and 2.
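The neighbor-selection step can be illustrated with a brute-force search. This is a sketch under assumptions: the function name `k_nearest_neighbors` and the `metric` argument are hypothetical, and an all-pairs comparison is used only because it is short; larger data sets would call for a spatial index.

```python
import numpy as np

def k_nearest_neighbors(X, k, metric="euclidean"):
    """Indices of the k nearest neighbors of every row of X, as an (N, k) array."""
    if metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]
        score = (diff ** 2).sum(axis=-1)       # squared distance: smaller is closer
    elif metric == "cosine":
        unit = X / np.linalg.norm(X, axis=1, keepdims=True)
        score = -(unit @ unit.T)               # negated normalized dot product
    else:
        raise ValueError(f"unknown metric: {metric}")
    # A point is never its own neighbor.
    np.fill_diagonal(score, np.inf)
    return np.argsort(score, axis=1)[:, :k]

# Toy usage: four points on a line.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [6.0, 0.0]])
idx = k_nearest_neighbors(X, k=2)
```

The returned index array is all that the later steps need: row i lists the points from which Xi is reconstructed.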
In addition to the example in Fig. 1, for which the true manifold structure was known (10), we also applied LLE to images of faces (11) and vectors of word-document counts (12). Two-dimensional embeddings of faces and words are shown in Figs. 3 and 4. Note how the coordinates of these embedding spaces are related to meaningful attributes, such as the pose and expression of human faces and the semantic associations of words.
Many popular learning algorithms for nonlinear dimensionality reduction do not share the favorable properties of LLE. Iterative hill-climbing methods for autoencoder neural networks (13, 14), self-organizing maps (15), and latent variable models (16) do not have the same guarantees of global optimality or convergence; they also tend to involve many more free parameters, such as learning rates, convergence criteria, and architectural specifications. Finally, whereas other nonlinear methods rely on deterministic annealing schemes (17) to avoid local minima, the optimizations of LLE are especially tractable.
LLE scales well with the intrinsic manifold dimensionality, d, and does not require a discretized gridding of the embedding space. As more dimensions are added to the embedding space, the existing ones do not change, so that LLE does not have to be rerun to compute higher dimensional embeddings. Unlike methods such as principal curves and surfaces (18) or additive component models (19), LLE is not limited in practice to manifolds of extremely low dimensionality or codimensionality. Also, the intrinsic value of d can itself be estimated by analyzing a reciprocal cost function, in which reconstruction weights derived from the embedding vectors Yi are applied to the data points Xi.
LLE illustrates a general principle of manifold learning, elucidated by Martinetz and Schulten (20) and Tenenbaum (4), that overlapping local neighborhoods, collectively analyzed, can provide information about global geometry. Many virtues of LLE are shared by Tenenbaum’s algorithm, Isomap, which has been successfully applied to similar problems in nonlinear dimensionality reduction. Isomap’s embeddings, however, are optimized to preserve geodesic distances between general pairs of data points, which can only be estimated by computing shortest paths through large sublattices of data. LLE takes a different approach, analyzing local symmetries, linear coefficients, and reconstruction errors instead of global constraints, pairwise distances, and stress functions. It thus avoids the need to solve large dynamic programming problems, and it also tends to accumulate very sparse matrices, whose structure can be exploited for savings in time and space.
LLE is likely to be even more useful in combination with other methods in data analysis and statistical learning. For example, a parametric mapping between the observation and embedding spaces could be learned by supervised neural networks (21) whose target values are generated by LLE. LLE can also be generalized to harder settings, such as the case of disjoint data manifolds (22), and specialized to simpler ones, such as the case of time-ordered observations (23).
Perhaps the greatest potential lies in applying LLE to diverse problems beyond those considered here. Given the broad appeal of traditional methods, such as PCA and MDS, the algorithm should find widespread use in many areas of science.
Fig. 1. The problem of nonlinear dimensionality reduction, as illustrated (10) for three-dimensional data (B) sampled from a two-dimensional manifold (A). An unsupervised learning algorithm must discover the global internal coordinates of the manifold without signals that explicitly indicate how the data should be embedded in two dimensions. The color coding illustrates the neighborhood-preserving mapping discovered by LLE; black outlines in (B) and (C) show the neighborhood of a single point. Unlike LLE, projections of the data by principal component analysis (PCA) (28) or classical MDS (2) map faraway data points to nearby points in the plane, failing to identify the underlying structure of the manifold. Note that mixture models for local dimensionality reduction (29), which cluster the data and perform PCA within each cluster, do not address the problem considered here: namely, how to map high-dimensional data into a single global coordinate system of lower dimensionality.

Fig. 2. Steps of locally linear embedding: (1) Assign neighbors to each data point Xi (for example by using the K nearest neighbors). (2) Compute the weights Wij that best linearly reconstruct Xi from its neighbors, solving the constrained least-squares problem in Eq. 1. (3) Compute the low-dimensional embedding vectors Yi best reconstructed by Wij, minimizing Eq. 2 by finding the smallest eigenmodes of the sparse symmetric matrix in Eq. 3. Although the weights Wij and vectors Yi are computed by methods in linear algebra, the constraint that points are only reconstructed from neighbors can result in highly nonlinear embeddings.

Fig. 3. Images of faces (11) mapped into the embedding space described by the first two coordinates of LLE. Representative faces are shown next to circled points in different parts of the space. The bottom images correspond to points along the top-right path (linked by solid line), illustrating one particular mode of variability in pose and expression.

Fig. 4. Arranging words in a continuous semantic space. Each word was initially represented by a high-dimensional vector that counted the number of times it appeared in different encyclopedia articles. LLE was applied to these word-document count vectors (12), resulting in an embedding location for each word. Shown are words from two different bounded regions (A) and (B) of the embedding space discovered by LLE. Each panel shows a two-dimensional projection onto the third and fourth coordinates of LLE; in these two dimensions, the regions (A) and (B) are highly overlapped. The inset in (A) shows a three-dimensional projection onto the third, fourth, and fifth coordinates, revealing an extra dimension along which regions (A) and (B) are more separated. Words that lie in the intersection of both regions are capitalized. Note how LLE colocates words with similar contexts in this continuous semantic space.
