[Scikit-Learn Chinese Documentation] Manifold Learning - Unsupervised Learning - User Guide | ApacheCN

Chinese documentation: http://sklearn.apachecn.org/cn/stable/modules/manifold.html

English documentation: http://sklearn.apachecn.org/en/stable/modules/manifold.html

Official documentation: http://scikit-learn.org/stable/

GitHub: https://github.com/apachecn/scikit-learn-doc-zh (if you find it useful, please give it a Star; we keep working on it)

Contributors: https://github.com/apachecn/scikit-learn-doc-zh#贡献者

About us: http://www.apachecn.org/organization/209.html

Note: translation in progress...




2.2. Manifold learning

Look for the bare necessities
The simple bare necessities
Forget about your worries and your strife
I mean the bare necessities
Old Mother Nature’s recipes
That bring the bare necessities of life

– Baloo's song [The Jungle Book]
../_images/sphx_glr_plot_compare_methods_0011.png

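Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

2.2.1. Introduction

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.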
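As a rough illustration of a random projection, the sketch below projects toy data onto a randomly chosen 2-D subspace using NumPy only. The data, the sizes, and the choice to orthonormalize the random directions are assumptions made for this example, not something prescribed by the guide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 64 dimensions (a stand-in for a real dataset).
X = rng.normal(size=(200, 64))

# Draw a random 64 x 2 matrix and orthonormalize its columns, so the
# projection is onto a random 2-D subspace without arbitrary stretching.
R = rng.normal(size=(64, 2))
Q, _ = np.linalg.qr(R)

# Project every sample onto the random subspace.
X_2d = X @ Q
print(X_2d.shape)
```

Nothing in this construction looks at the data, which is why any interesting structure survives only by chance.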


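To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an "interesting" linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.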


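Manifold learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.

The manifold learning implementations available in scikit-learn are summarized below.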

2.2.2. Isomap

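One of the earliest approaches to manifold learning is the Isomap algorithm, short for Isometric Mapping. Isomap can be viewed as an extension of Multi-dimensional Scaling (MDS) or Kernel PCA. Isomap seeks a lower-dimensional embedding which maintains geodesic distances between all points. Isomap can be performed with the object Isomap.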
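A minimal usage sketch of the Isomap object follows. The S-curve toy dataset and the values of n_neighbors and n_components are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# 500 points sampled from a 2-D S-shaped sheet embedded in 3-D.
X, color = make_s_curve(n_samples=500, random_state=0)

# n_neighbors and n_components are illustrative, not tuned.
isomap = Isomap(n_neighbors=10, n_components=2)
X_iso = isomap.fit_transform(X)
print(X_iso.shape)
```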

../_images/sphx_glr_plot_lle_digits_0051.png

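2.2.2.1. Complexity

The Isomap algorithm comprises three stages:

  1. Nearest Neighbors Search. Isomap uses sklearn.neighbors.BallTree for efficient neighbor search. The cost is approximately O[D \log(k) N \log(N)], for k nearest neighbors of N points in D dimensions.
  2. Shortest-Path Graph Search. The most efficient known algorithms for this are Dijkstra's algorithm, which is approximately O[N^2(k + \log(N))], or the Floyd-Warshall algorithm, which is O[N^3]. The algorithm can be selected by the user with the path_method keyword of Isomap. If unspecified, the code attempts to choose the best algorithm for the input data.
  3. Partial Eigenvalue Decomposition. The embedding is encoded in the eigenvectors corresponding to the d largest eigenvalues of the N \times N isomap kernel. For a dense solver, the cost is approximately O[d N^2]. This cost can often be improved using the ARPACK solver. The eigensolver can be specified by the user with the eigen_solver keyword of Isomap. If unspecified, the code attempts to choose the best algorithm for the input data.

The overall complexity of Isomap is O[D \log(k) N \log(N)] + O[N^2(k + \log(N))] + O[d N^2].

  • N : number of training data points
  • D : input dimension
  • k : number of nearest neighbors
  • d : output dimension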
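The three stages can be sketched by hand on a toy curve. This is an illustrative reconstruction, not scikit-learn's internal code: kneighbors_graph and SciPy's shortest_path stand in for the neighbor search and path search, the kernel is assumed to be the double-centered matrix of squared geodesic distances, and the dataset is made up for the example.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

# Toy manifold: 100 points along three quarters of a unit circle in 3-D.
t = np.linspace(0.0, 1.5 * np.pi, 100)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])

# Stage 1: k-nearest-neighbor graph with Euclidean edge weights.
knn = kneighbors_graph(X, n_neighbors=4, mode="distance")

# Stage 2: geodesic distances as shortest paths (Dijkstra, undirected).
geodesic = shortest_path(knn, method="D", directed=False)

# Stage 3: double-center the squared distances to obtain the kernel,
# then keep the eigenvectors of the d largest eigenvalues.
d = 1
n = geodesic.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ (geodesic ** 2) @ H
eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:d]
embedding = eigvecs[:, top] * np.sqrt(eigvals[top])

# The 1-D embedding should track the curve parameter t (up to sign).
corr = np.corrcoef(embedding[:, 0], t)[0, 1]
print(abs(corr))
```

The dense eigh call is the O[N^3] worst case; it is used here only because the example is tiny.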


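2.2.3. Locally Linear Embedding

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods. It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.

Locally linear embedding can be performed with the function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding.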
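A minimal sketch of the object-oriented form is shown below. The dataset, the neighbor count, and the choice of the dense eigensolver (picked only to keep a small example robust) are assumptions for illustration.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_s_curve(n_samples=300, random_state=0)

# method="standard" is the default; the parameter values are illustrative.
lle = LocallyLinearEmbedding(
    n_neighbors=10,
    n_components=2,
    method="standard",
    eigen_solver="dense",
)
X_lle = lle.fit_transform(X)

# reconstruction_error_ is the error associated with the learned embedding.
print(X_lle.shape, lle.reconstruction_error_)
```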

../_images/sphx_glr_plot_lle_digits_0061.png

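2.2.3.1. Complexity

The standard LLE algorithm comprises three stages:

  1. Nearest Neighbors Search. See discussion under Isomap above.
  2. Weight Matrix Construction. O[D N k^3]. The construction of the LLE weight matrix involves the solution of a k \times k linear equation for each of the N local neighborhoods.
  3. Partial Eigenvalue Decomposition. See discussion under Isomap above.

The overall complexity of standard LLE is O[D \log(k) N \log(N)] + O[D N k^3] + O[d N^2].

  • N : number of training data points
  • D : input dimension
  • k : number of nearest neighbors
  • d : output dimension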


2.2.4. Modified Locally Linear Embedding

One well-known issue with LLE is the regularization problem. When the number of neighbors is greater than the number of input dimensions, the matrix defining each local neighborhood is rank-deficient. To address this, standard LLE applies an arbitrary regularization parameter r, which is chosen relative to the trace of the local weight matrix. Though it can be shown formally that as r \to 0, the solution converges to the desired embedding, there is no guarantee that the optimal solution will be found for r > 0. This problem manifests itself in embeddings which distort the underlying geometry of the manifold.

One method to address the regularization problem is to use multiple weight vectors in each neighborhood. This is the essence of modified locally linear embedding (MLLE). MLLE can be performed with the function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding, with the keyword method = 'modified'. It requires n_neighbors > n_components.
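The function form can be sketched as follows. The dataset and parameter values are assumptions for illustration; the explicit check simply restates the n_neighbors > n_components requirement.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import locally_linear_embedding

X, _ = make_s_curve(n_samples=300, random_state=0)

n_neighbors, n_components = 12, 2
# MLLE requirement: more neighbors than output dimensions.
assert n_neighbors > n_components

# The function returns the embedding and its reconstruction error.
Y, squared_error = locally_linear_embedding(
    X,
    n_neighbors=n_neighbors,
    n_components=n_components,
    method="modified",
    eigen_solver="dense",  # chosen for robustness on a small example
)
print(Y.shape, squared_error)
```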

../_images/sphx_glr_plot_lle_digits_0071.png

2.2.4.1. Complexity

The MLLE algorithm comprises three stages:

  1. Nearest Neighbors Search. Same as standard LLE
  2. Weight Matrix Construction. Approximately O[D N k^3] + O[N (k-D) k^2]. The first term is exactly equivalent to that of standard LLE. The second term has to do with constructing the weight matrix from multiple weights. In practice, the added cost of constructing the MLLE weight matrix is relatively small compared to the cost of steps 1 and 3.
  3. Partial Eigenvalue Decomposition. Same as standard LLE

The overall complexity of MLLE is O[D \log(k) N \log(N)] + O[D N k^3] + O[N (k-D) k^2] + O[d N^2].

  • N : number of training data points
  • D : input dimension
  • k : number of nearest neighbors
  • d : output dimension

2.2.5. Hessian Eigenmapping

Hessian Eigenmapping (also known as Hessian-based LLE: HLLE) is another method of solving the regularization problem of LLE. It revolves around a hessian-based quadratic form at each neighborhood which is used to recover the locally linear structure. Though other implementations note its poor scaling with data size, sklearn implements some algorithmic improvements which make its cost comparable to that of other LLE variants for small output dimension. HLLE can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding, with the keyword method = 'hessian'. It requires n_neighbors > n_components * (n_components + 3) / 2.
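The neighbor-count requirement is easy to get wrong, so the sketch below wraps it in a small helper before fitting. min_hessian_neighbors is a hypothetical name introduced for this example, and the dataset and solver choice are assumptions.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding


def min_hessian_neighbors(n_components):
    """Smallest n_neighbors with n_neighbors > d * (d + 3) / 2.

    d * (d + 3) is always even, so the integer division is exact.
    """
    return n_components * (n_components + 3) // 2 + 1


X, _ = make_s_curve(n_samples=300, random_state=0)

n_components = 2
n_neighbors = max(10, min_hessian_neighbors(n_components))

hlle = LocallyLinearEmbedding(
    n_neighbors=n_neighbors,
    n_components=n_components,
    method="hessian",
    eigen_solver="dense",  # chosen for robustness on a small example
)
X_hlle = hlle.fit_transform(X)
print(n_neighbors, X_hlle.shape)
```

For a 2-D embedding the bound is 5, so at least 6 neighbors are needed; for 3-D it rises to at least 10.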

../_images/sphx_glr_plot_lle_digits_0081.png

2.2.5.1. Complexity

The HLLE algorithm comprises three stages:

  1. Nearest Neighbors Search. Same as standard LLE
  2. Weight Matrix Construction. Approximately O[D N k^3] + O[N d^6]. The first term reflects a similar cost to that of standard LLE. The second term comes from a QR decomposition of the local hessian estimator.
  3. Partial Eigenvalue Decomposition. Same as standard LLE

The overall complexity of standard HLLE is O[D \log(k) N \log(N)] + O[D N k^3] + O[N d^6] + O[d N^2].

  • N : number of training data points
  • D : input dimension
  • k : number of nearest neighbors
  • d : output dimension


2.2.6. Spectral Embedding

Spectral Embedding is an approach to calculating a non-linear embedding. Scikit-learn implements Laplacian Eigenmaps, which finds a low dimensional representation of the data using a spectral decomposition of the graph Laplacian. The graph generated can be considered as a discrete approximation of the low dimensional manifold in the high dimensional space. Minimization of a cost function based on the graph ensures that points close to each other on the manifold are mapped close to each other in the low dimensional space, preserving local distances. Spectral embedding can be performed with the function spectral_embedding or its object-oriented counterpart SpectralEmbedding.
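A minimal usage sketch of SpectralEmbedding follows. The nearest-neighbors affinity, the neighbor count, and the toy dataset are illustrative assumptions.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import SpectralEmbedding

X, _ = make_s_curve(n_samples=300, random_state=0)

# Build the graph from k nearest neighbors; values are illustrative.
se = SpectralEmbedding(
    n_components=2,
    affinity="nearest_neighbors",
    n_neighbors=10,
    random_state=0,
)
X_se = se.fit_transform(X)
print(X_se.shape)
```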

2.2.6.1. Complexity

The Spectral Embedding (Laplacian Eigenmaps) algorithm comprises three stages:

  1. Weighted Graph Construction. Transform the raw input data into graph representation using affinity (adjacency) matrix representation.
  2. Graph Laplacian Construction. The unnormalized graph Laplacian is constructed as L = D - A, and the normalized one as L = D^{-\frac{1}{2}} (D - A) D^{-\frac{1}{2}}.
  3. Partial Eigenvalue Decomposition. Eigenvalue decomposition is done on the graph Laplacian.

The overall complexity of spectral embedding is O[D \log(k) N \log(N)] + O[D N k^3] + O[d N^2].

  • N : number of training data points
  • D : input dimension
  • k : number of nearest neighbors
  • d : output dimension
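The Laplacian construction can be checked on a tiny graph with NumPy. The 4-node path graph is a made-up example, and reading a 1-D embedding off the eigenvector of the second-smallest eigenvalue of the unnormalized Laplacian is a simplification of what a full implementation does.

```python
import numpy as np

# Symmetric affinity (adjacency) matrix of a 4-node path graph 0-1-2-3.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Degree matrix D holds the row sums of A on its diagonal.
degrees = A.sum(axis=1)
D = np.diag(degrees)

# Unnormalized and normalized graph Laplacians.
L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt

# Eigen-decomposition of the unnormalized Laplacian (ascending order).
eigvals, eigvecs = np.linalg.eigh(L)

# The smallest eigenvalue is 0 (constant eigenvector); the next
# eigenvector orders the nodes along the path.
fiedler = eigvecs[:, 1]
print(eigvals[0], fiedler)
```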


2.2.7. Local Tangent Space Alignment

Though not technically a variant of LLE, Local tangent space alignment (LTSA) is algorithmically similar enough to LLE that it can be put in this category. Rather than focusing on preserving neighborhood distances as in LLE, LTSA seeks to characterize the local geometry at each neighborhood via its tangent space, and performs a global optimization to align these local tangent spaces to learn the embedding. LTSA can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding, with the keyword method = 'ltsa'.

../_images/sphx_glr_plot_lle_digits_0091.png

2.2.7.1. Complexity

The LTSA algorithm comprises three stages:

  1. Nearest Neighbors Search. Same as standard LLE
  2. Weight Matrix Construction. Approximately O[D N k^3] + O[k^2 d]. The first term reflects a similar cost to that of standard LLE.
  3. Partial Eigenvalue Decomposition. Same as standard LLE

The overall complexity of standard LTSA is O[D \log(k) N \log(N)] + O[D N k^3] + O[k^2 d] + O[d N^2].

  • N : number of training data points
  • D : input dimension
  • k : number of nearest neighbors
  • d : output dimension


2.2.8. Multi-dimensional Scaling (MDS)

Multidimensional scaling (MDS) seeks a low-dimensional representation of the data in which the distances closely match the distances in the original high-dimensional space.

In general, MDS is a technique used for analyzing similarity or dissimilarity data. It attempts to model similarity or dissimilarity data as distances in a geometric space. The data can be ratings of similarity between objects, interaction frequencies of molecules, or trade indices between countries.

There exist two types of MDS algorithm: metric and non-metric. In scikit-learn, the class MDS implements both. In metric MDS, the input similarity matrix arises from a metric (and thus respects the triangle inequality), and the distances between two output points are then set to be as close as possible to the similarity or dissimilarity data. In the non-metric version, the algorithm tries to preserve the order of the distances, and hence seeks a monotonic relationship between the distances in the embedded space and the similarities/dissimilarities.
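A minimal sketch of metric MDS with the MDS class is given below. The blob dataset and the parameter values are assumptions for illustration; only the default (metric) mode is shown.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS

# 60 points in 5 dimensions, grouped around 3 centers.
X, _ = make_blobs(n_samples=60, n_features=5, centers=3, random_state=0)

# A single initialization keeps the example fast; values are illustrative.
mds = MDS(n_components=2, n_init=1, random_state=0)
X_mds = mds.fit_transform(X)

# stress_ reports the final value of the objective for this embedding.
print(X_mds.shape, mds.stress_)
```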

../_images/sphx_glr_plot_lle_digits_0101.png

Let S be the similarity matrix, and X the coordinates of the n input points. Disparities \hat{d}_{ij} are transformations of the similarities chosen in some optimal way. The objective, called the stress, is then defined by \sum_{i < j} (d_{ij}(X) - \hat{d}_{ij}(X))^2.
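To make the objective concrete, the sketch below computes a stress value with NumPy, assuming the stress is the sum over pairs i < j of the squared difference between embedded distances and disparities. raw_stress is a hypothetical helper name, and the three points are made up.

```python
import numpy as np


def raw_stress(X, disparities):
    """Sum over i < j of (d_ij(X) - disparity_ij) ** 2."""
    diff = X[:, None, :] - X[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)
    return float(((distances[i, j] - disparities[i, j]) ** 2).sum())


# Three points forming a 3-4-5 right triangle.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])

# Disparities that match the embedded distances exactly give zero stress.
exact = np.array([[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]])
print(raw_stress(X, exact))

# Changing one disparity from 5 to 6 adds (5 - 6) ** 2 to the stress.
perturbed = exact.copy()
perturbed[1, 2] = perturbed[2, 1] = 6.0
print(raw_stress(X, perturbed))
```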

2.2.8.1. Metric MDS

In the simplest metric MDS model, called absolute MDS, disparities are defined by \hat{d}_{ij} = S_{ij}. With absolute MDS, the value S_{ij} should then correspond exactly to the distance between points i and j in the embedding space.

Most commonly, disparities are set to \hat{d}_{ij} = b S_{ij}.

2.2.8.2. Nonmetric MDS

Non-metric MDS focuses on the ordination of the data. If S_{ij} < S_{kl}, then the embedding should enforce d_{ij} < d_{kl}. A simple algorithm to enforce that is to use a monotonic regression of d_{ij} on S_{ij}, yielding disparities \hat{d}_{ij} in the same order as S_{ij}.

A trivial solution to this problem is to set all the points on the origin. In order to avoid that, the disparities \hat{d}_{ij} are normalized.
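The monotonic regression step can be sketched with IsotonicRegression on a handful of pairs. The numbers are made up, and the particular normalization (scaling so the squared disparities sum to the number of pairs) is an assumption, since the text does not say which normalization is used.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Dissimilarities S_ij and current embedded distances d_ij, one entry
# per pair (i, j) with i < j, already flattened into vectors.
S = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([0.8, 2.5, 2.1, 4.2, 3.9])

# Monotonic regression of d on S: the fitted values are as close to d
# as possible while being non-decreasing in S.
disparities = IsotonicRegression().fit_transform(S, d)
print(disparities)

# One possible normalization (an assumption): fix the overall scale so
# that shrinking all points to the origin is no longer rewarded.
normalized = disparities * np.sqrt(len(disparities) / np.sum(disparities ** 2))
print(normalized)
```

The two order violations in d (2.5 before 2.1, 4.2 before 3.9) are each replaced by the average of the offending pair.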

../_images/sphx_glr_plot_mds_0011.png


2.2.9. t-distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE (TSNE) converts affinities of data points to probabilities. The affinities in the original space are represented by Gaussian joint probabilities and the affinities in the embedded space are represented by Student’s t-distributions. This allows t-SNE to be particularly sensitive to local structure and has a few other advantages over existing techniques:

  • Revealing the structure at many scales on a single map
  • Revealing data that lie in multiple, different, manifolds or clusters
  • Reducing the tendency to crowd points together at the center

While Isomap, LLE and variants are best suited to unfold a single continuous low dimensional manifold, t-SNE will focus on the local structure of the data and will tend to extract clustered local groups of samples as highlighted on the S-curve example. This ability to group samples based on the local structure might be beneficial to visually disentangle a dataset that comprises several manifolds at once as is the case in the digits dataset.

The Kullback-Leibler (KL) divergence of the joint probabilities in the original space and the embedded space will be minimized by gradient descent. Note that the KL divergence is not convex, i.e. multiple restarts with different initializations will end up in local minima of the KL divergence. Hence, it is sometimes useful to try different seeds and select the embedding with the lowest KL divergence.
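The restart strategy can be sketched as a small loop that keeps the run with the lowest final KL divergence. The dataset, perplexity, and number of seeds are assumptions for illustration; init="random" is set explicitly so that the seed actually changes the starting point.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Small toy dataset so that several restarts stay cheap.
X, _ = make_blobs(n_samples=90, n_features=10, centers=3, random_state=0)

results = []
for seed in range(3):
    tsne = TSNE(n_components=2, perplexity=15, init="random", random_state=seed)
    embedding = tsne.fit_transform(X)
    # kl_divergence_ holds the KL divergence after optimization.
    results.append((tsne.kl_divergence_, seed, embedding))

# Keep the run whose final KL divergence is lowest.
best_kl, best_seed, best_embedding = min(results, key=lambda r: r[0])
print(best_seed, best_kl)
```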

The disadvantages to using t-SNE are roughly:

  • t-SNE is computationally expensive, and can take several hours on million-sample datasets where PCA will finish in seconds or minutes
  • The Barnes-Hut t-SNE method is limited to two or three dimensional embeddings.
  • The algorithm is stochastic and multiple restarts with different seeds can yield different embeddings. However, it is perfectly legitimate to pick the embedding with the least error.
  • Global structure is not explicitly preserved. This problem is mitigated by initializing points with PCA (using init='pca').
../_images/sphx_glr_plot_lle_digits_0131.png

2.2.9.1. Optimizing t-SNE

The main purpose of t-SNE is visualization of high-dimensional data. Hence, it works best when the data will be embedded on two or three dimensions.

Optimizing the KL divergence can be a little bit tricky sometimes. There are five parameters that control the optimization of t-SNE and therefore possibly the quality of the resulting embedding:

  • perplexity
  • early exaggeration factor
  • learning rate
  • maximum number of iterations
  • angle (not used in the exact method)

The perplexity is defined as k = 2^{S}, where S is the Shannon entropy of the conditional probability distribution. The perplexity of a fair k-sided die is k, so that k is effectively the number of nearest neighbors t-SNE considers when generating the conditional probabilities. Larger perplexities take more nearest neighbors into account and are less sensitive to small structure. Conversely, a lower perplexity considers a smaller number of neighbors, and thus ignores more global information in favour of the local neighborhood. As dataset sizes get larger, more points will be required to get a reasonable sample of the local neighborhood, and hence larger perplexities may be required. Similarly, noisier datasets will require larger perplexity values to encompass enough local neighbors to see beyond the background noise.
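The definition can be verified numerically. The helper below is a hypothetical illustration that computes 2 raised to the Shannon entropy (in bits) of a discrete distribution; it is not part of scikit-learn.

```python
import numpy as np


def perplexity(p):
    """Return 2 ** S, where S is the Shannon entropy of p in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p = 0 contribute nothing to the entropy
    entropy_bits = -np.sum(p * np.log2(p))
    return 2.0 ** entropy_bits


# A fair six-sided die has perplexity 6 (up to floating-point rounding).
fair_die = perplexity(np.full(6, 1.0 / 6.0))
print(fair_die)

# A peaked distribution over 4 outcomes has fewer "effective" neighbors.
peaked = perplexity([0.7, 0.1, 0.1, 0.1])
print(peaked)
```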

The maximum number of iterations is usually high enough and does not need any tuning. The optimization consists of two phases: the early exaggeration phase and the final optimization. During early exaggeration the joint probabilities in the original space will be artificially increased by multiplication with a given factor. Larger factors result in larger gaps between natural clusters in the data. If the factor is too high, the KL divergence could increase during this phase. Usually it does not have to be tuned. A critical parameter is the learning rate. If it is too low, gradient descent will get stuck in a bad local minimum. If it is too high, the KL divergence will increase during optimization. More tips can be found in Laurens van der Maaten's FAQ. The last parameter, angle, is a tradeoff between performance and accuracy. Larger angles imply that we can approximate larger regions by a single point, leading to better speed but less accurate results.

“How to Use t-SNE Effectively” provides a good discussion of the effects of the various parameters, as well as interactive plots to explore the effects of different parameters.

2.2.9.2. Barnes-Hut t-SNE

The Barnes-Hut t-SNE that has been implemented here is usually much slower than other manifold learning algorithms. The optimization is quite difficult and the computation of the gradient is O[d N log(N)], where d is the number of output dimensions and N is the number of samples. The Barnes-Hut method improves on the exact method where t-SNE complexity is O[d N^2], but has several other notable differences:

  • The Barnes-Hut implementation only works when the target dimensionality is 3 or less. The 2D case is typical when building visualizations.
  • Barnes-Hut only works with dense input data. Sparse data matrices can only be embedded with the exact method or can be approximated by a dense low-rank projection, for instance using sklearn.decomposition.TruncatedSVD.
  • Barnes-Hut is an approximation of the exact method. The approximation is parameterized with the angle parameter, therefore the angle parameter is unused when method='exact'.
  • Barnes-Hut is significantly more scalable. Barnes-Hut can be used to embed hundreds of thousands of data points, while the exact method can handle thousands of samples before becoming computationally intractable.
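The dense low-rank route for sparse input can be sketched as follows. The random sparse matrix, the number of SVD components, and the t-SNE settings are assumptions chosen only to keep the example small.

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

# Toy sparse input: 120 samples, 300 features, 5% non-zero entries.
X_sparse = sp.random(120, 300, density=0.05, format="csr", random_state=0)

# Step 1: dense low-rank approximation of the sparse matrix.
svd = TruncatedSVD(n_components=20, random_state=0)
X_dense = svd.fit_transform(X_sparse)

# Step 2: Barnes-Hut t-SNE on the dense projection. A larger angle is
# faster but less accurate; 0.5 is used here as an illustrative value.
tsne = TSNE(
    n_components=2,
    method="barnes_hut",
    angle=0.5,
    perplexity=20,
    init="random",
    random_state=0,
)
X_embedded = tsne.fit_transform(X_dense)
print(X_dense.shape, X_embedded.shape)
```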

For visualization purposes (which is the main use case of t-SNE), using the Barnes-Hut method is strongly recommended. The exact t-SNE method is useful for checking the theoretical properties of the embedding, possibly in a higher-dimensional space, but is limited to small datasets due to computational constraints.

Also note that the digits labels roughly match the natural grouping found by t-SNE, while the linear 2D projection of the PCA model yields a representation where label regions largely overlap. This is a strong clue that this data can be well separated by non-linear methods that focus on the local structure (e.g. an SVM with a Gaussian RBF kernel). However, failing to visualize well separated homogeneously labeled groups with t-SNE in 2D does not necessarily imply that the data cannot be correctly classified by a supervised model. It might be the case that 2 dimensions are not enough to accurately represent the internal structure of the data.


2.2.10. Tips on practical use

  • Make sure the same scale is used over all features. Because manifold learning methods are based on a nearest-neighbor search, the algorithm may perform poorly otherwise. See StandardScaler for convenient ways of scaling heterogeneous data.
  • The reconstruction error computed by each routine can be used to choose the optimal output dimension. For a d-dimensional manifold embedded in a D-dimensional parameter space, the reconstruction error will decrease as n_components is increased until n_components == d.
  • Note that noisy data can “short-circuit” the manifold, in essence acting as a bridge between parts of the manifold that would otherwise be well-separated. Manifold learning on noisy and/or incomplete data is an active area of research.
  • Certain input configurations can lead to singular weight matrices, for example when more than two points in the dataset are identical, or when the data is split into disjointed groups. In this case, solver='arpack' will fail to find the null space. The easiest way to address this is to use solver='dense' which will work on a singular matrix, though it may be very slow depending on the number of input points. Alternatively, one can attempt to understand the source of the singularity: if it is due to disjoint sets, increasing n_neighbors may help. If it is due to identical points in the dataset, removing these points may help.
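Two of the tips above, consistent feature scaling and removing identical points, can be combined in a short preprocessing sketch. The deliberately rescaled feature, the appended duplicates, and the choice of LocallyLinearEmbedding with the dense solver are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_s_curve(n_samples=200, random_state=0)
X[:, 1] *= 1000.0          # put one feature on a very different scale
X = np.vstack([X, X[:5]])  # append five exact duplicates

# Drop identical rows, which can lead to singular weight matrices.
X_unique = np.unique(X, axis=0)

# Standardize the features, then embed; solver="dense" in the text
# corresponds to eigen_solver="dense" on this estimator.
model = make_pipeline(
    StandardScaler(),
    LocallyLinearEmbedding(n_neighbors=10, n_components=2, eigen_solver="dense"),
)
X_embedded = model.fit_transform(X_unique)
print(X_unique.shape, X_embedded.shape)
```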

See also

   

Totally Random Trees Embedding can also be useful to derive non-linear representations of feature space, although it does not perform dimensionality reduction.





Anyone interested is welcome to join us in maintaining this translation; updates are ongoing...

Machine learning discussion group: 629470233

