Manifold Learning
Manifold learning is an approach to non-linear dimensionality reduction.
Introduction
High-dimensional datasets can be difficult to visualize. While data in two or three dimensions can be plotted to show its inherent structure, equivalent high-dimensional plots are far less intuitive. To help visualize the structure of a dataset, its dimensionality must be reduced in some way.
The simplest way to reduce dimensionality is to take a random projection of the data. However, a random projection is likely to lose the most interesting structure within the data.
To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis, Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific criteria for choosing an "interesting" linear projection of the data. They can be powerful, but often miss non-linear structure in the data.
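As a concrete illustration of such a linear criterion, PCA chooses the projection that maximizes variance. The following minimal sketch uses scikit-learn's PCA on made-up toy data (the dataset and all parameter values here are illustrative assumptions, not from the text):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 points lying near a 1-D line embedded in 3-D space
rng = np.random.RandomState(0)
t = rng.uniform(-1, 1, size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.randn(100, 3)

# PCA keeps the linear direction of maximal variance
pca = PCA(n_components=1)
Y = pca.fit_transform(X)
print(Y.shape)                                  # (100, 1)
print(pca.explained_variance_ratio_[0])         # close to 1 for this toy data
```

Because the toy data is almost perfectly linear, one component captures nearly all the variance; on a curved dataset like the S-curve below, no single linear projection can do the same.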
Manifold learning can be thought of as an attempt to generalize linear frameworks like PCA so that they are sensitive to non-linear structure in the data. Though supervised variants exist, the typical manifold learning problem is unsupervised.
# coding: utf-8
# Comparison of Manifold Learning methods
from collections import OrderedDict
from functools import partial
from time import time

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import NullFormatter
from sklearn import manifold, datasets

Axes3D  # noqa: the import itself registers the '3d' projection

n_points = 1000
X, color = datasets.make_s_curve(n_points, random_state=0)
n_neighbors = 10
n_components = 2

fig = plt.figure(figsize=(15, 8))
fig.suptitle("Manifold Learning with %i points, %i neighbors"
             % (n_points, n_neighbors), fontsize=14)

# Plot the original 3-D S-curve
ax = fig.add_subplot(251, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)
ax.view_init(4, -72)

# Shared settings for the locally linear embedding variants
LLE = partial(manifold.LocallyLinearEmbedding,
              n_neighbors=n_neighbors, n_components=n_components,
              eigen_solver='auto')

methods = OrderedDict()
methods['LLE'] = LLE(method='standard')
methods['LTSA'] = LLE(method='ltsa')
methods['Hessian LLE'] = LLE(method='hessian')
methods['Modified LLE'] = LLE(method='modified')
methods['Isomap'] = manifold.Isomap(n_neighbors=n_neighbors,
                                    n_components=n_components)
methods['MDS'] = manifold.MDS(n_components=n_components, max_iter=100,
                              n_init=1)
methods['SE'] = manifold.SpectralEmbedding(n_components=n_components,
                                           n_neighbors=n_neighbors)
methods['t-SNE'] = manifold.TSNE(n_components=n_components, init='pca',
                                 random_state=0)

# Fit each method and plot its 2-D embedding
for i, (label, method) in enumerate(methods.items()):
    t0 = time()
    Y = method.fit_transform(X)
    t1 = time()
    print("%s: %.2g sec" % (label, t1 - t0))
    # Skip subplot 6 so the bottom row lines up under the top row
    ax = fig.add_subplot(2, 5, 2 + i + (i > 3))
    ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)
    ax.set_title("%s (%.2g sec)" % (label, t1 - t0))
    ax.xaxis.set_major_formatter(NullFormatter())
    ax.yaxis.set_major_formatter(NullFormatter())
    ax.axis('tight')
plt.show()
Isomap
Seeks a lower-dimensional embedding which maintains geodesic distances between all points.
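A minimal standalone sketch of Isomap on the same S-curve data (the sample count and n_neighbors below are illustrative choices, not from the text):

```python
from sklearn import manifold, datasets

# On the curved S-surface, geodesic distances along the manifold differ
# from straight-line Euclidean distances; Isomap preserves the former
X, _ = datasets.make_s_curve(300, random_state=0)

iso = manifold.Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)
print(Y.shape)  # (300, 2)
```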
Locally Linear Embedding
Seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods.
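A sketch of standard LLE in isolation (parameters again illustrative): each point is first reconstructed as a linear combination of its neighbors, and the embedding then preserves those reconstruction weights. The fitted estimator exposes the residual of this reconstruction as `reconstruction_error_`.

```python
from sklearn import manifold, datasets

X, _ = datasets.make_s_curve(300, random_state=0)

# Standard LLE: reconstruct each point from its 10 nearest neighbors,
# then find 2-D coordinates that preserve the same weights
lle = manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                      method='standard')
Y = lle.fit_transform(X)
print(Y.shape)  # (300, 2)
print(lle.reconstruction_error_)
```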
Modified Locally Linear Embedding
Uses multiple weight vectors in each neighborhood to address the regularization problem of LLE.
Hessian Eigenmapping
Another method of solving the regularization problem of LLE.
Spectral Embedding
An approach to calculating a non-linear embedding; implements Laplacian Eigenmaps, which finds a low-dimensional representation of the data using a spectral decomposition of the graph Laplacian.
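The spectral-decomposition idea can be sketched directly: build a nearest-neighbor graph, form its Laplacian L = D - W, and take the eigenvectors with the smallest nonzero eigenvalues as coordinates. This is a hand-rolled illustration of the Laplacian Eigenmaps idea under assumed parameters, not the library's actual implementation (which scikit-learn provides as SpectralEmbedding):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

X = np.random.RandomState(0).randn(50, 3)

# Symmetrized k-nearest-neighbor adjacency graph
W = kneighbors_graph(X, n_neighbors=5, mode='connectivity')
W = 0.5 * (W + W.T)

# Graph Laplacian L = D - W; eigh returns eigenvalues in ascending order
L = laplacian(W.toarray())
vals, vecs = eigh(L)

# Skip the trivial constant eigenvector; the next two give 2-D coordinates
Y = vecs[:, 1:3]
print(Y.shape)  # (50, 2)
```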
Local Tangent Space Alignment (LTSA)
Seeks to characterize the local geometry at each neighborhood via its tangent space, and performs a global optimization to align these local tangent spaces to learn the embedding.
Multi-dimensional Scaling (MDS)
Seeks a lower-dimensional representation of the data in which the distances respect the distances in the original high-dimensional space; it attempts to model similarity or dissimilarity data as distances in a geometric space.
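Because MDS models dissimilarities directly, it can also be fed a precomputed distance matrix instead of raw features. A minimal sketch (data and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = np.random.RandomState(0).randn(40, 5)
D = pairwise_distances(X)  # pairwise dissimilarity matrix

# With dissimilarity='precomputed', MDS embeds the matrix D directly
mds = MDS(n_components=2, dissimilarity='precomputed',
          random_state=0, n_init=1)
Y = mds.fit_transform(D)
print(Y.shape)  # (40, 2)
```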
t-distributed Stochastic Neighbor Embedding (t-SNE)
Converts affinities of data points to probabilities.
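A minimal standalone t-SNE sketch (random toy data; the perplexity value is an illustrative choice). The perplexity parameter controls the effective number of neighbors whose affinities are converted to probabilities:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(100, 10)

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=20, init='pca', random_state=0)
Y = tsne.fit_transform(X)
print(Y.shape)  # (100, 2)
```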