Manifold Learning
Manifold learning is an approach to non-linear dimensionality reduction.
Introduction
High-dimensional datasets can be difficult to visualize. While data in two or three dimensions can be plotted to show its inherent structure, equivalent high-dimensional plots are far less intuitive. To help visualize the structure of a dataset, its dimensionality must be reduced in some way.
The simplest way to reduce dimensionality is to take a random projection of the data. However, a random projection is likely to lose the most interesting structure within the data.
To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis, Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific criteria for choosing an "interesting" linear projection of the data. They can be powerful, but often miss non-linear structure in the data.
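As a concrete illustration of such a linear criterion, PCA chooses the projection that maximizes variance. The following minimal sketch uses scikit-learn's PCA on made-up toy data (the dataset and all parameter values here are illustrative assumptions, not from the text):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 points lying near a 1-D line embedded in 3-D space
rng = np.random.RandomState(0)
t = rng.uniform(-1, 1, size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.randn(100, 3)

# PCA keeps the linear direction of maximal variance
pca = PCA(n_components=1)
Y = pca.fit_transform(X)
print(Y.shape)                                  # (100, 1)
print(pca.explained_variance_ratio_[0])         # close to 1 for this toy data
```

Because the toy data is almost perfectly linear, one component captures nearly all the variance; on a curved dataset like the S-curve below, no single linear projection can do the same.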
Manifold learning can be thought of as an attempt to generalize linear frameworks like PCA so that they are sensitive to non-linear structure in the data. Though supervised variants exist, the typical manifold learning problem is unsupervised.
# coding: utf-8
# Comparison of Manifold Learning methods
from collections import OrderedDict
from functools import partial
from time import time

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import NullFormatter
from sklearn import manifold, datasets

Axes3D  # noqa: the import itself registers the '3d' projection

n_points = 1000
X, color = datasets.make_s_curve(n_points, random_state=0)
n_neighbors = 10
n_components = 2

fig = plt.figure(figsize=(15, 8))
fig.suptitle("Manifold Learning with %i points, %i neighbors"
             % (n_points, n_neighbors), fontsize=14)

# Plot the original 3-D S-curve
ax = fig.add_subplot(251, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)
ax.view_init(4, -72)

# Shared settings for the locally linear embedding variants
LLE = partial(manifold.LocallyLinearEmbedding,
              n_neighbors=n_neighbors, n_components=n_components,
              eigen_solver='auto')

methods = OrderedDict()
methods['LLE'] = LLE(method='standard')
methods['LTSA'] = LLE(method='ltsa')
methods['Hessian LLE'] = LLE(method='hessian')
methods['Modified LLE'] = LLE(method='modified')
methods['Isomap'] = manifold.Isomap(n_neighbors=n_neighbors,
                                    n_components=n_components)
methods['MDS'] = manifold.MDS(n_components=n_components, max_iter=100,
                              n_init=1)
methods['SE'] = manifold.SpectralEmbedding(n_components=n_components,
                                           n_neighbors=n_neighbors)
methods['t-SNE'] = manifold.TSNE(n_components=n_components, init='pca',
                                 random_state=0)

# Fit each method and plot its 2-D embedding
for i, (label, method) in enumerate(methods.items()):
    t0 = time()
    Y = method.fit_transform(X)
    t1 = time()
    print("%s: %.2g sec" % (label, t1 - t0))
    # Skip subplot 6 so the bottom row lines up under the top row
    ax = fig.add_subplot(2, 5, 2 + i + (i > 3))
    ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap=plt.cm.Spectral)
    ax.set_title("%s (%.2g sec)" % (label, t1 - t0))
    ax.xaxis.set_major_formatter(NullFormatter())
    ax.yaxis.set_major_formatter(NullFormatter())
    ax.axis('tight')
plt.show()
Isomap
Seeks a lower-dimensional embedding which maintains geodesic distances between all points.
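A minimal standalone sketch of Isomap on the same S-curve data (the sample count and n_neighbors below are illustrative choices, not from the text):

```python
from sklearn import manifold, datasets

# On the curved S-surface, geodesic distances along the manifold differ
# from straight-line Euclidean distances; Isomap preserves the former
X, _ = datasets.make_s_curve(300, random_state=0)

iso = manifold.Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)
print(Y.shape)  # (300, 2)
```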
Locally Linear Embedding
Seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods.
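A sketch of standard LLE in isolation (parameters again illustrative): each point is first reconstructed as a linear combination of its neighbors, and the embedding then preserves those reconstruction weights. The fitted estimator exposes the residual of this reconstruction as `reconstruction_error_`.

```python
from sklearn import manifold, datasets

X, _ = datasets.make_s_curve(300, random_state=0)

# Standard LLE: reconstruct each point from its 10 nearest neighbors,
# then find 2-D coordinates that preserve the same weights
lle = manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                      method='standard')
Y = lle.fit_transform(X)
print(Y.shape)  # (300, 2)
print(lle.reconstruction_error_)
```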
Modified Locally Linear Embedding
Uses multiple weight vectors in each neighborhood to address the regularization problem of LLE.
Hessian Eigenmapping
Another method of solving the regularization problem of LLE.
Spectral Embedding
An approach to calculating a non-linear embedding; implements Laplacian Eigenmaps, which finds a low-dimensional representation of the data using a spectral decomposition of the graph Laplacian.
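The spectral-decomposition idea can be sketched directly: build a nearest-neighbor graph, form its Laplacian L = D - W, and take the eigenvectors with the smallest nonzero eigenvalues as coordinates. This is a hand-rolled illustration of the Laplacian Eigenmaps idea under assumed parameters, not the library's actual implementation (which scikit-learn provides as SpectralEmbedding):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

X = np.random.RandomState(0).randn(50, 3)

# Symmetrized k-nearest-neighbor adjacency graph
W = kneighbors_graph(X, n_neighbors=5, mode='connectivity')
W = 0.5 * (W + W.T)

# Graph Laplacian L = D - W; eigh returns eigenvalues in ascending order
L = laplacian(W.toarray())
vals, vecs = eigh(L)

# Skip the trivial constant eigenvector; the next two give 2-D coordinates
Y = vecs[:, 1:3]
print(Y.shape)  # (50, 2)
```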
Local Tangent Space Alignment (LTSA)
Seeks to characterize the local geometry at each neighborhood via its tangent space, and performs a global optimization to align these local tangent spaces to learn the embedding.
Multi-dimensional Scaling (MDS)
Seeks a lower-dimensional representation of the data in which the distances respect the distances in the original high-dimensional space; it attempts to model similarity or dissimilarity data as distances in a geometric space.
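Because MDS models dissimilarities directly, it can also be fed a precomputed distance matrix instead of raw features. A minimal sketch (data and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = np.random.RandomState(0).randn(40, 5)
D = pairwise_distances(X)  # pairwise dissimilarity matrix

# With dissimilarity='precomputed', MDS embeds the matrix D directly
mds = MDS(n_components=2, dissimilarity='precomputed',
          random_state=0, n_init=1)
Y = mds.fit_transform(D)
print(Y.shape)  # (40, 2)
```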
t-distributed Stochastic Neighbor Embedding (t-SNE)
Converts affinities of data points to probabilities.
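A minimal standalone t-SNE sketch (random toy data; the perplexity value is an illustrative choice). The perplexity parameter controls the effective number of neighbors whose affinities are converted to probabilities:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(100, 10)

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=20, init='pca', random_state=0)
Y = tsne.fit_transform(X)
print(Y.shape)  # (100, 2)
```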