0. Purpose
The palest ink is better than the best memory.
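1. Use on relational (tabular) data
1.1 Reducing relational data to two dimensions
dataNumpy is the data as a numpy.array of shape (n_samples, n_features). See reference [1] for details.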
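If the table is held in a pandas DataFrame, one way to obtain dataNumpy is sketched below. The DataFrame df and its columns are hypothetical, and the sketch assumes only the numeric columns should be embedded.

```python
import numpy as np
import pandas as pd

# Hypothetical table: two numeric columns plus one non-numeric label column.
df = pd.DataFrame({
    "age": [23, 35, 47, 51],
    "income": [3200.0, 5400.0, 6100.0, 4800.0],
    "city": ["A", "B", "A", "C"],
})

# t-SNE expects a purely numeric (n_samples, n_features) array, so keep
# only the numeric columns and convert them to a float ndarray.
dataNumpy = df.select_dtypes(include="number").to_numpy(dtype=float)
print(dataNumpy.shape)  # (4, 2)
```

Non-numeric columns such as city would need to be encoded before they could be included. Columns on very different scales (income vs. age here) dominate Euclidean distances, so scaling them first may be worth considering.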
from sklearn.manifold import TSNE
import numpy as np
## These values were the defaults in older scikit-learn releases; since 1.2 the defaults
## are learning_rate='auto' and init='pca', and since 1.5 n_iter is deprecated in favour of max_iter.
## data_tsne: array, shape (n_samples, n_components),
## the embedding of the training data in low-dimensional space.
data_tsne = TSNE(n_components=2, perplexity=30.0, early_exaggeration=12.0,
                 learning_rate=200.0, n_iter=1000, n_iter_without_progress=300,
                 min_grad_norm=1e-07, metric='euclidean', init='random', verbose=0,
                 random_state=None, method='barnes_hut', angle=0.5).fit_transform(dataNumpy)
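A self-contained sketch of the same call, assuming a synthetic stand-in for dataNumpy (three Gaussian clusters). The helper make_tsne is hypothetical: it passes the iteration cap as max_iter or n_iter depending on what the installed scikit-learn accepts. To compare two perplexity values it uses sklearn.manifold.trustworthiness, a measure of how well local neighbourhoods are preserved that the original note does not mention, and reads the fitted attributes n_iter_ and kl_divergence_.

```python
import inspect

import numpy as np
from sklearn.manifold import TSNE, trustworthiness


def make_tsne(n_iterations, **kwargs):
    """Hypothetical helper: set the iteration cap under whichever name the
    installed scikit-learn accepts (max_iter from 1.5 on, n_iter before)."""
    params = inspect.signature(TSNE).parameters
    kwargs["max_iter" if "max_iter" in params else "n_iter"] = n_iterations
    return TSNE(**kwargs)


# Synthetic stand-in for dataNumpy: 150 samples, 10 features, three clusters.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
dataNumpy = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])

results = {}
for perplexity in (5.0, 30.0):
    # An integer random_state makes repeated runs reproducible.
    tsne = make_tsne(1000, n_components=2, perplexity=perplexity, random_state=0)
    data_tsne = tsne.fit_transform(dataNumpy)  # shape (150, 2)
    # Trustworthiness lies in [0, 1]; higher means local neighbourhoods are better preserved.
    score = trustworthiness(dataNumpy, data_tsne, n_neighbors=5)
    # n_iter_: iterations actually run; kl_divergence_: KL divergence after optimization.
    results[perplexity] = (data_tsne.shape, score, tsne.n_iter_, tsne.kl_divergence_)
    print(perplexity, data_tsne.shape, round(score, 3), tsne.n_iter_)
```

With random_state=None, as in the call above, each run can give a different layout, so fixing it is useful when comparing parameter settings.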
The meaning of each parameter, as explained in the official documentation:
Important parameters:
n_components : int, optional (default: 2)
Dimension of the embedded space.
perplexity : float, optional (default: 30)
The perplexity is related to the number of nearest neighbors that is used in
other manifold learning algorithms. Larger datasets usually require a larger
perplexity. Consider selecting a value between 5 and 50. The choice is not
extremely critical since t-SNE is quite insensitive to this parameter.
n_iter : int, optional (default: 1000)
Maximum number of iterations for the optimization. Should be at least 250.
learning_rate : float, optional (default: 200.0)
The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If
the learning rate is too high, the data may look like a ‘ball’ with any
point approximately equidistant from its nearest neighbours. If the
learning rate is too low, most points may l