http://scikit-learn.org/stable/modules/clustering.html#clustering
Clustering of unlabeled data can be performed with the module sklearn.cluster.
Each clustering algorithm comes in two variants: a class that implements the fit method to learn the clusters on training data, and a function that, given training data, returns an array of integer labels corresponding to the different clusters. For the class, the labels over the training data can be found in the labels_ attribute.
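As a minimal sketch of the two variants, the example below runs K-means once through the KMeans class and once through the k_means function. The toy data, the cluster count, and the n_init and random_state values are assumptions chosen for illustration; they do not come from the scikit-learn text above.

```python
import numpy as np
from sklearn.cluster import KMeans, k_means

# Toy data (assumed for illustration): two well-separated blobs in 2-D.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# Class variant: fit() learns the clusters, and labels_ holds the
# integer label assigned to each training sample.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
class_labels = model.labels_

# Function variant: returns the labels directly, together with the
# centroids and the final inertia.
centers, func_labels, inertia = k_means(X, n_clusters=2, n_init=10, random_state=0)

print(class_labels.shape, func_labels.shape)
```

The label numbers themselves are arbitrary, so the two variants are best compared by which samples end up grouped together, not by the label values.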
Input data
One important thing to note is that the algorithms implemented in this module take different kinds of matrix as input. On one hand, MeanShift and KMeans take data matrices of shape [n_samples, n_features]. These can be obtained from the classes in the sklearn.feature_extraction module. On the other hand, AffinityPropagation and SpectralClustering take similarity matrices of shape [n_samples, n_samples]. These can be obtained from the functions in the sklearn.metrics.pairwise module. In other words, MeanShift and KMeans work with points in a vector space, whereas AffinityPropagation and SpectralClustering can work with arbitrary objects, as long as a similarity measure exists for such objects.
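The difference between the two input kinds can be sketched as follows: KMeans receives the data matrix directly, while SpectralClustering receives a similarity matrix built with rbf_kernel from sklearn.metrics.pairwise and passed with affinity="precomputed". The toy data and the gamma value are assumptions picked for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

# Toy data (assumed for illustration): two well-separated blobs in 2-D.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# KMeans works on the data matrix of shape [n_samples, n_features].
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# SpectralClustering can work on a similarity matrix of shape
# [n_samples, n_samples]; here it is an RBF kernel over the same points.
S = rbf_kernel(X, gamma=0.1)  # gamma is an arbitrary choice for this sketch
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
sc_labels = sc.fit_predict(S)

print(X.shape, S.shape)
```

Because only the similarity matrix is passed to SpectralClustering, the same call would work for objects that are not vectors at all, provided some pairwise similarity can be computed for them.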
scikit-learn lists several commonly used clustering methods:
http://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html
The code is simple.
This is a Java version of t-SNE. The code may need a little tidying up on your own, and it requires additional libraries:
https://github.com/lejon/T-SNE-Java