sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')
n_clusters: number of clusters to generate, int, optional, default: 8.
init: initialization method, default 'k-means++'; one of {'k-means++', 'random'} or an ndarray of initial centers (see the sketch after this list).
n_init: 'auto' or int, default='auto' in recent scikit-learn releases (10 in older ones, as in the signature above). When n_init='auto', the number of runs depends on the value of init: 10 if using init='random' or init is a callable; 1 if using init='k-means++' or init is an array-like.
max_iter: maximum number of iterations, int, default: 300.
tol: convergence tolerance on the change in cluster centers between iterations, float, default: 1e-4.
precompute_distances: whether to precompute and store distances; one of {'auto', True, False}, where 'auto' means distances are not precomputed if n_samples * n_clusters > 12 million.
verbose: verbosity mode, int, default: 0.
random_state: int, RandomState instance or None, optional, default: None (if None, the random number generator is the RandomState instance used by np.random).
copy_x: boolean, default True (the original data is not modified)
n_jobs: number of jobs to run in parallel, int, default: 1.
algorithm: 'auto', 'full' (classical EM-style) or 'elkan' (uses the triangle inequality), default='auto' (chooses 'elkan' for dense data and 'full' for sparse data).
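A minimal sketch of the init/n_init interaction noted above, assuming you pass your own ndarray of starting centers (initial_centers is a made-up name and its coordinates are chosen only for illustration):
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[0, 0], [0, 2], [-1, 1], [1, 1],
              [4, 0], [4, 2], [3, 1], [5, 1]])
# Hand-picked starting centers (illustrative values only).
initial_centers = np.array([[0.0, 1.0], [4.0, 1.0]])
# With an explicit ndarray init, a single run suffices, so set n_init=1.
kmeans = KMeans(n_clusters=2, init=initial_centers, n_init=1).fit(X)
print(kmeans.cluster_centers_)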
Examples:
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[0, 0], [0, 2], [-1, 1], [1, 1],
[4, 0], [4, 2], [3, 1], [5, 1]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)
# [1 1 1 1 0 0 0 0]
print(kmeans.predict([[0, -1], [4, 4]]))
# [1 0]
print(kmeans.cluster_centers_)
# [[4. 1.]
# [0. 1.]]
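To relate max_iter and tol to a fitted model, the n_iter_ and inertia_ attributes of the fitted estimator can be inspected; continuing from the kmeans object fitted in the example above:
print(kmeans.n_iter_)   # iterations actually run; the loop stops early once the change in centers drops below tol
print(kmeans.inertia_)  # within-cluster sum of squared distances to the closest center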