scikit-learn实现的 Agglo-merative-Clustering允许我们选择要返回的集群数量。这对修剪层次结构的集群树很有用。
from sklearn.cluster import AgglomerativeClustering
import numpy as np
import pandas as pd
# 随机生成样本
np.random.seed(123)
variables = ['X', 'Y', 'Z']
labels = ['ID_0', 'ID_1', 'ID_2', 'ID_3', 'ID_4']
X = np.random.random_sample([5, 3])*10
df = pd.DataFrame(X, columns=variables, index=labels)
print(df)
ac = AgglomerativeClustering(n_clusters=3,
affinity='euclidean',
linkage='complete')
labels = ac.fit_predict(X)
print('Cluster labels: %s' % labels)
运行结果:
X Y Z
ID_0 6.964692 2.861393 2.268515
ID_1 5.513148 7.194690 4.231065
ID_2 9.807642 6.848297 4.809319
ID_3 3.921175 3.431780 7.290497
ID_4 4.385722 0.596779 3.980443
Cluster labels: [1 0 0 2 1]