sklearn Clustering

Clustering of unlabeled data can be performed with the sklearn.cluster module. Each clustering algorithm comes in two variants: a class that implements the fit method to learn the clusters on training data, and a function that, given training data, returns an array of integer labels corresponding to the different clusters. For the class variant, the labels over the training data can be found in the labels_ attribute.
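
As a minimal sketch of the function variant (using k_means, the function counterpart of the KMeans class; the example data here is only illustrative), the function returns the centroids, labels, and inertia directly:

from sklearn.cluster import k_means
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
centroids, labels, inertia = k_means(X, n_clusters=2, random_state=0)
print(labels)     # integer cluster labels, one per sample
print(centroids)  # coordinates of the learned cluster centers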

### 1. KMeans
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)                      # cluster label assigned to each training sample
print(kmeans.predict([[0, 0], [12, 3]]))   # assign new points to the nearest cluster center
print(kmeans.cluster_centers_)             # coordinates of the learned cluster centers

### 2. MiniBatchKMeans
# MiniBatchKMeans is a variant of the KMeans algorithm that uses mini-batches to reduce
# computation time while still optimizing the same objective function
# (a timing sketch follows the example below).
from sklearn.cluster import MiniBatchKMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 0], [4, 4],
              [4, 5], [0, 1], [2, 2],
              [3, 2], [5, 5], [1, -1]])
# manually fit on batches: each partial_fit call updates the centers with one mini-batch
kmeans = MiniBatchKMeans(n_clusters=2,
                         random_state=0,
                         batch_size=6)
kmeans = kmeans.partial_fit(X[0:6,:])
kmeans = kmeans.partial_fit(X[6:12,:])
print(kmeans.cluster_centers_)

print(kmeans.predict([[0, 0], [4, 4]]))

# fit on the whole data
kmeans = MiniBatchKMeans(n_clusters=2,
                         random_state=0,
                         batch_size=6,
                         max_iter=10).fit(X)
print(kmeans.cluster_centers_)
print(kmeans.predict([[0, 0], [4, 4]]))
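
To make the speed claim concrete, a minimal timing sketch is given below (the dataset size, batch_size, and cluster count are illustrative assumptions; actual timings depend on your machine):

import time
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X_big, _ = make_blobs(n_samples=100000, centers=5, random_state=0)  # synthetic data, for timing only

t0 = time.perf_counter()
KMeans(n_clusters=5, random_state=0, n_init=10).fit(X_big)
print("KMeans fit time: %.2fs" % (time.perf_counter() - t0))

t0 = time.perf_counter()
MiniBatchKMeans(n_clusters=5, random_state=0, batch_size=1024, n_init=10).fit(X_big)
print("MiniBatchKMeans fit time: %.2fs" % (time.perf_counter() - t0))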

### 3. AffinityPropagation
from sklearn.cluster import AffinityPropagation
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
clustering = AffinityPropagation(random_state=5).fit(X)
print(clustering)
print(clustering.labels_)                    # cluster label of each training sample
print(clustering.predict([[0, 0], [4, 4]]))  # assign new points to the nearest exemplar
print(clustering.cluster_centers_)           # the exemplars chosen as cluster centers

### 4. MeanShift
from sklearn.cluster import MeanShift
import numpy as np
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
clustering = MeanShift(bandwidth=2).fit(X)

print(clustering.labels_)                    # cluster label of each training sample
print(clustering.predict([[0, 0], [5, 5]]))  # assign new points to the nearest cluster center
print(clustering)

### 5. SpectralClustering
from sklearn.cluster import SpectralClustering
import numpy as np
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
# assign_labels='discretize' is an alternative to the default 'kmeans' strategy
# for turning the spectral embedding into discrete cluster labels
clustering = SpectralClustering(n_clusters=2,
                                assign_labels='discretize',
                                random_state=0).fit(X)

print(clustering.labels_) 
print(clustering)

### 6. AgglomerativeClustering
from sklearn.cluster import AgglomerativeClustering
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
clustering = AgglomerativeClustering().fit(X)  # defaults: n_clusters=2, Ward linkage
print(clustering)
print(clustering.labels_)

### 7. DBSCAN
from sklearn.cluster import DBSCAN
import numpy as np
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
print(clustering)
print(clustering.labels_)  # -1 marks noise points (here the outlier [25, 80])

### 8. Clustering Evaluation
from sklearn import metrics
labels_true = [0, 0, 0, 1, 1, 1]   # ground-truth class labels
labels_pred = [0, 0, 1, 1, 2, 2]   # cluster assignments to evaluate

print(metrics.rand_score(labels_true, labels_pred))
print(metrics.adjusted_rand_score(labels_true, labels_pred))  
print(metrics.adjusted_mutual_info_score(labels_true, labels_pred))
print(metrics.homogeneity_score(labels_true, labels_pred))
print(metrics.completeness_score(labels_true, labels_pred))
print(metrics.v_measure_score(labels_true, labels_pred, beta=0.6))

# Silhouette coefficient: the score ranges from -1 to +1. Values near +1 indicate dense,
# well-separated clusters, negative values indicate samples assigned to the wrong cluster,
# and values around 0 indicate overlapping clusters.
# Unlike the metrics above, it needs the data X rather than ground-truth labels;
# X here is the array from the DBSCAN example above.
print(metrics.silhouette_score(X, labels_pred, metric='euclidean'))
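
For comparison, a small sketch (reusing the two-blob data from the KMeans example at the top; the choice of data is an assumption for illustration) shows that well-separated clusters score close to +1:

from sklearn.cluster import KMeans
X_demo = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])
labels_demo = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X_demo)
print(metrics.silhouette_score(X_demo, labels_demo, metric='euclidean'))  # close to +1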

Reference:

https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation
