python (sklearn): clustering performance metrics

I. sklearn clustering evaluation functions:

metrics.adjusted_mutual_info_score(…[, …])
metrics.adjusted_rand_score(labels_true, …)
metrics.calinski_harabasz_score(X, labels)
metrics.davies_bouldin_score(X, labels)
metrics.completeness_score(labels_true, …)
metrics.cluster.contingency_matrix(…[, …])
metrics.fowlkes_mallows_score(labels_true, …)
metrics.homogeneity_completeness_v_measure(…)
metrics.homogeneity_score(labels_true, …)
metrics.mutual_info_score(labels_true, …)
metrics.normalized_mutual_info_score(…[, …])
metrics.silhouette_score(X, labels[, …])
metrics.silhouette_samples(X, labels[, metric])
metrics.v_measure_score(labels_true, labels_pred)
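
The functions listed here fall into two groups: those that take labels_true compare a clustering against known ground-truth labels, while those that take X score a clustering from the data alone. A minimal sketch of the first group follows. The two label lists are made-up toy values chosen for illustration, not data from this post.

```python
from sklearn import metrics

# Hypothetical toy labels: the predicted clustering groups the samples exactly
# like the ground truth, but the cluster ids are permuted.
labels_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels_pred = [2, 2, 2, 0, 0, 0, 1, 1, 1]

# These scores depend only on how samples are grouped, not on the id values.
ari = metrics.adjusted_rand_score(labels_true, labels_pred)
nmi = metrics.normalized_mutual_info_score(labels_true, labels_pred)
h, c, v = metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)

print("ARI:", ari)
print("NMI:", nmi)
print("homogeneity, completeness, v-measure:", h, c, v)
```

Because the grouping is identical up to renaming, all of these scores should come out at (or numerically very close to) their maximum of 1.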

II. Notes on the evaluation functions:

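1. Silhouette Coefficient

  1. Function:
    def silhouette_score(X, labels, metric='euclidean', sample_size=None,
    random_state=None, **kwds):

  2. Interpreting the value:
    The mean of s_i over all samples is called the silhouette coefficient of the clustering result, denoted S. It measures whether the clustering is reasonable and effective. S lies in [-1, 1]; the larger the value, the closer samples within the same cluster are to one another and the farther apart samples from different clusters are, which means the clustering is better.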
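
For reference, the per-sample value is s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from sample i to the other members of its own cluster and b_i is the mean distance to the members of the nearest other cluster. The sketch below recomputes s_i by hand for one sample and checks that the mean of all s_i matches silhouette_score. Clustering the unscaled iris data with Ward linkage is only an assumption made to keep the snippet self-contained.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Assumed setup for illustration: iris data, 3 clusters, Ward linkage.
X = datasets.load_iris().data
labels = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)

# Per-sample silhouette values and their overall mean.
s = metrics.silhouette_samples(X, labels)
S = metrics.silhouette_score(X, labels)

# Manual computation of s_i for a single sample (i = 0).
D = pairwise_distances(X)            # Euclidean distances between all samples
i = 0
own = labels == labels[i]
own[i] = False                       # exclude the sample itself from a_i
a_i = D[i, own].mean()
b_i = min(D[i, labels == k].mean()
          for k in np.unique(labels) if k != labels[i])
s_i = (b_i - a_i) / max(a_i, b_i)

print("manual s_0 matches sklearn:", np.isclose(s_i, s[0]))
print("mean of s_i matches S:", np.isclose(s.mean(), S))
```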


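2. CH score (Calinski-Harabasz Score)

  1. Function:
    def calinski_harabasz_score(X, labels):
  2. Interpreting the value:
    The smaller the dispersion (covariance) within each cluster and the larger the dispersion between clusters, the higher the Calinski-Harabasz score. In short: the larger the CH index, the better.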
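
To make the ratio concrete, the score can be written as CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster sum of squares, W is the within-cluster sum of squares, k is the number of clusters and n the number of samples. The following is a minimal sketch that recomputes it by hand and compares it with sklearn. The iris data and the Ward clustering are assumptions for illustration only.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.cluster import AgglomerativeClustering

# Assumed setup for illustration: iris data, 3 clusters, Ward linkage.
X = datasets.load_iris().data
labels = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)

n = X.shape[0]
cluster_ids = np.unique(labels)
k = len(cluster_ids)
overall_mean = X.mean(axis=0)

between = 0.0  # B: size-weighted squared distance of centroids to the overall mean
within = 0.0   # W: squared distance of samples to their own centroid
for cid in cluster_ids:
    members = X[labels == cid]
    centroid = members.mean(axis=0)
    between += len(members) * np.sum((centroid - overall_mean) ** 2)
    within += np.sum((members - centroid) ** 2)

ch_manual = (between / (k - 1)) / (within / (n - k))
ch_sklearn = metrics.calinski_harabasz_score(X, labels)
print("manual CH matches sklearn:", np.isclose(ch_manual, ch_sklearn))
```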

3. Davies-Bouldin Index (DBI): davies_bouldin_score

  1. Function:
    def davies_bouldin_score(X, labels):
  2. Interpreting the value:
    Note: the minimum DBI value is 0; the smaller the value, the better the clustering.
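
As a rough illustration of why smaller is better: for each cluster, DBI takes the worst-case ratio (s_i + s_j) / d_ij over all other clusters j, where s_i is the mean distance of cluster i's samples to its centroid and d_ij is the distance between the two centroids, and then averages these worst cases. The sketch below recomputes the index by hand under that definition and compares it with sklearn. The iris data and the Ward clustering are assumptions for illustration only.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Assumed setup for illustration: iris data, 3 clusters, Ward linkage.
X = datasets.load_iris().data
labels = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)

cluster_ids = np.unique(labels)
centroids = np.array([X[labels == cid].mean(axis=0) for cid in cluster_ids])

# s_i: mean Euclidean distance of each cluster's samples to its centroid.
scatter = np.array([
    np.linalg.norm(X[labels == cid] - centroids[idx], axis=1).mean()
    for idx, cid in enumerate(cluster_ids)
])

# d_ij: Euclidean distance between every pair of centroids.
centroid_dist = pairwise_distances(centroids)

# For each cluster, keep the worst (largest) similarity ratio to another cluster.
worst_ratios = []
for i in range(len(cluster_ids)):
    worst_ratios.append(max(
        (scatter[i] + scatter[j]) / centroid_dist[i, j]
        for j in range(len(cluster_ids)) if j != i
    ))

dbi_manual = float(np.mean(worst_ratios))
dbi_sklearn = metrics.davies_bouldin_score(X, labels)
print("manual DBI matches sklearn:", np.isclose(dbi_manual, dbi_sklearn))
```

Compact clusters (small s_i) that sit far apart (large d_ij) drive every ratio, and therefore the index, toward 0.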

Complete example

#!/usr/bin/env python
# encoding: utf-8
'''
@Author  : pentiumCM
@Email   : 842679178@qq.com
@Software: PyCharm
@File    : iris_hierarchical_cluster.py
@Time    : 2020/4/15 23:55
@desc	 : Hierarchical clustering of the iris dataset
'''

from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D

from sklearn import metrics

# Constants
cluster_num = 3

# 1. Load the dataset
iris = datasets.load_iris()
iris_data = iris.data

# 2. Preprocess the data (standardize each feature to zero mean and unit variance)
data = np.array(iris_data)
std_scaler = preprocessing.StandardScaler()
data_M = std_scaler.fit_transform(data)

# 3. Plot the dendrogram
plt.figure()
Z = linkage(data_M, method='ward', metric='euclidean')
p = dendrogram(Z, 0)
plt.show()

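# 4. Fit the model
# Ward linkage only supports Euclidean distance, which is the default, so the
# distance argument (named `affinity` in older scikit-learn releases and
# `metric` in newer ones) is omitted here.
ac = AgglomerativeClustering(n_clusters=cluster_num, linkage='ward')

# Cluster the data; fit_predict fits the model and returns the labels,
# so a separate ac.fit() call is not needed.
label_list = ac.fit_predict(data_M)
# Print the labels in rows. Note that at every index divisible by 50 only a
# line break is printed and that label is skipped, so each row shows 49 labels.
for i in range(len(label_list)):
    if i % 50 == 0:
        print()
    else:
        print(label_list[i], end=" ")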

print()

# Collect the samples that belong to each flat cluster
reslist = [[] for i in range(cluster_num)]
# Iterate over every sample and append it to the list of its cluster
for i in range(len(label_list)):
    label = label_list[i]
    reslist[label].append(data_M[i, :])

# Visualize the clustering result: project the 4 features onto 3 principal components
pca = PCA(n_components=3)
pca_data = pca.fit_transform(data_M)

# Create the 3D axes
fig = plt.figure()
ax1 = plt.axes(projection='3d')

# Coordinates for the scatter plot
zd = pca_data[:, 0]
xd = pca_data[:, 1]
yd = pca_data[:, 2]

# One color per cluster label (labels run from 0 to cluster_num - 1)
palette = ['r', 'y', 'g', 'violet']
colors = [palette[label] for label in label_list]

ax1.scatter3D(xd, yd, zd, c=colors)
plt.show()

# Evaluate the clustering performance
# metrics.silhouette_score(X, labels[, …])
cluster_score_si = metrics.silhouette_score(data_M, label_list)

print("cluster_score_si", cluster_score_si)

cluster_score_ch = metrics.calinski_harabasz_score(data_M, label_list)
print("cluster_score_ch:", cluster_score_ch)

# The minimum DBI value is 0; the smaller the value, the better the clustering.
cluster_score_DBI = metrics.davies_bouldin_score(data_M, label_list)
print("cluster_score_DBI :", cluster_score_DBI)

Output:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 
0 0 2 0 2 0 2 0 2 2 0 2 0 2 0 2 2 2 2 0 0 0 0 0 0 0 0 0 2 2 2 2 0 2 0 0 2 2 2 2 0 2 2 2 2 2 0 2 2 
0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
cluster_score_si 0.4466890410285909
cluster_score_ch: 222.71916382215363
cluster_score_DBI : 0.8034665302876753

References

https://blog.csdn.net/qq_27825451/article/details/94436488
