Formula and Overview
Rousseeuw, Peter J. “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.” Journal of Computational and Applied Mathematics 20 (1987): 53-65.
Formula: $s = \frac{b - a}{\max(a, b)}$
a: the mean distance between the sample and all other samples in its own cluster
b: the mean distance between the sample and all samples in the nearest neighboring cluster
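To make the definitions of a and b concrete, here is a minimal NumPy sketch that evaluates s for a single sample. The helper name `silhouette_for_sample` and the toy data are hypothetical, chosen only for illustration. The sketch assumes Euclidean distance and at least two clusters, and it adopts the convention that a sample alone in its cluster gets s = 0.

```python
import numpy as np

def silhouette_for_sample(X, labels, i):
    """Silhouette value s = (b - a) / max(a, b) for sample i (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Distance from sample i to every sample (including itself, which is 0).
    dists = np.linalg.norm(X - X[i], axis=1)

    # a: mean distance to the other members of the sample's own cluster.
    same = labels == labels[i]
    same[i] = False  # exclude the sample itself
    if not same.any():
        return 0.0  # assumed convention for a singleton cluster
    a = dists[same].mean()

    # b: smallest mean distance to the members of any other cluster.
    b = min(dists[labels == k].mean() for k in np.unique(labels) if k != labels[i])
    return (b - a) / max(a, b)

# Toy data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
s0 = silhouette_for_sample(X, labels, 0)
print(s0)
```

For sample 0, a = 1 (its only cluster mate is one unit away) and b is the mean of 5 and sqrt(26), so s comes out at roughly 0.80.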
Code Implementation
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans
dataframe = pd.DataFrame(data=np.random.randint(0, 50, size=(200, 10)))
# Use k-means clustering as an example
kmeans_model = KMeans(n_clusters=3, random_state=1).fit(dataframe)
labels = kmeans_model.labels_
# Compute the silhouette score (mean silhouette value over all samples)
score = metrics.silhouette_score(dataframe, labels, metric='euclidean')
print(score)
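The score lies in the range [-1, 1]; the closer it is to 1, the better the clustering. Values near 0 indicate overlapping clusters, and negative values suggest that samples have been assigned to the wrong cluster.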
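Because a higher score means better-separated clusters, one common use of the metric is to compare several candidate values of `n_clusters`. The following is a minimal sketch under stated assumptions: the synthetic data from `make_blobs`, the three explicit centers, the candidate range 2 to 5, and the choice of `n_init=10` are all illustrative and not part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustrative assumption).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric='euclidean')

# The candidate with the highest score is the best separated under this metric.
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

The silhouette score needs at least two clusters, which is why the candidate range starts at 2. On real data the maximum is often less clear-cut than on synthetic blobs, so the score is best treated as one signal among several.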
References
Silhouette Coefficient: https://scikit-learn.org/stable/modules/clustering.html#silhouette-coefficient