Davies, David L., and Donald W. Bouldin. “A cluster separation measure.” IEEE transactions on pattern analysis and machine intelligence 2 (1979): 224-227.
Formula and Overview
A smaller index value indicates better clustering; the minimum value is 0.
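First, compute for each pair of clusters: $R_{ij} = \frac{s_i + s_j}{d_{ij}}$
Here $s_i$ is the diameter of cluster $i$ (commonly taken as the average distance from each point in the cluster to its centroid), and $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$.
Then, for each cluster $i$, take the largest $R_{ij}$ over all $j \neq i$ and average these maxima over the $k$ clusters to obtain the DB index: $DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij}$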
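To make the formula concrete, here is a minimal pure-Python sketch that follows it step by step. It assumes Euclidean distance and takes $s_i$ as the mean distance from each point of a cluster to its centroid. The function names `centroid` and `davies_bouldin` and the toy data are illustrative only, not part of any library.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin(points, labels):
    """Illustrative DB index; assumes at least two clusters with distinct centroids."""
    # Group the points by cluster label
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ids = list(clusters)

    # Centroid of each cluster
    cents = {c: centroid(clusters[c]) for c in ids}
    # s_i: mean distance from the points of cluster i to its centroid
    s = {c: sum(math.dist(p, cents[c]) for p in clusters[c]) / len(clusters[c])
         for c in ids}

    # For each cluster i, take the largest R_ij over j != i, then average
    total = 0.0
    for i in ids:
        total += max((s[i] + s[j]) / math.dist(cents[i], cents[j])
                     for j in ids if j != i)
    return total / len(ids)

# Toy data: three clusters with s_i = 1, centroids at x = 0, 4 and 12
points = [(0, 0), (0, 2), (4, 0), (4, 2), (12, 0), (12, 2)]
labels = [0, 0, 1, 1, 2, 2]
score = davies_bouldin(points, labels)
print(score)  # (0.5 + 0.5 + 0.25) / 3 ≈ 0.4167
```

In this toy case $R_{01} = 2/4 = 0.5$, $R_{02} = 2/12 \approx 0.167$ and $R_{12} = 2/8 = 0.25$, so the per-cluster maxima are 0.5, 0.5 and 0.25.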
Code Implementation
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans
dataframe = pd.DataFrame(data=np.random.randint(0, 50, size=(200, 10)))
# Use k-means clustering as an example
kmeans_model = KMeans(n_clusters=3, random_state=1).fit(dataframe)
labels = kmeans_model.labels_
# Compute the Davies-Bouldin index
score = metrics.davies_bouldin_score(dataframe, labels)
print(score)