Silhouettes: A Metric for Evaluating Clustering Results

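This code shows how to compute the silhouette coefficient of every sample in order to evaluate a clustering result. It uses numpy and scikit-learn in four steps:

1. Compute all pairwise distances with pairwise_distances.
2. For each sample, compute the mean distance to the other samples of its own cluster (intra_clust_dists).
3. For each sample, compute the mean distance to the samples of the nearest other cluster (inter_clust_dists).
4. Compute and return the silhouette coefficient of each sample.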
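For reference, the quantities described above follow the standard definition. For a sample $i$ in cluster $C_I$ and the chosen distance $d$:

$$a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\ j \neq i} d(i, j), \qquad b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)$$

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}} \in [-1, 1], \qquad s(i) = 0 \ \text{if}\ |C_I| = 1$$

Here $a(i)$ corresponds to intra_clust_dists and $b(i)$ to inter_clust_dists. A value near 1 means sample $i$ sits well inside its own cluster. A value near -1 means it is, on average, closer to a neighbouring cluster. The mean of $s(i)$ over all samples is the usual single-number score of a clustering.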
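```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import LabelEncoder


def silhouette_samples(X, labels, metric='euclidean', **kwds):
    """Silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every sample."""
    # Encode the labels as 0..k-1 and loop over the encoded values
    # (comparing le.classes_ with the encoded labels fails for labels that are not 0..k-1)
    le = LabelEncoder()
    labels = le.fit_transform(labels)
    n_labels = len(le.classes_)
    if not 2 <= n_labels <= len(labels) - 1:
        raise ValueError('Number of labels must be between 2 and n_samples - 1')
    n_samples_per_label = np.bincount(labels, minlength=n_labels)
    distances = pairwise_distances(X, metric=metric, **kwds)
    # a(i): mean distance from sample i to the other samples of its own cluster
    intra_clust_dists = np.zeros(distances.shape[0], dtype=distances.dtype)
    # b(i): smallest mean distance from sample i to the samples of another cluster
    inter_clust_dists = np.inf + intra_clust_dists
    for curr_label in range(n_labels):
        mask = labels == curr_label
        current_distances = distances[mask]
        # Leave the sample itself out (its distance to itself is 0)
        n_samples_curr_lab = n_samples_per_label[curr_label] - 1
        if n_samples_curr_lab != 0:
            intra_clust_dists[mask] = np.sum(current_distances[:, mask], axis=1) / n_samples_curr_lab
        for other_label in range(n_labels):
            if other_label != curr_label:
                other_mask = labels == other_label
                other_distances = np.mean(current_distances[:, other_mask], axis=1)
                inter_clust_dists[mask] = np.minimum(inter_clust_dists[mask], other_distances)
    sil_samples = inter_clust_dists - intra_clust_dists
    sil_samples /= np.maximum(intra_clust_dists, inter_clust_dists)
    # Convention: samples in a singleton cluster get a silhouette of 0
    sil_samples[n_samples_per_label[labels] == 1] = 0
    return sil_samples
```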
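The two nested label loops in silhouette_samples can also be replaced by matrix products. Below is a minimal sketch. It assumes Euclidean distance and a dense X small enough for the full n × n distance matrix. silhouette_vectorized is a hypothetical helper name, and its output is checked against scikit-learn's own sklearn.metrics.silhouette_samples on the Iris data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances, silhouette_samples


def silhouette_vectorized(X, labels):
    """Hypothetical helper: per-sample silhouette via one-hot cluster sums."""
    _, labels = np.unique(labels, return_inverse=True)  # map labels to 0..k-1
    labels = labels.ravel()
    k = labels.max() + 1
    D = pairwise_distances(X)          # (n, n) Euclidean distances
    onehot = np.eye(k)[labels]         # (n, k) cluster membership
    sizes = onehot.sum(axis=0)         # number of samples per cluster
    sums = D @ onehot                  # sums[i, c] = total distance from i to cluster c
    own = sizes[labels]                # size of each sample's own cluster
    rows = np.arange(len(labels))
    # a(i): exclude the sample itself, hence the division by (own - 1)
    with np.errstate(divide='ignore', invalid='ignore'):
        a = sums[rows, labels] / (own - 1)
    mean_to = sums / sizes             # mean distance from i to every cluster
    mean_to[rows, labels] = np.inf     # ignore the own cluster when taking the minimum
    b = mean_to.min(axis=1)
    s = (b - a) / np.maximum(a, b)
    s[own == 1] = 0.0                  # singleton convention
    return s


X, y = load_iris(return_X_y=True)
print(np.allclose(silhouette_vectorized(X, y), silhouette_samples(X, y)))  # prints True
```

Multiplying the distance matrix by a one-hot membership matrix gives, in one step, the summed distance from every sample to every cluster. After that, a(i) is a single lookup and b(i) is a row minimum.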
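The following is complete Python code that segments an image with the bisecting k-means algorithm and prints the silhouette coefficients:

```python
import numpy as np
from PIL import Image
from sklearn.metrics import silhouette_samples, silhouette_score


def load_image(filename):
    # Convert to RGB so every pixel has exactly 3 channels
    img = Image.open(filename).convert('RGB')
    return np.array(img)


def save_image(filename, data):
    img = Image.fromarray(np.uint8(np.clip(np.round(data), 0, 255)))
    img.save(filename)


def kmeans(data, k, max_iter=100, seed=0):
    """Lloyd's k-means on an (n, d) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Start from k distinct points so no two centers coincide
    candidates = np.unique(data, axis=0)
    centers = candidates[rng.choice(len(candidates), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each point to its nearest center (Euclidean distance in RGB space)
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Recompute the centers; an empty cluster keeps its previous center
        new_centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def split_score(points, labels, sample_size=2000):
    """Silhouette of a two-way split, on a random subsample; -1 if undefined."""
    try:
        return silhouette_score(points, labels, sample_size=min(sample_size, len(points)), random_state=0)
    except ValueError:  # e.g. the subsample contains only one of the two labels
        return -1.0


def bisecting_kmeans(data, k, max_iter=100):
    """Split data into (at most) k clusters; returns a list of index arrays into data."""
    clusters = [np.arange(len(data))]
    while len(clusters) < k:
        best = None  # (score, position in clusters, the two halves)
        for i, idx in enumerate(clusters):
            points = data[idx]
            if len(np.unique(points, axis=0)) < 2:
                continue  # a cluster of identical pixels cannot be split
            _, labels = kmeans(points, 2, max_iter)
            if labels.min() == labels.max():
                continue
            # Keep the candidate split with the highest silhouette coefficient
            score = split_score(points, labels)
            if best is None or score > best[0]:
                best = (score, i, [idx[labels == 0], idx[labels == 1]])
        if best is None:
            break  # nothing left that can be split
        _, i, halves = best
        clusters.pop(i)
        clusters.extend(halves)
    return clusters


if __name__ == '__main__':
    img = load_image('input.jpg')
    data = img.reshape(-1, 3).astype(float)
    clusters = bisecting_kmeans(data, 4)
    # One label per pixel; paint every pixel with the mean color of its cluster
    labels = np.empty(len(data), dtype=int)
    segmented = np.empty_like(data)
    for j, idx in enumerate(clusters):
        labels[idx] = j
        segmented[idx] = data[idx].mean(axis=0)
    save_image('output.jpg', segmented.reshape(img.shape))
    # The silhouette needs all pairwise distances (O(n^2)), so score a random pixel subsample
    rng = np.random.default_rng(0)
    sample = rng.choice(len(data), size=min(5000, len(data)), replace=False)
    s = silhouette_samples(data[sample], labels[sample])
    silhouettes = np.array([s[labels[sample] == j].mean() for j in range(len(clusters))])
    print('Silhouette coefficients:', silhouettes)
    print('Mean silhouette:', s.mean())
```

This code adds silhouette computation to the bisecting k-means algorithm.

First, bisecting_kmeans splits the pixels into four clusters. In each round it tries a 2-means split of every current cluster and keeps the split with the highest silhouette coefficient. Clusters are tracked as index arrays, so every pixel can later be mapped back to its position in the image.

Next, every pixel is painted with the mean color of its cluster, which compresses each cluster to a single color. The data array is then reshaped back into an image, and the output image is saved.

Finally, silhouette_samples computes the silhouette coefficient of a random subsample of pixels. The per-cluster means and the overall mean are printed to the console.

Note that this code uses scikit-learn's silhouette_score and silhouette_samples functions, so scikit-learn must be installed first. Also, the silhouette coefficient is only defined when the number of distinct labels is between 2 and n_samples - 1. Scoring a single cluster whose samples all share one label raises a ValueError. For this reason the score is always computed on a labeled partition, either a two-way split or the full segmentation, and never cluster by cluster.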
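The label-count rule and the subsampling option can be checked on synthetic data. Below is a minimal sketch in which two Gaussian blobs generated with NumPy stand in for pixel colors. It uses only the documented silhouette_score parameters sample_size and random_state.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two well-separated 2-D Gaussian blobs (synthetic stand-in for pixel data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(8.0, 1.0, (50, 2))])
labels = np.repeat([0, 1], 50)

# Two distinct labels: the score is defined
full = silhouette_score(X, labels)

# A single label (every sample in one cluster): scikit-learn raises ValueError
try:
    silhouette_score(X, np.zeros(len(X), dtype=int))
    single_label_ok = True
except ValueError:
    single_label_ok = False

# On large inputs, score a random subset of samples instead of all n^2 pairs
approx = silhouette_score(X, labels, sample_size=40, random_state=0)

print(f'full={full:.3f} approx={approx:.3f} single_label_ok={single_label_ok}')
```

With well-separated blobs both scores come out high, and the subsampled estimate stays close to the full one while touching far fewer pairs.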