K-Means Clustering

Latest recommended article published on 2024-09-27 11:50:10

辞三三

Tags: k-means algorithm, clustering algorithm

Article link: https://blog.csdn.net/weixin_66403225/article/details/139718542

I. Basic Code

sklearn.cluster.KMeans(n_clusters=8, *, init='k-means++', n_init=10,
max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True,
algorithm='auto')

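Parameters:

(1) n_clusters: the number of clusters to form (and the number of centroids to generate); default 8.
(2) max_iter: the maximum number of iterations for a single run; default 300.
(3) n_init: the number of times the algorithm is run with different initial centroids. With n_init=10, ten initializations are tried and the run with the lowest inertia is kept as the final model. (In scikit-learn 1.4 and later the default is 'auto'.)
(4) init: the centroid initialization method. 'k-means++' chooses initial centroids that tend to be far apart, which usually speeds up convergence. It does not choose n_clusters; the number of clusters must always be supplied by the user.
(5) tol: float, default 1e-4; the relative tolerance used to declare convergence. Older releases measured it against the change in inertia; recent releases measure it against the change in cluster centers between two consecutive iterations.
(6) n_jobs: the number of parallel jobs used for the computation. It does not appear in the signature above: it was deprecated in scikit-learn 0.23 and removed in 1.0.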
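
To make point (4) concrete: `init` only controls where the initial centroids are placed, while the number of clusters still comes from `n_clusters`. The following is a minimal sketch, assuming scikit-learn 0.24 or later (where the standalone `kmeans_plusplus` helper is available). The toy data and the names `X_demo`, `init_centers`, and `init_indices` are illustrative and not part of the original article.

```python
import numpy as np
from sklearn.cluster import KMeans, kmeans_plusplus
from sklearn.datasets import make_blobs

# Illustrative toy data (assumption, not from the article): 300 points, 3 blobs.
X_demo, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# k-means++ seeding only: returns the chosen initial centers and the indices
# of the samples they were taken from. n_clusters must still be passed in.
init_centers, init_indices = kmeans_plusplus(X_demo, n_clusters=3, random_state=0)
print(init_centers.shape)  # (3, 2)

# Full estimator with the main parameters spelled out explicitly.
km = KMeans(n_clusters=3, init='k-means++', n_init=10,
            max_iter=300, tol=1e-4, random_state=0)
km.fit(X_demo)
print(km.n_iter_ <= 300)  # the run never exceeds max_iter iterations
```

The seeds returned by `kmeans_plusplus` are actual rows of the data, which is why `init_indices` can be used to look them up again.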

Attributes:

(1) cluster_centers_: ndarray of shape (n_clusters, n_features). Coordinates of cluster centers.
(2) labels_: ndarray of shape (n_samples,). Label of each point.
(3) inertia_: float. Sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.
(4) n_iter_: int. Number of iterations run.
(5) n_features_in_: int. Number of features seen during fit.
(6) feature_names_in_: ndarray of shape (n_features_in_,). Names of features seen during fit; defined only when X has feature names that are all strings.
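
The attributes above can be inspected after fitting. The sketch below, on illustrative toy data (an assumption, not the article's dataset), also recomputes `inertia_` by hand to show what the definition in (3) means when no sample weights are given.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative toy data: 200 points, 3 blobs, 2 features.
X_demo, _ = make_blobs(n_samples=200, centers=3, n_features=2, random_state=1)
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X_demo)

print(km.cluster_centers_.shape)  # (3, 2)
print(km.labels_.shape)           # (200,)
print(km.n_features_in_)          # 2

# Recompute inertia manually: squared distance from each sample to the
# center of the cluster it was assigned to, summed over all samples.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = np.sum((X_demo - assigned_centers) ** 2)
print(np.isclose(manual_inertia, km.inertia_))
```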

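Methods:

(1) fit(X, y=None, sample_weight=None): computes the k-means clustering. X is an {array-like, sparse matrix} of shape (n_samples, n_features); y is ignored.
(2) fit_predict(X, y=None, sample_weight=None): computes the cluster centers and predicts the cluster index of each sample.
Returns: labels, ndarray of shape (n_samples,). Index of the cluster each sample belongs to.
(3) fit_transform(X, y=None, sample_weight=None): computes the clustering and transforms X to cluster-distance space.
Returns: X_new, ndarray of shape (n_samples, n_clusters). Distance from each sample to each cluster center.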
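
A short sketch of how these methods relate, again on illustrative toy data (an assumption, not from the article). It uses `transform` on an already fitted model, which gives the same kind of output as `fit_transform`: one distance per sample and cluster center. The nearest center in that matrix should match the label returned by `fit_predict`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative toy data: 200 points, 3 blobs, 2 features.
X_demo, _ = make_blobs(n_samples=200, centers=3, n_features=2, random_state=2)

# fit_predict: fit the model and return the cluster index of each sample.
km = KMeans(n_clusters=3, n_init=10, random_state=2)
labels = km.fit_predict(X_demo)
print(labels.shape)     # (200,)

# transform: distance from each sample to each cluster center.
distances = km.transform(X_demo)
print(distances.shape)  # (200, 3)

# The column with the smallest distance in each row is the nearest center;
# the fraction of rows where it agrees with the labels should be close to 1.
nearest = np.argmin(distances, axis=1)
agreement = np.mean(nearest == labels)
print(agreement)
```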

Example:

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn import metrics

# Generate sample features X and true cluster labels y: 1000 samples with 2 features each.
# 4 clusters centered at [-1,-1], [0,0], [1,1], [2,2], with standard deviations [0.4, 0.2, 0.2, 0.2].
X, y = make_blobs(n_samples=1000,
                  n_features=2,
                  centers=[[-1, -1], [0, 0], [1, 1], [2, 2]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2],
                  random_state=9)

# Scatter plot of the samples, colored by the true cluster label y
plt.figure(figsize=(8, 6))  # set the figure size
plt.scatter(X[:, 0], X[:, 1], c=y, marker='o', cmap='viridis')
plt.colorbar()  # add a color bar showing the class values
plt.title('Scatter plot of the blobs with true labels')  # add a title
plt.xlabel('Feature 1')  # x-axis label
plt.ylabel('Feature 2')  # y-axis label
plt.show()

# Run KMeans for several cluster counts and compute the silhouette score of each result
plt.figure(figsize=(12, 10))  # set the figure size
for index, k in enumerate((2, 3, 4, 5)):
    plt.subplot(2, 2, index + 1)  # select the next panel in a 2x2 grid
    # KMeans clustering
    y_pred = KMeans(n_clusters=k, random_state=9).fit_predict(X)
    # Compute the silhouette score
    score = metrics.silhouette_score(X, y_pred, metric='euclidean')

    # Scatter plot of the clustering result
    plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='viridis')
    plt.title(f'KMeans with k={k}')  # add a title
    plt.xlabel('Feature 1')  # x-axis label
    plt.ylabel('Feature 2')  # y-axis label
    # Write the silhouette score into the lower-right corner of the panel
    plt.text(.99, .01, ('score: %.2f' % score), transform=plt.gca().transAxes,
             size=10, horizontalalignment='right')

plt.tight_layout()  # adjust the spacing between subplots
plt.show()
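
The plots show the silhouette score for each k, but the same comparison can be made without plotting. The sketch below is one possible way to do it, not the only one: it reuses the data-generation settings from the example, collects the scores in a dictionary, and reports the k with the highest score. The names `candidate_ks`, `scores`, and `best_k` are illustrative, and `n_init=10` is set explicitly so the behavior does not depend on the installed scikit-learn version.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same synthetic data settings as in the example above.
X, y = make_blobs(n_samples=1000,
                  n_features=2,
                  centers=[[-1, -1], [0, 0], [1, 1], [2, 2]],
                  cluster_std=[0.4, 0.2, 0.2, 0.2],
                  random_state=9)

candidate_ks = (2, 3, 4, 5)
scores = {}
for k in candidate_ks:
    labels = KMeans(n_clusters=k, n_init=10, random_state=9).fit_predict(X)
    # The silhouette score lies in [-1, 1]; higher means better-separated clusters.
    scores[k] = silhouette_score(X, labels, metric='euclidean')

# Pick the k whose clustering has the highest silhouette score.
best_k = max(scores, key=scores.get)
for k in candidate_ks:
    print(f'k={k}: silhouette={scores[k]:.3f}')
print('best k by silhouette:', best_k)
```

The silhouette score is only one criterion; when clusters overlap, as the neighboring blobs here may, it is worth comparing it with a visual check like the plots above.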