K-Means Clustering Algorithm: Clustering Algorithms - K-Means Algorithm

cunzai1985

Tags: algorithm, clustering, visualization, python, machine learning

Original link: https://www.tutorialspoint.com/machine_learning_with_python/clustering_algorithms_k_means_algorithm.htm

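The K-means algorithm is an iterative clustering algorithm that assumes the number of clusters to generate, K, is known in advance. It clusters by minimizing the sum of squared distances between the data points and their centroids. In Python, K-means can be used for visualizing data points and for machine learning applications such as market segmentation and image segmentation. Its advantages include being easy to understand and implement and being suitable for a large number of variables; its disadvantages include the difficulty of estimating K in advance, sensitivity to the initial input and to the order of the data, and poor handling of clusters with complex geometric shapes.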

K-Means Clustering Algorithm

Clustering Algorithms - K-means Algorithm

Introduction to K-Means Algorithm

The K-means clustering algorithm computes centroids and iterates until it finds the optimal centroids. It assumes that the number of clusters is already known. It is also called a flat clustering algorithm. The number of clusters the algorithm identifies in the data is represented by the "K" in K-means.

In this algorithm, data points are assigned to clusters so that the sum of the squared distances between the data points and their cluster centroid is minimized. Less variation within a cluster means that the data points in that cluster are more similar to one another.
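
To make the objective concrete, the following is a minimal sketch in plain Python that computes the sum of squared distances between points and their cluster centroids for a given assignment. The function names (`centroid`, `squared_distance`, `within_cluster_ss`) and the toy data are assumptions made for illustration; they are not taken from the original article or from any library.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its own cluster's centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        total += sum(squared_distance(p, c) for p in members)
    return total


# Toy data, made up for illustration: two visually separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (8.0, 9.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # mixes the two groups

good_score = within_cluster_ss(points, good_labels)
bad_score = within_cluster_ss(points, bad_labels)
print(good_score, bad_score)  # the tighter assignment scores lower
```

On this toy data the assignment that keeps nearby points together yields a much smaller value than the one that mixes the groups, which is the quantity K-means tries to drive down.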

Working of K-Means Algorithm

We can understand how the K-Means clustering algorithm works through the following steps −

Step 1 − First, specify the number of clusters, K, that the algorithm should generate.
Step 2 − Next, randomly select K data points as starting points and assign each data point to a cluster. In simple terms, this gives an initial partition of the data into K groups.
Step 3 − Now compute the cluster centroids.
Step 4 − Next, keep iterating the following until the optimal centroids are found, that is, until the assignment of data points to clusters no longer changes −