K-Means Clustering Algorithm
Clustering Algorithms - K-means Algorithm
Introduction to K-Means Algorithm
The K-means clustering algorithm computes centroids and iterates until it finds the optimal centroids. It assumes that the number of clusters is already known. It is also called a flat clustering algorithm. The ‘K’ in K-means represents the number of clusters the algorithm identifies in the data.
In this algorithm, data points are assigned to clusters so that the sum of squared distances between the data points and their cluster centroids is minimized. Less variation within a cluster means the data points in that cluster are more similar to one another.
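The quantity being minimized can be illustrated with a small sketch. The helper names (squared_distance, within_cluster_ss) and the toy data below are made up for illustration and are not part of any library; the sketch only evaluates the sum of squared distances for a given assignment and does not perform the clustering itself.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_ss(points, labels, centroids) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[k]) for p, k in zip(points, labels))


# Toy data (hypothetical): two tight groups of points on a line.
points = [(1.0, 0.0), (3.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centroids = [(2.0, 0.0), (11.0, 0.0)]

# Assignment that keeps nearby points together.
good = within_cluster_ss(points, [0, 0, 1, 1], centroids)
# Assignment that mixes the two groups.
bad = within_cluster_ss(points, [0, 1, 0, 1], centroids)

print(good, bad)  # prints: 4.0 130.0
```

The assignment that groups nearby points gives a much smaller sum, which is the sense in which less variation within clusters corresponds to more similar points in each cluster.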
Working of K-Means Algorithm
We can understand how the K-Means clustering algorithm works with the help of the following steps −
Step 1 − First, we need to specify the number of clusters, K, that the algorithm should generate.
Step 2 − Next, randomly select K data points to serve as the initial cluster centers and assign each data point to a cluster. In simple words, partition the data into K initial groups.
Step 3 − Now compute the cluster centroids.
Step 4 − Next, keep iterating the following until the centroids are optimal, that is, until the assignment of data points to clusters no longer changes −
4.1 − First, compute the sum of squared distances between the data points and the centroids.