Research and Improvement of the K-Means Clustering Algorithm

daisyyyyyyyy

Last modified: 2022-08-08 09:00:53

Category: Machine Learning. Tags: clustering algorithm, kmeans

First published: 2018-04-24 13:02:21

Article link: https://blog.csdn.net/u013129109/article/details/80063111

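This article studies the K-Means algorithm in depth, analyzes its sensitivity to the initial cluster centers, and presents the K-Means++ algorithm as an improvement. K-Means++ picks initial centers that are far apart from one another to improve efficiency and stability; experiments show that it converges faster and lowers the cost of the algorithm.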

Code: https://github.com/dengsiying/K-Means-improvement.git

Research and Improvement of the K-Means Clustering Algorithm*

Abstract: K-Means is a typical partition-based clustering algorithm. It is simple to implement, uses the sum-of-squared-errors criterion function, and offers good scalability and compressibility on large data sets. However, because the initial cluster centers are chosen at random, its results are unstable. This paper studies the idea, principle, strengths, and weaknesses of the traditional K-Means algorithm and, to address its dependence on the initial values, presents and studies an improved algorithm, K-Means++, which changes how the initial cluster centers are selected. Experiments show that K-Means++ effectively improves the efficiency and stability of the algorithm and reduces its cost.

Keywords: clustering algorithm, K-Means algorithm, data mining
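
To make the seeding idea in the abstract concrete, here is a minimal sketch in plain Python. It assumes the commonly described K-Means++ rule, in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen; the code in the linked repository may differ. The names `squared_distance`, `kmeans_pp_init`, and `sse`, and the toy data, are hypothetical and used only for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers by D^2-weighted sampling (assumed K-Means++ rule)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; no further distinct center exists.
            raise ValueError("k exceeds the number of distinct points")
        # Far-away points get larger weights; already chosen points have weight 0.
        nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, squared_distance(p, nxt)) for p, old in zip(points, d2)]
    return centers


def sse(points, centers):
    """Sum of squared errors: each point is charged to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Toy data: three loosely separated groups of two points each (illustrative only).
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.9, 0.2)]
centers = kmeans_pp_init(points, k=3, seed=42)
print("initial centers:", centers)
print("SSE of the initial centers:", sse(points, centers))
```

The returned centers would then be handed to the usual K-Means assign-and-update loop. Because the seeding is still random, different values of `seed` can give different centers; the weighting only makes widely spread centers more likely.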

The K-Means clustering algorithm is one of the most classic and, at the same time,