K-Means聚类算法的研究与改进*
摘 要:K-Means算法是基于划分的聚类算法中的一个典型算法,该算法有操作简单、采用误差平方和准则函数、对大数据集的处理上有较高的伸缩性和可压缩性的优点.但是该算法还存在着一些随机初始聚类中心导致算法不稳定的缺陷,本文研究了传统K-Means的算法的思想、原理及优缺点,并针对其对初始值依赖的缺陷,提出并研究了一种改进算法K-Means++,该算法对选取初始聚类中心的方法进行了改进.经过实验证明,K-Means++算法有效的提高了算法效率和稳定性,减少了算法开销.
关键词:聚类算法,K-Means算法,数据挖掘
Research and Improvement of K-Means Clustering Algorithm
Abstract: K-Means algorithm is a typical algorithm based on partitioned clustering algorithm. It has the advantages of simple operation, error squared sum criteria function, high scalability and compressibility for processing large data sets advantage. However, there are still some shortcomings in this algorithm, such as stochastic initial clustering center, which results in instability of the algorithm. This paper studies the concept, principle, advantages and disadvantages of the traditional K-Means algorithm and proposes and studies the defects of the original K- An improved algorithm K-Means ++, which improves the method of selecting initial cluster centers. Experimental results show that the K-Means ++ algorithm effectively improves the efficiency and stability of the algorithm and reduces the cost of the algorithm.
Key words: clustering algorithm, K-Means algorithm, data mining
K-Means聚类算法是最为经典,同时