K-Means聚类算法的研究与改进

代码:GitHub - dengsiying/K-Means-improvement: K-Means聚类算法及其改进K-Means聚类算法及其改进. Contribute to dengsiying/K-Means-improvement development by creating an account on GitHub.https://github.com/dengsiying/K-Means-improvement.git

K-Means聚类算法的研究与改进*

摘 要:K-Means算法是基于划分的聚类算法中的一个典型算法,该算法有操作简单、采用误差平方和准则函数、对大数据集的处理上有较高的伸缩性和可压缩性的优点.但是该算法还存在着一些随机初始聚类中心导致算法不稳定的缺陷,本文研究了传统K-Means的算法的思想、原理及优缺点,并针对其对初始值依赖的缺陷,提出并研究了一种改进算法K-Means++,该算法对选取初始聚类中心的方法进行了改进.经过实验证明,K-Means++算法有效的提高了算法效率和稳定性,减少了算法开销.

关键词:聚类算法,K-Means算法,数据挖掘

Research and Improvement of K-Means Clustering Algorithm

Abstract: K-Means algorithm is a typical algorithm based on partitioned clustering algorithm. It has the advantages of simple operation, error squared sum criteria function, high scalability and compressibility for processing large data sets advantage. However, there are still some shortcomings in this algorithm, such as stochastic initial clustering center, which results in instability of the algorithm. This paper studies the concept, principle, advantages and disadvantages of the traditional K-Means algorithm and proposes and studies the defects of the original K- An improved algorithm K-Means ++, which improves the method of selecting initial cluster centers. Experimental results show that the K-Means ++ algorithm effectively improves the efficiency and stability of the algorithm and reduces the cost of the algorithm.

Key words: clustering algorithm, K-Means algorithm, data mining

K-Means聚类算法是最为经典,同时也是使用最为广泛的一种基于划分的聚类算法,它属于基于距离的聚类算法.所谓基于距离的聚类算法是指采用距离作为相似性度量的评价指标,也就是说两个对象之间距离越近,则它们的相似性越大.这类算法的目标通常是将距离比较相近的对象组成簇,从而得到紧凑

  • 17
    点赞
  • 205
    收藏
    觉得还不错? 一键收藏
  • 8
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 8
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值