K-means clustering: introducing a paper

君子美玉

Article link: https://blog.csdn.net/kingskyleader/article/details/6064838

Paper: Distance-based Partition Clustering Algorithm [Shared]
Ye Ruofen, Li Chunping
(School of Software, Tsinghua University, Beijing 100084)

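Abstract: The k-means algorithm is widely recognized as one of the more efficient algorithms for clustering large data sets. However, it can only be applied to data objects described by numeric attributes, known as numeric data. It cannot be applied to real-world data objects described by other kinds of attributes, such as color, texture, and shape, known as categorical data. To cluster categorical data, k-means has been extended into two new algorithms: k-modes and k-prototypes. Both, however, require the user to specify the number of clusters k, a threshold t, and the cluster centers Q in advance, and choosing these three parameters accurately without knowing how the data are distributed is difficult. The improved k-modes algorithm effectively solves this problem.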
Keywords: clustering, k-means, k-modes, k-prototypes, dissimilarity
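The abstract describes k-modes only as an extension of k-means to categorical data, and the paper's own improved variant is cut off in this post. A minimal sketch of the basic k-modes loop, assuming simple matching dissimilarity (the count of attributes on which two records differ) and user-supplied initial modes, shows why k and the initial centers Q must be chosen up front. The threshold t is not modelled here, and the function names and sample records are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]

def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity (assumed): number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def column_modes(records: List[Record]) -> Record:
    """Most frequent value of each attribute (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(
    data: List[Record], initial_modes: List[Record], max_iter: int = 100
) -> Tuple[List[int], List[Record]]:
    """Basic k-modes sketch; k is len(initial_modes), and the initial modes are user-supplied."""
    modes = [tuple(m) for m in initial_modes]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest mode (ties go to the lower index).
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:  # no record changed cluster, so stop
            break
        labels = new_labels
        # Update step: replace each mode by the attribute-wise mode of its members.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    return labels, modes

# Usage with made-up color/texture/shape records
data = [
    ("red", "smooth", "round"),
    ("red", "smooth", "square"),
    ("red", "rough", "round"),
    ("blue", "rough", "square"),
    ("blue", "rough", "triangle"),
    ("blue", "smooth", "square"),
]
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'smooth', 'round'), ('blue', 'rough', 'square')]
```

Picking a different pair of initial modes can change the result, which is the sensitivity to k and Q that the abstract points out.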
Distance-based Partition Clustering Algorithm
Ye Ruofen, Li Chunping
(School of Software, Tsinghua University, Beijing 100084, China)

Abstract: The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used to cluster real-world data containing categorical values, such as data whose attributes are color, texture, and shape. To cluster categorical values, the k-modes and k-prototypes algorithms were proposed. Both, however, require the user to predefine the number of clusters, the cluster centers, and an initial threshold, and it is difficult to judge the number of clusters and the initial threshold without understanding the distribution of the original data. This paper addresses the issue with an improved k-modes algorithm.
Key words: Cluster, k-means
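The abstract names k-prototypes as the second extension but does not define it. As a minimal sketch, assuming the mixed dissimilarity usually given for k-prototypes (squared Euclidean distance on the numeric attributes plus a weight gamma times the number of categorical mismatches) and a prototype built from numeric means and categorical modes, the two core pieces could look like this. The function names and the gamma value are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def prototype_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    q_num: Sequence[float],
    q_cat: Sequence[str],
    gamma: float,
) -> float:
    """Mixed dissimilarity between record X and prototype Q (assumed k-prototypes form)."""
    # Numeric part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, q_num))
    # Categorical part: number of attributes whose values differ.
    categorical_part = sum(a != b for a, b in zip(x_cat, q_cat))
    # gamma (hypothetical value chosen by the user) balances the two parts.
    return numeric_part + gamma * categorical_part

def update_prototype(
    nums: List[Sequence[float]], cats: List[Sequence[str]]
) -> Tuple[Tuple[float, ...], Tuple[str, ...]]:
    """New prototype for a non-empty cluster: numeric means plus categorical modes."""
    means = tuple(sum(col) / len(col) for col in zip(*nums))
    modes = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cats))
    return means, modes

# Usage with made-up values: 1^2 + 2^2 + 0.5 * 1 mismatch
d = prototype_dissimilarity((1.0, 2.0), ("red", "round"), (0.0, 0.0), ("red", "square"), gamma=0.5)
print(d)  # 5.5
```

A large gamma lets the categorical attributes dominate the assignment, while a small one makes the algorithm behave almost like plain k-means on the numeric attributes.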
