Weaknesses of k-means clustering

likelet

Reposted from http://www.cnblogs.com/emanlee/archive/2012/03/06/2381617.html

Like other algorithms, k-means clustering has several weaknesses:

1 When the data set is small, the initial grouping largely determines the resulting clusters.
2 The number of clusters, K, must be chosen beforehand.
3 The true clusters are never known. With a small data set, feeding the same data in a different order may produce different clusters.
4 The algorithm is sensitive to initial conditions: different initial conditions may produce different clusterings, and the algorithm may become trapped in a local optimum.
5 Every attribute is assumed to carry the same weight, so we cannot tell which attribute contributes more to the grouping.
6 The arithmetic mean is not robust to outliers: data points far from a centroid may pull that centroid away from its true position.
7 Because assignment is based on distance to a centroid, the resulting clusters tend to be circular (spherical) in shape.
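Weaknesses 1 and 4 can be seen on a toy data set. The sketch below is a minimal plain-Python version of the usual assign-then-average loop, not a reference implementation; the function names `squared_distance` and `kmeans`, the four points, and both sets of starting centroids are assumptions made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Run the assign/average loop from the given (hypothetical) starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the coordinate-wise arithmetic mean.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, sse


# Four corners of a 4 x 1 rectangle; the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_labels, _, good_sse = kmeans(points, [(0, 0), (4, 0)])
bad_labels, _, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print(good_labels, good_sse)  # left/right split, small within-cluster error
print(bad_labels, bad_sse)    # bottom/top split, stuck in a local optimum
```

Both runs converge, but the second one stops at a bottom/top split whose within-cluster squared error is much larger than that of the left/right split. A common remedy is to run the algorithm from several starting points and keep the result with the lowest error.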

One way to mitigate these weaknesses is to use k-means clustering only when plenty of data is available. To reduce the influence of outliers, the median can be used in place of the mean.
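The effect of replacing the mean with the median can be checked on a single cluster. The sketch below uses made-up points and hypothetical helper names (`mean_center`, `median_center`); it only compares the two kinds of center and is not a full clustering routine.

```python
from statistics import mean, median


def mean_center(points):
    """Coordinate-wise arithmetic mean, as used by k-means."""
    return tuple(mean(coords) for coords in zip(*points))


def median_center(points):
    """Coordinate-wise median, as used by median-based variants."""
    return tuple(median(coords) for coords in zip(*points))


# Four points close together plus one far-away outlier (illustrative data).
cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (100, 100)]

center_by_mean = mean_center(cluster)
center_by_median = median_center(cluster)

print(center_by_mean)    # dragged toward the outlier
print(center_by_median)  # stays inside the dense group
```

The single outlier drags the mean far from the four nearby points, while the coordinate-wise median stays among them.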

Some people claim that k-means clustering cannot be used for data other than quantitative data. This is not true! See how you can use multivariate data up to n dimensions (even mixed data types) here. The key to using other types of dissimilarity lies in the distance matrix.
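As a rough illustration of a distance matrix over mixed data, the sketch below uses a Gower-style dissimilarity: range-normalized absolute difference for numeric fields and a 0/1 mismatch for categorical fields. The source does not say which dissimilarity it has in mind, so this choice, the function name `gower_matrix`, and the sample records are all assumptions.

```python
def gower_matrix(records, numeric_keys, categorical_keys):
    """Pairwise Gower-style dissimilarities for records with mixed field types."""
    # Range of each numeric field, used to scale differences into [0, 1].
    ranges = {}
    for key in numeric_keys:
        values = [r[key] for r in records]
        ranges[key] = (max(values) - min(values)) or 1.0  # avoid dividing by zero

    n_features = len(numeric_keys) + len(categorical_keys)

    def dissimilarity(a, b):
        total = 0.0
        for key in numeric_keys:
            total += abs(a[key] - b[key]) / ranges[key]
        for key in categorical_keys:
            total += 0.0 if a[key] == b[key] else 1.0
        return total / n_features

    return [[dissimilarity(a, b) for b in records] for a in records]


# Hypothetical records with one numeric and one categorical attribute.
records = [
    {"height_cm": 150, "color": "red"},
    {"height_cm": 160, "color": "red"},
    {"height_cm": 190, "color": "blue"},
]

matrix = gower_matrix(records, ["height_cm"], ["color"])
for row in matrix:
    print(row)
```

A matrix like this can feed any clustering procedure that needs only pairwise dissimilarities. Because an arithmetic mean is not defined for categorical fields, such procedures usually represent each cluster by one of its own members (a medoid) rather than by a computed mean.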
