The differences between k-means, k-medoids, k-median and k-center

冬日and暖阳


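k-means, k-medoids, k-median and k-center: don't get dizzy just yet. All four are clustering methods, and the differences between them are explained one by one below.

k-means is the best known of the four. Because it is simple and effective, it is usually the first choice for clustering.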
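N data items -> k clusters.
Each cluster has a centroid, which is the mean (average) of the items assigned to it.
Objective: minimize the sum of squared distances from each item to its nearest centroid.
The most common and simplest way to solve it is Lloyd's algorithm, an EM-style iteration that alternates between assigning each item to its nearest centroid and recomputing each centroid as a mean. It converges to a local optimum, not necessarily the global one.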
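The alternating k-means iteration can be sketched in plain Python. This is only a minimal illustration under some assumptions: the function name `kmeans`, the initialization from k random data points, the `n_iter` cap and the way empty clusters are handled are choices made for this example, not something the notes prescribe.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Alternate between nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not move
        labels = new_labels
        # Update step: each centroid becomes the coordinate-wise mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Calling `kmeans(points, 2)` on a list of coordinate tuples returns the centroids and one cluster label per point. The centroids are averages, so they generally do not coincide with any data item.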
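k-medoids,
N data items -> k clusters.
Each cluster has a medoid, which is an actual item from the data set (not an average!).
Objective: minimize the sum of distances (more generally, dissimilarities) from each item to its nearest medoid. In the usual PAM formulation the distance is not squared, and any pairwise dissimilarity measure can be used.
Main algorithms: PAM, CLARA, CLARANS, and an EM-style alternating iteration (like k-means).
PAM: greedy search over medoid/non-medoid swaps; thorough, but it finds only a local optimum and is very slow on large data sets.
CLARA: runs PAM on random samples of the data; efficient, but not globally optimal.
CLARANS: randomized search over swaps; generally reported to give better results than CLARA.
EM-style iteration: very fast, but not globally optimal.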
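Of the k-medoids variants listed, the EM-style one is the easiest to sketch. The code below is a hedged illustration, not PAM, CLARA or CLARANS: the function name `kmedoids`, the random initialization and the default Euclidean distance (`math.dist`) are assumptions for this example, and any other dissimilarity function can be passed as `dist`.

```python
import math
import random


def kmedoids(points, k, dist=math.dist, n_iter=100, seed=0):
    """EM-style k-medoids: medoids are stored as indices into `points`."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # assumption: random start

    def assign(meds):
        # Each point is labelled with the position of its nearest medoid.
        return [min(range(k), key=lambda j: dist(p, points[meds[j]]))
                for p in points]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # can happen with duplicate points; keep old medoid
                new_medoids.append(medoids[j])
                continue
            # The new medoid is the member with the smallest total distance
            # to the other members, so it is always a real data item.
            new_medoids.append(
                min(members,
                    key=lambda i: sum(dist(points[i], points[m]) for m in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

The only difference from the k-means loop is the update step: instead of averaging, it searches the cluster's own members, which is why the method works with any dissimilarity but costs more per iteration.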
k-median,
N data items -> k clusters.
Each cluster has a median as its center (a median, not a mean or a medoid!).
Objective: minimize the sum of distances from each item to its nearest median (sum of distances, not sum of squared distances!).
k-center,
N data items -> k clusters.
Each cluster has a cluster center.
Objective: minimize the maximum distance from any item to its nearest cluster center (maximum distance, not sum of distances!).
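The notes give only the k-center objective, not an algorithm. One well-known heuristic for it is greedy farthest-first traversal (Gonzalez's algorithm), which is a 2-approximation in metric spaces. The sketch below is an illustration under assumptions: the function name `greedy_k_center`, the `first` parameter for the starting point and the restriction of centers to data items are choices made for this example.

```python
import math


def greedy_k_center(points, k, dist=math.dist, first=0):
    """Farthest-first traversal: returns center indices and the radius."""
    centers = [first]  # assumption: the caller picks the first center
    # nearest[i] = distance from point i to its closest chosen center so far
    nearest = [dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point that is currently worst served.
        far = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(far)
        nearest = [min(d, dist(p, points[far])) for d, p in zip(nearest, points)]
    # The k-center cost is the largest remaining distance.
    return centers, max(nearest)
```

Because the objective is a maximum rather than a sum, a single far-away item is enough to pull a center toward it, which makes k-center far more sensitive to outliers than k-median.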
According to (Bradley NIPS1997),
the k-median problem is to assign n points in m-dimensional real space to k clusters so that the sum of the distances from each point to its nearest center is minimized. A center is a vector in m-dimensional real space and need not be one of the n points. The center of a cluster is computed iteratively as the median vector (the coordinate-wise median) of all points in that cluster.
The k-median algorithm uses the same alternating strategy as k-means to update the centers, but it uses the 1-norm distance.
In contrast, the k-means algorithm uses squared 2-norm distances to generate cluster centers.
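The k-median iteration with the 1-norm can be sketched as follows. It is a minimal illustration rather than the algorithm exactly as published: the names `l1` and `kmedian`, the random initialization and the empty-cluster handling are assumptions for this example.

```python
import random
import statistics


def l1(p, q):
    """1-norm (Manhattan) distance between two coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmedian(points, k, n_iter=100, seed=0):
    """Same alternation as k-means, with 1-norm distance and median update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from random data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center under the 1-norm.
        new_labels = [min(range(k), key=lambda j: l1(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: the coordinate-wise median minimizes the summed
        # 1-norm distance to the members of the cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(statistics.median(c) for c in zip(*members))
    return centers, labels
```

Since the median is taken separately in each coordinate, the resulting center can be a vector that matches no data item, which is what distinguishes a median from a medoid.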
According to (Arya STOC2001), the k-median problem is to minimize the average distance from data points to their closest cluster centers. The k-center problem is to minimize the maximum distance from data points to their closest cluster centers, which makes it the min-max analogue of the k-median problem.
In a general metric space, the k-median problem is known to be NP-hard. Approximation algorithms for it have been widely studied (Arya STOC2001, Guha JCSS2002).
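To make the contrast between the objectives concrete, the small helper below evaluates all three costs for one fixed set of centers. The function name `clustering_costs`, the dictionary keys and the use of Euclidean distance for every objective are assumptions for this illustration.

```python
import math


def clustering_costs(points, centers, dist=math.dist):
    """Evaluate the three objectives for the same points and centers."""
    # Distance from each point to its closest center.
    nearest = [min(dist(p, c) for c in centers) for p in points]
    return {
        "sum_squared": sum(d * d for d in nearest),  # k-means style
        "sum": sum(nearest),                         # k-median / k-medoids style
        "max": max(nearest),                         # k-center style
    }
```

The same clustering can score well on one objective and badly on another: a few distant items barely change the sum but fully determine the maximum.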

Reposted from: http://blog.sina.com.cn/s/blog_68db53590100nttp.html
