Suppose we have the following 8 points:
A1(2, 10) A2(2, 5) A3(8, 4) A4(5, 8) A5(7, 5) A6(6, 4) A7(1, 2) A8(4, 9)
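We want to partition them into 3 clusters.
As initial cluster centers we choose A1(2, 10), A4(5, 8), and A7(1, 2). The distance between two points a = (x1, y1) and b = (x2, y2) is defined as ρ(a, b) = |x2 - x1| + |y2 - y1| (the Manhattan distance).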
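The setup above can be written down as a minimal Python sketch. The names `points`, `initial_centers`, and `manhattan` are illustrative choices and do not come from the original example.

```python
# Data and distance from the example; all names here are illustrative.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

# Initial cluster centers: A1, A4 and A7.
initial_centers = [points["A1"], points["A4"], points["A7"]]


def manhattan(a, b):
    """rho(a, b) = |x2 - x1| + |y2 - y1| for 2-D points a and b."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)


print(manhattan(points["A1"], points["A4"]))  # 5
```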
Step 1
Iteration 1
|       |             | (2, 10)                    | (5, 8)                     | (1, 2)                     |         |
| Point | Coordinates | Dist. to mean of cluster 1 | Dist. to mean of cluster 2 | Dist. to mean of cluster 3 | Cluster |
| A1    | (2, 10)     |                            |                            |                            |         |
| A2    | (2, 5)      |                            |                            |                            |         |
| A3    | (8, 4)      |                            |                            |                            |         |
| A4    | (5, 8)      |                            |                            |                            |         |
| A5    | (7, 5)      |                            |                            |                            |         |
| A6    | (6, 4)      |                            |                            |                            |         |
| A7    | (1, 2)      |                            |                            |                            |         |
| A8    | (4, 9)      |                            |                            |                            |         |
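For point A1, compute its distance to the center of each cluster:
A1->cluster1 = |2-2|+|10-10| = 0
A1->cluster2 = |2-5|+|10-8| = 5
A1->cluster3 = |2-1|+|10-2| = 9
The smallest distance is to cluster 1, so A1 belongs to cluster 1.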
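The assignment of a single point can be sketched in a few lines of Python. The helper name `nearest_cluster` and the tie-breaking rule (the lowest-numbered cluster wins) are assumptions made for this illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def nearest_cluster(point, centers):
    """Return the distances to all centers and the 1-based closest cluster.

    Ties go to the lowest-numbered cluster (an assumption of this sketch).
    """
    distances = [manhattan(point, c) for c in centers]
    return distances, distances.index(min(distances)) + 1


centers = [(2, 10), (5, 8), (1, 2)]
distances, cluster = nearest_cluster((2, 10), centers)  # point A1
print(distances, cluster)  # [0, 5, 9] 1
```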
|       |             | (2, 10)                    | (5, 8)                     | (1, 2)                     |         |
| Point | Coordinates | Dist. to mean of cluster 1 | Dist. to mean of cluster 2 | Dist. to mean of cluster 3 | Cluster |
| A1    | (2, 10)     | 0                          | 5                          | 9                          | 1       |
| A2    | (2, 5)      |                            |                            |                            |         |
| A3    | (8, 4)      |                            |                            |                            |         |
| A4    | (5, 8)      |                            |                            |                            |         |
| A5    | (7, 5)      |                            |                            |                            |         |
| A6    | (6, 4)      |                            |                            |                            |         |
| A7    | (1, 2)      |                            |                            |                            |         |
| A8    | (4, 9)      |                            |                            |                            |         |
Repeat the same calculation for the remaining points until the table is complete:
|       |             | (2, 10)                    | (5, 8)                     | (1, 2)                     |         |
| Point | Coordinates | Dist. to mean of cluster 1 | Dist. to mean of cluster 2 | Dist. to mean of cluster 3 | Cluster |
| A1    | (2, 10)     | 0                          | 5                          | 9                          | 1       |
| A2    | (2, 5)      | 5                          | 6                          | 4                          | 3       |
| A3    | (8, 4)      | 12                         | 7                          | 9                          | 2       |
| A4    | (5, 8)      | 5                          | 0                          | 10                         | 2       |
| A5    | (7, 5)      | 10                         | 5                          | 9                          | 2       |
| A6    | (6, 4)      | 10                         | 5                          | 7                          | 2       |
| A7    | (1, 2)      | 9                          | 10                         | 0                          | 3       |
| A8    | (4, 9)      | 3                          | 2                          | 10                         | 2       |
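The completed table can be reproduced with a short loop. This is only a sketch: `assignment_table` is a hypothetical helper name, and ties (none occur here) would go to the lowest-numbered cluster.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7


def assignment_table(points, centers):
    """Build one row per point: (name, coordinates, distances, cluster)."""
    rows = []
    for name, p in points.items():
        d = [manhattan(p, c) for c in centers]
        rows.append((name, p, d, d.index(min(d)) + 1))  # 1-based label
    return rows


rows = assignment_table(points, centers)
for name, p, d, cluster in rows:
    print(name, p, d, cluster)
```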
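Recompute the cluster centers
Cluster 1 contains only one point, so its center remains A1 (2, 10).
The center of cluster 2 is ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6). Note: this center is not one of the actual data points.
The center of cluster 3 is ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5).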
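The center update is a coordinate-wise mean over the members of each cluster. A minimal sketch, with the cluster memberships copied from the first-iteration table and `mean_point` as an illustrative helper name:

```python
# Cluster memberships after iteration 1 (taken from the table above).
clusters = {
    1: [(2, 10)],                                 # A1
    2: [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],  # A3, A4, A5, A6, A8
    3: [(2, 5), (1, 2)],                          # A2, A7
}


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


new_centers = {k: mean_point(members) for k, members in clusters.items()}
print(new_centers)  # {1: (2.0, 10.0), 2: (6.0, 6.0), 3: (1.5, 3.5)}
```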
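[Figure: graphical illustration of the clustering process]
Keep iterating until the cluster assignments no longer change between two consecutive iterations.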
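Putting the two steps together, the whole procedure might look like the sketch below. It follows the example's combination of Manhattan distance for assignment and the mean for the center update. The function names, the `max_iter` safeguard, and keeping the old center for an empty cluster are assumptions of this sketch, not part of the original example.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def nearest(point, centers):
    """1-based label of the closest center; ties go to the lowest label."""
    d = [manhattan(point, c) for c in centers]
    return d.index(min(d)) + 1


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and center update until assignments stop changing."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(p, centers) for p in points]
        if new_assignment == assignment:
            break  # same assignments twice in a row: converged
        assignment = new_assignment
        new_centers = []
        for k, old in enumerate(centers, start=1):
            members = [p for p, a in zip(points, assignment) if a == k]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else old)
        centers = new_centers
    return assignment, centers


points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
assignment, centers = kmeans(points, [(2, 10), (5, 8), (1, 2)])
print(assignment)  # cluster label for A1..A8
print(centers)     # final cluster centers
```

Comparing assignments rather than center coordinates avoids any floating-point tolerance in the stopping test.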
Source of the example above: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&frm=1&source=web&cd=1&ved=0CDsQFjAA&url=http%3A%2F%2Ffaculty.uscupstate.edu%2Fatzacheva%2FSHIM450%2FKMeansExample.doc&ei=ZDMVT56XJOmoiQLeyLm9DQ&usg=AFQjCNHMUw4sLHM82Pu6cXc2DTSz-cz2pw
from: http://blog.csdn.net/pennyliang/article/details/7207466