I. Objective
Implement the K-means algorithm in Python.
II. Principle
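(1) Choose K initial cluster centers (for example, at random).
(2) For each sample point, compute its distance to each of the K cluster centers and assign the point to the cluster whose center is nearest.
(3) In each iteration, update the center (centroid) of every cluster, typically as the mean of the points assigned to it.
(4) Repeat steps (2) and (3). Stop when the K centers move less than a chosen threshold after an update, meaning the clustering has become stable, or when a maximum number of iterations n is reached.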
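The four steps above can be sketched as one vectorized function for general k. This is a minimal illustration, not the report's own code. The function name `kmeans` and its parameters `max_iter`, `tol` and `seed` are hypothetical. Two further choices are assumptions. Initial centers are k distinct data points drawn at random. A cluster that becomes empty keeps its previous center.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=None):
    """Sketch of K-means (Lloyd iterations) on an (n, d) array X.

    Hypothetical helper; returns (centers, labels) with shapes (k, d) and (n,).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (1): pick k distinct data points at random as initial centers (assumed strategy)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Step (2): squared distance from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # index of the nearest center for each point
        # Step (3): move each center to the mean of its points;
        # an empty cluster keeps its previous center (one possible choice)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        # Step (4): stop once no center moved more than tol
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < tol:
            break
    # Final assignment against the final centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

You could apply it to the dataset of section IV as `kmeans(Data, 2, seed=0)`. Because the initial centers are random, different seeds may give different results on harder data.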
III. Python Packages
(1) numpy
(2) matplotlib (for plotting)
IV. Experiment
The dataset is as follows:
[3.13257748 4.08653576]
[2.8486827 4.48815431]
[3.40882487 4.14138275]
[3.06977634 4.31563331]
[3.14381702 4.10147438]
[2.67195731 3.6464033 ]
[2.53242806 4.40165829]
[3.43557873 3.70279658]
[2.62401582 3.54597948]
[3.10216656 4.19867393]
[3.77207532 2.58221923]
[3.92348801 2.72714337]
[4.2845745 3.42431606]
[3.80646856 2.73666636]
[3.9872807 3.13824138]
[4.09143306 3.39424484]
[4.31901806 3.08375654]
[3.58912334 2.91815208]
[4.09898341 3.00657741]
[4.2702863 3.20399911]
Cluster these points with the K-means algorithm, using k = 2.
Code:
import numpy as np
import matplotlib.pyplot as plt
data=[[3.13257748,4.08653576],[2.8486827,4.48815431],
[3.40882487,4.14138275],[3.06977634,4.31563331],
[3.14381702,4.10147438],[2.67195731,3.6464033 ],
[2.53242806,4.40165829],[3.43557873,3.70279658],
[2.62401582,3.54597948], [3.10216656,4.19867393],
[3.77207532,2.58221923],[3.92348801,2.72714337],
[4.2845745,3.42431606],[3.80646856,2.73666636],
[3.9872807,3.13824138],[4.09143306,3.39424484],
[4.31901806,3.08375654],[3.58912334,2.91815208],
[4.09898341,3.00657741],[4.2702863,3.20399911]]
Data=np.array(data)
plt.scatter(Data[:,0], Data[:,1],color = 'green', s = 200)
plt.show()
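# Initial centers of the two clusters, chosen by hand
new_x1 = np.array([3.5, 3.5])
new_x2 = np.array([4.0, 2.5])
tol = 1e-6  # convergence threshold on how far a center may still move

for it in range(100):  # at most 100 iterations
    temp1 = []  # points assigned to center 1
    temp2 = []  # points assigned to center 2
    # Assignment: each point joins the cluster of its nearest center
    for point in Data:
        dis_1 = np.sum((point - new_x1) ** 2)  # squared distance to center 1
        dis_2 = np.sum((point - new_x2) ** 2)  # squared distance to center 2
        if dis_1 <= dis_2:
            temp1.append(point)
        else:
            temp2.append(point)
    temp1 = np.array(temp1)
    temp2 = np.array(temp2)
    # Update: move each center to the mean of its cluster
    # (a cluster that ends up empty keeps its old center)
    old_x1, old_x2 = new_x1, new_x2
    if len(temp1) > 0:
        new_x1 = temp1.mean(axis=0)
    if len(temp2) > 0:
        new_x2 = temp2.mean(axis=0)
    # Stop when neither center moved more than tol (step (4) of the principle)
    if (np.linalg.norm(new_x1 - old_x1) < tol
            and np.linalg.norm(new_x2 - old_x2) < tol):
        break

# Plot the final clusters: cluster 1 in green, cluster 2 in red
plt.scatter(temp1[:, 0], temp1[:, 1], color='green', s=200)
plt.scatter(temp2[:, 0], temp2[:, 1], color='red', s=200)
plt.show()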
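To compare clusterings, for example runs started from different initial centers, you can compute the within-cluster sum of squared distances. Neither the assignment step nor the mean-update step of K-means increases this quantity. The sketch below is an addition, not part of the original report, and the function name `within_cluster_sse` is hypothetical. For the result above, you could call it with `X = np.vstack([temp1, temp2])` and labels `[0] * len(temp1) + [1] * len(temp2)`. The centers would be `[new_x1, new_x2]`.

```python
import numpy as np

def within_cluster_sse(X, labels, centers):
    """Hypothetical helper: sum of squared Euclidean distances from each point
    to the center of the cluster it is assigned to."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    diffs = X - centers[labels]  # row i minus the center of cluster labels[i]
    return float(np.sum(diffs ** 2))

# Toy example: two pairs of points on a line, each center at its pair's midpoint
X = [[0, 0], [2, 0], [10, 0], [12, 0]]
print(within_cluster_sse(X, [0, 0, 1, 1], [[1, 0], [11, 0]]))  # 4.0
```

For a fixed k, a lower value means tighter clusters, so it gives a simple way to pick the better of several runs.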