Implementation
1 Set up the project and import the required sklearn packages
import numpy as np
from sklearn.cluster import KMeans
2 Load the data and create the algorithm instance
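def loadData(filePath):
    """Read city names and their numeric values from a comma-separated text file."""
    retData = []
    retCityName = []
    # Read-only access is enough; 'with' closes the file even if parsing fails
    with open(filePath, 'r') as fr:
        for line in fr:
            items = line.strip().split(',')
            retCityName.append(items[0])                    # first field: city name
            retData.append([float(x) for x in items[1:]])   # remaining fields: numeric values
    return retData, retCityName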
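if __name__ == '__main__':
    data, cityName = loadData('city.txt')
    n_clusters = 3
    km = KMeans(n_clusters=n_clusters)
    label = km.fit_predict(data)    # cluster index assigned to each city
    # Sum the components of each cluster center to get one expense figure per cluster
    expenses = np.sum(km.cluster_centers_, axis=1)
    # print(expenses)
    # One list of city names per cluster, sized from n_clusters rather than hard-coded
    CityCluster = [[] for _ in range(n_clusters)]
    for i in range(len(cityName)):
        CityCluster[label[i]].append(cityName[i])
    for i in range(len(CityCluster)):
        print("Expenses:%.2f" % expenses[i])
        print(CityCluster[i])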
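The loadData function above assumes that each line of the input file holds a city name followed by comma-separated numeric values. The sketch below parses a small in-memory sample the same way, so the expected layout is easy to see. The city names, the figures, and the helper name parse_lines are made up for illustration and are not the contents of city.txt.

```python
import io

# Hypothetical sample in the layout loadData expects:
# a name in the first field, numeric values in the remaining fields.
SAMPLE = "CityA,100.0,200.5,50.0\nCityB,80.0,150.0,40.5\n"

def parse_lines(lines):
    """Split each line into a name and a list of floats (hypothetical helper)."""
    names, rows = [], []
    for line in lines:
        items = line.strip().split(',')
        names.append(items[0])
        rows.append([float(x) for x in items[1:]])
    return rows, names

rows, names = parse_lines(io.StringIO(SAMPLE))
print(names)  # ['CityA', 'CityB']
print(rows)   # [[100.0, 200.5, 50.0], [80.0, 150.0, 40.5]]
```

Every row must have the same number of numeric fields, since KMeans expects a rectangular array of samples.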
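Parameters and names used when calling K-Means:
- n_clusters: the number of cluster centers
- init: the method used to initialize the cluster centers
- max_iter: the maximum number of iterations
- data: the loaded data, passed to fit_predict() (a variable in the script, not a KMeans parameter)
- label: the cluster label assigned to each sample, as returned by fit_predict()
- fit_predict(): computes the cluster centers and assigns each sample a cluster index
[Note]: scikit-learn's KMeans measures distance with the Euclidean metric; the class has no parameter for choosing another one.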
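As a rough illustration of the parameters listed above, the following sketch sets n_clusters, init, and max_iter explicitly and then checks that each label matches the nearest cluster center in Euclidean distance. The data is synthetic (three artificial 2-D blobs), and n_init and random_state are added here only to make the run repeatable; none of this comes from the city data set.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumption for illustration): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# n_clusters, init and max_iter are spelled out; the last two match the library defaults.
km = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Euclidean distance from every sample to every fitted cluster center.
dists = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# The label returned by fit_predict should be the index of the nearest center.
print(np.array_equal(labels, nearest))
```

On data this cleanly separated, the comparison is expected to agree for every sample, which is one way to see what the Euclidean-distance note means in practice.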