Clustering watermelon density and sugar content with k-means
Part 1: Dataset
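X is a two-dimensional data matrix (a list of lists) holding the density and sugar content of watermelons.
It has 30 rows with two values per row, one row per watermelon:
the first column is density (x1);
the second column is sugar content (x2).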
from sklearn.cluster import KMeans  # import the KMeans clustering class from scikit-learn's sklearn.cluster module
X=[
[0.697,0.460],[0.774,0.376],[0.634,0.264],[0.608,0.318],[0.556,0.215],
[0.403,0.237],[0.481,0.149],[0.437,0.211],[0.666,0.091],[0.243,0.267],
[0.245,0.057],[0.343,0.099],[0.639,0.161],[0.657,0.198],[0.360,0.370],
[0.593,0.042],[0.719,0.103],[0.359,0.188],[0.339,0.241],[0.282,0.257],
[0.748,0.232],[0.714,0.346],[0.483,0.312],[0.478,0.437],[0.525,0.369],
[0.751,0.489],[0.532,0.472],[0.473,0.376],[0.725,0.445],[0.446,0.459]
]
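A quick sanity check of the data layout, written as a plain-Python sketch (the names n_rows, row_lengths, density and sugar are illustrative and not part of the original code). It confirms that X is 30 × 2 and prints the range of each feature; both features span a comparable interval, roughly 0.04 to 0.8, so neither one dominates the Euclidean distances that k-means relies on.

```python
# The 30 x 2 watermelon data from above: [density, sugar content]
X = [
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318], [0.556, 0.215],
    [0.403, 0.237], [0.481, 0.149], [0.437, 0.211], [0.666, 0.091], [0.243, 0.267],
    [0.245, 0.057], [0.343, 0.099], [0.639, 0.161], [0.657, 0.198], [0.360, 0.370],
    [0.593, 0.042], [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.748, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437], [0.525, 0.369],
    [0.751, 0.489], [0.532, 0.472], [0.473, 0.376], [0.725, 0.445], [0.446, 0.459],
]

n_rows = len(X)                        # number of watermelons
row_lengths = {len(row) for row in X}  # distinct row lengths; should be {2}
density, sugar = zip(*X)               # transpose: first column -> density, second -> sugar

print("rows:", n_rows, "values per row:", row_lengths)
print("density range:", min(density), "to", max(density))
print("sugar range:", min(sugar), "to", max(sugar))
```

zip(*X) transposes the rows into columns in one step, which is an alternative to the two list comprehensions used later for plotting.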
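Part 2: KMeans clustering
clf = KMeans(n_clusters=3) creates a KMeans model with 3 clusters and assigns it to clf; n_clusters sets how many groups the data are split into (the code below uses 2).
y_pred = clf.fit_predict(X) fits the model on the dataset X and stores the resulting cluster label of each sample in y_pred.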
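clf = KMeans(n_clusters=2)  # k-means model with n_clusters=2, i.e. group the data into 2 clusters
y_pred = clf.fit_predict(X)  # fit on X and return each sample's cluster label; clustering is unsupervised, so there is no separate predict step on new data
print('k-means model:\n', clf)  # print the KMeans estimator together with its parameters
print('Clustering result:\n', y_pred)  # 30 labels, one per row of X (one per watermelon), each 0 or 1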
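To illustrate the computation behind clf.fit_predict(X), here is a minimal pure-Python sketch of Lloyd's algorithm, the standard k-means iteration: assign every point to its nearest center, move each center to the mean of its assigned points, and repeat until the assignments stop changing. It is not scikit-learn's implementation: scikit-learn seeds the centers with k-means++ (init='k-means++') and keeps the best of n_init runs, whereas this sketch simply draws k data points as starting centers from a fixed seed. The helpers kmeans and squared_distance and the seed parameter are hypothetical names.

```python
import random

# The 30 x 2 watermelon data: [density, sugar content]
X = [
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318], [0.556, 0.215],
    [0.403, 0.237], [0.481, 0.149], [0.437, 0.211], [0.666, 0.091], [0.243, 0.267],
    [0.245, 0.057], [0.343, 0.099], [0.639, 0.161], [0.657, 0.198], [0.360, 0.370],
    [0.593, 0.042], [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.748, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437], [0.525, 0.369],
    [0.751, 0.489], [0.532, 0.472], [0.473, 0.376], [0.725, 0.445], [0.446, 0.459],
]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=300, seed=0):
    """Hypothetical helper (not a library API): Lloyd's algorithm, returns (labels, centers)."""
    rng = random.Random(seed)
    # Initialization: k distinct data points become the starting centers
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest center
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


labels, centers = kmeans(X, k=2)
print("labels:", labels)
print("centers:", [[round(c, 3) for c in center] for center in centers])
```

Because the starting centers differ from scikit-learn's, the label numbering (and, on harder data, the final partition) need not match the output shown further below.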
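Part 3: Visualization
Import Matplotlib, Python's plotting package.
In import matplotlib.pyplot as plt, the keyword as gives the module the short alias plt, which is then used to draw and display the figure.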
import matplotlib.pyplot as plt
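# Extract the first and second columns with list comprehensions; n[0] is a row's first value (density)
x1 = [n[0] for n in X]
x2 = [n[1] for n in X]
# Scatter plot: x1 on the horizontal axis, x2 on the vertical axis, colored by cluster label (c=y_pred)
plt.scatter(x1, x2, c=y_pred, marker='*')
# Set the title
plt.title("k-means Data")
# Label the x and y axes
plt.xlabel("x1")
plt.ylabel("x2")
# Display the figure
plt.show()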
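One common extension of this plot is to mark the cluster centers, which a fitted KMeans model stores in its cluster_centers_ attribute. The sketch below assumes scikit-learn and Matplotlib are installed. It fixes random_state=0 so repeated runs give the same labels, switches to the non-interactive Agg backend, and saves the figure to a file instead of calling plt.show(); the file name kmeans_watermelon.png is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# The 30 x 2 watermelon data: [density, sugar content]
X = [
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318], [0.556, 0.215],
    [0.403, 0.237], [0.481, 0.149], [0.437, 0.211], [0.666, 0.091], [0.243, 0.267],
    [0.245, 0.057], [0.343, 0.099], [0.639, 0.161], [0.657, 0.198], [0.360, 0.370],
    [0.593, 0.042], [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.748, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437], [0.525, 0.369],
    [0.751, 0.489], [0.532, 0.472], [0.473, 0.376], [0.725, 0.445], [0.446, 0.459],
]

# Fixed random_state makes the labels reproducible; n_init is set explicitly
clf = KMeans(n_clusters=2, n_init=10, random_state=0)
y_pred = clf.fit_predict(X)
centers = clf.cluster_centers_  # NumPy array of shape (n_clusters, n_features) = (2, 2)

x1 = [n[0] for n in X]
x2 = [n[1] for n in X]
plt.scatter(x1, x2, c=y_pred, marker='*')  # data points colored by cluster label
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100, label='cluster centers')
plt.title("k-means Data")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.savefig("kmeans_watermelon.png")  # write the figure to disk
```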
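Output (printed by an older scikit-learn release; newer releases print only the parameters that differ from their defaults, for example KMeans(n_clusters=2)). Because random_state=None, which group is labeled 0 and which is labeled 1 can change from run to run:
k-means model:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
n_clusters=2, n_init=10, n_jobs=None, precompute_distances='auto',
random_state=None, tol=0.0001, verbose=0)
Clustering result: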
[1 1 1 1 1 0 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 0 1 1 1 0 1 0]
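To interpret the clustering result, the printed labels can be paired back with the rows of X. This plain-Python sketch copies y_pred from the output above (another run may swap the two label numbers) and reports each cluster's size, density range and mean sugar content.

```python
from collections import Counter

# The 30 x 2 watermelon data: [density, sugar content]
X = [
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318], [0.556, 0.215],
    [0.403, 0.237], [0.481, 0.149], [0.437, 0.211], [0.666, 0.091], [0.243, 0.267],
    [0.245, 0.057], [0.343, 0.099], [0.639, 0.161], [0.657, 0.198], [0.360, 0.370],
    [0.593, 0.042], [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.748, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437], [0.525, 0.369],
    [0.751, 0.489], [0.532, 0.472], [0.473, 0.376], [0.725, 0.445], [0.446, 0.459],
]

# Labels copied from the printed clustering result above
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0,
          1, 1, 0, 0, 1, 1, 1, 0, 1, 0]

counts = Counter(y_pred)  # number of watermelons per cluster
for label in sorted(counts):
    members = [p for p, lab in zip(X, y_pred) if lab == label]
    dens = [p[0] for p in members]
    sug = [p[1] for p in members]
    print(f"cluster {label}: {len(members)} melons, "
          f"density {min(dens):.3f} to {max(dens):.3f}, "
          f"mean sugar {sum(sug) / len(sug):.3f}")
```

For the labels shown above, cluster 0 holds 14 melons with density 0.243 to 0.483 and cluster 1 holds 16 melons with density 0.525 to 0.774, so the two clusters do not overlap in density.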