I. Importing the data
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions

# 30 coaches. Columns: 教练编号 = coach ID, 科目二 = Subject 2 pass rate, 科目三 = Subject 3 pass rate
data = pd.DataFrame({
    '教练编号': range(1, 31),
    '科目二': [0.87,0.97,0.79,0.76,0.7,0.5,0.6,0.55,0.83,0.3,0.31,0.43,0.8,0.82,0.45,0.74,0.9,0.45,0.42,0.35,0.94,0.89,0.6,0.6,0.66,0.94,0.67,0.59,0.91,0.56],
    '科目三': [0.95,0.85,0.71,0.77,0.64,0.67,0.56,0.64,0.49,0.71,0.45,0.5,0.58,0.62,0.84,0.43,0.5,0.61,0.68,0.7,0.67,0.81,0.77,0.92,0.84,0.99,0.97,0.85,0.93,0.95],
})
II. Plotting
1. Coach ID : Subject 2
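# Feature matrix: coach ID and Subject 2 pass rate
X = data[['教练编号', '科目二']].values
# Fixing n_init and random_state makes the cluster assignment reproducible
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
# Scatter plot with one color per cluster
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.xlabel('Coach ID')
plt.ylabel('Subject 2 pass rate')
plt.show()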
2. Coach ID : Subject 3
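# Feature matrix: coach ID and Subject 3 pass rate
X = data[['教练编号', '科目三']].values
# Fixing n_init and random_state makes the cluster assignment reproducible
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
# Scatter plot with one color per cluster
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.xlabel('Coach ID')
plt.ylabel('Subject 3 pass rate')
plt.show()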
3. Coach ID : Subject 2 : Subject 3
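# Feature matrix: coach ID, Subject 2 pass rate, Subject 3 pass rate
X = data[['教练编号', '科目二', '科目三']].values
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
# 3D scatter plot with one color per cluster
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels)
ax.set_xlabel('Coach ID')
ax.set_ylabel('Subject 2 pass rate')
ax.set_zlabel('Subject 3 pass rate')
plt.show()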
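One thing to keep in mind with the three fits above: coach ID runs from 1 to 30 while the pass rates lie between 0 and 1, so the Euclidean distance that k-means minimizes is dominated by the ID axis. Below is a minimal sketch of an alternative, assuming the goal is to group coaches by pass rate rather than by ID: it drops the ID column and standardizes the two rates before clustering. The use of `StandardScaler`, the variable names `subject2` and `subject3`, and keeping k=4 are assumptions made for illustration, not part of the analysis above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same pass rates as in the data above, one entry per coach (IDs 1 to 30)
subject2 = [0.87, 0.97, 0.79, 0.76, 0.7, 0.5, 0.6, 0.55, 0.83, 0.3,
            0.31, 0.43, 0.8, 0.82, 0.45, 0.74, 0.9, 0.45, 0.42, 0.35,
            0.94, 0.89, 0.6, 0.6, 0.66, 0.94, 0.67, 0.59, 0.91, 0.56]
subject3 = [0.95, 0.85, 0.71, 0.77, 0.64, 0.67, 0.56, 0.64, 0.49, 0.71,
            0.45, 0.5, 0.58, 0.62, 0.84, 0.43, 0.5, 0.61, 0.68, 0.7,
            0.67, 0.81, 0.77, 0.92, 0.84, 0.99, 0.97, 0.85, 0.93, 0.95]
rates = np.column_stack([subject2, subject3])

# Standardize each column to zero mean and unit variance so both rates weigh equally
scaled = StandardScaler().fit_transform(rates)

# k=4 is carried over from the fits above; random_state is fixed for reproducibility
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Per cluster: label, number of coaches, mean (Subject 2, Subject 3) pass rate in original units
for k in range(4):
    members = rates[labels == k]
    print(k, len(members), members.mean(axis=0).round(2))
```

With this feature choice, the clusters describe pass-rate profiles (for example, high on both subjects versus low on both) instead of ranges of coach IDs.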
III. Analysis (personal, subjective)
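1. From this model, four clusters looks like a reasonable choice. The clustering quality is only mediocre, though: the clusters are not very compact, the separation between them is small, and there are a few outliers.
2. In the scatter plots, the points group noticeably along the coach ID axis over the intervals 0-5, 5-15, 15-25, and 25-30. The plots show the Subject 2 and Subject 3 pass rates of each coach's students, which makes it easier to choose a coach next time.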
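The impressions in point 1 (whether four clusters is reasonable, how compact the clusters are, and how well separated) can also be checked numerically. Below is a minimal sketch, assuming the same three-column feature matrix as in plot 3: for each candidate k it reports the within-cluster sum of squares (`inertia_`, lower means more compact) and the mean silhouette coefficient (between -1 and 1, higher means more compact and better separated). The range of k values tried, 2 to 7, is an arbitrary choice for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Same data as above: 教练编号 = coach ID, 科目二 / 科目三 = Subject 2 / Subject 3 pass rate
data = pd.DataFrame({
    '教练编号': range(1, 31),
    '科目二': [0.87, 0.97, 0.79, 0.76, 0.7, 0.5, 0.6, 0.55, 0.83, 0.3,
            0.31, 0.43, 0.8, 0.82, 0.45, 0.74, 0.9, 0.45, 0.42, 0.35,
            0.94, 0.89, 0.6, 0.6, 0.66, 0.94, 0.67, 0.59, 0.91, 0.56],
    '科目三': [0.95, 0.85, 0.71, 0.77, 0.64, 0.67, 0.56, 0.64, 0.49, 0.71,
            0.45, 0.5, 0.58, 0.62, 0.84, 0.43, 0.5, 0.61, 0.68, 0.7,
            0.67, 0.81, 0.77, 0.92, 0.84, 0.99, 0.97, 0.85, 0.93, 0.95],
})
X = data[['教练编号', '科目二', '科目三']].values

# Map each k to (inertia, mean silhouette coefficient)
scores = {}
for k in range(2, 8):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    scores[k] = (kmeans.inertia_, silhouette_score(X, labels))
    print(f"k={k}  inertia={scores[k][0]:.1f}  silhouette={scores[k][1]:.3f}")
```

If k=4 is a good choice, its silhouette value should be at or near the highest in the table, and the inertia should stop dropping sharply after k=4.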