I. K-means clustering is an unsupervised learning algorithm and uses Euclidean distance by default. from sklearn.cluster import KMeans, k_means
1. cluster.KMeans([n_clusters, init, n_init, ...]) — the K-means clustering estimator
2. cluster.k_means(X, n_clusters[, init, ...]) — the K-means clustering function
(1) Judging from their purpose and source code, the two serve different goals:
2: k_means is the K-means algorithm as a plain function: it simply partitions a dataset into k clusters in a single call (i.e., it is a direct implementation of the K-means algorithm), so the data X is passed in directly as k_means(X, n_clusters[, init, ...]).
1: KMeans is the K-means estimator: you first fix the number of clusters with KMeans([n_clusters, init, n_init, ...]), and the class then provides the methods fit (compute the k-means clustering) and predict (assign each sample in X to its closest cluster). In other words, it wraps the algorithm into a reusable unsupervised machine learning model.
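To make the contrast concrete, here is a minimal sketch of the estimator-style usage on a tiny synthetic dataset (the points, `n_init`, and `random_state` values below are illustrative choices, not from the original post):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 2-D points (made-up illustrative data)
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])

model = KMeans(n_clusters=2, n_init=10, random_state=0)  # fix k first
model.fit(X)  # compute the k-means clustering on X

# predict assigns new samples to the nearest learned cluster center
new_points = np.array([[0.1, 0.0], [5.0, 5.1]])
labels = model.predict(new_points)
print(labels[0] != labels[1])  # True: the two points land in different clusters
```

Because the fitted model is kept around, `predict` can be called repeatedly on new data — this is exactly what turns the bare algorithm into a model.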
II. Examples
The dataset is:
1.
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Load the two-feature dataset
data = pd.read_csv("C:/Users/CWY/Desktop/deeplearn/Personalized-recommend-master/test/three_class_data.csv")
x = data[["x", "y"]]

# Fit a 3-cluster K-means model
model = KMeans(n_clusters=3)
model.fit(x)

# Build a dense grid covering the data range, then predict the cluster of
# every grid point to shade the decision regions
x_min, x_max = data['x'].min() - 1, data['x'].max() + 1
y_min, y_max = data['y'].min() - 1, data['y'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .01), np.arange(y_min, y_max, .01))
result = model.predict(np.c_[xx.ravel(), yy.ravel()])
result = result.reshape(xx.shape)
plt.contourf(xx, yy, result, cmap=plt.cm.Greens)

# Plot the samples colored by their assigned cluster, then mark the centroids
plt.scatter(data['x'], data['y'], c=model.labels_, s=15)
center = model.cluster_centers_
plt.scatter(center[:, 0], center[:, 1], marker='p', linewidths=2, color='b', edgecolors='w', zorder=20)
plt.show()
2.
from sklearn.cluster import k_means
from matplotlib import pyplot as plt
import pandas as pd

# Load the same two-feature dataset
data = pd.read_csv("C:/Users/CWY/Desktop/deeplearn/Personalized-recommend-master/test/three_class_data.csv")
x = data[["x", "y"]]

# k_means returns a tuple: (cluster centers, labels, inertia)
model = k_means(x, n_clusters=3)

# model[1] holds the cluster label of each sample
plt.scatter(data['x'], data['y'], c=model[1])
plt.show()
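Since k_means is a plain function, its return tuple can also be unpacked directly instead of being indexed. A minimal sketch on made-up points (the data, `n_init`, and `random_state` are illustrative, not the CSV above):

```python
import numpy as np
from sklearn.cluster import k_means

# Tiny illustrative dataset with two well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [8.0, 8.0], [8.2, 7.9]])

# One call returns everything: (cluster centers, labels, inertia)
centers, labels, inertia = k_means(X, n_clusters=2, n_init=10, random_state=0)

print(centers.shape)  # (2, 2): one 2-D center per cluster
print(labels)         # one cluster index per sample
# inertia is the sum of squared distances to each sample's nearest center
print(inertia >= 0.0)
```

Unpacking as `centers, labels, inertia` is clearer than `model[0]`, `model[1]`, `model[2]`, and there is no fitted object left over — which is the point: use k_means for a one-off clustering, KMeans when the model must be reused.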