1. Clustering data with KMeans
# Generate 500 two-dimensional points grouped around 3 centers
from sklearn.datasets import make_blobs
blobs, classes = make_blobs(500, centers=3)

# Fit KMeans with the same number of clusters
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(blobs)

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

# Color each point by its true class and mark the fitted centers with stars
f, ax = plt.subplots(figsize=(7.5, 7.5))
rgb = np.array(['r', 'g', 'b'])
ax.scatter(blobs[:, 0], blobs[:, 1], color=rgb[classes])
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           marker='*', s=250, color='black', label='Centers')
ax.set_title('Blobs')
ax.legend()  # display the 'Centers' label
f.show()
The labels_ attribute holds the predicted cluster label for each point:
>>> kmeans.labels_[:5]
array([1, 1, 2, 2, 1], dtype=int32)
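As a small sketch of how labels_ can be inspected, the snippet below counts the points in each cluster and cross-tabulates cluster ids against the true classes returned by make_blobs. The random_state and n_init values are assumptions added here for reproducibility and are not part of the original example. Cluster ids are arbitrary, so cluster 0 need not correspond to class 0.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# random_state and n_init are assumptions, added only to make the run repeatable
blobs, classes = make_blobs(500, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(blobs)

labels = kmeans.labels_  # one cluster id per training point

# Number of points assigned to each of the 3 clusters
cluster_sizes = np.bincount(labels, minlength=3)

# Rows: true class from make_blobs, columns: cluster id assigned by KMeans
table = np.zeros((3, 3), dtype=int)
np.add.at(table, (classes, labels), 1)

print(cluster_sizes)
print(table)
```

Reading the table row by row shows which cluster id each true class was mapped to.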
The transform method is very useful: it returns the distance from each point to every centroid.
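The following is a minimal sketch of what transform returns, assuming the same blob data as above (the random_state and n_init values are assumptions added for reproducibility). It compares the result with Euclidean distances computed by hand and takes the closest centroid for each point.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# random_state and n_init are assumptions, added only to make the run repeatable
blobs, classes = make_blobs(500, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(blobs)

# One row per point, one column per centroid
distances = kmeans.transform(blobs)

# The same Euclidean distances computed by hand with broadcasting
diffs = blobs[:, None, :] - kmeans.cluster_centers_[None, :, :]
manual = np.linalg.norm(diffs, axis=2)

# Index of the closest centroid for each point
nearest = distances.argmin(axis=1)

print(distances.shape)
print(distances[:5])
print(nearest[:5], kmeans.labels_[:5])
```

For the training data, the index of the smallest distance in each row should agree with labels_, since KMeans assigns every point to its nearest centroid.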