Online learning of a dictionary of parts of faces with Python + scikit-learn

This example uses a large dataset of faces to learn a set of 20 x 20 image patches that constitute faces.

From a programming standpoint, this example is interesting because it shows how to use scikit-learn's online learning API to process a very large dataset in chunks. We load one image at a time and extract 50 patches at random from it. Once 500 such patches have been accumulated (i.e. from 10 images), we call the partial_fit method of MiniBatchKMeans, the online-learning KMeans object. The verbose setting on MiniBatchKMeans shows that some clusters are reassigned during successive calls to partial_fit: this happens when the number of patches a cluster represents has become too low, and it is better to pick a new cluster at random.
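To make that chunk-wise workflow concrete, here is a minimal sketch of the same partial_fit pattern on synthetic data; the `stream` array, its shape, and `chunk_size` are illustrative stand-ins and are not part of the original example:

```python
# Minimal sketch of chunk-wise online clustering with MiniBatchKMeans.
# The synthetic "stream" and chunk size are illustrative only.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
stream = rng.rand(5000, 400)      # stand-in for a stream of flattened 20x20 patches
kmeans = MiniBatchKMeans(n_clusters=81, random_state=rng, verbose=True)

chunk_size = 500                  # the example accumulates 500 patches per call
for start in range(0, len(stream), chunk_size):
    chunk = stream[start:start + chunk_size]
    chunk = chunk - chunk.mean(axis=0)   # per-chunk standardization, as in the example
    chunk = chunk / chunk.std(axis=0)
    kmeans.partial_fit(chunk)            # updates the cluster centers incrementally
```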
[Figure: sphx_glr_plot_dict_face_patches_001 — grid of the learned face patches]
Output:
```
Learning the dictionary...
Partial fit of  100 out of 2400
Partial fit of  200 out of 2400
[MiniBatchKMeans] Reassigning 16 cluster centers.
Partial fit of  300 out of 2400
Partial fit of  400 out of 2400
Partial fit of  500 out of 2400
Partial fit of  600 out of 2400
Partial fit of  700 out of 2400
Partial fit of  800 out of 2400
Partial fit of  900 out of 2400
Partial fit of 1000 out of 2400
Partial fit of 1100 out of 2400
Partial fit of 1200 out of 2400
Partial fit of 1300 out of 2400
Partial fit of 1400 out of 2400
Partial fit of 1500 out of 2400
Partial fit of 1600 out of 2400
Partial fit of 1700 out of 2400
Partial fit of 1800 out of 2400
Partial fit of 1900 out of 2400
Partial fit of 2000 out of 2400
Partial fit of 2100 out of 2400
Partial fit of 2200 out of 2400
Partial fit of 2300 out of 2400
Partial fit of 2400 out of 2400
done in 2.16s.
```
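The `[MiniBatchKMeans] Reassigning 16 cluster centers.` line is the reassignment behaviour described above. If you want to experiment with how aggressively small clusters are reassigned, MiniBatchKMeans exposes a `reassignment_ratio` parameter; the snippet below is only an illustrative sketch of that option and is not part of the original example:

```python
from sklearn.cluster import MiniBatchKMeans

# reassignment_ratio controls the fraction of the maximum center count below
# which a center is considered too small and may be reassigned; a higher value
# reassigns low-count centers more readily (the default is 0.01).
kmeans = MiniBatchKMeans(n_clusters=81, random_state=0, verbose=True,
                         reassignment_ratio=0.05)
```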
```python
print(__doc__)

import time

import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d

faces = datasets.fetch_olivetti_faces()

# #############################################################################
# Learn the dictionary of images

print('Learning the dictionary... ')
rng = np.random.RandomState(0)
kmeans = MiniBatchKMeans(n_clusters=81, random_state=rng, verbose=True)
patch_size = (20, 20)

buffer = []
t0 = time.time()

# The online learning part: cycle over the whole dataset 6 times
index = 0
for _ in range(6):
    for img in faces.images:
        data = extract_patches_2d(img, patch_size, max_patches=50,
                                  random_state=rng)
        data = np.reshape(data, (len(data), -1))
        buffer.append(data)
        index += 1
        if index % 10 == 0:
            data = np.concatenate(buffer, axis=0)
            data -= np.mean(data, axis=0)
            data /= np.std(data, axis=0)
            kmeans.partial_fit(data)
            buffer = []
        if index % 100 == 0:
            print('Partial fit of %4i out of %i'
                  % (index, 6 * len(faces.images)))

dt = time.time() - t0
print('done in %.2fs.' % dt)

# #############################################################################
# Plot the results

plt.figure(figsize=(4.2, 4))
for i, patch in enumerate(kmeans.cluster_centers_):
    plt.subplot(9, 9, i + 1)
    plt.imshow(patch.reshape(patch_size), cmap=plt.cm.gray,
               interpolation='nearest')
    plt.xticks(())
    plt.yticks(())

plt.suptitle('Patches of faces\nTrain time %.1fs on %d patches' %
             (dt, 8 * len(faces.images)), fontsize=16)
plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)
plt.show()
```
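As a follow-up sketch (not part of the original script), the learned `kmeans.cluster_centers_` can be used as a patch dictionary: `kmeans.predict` assigns new 20 x 20 patches to their nearest atom. The `new_img` variable below is just a stand-in for an additional face image, and the snippet assumes it is run after the script above so that `faces`, `patch_size`, `rng`, and `kmeans` are in scope:

```python
# Illustrative follow-up: encode new patches by mapping each one to its
# nearest learned dictionary atom.
new_img = faces.images[0]                 # stand-in for a "new" face image
patches = extract_patches_2d(new_img, patch_size, max_patches=50,
                             random_state=rng)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=0)           # same normalization as during training
patches /= patches.std(axis=0)

atom_indices = kmeans.predict(patches)    # index of the closest cluster center
reconstructed = kmeans.cluster_centers_[atom_indices]   # nearest-atom approximation
```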
Total running time of the script: (0 minutes 3.372 seconds). Estimated memory usage: 14 MB.


Download Python source code: plot_dict_face_patches.py
Download Jupyter notebook: plot_dict_face_patches.ipynb
Gallery generated by Sphinx-Gallery

