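sklearn clustering algorithm: MeanShift

Official reference documentation

How the algorithm works

Translated with https://cn.bing.com/translator and then corrected by hand.
This is an unsupervised clustering algorithm: it needs neither labels nor the number of clusters in advance.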

MeanShift clustering aims to discover blobs in a smooth density of samples. It is a centroid based algorithm, which works by updating candidates for centroids to be the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near duplicates to form the final set of centroids.

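【In plain terms: every candidate centroid is repeatedly moved to the mean of the samples inside its window, so it climbs toward a local peak of the sample density (a "blob"). Candidates that end up at almost the same place are then merged, and the survivors are the final cluster centers.】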
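The post-processing stage described above, which drops near-duplicate candidates, can be sketched in a few lines of NumPy. This is a minimal sketch, not sklearn's code. It assumes that the candidates have already converged and that each one comes with the number of samples in its final window. It also assumes that two candidates closer than bandwidth are duplicates, in which case the more populated one is kept. The function name remove_near_duplicates and the sample numbers are hypothetical.

```python
import numpy as np


def remove_near_duplicates(candidates, counts, bandwidth):
    """Hypothetical helper: keep one centroid per group of candidates lying
    within `bandwidth` of each other, preferring more populated windows."""
    # Visit candidates from the most to the least populated window.
    order = np.argsort(-np.asarray(counts), kind="stable")
    kept = []
    for idx in order:
        c = candidates[idx]
        # Discard c if it lies within bandwidth of a centroid already kept.
        if all(np.linalg.norm(c - k) >= bandwidth for k in kept):
            kept.append(c)
    return np.array(kept)


# Four converged candidates forming two pairs of near duplicates (made-up data).
candidates = np.array([[0.0, 0.0], [0.05, 0.0], [5.0, 5.0], [5.0, 5.02]])
counts = [40, 38, 25, 30]  # samples inside each candidate's final window
centers = remove_near_duplicates(candidates, counts, bandwidth=1.0)
print(centers)
```

Visiting candidates densest-first means the surviving centroid of each group is the one sitting on the most samples. My understanding is that sklearn's MeanShift makes a similar choice, but check its source before relying on that.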

Given a candidate centroid $x_i$ for iteration $t$, the candidate is updated according to the following equation:

$$x_i^{t+1} = m(x_i^t)$$

Where $N(x_i)$ is the neighborhood of samples within a given distance around $x_i$, and $m$ is the mean shift vector, computed for each centroid, that points towards the region of maximum increase in the density of points. It is computed using the following equation, which effectively updates a centroid to be the mean of the samples within its neighborhood:

$$m(x_i) = \frac{\sum_{x_j \in N(x_i)} K(x_j - x_i)\, x_j}{\sum_{x_j \in N(x_i)} K(x_j - x_i)}$$

【One $m$ is computed for every candidate centroid. Strictly speaking, $m(x_i)$ as defined above is the kernel-weighted mean of the neighborhood, i.e. a point. The vector that points toward higher density is $m(x_i) - x_i$, and the update $x_i^{t+1} = m(x_i^t)$ moves the centroid by exactly that vector.】
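To make the formula concrete, the function below evaluates $m(x_i)$ for a single centroid. The kernel is an assumption. By default it uses a flat kernel ($K = 1$ inside the window, 0 outside), which I believe is what sklearn's MeanShift uses. For comparison it also offers a hypothetical Gaussian weighting. The name weighted_mean is illustrative.

```python
import numpy as np


def weighted_mean(x_i, X, bandwidth, kernel="flat"):
    """Illustrative m(x_i): the kernel-weighted mean of the samples in N(x_i).

    Assumes N(x_i) is non-empty, which holds when x_i starts at a sample."""
    dist = np.linalg.norm(X - x_i, axis=1)
    in_nbhd = dist <= bandwidth  # N(x_i): samples within bandwidth of x_i
    if kernel == "flat":
        w = np.ones(in_nbhd.sum())  # K = 1 for every neighbor
    elif kernel == "gaussian":
        # Hypothetical Gaussian weights whose width is tied to the bandwidth.
        w = np.exp(-0.5 * (dist[in_nbhd] / bandwidth) ** 2)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    neighbors = X[in_nbhd]
    # Numerator: sum of K * x_j; denominator: sum of K.
    return (w[:, None] * neighbors).sum(axis=0) / w.sum()


X = np.array([[0.0], [1.0], [2.0], [10.0]])
x0 = np.array([0.0])
m = weighted_mean(x0, X, bandwidth=1.5)
print("m(x_0) =", m, " shift m(x_0) - x_0 =", m - x0)
```

With the flat kernel, only the samples at 0 and 1 fall inside the window of radius 1.5, so $m(x_0) = 0.5$ and the shift $m(x_0) - x_0$ is 0.5. The Gaussian weighting down-weights the farther sample and gives a smaller shift.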

The algorithm automatically sets the number of clusters. Instead of taking that number as input, it relies on a parameter, bandwidth, which dictates the size of the region to search through. This parameter can be set manually, or estimated using the provided estimate_bandwidth function, which is called if the bandwidth is not set.
【Note that the number of clusters is never passed in; it follows from the bandwidth. A smaller bandwidth generally yields more clusters, and a larger one fewer.】
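The effect of bandwidth on the number of clusters can be checked directly. This sketch assumes three well-separated synthetic blobs; the centers, cluster_std and random_state values are made up for illustration. It compares the automatic estimate with a very small and a very large bandwidth.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three well-separated blobs (illustrative data); the cluster count is never passed in.
centers = [[0, 0], [10, 10], [-10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# 1) No bandwidth given: MeanShift estimates it internally.
auto = MeanShift().fit(X)

# 2) Explicit estimate; quantile controls how wide the windows are.
bw = estimate_bandwidth(X, quantile=0.3, random_state=0)

# 3) Number of clusters found for a tiny, an estimated and a huge window.
n_clusters = {}
for b in (0.1, bw, 50.0):
    n_clusters[b] = len(MeanShift(bandwidth=b).fit(X).cluster_centers_)

print("estimated bandwidth:", round(bw, 3))
print("clusters (auto):", len(auto.cluster_centers_))
print("clusters per bandwidth:", n_clusters)
```

Expect the tiny bandwidth to fragment each blob into many clusters and the huge one to merge everything into a single cluster. The estimate should land in between. The exact numbers depend on the data.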

The algorithm is not highly scalable, as it requires multiple nearest neighbor searches during its execution. The algorithm is guaranteed to converge; in practice, however, it stops iterating once the change in the centroids is small.
【Each iteration needs a neighborhood search for every seed, and this is what limits scalability. Seeding from a coarse grid (bin_seeding=True, as in the code below) reduces the number of seeds.】
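The stopping rule can be sketched as a loop for a single seed. The loop applies $x \leftarrow m(x)$ until the centroid moves less than a small threshold, or until a maximum number of iterations is reached. The threshold 1e-3 * bandwidth and the helper name shift_seed are assumptions for illustration. MeanShift's real max_iter parameter and n_iter_ attribute are used at the end for comparison.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs


def shift_seed(seed, X, bandwidth, max_iter=300):
    """Hypothetical helper: iterate x <- m(x) with a flat kernel.

    Stops when the move is below 1e-3 * bandwidth (an assumed threshold)
    or after max_iter iterations, and returns (final point, iterations)."""
    stop_thresh = 1e-3 * bandwidth
    x = np.asarray(seed, dtype=float)
    for n_iter in range(1, max_iter + 1):
        in_window = np.linalg.norm(X - x, axis=1) <= bandwidth
        new_x = X[in_window].mean(axis=0)  # m(x) for a flat kernel
        moved = np.linalg.norm(new_x - x)
        x = new_x
        if moved < stop_thresh:
            break
    return x, n_iter


# Two illustrative blobs.
centers = [[0, 0], [6, 6]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.7, random_state=1)

mode, n_iter = shift_seed(X[0], X, bandwidth=2.0)
print("seed", X[0].round(2), "->", mode.round(2), "after", n_iter, "iterations")

# sklearn reports the largest per-seed iteration count in n_iter_.
ms = MeanShift(bandwidth=2.0, max_iter=300).fit(X)
print("sklearn n_iter_:", ms.n_iter_, "clusters:", len(ms.cluster_centers_))
```

The per-seed loop is why cost grows with the number of seeds. Every pass over every seed repeats a neighborhood query.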

Implementation

Reference:

sklearn.cluster.MeanShift

For basic usage, see my previous blog post: sklearn clustering algorithm: affinity propagation

Code:

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6)

# #############################################################################
# Compute clustering with MeanShift

# Estimate the bandwidth from the data. If bandwidth were omitted,
# MeanShift would call estimate_bandwidth on X automatically (default quantile).
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)

ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
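# Label of each sample: a 1D array whose length is the number of samples
labels = ms.labels_
# Cluster centers, one row per cluster
cluster_centers = ms.cluster_centers_
# Distinct label values; their count is the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)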

print("number of estimated clusters : %d" % n_clusters_)

# #############################################################################
# Plot result
import matplotlib.pyplot as plt
from itertools import cycle

plt.figure(1)
plt.clf()

colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
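Here is a quick way to see how labels_ and cluster_centers_ relate. With the default cluster_all=True, each sample appears to be labelled with the index of its nearest cluster center, and predict applies the same rule to new points. The sketch below checks this on a smaller copy of the same synthetic data. The random_state values and the new_points coordinates are made up.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=2000, centers=centers, cluster_std=0.6, random_state=0)

bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500, random_state=0)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)

# Nearest-center assignment computed by hand, for comparison with labels_.
nearest = pairwise_distances_argmin(X, ms.cluster_centers_)
print("labels_ == nearest center:", np.array_equal(ms.labels_, nearest))

# predict() labels new points; these query coordinates are made up.
new_points = np.array([[1.1, 0.9], [-0.8, -1.2]])
print("predicted labels:", ms.predict(new_points))
```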
