What the MeanShift Parameters Mean

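MeanShift is a clustering algorithm that shifts centroids toward regions of higher density. It is often described with a Gaussian kernel, but scikit-learn's MeanShift uses a flat kernel. The number of clusters does not have to be specified; only the radius of the circular window around each centroid (the bandwidth) is needed. In each iteration, the algorithm computes the mean of all points that fall inside the window and moves the centroid to that mean. Once the mean coincides with the current centroid (the shift drops below a small tolerance), the window stops moving.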
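The window-shifting step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch with a flat kernel, not scikit-learn's implementation; the function name `shift_window`, the tolerance, and the sample data are assumptions made for the example.

```python
import numpy as np


def shift_window(X, start, bandwidth, max_iter=300, tol=1e-3):
    """Move a single window from `start` until its centre stops changing.

    Hypothetical helper for illustration; flat kernel, Euclidean distance.
    """
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Flat kernel: every point within `bandwidth` of the centre counts equally
        inside = X[np.linalg.norm(X - center, axis=1) <= bandwidth]
        if len(inside) == 0:
            break
        new_center = inside.mean(axis=0)
        # Stop once the shift is smaller than the tolerance
        converged = np.linalg.norm(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center


# Two well-separated synthetic blobs (assumed data, for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 2)),
    rng.normal(5.0, 0.3, size=(100, 2)),
])

center_a = shift_window(X, start=X[0], bandwidth=1.0)
center_b = shift_window(X, start=X[-1], bandwidth=1.0)
print(center_a, center_b)
```

A full implementation would start one window per seed and then merge windows that end up close to each other; that merging step is omitted here.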

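bin_seeding: controls how the initial kernel locations (seeds) are generated. The default is False, in which case every data point is used as a seed. When set to True, the points are discretized onto a grid and the occupied grid cells are used as seeds instead. The former is slower than the latter because far more seeds have to be shifted.
bandwidth: the bandwidth, which can be understood as the radius of the window around each centroid.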
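To make the speed-up from bin_seeding concrete, the sketch below counts how many seeds the grid discretization leaves compared with using every point. It calls `sklearn.cluster.get_bin_seeds` directly; the synthetic data and the bin size of 2.0 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import get_bin_seeds

# Two tight synthetic blobs (assumed data, for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.2, size=(200, 2)),
    rng.normal(10.0, 0.2, size=(200, 2)),
])

# Bin the points onto a grid whose cell size equals the bandwidth
seeds = get_bin_seeds(X, bin_size=2.0, min_bin_freq=1)

print("seeds without binning:", len(X))
print("seeds with binning:   ", len(seeds))
```

Each remaining seed still has to be shifted to convergence, so fewer seeds generally means less work per fit.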

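import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Sample data, assumed here for illustration; substitute your own X
X, _ = make_blobs(n_samples=1000, centers=3, cluster_std=0.6, random_state=0)

# Estimate the window size from 500 sampled points
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_

# Count the distinct labels to get the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)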

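estimate_bandwidth() produces the window size for mean shift. With the arguments above, it randomly draws 500 samples from X, computes the distances between pairs of samples, and returns a value corresponding to the 0.2 quantile of those distances. The running time is at least quadratic in n_samples, so the function becomes expensive when n_samples is large.
np.unique(labels) returns the sorted distinct values in labels; taking len() of the result gives the number of clusters found.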
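As a rough illustration of how quantile affects the result of estimate_bandwidth(), the sketch below estimates the bandwidth at two quantiles on the same data. The synthetic data and the specific quantile values are assumptions for the example; a larger quantile should give a larger window.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth

# Synthetic 2-D data (assumed, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1000, 2))

# Same 500-point subsample size, two different quantiles
bw_small = estimate_bandwidth(X, quantile=0.1, n_samples=500, random_state=0)
bw_large = estimate_bandwidth(X, quantile=0.5, n_samples=500, random_state=0)

print("bandwidth at quantile 0.1:", bw_small)
print("bandwidth at quantile 0.5:", bw_large)
```

A smaller bandwidth tends to produce more, tighter clusters, while a larger one merges nearby groups, so it can be worth trying a few quantiles before fitting.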
The constructor of the MeanShift class, MeanShift(), is the key part. Its signature is:

MeanShift(bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, n_jobs=1)

Its parameters mean the following:
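bandwidth: float, Bandwidth used in the kernel. Older scikit-learn documentation calls this the RBF (Radial Basis Function) kernel; current documentation describes it as a flat kernel. If not given, the bandwidth is estimated using sklearn.cluster.estimate_bandwidth.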
seeds: array, shape=[n_samples, n_features], Seeds used to initialize kernels. If not set, every point is used as a seed, or, when bin_seeding is True, the seeds are calculated by sklearn.cluster.get_bin_seeds with bandwidth as the grid size.
bin_seeding: boolean, If true, initial kernel locations are not locations of all points, but rather the location of the discretized version of points, where points are binned onto a grid whose coarseness corresponds to the bandwidth. Setting this option to True will speed up the algorithm because fewer seeds will be initialized. Ignored if seeds argument is not None. Default False.
min_bin_freq: int, optional, To speed up the algorithm, accept only those bins with at least min_bin_freq points as seeds, default 1.
cluster_all: If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
n_jobs: The number of jobs to use for the computation. This works by running the hill-climbing for each seed in parallel. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
Other commonly used methods and attributes of the MeanShift class:
cluster_centers_: array, [n_clusters, n_features]. Coordinates of cluster centers.
labels_: Labels of each point.
fit(X): Perform clustering.
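The interaction between cluster_all, bin_seeding and min_bin_freq is easier to see on a small example. The sketch below fits the same data twice and compares the label given to a single far-away point. The data, the bandwidth of 2.0 and min_bin_freq=5 are assumptions chosen so that the lone point's grid cell is not used as a seed.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Two tight blobs plus one distant point (assumed data, for illustration only)
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(0.0, 0.2, size=(50, 2)),
    rng.normal(10.0, 0.2, size=(50, 2)),
])
outlier = np.array([[100.0, 100.0]])
X = np.vstack([blobs, outlier])

# With bin_seeding=True and min_bin_freq=5, a grid cell holding a single
# point is not accepted as a seed, so the outlier cannot form its own cluster
common = dict(bandwidth=2.0, bin_seeding=True, min_bin_freq=5)
ms_all = MeanShift(cluster_all=True, **common).fit(X)
ms_orphans = MeanShift(cluster_all=False, **common).fit(X)

print("outlier label, cluster_all=True: ", ms_all.labels_[-1])
print("outlier label, cluster_all=False:", ms_orphans.labels_[-1])
print("cluster centers:\n", ms_orphans.cluster_centers_)
```

With cluster_all=True the distant point is attached to the nearest cluster, while cluster_all=False leaves it labelled -1, which can be a convenient way to flag points that belong to no dense region.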
