Studying the scikit-learn Source: cluster.MeanShift

Latest recommended article published on 2024-05-15 09:08:53

机器变得更残忍

Categories: machine learning, python. Tags: sklearn, clustering, python, machine learning source code

Article link: https://blog.csdn.net/jiaqiangbandongg/article/details/53557500

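I have finally finished reading the mean-shift algorithm in the clustering module. There is a fair amount of material about it online, but most of it consists of headache-inducing mathematical formulas, and I find that reading the source code directly is more straightforward.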

Source of the core function that runs the mean-shift algorithm:

def mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False,
               min_bin_freq=1, cluster_all=True, max_iter=300,
               n_jobs=1):
    """Perform mean shift clustering of data using a flat kernel.

    Read more in the :ref:`User Guide <mean_shift>`.

    Parameters
    ----------

    X : array-like, shape=[n_samples, n_features]
        Input data.

    bandwidth : float, optional
        Kernel bandwidth.

        If bandwidth is not given, it is determined using a heuristic based on
        the median of all pairwise distances. This will take quadratic time in
        the number of samples. The sklearn.cluster.estimate_bandwidth function
        can be used to do this more efficiently.

    seeds : array-like, shape=[n_seeds, n_features] or None
        Point used as initial kernel locations. If None and bin_seeding=False,
        each data point is used as a seed. If None and bin_seeding=True,
        see bin_seeding.

    bin_seeding : boolean, default=False
        If true, initial kernel locations are not locations of all
        points, but rather the location of the discretized version of
        points, where points are binned onto a grid whose coarseness
        corresponds to the bandwidth. Setting this option to True will speed
        up the algorithm because fewer seeds will be initialized.
        Ignored if seeds argument is not None.

    min_bin_freq : int, default=1
       To speed up the algorithm, accept only those bins with at least
       min_bin_freq points as seeds.

    cluster_all : boolean, default True
        If true, then all points are clustered, even those orphans that are
        not within any kernel. Orphans are assigned to the nearest kernel.
        If false, then orphans are given cluster label -1.

    max_iter : int, default 300
        Maximum number of iterations, per seed point before the clustering
        operation terminates (for that seed point), if has not converged yet.

    n_jobs : int
        The number of jobs to use for the computation. This works by computing
        each of the n_init runs in parallel.

        If -1 all CPUs are used. If 1 is given, no parallel computing code is
        used at all, which is useful for debugging. For n_jobs below -1,
        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
        are used.

        .. versionadded:: 0.17
           Parallel Execution using *n_jobs*.

    Returns
    -------

    cluster_centers : array, shape=[n_clusters, n_features]
        Coordinates of cluster centers.

    labels : array, shape=[n_samples]
        Cluster labels for each point.

    Notes
    -----
    See examples/cluster/plot_mean_shift.py for an example.

    """
    # If bandwidth is not given, call estimate_bandwidth to estimate it
    if bandwidth is None
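
The listing is cut off at this point, so the body of the function is not shown. To make the docstring's parameters concrete, here is a minimal pure-Python sketch of flat-kernel mean shift as the docstring describes it. It is not scikit-learn's implementation, which relies on NumPy and a nearest-neighbors search. The names `_dist`, `bin_seeds_sketch`, `shift_seed`, `mean_shift_sketch` and the `tol` parameter are hypothetical, and the stopping rule and the center-merging rule are assumptions made for this illustration.

```python
import math
from collections import Counter


def _dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bin_seeds_sketch(points, bin_size, min_bin_freq=1):
    """bin_seeding=True: snap points onto a grid of width bin_size and keep
    the bins that hold at least min_bin_freq points (hypothetical helper)."""
    counts = Counter(tuple(round(x / bin_size) for x in p) for p in points)
    return [tuple(k * bin_size for k in key)
            for key, count in counts.items() if count >= min_bin_freq]


def shift_seed(seed, points, bandwidth, max_iter=300, tol=1e-3):
    """Move one seed to the mean of its neighbors until it stops moving."""
    center = list(seed)
    n_neighbors = 0
    for _ in range(max_iter):
        # Flat kernel: every point within `bandwidth` has the same weight.
        neighbors = [p for p in points if _dist(p, center) <= bandwidth]
        n_neighbors = len(neighbors)
        if n_neighbors == 0:
            break  # nothing inside the kernel, give up on this seed
        new_center = [sum(axis) / n_neighbors for axis in zip(*neighbors)]
        moved = _dist(new_center, center)
        center = new_center
        if moved < tol * bandwidth:  # assumed convergence threshold
            break
    return center, n_neighbors


def mean_shift_sketch(points, bandwidth, seeds=None, cluster_all=True,
                      max_iter=300):
    """Return (cluster_centers, labels) for a list of coordinate tuples."""
    if seeds is None:
        seeds = points  # seeds=None, bin_seeding=False: every point is a seed
    shifted = [shift_seed(s, points, bandwidth, max_iter) for s in seeds]
    # Assumed merging rule: visit centers from most to fewest neighbors and
    # drop any center that lies within `bandwidth` of one already kept.
    shifted.sort(key=lambda item: item[1], reverse=True)
    centers = []
    for center, n_neighbors in shifted:
        if n_neighbors == 0:
            continue
        if all(_dist(center, kept) > bandwidth for kept in centers):
            centers.append(center)
    labels = []
    for p in points:
        distances = [_dist(p, c) for c in centers]
        nearest = min(range(len(centers)), key=distances.__getitem__)
        # cluster_all=False: orphans outside every kernel get label -1.
        if cluster_all or distances[nearest] <= bandwidth:
            labels.append(nearest)
        else:
            labels.append(-1)
    return centers, labels
```

Calling `mean_shift_sketch(points, bandwidth, seeds=bin_seeds_sketch(points, bandwidth, min_bin_freq))` corresponds to the `bin_seeding=True` case in the docstring: fewer seeds are shifted, so the loop does less work. With `cluster_all=False`, any point farther than `bandwidth` from every center is labeled -1, as described for orphans.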
