Reading the Sklearn Source Code Through Examples [Cluster Edition]

Latest recommended article published 2024-06-15 23:18:53

皮皮黄的皮皮橙

Column: Reading Machine Learning Library Source Code Through Examples
Tags: sklearn, python, machine learning

Article link: https://blog.csdn.net/yangshi2015/article/details/122530644

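[About this series]

	This series reviews machine learning methods: it summarizes each algorithm's workflow, dissects the source code where that helps, and lists the datasets that suit the algorithm along with the most important tuning parameters.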

Case 1: Mean shift clustering (MeanShift)

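Dataset:
(1) The dataset is generated with numpy. It is a planar dataset of two-dimensional vectors, represented by a matrix A (2*10000, i.e. 10000 points with 2 coordinates each; in scikit-learn's (n_samples, n_features) layout this is an array of shape (10000, 2)).
(2) Three cluster centers are defined in centers: (1, 1), (-1, -1), and (1, -1).
(3) With a per-cluster standard deviation cluster_std of 0.6, the clusters are only just separable (some points from neighboring clusters touch).
(4) Set the bandwidth. The estimate_bandwidth function estimates the bandwidth for the mean-shift algorithm; if MeanShift is not given a bandwidth argument, it runs estimate_bandwidth automatically. The function is documented as follows: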

def estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0,
                       n_jobs=1):
    """Estimate the bandwidth to use with the mean-shift algorithm.

    Note that this function takes time at least quadratic in n_samples. For
    large datasets, it's wise to set that parameter to a small value.

    Parameters
    ----------
    X : array-like, shape=[n_samples, n_features]
        Input points.

    quantile : float, default 0.3
        should be between [0, 1]
        0.5 means that the median of all pairwise distances is used.

    n_samples : int, optional
        The number of samples to use. If not given, all samples are used.

    random_state : int or RandomState
        Pseudo-random number generator state used for random sampling.

    n_jobs : int, optional (default = 1)
        The number of parallel jobs to run for neighbors search.
        If ``-1``, then the number of jobs is set to the number of CPU cores.

    Returns
    -------
    bandwidth : float
        The bandwidth parameter.
    """

    # Based on r
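
To make the quantile parameter in the docstring above more concrete, here is a minimal numpy-only sketch of one way to read it. It assumes the bandwidth is the average distance from each point to its k-th nearest neighbor, with k = int(n_samples * quantile) and each point counted as its own first neighbor. The helper name knn_bandwidth_sketch is hypothetical and this is not the library's code; it only illustrates why a larger quantile gives a larger bandwidth and why the cost grows quadratically with the number of samples.

```python
import numpy as np


def knn_bandwidth_sketch(X, quantile=0.3):
    """Hypothetical helper (not part of sklearn): mean distance from each
    point to its k-th nearest neighbor, with k = int(n_samples * quantile)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = max(int(n * quantile), 1)  # at least one neighbor

    # Full pairwise Euclidean distance matrix: O(n^2) time and memory,
    # which is why a small n_samples matters for large datasets.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Sort each row; column 0 is the point itself (distance 0),
    # so column k - 1 is the k-th neighbor when the point counts as the first.
    kth_distance = np.sort(dist, axis=1)[:, k - 1]
    return kth_distance.mean()


# Small illustrative sample (assumption: 300 standard-normal points in 2D)
rng = np.random.default_rng(0)
X_small = rng.normal(size=(300, 2))

quantiles = (0.1, 0.3, 0.5)
bandwidths = [knn_bandwidth_sketch(X_small, quantile=q) for q in quantiles]
for q, b in zip(quantiles, bandwidths):
    print(f"quantile={q}: bandwidth={b:.3f}")
```

A small quantile looks only at very close neighbors and yields a narrow kernel (more, smaller clusters); a large quantile yields a wide kernel that tends to merge clusters.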
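
The dataset and bandwidth setup from items (1) to (4) can be sketched roughly as follows. This is a sketch under assumptions, not the author's exact script: make_blobs is used as the generator because cluster_std is one of its parameters, and the values quantile=0.2, n_samples=500, random_state=0 and bin_seeding=True are illustrative choices that do not come from the text above.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three cluster centers, as listed in item (2)
centers = [[1, 1], [-1, -1], [1, -1]]

# 10000 two-dimensional points with per-cluster standard deviation 0.6.
# Assumption: make_blobs as the generator; random_state fixed for repeatability.
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6,
                  random_state=0)

# Estimate the bandwidth on a subsample, since the cost is at least
# quadratic in the number of samples (quantile and n_samples are assumptions).
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)

# Passing bandwidth explicitly; if it were omitted, MeanShift would
# call estimate_bandwidth itself.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)

labels = ms.labels_
cluster_centers = ms.cluster_centers_
n_clusters = len(np.unique(labels))

print("estimated bandwidth:", bandwidth)
print("number of estimated clusters:", n_clusters)
```

Subsampling only affects the bandwidth estimate; MeanShift still clusters all 10000 points.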
