Python Third-Party Modules: Machine Learning with Scikit-Learn, Unsupervised Learning 1: Clustering 1

EdVzAs

Tags: python, machine learning, clustering

Article link: https://blog.csdn.net/weixin_46131409/article/details/113446743

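I. cluster
1. Introduction:

The sklearn.cluster module provides clustering algorithms, which group unlabeled samples.

2. Usage
(1) Classes:

Affinity Propagation clustering (AP Clustering): class cluster.AffinityPropagation([damping=0.5,max_iter=200,convergence_iter=15,copy=True,preference=None,affinity='euclidean',verbose=False,random_state='warn'])
  #Parameters:
	damping: the damping factor; 0.5<=float<1
	max_iter: maximum number of iterations; int
	convergence_iter: number of consecutive iterations with no change in the number of estimated clusters after which iteration stops; int
	copy: whether to make a copy of the input data; bool
	preference: the preference of each data point; float/array-like of shape (n_samples,)
	  #Points with larger preferences are more likely to be chosen as exemplars (cluster centers)
	  #If None, every preference is set to the median of the input similarities
	affinity: how to compute the affinity between data points; "euclidean"/"precomputed"
	  #"euclidean" uses the negative squared Euclidean distance
	verbose: whether to print progress messages; bool
	random_state: seed or generator controlling the starting state; int/RandomState instance/None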
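A minimal usage sketch, assuming NumPy and a recent scikit-learn are installed. The six 2-D points are toy data chosen for illustration (they match the example in the scikit-learn docstring), and random_state is fixed only for reproducibility. Note that the number of clusters is not passed in; it emerges from the preferences.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data: two groups of three points (illustrative only)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# preference=None: each preference becomes the median similarity,
# so the algorithm decides how many exemplars to keep
ap = AffinityPropagation(damping=0.5, max_iter=200, random_state=5).fit(X)

print(ap.cluster_centers_indices_)   # indices of samples chosen as exemplars
print(ap.labels_)                    # cluster label of every sample
print(ap.predict([[0, 0], [4, 4]]))  # nearest exemplar for new points
```

Raising `preference` above the median tends to yield more exemplars (clusters); lowering it tends to yield fewer.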

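######################################################################################################################

Agglomerative Clustering: class cluster.AgglomerativeClustering([n_clusters=2,affinity='euclidean',memory=None,connectivity=None,compute_full_tree='auto',linkage='ward',distance_threshold=None,compute_distances=False])
  #Parameters:
	n_clusters: number of clusters to find; int/None
	  #Must be None if distance_threshold is not None
	affinity: metric used to compute the linkage distance; "euclidean"/"l1"/"l2"/"manhattan"/"cosine"/"precomputed"/callable
	  #With linkage="ward", only "euclidean" is accepted
	  #Renamed to metric in scikit-learn 1.2
	memory: where to cache the output of the tree computation; str (directory path)/object with the joblib.Memory interface/None (no caching)
	connectivity: the connectivity matrix, defining the neighbors of each sample; array-like/callable
	compute_full_tree: whether to build the full tree instead of stopping early at n_clusters; "auto"/bool
	  #Stopping early saves computation time when the number of clusters is not small compared with the number of samples
	  #Stopping early is only useful when a connectivity matrix is given
	  #When varying the number of clusters with caching enabled, computing the full tree may be advantageous
	  #Must be True when distance_threshold is not None
	linkage: the linkage criterion; "ward"/"complete"/"average"/"single"
	  #The algorithm repeatedly merges the pair of clusters that minimizes this criterion
	distance_threshold: linkage distance threshold for merging clusters; float
	  #Clusters whose linkage distance is above this value are not merged
	compute_distances: whether to compute the distances between clusters even when distance_threshold is not used; bool
	  #Useful for plotting a dendrogram, at extra computation and memory cost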
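A minimal sketch of the two ways to stop merging, assuming NumPy and a recent scikit-learn. The toy points are illustrative, and the `affinity` argument is deliberately left at its default because it was renamed in later releases.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: two vertical groups of three points (illustrative only)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Option 1: fix the number of clusters; ward linkage uses Euclidean distances
model = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(model.labels_)

# Option 2: cut the tree by linkage distance; n_clusters must then be None
model_t = AgglomerativeClustering(n_clusters=None, distance_threshold=4.0).fit(X)
print(model_t.n_clusters_)   # number of clusters left after the cut
print(model_t.distances_)    # merge distances (available when distance_threshold is set)
```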

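######################################################################################################################

BIRCH clustering algorithm: class cluster.Birch([threshold=0.5,branching_factor=50,n_clusters=3,compute_labels=True,copy=True])
  #Parameters:
	threshold: radius threshold for a subcluster; float
	  #If merging a new sample into its closest subcluster would make that subcluster's radius exceed this value, a new subcluster is started instead
	branching_factor: the branching factor; int
	  #i.e. the maximum number of CF subclusters in each node; a node that exceeds it is split
	n_clusters: number of clusters after the final clustering step, which treats the leaf subclusters as new samples; int/sklearn.cluster model instance/None
	  #None skips the final step and returns the subclusters as they are
	compute_labels: whether to compute labels for each fit; bool
	copy: whether to make a copy of the input data; bool
	  #If False, the input data is overwritten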
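A minimal sketch, assuming NumPy and a recent scikit-learn, using toy points modeled on the scikit-learn docstring. With n_clusters=None the leaf subclusters are returned directly as the final clusters, so the result depends only on threshold.

```python
import numpy as np
from sklearn.cluster import Birch

# Toy data: three points near y=1 and three near y=-1 (illustrative only)
X = np.array([[0.0, 1.0], [0.3, 1.0], [-0.3, 1.0],
              [0.0, -1.0], [0.3, -1.0], [-0.3, -1.0]])

# threshold bounds the subcluster radius; n_clusters=None skips the global step
brc = Birch(threshold=0.5, branching_factor=50, n_clusters=None).fit(X)

print(len(brc.subcluster_centers_))  # number of leaf subclusters
print(brc.labels_)                   # labels (computed because compute_labels=True)
```

Passing an int for n_clusters instead would run an AgglomerativeClustering step over the subcluster centers.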

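######################################################################################################################

DBSCAN clustering: class cluster.DBSCAN([eps=0.5,min_samples=5,metric='euclidean',metric_params=None,algorithm='auto',leaf_size=30,p=None,n_jobs=None])
  #Parameters:
	eps: maximum distance between two samples for one to be considered in the neighborhood of the other; float
	  #Samples farther apart than this are not neighbors
	min_samples: number of samples (including the point itself) that a point's neighborhood must contain for the point to be a core point; int
	  #A non-core point within eps of a core point joins that core point's cluster; otherwise it is labeled noise (-1)
	metric: the distance metric; str/callable
	  #"precomputed" means X is a distance matrix
	metric_params: additional keyword arguments for metric; dict
	algorithm: algorithm used to find nearest neighbors; "auto"/"ball_tree"/"kd_tree"/"brute"
	leaf_size: leaf size passed to BallTree/cKDTree; int
	p: power of the Minkowski metric used to compute distances between points; float
	n_jobs: number of parallel jobs; int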
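A minimal sketch, assuming NumPy and a recent scikit-learn, using the toy points from the scikit-learn docstring. It shows how the core-point rule produces two clusters plus one noise point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: a tight group of three, a pair, and one far-away outlier
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

# min_samples=2 counts the point itself, so any point with one neighbor
# within eps=3 is a core point
db = DBSCAN(eps=3, min_samples=2).fit(X)

print(db.labels_)               # expected labels: 0, 0, 0, 1, 1, -1 (last point is noise)
print(db.core_sample_indices_)  # indices of the core points
```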

######################################################################################################################

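Feature agglomeration: class cluster.FeatureAgglomeration([n_clusters=2,affinity='euclidean',memory=None,connectivity=None,compute_full_tree='auto',linkage='ward',pooling_func=<function mean>,distance_threshold=None,compute_distances=False])
  #Works like AgglomerativeClustering, but merges features instead of samples
  #Parameters: all other parameters are the same as in class cluster.AgglomerativeClustering()
	pooling_func: how to combine the values of agglomerated features into a single value (numpy.mean by default); callable that accepts an M×N array and the keyword argument axis=1 and returns an array of size M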
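A minimal sketch, assuming NumPy and a recent scikit-learn. The data is synthetic and hypothetical: four features built as two near-duplicate pairs, so feature agglomeration should merge each pair and `transform` pools each pair into one column with `np.mean`.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.RandomState(0)
base = rng.rand(20, 2)  # two independent underlying signals (synthetic)

# Columns 0,1 are copies of signal 0 and columns 2,3 copies of signal 1 (tiny noise added)
X = np.hstack([base[:, [0]], base[:, [0]] + 1e-3 * rng.rand(20, 1),
               base[:, [1]], base[:, [1]] + 1e-3 * rng.rand(20, 1)])

agglo = FeatureAgglomeration(n_clusters=2, pooling_func=np.mean).fit(X)
print(agglo.labels_)             # cluster index of each *feature*, not each sample

X_reduced = agglo.transform(X)   # each feature cluster pooled into one column
print(X_reduced.shape)
```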

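######################################################################################################################

K-Means clustering: class cluster.KMeans([n_clusters=8,init='k-means++',n_init=10,max_iter=300,tol=0.0001,precompute_distances='deprecated',verbose=0,random_state=None,copy_x=True,n_jobs='deprecated',algorithm='auto'])
  #Parameters: the other parameters (verbose, random_state) are the same as in class cluster.AffinityPropagation()
	n_clusters: number of clusters, i.e. of centroids to generate; int
	init: initialization method; "k-means++"/"random"/callable/array-like of shape (n_clusters, n_features)
	n_init: number of runs with different centroid seeds; int
	  #The result is the best of these runs in terms of inertia
	max_iter: maximum number of iterations in a single run; int
	tol: relative tolerance used to declare convergence; float
	  #Iteration stops when the change in cluster centers between two consecutive iterations (Frobenius norm) falls below this tolerance
	precompute_distances: whether to precompute distances; "auto"/bool
	  #Deprecated since 0.23
	copy_x: whether to copy the input data; bool
	  #If False, the data is centered in place and restored afterwards, which may introduce small numerical differences
	n_jobs: number of parallel jobs; int
	  #Deprecated since 0.23
	algorithm: K-means variant to use; "auto"/"full"/"elkan"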
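A minimal sketch, assuming NumPy and a recent scikit-learn; the six points are toy data. `n_init` is passed explicitly because its default changed in later scikit-learn releases, and the deprecated parameters listed above are not used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two groups of three points (illustrative only)
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, init="k-means++", n_init=10, max_iter=300,
            tol=1e-4, random_state=0).fit(X)

print(km.labels_)
print(km.cluster_centers_)              # centroids (1, 2) and (10, 2), in some order
print(km.inertia_)                      # sum of squared distances to the closest centroid
print(km.predict([[0, 0], [12, 3]]))    # assign new points to the nearest centroid
```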

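######################################################################################################################

Mini-Batch K-Means clustering: class cluster.MiniBatchKMeans([n_clusters=8,init='k-means++',max_iter=100,batch_size=100,verbose=0,compute_labels=True,random_state=None,tol=0.0,max_no_improvement=10,init_size=None,n_init=3,reassignment_ratio=0.01])
  #Parameters: other parameters are the same as in class cluster.KMeans()
	batch_size: size of each mini-batch; int
	compute_labels: whether to compute labels and cluster inertia for the complete dataset once the mini-batch optimization has converged; bool
	  #Cluster inertia is the sum of squared distances from each sample to its cluster center
	max_no_improvement: number of consecutive mini-batches that do not improve the smoothed inertia before stopping early; int
	init_size: number of samples randomly drawn to speed up initialization; int
	reassignment_ratio: centers whose count is below this fraction of the largest center count are reassigned; float
	  #Larger values reassign low-count centers more readily: convergence takes longer, but the clustering should be better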
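A minimal sketch, assuming NumPy and a recent scikit-learn, on two synthetic Gaussian blobs. It contrasts `fit` on the whole array with `partial_fit`, which consumes one mini-batch per call (useful when data arrives in chunks); the batch size of 50 is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
# 200 points around (0, 0) and 200 around (10, 10), shuffled so every batch mixes both
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 10])
X = X[rng.permutation(len(X))]

mbk = MiniBatchKMeans(n_clusters=2, batch_size=50, n_init=3,
                      max_no_improvement=10, random_state=0).fit(X)
centers = mbk.cluster_centers_[np.argsort(mbk.cluster_centers_[:, 0])]
print(centers)  # roughly (0, 0) and (10, 10)

# Streaming variant: feed the same data one 50-row chunk at a time
stream = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)
for start in range(0, len(X), 50):
    stream.partial_fit(X[start:start + 50])
print(stream.cluster_centers_)
```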

######################################################################################################################

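Mean shift clustering with a flat kernel: class cluster.MeanShift([bandwidth=None,seeds=None,bin_seeding=False,min_bin_freq=1,cluster_all=True,n_jobs=None,max_iter=300])
  #Parameters: other parameters are the same as in class cluster.KMeans()
	bandwidth: kernel bandwidth; float
	  #If None, it is estimated with sklearn.cluster.estimate_bandwidth
	seeds: seeds used to initialize the kernels; array-like of shape (n_samples, n_features)/None
	bin_seeding: how the initial kernel locations are chosen; bool
	  #If True, points are binned onto a grid whose coarseness corresponds to the bandwidth, and the initial kernel locations are the bin locations
	  #If False, the initial kernel locations are all data points
	  #True speeds up the algorithm because fewer seeds are initialized; ignored if seeds is not None
	min_bin_freq: minimum number of points a bin must contain to be used as a seed; int
	  #Bins with fewer points are discarded, which speeds up the algorithm
	cluster_all: whether to cluster all points, including orphans that lie outside every kernel; bool
	  #If True, orphans are assigned to the nearest kernel; if False, they get label -1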
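A minimal sketch, assuming NumPy and a recent scikit-learn, using the toy points from the scikit-learn docstring. The bandwidth is fixed by hand here; `estimate_bandwidth` is shown only as the data-driven alternative that bandwidth=None would use.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Toy data: one group near (1, 1) and one near (3, 6) (illustrative only)
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])

ms = MeanShift(bandwidth=2, bin_seeding=False, cluster_all=True).fit(X)
print(ms.labels_)
print(ms.cluster_centers_)  # modes the kernels converged to

# Data-driven bandwidth guess (quantile of pairwise nearest-neighbor distances)
print(estimate_bandwidth(X, quantile=0.5))
```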

######################################################################################################################

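OPTICS (Ordering Points To Identify the Clustering Structure), which estimates the clustering structure from a vector array: class sklearn.cluster.OPTICS([min_samples=5,max_eps=inf,metric='minkowski',p=2,metric_params=None,cluster_method='xi',eps=None,xi=0.05,predecessor_correction=True,min_cluster_size=None,algorithm='auto',leaf_size=30,n_jobs=None])
  #Parameters: other parameters are the same as in class cluster.DBSCAN()
	min_samples: as in DBSCAN, but may also be given as a fraction of the number of samples; int>1/0<=float<=1
	max_eps: maximum distance between two samples for one to be considered a neighbor of the other; float
	  #Samples farther apart than this are not neighbors; the default inf finds clusters at all scales, while smaller values shorten run time
	  #Plays the same role as eps, but eps is only used when cluster_method="dbscan"
	cluster_method: method used to extract clusters; "xi"/"dbscan"
	xi: minimum steepness on the reachability plot that constitutes a cluster boundary; 0<=float<=1
	predecessor_correction: whether to correct clusters using the predecessors computed by OPTICS; bool
	  #Only used when cluster_method="xi"
	min_cluster_size: minimum number of samples in a cluster, as an absolute number or a fraction of the number of samples; int>1/0<=float<=1
	  #If None, min_samples is used; only used when cluster_method="xi"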
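A minimal sketch, assuming NumPy and a recent scikit-learn, with toy points from the scikit-learn docstring. After one fit, `sklearn.cluster.cluster_optics_dbscan` re-extracts DBSCAN-style clusters at a chosen eps from the stored ordering, without recomputing neighbors.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

# Toy data (illustrative only)
X = np.array([[1, 2], [2, 5], [3, 6],
              [8, 7], [8, 8], [7, 3]])

opt = OPTICS(min_samples=2).fit(X)        # cluster_method="xi" by default
print(opt.labels_)
print(opt.reachability_[opt.ordering_])   # values of the reachability plot

# DBSCAN-equivalent extraction at eps=2 from the same fitted ordering
labels_eps2 = cluster_optics_dbscan(reachability=opt.reachability_,
                                    core_distances=opt.core_distances_,
                                    ordering=opt.ordering_, eps=2.0)
print(labels_eps2)  # points 0 and 5 have no neighbor within 2, so they are noise
```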

######################################################################################################################

Spectral Clustering: class cluster.SpectralClustering([n_clusters=8,eigen_solver=None,n_components=None,random_state=None,n_init=10,gamma=1.0,affinity='rbf',n_neighbors=10,eigen_tol=0.0,assign_labels='kmeans',degree=3,coef0=1,kernel_params=None,n_jobs=None,verbose=False])
  #i.e. applies clustering to a projection of the normalized Laplacian
  #Parameters: other parameters are the same as in class cluster.KMeans()
	n_clusters: dimension of the projection subspace (i.e. the number of clusters); int
	eigen_solver: eigenvalue decomposition strategy; "arpack"/"lobpcg"/"amg"/None
	n_components: number of eigenvectors used for the spectral embedding; int
	  #None means n_clusters
	gamma: kernel coefficient; float
	  #Used by the rbf/poly/sigmoid/laplacian/chi2 kernels; ignored when affinity="nearest_neighbors"
	affinity: how to construct the affinity matrix; "nearest_neighbors"/"rbf"/"precomputed"/"precomputed_nearest_neighbors"/kernel name supported by sklearn.metrics.pairwise_kernels/callable
	n_neighbors: number of neighbors used when constructing the affinity matrix with the nearest-neighbors method; int
	  #Ignored when affinity="rbf"
	eigen_tol: stopping criterion for the eigendecomposition of the Laplacian matrix; float
	  #Only used when eigen_solver="arpack"
	assign_labels: strategy for assigning labels in the embedding space; "kmeans"/"discretize"
	degree: degree of the polynomial kernel; float
	  #Ignored by other kernels
	coef0: zero coefficient (coef0) of the kernel; float
	  #Only used by the polynomial and sigmoid kernels
	kernel_params: additional parameters passed to the kernel when it is given as a callable; mapping of str to any
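A minimal sketch, assuming NumPy and a recent scikit-learn, on synthetic concentric rings from `sklearn.datasets.make_circles`. The rings cannot be separated by distance to a centroid, but a nearest-neighbor affinity graph keeps each ring connected to itself. scikit-learn may warn that the graph is not fully connected; here that is the intended situation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Two concentric rings (synthetic): inner radius 0.3, outer radius 1
X, y = make_circles(n_samples=300, factor=0.3, noise=0.03, random_state=0)

# Affinity matrix from a 10-nearest-neighbor graph instead of the default rbf kernel
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors", n_neighbors=10,
                        assign_labels="kmeans", random_state=0).fit(X)

# Cluster ids are arbitrary, so compare with the true ring membership up to relabeling
agreement = max(np.mean(sc.labels_ == y), np.mean(sc.labels_ != y))
print(agreement)
```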

######################################################################################################################

Spectral Biclustering: class sklearn.cluster.SpectralBiclustering([n_clusters=3,method='bistochastic',n_components=6,n_best=3,svd_method='randomized',n_svd_vecs=None,mini_batch=False,init='k-means++',n_init=10,n_jobs='deprecated',random_state=None])
  #Parameters: other parameters are the same as in class cluster.KMeans()
	n_clusters: number of row and column clusters in the checkerboard structure; int/tuple of the form (n_row_clusters,n_column_clusters)
	method: method for normalizing and converting singular vectors into biclusters; "bistochastic"/"scale"/"log"
	  #The original authors recommend "log", but it does not work with sparse data
	n_components: number of singular vectors to check; int
	n_best: number of best singular vectors onto which the data is projected for clustering; int
	svd_method: algorithm used to find the singular vectors; "randomized"/"arpack"
	n_svd_vecs: number of vectors used to compute the SVD; int
	mini_batch: whether to use mini-batch K-means, which is faster but may give different results; bool
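A minimal sketch, assuming a recent scikit-learn. The matrix is synthetic, generated by `sklearn.datasets.make_checkerboard` with a planted 3×2 block structure, and `sklearn.metrics.consensus_score` measures how well the fitted biclusters match it. Because all values are positive, method="log" is usable here.

```python
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard
from sklearn.metrics import consensus_score

# Synthetic 30x20 matrix: 3 row clusters x 2 column clusters of constant blocks plus noise
data, rows, cols = make_checkerboard(shape=(30, 20), n_clusters=(3, 2), noise=0.5,
                                     shuffle=False, random_state=0)

model = SpectralBiclustering(n_clusters=(3, 2), method="log", random_state=0).fit(data)
print(model.row_labels_)      # row-cluster index of each row
print(model.column_labels_)   # column-cluster index of each column

# 1.0 would mean the planted checkerboard was recovered exactly
print(consensus_score(model.biclusters_, (rows, cols)))
```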

######################################################################################################################

Spectral Co-Clustering algorithm: class sklearn.cluster.SpectralCoclustering([n_clusters=3,svd_method='randomized',n_svd_vecs=None,mini_batch=False,init='k-means++',n_init=10,n_jobs='deprecated',random_state=None])
  #Parameters: other parameters are the same as in class sklearn.cluster.SpectralBiclustering()
	n_clusters: number of biclusters to find; int
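A minimal sketch, assuming a recent scikit-learn, on a synthetic matrix from `sklearn.datasets.make_biclusters` with three planted biclusters. Unlike the checkerboard case, each row and each column belongs to exactly one bicluster, so a single int is passed for n_clusters.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# Synthetic 30x30 matrix with 3 planted biclusters on a zero background plus noise
data, rows, cols = make_biclusters(shape=(30, 30), n_clusters=3, noise=0.5,
                                   shuffle=False, random_state=0)

model = SpectralCoclustering(n_clusters=3, random_state=0).fit(data)
print(model.row_labels_)            # bicluster index of each row
print(model.column_labels_)         # bicluster index of each column
print(model.biclusters_[0].shape)   # boolean row indicators: (n_clusters, n_rows)

# Agreement with the planted biclusters (1.0 = exact recovery)
print(consensus_score(model.biclusters_, (rows, cols)))
```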
