Few points after clustering in Python: getting point IDs with Python after clustering [duplicate]

Latest recommended article published on 2022-10-21 11:00:25

唐杉


Article tags: few points after clustering in Python

Article link: https://blog.csdn.net/weixin_36238568/article/details/113987729


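This code implements K-means clustering, which assigns each data point to its nearest cluster center. If K is not supplied, it is estimated from the dataset size; K data points are then chosen at random as the initial cluster centers. The algorithm then iterates, reassigning every point to its nearest center and recomputing the centers, until the centers no longer change. It returns the coordinates of each cluster center.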
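As a worked example of the estimation step: the code uses the rule of thumb k ≈ sqrt(n/2) for n data points, with a floor of one cluster. A minimal sketch follows; the name `estimate_k` is hypothetical and used only for illustration.

```python
from math import sqrt


def estimate_k(n):
    """Rule-of-thumb cluster count: int(sqrt(n / 2)), but never less than 1."""
    return max(1, int(sqrt(n / 2)))


# A few dataset sizes and the number of clusters the rule suggests
for n in (1, 8, 50, 200):
    print(n, estimate_k(n))
```

For 50 points this gives 5 clusters. The rule is only a rough starting point, so passing an explicit `k` is preferable when the expected number of clusters is known.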


from math import sqrt
from random import shuffle


def k_means(data_pts, k=None):
    """ Return k (x,y) pairs where:
        k = number of clusters
    and each
        (x,y) pair = centroid of cluster

    data_pts should be a list of (x,y) tuples, e.g.,
        data_pts=[ (0,0), (0,5), (1,3) ]
    """

    # Helper functions
    def lists_are_same(la, lb):
        # see if two collections have the same elements (checked in both directions)
        return set(la) == set(lb)

    def distance(a, b):
        # Euclidean distance between (x,y) points a and b
        return sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

    def average(a):
        # return the average of a one-dimensional list (e.g., [1, 2, 3])
        return sum(a) / float(len(a))

    # Set up some initial values
    if k is None:
        # if the user didn't supply a number of means to look for,
        # try to estimate how many there are
        n = len(data_pts)  # number of points in the dataset
        k = int(sqrt(n / 2))  # number of clusters - see
        # http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set#Rule_of_thumb
        if k < 1:  # make sure there's at least one cluster
            k = 1

    # Pick k random data points as the initial cluster centers.
    init_clusters = data_pts[:]  # copy, so the caller's list is not reordered
    shuffle(init_clusters)  # put the data points in random order
    init_clusters = init_clusters[0:k]  # only keep the first k random points

    old_clusters, new_clusters = {}, {}
    for item in init_clusters:
        # every center has a list of associated points; initially it is empty
        old_clusters[item] = []

    while True:  # keep going until the break condition is met
        # fresh dictionary with the same centers as old_clusters;
        # the loop variable is not called k, so the argument k is not overwritten
        tmp = {}
        for center in old_clusters:
            tmp[center] = []

        # Associate each point with the closest cluster center.
        for point in data_pts:  # for each (x,y) data point
            min_clust = None
            min_dist = float("inf")  # larger than any real distance
            for pc in tmp:  # for every possible closest cluster
                pc_dist = distance(point, pc)
                if pc_dist < min_dist:
                    min_dist = pc_dist
                    min_clust = pc
            tmp[min_clust].append(point)  # add the point to its closest center's list

        # Recompute the new cluster centers.
        for center in tmp:
            associated = tmp[center]
            if not associated:
                # an empty cluster has no average; keep its old center
                new_clusters[center] = associated
                continue
            xs = [pt[0] for pt in associated]  # build up a list of x's
            ys = [pt[1] for pt in associated]  # build up a list of y's
            x = average(xs)  # x coordinate of new center
            y = average(ys)  # y coordinate of new center
            new_clusters[(x, y)] = associated  # the points this center was built from

        if lists_are_same(old_clusters.keys(), new_clusters.keys()):
            # equilibrium reached: return the centers
            return list(old_clusters.keys())
        else:
            # otherwise go another round with the new centers
            old_clusters = new_clusters
            new_clusters = {}
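The title asks for the IDs of the points in each cluster, but `k_means` returns only the centroids. One way to recover the membership is to assign every point to its nearest returned centroid once more. The sketch below assumes that a point's ID is simply its index in `data_pts`. The helper `assign_point_ids` is hypothetical and not part of the original code.

```python
from math import hypot


def assign_point_ids(data_pts, centroids):
    """Map each centroid to the indices (IDs) of the points closest to it.

    Hypothetical helper: a point's ID is assumed to be its index in data_pts.
    """
    centroids = list(centroids)
    ids_by_centroid = {c: [] for c in centroids}
    for point_id, (x, y) in enumerate(data_pts):
        # pick the centroid with the smallest Euclidean distance to this point
        nearest = min(centroids, key=lambda c: hypot(x - c[0], y - c[1]))
        ids_by_centroid[nearest].append(point_id)
    return ids_by_centroid


data_pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = [(0.0, 0.5), (10.0, 10.5)]  # e.g. the centroids returned by k_means
ids = assign_point_ids(data_pts, centroids)
print(ids)  # {(0.0, 0.5): [0, 1], (10.0, 10.5): [2, 3]}
```

If the data carries its own identifiers, the same loop can append those instead of the list index. A cluster that ends up with very few IDs usually means that its centroid started near an outlier or that `k` is too large for the data.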