Few points after clustering in Python_Getting point IDs after clustering, using Python [duplicate]

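This code implements a K-means clustering algorithm that assigns each data point to its nearest cluster center. Unless K is supplied, it first estimates K from the size of the dataset, then picks K random data points as the initial cluster centers. It then iterates, reassigning every point to its nearest center and recomputing the centers, until the set of centers stops changing. The algorithm returns the center coordinates of each cluster.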
Summary generated by CSDN using intelligent technology.
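As a small illustration of the K estimate mentioned in the summary, the listing in this post uses the rule of thumb k ≈ sqrt(n/2), clamped to at least 1. The sketch below isolates that step; `estimate_k` is a hypothetical helper name used here for illustration and does not appear in the original code.

```python
from math import sqrt


def estimate_k(n_points):
    """Rule-of-thumb cluster count: k ~ sqrt(n / 2), never less than 1.

    Hypothetical helper, shown only to illustrate the estimate.
    """
    return max(1, int(sqrt(n_points / 2)))


print(estimate_k(200))  # 200 points -> 10 clusters
print(estimate_k(1))    # tiny datasets still get one cluster
```

This is only a starting guess. For real data it is usually worth trying several values of k and comparing the results.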

from math import sqrt
from random import shuffle  # needed for shuffle() below; missing from the original listing


def k_means(data_pts, k=None):
    """ Return k (x,y) pairs where:
            k = number of clusters
        and each
            (x,y) pair = centroid of cluster

        data_pts should be a list of (x,y) tuples, e.g.,
            data_pts=[ (0,0), (0,5), (1,3) ]
    """

    """ Helper functions """
    def lists_are_same(la, lb):  # see if two lists have the same elements
        # compare as sets so the check works in both directions
        return set(la) == set(lb)

    def distance(a, b):  # distance between (x,y) points a and b
        return sqrt(abs(a[0] - b[0])**2 + abs(a[1] - b[1])**2)

    def average(a):  # return the average of a one-dimensional list (e.g., [1, 2, 3])
        return sum(a) / float(len(a))

    """ Set up some initial values """
    if k is None:  # if the user didn't supply a number of means to look for, try to estimate how many there are
        n = len(data_pts)  # number of points in the dataset
        k = int(sqrt(n / 2))  # number of clusters - see
        # http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set#Rule_of_thumb
        if k < 1:  # make sure there's at least one cluster
            k = 1

    """ Randomly generate k clusters and determine the cluster centers,
        or directly generate k random points as cluster centers. """
    init_clusters = data_pts[:]  # put all of the data points into clusters
    shuffle(init_clusters)  # put the data points in random order
    init_clusters = init_clusters[0:k]  # only keep the first k random clusters

    old_clusters, new_clusters = {}, {}
    for item in init_clusters:
        old_clusters[item] = []  # every cluster has a list of points associated with it. Initially, it's empty

    while 1:  # just keep going forever, until our break condition is met
        tmp = {}
        for center in old_clusters:  # create an editable version of the old_clusters dictionary
            tmp[center] = []

        """ Associate each point with the closest cluster center. """
        for point in data_pts:  # for each (x,y) data point
            min_clust = None
            min_dist = float('inf')  # larger than any real distance
            for pc in tmp:  # for every possible closest cluster
                pc_dist = distance(point, pc)
                if pc_dist < min_dist:
                    min_dist = pc_dist
                    min_clust = pc
            tmp[min_clust].append(point)  # add each point to its closest cluster's list of associated points

        """ Recompute the new cluster centers. """
        for center in tmp:
            associated = tmp[center]
            if not associated:  # no points assigned: keep the old center instead of dividing by zero
                new_clusters[center] = associated
                continue
            xs = [pt[0] for pt in associated]  # build up a list of x's
            ys = [pt[1] for pt in associated]  # build up a list of y's
            x = average(xs)  # x coordinate of new cluster
            y = average(ys)  # y coordinate of new cluster
            new_clusters[(x, y)] = associated  # these are the points the center was built off of, they're *probably* still associated

        if lists_are_same(old_clusters.keys(), new_clusters.keys()):  # if we've reached equilibrium, return the points
            return list(old_clusters.keys())
        else:  # otherwise, we'll go another round. let old_clusters = new_clusters, and clear new_clusters.
            old_clusters = new_clusters
            new_clusters = {}
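The `k_means` function above returns only the centroid coordinates, so the point IDs asked about in the title are not directly available from it. One possible way to recover them, sketched here under the assumption that a point's ID is simply its index in `data_pts`, is to assign each index to its nearest centroid afterwards. `assign_point_ids` is a hypothetical helper, not part of the original code, and the centroids in the example are hard-coded rather than produced by `k_means`.

```python
from math import hypot


def assign_point_ids(data_pts, centroids):
    """Map each centroid index to the indices (IDs) of the points nearest to it.

    Hypothetical helper: assumes a point's ID is its position in data_pts.
    """
    members = {ci: [] for ci in range(len(centroids))}
    for pid, (x, y) in enumerate(data_pts):
        # index of the centroid with the smallest Euclidean distance to this point
        nearest = min(
            range(len(centroids)),
            key=lambda ci: hypot(x - centroids[ci][0], y - centroids[ci][1]),
        )
        members[nearest].append(pid)
    return members


data_pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = [(0.0, 0.5), (10.0, 10.5)]  # example values, chosen by hand
print(assign_point_ids(data_pts, centroids))  # {0: [0, 1], 1: [2, 3]}
```

Ties go to the centroid with the lower index, because `min` returns the first minimum it finds.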
