Clustering methods in the scipy cluster library (with Python code)

Latest recommended article published 2022-07-21 09:52:09

gao_vip



Column: Machine Learning | Tags: scipy, python, clustering, k-means algorithm

Article link: https://blog.csdn.net/weixin_41233157/article/details/103459643


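The basic idea of clustering:

Given a large dataset without labels, clustering splits it into several groups according to the intrinsic similarity of the data, so that points within a group are highly similar and points in different groups are not. Taking k-means as the example, the main steps are: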

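Choose the data
Initialize the cluster centers by randomly picking k points
Assign each data point to the class of its nearest center
Update each class's center
Reassign each data point to the class of its nearest center
Repeat the previous two steps until the centers no longer change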
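The steps above can be sketched directly in NumPy. This is a minimal illustration of plain k-means (Lloyd's algorithm), assuming Euclidean distance and a fixed random seed; the function names kmeans_sketch and _assign and the sample points are hypothetical. scipy's own kmeans, used later in the post, differs in details such as running several random restarts and stopping on a change in distortion.

```python
import numpy as np


def _assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Hypothetical helper: plain k-means following the listed steps."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points at random as the initial centers
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Steps 3/5: assign every point to its nearest center
        labels = _assign(points, centroids)
        # Step 4: move each center to the mean of its points
        # (keep the old center if a cluster ends up empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 6: stop once the centers no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, _assign(points, centroids)


# Illustrative data: two obvious groups of three points each
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans_sketch(pts, k=2)
print(labels)
```

Recomputing the labels from the final centers keeps the returned labels consistent with the returned centroids even if max_iter is reached before convergence.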

Similarity is mainly measured with one of the following distances:

Minkowski distance
Euclidean distance
Chebyshev distance
Jaccard distance
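All four distances are available in scipy.spatial.distance. The short sketch below computes each one on small hand-picked vectors (the vectors are illustrative only) and gives the formulas in the comments. Euclidean distance is Minkowski distance with p=2, and Chebyshev distance is the limit of Minkowski distance as p grows without bound.

```python
import numpy as np
from scipy.spatial.distance import chebyshev, euclidean, jaccard, minkowski

u = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])

# Minkowski: (sum_i |u_i - v_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean
d_mink1 = minkowski(u, v, p=1)
d_mink2 = minkowski(u, v, p=2)
# Euclidean: sqrt(sum_i (u_i - v_i)^2)
d_eucl = euclidean(u, v)
# Chebyshev: max_i |u_i - v_i|
d_cheb = chebyshev(u, v)

# Jaccard distance on boolean vectors: 1 - |A and B| / |A or B|
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
d_jac = jaccard(a, b)  # 2 of the 3 positions in the union disagree -> 2/3

print(d_mink1, d_mink2, d_eucl, d_cheb, d_jac)
```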


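scipy.cluster is scipy's clustering package. It has two modules: scipy.cluster.vq (vector quantization and k-means clustering) and scipy.cluster.hierarchy (hierarchical, i.e. agglomerative, clustering). To turn a hierarchical clustering into flat cluster labels, scipy.cluster.hierarchy provides fcluster:
scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)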
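Here is a minimal sketch of how criterion changes the meaning of t. It uses six illustrative 1-D points that form two obvious groups. With 'maxclust', t is the maximum number of flat clusters. With 'distance', t is a threshold on the cophenetic distance. With the default 'inconsistent', t is a threshold on the inconsistency coefficient computed over depth levels of the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Illustrative data: two tight groups far apart
X = np.array([[0.0], [0.2], [0.4], [10.0], [10.2], [10.4]])
Z = linkage(X, method='average')  # linkage matrix of shape (n - 1, 4)

# t = maximum number of flat clusters
labels_max = fcluster(Z, t=2, criterion='maxclust')
# t = cophenetic-distance cut-off for the tree
labels_dist = fcluster(Z, t=1.0, criterion='distance')
# default criterion: t is compared with the inconsistency coefficient (depth=2)
labels_incon = fcluster(Z, t=1.0)

print(labels_max, labels_dist, labels_incon)
```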

Code example

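import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from scipy.cluster.vq import vq, kmeans, whiten
from scipy.spatial.distance import pdist

# 1. Hierarchical clustering
def hierarchy_cal(points):
    dismatt = pdist(points, 'euclidean')          # condensed Euclidean distance matrix
    zz = sch.linkage(dismatt, method='average')   # average-linkage agglomerative clustering
    sch.dendrogram(zz)                            # draw the dendrogram
    plt.savefig('cluster_result.png')             # and save it to a file
    cluster_re = sch.fcluster(zz, t=1, criterion='inconsistent')  # flat labels 1..k
    print('hierarchy result:\n', cluster_re)
    return cluster_re

# 2. k-means clustering
def kmean_cal(points):
    data = whiten(points)                         # scale each feature to unit variance
    cluster_re = hierarchy_cal(points)
    # The number of clusters can be taken from the hierarchical result (labels run
    # 1..k, so max() is k) or set by hand. kmeans returns (centroids, distortion):
    # [0] is the array of cluster centers, [1] is the distortion.
    centroid = kmeans(data, max(cluster_re))[0]
    label = vq(data, centroid)[0]                 # index of the nearest center for each point
    print('k-means result:\n', label)
    return label

if __name__ == '__main__':
    # Example input (not from the original post): 30 random 2-D points
    points = np.random.randn(30, 2)
    kmean_cal(points)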
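The comment in kmean_cal notes that the number of clusters can also be set by hand. Below is a minimal sketch that calls whiten, kmeans and vq directly with k=2 on synthetic two-group data. The data and the choice k=2 are assumptions for illustration. Because kmeans picks random initial centers, the label numbering can change between runs, but the grouping should not.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

rng = np.random.default_rng(42)
# Synthetic data: two well-separated groups of 20 points each
blob_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=8.0, scale=0.5, size=(20, 2))
points = np.vstack([blob_a, blob_b])

data = whiten(points)                     # divide each column by its standard deviation
centroids, distortion = kmeans(data, 2)   # k chosen by hand instead of via fcluster
labels, dists = vq(data, centroids)       # nearest-center index and distance per point

print(distortion)  # mean distance from each point to its nearest center
print(labels)
```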

Clustering results
[Figure: clustering results]
