Hierarchical Clustering in Python Using Dendrograms and Cophenetic Correlation


Introduction

In this article, we look at an alternative to K Means clustering known as hierarchical clustering. Hierarchical clustering differs from K Means or K Modes in its underlying algorithm. K Means forms clusters by assigning points to the nearest centroid, typically using Euclidean distance, whereas hierarchical clustering builds clusters with agglomerative (bottom-up) or divisive (top-down) techniques. The resulting hierarchy can be visualized as a dendrogram, which helps in interpreting results through meaningful taxonomies. Creating a dendrogram does not require us to specify the number of clusters upfront.
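To make the last point concrete, here is a minimal sketch that assumes SciPy and NumPy are installed. The six sample points are made up for illustration. The hierarchy is built once with `scipy.cluster.hierarchy.linkage`, and the number of clusters is chosen only afterwards, when the tree is cut with `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical toy data: two loose groups of three points each.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [3.0, 0.0],
    [10.0, 0.0], [10.0, 2.0], [14.0, 0.0],
])

# Build the full merge hierarchy once (agglomerative, average linkage,
# Euclidean distance by default). No cluster count is given here.
Z = linkage(X, method="average")

# Cut the same tree at different levels after the fact.
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")

# no_plot=True returns the dendrogram layout without drawing it;
# "ivl" holds the leaf labels in left-to-right order.
leaf_order = dendrogram(Z, no_plot=True)["ivl"]

print(labels_2)
print(labels_3)
print(leaf_order)
```

Dropping `no_plot=True` draws the dendrogram with matplotlib, if it is available.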


R, Python, and SAS all provide hierarchical clustering implementations that can work with categorical data, which makes the technique a practical choice for problems that involve categorical variables.


Important Terms in Hierarchical Clustering

Linkage Methods

Suppose cluster (a) contains |a| original observations a[0],…,a[|a|−1] and cluster (b) contains |b| original observations b[0],…,b[|b|−1]. To decide whether to combine these clusters, we need the distance between clusters (a) and (b). If a point (d) has not yet been assigned to any cluster, we also need the distance from (d) to cluster (a) and from (d) to cluster (b).
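As a rough illustration of this setup, the sketch below lists every member-to-member distance between two small clusters and an unassigned point. The coordinates and the helper `pairwise_distances` are hypothetical, chosen only so the numbers are easy to check by hand. It assumes Euclidean distance and Python 3.8+ for `math.dist`.

```python
from math import dist

# Hypothetical clusters (a) and (b), plus an unassigned point (d).
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
point_d = (0.0, 5.0)


def pairwise_distances(xs, ys):
    """Return a len(xs) x len(ys) table of Euclidean distances."""
    return [[dist(x, y) for y in ys] for x in xs]


# |a| x |b| distances between members of the two clusters.
a_to_b = pairwise_distances(cluster_a, cluster_b)

# Treat (d) as a cluster of size one to reuse the same helper.
a_to_d = pairwise_distances(cluster_a, [point_d])
b_to_d = pairwise_distances(cluster_b, [point_d])

print(a_to_b)
print(a_to_d)
print(b_to_d)
```

Each table holds several candidate distances rather than one, so some rule is needed to reduce it to a single cluster-to-cluster number.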


Because a cluster usually contains several points, a single point-to-point distance is not enough to fill in the distance matrix. The linkage method decides how the distance between two clusters, or between a point and a cluster, is computed. Commonly used linkage methods are outlined below:


  1. Single Linkage: The distance between two clusters is the distance between their most similar (closest) members. This is calculated for each pair of clusters, and the pair with the shortest distance is merged.
  2. Average Linkage: The distance from every member of one cluster to every member of the other cluster is calculated. The average of these distances is used as the cluster distance, and the pair of clusters with the smallest average is merged.
  3. Complete Linkage: The distance between two clusters is the distance between their most dissimilar (farthest) members. This is calculated for each pair of clusters, and the pair with the shortest distance is merged.
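The three rules can be sketched in a few lines of plain Python. This is an illustrative, unoptimized version and not how any particular library implements linkage. The function names `linkage_distance` and `closest_pair` and the sample clusters are hypothetical, and Euclidean distance is assumed.

```python
from itertools import combinations
from math import dist
from statistics import mean


def linkage_distance(a, b, method):
    """Distance between clusters a and b under the given linkage rule."""
    distances = [dist(p, q) for p in a for q in b]
    if method == "single":      # closest pair of members
        return min(distances)
    if method == "complete":    # farthest pair of members
        return max(distances)
    if method == "average":     # mean over all member pairs
        return mean(distances)
    raise ValueError(f"unknown linkage method: {method}")


def closest_pair(clusters, method):
    """Indices (i, j) of the two clusters that would be merged next."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
    )


# Hypothetical clusters: one wide cluster and two single points.
clusters = [
    [(0.0, 0.0), (4.0, 0.0)],
    [(5.0, 0.0)],
    [(2.0, 3.0)],
]

for method in ("single", "average", "complete"):
    print(method, closest_pair(clusters, method))
```

On this data, single and average linkage merge clusters 0 and 1, while complete linkage merges clusters 0 and 2, because the far member (0, 0) makes cluster 0 look distant from cluster 1. The choice of linkage can therefore change the shape of the resulting hierarchy.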