Python Hierarchical Clustering Algorithm: Machine Learning - Hierarchical Clustering (Hierarchy Cluster)

weixin_39756540

Article link: https://blog.csdn.net/weixin_39756540/article/details/111525776

Section I: Brief Introduction to Hierarchical Clustering

The two standard algorithms for agglomerative hierarchical clustering are single linkage and complete linkage. With single linkage, we compute the distance between the most similar members of each pair of clusters and merge the two clusters for which that distance is smallest. Complete linkage follows the same approach, but instead of comparing the most similar members of each pair of clusters, it compares the most dissimilar members to decide which clusters to merge.
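To make the difference concrete, here is a minimal sketch that computes both distances for a pair of clusters. The helper name `linkage_distances` and the two small 2-D clusters are assumptions made up for illustration; they do not come from the book.

```python
import numpy as np
from scipy.spatial.distance import cdist


def linkage_distances(cluster_a, cluster_b):
    """Return (single, complete) linkage distances between two clusters."""
    # All pairwise Euclidean distances between members of the two clusters
    pairwise = cdist(cluster_a, cluster_b, metric="euclidean")
    # Single linkage uses the closest pair, complete linkage the farthest pair
    return pairwise.min(), pairwise.max()


# Two hypothetical clusters lying on the x-axis (illustrative data only)
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [6.0, 0.0]])

single, complete = linkage_distances(cluster_a, cluster_b)
print("single linkage distance:  ", single)    # closest pair: (1, 0) and (4, 0)
print("complete linkage distance:", complete)  # farthest pair: (0, 0) and (6, 0)
```

Both criteria then merge the pair of clusters with the smallest such value; they differ only in which member pair defines the distance between two clusters.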

Hierarchical complete linkage clustering is an iterative procedure that can be summarized by the following steps:

Step 1: Compute the distance matrix of all samples (Euclidean Distance)

Step 2: Represent each data point as a singleton cluster

Step 3: Merge the two closest clusters based on the distance between their most dissimilar (distant) members (single linkage would use the most similar members instead)

Step 4: Update the distance matrix

Step 5: Repeat steps 3-4 until one single cluster remains

Source:

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, Second Edition (Chinese edition). Nanjing: Southeast University Press, 2018.
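The iterative procedure above can be sketched in plain NumPy. This is a naive illustration, not the book's code and not how SciPy implements it: the function name `complete_linkage` is an assumption, and instead of updating the distance matrix in place it recomputes cluster-to-cluster distances from the sample-level matrix on every pass. It uses the same seed and data shape as the rest of this post.

```python
import numpy as np


def complete_linkage(X):
    """Naive agglomerative clustering with complete linkage.

    Returns a list of (members_a, members_b, distance) in merge order.
    """
    n = len(X)
    # Step 1: pairwise Euclidean distance matrix of all samples
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Step 2: every sample starts as its own singleton cluster
    clusters = [[i] for i in range(n)]
    merges = []
    # Step 5: keep merging until a single cluster remains
    while len(clusters) > 1:
        best = None
        # Step 3: find the pair with the smallest complete-linkage distance,
        # i.e. the smallest "distance between the most distant members"
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        # Step 4 (simplified): replace the two clusters by their union;
        # distances to the new cluster are recomputed on the next pass
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


np.random.seed(123)
X = np.random.random_sample([5, 3]) * 10
merges = complete_linkage(X)
for members_a, members_b, d in merges:
    print(members_a, members_b, round(d, 6))
```

The merge distances printed here should agree with the Distance column that scipy's linkage function reports for the same data later in this post. The triple loop is far slower than SciPy on larger inputs, so it is only suitable for small demonstrations.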

Part 1: Data Initialization

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

# Suppress library warnings to keep the printed output clean
warnings.filterwarnings("ignore")

# Plot settings: figure resolution and font weight
plt.rcParams['figure.dpi'] = 200
plt.rcParams['savefig.dpi'] = 200
font = {'weight': 'light'}
plt.rc("font", **font)

# Fix the seed so the random data is reproducible
np.random.seed(123)

# Section 1: Generate random data (5 samples, 3 features, values in [0, 10))
variables = ['X', 'Y', 'Z']
labels = ['ID_1', 'ID_2', 'ID_3', 'ID_4', 'ID_5']
X = np.random.random_sample([5, 3]) * 10
df = pd.DataFrame(X, columns=variables, index=labels)
print("Original DataFrame:\n", df)

Output

Original DataFrame:
             X         Y         Z
ID_1  6.964692  2.861393  2.268515
ID_2  5.513148  7.194690  4.231065
ID_3  9.807642  6.848297  4.809319
ID_4  3.921175  3.431780  7.290497
ID_5  4.385722  0.596779  3.980443

Part 2: Euclidean Distance Computation

Method 1: Using the pdist and squareform functions from the scipy package

Code

# Section 2: Perform hierarchical clustering on a distance matrix
# Section 2.1: Via the pdist and squareform functions
from scipy.spatial.distance import pdist, squareform

# pdist returns the condensed pairwise distances;
# squareform expands them into a symmetric 5x5 matrix
row_dist = pd.DataFrame(squareform(pdist(df, metric='euclidean')),
                        columns=labels, index=labels)
print("\nData Distance via pdist and squareform: \n", row_dist)

Output

Data Distance via pdist and squareform:
          ID_1      ID_2      ID_3      ID_4      ID_5
ID_1  0.000000  4.973534  5.516653  5.899885  3.835396
ID_2  4.973534  0.000000  4.347073  5.104311  6.698233
ID_3  5.516653  4.347073  0.000000  7.244262  8.316594
ID_4  5.899885  5.104311  7.244262  0.000000  4.382864
ID_5  3.835396  6.698233  8.316594  4.382864  0.000000
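One point worth checking before moving on is the difference between the condensed vector returned by pdist and the square matrix produced by squareform. The sketch below regenerates the same data with the same seed; the variable names (`condensed`, `square`, `from_samples`, and so on) are chosen here for illustration. As far as I know, linkage accepts either the raw samples or the condensed vector, while a square matrix is interpreted as a set of samples, which gives different merge distances (recent SciPy versions may emit a ClusterWarning in that case).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

np.random.seed(123)
X = np.random.random_sample([5, 3]) * 10

condensed = pdist(X, metric="euclidean")  # shape (10,): upper triangle only
square = squareform(condensed)            # shape (5, 5): symmetric, zero diagonal
print(condensed.shape, square.shape)

# Raw samples and the condensed vector describe the same clustering problem
from_samples = linkage(X, method="complete", metric="euclidean")
from_condensed = linkage(condensed, method="complete")
print(np.allclose(from_samples, from_condensed))

# The square matrix is NOT treated as distances: its rows are read as
# 5 samples with 5 features each, so the merge distances change
from_square = linkage(square, method="complete")
print(np.allclose(from_samples[:, 2], from_square[:, 2]))
```

In short, the square form is convenient for reading distances by label, but it should not be passed to linkage directly.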

Method 2: Using the linkage function

Code

# Section 2.2: Via the linkage function
from scipy.cluster.hierarchy import linkage

# Complete-linkage clustering computed directly from the raw samples
row_cluster = linkage(df.values, method='complete', metric='euclidean')

# Each row of the linkage matrix describes one merge
row_dist_linkage = pd.DataFrame(row_cluster,
                                columns=['Row Label 1', 'Row Label 2', 'Distance', 'Item Number in Cluster'],
                                index=['Cluster %d' % (i + 1) for i in range(row_cluster.shape[0])])
print("\nData Distance via Linkage: \n", row_dist_linkage)

Output

Data Distance via Linkage:
           Row Label 1  Row Label 2  Distance  Item Number in Cluster
Cluster 1          0.0          4.0  3.835396                     2.0
Cluster 2          1.0          2.0  4.347073                     2.0
Cluster 3          3.0          5.0  5.899885                     3.0
Cluster 4          6.0          7.0  8.316594                     5.0
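The row labels in this table are cluster indices rather than sample IDs: indices 0 to 4 are the original samples, and the cluster created by the i-th merge (counting from 0) gets index 5 + i. The following sketch, which regenerates the same data and uses illustrative names such as `Z` and `members`, expands each row into the sample labels it contains.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

np.random.seed(123)
X = np.random.random_sample([5, 3]) * 10
labels = ['ID_1', 'ID_2', 'ID_3', 'ID_4', 'ID_5']

Z = linkage(X, method='complete', metric='euclidean')
n = X.shape[0]

# members[k] lists the original samples contained in cluster index k
members = {i: [labels[i]] for i in range(n)}
for row, (a, b, height, size) in enumerate(Z):
    new_id = n + row  # index of the cluster created by this merge
    members[new_id] = members[int(a)] + members[int(b)]
    # The last column of the linkage matrix is the size of the new cluster
    assert len(members[new_id]) == int(size)
    print(f"merge {row + 1}: {members[int(a)]} + {members[int(b)]} "
          f"at distance {height:.6f} -> cluster {new_id}")
```

Read this way, the third row (3.0, 5.0) means that sample ID_4 joins the cluster formed in the first merge, which contains ID_1 and ID_5.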

References

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, Second Edition (Chinese edition). Nanjing: Southeast University Press, 2018.
