Hierarchical Clustering: Agglomerative and Divisive

https://towardsdatascience.com/hierarchical-clustering-agglomerative-and-divisive-explained-342e6b20d710

Hierarchical clustering is a method of cluster analysis that groups similar data points together by building a hierarchy of clusters. It follows either a top-down or a bottom-up approach.

What is Clustering?

Clustering is an unsupervised machine learning technique that divides the population into several clusters such that data points in the same cluster are similar to each other and data points in different clusters are dissimilar.

  • Points in the same cluster are closer to each other.
  • Points in different clusters are far apart.

(Image by Author), Sample 2-dimensional Dataset

In the above sample 2-dimensional dataset, it is visible that the points form 3 clusters that are far apart, while points in the same cluster are close to each other.
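A toy dataset like this can be generated synthetically. Below is a minimal sketch using scikit-learn's make_blobs; the parameter values are illustrative, not taken from the figure above.

```python
import numpy as np
from sklearn.datasets import make_blobs

# three well-separated Gaussian blobs in 2 dimensions
X, y_true = make_blobs(n_samples=150, centers=3,
                       cluster_std=0.8, random_state=42)
print(X.shape)  # (150, 2)
```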

There are several types of clustering algorithms other than hierarchical clustering, such as k-means clustering, DBSCAN, and many more. Read the article below to understand what k-means clustering is and how to implement it.

Understanding K-means, K-means++, and K-medoids Clustering Algorithms

In this article, we cover hierarchical clustering and its two types.

There are two types of hierarchical clustering methods:

  1. Divisive Clustering
  2. Agglomerative Clustering

Divisive Clustering:

The divisive clustering algorithm is a top-down clustering approach: initially, all the points in the dataset belong to one cluster, and splits are performed recursively as one moves down the hierarchy.

Steps of Divisive Clustering:

  1. Initially, all points in the dataset belong to one single cluster.
  2. Partition the cluster into the two least similar clusters.
  3. Proceed recursively to form new clusters until the desired number of clusters is obtained.

(Image by Author), 1st Image: All the data points belong to one cluster, 2nd Image: One cluster is split off from the previous single cluster, 3rd Image: A further cluster is split off from the previous set of clusters.

In the above sample dataset, it is observed that there are 3 clusters that are far apart from each other, so we stop after obtaining 3 clusters.

Even if we continue splitting to form more clusters, the result obtained is shown below.

(Image by Author), Sample dataset separated into 4 clusters

 

How to choose which cluster to split?

Check the sum of squared errors (SSE) of each cluster and choose the cluster with the largest value. In the 2-dimensional dataset below, the data points are currently separated into 2 clusters; to split further and form a 3rd cluster, find the SSE of the points in the red cluster and in the blue cluster.

(Image by Author), Sample dataset separated into 2 clusters

The cluster with the largest SSE value is split into 2 clusters, forming a new cluster. In the above image, the red cluster has the larger SSE, so it is split into 2 clusters, giving 3 clusters in total.
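As a concrete illustration, here is a minimal sketch of this selection rule, assuming NumPy and representing the clustering as an integer array of labels; the helper name cluster_sse and the toy data are hypothetical.

```python
import numpy as np

def cluster_sse(X, labels):
    """SSE of each cluster: squared distances of its points to their centroid."""
    sse = {}
    for k in np.unique(labels):
        pts = X[labels == k]
        sse[k] = float(((pts - pts.mean(axis=0)) ** 2).sum())
    return sse

# toy example: the more spread-out cluster has the larger SSE
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [9.0, 9.0]])
labels = np.array([0, 0, 1, 1])
sse = cluster_sse(X, labels)
to_split = max(sse, key=sse.get)  # cluster 1: it is split next
```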

How to split the above-chosen cluster?

Once we have decided which cluster to split, the question arises of how to split it into 2 clusters. One way is to use Ward's criterion: choose the split that gives the largest reduction in the SSE criterion.

How to handle the noise or outlier?

The presence of an outlier or noise can result in a new cluster being formed of its own. To handle noise in the dataset, use a threshold in the termination criterion, i.e., do not generate clusters that are too small.
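Putting the three ideas together, here is a minimal divisive-clustering sketch: pick the cluster with the largest SSE, split it with a flat 2-means "subroutine" (a stand-in for an exact Ward-criterion split), and skip clusters that fall below a size threshold. It assumes scikit-learn; the names divisive_clustering, sse_of, and min_size, and all parameter values, are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def sse_of(pts):
    # squared distances of the points to their centroid
    return float(((pts - pts.mean(axis=0)) ** 2).sum())

def divisive_clustering(X, n_clusters, min_size=5):
    labels = np.zeros(len(X), dtype=int)        # start: one single cluster
    while len(np.unique(labels)) < n_clusters:
        # SSE of every current cluster, ignoring clusters too small to split
        sse = {k: sse_of(X[labels == k]) for k in np.unique(labels)
               if (labels == k).sum() >= 2 * min_size}
        if not sse:                             # nothing left worth splitting
            break
        target = max(sse, key=sse.get)          # largest-SSE cluster
        mask = labels == target
        # flat 2-means bisection as the split "subroutine"
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=0).fit_predict(X[mask])
        labels[mask] = np.where(halves == 0, target, labels.max() + 1)
    return labels
```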

Agglomerative Clustering:

Agglomerative clustering is a bottom-up approach: initially, each data point is a cluster of its own, and pairs of clusters are merged as one moves up the hierarchy.

Steps of Agglomerative Clustering:

  1. Initially, each data point is a cluster of its own.
  2. Take the two nearest clusters and join them to form one single cluster.
  3. Repeat step 2 recursively until the desired number of clusters is obtained.

(Image by Author), 1st Image: Every data point is a cluster of its own, 2nd Image: The two nearest clusters (surrounded by a black oval) are joined to form a single cluster.

In the above sample dataset, it is observed that the 2 clusters are far apart from each other, so we stop after obtaining 2 clusters.

(Image by Author), Sample dataset separated into 2 clusters
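SciPy implements exactly this bottom-up procedure: linkage builds the full merge hierarchy and fcluster cuts it at the desired number of clusters. A minimal sketch; the synthetic dataset and the choice of Ward linkage are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=2, random_state=42)

Z = linkage(X, method='ward')                    # full bottom-up merge tree
labels = fcluster(Z, t=2, criterion='maxclust')  # cut into 2 clusters
print(np.bincount(labels)[1:])                   # cluster sizes (labels start at 1)
```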

How to join two clusters to form one cluster?

To obtain the desired number of clusters, the number of clusters needs to be reduced from the initial n clusters (n equals the total number of data points). Which two clusters to combine is decided by computing the similarity between clusters.

There are several methods used to calculate the similarity between two clusters:

  • Distance between the two closest points in the two clusters (single linkage).
  • Distance between the two farthest points in the two clusters (complete linkage).
  • The average distance between all pairs of points in the two clusters (average linkage).
  • Distance between the centroids of the two clusters (centroid linkage).

There are several pros and cons of choosing any of the above similarity metrics.
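In SciPy these four rules correspond directly to the method argument of linkage; a quick sketch on a synthetic dataset:

```python
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=2, random_state=42)

Z_single   = linkage(X, method='single')    # two closest points
Z_complete = linkage(X, method='complete')  # two farthest points
Z_average  = linkage(X, method='average')   # average pairwise distance
Z_centroid = linkage(X, method='centroid')  # distance between centroids
```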

Implementation:

ML | Hierarchical clustering (Agglomerative and Divisive clustering) - GeeksforGeeks
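For a quick end-to-end run, scikit-learn's AgglomerativeClustering wraps the same bottom-up procedure; a minimal sketch with illustrative parameters:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)

model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)
print(np.bincount(labels))  # points in each of the 3 clusters
```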

Performance

Hierarchical agglomerative vs. divisive clustering:

  • Divisive clustering is more complex than agglomerative clustering: in the divisive case we need a flat clustering method as a “subroutine” to split each cluster, until every data point is a singleton cluster of its own.
  • Divisive clustering is more efficient if we do not generate a complete hierarchy all the way down to individual leaves. The time complexity of naive agglomerative clustering is O(n³) because we exhaustively scan the N × N distance matrix dist_mat for the lowest distance in each of the N-1 iterations. Using a priority queue data structure, this complexity can be reduced to O(n² log n), and with further optimizations it can be brought down to O(n²). For divisive clustering, given a fixed number of top levels and an efficient flat algorithm like k-means, the running time is linear in the number of patterns and clusters.
  • A divisive algorithm is also more accurate. Agglomerative clustering makes decisions by considering local patterns or neighboring points without initially taking the global distribution of the data into account, and these early decisions cannot be undone. Divisive clustering, in contrast, takes the global distribution of the data into consideration when making top-level partitioning decisions.

 

Conclusion:

In this article, we have discussed the in-depth intuition behind the agglomerative and divisive hierarchical clustering algorithms. One disadvantage of hierarchical algorithms is that they are not suitable for large datasets because of their large space and time complexities.
