Cluster Analysis

Clustering algorithms based on a distance threshold:

Nearest Neighbor Algorithm: assign the samples {X1, X2, …, XN} to clusters centered on Z1, Z2, … according to a distance threshold T.

details:
1). select an initial cluster center Z1 = X1;
2). calculate the Euclidean distance D21 = ||X2 - Z1|| between X2 and Z1; if D21 > T, create a new cluster center Z2 = X2; otherwise, assign X2 to the existing cluster centered on Z1;
3). for each remaining sample, calculate its distances to all existing cluster centers; assign the sample to the nearest cluster center if that distance is less than T; if all the distances are greater than T, create a new cluster center;
4). repeat until every sample has been classified.
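The steps above can be sketched in Python (a minimal illustration with NumPy; the function and variable names are my own, not from the original text):

```python
import numpy as np

def threshold_clustering(samples, T):
    """Assign each sample to the nearest existing center if it lies
    within distance T; otherwise start a new cluster at that sample."""
    centers = [samples[0]]              # step 1: Z1 = X1
    labels = [0]
    for x in samples[1:]:
        dists = [np.linalg.norm(x - z) for z in centers]
        k = int(np.argmin(dists))
        if dists[k] <= T:
            labels.append(k)            # join the nearest cluster
        else:
            centers.append(x)           # steps 2-3: new cluster center
            labels.append(len(centers) - 1)
    return np.array(centers), labels

pts = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.2, 9.8]])
centers, labels = threshold_clustering(pts, T=2.0)
# two well-separated groups -> two cluster centers
```

Note that the result depends on the order in which samples are presented, which is one concrete way the algorithm's sensitivity to initialization shows up.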

The performance of this algorithm depends heavily on the choice of the initial cluster center and the threshold T.

Max-min-distance algorithm: select cluster centers according to the distance threshold T (take the maximum among the per-sample minimum distances and compare it with T to decide whether a new cluster center should be created), then assign each sample to its nearest center.

details:
1). select an initial cluster center Z1;
2). calculate the distances between Z1 and all remaining samples; select the farthest sample as the second cluster center Z2;
3). for each sample, calculate its distance to every cluster center and keep the minimum; with N samples this yields N nearest distances;
4). if the maximum among these nearest distances exceeds T = theta*||Z1-Z2|| (where theta is between 0 and 1), promote that sample to a new cluster center and return to step 3; otherwise, all cluster centers have been generated.
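A minimal sketch of the max-min-distance procedure (names are my own; theta = 0.5 is just an illustrative choice):

```python
import numpy as np

def maxmin_centers(samples, theta=0.5):
    """Select cluster centers by the max-min-distance rule, then
    assign every sample to its nearest center."""
    centers = [samples[0]]                                   # Z1
    d1 = np.linalg.norm(samples - centers[0], axis=1)
    centers.append(samples[int(np.argmax(d1))])              # Z2: farthest from Z1
    T = theta * np.linalg.norm(centers[1] - centers[0])
    while True:
        # distance from every sample to every center
        dmat = np.array([np.linalg.norm(samples - z, axis=1) for z in centers])
        nearest = dmat.min(axis=0)       # each sample's nearest-center distance
        i = int(np.argmax(nearest))      # the max of those minimum distances
        if nearest[i] > T:
            centers.append(samples[i])   # promote to a new cluster center
        else:
            break
    labels = dmat.argmin(axis=0)         # nearest-principle assignment
    return np.array(centers), labels

pts = np.array([[0., 0.], [1., 0.], [10., 0.], [11., 0.], [5., 8.]])
centers, labels = maxmin_centers(pts, theta=0.5)
# three centers: [0,0], [11,0], and [5,8]
```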

Hierarchical clustering algorithm:

regard every sample as its own cluster, then repeatedly merge clusters according to their distances and the distance threshold T.

details:
1). regard every sample as a cluster and calculate the distances between all pairs of clusters; with N samples in total, this yields an N*N distance matrix G(0);
2). merge the two clusters whose distance is the smallest in G(n), producing a new cluster, and recompute the distance matrix to obtain G(n+1);
3). repeat the merging step until the minimum inter-cluster distance exceeds the threshold T, or all the samples fall into one cluster; the sequence of merges forms the clustering tree (dendrogram).
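The merging loop can be sketched as follows (a brute-force illustration using single-linkage distance, i.e. the closest pair of members; the original text does not fix a particular linkage, so that is an assumption here):

```python
import numpy as np

def agglomerative(samples, T):
    """Start with every sample as its own cluster and repeatedly merge
    the closest pair of clusters until the minimum inter-cluster
    distance exceeds T (or one cluster remains)."""
    clusters = [[i] for i in range(len(samples))]

    def dist(a, b):
        # single-linkage: distance between the closest members
        return min(np.linalg.norm(samples[i] - samples[j])
                   for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(dist(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        d, p, q = min(pairs)             # closest pair of clusters
        if d > T:
            break                        # threshold reached: stop merging
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

pts = np.array([[0., 0.], [1., 0.], [10., 0.], [11., 0.]])
clusters = agglomerative(pts, T=2.0)
# two clusters of sample indices: [0, 1] and [2, 3]
```

Recomputing all pairwise distances each round is O(N^3) overall; this is the G(0), G(1), … sequence from the steps above in its most literal form.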

Dynamic Clustering algorithm:

K-means algorithm: choose cluster centers that minimize a cost function (for example, the sum of squared errors). Each new cluster center equals the mean of its cluster, and the number of clusters K is fixed in advance.

details:
1). select initial cluster centers, calculate the distances between the centers and the samples, and assign each sample to its nearest center;
2). recompute each cluster center as the mean of its cluster;
3). repeat steps 1 and 2 until the cluster centers no longer change.
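A minimal K-means sketch following these steps (random initialization from the samples is my own choice; other schemes such as k-means++ exist):

```python
import numpy as np

def kmeans(samples, k, iters=100, seed=0):
    """Plain K-means: assign by the nearest principle, recompute each
    center as the mean of its cluster, repeat until centers are stable."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        # step 1: distances of every sample to every center, nearest wins
        d = np.linalg.norm(samples[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # step 2: new centers are the per-cluster means
        new = np.array([samples[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):    # step 3: stop when centers stay the same
            break
        centers = new
    return centers, labels

pts = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
centers, labels = kmeans(pts, k=2)
# the two tight pairs end up in separate clusters
```

This sketch does not handle empty clusters; production implementations (e.g. scikit-learn's KMeans) reseed or drop such centers.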

ISODATA: similar to the K-means algorithm, but it can also merge and split clusters, so the final number of clusters is not fixed in advance.

details:
1). select initial values, such as the initial cluster centers and the split/merge thresholds;
2). assign each sample to its nearest center;
3). recompute the cluster centers, merging clusters that are too close and splitting clusters that are too spread out;
4). repeat until the result meets your stopping criteria (for example, a maximum number of iterations or stable centers).
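One simplified assign/split/merge pass can be sketched as below. This is a heavily reduced illustration, not the full ISODATA algorithm; the threshold names theta_s (split when a cluster's per-dimension standard deviation is too large) and theta_c (merge centers that are too close) are my own:

```python
import numpy as np

def isodata_step(samples, centers, theta_s=2.0, theta_c=1.5):
    """One simplified ISODATA pass: nearest-principle assignment,
    split overly spread-out clusters, merge overly close centers."""
    d = np.linalg.norm(samples[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)                        # step 2: assignment
    new_centers = []
    for j in range(len(centers)):                    # step 3a: recompute / split
        members = samples[labels == j]
        if len(members) == 0:
            continue
        mean, std = members.mean(axis=0), members.std(axis=0)
        if std.max() > theta_s:                      # split along the widest axis
            off = np.zeros_like(mean)
            off[int(std.argmax())] = std.max()
            new_centers += [mean + off, mean - off]
        else:
            new_centers.append(mean)
    merged, used = [], set()                         # step 3b: merge close pairs
    for p in range(len(new_centers)):
        if p in used:
            continue
        for q in range(p + 1, len(new_centers)):
            if q not in used and np.linalg.norm(new_centers[p] - new_centers[q]) < theta_c:
                merged.append((new_centers[p] + new_centers[q]) / 2)
                used.update({p, q})
                break
        else:
            merged.append(new_centers[p])
            used.add(p)
    return np.array(merged), labels

samples = np.array([[0., 0.], [8., 0.], [-8., 0.]])
centers = np.array([[0., 0.]])
new_c, labels = isodata_step(samples, centers)
# the single spread-out cluster is split into two centers
```

The full algorithm also uses minimum cluster sizes, a desired number of clusters, and limits on merges per iteration; this sketch only shows where splitting and merging sit relative to the K-means loop.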
