Cluster Analysis

Clustering algorithms based on a distance threshold:

Nearest Neighbor Algorithm: assign the samples {X1, X2, …, XN} to clusters centered on Z1, Z2, … according to a distance threshold T.

details:
1). select an initial cluster center Z1 = X1;
2). calculate the Euclidean distance D21 = ||X2 - Z1|| between X2 and Z1; if D21 > T, create a new cluster center Z2 = X2; otherwise, assign X2 to the existing cluster centered on Z1;
3). for each remaining sample, calculate its distances to all existing cluster centers; assign the sample to the nearest cluster center if that distance is less than T; if all the distances are greater than T, create a new cluster center;
4). repeat until every sample has been classified.
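The steps above can be sketched in Python (a minimal illustration with NumPy; the function and variable names are my own, not from the original text):

```python
import numpy as np

def threshold_clustering(samples, T):
    """Assign each sample to the nearest existing center if it lies
    within distance T; otherwise start a new cluster at that sample."""
    centers = [samples[0]]              # step 1: Z1 = X1
    labels = [0]
    for x in samples[1:]:
        dists = [np.linalg.norm(x - z) for z in centers]
        k = int(np.argmin(dists))
        if dists[k] <= T:
            labels.append(k)            # join the nearest cluster
        else:
            centers.append(x)           # steps 2-3: new cluster center
            labels.append(len(centers) - 1)
    return np.array(centers), labels

pts = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.2, 9.8]])
centers, labels = threshold_clustering(pts, T=2.0)
# two well-separated groups -> two cluster centers
```

Note that the result depends on the order in which samples are presented, which is one concrete way the algorithm's sensitivity to initialization shows up.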

The performance of this algorithm depends heavily on the choice of the initial cluster center and the threshold T.

Max-min-distance algorithm: select cluster centers according to the distance threshold T (take the maximum among the per-sample minimum distances and compare it with T to decide whether a new cluster center should be created), then assign each sample to its nearest center.

details:
1). select an initial cluster center Z1;
2). calculate the distances between Z1 and all remaining samples; select the farthest sample as the second cluster center Z2;
3). for each sample, calculate its distance to every cluster center and keep the minimum; with N samples this yields N nearest distances;
4). if the maximum among these nearest distances exceeds T = theta*||Z1-Z2|| (where theta is between 0 and 1), promote that sample to a new cluster center and return to step 3; otherwise, all cluster centers have been generated.
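A minimal sketch of the max-min-distance procedure (names are my own; theta = 0.5 is just an illustrative choice):

```python
import numpy as np

def maxmin_centers(samples, theta=0.5):
    """Select cluster centers by the max-min-distance rule, then
    assign every sample to its nearest center."""
    centers = [samples[0]]                                   # Z1
    d1 = np.linalg.norm(samples - centers[0], axis=1)
    centers.append(samples[int(np.argmax(d1))])              # Z2: farthest from Z1
    T = theta * np.linalg.norm(centers[1] - centers[0])
    while True:
        # distance from every sample to every center
        dmat = np.array([np.linalg.norm(samples - z, axis=1) for z in centers])
        nearest = dmat.min(axis=0)       # each sample's nearest-center distance
        i = int(np.argmax(nearest))      # the max of those minimum distances
        if nearest[i] > T:
            centers.append(samples[i])   # promote to a new cluster center
        else:
            break
    labels = dmat.argmin(axis=0)         # nearest-principle assignment
    return np.array(centers), labels

pts = np.array([[0., 0.], [1., 0.], [10., 0.], [11., 0.], [5., 8.]])
centers, labels = maxmin_centers(pts, theta=0.5)
# three centers: [0,0], [11,0], and [5,8]
```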

Hierarchical clustering algorithm:

regard every sample as its own cluster, then repeatedly merge clusters according to their distances and the distance threshold T.

details:
1). regard every sample as a cluster and calculate the distances between all pairs of clusters; with N samples in total, this yields an N*N distance matrix G(0);
2). merge the two clusters whose distance is the smallest in G(n), producing a new cluster, and recompute the distance matrix to obtain G(n+1);
3). repeat the merging step until the minimum inter-cluster distance exceeds the threshold T, or all the samples fall into one cluster; the sequence of merges forms the clustering tree (dendrogram).
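The merging loop can be sketched as follows (a brute-force illustration using single-linkage distance, i.e. the closest pair of members; the original text does not fix a particular linkage, so that is an assumption here):

```python
import numpy as np

def agglomerative(samples, T):
    """Start with every sample as its own cluster and repeatedly merge
    the closest pair of clusters until the minimum inter-cluster
    distance exceeds T (or one cluster remains)."""
    clusters = [[i] for i in range(len(samples))]

    def dist(a, b):
        # single-linkage: distance between the closest members
        return min(np.linalg.norm(samples[i] - samples[j])
                   for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(dist(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        d, p, q = min(pairs)             # closest pair of clusters
        if d > T:
            break                        # threshold reached: stop merging
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

pts = np.array([[0., 0.], [1., 0.], [10., 0.], [11., 0.]])
clusters = agglomerative(pts, T=2.0)
# two clusters of sample indices: [0, 1] and [2, 3]
```

Recomputing all pairwise distances each round is O(N^3) overall; this is the G(0), G(1), … sequence from the steps above in its most literal form.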

Dynamic Clustering algorithm:

K-means algorithm: choose cluster centers that minimize a cost function (for example, the sum of squared errors). Each new cluster center equals the mean of its cluster, and the number of clusters K is fixed in advance.

details:
1). select initial cluster centers, calculate the distances between the centers and the samples, and assign each sample to its nearest center;
2). recompute each cluster center as the mean of its cluster;
3). repeat steps 1 and 2 until the cluster centers no longer change.
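A minimal K-means sketch following these steps (random initialization from the samples is my own choice; other schemes such as k-means++ exist):

```python
import numpy as np

def kmeans(samples, k, iters=100, seed=0):
    """Plain K-means: assign by the nearest principle, recompute each
    center as the mean of its cluster, repeat until centers are stable."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        # step 1: distances of every sample to every center, nearest wins
        d = np.linalg.norm(samples[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # step 2: new centers are the per-cluster means
        new = np.array([samples[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):    # step 3: stop when centers stay the same
            break
        centers = new
    return centers, labels

pts = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
centers, labels = kmeans(pts, k=2)
# the two tight pairs end up in separate clusters
```

This sketch does not handle empty clusters; production implementations (e.g. scikit-learn's KMeans) reseed or drop such centers.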

ISODATA: similar to the K-means algorithm, but it can also merge and split clusters, so the final number of clusters is not fixed in advance.

details:
1). select initial values, such as the initial cluster centers and the split/merge thresholds;
2). assign each sample to its nearest center;
3). recompute the cluster centers, merging clusters that are too close and splitting clusters that are too spread out;
4). repeat until the result meets your stopping criteria (for example, a maximum number of iterations or stable centers).
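One simplified assign/split/merge pass can be sketched as below. This is a heavily reduced illustration, not the full ISODATA algorithm; the threshold names theta_s (split when a cluster's per-dimension standard deviation is too large) and theta_c (merge centers that are too close) are my own:

```python
import numpy as np

def isodata_step(samples, centers, theta_s=2.0, theta_c=1.5):
    """One simplified ISODATA pass: nearest-principle assignment,
    split overly spread-out clusters, merge overly close centers."""
    d = np.linalg.norm(samples[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)                        # step 2: assignment
    new_centers = []
    for j in range(len(centers)):                    # step 3a: recompute / split
        members = samples[labels == j]
        if len(members) == 0:
            continue
        mean, std = members.mean(axis=0), members.std(axis=0)
        if std.max() > theta_s:                      # split along the widest axis
            off = np.zeros_like(mean)
            off[int(std.argmax())] = std.max()
            new_centers += [mean + off, mean - off]
        else:
            new_centers.append(mean)
    merged, used = [], set()                         # step 3b: merge close pairs
    for p in range(len(new_centers)):
        if p in used:
            continue
        for q in range(p + 1, len(new_centers)):
            if q not in used and np.linalg.norm(new_centers[p] - new_centers[q]) < theta_c:
                merged.append((new_centers[p] + new_centers[q]) / 2)
                used.update({p, q})
                break
        else:
            merged.append(new_centers[p])
            used.add(p)
    return np.array(merged), labels

samples = np.array([[0., 0.], [8., 0.], [-8., 0.]])
centers = np.array([[0., 0.]])
new_c, labels = isodata_step(samples, centers)
# the single spread-out cluster is split into two centers
```

The full algorithm also uses minimum cluster sizes, a desired number of clusters, and limits on merges per iteration; this sketch only shows where splitting and merging sit relative to the K-means loop.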
