A few k-means clustering implementations I wrote:
Version 1: terminates after a fixed number of iterations
% K-Means Clustering
% idx: the cluster index vector; idx(i) is the cluster assigned to row i of data
% center: the cluster centers, a k-by-dim matrix (pass [] to initialize randomly)
% data: the input data matrix, n-by-dim (one observation per row)
% k: the number of clusters
% maxIter: the maximum number of iterations
% By Rong-Hua Li
function [idx, center] = Kmeans(data, center, k, maxIter)
n = size(data, 1);
% idx(i) holds the cluster label of the i-th row of data
idx = zeros(n, 1);
% if no initial centers are given, use k distinct rows of data chosen at random
if isempty(center)
    prek = randperm(n);
    center = data(sort(prek(1:k)), :);
end
temp = zeros(k, 1);
for iter = 1 : maxIter
    % assignment step: label each point with its nearest center
    % (Euclidean distance, 2-norm)
    for i = 1 : n
        for j = 1 : k
            temp(j) = norm(data(i, :) - center(j, :));
        end
        % min returns the first index when several centers are equally close
        [~, idx(i)] = min(temp);
    end
    % update step: move each center to the mean of its members
    for j = 1 : k
        members = find(idx == j);
        % an empty cluster keeps its previous center (avoids NaN rows)
        if ~isempty(members)
            % averaging along dimension 1 also handles a single-member cluster
            center(j, :) = mean(data(members, :), 1);
        end
    end
end
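The two nested loops of the assignment step can also be written without loops. The following is a minimal sketch, not part of the original functions: the toy data and centers are made up for illustration, and it relies on the expansion ||x - c||^2 = ||x||^2 - 2*x*c' + ||c||^2. It uses bsxfun so that it should also run on older MATLAB releases.

```matlab
% Toy data: 4 points in 2-D and 2 centers (illustrative values only)
data = [0 0; 0 1; 10 10; 10 11];
center = [0 0.5; 10 10.5];

% n-by-k matrix of squared Euclidean distances:
% d2(i, j) = ||data(i,:)||^2 + ||center(j,:)||^2 - 2 * data(i,:) * center(j,:)'
d2 = bsxfun(@plus, sum(data.^2, 2), sum(center.^2, 2)') - 2 * (data * center');

% nearest center per row; min returns the first index on ties,
% which matches the tie-breaking of the loop version
[~, idx] = min(d2, [], 2);
```

Minimizing the squared distance gives the same labels as minimizing the distance itself, so no square root is needed here.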
Version 2: terminates when the change in the centers between two successive iterations falls below a tolerance
% K-Means2 Clustering
% idx: the cluster index vector; idx(i) is the cluster assigned to row i of data
% center: the cluster centers, a k-by-dim matrix (pass [] to initialize randomly)
% iters: the number of iterations performed
% data: the input data matrix, n-by-dim (one observation per row)
% k: the number of clusters
% epso: the convergence tolerance on the change in the centers
% By Rong-Hua Li
function [idx, center, iters] = Kmeans2(data, center, k, epso)
n = size(data, 1);
% idx(i) holds the cluster label of the i-th row of data
idx = zeros(n, 1);
% if no initial centers are given, use k distinct rows of data chosen at random
if isempty(center)
    prek = randperm(n);
    center = data(sort(prek(1:k)), :);
end
temp = zeros(k, 1);
iters = 0;
while true
    iters = iters + 1;
    % assignment step: label each point with its nearest center
    % (Euclidean distance, 2-norm)
    for i = 1 : n
        for j = 1 : k
            temp(j) = norm(data(i, :) - center(j, :));
        end
        % min returns the first index when several centers are equally close
        [~, idx(i)] = min(temp);
    end
    % update step: start from the old centers so that an empty cluster
    % keeps its previous center (avoids NaN rows)
    center2 = center;
    for j = 1 : k
        members = find(idx == j);
        if ~isempty(members)
            % averaging along dimension 1 also handles a single-member cluster
            center2(j, :) = mean(data(members, :), 1);
        end
    end
    % change in the centers; for a matrix, norm(A) is the matrix 2-norm
    % (largest singular value)
    % dis = norm(center2 - center, inf);
    dis = norm(center2 - center);
    % store the new centers before testing, so the most recent ones are returned
    center = center2;
    if dis <= epso
        break;
    end
end
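The stopping test calls norm on a matrix. In MATLAB that is the matrix 2-norm (the largest singular value), which is not the same as the entrywise root-sum-of-squares. The sketch below compares a few possible measures on a made-up difference matrix; the matrix A and the variable names are assumptions for illustration, not part of the original code.

```matlab
% Hypothetical difference between new and old centers (k = 2, dim = 2)
A = [3 0; 0 4];

d_two  = norm(A);                    % matrix 2-norm: largest singular value, here 4
d_fro  = norm(A, 'fro');             % Frobenius norm: sqrt(3^2 + 4^2), here 5
d_inf  = norm(A, inf);               % largest absolute row sum, here 4
d_move = max(sqrt(sum(A.^2, 2)));    % largest Euclidean shift of any single center, here 4
```

Which measure to compare against epso is a design choice. The Frobenius norm accumulates the movement of all centers, while the last variant bounds how far any one center moved.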