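K-means clustering first picks several sample points as cluster centers, then assigns every sample to a center according to a clustering criterion (usually the minimum-distance rule), which gives an initial partition. It then checks whether that partition is reasonable. If it is not, the partition is revised, and this revise-and-check iteration is repeated until the partition is reasonable, which in the program below means the cluster centers stop changing. The algorithm mainly follows the book 《模式识别》 (Pattern Recognition) by Cai Yuanlong (蔡元龙). The MATLAB program is: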
%Programmed by Lu Qi, University of Chinese Academy of Sciences
%my email:qqlu1992@gmail.com
clear all
clc
train_x=[0 3 1 2 0;
1 3 0 1 0;
3 3 0 0 1;
1 1 0 2 0;
3 2 1 2 1;
4 1 1 1 0];
[pars.num_train,pars.length]=size(train_x);%num_train is the number of samples, length is the dimension of each sample
pars.num_cluster=2;
initial=randperm(pars.num_train);%random order of the sample indices, not hard-coded to 6
pars.iter=1;
initial_point=initial(1:pars.num_cluster);
pars.z{pars.iter}=train_x(initial_point,:);
while 1
for i=1:pars.num_train
for j=1:pars.num_cluster
d(i,j)=norm(train_x(i,:)-pars.z{pars.iter}(j,:));
end
end
[min_d,index]=min(d,[],2);
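%Record which samples belong to each cluster: g(j,i)=i if sample i is in cluster j, otherwise 0
g=zeros(pars.num_cluster,pars.num_train);%rebuilt on every pass so memberships from earlier iterations do not linger
for i=1:pars.num_train
g(index(i),i)=i;
end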
pars.iter=pars.iter+1;
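%Find the nonzero entries (the member indices) of each cluster and average the members
for j=1:pars.num_cluster
temp=find(g(j,:));
pars.z{pars.iter}(j,:)=sum(train_x(temp,:),1)/length(temp);%sum along dim 1 so a one-member cluster still yields a row vector
end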
if isequal(pars.z{pars.iter},pars.z{pars.iter-1})%stop once the centers no longer change
break;
end
end
for j=1:pars.num_cluster
fprintf('the %d th cluster centre is\n',j);
fprintf('%7.4f',pars.z{end}(j,:))
fprintf('\n');
end
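
Because the program above starts from a random permutation, its result can differ from run to run. The following is a minimal sketch of the same assign-then-update loop with a fixed start, so the outcome is reproducible and easy to check by hand. The choice of samples 1 and 6 as starting centers, the `max_iter` cap, the empty-cluster guard, and the variable names `X`, `centers`, `labels`, `d2`, `delta`, and `members` are assumptions made for illustration; they are not part of the original program.

```matlab
% Same data as in the program above.
X = [0 3 1 2 0;
     1 3 0 1 0;
     3 3 0 0 1;
     1 1 0 2 0;
     3 2 1 2 1;
     4 1 1 1 0];
k = 2;
n = size(X, 1);
centers = X([1 6], :);  % assumption: fixed start (samples 1 and 6) instead of randperm
max_iter = 100;         % assumption: safety cap, not in the original program

for iter = 1:max_iter
    % Assignment step: squared Euclidean distance from every sample to every center.
    d2 = zeros(n, k);
    for j = 1:k
        delta = X - repmat(centers(j, :), n, 1);
        d2(:, j) = sum(delta.^2, 2);
    end
    [~, labels] = min(d2, [], 2);   % labels(i) is the nearest center of sample i

    % Update step: each center becomes the mean of its current members.
    new_centers = centers;
    for j = 1:k
        members = (labels == j);
        if any(members)             % assumption: keep the old center if a cluster is empty
            new_centers(j, :) = mean(X(members, :), 1);
        end
    end

    % Stop when the centers no longer change.
    if isequal(new_centers, centers)
        break;
    end
    centers = new_centers;
end

disp(labels');
disp(centers);
```

With this particular start, the loop settles after the second pass: samples 1, 2 and 4 form one cluster and samples 3, 5 and 6 form the other. A different start may converge to a different partition, since K-means only finds a local optimum.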