K均值聚类法分为如下几个步骤:
一、初始化聚类中心
1、根据具体问题,凭经验从样本集中选出C个比较合适的样本作为初始聚类中心。
2、用前C个样本作为初始聚类中心。
3、将全部样本随机地分成C类,计算每类的样本均值,将样本均值作为初始聚类中心。
二、初始聚类
1、按就近原则将样本归入各聚类中心所代表的类中。
2、取一样本,将其归入与其最近的聚类中心的那一类中,重新计算样本均值,更新聚类中心。然后取下一样本,重复操作,直至所有样本归入相应类中。
三、判断聚类是否合理
采用误差平方和准则函数判断聚类是否合理,不合理则修改分类。循环进行判断、修改直至达到算法终止条件。
clc
clear
tic
RGB= imread ('test5.jpg'); %读入像
img=rgb2gray(RGB);
[m,n]=size(img);
subplot(2,2,1),imshow(img);title(' 图一 原图像')
subplot(2,2,2),imhist(img);title(' 图二 原图像的灰度直方图')
hold off;
img=double(img);
for i=1:200
c1(1)=25;
c2(1)=125;
c3(1)=200;%选择三个初始聚类中心
r=abs(img-c1(i));
g=abs(img-c2(i));
b=abs(img-c3(i));%计算各像素灰度与聚类中心的距离
r_g=r-g;
g_b=g-b;
r_b=r-b; %此处以最近距离为判断准则
n_r=find(r_g<=0&r_b<=0);%寻找最小的聚类中心
n_g=find(r_g>0&g_b<=0);%寻找中间的一个聚类中心
n_b=find(g_b>0&r_b>0);%寻找最大的聚类中心
i=i+1;
c1(i)=sum(img(n_r))/length(n_r);%将所有低灰度求和取平均,作为下一个低灰度中心
c2(i)=sum(img(n_g))/length(n_g);%将所有低灰度求和取平均,作为下一个中间灰度中心
c3(i)=sum(img(n_b))/length(n_b);%将所有低灰度求和取平均,作为下一个高灰度中心
d1(i)=abs(c1(i)-c1(i-1));
d2(i)=abs(c2(i)-c2(i-1));
d3(i)=abs(c3(i)-c3(i-1));
if d1(i)<=0.001&&d2(i)<=0.001&&d3(i)<=0.001
R=c1(i);
G=c2(i);
B=c3(i);
k=i;
break;
end
end
R
G
B
img=uint8(img);
img(find(img<R))=0;
img(find(img>R&img<G))=128;
img(find(img>G))=255;
toc
subplot(2,2,3),imshow(img);title(' 图三 聚类后的图像')
subplot(2,2,4),imhist(img);title(' 图四 聚类后的图像直方图')
感觉matlab这个软件好使,能处理各个方面的算法仿真。开始做MP2音频压缩在DSP上实现这个项目拉,自己要加油!迎接挑战!
clear
tic
RGB= imread ('test5.jpg'); %读入像
img=rgb2gray(RGB);
[m,n]=size(img);
subplot(2,2,1),imshow(img);title(' 图一 原图像')
subplot(2,2,2),imhist(img);title(' 图二 原图像的灰度直方图')
hold off;
img=double(img);
for i=1:200
c1(1)=25;
c2(1)=125;
c3(1)=200;%选择三个初始聚类中心
r=abs(img-c1(i));
g=abs(img-c2(i));
b=abs(img-c3(i));%计算各像素灰度与聚类中心的距离
r_g=r-g;
g_b=g-b;
r_b=r-b; %此处以最近距离为判断准则
n_r=find(r_g<=0&r_b<=0);%寻找最小的聚类中心
n_g=find(r_g>0&g_b<=0);%寻找中间的一个聚类中心
n_b=find(g_b>0&r_b>0);%寻找最大的聚类中心
i=i+1;
c1(i)=sum(img(n_r))/length(n_r);%将所有低灰度求和取平均,作为下一个低灰度中心
c2(i)=sum(img(n_g))/length(n_g);%将所有低灰度求和取平均,作为下一个中间灰度中心
c3(i)=sum(img(n_b))/length(n_b);%将所有低灰度求和取平均,作为下一个高灰度中心
d1(i)=abs(c1(i)-c1(i-1));
d2(i)=abs(c2(i)-c2(i-1));
d3(i)=abs(c3(i)-c3(i-1));
if d1(i)<=0.001&&d2(i)<=0.001&&d3(i)<=0.001
R=c1(i);
G=c2(i);
B=c3(i);
k=i;
break;
end
end
R
G
B
img=uint8(img);
img(find(img<R))=0;
img(find(img>R&img<G))=128;
img(find(img>G))=255;
toc
subplot(2,2,3),imshow(img);title(' 图三 聚类后的图像')
subplot(2,2,4),imhist(img);title(' 图四 聚类后的图像直方图')
感觉matlab这个软件好使,能处理各个方面的算法仿真。开始做MP2音频压缩在DSP上实现这个项目拉,自己要加油!迎接挑战!
K-Means聚类算法 Matlab代码
function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%
% kMeansCluster - Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column represent the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4] or in a nice form
% m = [ 1 1;
% 2 1;
% 4 3;
% 5 4]
% k = 2
% kMeansCluster(m,k) produces m = [ 1 1 1;
% 2 1 1;
% 4 3 2;
% 5 4 2]
% Input:
% m - required, matrix data: objects in rows and attributes in columns
% k - optional, number of groups (default = 1)
% isRand - optional, if using random initialization isRand=1, otherwise input any number (default)
% it will assign the first k data as initial centroids
%
% Local Variables
% f - row number of data that belong to group i
% c - centroid coordinate size (1:k, 1:maxCol)
% g - current iteration group matrix size (1:maxRow)
% i - scalar iterator
% maxCol - scalar number of rows in the data matrix m = number of attributes
% maxRow - scalar number of columns in the data matrix m = number of objects
% temp - previous iteration group matrix size (1:maxRow)
% z - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<3, isRand=0; end
if nargin<2, k=1; end
[maxRow, maxCol]=size(m)
if maxRow<=k,
y=[m, 1:maxRow]
else
% initial value of centroid
if isRand,
p = randperm(size(m,1)); % random initialization
for i=1:k
c(i,:)=m(p(i),:)
end
else
for i=1:k
c(i,:)=m(i,:) % sequential initialization
end
end
temp=zeros(maxRow,1); % initialize as zero vector
while 1,
d=DistMatrix(m,c); % calculate objcets-centroid distances
[z,g]=min(d,[],2); % find group matrix g
if g==temp,
break; % stop the iteration
else
temp=g; % copy group matrix to temporary variable
end
for i=1:k
f=find(g==i);
if f % only compute centroid if f is not empty
c(i,:)=mean(m(find(g==i),:),1);
end
end
end
y=[m,g];
end
The Matlab function kMeansCluster above call function DistMatrix as shown in the code below. It works for multi-dimensional Euclidean distance. Learn about other type of distance here.
function d=DistMatrix(A,B) %%%%%%%%%%%%%%%%%%%%%%%%% % DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2] % Copyright (c) 2005 by Kardi Teknomo, http://people.revoledu.com/kardi/ % % Numbers of rows (represent points) in A and B are not necessarily the same. % It can be use for distance-in-a-slice (Spacing) or distance-between-slice (Headway), % % A and B must contain the same number of columns (represent variables of n dimensions), % first column is the X coordinates, second column is the Y coordinates, and so on. % The distance matrix is distance between points in A as rows % and points in B as columns. % example: Spacing= dist(A,A) % Headway = dist(A,B), with hA ~= hB or hA=hB % A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0]
% dist(A,B)= [ 4.69 5.83;
% 5.00 7.00;
% 5.48 7.48;
% 4.69 5.83]
%
% dist(B,A)= [ 4.69 5.00 5.48 4.69;
% 5.83 7.00 7.48 5.83]
%%%%%%%%%%%%%%%%%%%%%%%%%%% [hA,wA]=size(A);
[hB,wB]=size(B);
if wA ~= wB, error(' second dimension of A and B must be the same'); end
for k=1:wA
C{k}= repmat(A(:,k),1,hB);
D{k}= repmat(B(:,k),1,hA);
end
S=zeros(hA,hB);
for k=1:wA
S=S+(C{k}-D{k}').^2;
end
d=sqrt(S);转自 http://hi.baidu.com/teemix2008/blog/item/4ffd213e3714f8eb55e7237b.html
对上述算法的说明:
补充的:上述的最近距离可以改为:d1 = (c1()+c2())/2d2 = (c2()+c3())/2其中d1,d2为这3个类中心的分割线,这样比较容易理解