K均值聚类算法

K均值聚类法分为如下几个步骤:

一、初始化聚类中心

1、根据具体问题,凭经验从样本集中选出C个比较合适的样本作为初始聚类中心。

2、用前C个样本作为初始聚类中心。

3、将全部样本随机地分成C类,计算每类的样本均值,将样本均值作为初始聚类中心。

二、初始聚类

1、按就近原则将样本归入各聚类中心所代表的类中。

2、取一样本,将其归入与其最近的聚类中心的那一类中,重新计算样本均值,更新聚类中心。然后取下一样本,重复操作,直至所有样本归入相应类中。

三、判断聚类是否合理

采用误差平方和准则函数判断聚类是否合理,不合理则修改分类。循环进行判断、修改直至达到算法终止条件。

clc
clear
tic
RGB= imread ('test5.jpg'); %读入像
img=rgb2gray(RGB);
[m,n]=size(img);
subplot(2,2,1),imshow(img);title(' 图一 原图像')
subplot(2,2,2),imhist(img);title(' 图二 原图像的灰度直方图')
hold off;
img=double(img);
for i=1:200
    c1(1)=25;
    c2(1)=125;
    c3(1)=200;%选择三个初始聚类中心
    r=abs(img-c1(i));
    g=abs(img-c2(i));
    b=abs(img-c3(i));%计算各像素灰度与聚类中心的距离
    r_g=r-g;
    g_b=g-b;
    r_b=r-b;   %此处以最近距离为判断准则
    n_r=find(r_g<=0&r_b<=0);%寻找最小的聚类中心
    n_g=find(r_g>0&g_b<=0);%寻找中间的一个聚类中心
    n_b=find(g_b>0&r_b>0);%寻找最大的聚类中心
    i=i+1;
    c1(i)=sum(img(n_r))/length(n_r);%将所有低灰度求和取平均,作为下一个低灰度中心
    c2(i)=sum(img(n_g))/length(n_g);%将所有低灰度求和取平均,作为下一个中间灰度中心
    c3(i)=sum(img(n_b))/length(n_b);%将所有低灰度求和取平均,作为下一个高灰度中心
    d1(i)=abs(c1(i)-c1(i-1));
    d2(i)=abs(c2(i)-c2(i-1));
    d3(i)=abs(c3(i)-c3(i-1));
    if d1(i)<=0.001&&d2(i)<=0.001&&d3(i)<=0.001
        R=c1(i);
        G=c2(i);
        B=c3(i);
        k=i;
        break;
    end
end
R
G
B
img=uint8(img);
img(find(img<R))=0;
img(find(img>R&img<G))=128;
img(find(img>G))=255;
toc
subplot(2,2,3),imshow(img);title(' 图三 聚类后的图像')
subplot(2,2,4),imhist(img);title(' 图四 聚类后的图像直方图')
     感觉matlab这个软件好使,能处理各个方面的算法仿真。开始做MP2音频压缩在DSP上实现这个项目拉,自己要加油!迎接挑战!
K-Means聚类算法 Matlab代码
function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%
% kMeansCluster - Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html % Output: matrix data plus an additional column represent the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4] or in a nice form
% m = [ 1 1;
% 2 1;
% 4 3;
% 5 4]
% k = 2
% kMeansCluster(m,k) produces m = [ 1 1 1;
% 2 1 1;
% 4 3 2;
% 5 4 2]
% Input:
% m - required, matrix data: objects in rows and attributes in columns
% k - optional, number of groups (default = 1)
% isRand - optional, if using random initialization isRand=1, otherwise input any number (default)
% it will assign the first k data as initial centroids
%
% Local Variables
% f - row number of data that belong to group i
% c - centroid coordinate size (1:k, 1:maxCol)
% g - current iteration group matrix size (1:maxRow)
% i - scalar iterator
% maxCol - scalar number of rows in the data matrix m = number of attributes
% maxRow - scalar number of columns in the data matrix m = number of objects
% temp - previous iteration group matrix size (1:maxRow)
% z - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<3, isRand=0; end
if nargin<2, k=1; end

[maxRow, maxCol]=size(m)
if maxRow<=k,
y=[m, 1:maxRow]
else

% initial value of centroid
if isRand,
p = randperm(size(m,1)); % random initialization
for i=1:k
c(i,:)=m(p(i),:)
end
else
for i=1:k
c(i,:)=m(i,:) % sequential initialization
end
end

temp=zeros(maxRow,1); % initialize as zero vector

while 1,
d=DistMatrix(m,c); % calculate objcets-centroid distances
[z,g]=min(d,[],2); % find group matrix g
if g==temp,
break; % stop the iteration
else
temp=g; % copy group matrix to temporary variable
end
for i=1:k
f=find(g==i);
if f % only compute centroid if f is not empty
c(i,:)=mean(m(find(g==i),:),1);
end
end
end

y=[m,g];

end

The Matlab function kMeansCluster above call function DistMatrix as shown in the code below. It works for multi-dimensional Euclidean distance. Learn about other type of distance here.

function d=DistMatrix(A,B)
             %%%%%%%%%%%%%%%%%%%%%%%%%
             % DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2]
             % Copyright (c) 2005 by Kardi Teknomo,  http://people.revoledu.com/kardi/
             %
             % Numbers of rows (represent points) in A and B are not necessarily the same.
             % It can be use for distance-in-a-slice (Spacing) or distance-between-slice (Headway),
             %
             % A and B must contain the same number of columns (represent variables of n dimensions),
             % first column is the X coordinates, second column is the Y coordinates, and so on.
             % The distance matrix is distance between points in A as rows
             % and points in B as columns.
             % example: Spacing= dist(A,A)
             % Headway = dist(A,B), with hA ~= hB or hA=hB
             %          A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0]
% dist(A,B)= [ 4.69 5.83;
% 5.00 7.00;
% 5.48 7.48;
% 4.69 5.83]
%
% dist(B,A)= [ 4.69 5.00 5.48 4.69;
% 5.83 7.00 7.48 5.83]
%%%%%%%%%%%%%%%%%%%%%%%%%%% [hA,wA]=size(A);
[hB,wB]=size(B);
if wA ~= wB, error(' second dimension of A and B must be the same'); end
for k=1:wA
C{k}= repmat(A(:,k),1,hB);
D{k}= repmat(B(:,k),1,hA);
end
S=zeros(hA,hB);
for k=1:wA
S=S+(C{k}-D{k}').^2;
end
d=sqrt(S);
 
转自 http://hi.baidu.com/teemix2008/blog/item/4ffd213e3714f8eb55e7237b.html
 
     
对上述算法的说明:

       
       
补充的:
上述的最近距离可以改为:
d1 = (c1()+c2())/2
d2 = (c2()+c3())/2
其中d1,d2为这3个类中心的分割线,这样比较容易理解
 
 
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值