K均值聚类算法

最新推荐文章于 2025-02-19 14:23:55 发布

sjzcandy

最新推荐文章于 2025-02-19 14:23:55 发布

阅读量500

点赞数

分类专栏：数字图像处理文章标签：算法 matrix distance initialization attributes matlab

数字图像处理专栏收录该内容

15 篇文章 0 订阅

订阅专栏

K均值聚类法分为如下几个步骤：

一、初始化聚类中心

1、根据具体问题，凭经验从样本集中选出C个比较合适的样本作为初始聚类中心。

2、用前C个样本作为初始聚类中心。

3、将全部样本随机地分成C类，计算每类的样本均值，将样本均值作为初始聚类中心。

二、初始聚类

1、按就近原则将样本归入各聚类中心所代表的类中。

2、取一样本，将其归入与其最近的聚类中心的那一类中，重新计算样本均值，更新聚类中心。然后取下一样本，重复操作，直至所有样本归入相应类中。

三、判断聚类是否合理

采用误差平方和准则函数判断聚类是否合理，不合理则修改分类。循环进行判断、修改直至达到算法终止条件。

clc
clear
tic
RGB= imread ('test5.jpg'); %读入像
img=rgb2gray(RGB);
[m,n]=size(img);
subplot(2,2,1),imshow(img);title(' 图一原图像')
subplot(2,2,2),imhist(img);title(' 图二原图像的灰度直方图')
hold off;
img=double(img);
for i=1:200
    c1(1)=25;
    c2(1)=125;
    c3(1)=200;%选择三个初始聚类中心
    r=abs(img-c1(i));
    g=abs(img-c2(i));
    b=abs(img-c3(i));%计算各像素灰度与聚类中心的距离
    r_g=r-g;
    g_b=g-b;
    r_b=r-b;   %此处以最近距离为判断准则
    n_r=find(r_g<=0&r_b<=0);%寻找最小的聚类中心
    n_g=find(r_g>0&g_b<=0);%寻找中间的一个聚类中心
    n_b=find(g_b>0&r_b>0);%寻找最大的聚类中心
    i=i+1;
    c1(i)=sum(img(n_r))/length(n_r);%将所有低灰度求和取平均，作为下一个低灰度中心
    c2(i)=sum(img(n_g))/length(n_g);%将所有低灰度求和取平均，作为下一个中间灰度中心
    c3(i)=sum(img(n_b))/length(n_b);%将所有低灰度求和取平均，作为下一个高灰度中心
    d1(i)=abs(c1(i)-c1(i-1));
    d2(i)=abs(c2(i)-c2(i-1));
    d3(i)=abs(c3(i)-c3(i-1));
    if d1(i)<=0.001&&d2(i)<=0.001&&d3(i)<=0.001
        R=c1(i);
        G=c2(i);
        B=c3(i);
        k=i;
        break;
    end
end
R
G
B
img=uint8(img);
img(find(img<R))=0;
img(find(img>R&img<G))=128;
img(find(img>G))=255;
toc
subplot(2,2,3),imshow(img);title(' 图三聚类后的图像')
subplot(2,2,4),imhist(img);title(' 图四聚类后的图像直方图')
     感觉matlab这个软件好使，能处理各个方面的算法仿真。开始做MP2音频压缩在DSP上实现这个项目拉，自己要加油！迎接挑战！

from http://blog.21ic.com/user1/1369/archives/2007/37594.

K-Means聚类算法 Matlab代码

function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%                                                        
% kMeansCluster - Simple k means clustering algorithm                                                              
% Author: Kardi Teknomo, Ph.D.                                                                  
%                                                                                                                    
% Purpose: classify the objects in data matrix based on the attributes    
% Criteria: minimize Euclidean distance between centroids and object points                    
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html    
% Output: matrix data plus an additional column represent the group of each object               
%                                                                                                                
% Example: m = [ 1 1; 2 1; 4 3; 5 4]  or in a nice form                         
%          m = [ 1 1;                                                                                     
%                2 1;                                                                                         
%                4 3;                                                                                         
%                5 4]                                                                                         
%          k = 2                                                                                             
% kMeansCluster(m,k) produces m = [ 1 1 1;                                        
%                                   2 1 1;                                                                   
%                                   4 3 2;                                                                   
%                                   5 4 2]                                                                   
% Input:
%   m      - required, matrix data: objects in rows and attributes in columns                                                 
%   k      - optional, number of groups (default = 1)
%   isRand - optional, if using random initialization isRand=1, otherwise input any number (default)
%            it will assign the first k data as initial centroids
%
% Local Variables
%   f      - row number of data that belong to group i
%   c      - centroid coordinate size (1:k, 1:maxCol)
%   g      - current iteration group matrix size (1:maxRow)
%   i      - scalar iterator 
%   maxCol - scalar number of rows in the data matrix m = number of attributes
%   maxRow - scalar number of columns in the data matrix m = number of objects
%   temp   - previous iteration group matrix size (1:maxRow)
%   z      - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if nargin<3,        isRand=0;   end
if nargin<2,        k=1;        end
    
[maxRow, maxCol]=size(m)
if maxRow<=k, 
    y=[m, 1:maxRow]
else
 
 % initial value of centroid
    if isRand,
        p = randperm(size(m,1));      % random initialization
        for i=1:k
            c(i,:)=m(p(i),:)  
     end
    else
        for i=1:k
           c(i,:)=m(i,:)        % sequential initialization
     end
    end
    
 temp=zeros(maxRow,1);   % initialize as zero vector
    
 while 1,
        d=DistMatrix(m,c);  % calculate objcets-centroid distances
        [z,g]=min(d,[],2);  % find group matrix g
        if g==temp,
            break;          % stop the iteration
        else
            temp=g;         % copy group matrix to temporary variable
        end
        for i=1:k
            f=find(g==i);
            if f            % only compute centroid if f is not empty
                c(i,:)=mean(m(find(g==i),:),1);
            end
        end
 end
    
 y=[m,g];
    
end

The Matlab function kMeansCluster above call function DistMatrix as shown in the code below. It works for multi-dimensional Euclidean distance. Learn about other type of distance here.

function d=DistMatrix(A,B)
             %%%%%%%%%%%%%%%%%%%%%%%%%
             % DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2]
             % Copyright (c) 2005 by Kardi Teknomo,  http://people.revoledu.com/kardi/
             %
             % Numbers of rows (represent points) in A and B are not necessarily the same.
             % It can be use for distance-in-a-slice (Spacing) or distance-between-slice (Headway),
             %
             % A and B must contain the same number of columns (represent variables of n dimensions),
             % first column is the X coordinates, second column is the Y coordinates, and so on.
             % The distance matrix is distance between points in A as rows
             % and points in B as columns.
             % example: Spacing= dist(A,A)
             % Headway = dist(A,B), with hA ~= hB or hA=hB
             %          A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0]
             %          dist(A,B)= [ 4.69   5.83;
             %                       5.00   7.00;
             %                       5.48   7.48;
             %                       4.69   5.83]
             %
             %          dist(B,A)= [ 4.69   5.00     5.48    4.69;
             %                       5.83   7.00     7.48    5.83]
             %%%%%%%%%%%%%%%%%%%%%%%%%%%
             [hA,wA]=size(A);
             [hB,wB]=size(B);
             if wA ~= wB,  error(' second dimension of A and B must be the same'); end
             for k=1:wA
                  C{k}= repmat(A(:,k),1,hB);
                  D{k}= repmat(B(:,k),1,hA);
             end
             S=zeros(hA,hB);
             for k=1:wA
                  S=S+(C{k}-D{k}').^2;
             end
             d=sqrt(S);

转自 http://hi.baidu.com/teemix2008/blog/item/4ffd213e3714f8eb55e7237b.html

对上述算法的说明：


       
       
        
        补充的：
       
       
       
       
        
        上述的最近距离可以改为：
       
       
       
       
        
        d1 = (c1()+c2())/2
       
       
       
       
        
        d2 = (c2()+c3())/2
       
       
       
       
        
        其中d1，d2为这3个类中心的分割线，这样比较容易理解