PSO-Based K-means Clustering + Conventional K-means (MATLAB Source Code)

This post walks through the basic flow of the K-means clustering algorithm and its implementation on a 2-D dataset, emphasizing how the choice of initial cluster centers affects the algorithm. It then implements K-means via particle swarm optimization (PSO), whose advantages are that no initial centers need to be preset and that convergence is stable. The experiments show that the two methods output different cluster centers; PSO does better at reaching the global optimum, but at a higher computational cost.


K-means Overview

K-means is one of the most commonly used clustering algorithms based on Euclidean distance: the closer two objects are, the more similar they are considered to be.

1 Conventional K-means Implementation

1.1 Algorithm Flow

Steps:
1. Load the data, determine its dimensionality and value range, and fix the number of clusters K;
2. Initialize the cluster centers, usually with random numbers (random initialization occasionally prevents the algorithm from converging);
3. Compute the (Euclidean) distance from every point to each of the K centers and assign each point to the nearest cluster;
4. Using the points in each of the K clusters, compute the cluster means and update the K centers;
5. Repeat steps 3-4 until convergence (a. the maximum number of iterations is reached, or b. the centers or the total distance no longer change appreciably).
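The steps above can be sketched language-agnostically. The NumPy version below mirrors steps 2-5; the toy data and the optional explicit `init` argument are illustrative choices, not part of the MATLAB code that follows:

```python
import numpy as np

def kmeans(X, K, iter_max=100, tol=1e-4, init=None, seed=0):
    """Plain K-means: initialize centers, assign points to the nearest
    center, update centers to cluster means, stop when centers settle."""
    rng = np.random.default_rng(seed)
    if init is None:
        lo, hi = X.min(axis=0), X.max(axis=0)
        centers = rng.uniform(lo, hi, size=(K, X.shape[1]))  # step 2: random init
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(iter_max):
        # step 3: Euclidean distance from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 4: move each center to the mean of its assigned points
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        # step 5: stop when the centers barely move
        if np.linalg.norm(new - centers) <= tol:
            centers = new
            break
        centers = new
    return centers, labels

# two tight blobs; with sensible initial centers the split is recovered
X = np.vstack([np.full((20, 2), 0.1), np.full((20, 2), 10.1)])
centers, labels = kmeans(X, 2, init=[[0.0, 0.0], [10.0, 10.0]])
```

With a poor random initialization, one center can end up owning no points; this is exactly the non-convergence sensitivity mentioned in step 2.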

1.2 Implementation

1.2.1 Dataset

The experiment uses a 2-dimensional, 3-class dataset. The dataset link 🔗 is R3;
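If the R3 link is unavailable, a comparable 2-D, 3-class set (40 points per class) can be generated for reproduction. The cluster locations below are assumptions chosen to resemble the centers reported later in the post, and the output filename is made up:

```python
import numpy as np

# Assumed layout: 120 points, 40 per class, 2-D, clusters roughly at the
# centers reported in the post ([11.2,11.6], [10.0,10.0], [12.1,10.0]).
rng = np.random.default_rng(42)
means = np.array([[11.2, 11.6], [10.0, 10.0], [12.1, 10.0]])
data = np.vstack([m + 0.15 * rng.standard_normal((40, 2)) for m in means])
np.savetxt("R3_synthetic.txt", data, fmt="%.4f")  # readable by load() in MATLAB
```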

1.2.2 Main script (mostly plotting)
clc;clear;close;
tic;
% data 
dataset = load("R3.txt");
dataset = dataset(:,1:2);
%% K-means 
K =3;
iter_max = 1000;
tol = 0.0001;
[point,total_dist] = myk_means(dataset,K,iter_max,tol);  %% K-means clustering is complete at this point

%% plot origin dataset
color = {'r','g','b'};
% plot original points
subplot(1,2,1);
title("original points");
hold on;
scatter(dataset(1:40,1),dataset(1:40,2),'.',color{1});
scatter(dataset(41:80,1),dataset(41:80,2),'.',color{2});
scatter(dataset(81:120,1),dataset(81:120,2),'.',color{3});
legend("cluster1","cluster2","cluster3");
hold off;
%% plot after cluster using K-means
subplot(1,2,2);
title("after K-means");
hold on;
for i = 1:K
    point(i).point = [];
     for j = 1:length(point(i).cluster)
            point(i).point = [point(i).point ; dataset(point(i).cluster(j),:)];
     end
     
     scatter(point(i).point(:,1),point(i).point(:,2),".",color{i});
     scatter(point(i).mean(1),point(i).mean(2),"^",color{i});
end
toc;

final_dist = 0;
for i =1:K
    for j = 1:length(point(i).cluster)
        final_dist = final_dist + norm(dataset(point(i).cluster(j),:) - point(i).mean);
    end  
end
disp(['Optimal total distance: ',num2str(final_dist)]);
figure(2)
plot(total_dist,"-b");
title(['k-means optimal distance: ',num2str(total_dist(end))]);
xlabel('iter');
ylabel("total distance");
1.2.3 Conventional K-means core function
% dataset   input data set
% K         number of clusters
% iter_max  maximum number of iterations
% tol       tolerance
% point     returned struct array: each cluster's center (mean) and the indices of its member points
% total_dist  total distance recorded at each iteration
function [point,total_dist] = myk_means(dataset,K,iter_max,tol)

total_dist = [];
%% initialization: randomly generate K cluster centers
[m,dim] = size(dataset);
max1 = max(dataset);
min1 = min(dataset);
for i = 1:K
    point(i).mean = rand(1,dim).*(max1-min1) + min1;
end
iter = 1;
flag = 1;
while iter < iter_max  && flag
%% assign every point to a  particular cluster using min-distance-rule
count = zeros(K,1);
for i = 1:m
    temp_distance = inf;
    for  j = 1:K
         distance = norm(dataset(i,:)-point(j).mean);
         if distance <temp_distance
             temp_distance = distance;
             index = j;    %% update the assigned cluster
         end
    end
    count(index) = count(index)+1;
    point(index).cluster(count(index)) = i;
end

%% trim each cluster's index list to the points assigned this iteration
for i = 1:K
    point(i).cluster = point(i).cluster(1:count(i));
    temp_mean(i,:) = point(i).mean;   % record the previous center
end

%% compute new_mean_point
for i = 1:K
    acc = zeros(1,dim);   % accumulator (renamed from "sum" to avoid shadowing the built-in)
    for j = 1:length(point(i).cluster)
        for n = 1 : dim
            acc(1,n) = acc(1,n) + dataset(point(i).cluster(j),n);        % 1*dim sum of attribute values in this cluster
        end
    end
    point(i).mean = acc./length(point(i).cluster);   % NaN if the cluster is empty -- the non-convergence case
end

%% compute distance between last_mean_point and new_mean_point
delta = 0;
for i =1:K
    delta = delta + norm(temp_mean(i,:)-point(i).mean);
end

if delta <= tol
    flag = 0;
end

iter = iter +1;
%% compute dist
tp_dist = 0;
for i =1:K
    for j = 1:length(point(i).cluster)
        tp_dist = tp_dist + norm(dataset(point(i).cluster(j),:) - point(i).mean);
    end  
end
total_dist = [total_dist;tp_dist];
end
end

1.3 Experimental Results

Cluster centers (rounded to two decimal places): [11.24,11.62]; [9.99,10.05]; [12.06,10.03]

1.3.1 Original labels vs. clustering result

[Figure: original labeled data compared with the K-means clusters]

1.3.2 Convergence of the total clustering distance

[Figure: total distance vs. iteration]
Conventional K-means summary: it is very fast, often converging within a few iterations, but it occasionally fails to converge because of its sensitivity to the initial cluster centers.

2 K-means via Particle Swarm Optimization

The same dataset R3 as in the conventional K-means experiment is used.

2.1 Building the Objective (Fitness) Function

Cast as an optimization problem, K-means becomes: find K cluster centers that minimize the sum of distances from every point to the center of its assigned cluster!
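With $x_i$ denoting the $N$ data points and $c_1,\dots,c_K$ the candidate centers, the objective the particles minimize can be written as:

```latex
\min_{c_1,\dots,c_K} \; J(c_1,\dots,c_K)
  \;=\; \sum_{i=1}^{N} \min_{1 \le k \le K} \lVert x_i - c_k \rVert_2
```

Matching the norm() calls in the MATLAB code, the distances are summed unsquared; classical K-means minimizes the squared version, but the nearest-center assignment rule is the same.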

2.2 Implementing the Fitness Function

This part has two pieces:
1. Assignment: for each candidate set of cluster centers, compute the Euclidean distance from every data point to each center, then assign each point to the nearest center;
2. Total distance (fitness): sum the Euclidean distances from all points to their assigned centers and return total_dist, which is the fitness of that set of centers.
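The decode-assign-sum logic is compact enough to state in a few lines. A minimal NumPy sketch (the function name and toy data are hypothetical) equivalent to assignment plus Fitness for a single particle:

```python
import numpy as np

def particle_fitness(particle, X, K):
    """Decode a flat particle of length K*dim into K cluster centers,
    assign every point to its nearest center, and return the summed
    Euclidean distance (the particle's fitness)."""
    dim = X.shape[1]
    centers = np.asarray(particle, dtype=float).reshape(K, dim)  # decode: one row per center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K) distances
    return d.min(axis=1).sum()  # nearest-center distance for each point, summed

# particle encoding centers (0,0) and (10,10); distances are 0+1+0+1
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
total = particle_fitness([0.0, 0.0, 10.0, 10.0], X, K=2)  # 2.0
```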

2.2.1 Minimum-distance assignment
function [total_dist,res] = assignment(one_pop,dataset,K_C)
[data_num,dim] = size(dataset);

for k = 1:K_C
    res(k).mean(1,1:dim) = one_pop(1,dim*(k-1)+1:dim*k);
end

% assign by the minimum-distance rule
total_dist = 0; 
count = zeros(K_C,1);

for i = 1:data_num
    temp_dist = inf;
    dist_record = zeros(K_C,1);
    for k =1:K_C
        dist_record(k) = norm(dataset(i,1:dim)- res(k).mean(1,1:dim));
        if(temp_dist > dist_record(k))
            count_flag = k;                                         %% record the nearest cluster
            temp_dist = dist_record(k);
        end
    end

    count(count_flag,1) = count(count_flag,1) + 1;
    total_dist = total_dist + dist_record(count_flag);
    res(count_flag).index(count(count_flag,1)) = i;           %% record the data index for this cluster
end

end
2.2.2 Computing the swarm fitness (calls assignment)
function fitness = Fitness(x_pop,dataset,K_C)
[pop_num,~] = size(x_pop);

fitness = zeros(pop_num,1);
% pass in each set of cluster centers
for i = 1:pop_num
    [fitness(i),~] = assignment(x_pop(i,:),dataset,K_C);
end
end

2.3 Main Script (PSO)

PSO algorithm flow:
1. Initialize the swarm positions and velocities, and the parameters c1, c2 and the inertia weight w;
2. Evaluate the initial swarm's fitness and initialize (a) the personal best positions, (b) the personal best fitness values, (c) the global best position, and (d) the global best fitness;
3. Update each particle's velocity (clamping it to the velocity bounds) and position (clamping it to the search bounds);
4. Check and update the personal bests and the global best;
5. Repeat steps 3-4 until convergence or the maximum number of iterations.
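As a compact reference for steps 1-5, here is a generic PSO minimizer in NumPy. This is a sketch, not the MATLAB script below: the function is illustrative, and it uses the common constriction-style parameters w=0.72, c1=c2=1.49 rather than the w=0.9 of the MATLAB code:

```python
import numpy as np

def pso_minimize(f, x_min, x_max, n_particles=30, iters=200,
                 w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimal PSO following steps 1-5: init swarm, track personal/global
    bests, update velocities with inertia + cognitive + social terms,
    clamp velocity and position to their bounds."""
    rng = np.random.default_rng(seed)
    dim = len(x_min)
    v_max = 0.05 * (x_max - x_min)          # velocity bound tied to the search range
    x = rng.uniform(x_min, x_max, (n_particles, dim))   # step 1
    v = rng.uniform(-v_max, v_max, (n_particles, dim))
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])  # step 2
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)       # step 3: velocity bound handling
        x = np.clip(x + v, x_min, x_max)    # step 3: position bound handling
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f             # step 4: update personal bests
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest_f.argmin()
        if pbest_f[g] < gbest_f:            # step 4: update global best
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f

# sanity check on a convex bowl with its minimum at (1, 2)
best, best_f = pso_minimize(lambda p: (p[0]-1)**2 + (p[1]-2)**2,
                            np.array([-5.0, -5.0]), np.array([5.0, 5.0]))
```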

2.3.1 PSO K-means core script

Notes:
1. The particle length is determined by the number of clusters K_C and the data dimension dim: K_C*dim;
2. The global best may stay unchanged between consecutive generations, so the stopping rule is changed to: the summed change over 3 generations must fall below the tolerance, i.e. 3 generations without improvement are required before stopping.
3. The velocity range may need to be tuned per problem, e.g. V_max(j) = 0.05*(x_max(j) - x_min(j)).

clc;clear;close;
%  Particle Swarm Optimization applied to the K-means (minimization) problem
tic;
dataset = load("R3.txt");
dataset = dataset(:,1:2);

[~,dim] = size(dataset);

K_C = 3;         % number of clusters
tol = 10e-6;     % tolerance (note: 10e-6 = 1e-5)

% PSO parameters
pop_num = 100;
iter_max = 100;
c1 = 1.5;
c2 = 1.5;
w = 0.9;      % inertia weight

pos_max = max(dataset);
pos_min = min(dataset);
V_max = [0.5,0.5];
V_min = [-0.5,-0.5];
% for other dimensionalities:
%% V_max = 0.05*(pos_max-pos_min);
%% V_min = -0.05*(pos_max-pos_min);


% pack all K cluster centers into a single particle
for k = 1:K_C
    x_max(1,(k-1)*dim+1:k*dim) = pos_max;
    x_min(1,(k-1)*dim+1:k*dim) = pos_min;
    v_max(1,(k-1)*dim+1:k*dim) = V_max;
    v_min(1,(k-1)*dim+1:k*dim) = V_min;
end
x_pop = zeros(pop_num,K_C*dim);          % initialize particle positions
v_pop = zeros(pop_num,K_C*dim);          % initialize particle velocities
for i = 1:pop_num
        x_pop(i,1:K_C*dim) = rand(1,K_C*dim).*(x_max-x_min)+ x_min.*ones(1,K_C*dim);
        v_pop(i,1:K_C*dim) = rand(1,K_C*dim).*(v_max-v_min)+ v_min.*ones(1,K_C*dim);
end           

fitness = Fitness(x_pop,dataset,K_C);
[~,index] = sort(fitness);
gbest = x_pop(index(1),:);  			 	% global best position
gbest_fitness = fitness(index(1));        	% global best fitness
pbest = x_pop;              				% personal best positions
pbest_fitness = fitness;    				% personal best fitness

fit_iter = zeros(iter_max,1);
fit_iter(1,1) = gbest_fitness; 
iter = 2;
flag = 1;         % convergence flag: cleared once the tolerance is met
cooo = iter_max;  % stopping iteration used for plotting; preset in case the loop never meets tol
while(iter <= iter_max && flag == 1)    
    % update each particle's velocity and position
    for i = 1:pop_num
        % velocity update
        v_pop(i,:) = w*v_pop(i,:) + c1*rand*(pbest(i,:)-x_pop(i,:)) + c2*rand*(gbest-x_pop(i,:)); % rand is uniform on [0,1]
        
        for j = 1:K_C*dim                  % clamp the velocity to its bounds
            if(v_pop(i,j)> v_max(1,j))
                v_pop(i,j) =  v_max(1,j);   
            end
            if(v_pop(i,j)  < v_min(1,j))
                v_pop(i,j) =  v_min(1,j);
            end
        end
        
        % position update (note the extra 0.5 step-size factor)
        x_pop(i,:) = x_pop(i,:) + 0.5 * v_pop(i,:);
       
        for j = 1:K_C*dim                  % clamp positions that leave the search box
            if(x_pop(i,j)> x_max(1,j))
                x_pop(i,j) = x_max(1,j);   % clip to the upper bound
            end
            if(x_pop(i,j)  < x_min(1,j))
                x_pop(i,j) = x_min(1,j) ;  % clip to the lower bound
            end
        end
    end % end of particle loop
    
    % recompute fitness
    fitness = Fitness(x_pop,dataset,K_C);
    for i = 1:pop_num
        % update the personal best position and fitness
        if (fitness(i) < pbest_fitness(i))
            pbest(i,:) = x_pop(i,:);
            pbest_fitness(i) = fitness(i);   
            % a new personal best may also improve the global best
            if (pbest_fitness(i) < gbest_fitness )
                gbest = pbest(i,:);    
                gbest_fitness = pbest_fitness(i);
            end 
        end       
    end % end of particle loop
    fit_iter(iter,1) = gbest_fitness;
  
    delta3 = 0;   % renamed from "sum" to avoid shadowing the built-in
    % summed fitness change over the last 3 generations
    if(  iter > 3 )
        for co = 1:3
            delta3 = delta3 + abs(fit_iter(iter+1-co,1) - fit_iter(iter-co,1));
        end
        if(delta3 < tol)
            flag = 0;
            cooo = iter;
        end
    end
    
    iter = iter + 1;
end  % end of PSO iteration loop

[~,res] = assignment(gbest,dataset,K_C);
toc;

figure(1);
hold on;
plot(fit_iter(1:cooo));
xlabel("iteration");
ylabel('total distance');
title(['PSO k-means optimal distance: ',num2str(fit_iter(cooo))]);
hold off;

2.4 Experimental Results

Cluster centers (two decimal places; order is arbitrary): [11.19,11.65], [10.01,10.01], [12.09,10.00]
[Figure: PSO fitness convergence curve]

PSO-based K-means summary:
1. Advantages: no initial centers need to be chosen, and convergence is stable toward the global optimum.
2. Disadvantage: the computational cost is higher than the conventional method.
