K-Means Cluster Analysis

⁠小华

Tags: support vector machine, machine learning, artificial intelligence

Article link: https://blog.csdn.net/qq_73963438/article/details/134312028

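Introduction:

K-means is a clustering algorithm that finds its solution iteratively. It is a machine learning technique and an unsupervised learning method.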

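Clustering vs. classification:

In classification, the target classes are fixed and known in advance. In clustering, the targets are not known beforehand, and the resulting groups have no predefined meaning.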

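The clustering process:

1. Data preparation: standardize the features and reduce dimensionality.

2. Feature selection: pick the most informative of the original features and store them in a vector.

3. Feature extraction: transform the selected features into new, more prominent features.

4. Grouping: choose a suitable distance function (or construct your own), measure the distances between samples, and group them accordingly.

5. Evaluation of the clustering result: there are three kinds, namely external validity evaluation, internal validity evaluation, and correlation-test evaluation.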
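As a minimal sketch of steps 1 and 4 above, the snippet below standardizes each feature to zero mean and unit variance (z-score) and then measures Euclidean distances. The matrix X and all variable names are made up for illustration; the post itself does not prescribe a particular standardization or distance function.

```matlab
% Sketch of step 1 (standardization) and step 4 (distance measure).
% X is a made-up 4-by-2 example matrix, one sample per row.
X=[1 200; 2 400; 3 600; 4 800];
m=size(X,1);
mu=mean(X,1);                 % column means
sigma=std(X,0,1);             % column standard deviations (normalized by m-1)
sigma(sigma==0)=1;            % avoid dividing by zero for constant features
Z=(X-repmat(mu,m,1))./repmat(sigma,m,1);   % z-score of every feature
% Euclidean distance from every standardized sample to the first one
d=sqrt(sum((Z-repmat(Z(1,:),m,1)).^2,2));
```

Without standardization the second feature, whose values are hundreds of times larger, would dominate the distance; after it, both features contribute equally.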

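Requirements for cluster analysis:

Different clustering algorithms suit different scenarios: some can handle large data sets, while others are better suited to small ones.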

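Steps of the K-means algorithm:

(1) Randomly select K objects as the initial cluster centers.

(2) Compute the distance between each object and every cluster center (seed), and assign each object to the nearest center.

(3) Each cluster center, together with the objects assigned to it, represents one cluster. Each time a sample is assigned, the center of its cluster is recomputed from the objects currently in that cluster. (The code below uses the batch variant: centers are recomputed once per iteration, after all samples have been assigned.)

(4) Repeat steps (2) and (3) until the sum of squared clustering errors reaches a local minimum.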
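The stopping criterion in step (4) refers to the within-cluster sum of squared errors. In the usual formulation it can be written as follows, where the symbols are introduced here for illustration: $x_i$ is a sample, $C_k$ is the k-th cluster, and $\mu_k$ is its center.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

In the batch form of the algorithm, reassigning samples to their nearest center and then replacing each center by the mean of its cluster can never increase $J$, which is why the iteration settles at a local (not necessarily global) minimum.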

Code implementation:

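function [index_cluster,cluster] = kmeans_func(data,cluster_num)
%% K-means clustering implemented from first principles
% Save this function in its own file, kmeans_func.m.
% data: m-by-n matrix, one sample per row; cluster_num: number of clusters K
% index_cluster: 1-by-m labels in 1..cluster_num; cluster: K-by-n centers
m=size(data,1);
cluster=data(randperm(m,cluster_num),:); % pick cluster_num of the m points at random as initial centers
epoch_max=1000;    % maximum number of iterations
thresh_lim=0.001;  % threshold on the total movement of the centers
epoch_num=0;
distance1=zeros(m,cluster_num);
while(epoch_num<epoch_max)
    epoch_num=epoch_num+1;
    % distance1(:,i) holds the Euclidean distance from every point to center i
    for i=1:cluster_num
        distance=(data-repmat(cluster(i,:),m,1)).^2;
        distance1(:,i)=sqrt(sum(distance,2)); % sum along columns, also correct for n=1
    end
    [~,index_cluster]=min(distance1,[],2); % labels take values 1..cluster_num
    index_cluster=index_cluster';          % keep the 1-by-m row orientation
    % cluster_new stores the updated centers
    cluster_new=cluster; % a cluster with no members keeps its previous center
    for j=1:cluster_num
        members=(index_cluster==j);
        if any(members)
            cluster_new(j,:)=mean(data(members,:),1); % mean over rows, even with a single member
        end
    end
    % If the summed distance between new and old centers exceeds thresh_lim,
    % update the centers; otherwise the algorithm stops
    if (sum(sqrt(sum((cluster_new-cluster).^2,2)))>thresh_lim)
        cluster=cluster_new;
    else
        break;
    end
end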
%% Main script: save in its own file, separate from kmeans_func.m
clc;clear;close all;
data(:,1)=[90,35,52,83,64,24,49,92,99,45,19,38,1,71,56,97,63,...
    32,3,34,33,55,75,84,53,15,88,66,41,51,39,78,67,65,25,40,77,...
    13,69,29,14,54,87,47,44,58,8,68,81,31];
data(:,2)=[33,71,62,34,49,48,46,69,56,59,28,14,55,41,39,...
    78,23,99,68,30,87,85,43,88,2,47,50,77,22,76,94,11,80,...
    51,6,7,72,36,90,96,44,61,70,60,75,74,63,40,81,4];
figure(1)
scatter(data(:,1),data(:,2),'LineWidth',2)
title('Scatter plot of the raw data')
cluster_num=4;
[index_cluster,cluster] = kmeans_func(data,cluster_num);
%% Plot the clustering result
figure(2)
hold on
for j=1:cluster_num
    % select the samples labeled j directly, so an empty cluster cannot cause an indexing error
    data_get=data(index_cluster==j,:);
    if isempty(data_get)
        continue;
    end
    scatter(data_get(:,1),data_get(:,2),100,'filled','MarkerFaceAlpha',.6,'MarkerEdgeAlpha',.9);
end
% Plot the cluster centers as black squares
plot(cluster(:,1),cluster(:,2),'ks','LineWidth',2);
% Mean silhouette coefficient (silhouette needs the Statistics and Machine Learning Toolbox)
sc_t=mean(silhouette(data,index_cluster(:)));
title_str=['K-means from first principles','  number of clusters: ',num2str(cluster_num),'  silhouette coefficient: ',num2str(sc_t)];
title(title_str)
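Because the initial centers are chosen at random and step (4) only reaches a local minimum, separate runs can produce different partitions. One common way to compare runs is the within-cluster sum of squared errors. The helper below is a sketch under that assumption: cluster_sse is a hypothetical name introduced here, not part of the original code, and it expects labels and centers in the shapes returned by kmeans_func.

```matlab
function sse = cluster_sse(data,index_cluster,cluster)
% cluster_sse (hypothetical helper): sum of squared Euclidean distances
% from every sample to the center of the cluster it is assigned to.
% data: m-by-n samples; index_cluster: m labels; cluster: K-by-n centers
assigned=cluster(index_cluster(:),:);   % m-by-n, the center belonging to each sample
sse=sum(sum((data-assigned).^2,2));
end
```

A possible use, assuming data and cluster_num from the main script are in the workspace, is to run the clustering several times and keep the run with the lowest error (the count of 10 restarts is arbitrary):

```matlab
best_sse=inf;
for r=1:10
    [idx,c]=kmeans_func(data,cluster_num);
    s=cluster_sse(data,idx,c);
    if s<best_sse
        best_sse=s; best_idx=idx; best_c=c;   % remember the best run so far
    end
end
```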
