Clustering Algorithms 4: DBSCAN Density Clustering (Algorithm Steps and MATLAB Code)

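After reading the clustering chapter of the "watermelon book" (西瓜书), I found the principle of the algorithm easy to follow. The next step is to organize it into my own line of reasoning and then implement it step by step, so let's get started.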

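Density Clustering Algorithm

  • Concept

Density clustering examines how samples can be connected from the viewpoint of sample density: how tightly the samples are distributed characterizes the cluster structure.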

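  • Terminology

Core object: if the Δd-neighborhood of sample x_j contains at least MinPts samples, x_j is called a core object.

Directly density-reachable: if x_i lies in the Δd-neighborhood of x_j and x_j is a core object, x_i is directly density-reachable from x_j.

Density-reachable: for x_j and x_i, if there is a sample sequence p_1, p_2, …, p_n with p_1 = x_j and p_n = x_i such that each p_{k+1} is directly density-reachable from p_k, then x_i is density-reachable from x_j.

Density-connected: for x_j and x_i, if there is an x_k such that both x_i and x_j are density-reachable from x_k, then x_j and x_i are density-connected.

DBSCAN defines a cluster as the maximal set of density-connected samples derived from the density-reachability relation.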
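To make the definitions concrete, here is a minimal MATLAB sketch on a made-up toy data set. The data, the radius `delta_d = 0.16` and `min_pts = 4` are assumptions chosen for illustration, and MinPts is taken to count the sample itself, as the code later in this post does. It shows that direct density-reachability is not symmetric: a border sample can be reached from a core object, but not the other way round. The subtraction relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Toy 2-D data (hypothetical, for illustration only)
X = [0.00 0.00;
     0.10 0.00;
     0.00 0.10;
     0.10 0.10;
     0.25 0.10;    % border sample: close to the dense group, but not dense itself
     1.00 1.00];   % isolated sample
delta_d = 0.16;    % neighborhood radius (assumed value)
min_pts = 4;       % MinPts, counting the sample itself (assumed value)

% Pairwise Euclidean distances, computed without toolbox functions
n = size(X, 1);
D = zeros(n);
for i = 1:n
    D(:, i) = sqrt(sum((X - X(i, :)).^2, 2));
end

in_nbhd = D <= delta_d;                  % in_nbhd(i,j): x_i lies in the neighborhood of x_j
is_core = sum(in_nbhd, 1)' >= min_pts;   % core objects

% direct(i,j) is true when x_i is directly density-reachable from x_j,
% i.e. x_i is in the neighborhood of x_j AND x_j is a core object
direct = in_nbhd & is_core';

disp(is_core');
fprintf('x5 from x4: %d, x4 from x5: %d\n', direct(5,4), direct(4,5));
```

Samples 1 to 4 come out as core objects; sample 5 is directly density-reachable from sample 4, while sample 4 is not directly density-reachable from sample 5 because sample 5 is not a core object.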

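  • Algorithm steps

Input: sample set, neighborhood parameters (neighborhood radius Δd, minimum number of samples in the neighborhood MinPts)

Output: cluster partition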

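Step 1. Search for the core object set: search_objects()

Input: sample set D, neighborhood parameters; output: core object set

Step 1.1 Load the data set; initialize the core object set and the neighborhood parameters

Step 1.2 Traverse the samples, test each one against the neighborhood parameters, and add every core object found to the core object set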
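Step 1 can be sketched as a single pass over the samples. The variant below is only an illustration under assumed values (toy data, `delta_d = 0.05`, `min_pts = 4`); unlike search_objects() in the listing further down, it records row indices (`core_idx`, a name introduced here) instead of copying rows, which avoids having to match rows by exact floating-point equality later on.

```matlab
% Hypothetical toy data: a dense group of five samples plus two stragglers
X = [0.40 0.20; 0.42 0.22; 0.44 0.20; 0.42 0.18; 0.43 0.21; ...
     0.70 0.40; 0.10 0.05];
delta_d = 0.05;   % neighborhood radius (assumed value)
min_pts = 4;      % MinPts, counting the sample itself (assumed value)

core_idx = zeros(0, 1);               % indices of the core objects found so far
for i = 1:size(X, 1)
    d = sqrt(sum((X - X(i, :)).^2, 2));   % distances from sample i to every sample
    if nnz(d <= delta_d) >= min_pts       % neighborhood size, including sample i
        core_idx(end+1, 1) = i;
    end
end
core_set = X(core_idx, :);            % the core objects themselves, if rows are needed

disp(core_idx');
```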

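Step 2. Density clustering: density_clustering()

Input: core object set O, sample set D, neighborhood parameters

Output: number of clusters k, cluster partition C

Step 2.1 Initialize the cluster count k = 0 and the set of unvisited samples T = D

Step 2.2 repeat_Objects_clustering(): repeatedly take a core object from the core object set and grow a cluster from it, following the loop below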

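k = 0; T = D
while O ≠ ∅
    T_old = T    (record the samples that are currently unvisited)
    pick a core object o ∈ O at random; initialize the queue Q = <o>
    T = T \ {o}
    while Q ≠ ∅
        take the first sample q out of Q
        if the neighborhood of q contains at least MinPts samples
            Δ = (neighborhood samples of q) ∩ T
            append the samples in Δ to Q
            T = T \ Δ
        end if
    end while
    k = k + 1; generate cluster C_k = T_old \ T
    O = O \ C_k
end while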
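The loop above can also be sketched with index labels instead of explicit set operations. This is not the implementation used in this post, only a compact illustration under assumptions: the toy data, `delta_d = 0.16` and `min_pts = 4` are made up, the names `nbhd`, `labels` and `unvisited` are introduced here, and the first remaining core object is picked instead of a random one so that the run is reproducible. Samples that end with label 0 belong to no cluster (noise).

```matlab
% Hypothetical toy data: two dense groups, one border sample, one outlier
X = [0.00 0.00; 0.10 0.00; 0.00 0.10; 0.10 0.10; 0.25 0.10; ...
     1.00 1.00; 1.10 1.00; 1.00 1.10; 1.10 1.10; 2.00 0.00];
delta_d = 0.16;   % neighborhood radius (assumed value)
min_pts = 4;      % MinPts, counting the sample itself (assumed value)
n = size(X, 1);

% nbhd{i}: indices of all samples within delta_d of sample i (including i)
nbhd = cell(n, 1);
for i = 1:n
    nbhd{i} = find(sqrt(sum((X - X(i, :)).^2, 2)) <= delta_d);
end
is_core = cellfun(@numel, nbhd) >= min_pts;

labels = zeros(n, 1);        % cluster index per sample; 0 = not in any cluster
unvisited = true(n, 1);      % the set T
core_left = find(is_core);   % the set O
k = 0;
while ~isempty(core_left)
    k = k + 1;
    o = core_left(1);        % deterministic pick; the pseudocode picks at random
    Q = o;                   % queue of samples to expand
    unvisited(o) = false;
    labels(o) = k;
    while ~isempty(Q)
        q = Q(1);
        if is_core(q)
            delta = nbhd{q}(unvisited(nbhd{q}));   % neighborhood of q, intersected with T
            Q = [Q; delta];
            unvisited(delta) = false;              % T = T \ delta
            labels(delta) = k;
        end
        Q(1) = [];           % remove q from the queue
    end
    core_left = core_left(labels(core_left) == 0); % O = O \ C_k
end

disp(labels');
```

On this toy data the sketch finds two clusters: samples 1 to 5 (the border sample 5 is absorbed through core object 4) and samples 6 to 9, with sample 10 left as noise.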


OK, the approach I chose is minimum-distance clustering. Without further ado, here is the code (in MATLAB publish format).

function Main()
clc
clear
close all
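%step1: load the data and search for the core objects
melon_data = load('melon4.0.txt');
melon_data(:,1)=[]; % drop the first column; the remaining columns are the features
global delta_dist;
global min_pst;
delta_dist = 0.11; min_pst = 5; % neighborhood radius and MinPts
object_set = search_objects(melon_data); %ok
%step2: grow the clusters from the core objects
[k_class,cluster_set]= repeat_Objects_clustering(melon_data,object_set);%ok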
show(melon_data,cluster_set);
plot(object_set(:,1),object_set(:,2),'ro',...
    'MarkerEdgeColor','k',...
    'MarkerFaceColor','g',...
    'MarkerSize',4)
fprintf('Number of density clusters: %d\n',k_class);
end

subfunction

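%step1
function object_set = search_objects(melon_data)
% collect every sample whose neighborhood holds at least min_pst samples
object_set = [];
for i = 1:size(melon_data,1) % iterate over the rows (samples), not the longest dimension
    [is_object, ~] = core_engin(melon_data,melon_data(i,:));
    if is_object % including xi itself
        object_set = [object_set;melon_data(i,:)];
    end
end
end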
%core engine
function [is_object, xi_object_samples] = core_engin(melon_data,xi_data)
% decide whether xi is a core object and return the samples in its neighborhood
global delta_dist;
global min_pst;
is_object = 0;
xi_dist =  pdist2(melon_data,xi_data);
min_pst_ind= find(xi_dist<=delta_dist);
if length(min_pst_ind) >= min_pst % including xi itself
   is_object =1;
   xi_object_samples = melon_data(min_pst_ind,:);
else
    xi_object_samples =[];
end
end
%step2
function [k_class,cluster_set]= repeat_Objects_clustering(melon_data,object_set)
cluster_set.k_rows =[];
cluster_set.cluster =[];
k=0;
t_data = melon_data;
while ~isempty(object_set)
    t_old_data = t_data; % samples not yet visited
    [OS_rows,~] = size(object_set);
    Q = object_set(randi(OS_rows),:);
    del_ind = search_same_data(t_data,Q);
    t_data(del_ind,:)=[];
    while ~isempty(Q)
        [is_object, xi_object_samples] = core_engin(melon_data,Q(1,:));
        if is_object %>=min_pst
            delta_sample= set_across(t_data,xi_object_samples);
            if ~isempty(delta_sample)
                Q =[Q;delta_sample];
                t_data = set_diff(t_data,delta_sample);
            end
        end
        Q(1,:)=[];
    end
    k=k+1;
    cur_cluster = set_diff(t_old_data,t_data); % C_k = T_old \ T
    object_set = set_diff(object_set,cur_cluster); % O = O \ C_k
    %store
    [cur_cluster_rows,~] = size(cur_cluster);
    cluster_set.k_rows =[cluster_set.k_rows;cur_cluster_rows];
    cluster_set.cluster =[cluster_set.cluster;cur_cluster];
end
k_class = k;
end
function output_data = set_across(act_data,pas_data)
% this function is doing output_data = act_data ∩pas_data
output_data = [];
[PD_rows,~] = size(pas_data);
for i =1:PD_rows
    delta_ind = search_same_data(act_data,pas_data(i,:));
    if ~isempty(delta_ind)
       output_data = [output_data;pas_data(i,:)];
    else
        continue;
    end
end

end

function output_data = set_diff(act_data,pas_data)
% set operation: output_data = act_data \ pas_data (remove the rows of pas_data from act_data)
[m,~] = size(pas_data);
for i= 1:m
    delta_ind = search_same_data(act_data,pas_data(i,:));
    if ~isempty(delta_ind)
       act_data(delta_ind,:) =[];
    else
        continue;
    end
end
output_data = act_data;
end

function zero_ind = search_same_data(data,xi_data)
    dist = pdist2(data,xi_data);
    zero_ind = find(dist==0);
end


function show(melon_data,cluster_set)
% plot all samples, then overlay each cluster with its own marker
plot(melon_data(:,1),melon_data(:,2),'+b');
hold on
markers = {'or','sg','^k','pm'}; % reused cyclically if there are more than four clusters
cum_rows = [0;cumsum(cluster_set.k_rows)];
for c = 1:numel(cluster_set.k_rows)
    rows = cum_rows(c)+1:cum_rows(c+1); % rows of cluster c in cluster_set.cluster
    plot(cluster_set.cluster(rows,1),cluster_set.cluster(rows,2),markers{mod(c-1,numel(markers))+1});
end
xlabel('density');ylabel('sugar rate');
end
Output: the number of density clusters is 4
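As a side note, the hand-written helpers set_across, set_diff and search_same_data have close counterparts among MATLAB's built-in set functions when used with the 'rows' option. The sketch below is only an illustration on made-up matrices `A` and `B`; one difference to keep in mind is that the built-ins return each distinct row once, whereas the helpers above keep duplicate rows.

```matlab
% Hypothetical row sets (each row is one sample)
A = [0.1 0.2; 0.3 0.4; 0.5 0.6; 0.7 0.8];
B = [0.5 0.6; 0.1 0.2; 0.9 0.9];

common = intersect(A, B, 'rows', 'stable');   % rows of A that also occur in B, in A's order
rest   = setdiff(A, B, 'rows', 'stable');     % rows of A that do not occur in B, in A's order
in_B   = ismember(A, B, 'rows');              % logical mask over the rows of A

disp(common);
disp(rest);
disp(in_B');
```

Like the `dist==0` test in search_same_data, these functions compare rows for exact equality, so they behave the same way on rows copied from the same data matrix.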
