DBSCAN is a density-based clustering algorithm. Its main procedure is as follows:
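DBSCAN(D, eps, MinPts)
    C = 0                                        // cluster label counter
    for each unvisited point P in dataset D      // iterate over the dataset
        mark P as visited
        NeighborPts = regionQuery(P, eps)        // compute P's eps-neighborhood
        if sizeof(NeighborPts) < MinPts          // too few neighbors: P cannot be a core point
            mark P as NOISE                      // label as noise for now (it may later join a cluster as a border point)
        else                                     // P is a core point: start a new cluster from it
            C = next cluster
            expandCluster(P, NeighborPts, C, eps, MinPts)   // grow the cluster from this core point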

expandCluster(P, NeighborPts, C, eps, MinPts)
    add P to cluster C                           // the core point joins the cluster first
    for each point P' in NeighborPts             // then process every point in the core point's neighborhood
        if P' is not visited
            mark P' as visited
            NeighborPts' = regionQuery(P', eps)
            if sizeof(NeighborPts') >= MinPts    // P' is also a core point: extend the cluster through it
                NeighborPts = NeighborPts joined with NeighborPts'   // the loop also covers the newly added points
        if P' is not yet member of any cluster   // e.g. a point earlier marked as NOISE
            add P' to cluster C                  // non-core points added here are border points

regionQuery(P, eps)                              // compute the eps-neighborhood
    return all points within P's eps-neighborhood (including P itself)
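The pseudocode above can be translated almost line for line into a small runnable program. The following Python is a minimal sketch, not a reference implementation. It assumes Euclidean distance and an inclusive `<= eps` test, and it assumes a neighborhood that counts the query point itself, which affects how `MinPts` is interpreted. It also assumes integer labels: `-1` for noise, `0` for "not yet assigned", and `1, 2, ...` for clusters. The names `dbscan`, `expand_cluster`, `region_query`, `NOISE` and `UNASSIGNED` are my own illustrative choices, not any library's API.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1       # assumed label for noise points
UNASSIGNED = 0   # assumed label for "not yet member of any cluster"; clusters are 1, 2, ...


def region_query(points: Sequence[Point], p: int, eps: float) -> List[int]:
    """regionQuery: indices of all points within eps of points[p] (Euclidean), including p."""
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]


def expand_cluster(points: Sequence[Point], p: int, neighbors: List[int], c: int,
                   eps: float, min_pts: int, visited: List[bool], labels: List[int]) -> None:
    labels[p] = c                      # add P to cluster C
    seen = set(neighbors)              # lets "joined with" behave as a set union
    i = 0
    while i < len(neighbors):          # neighbors may grow while we iterate over it
        q = neighbors[i]
        if not visited[q]:
            visited[q] = True
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:       # q is also a core point
                for r in q_neighbors:
                    if r not in seen:
                        seen.add(r)
                        neighbors.append(r)
        if labels[q] in (UNASSIGNED, NOISE):      # not yet member of any cluster
            labels[q] = c                         # noise can become a border point here
        i += 1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    n = len(points)
    visited = [False] * n
    labels = [UNASSIGNED] * n
    c = 0
    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = region_query(points, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE                     # may be relabeled later by expand_cluster
        else:
            c += 1                                # C = next cluster
            expand_cluster(points, p, neighbors, c, eps, min_pts, visited, labels)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [1, 1, 1, 1, 2, 2, 2, 2, -1]
```

On the line `[(0, 0), (1, 0), (2, 0), (3, 0)]` with `eps=1.0, min_pts=3`, point 0 is first marked as noise. It is then absorbed into cluster 1 as a border point once its core neighbor (1, 0) expands the cluster. Because `region_query` here is a brute-force linear scan, the whole run is O(n²) distance computations. A spatial index would be the usual way to speed up that step.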
Read this alongside the pseudocode from Baidu Baike (百度百科):