聚类算法之概率密度函数 optics

 什么是OPTICS算法

在前面介绍的DBSCAN算法中,有两个初始参数E(邻域半径)和minPts(E邻域最小点数)需要用户手动设置输入,并且聚类的类簇结果对这两个参数的取值非常敏感,不同的取值将产生不同的聚类结果,其实这也是大多数其他需要初始化参数聚类算法的弊端。

为了克服DBSCAN算法这一缺点,提出了OPTICS算法(Ordering Points to identify the clustering structure)。OPTICS并 不显示的产生结果类簇,而是为聚类分析生成一个增广的簇排序(比如,以可达距离为纵轴,样本点输出次序为横轴的坐标图),这个排序代表了各样本点基于密度 的聚类结构。它包含的信息等价于从一个广泛的参数设置所获得的基于密度的聚类,换句话说,从这个排序中可以得到基于任何参数EminPtsDBSCAN算法的聚类结果。     

 OPTICS两个概念

核心距离

对象p的核心距离是指是p成为核心对象的最小E’。如果p不是核心对象,那么p的核心距离没有任何意义。

可达距离

对象q到对象p的可达距离是指p的核心距离和pq之间欧几里得距离之间的较大值。如果p不是核心对象,pq之间的可达距离没有意义。

例如:假设邻域半径E=2, minPts=3,存在点A(2,3),B(2,4),C(1,4),D(1,3),E(2,2),F(3,2)

A为核心对象,在AE领域中有点{A,B,C,D,E,F},其中A的核心距离为E’=1,因为在点AE’邻域中有点{A,B,D,E}>3;

F到核心对象点A的可达距离为2^(1/2),因为AF的欧几里得距离2^(1/2),大于点A的核心距离1.


3 算法描述

OPTICS算法额外存储了每个对象的核心距离和可达距离。基于OPTICS产生的排序信息来提取类簇。

算法描述如下:

算法:OPTICS

输入:样本集D, 邻域半径E, 给定点在E领域内成为核心对象的最小领域点数MinPts

输出:具有可达距离信息的样本点输出排序

方法:1 创建两个队列,有序队列和结果队列。(有序队列用来存储核心对象及其该核心对

         象的直接可达对象,并按可达距离升序排列;结果队列用来存储样本点的输出次

         序);

       2 如果所有样本集D中所有点都处理完毕,则算法结束。否则,选择一个未处理(即

不在结果队列中)且为核心对象的样本点,找到其所有直接密度可达样本点,如

过该样本点不存在于结果队列中,则将其放入有序队列中,并按可达距离排序;

      3 如果有序队列为空,则跳至步骤2,否则,从有序队列中取出第一个样本点(即可

达距离最小的样本点)进行拓展,并将取出的样本点保存至结果队列中,如果它不

存在结果队列当中的话。

 3.1 判断该拓展点是否是核心对象,如果不是,回到步骤3,否则找到该拓展点所

有的直接密度可达点;

3.2 判断该直接密度可达样本点是否已经存在结果队列,是则不处理,否则下一

步;

3.2 如果有序队列中已经存在该直接密度可达点,如果此时新的可达距离小于旧

的可达距离,则用新可达距离取代旧可达距离,有序队列重新排序;

3.3 如果有序队列中不存在该直接密度可达样本点,则插入该点,并对有序队列

   重新排序;

       4 算法结束,输出结果队列中的有序样本点。

大家或许会很疑惑,这里不也有输入参数EMinPts吗?其实这里的EMinPts只是起到算法辅助作用,也就是说EMinPts的细微变化并不会影响到样本点的相对输出顺序,这对我们分析聚类结果是没有任何影响。

我们采用与先前DBSCAN相同的样本点集合,

对于样本点

a={2,3};b={2,4};c={1,4};d={1,3};e={2,2};f={3,2};

g={8,7};h={8,6};i={7,7};j={7,6};k={8,5};

l={100,2};//孤立点

m={8,20};n={8,19};o={7,18};p={7,17};q={8,21};

并且使用相同的E=2 MinPts=4时,输出序列为

1->a:1.0

2->e:1.0

3->b:1.0

4->d:1.0

5->c:1.4142135623730951

6->f:1.4142135623730951

------

7->g:1.4142135623730951

8->j:1.4142135623730951

9->k:1.4142135623730951

10->i:1.4142135623730951

11->h:1.4142135623730951

------

12->n:2.0

13->q:2.0

14->o:2.0

15->m:2.0


按照算法,分三个阶段输出了三波值

{a,e,b,d,c,f} ,{g,j,k,I,h},{n,q,o,m}

这和DBSCAN的类簇结果是一样的。不仅如此,我们通过分析有序图还能直接得到当参数E=1.5,minPts=4DBSCAN的类簇结果,只要在坐标图中找到Y值小于1.5的样本点即可,只有两类{a,e,b,d,c,f} ,{g,j,k,I,h},其他点被认为是孤立点,和DBSCAN聚类算法取E=1.5,minPts=4时的结果一致。

所以说,这个OPTICS聚类算法所得的簇排序信息等价于一个广泛的参数设置所获得的基于密度的聚类结果。

具体实现算法如下:

package com.optics;

public class DataPoint {
    private String name; // 样本点名
    private double dimensioin[]; // 样本点的维度
    private double coreDistance; //核心距离,如果该点不是核心对象,则距离为-1
    private double reachableDistance; //可达距离

    public DataPoint(){
    }

    public DataPoint(DataPoint e){
        this.name=e.name;
        this.dimensioin=e.dimensioin;
        this.coreDistance=e.coreDistance;
        this.reachableDistance=e.reachableDistance;
    }

    public DataPoint(double dimensioin[],String name){
        this.name=name;
        this.dimensioin=dimensioin;
        this.coreDistance=-1;
        this.reachableDistance=-1;
    }

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public double[] getDimensioin() {
        return dimensioin;
    }
    public void setDimensioin(double[] dimensioin) {
        this.dimensioin = dimensioin;
    }
    public double getCoreDistance() {
        return coreDistance;
    }
    public void setCoreDistance(double coreDistance) {
        this.coreDistance = coreDistance;
    }
    public double getReachableDistance() {
        return reachableDistance;
    }
    public void setReachableDistance(double reachableDistance) {
        this.reachableDistance = reachableDistance;
    }
}
package com.optics;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ClusterAnalysis {
    class ComparatorDp implements Comparator<DataPoint>{
        public int compare(DataPoint arg0, DataPoint arg1) {
            double temp=arg0.getReachableDistance()-arg1.getReachableDistance();
            int a = 0;
            if (temp < 0) {
                a = -1;
            } else {
                a = 1;
            }
            return a;
        }
    }

   
    public List<DataPoint> startAnalysis(List<DataPoint> dataPoints,
            double radius, int ObjectNum) {
        List<DataPoint> dpList = new ArrayList<DataPoint>();
        List<DataPoint> dpQue = new ArrayList<DataPoint>();

        int total = 0;
        while (total < dataPoints.size()) {
            if (isContainedInList(dataPoints.get(total), dpList) == -1 ) {
                List<DataPoint> tmpDpList = isKeyAndReturnObjects(dataPoints.get(total),
                        dataPoints, radius, ObjectNum);
                if(tmpDpList != null && tmpDpList.size() > 0){
                    DataPoint newDataPoint=new DataPoint(dataPoints.get(total));
                   dpQue.add(newDataPoint);
                }
            }
            while (!dpQue.isEmpty()) {
                DataPoint tempDpfromQ = dpQue.remove(0);
                DataPoint newDataPoint=new DataPoint(tempDpfromQ);
                dpList.add(newDataPoint);
                List<DataPoint> tempDpList = isKeyAndReturnObjects(tempDpfromQ,
                        dataPoints, radius, ObjectNum);
               System.out.println(newDataPoint.getName()+":"+newDataPoint.getReachableDistance());
                if (tempDpList != null && tempDpList.size() > 0) {
                    for (int i = 0; i < tempDpList.size(); i++) {
                        DataPoint tempDpfromList = tempDpList.get(i);
                        int indexInList = isContainedInList(tempDpfromList,
                                dpList);
                        int indexInQ = isContainedInList(tempDpfromList, dpQue);
                        if (indexInList == -1) {
                            if (indexInQ > -1) {
                                int index = -1;
                                for (DataPoint dataPoint : dpQue) {
                                    index++;
                                    if (index == indexInQ) {
                                        if (dataPoint.getReachableDistance() > tempDpfromList
                                               .getReachableDistance()) {
                                            dataPoint
                                                   .setReachableDistance(tempDpfromList
                                                           .getReachableDistance());
                                        }
                                    }
                                }
                            } else {
                                dpQue.add(new DataPoint(tempDpfromList));
                            }
                        }
                    }

                    // TODO:对Q进行重新排序
                    Collections.sort(dpQue, new ComparatorDp());
                }
            }
            System.out.println("------");
            total++;

        }

        return dpList;
    }

   
    public void displayDataPoints(List<DataPoint> dps){
        for(DataPoint dp: dps){
            System.out.println(dp.getName()+":"+dp.getReachableDistance());
        }
    }

   
    private int isContainedInList(DataPoint dp, List<DataPoint> dpList) {
        int index = -1;
        for (DataPoint dataPoint : dpList) {
            index++;
            if (dataPoint.getName().equals(dp.getName())) {
                return index;
            }
        }
        return -1;
    }

   
   private List<DataPoint> isKeyAndReturnObjects(DataPoint dataPoint,List<DataPoint> dataPoints,double radius,int ObjectNum){
       List<DataPoint> arrivableObjects=new ArrayList<DataPoint>(); //用来存储所有直接密度可达对象
       List<Double> distances=new ArrayList<Double>(); //欧几里得距离
       double coreDistance; //核心距离

        for (int i = 0; i < dataPoints.size(); i++) {
            DataPoint dp = dataPoints.get(i);
            double distance = getDistance(dataPoint, dp);
            if (distance <= radius) {
                distances.add(distance);
                arrivableObjects.add(dp);
            }
        }

       if(arrivableObjects.size()>=ObjectNum){
           List<Double> newDistances=new ArrayList<Double>(distances);
           Collections.sort(distances);
           coreDistance=distances.get(ObjectNum-1);
           for(int j=0;j<arrivableObjects.size();j++){
                if (coreDistance > newDistances.get(j)) {
                    if(newDistances.get(j)==0){
                        dataPoint.setReachableDistance(coreDistance);
                    }
                    arrivableObjects.get(j).setReachableDistance(coreDistance);
                }else{
                   arrivableObjects.get(j).setReachableDistance(newDistances.get(j));
                }
           }
           return arrivableObjects;
       }

       return null;
   }

   
    private double getDistance(DataPoint dp1,DataPoint dp2){
        double distance=0.0;
        double[] dim1=dp1.getDimensioin();
        double[] dim2=dp2.getDimensioin();
        if(dim1.length==dim2.length){
            for(int i=0;i<dim1.length;i++){
                double temp=Math.pow((dim1[i]-dim2[i]), 2);
                distance=distance+temp;
            }
            distance=Math.pow(distance, 0.5);
            return distance;
        }
        return distance;
    }

    public static void main(String[] args){
         ArrayList<DataPoint> dpoints = new ArrayList<DataPoint>();
        
         double[] a={2,3};
         double[] b={2,4};
         double[] c={1,4};
         double[] d={1,3};
         double[] e={2,2};
         double[] f={3,2};

         double[] g={8,7};
         double[] h={8,6};
         double[] i={7,7};
         double[] j={7,6};
         double[] k={8,5};

         double[] l={100,2};//孤立点


         double[] m={8,20};
         double[] n={8,19};
         double[] o={7,18};
         double[] p={7,17};
         double[] q={8,21};

         dpoints.add(new DataPoint(a,"a"));
         dpoints.add(new DataPoint(b,"b"));
         dpoints.add(new DataPoint(c,"c"));
         dpoints.add(new DataPoint(d,"d"));
         dpoints.add(new DataPoint(e,"e"));
         dpoints.add(new DataPoint(f,"f"));

         dpoints.add(new DataPoint(g,"g"));
         dpoints.add(new DataPoint(h,"h"));
         dpoints.add(new DataPoint(i,"i"));
         dpoints.add(new DataPoint(j,"j"));
         dpoints.add(new DataPoint(k,"k"));

         dpoints.add(new DataPoint(l,"l"));

         dpoints.add(new DataPoint(m,"m"));
         dpoints.add(new DataPoint(n,"n"));
         dpoints.add(new DataPoint(o,"o"));
         dpoints.add(new DataPoint(p,"p"));
         dpoints.add(new DataPoint(q,"q"));

         ClusterAnalysis ca=new ClusterAnalysis();
         List<DataPoint> dps=ca.startAnalysis(dpoints, 2, 4);
         ca.displayDataPoints(dps);
    }

}

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值