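If you long for something, set it free. If it comes back to you, it is yours; if it does not, you never had it. (Alexandre Dumas, The Count of Monte Cristo)
Life is a mirror, and the first thing we strive for is to recognize ourselves in it. (Nietzsche)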
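### 1 Clustering concepts
For some related clustering concepts, see here.
Clustering is the process of grouping a collection of physical or abstract objects. The resulting groups are called clusters, and a cluster is a collection of data objects. Two objects inside the same cluster should be highly similar, while two objects from different clusters should be highly dissimilar. Dissimilarity is generally computed from the attribute values that describe the two objects, and the most commonly used measure is the distance between them.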
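### 2 Introduction to clustering algorithms
#### 2-1 KMeans (K-means)
KMeans is a prototype-based, partitional clustering technique that tries to find a user-specified number $k$ of clusters.
The basic idea of the K-means algorithm is to cluster around $k$ points in space that serve as centers, assigning each object to the center nearest to it. The cluster centers are updated iteratively until the clustering stabilizes.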
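Algorithm:
Select k points as the initial centroids
repeat
    Assign each point to its nearest centroid, forming k clusters
    Recompute the centroid of each cluster
until the centroids no longer change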
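The loop above can be sketched in a few lines of Python. This is a minimal illustration, not Weka's implementation: the function name `kmeans`, the random initialization with a fixed seed, the squared Euclidean distance, and the toy points are all assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # select k points as initial centroids
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)        # an empty cluster keeps its old centroid
        if new_centroids == centroids:           # stop once the centroids no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Toy data (made up for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On data this well separated, the choice of initial centroids does not change the final partition; in general, k-means only reaches a local optimum that depends on initialization.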
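Similarity can be computed with Euclidean distance or Manhattan distance.
For data whose proximity measure is Euclidean distance, the sum of squared errors (SSE) is commonly used as the objective function that measures clustering quality. SSE is defined as follows, where $c_i$ is the centroid of cluster $C_i$ and $dist$ is the Euclidean distance:
$$SSE=\sum_{i=1}^K \sum_{x \in C_i} dist(c_i,x)^2$$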
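As a small worked example of the SSE objective with Euclidean distance, the sketch below sums the squared distance from every point to the centroid of its own cluster. The function name `sse` and the toy clusters are assumptions chosen so the arithmetic is easy to check by hand.

```python
def sse(clusters, centroids):
    """Sum, over all clusters, of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for members, c in zip(clusters, centroids):
        for x in members:
            total += sum((a - b) ** 2 for a, b in zip(x, c))
    return total

# Toy clusters (made up): every point lies exactly 1 unit from its centroid.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 7.0)]]
centroids = [(1.0, 0.0), (5.0, 6.0)]
print(sse(clusters, centroids))  # 1 + 1 + 1 + 1 = 4.0
```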
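#### 2-2 EM (Expectation Maximization)
EM (Expectation Maximization) is an extension of the KMeans method. Instead of assigning each object to one definite cluster, it assigns objects according to the probability that each object belongs to each cluster. EM is also a well-regarded algorithm for problems with missing data.
The EM algorithm alternates between two steps:
The first step computes the expectation (E): using the current parameter estimates, it computes the expected log-likelihood over the hidden variables;
The second step is maximization (M): it maximizes the expected likelihood obtained in the E step to compute new parameter values.
The parameter estimates found in the M step are then used in the next E step, and the two steps keep alternating until the estimates converge.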
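To make the two steps concrete, here is a minimal sketch of EM for a one-dimensional mixture of Gaussians. It is an illustration under stated assumptions, not Weka's EM clusterer: the function names, the fixed iteration count, the starting parameters, and the small floor on the standard deviation are all choices made for this example.

```python
import math

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def em_gmm_1d(data, means, stds, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture; returns the parameters and soft memberships."""
    means, stds, weights = list(means), list(stds), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E step: resp[i][j] = probability that point i belongs to cluster j,
        # computed from the current parameter estimates.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M step: re-estimate the parameters from the soft memberships.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)  # floor keeps the density well defined
    return means, stds, weights, resp

# Toy data (made up): two groups, around 1 and around 5.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
means, stds, weights, resp = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

Each row of `resp` sums to 1, which is the "soft" assignment that distinguishes EM from the hard assignment in KMeans.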
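Reference link: 从最大似然到EM算法浅解 (From Maximum Likelihood to a Brief Explanation of the EM Algorithm)
The probability theory involved is fairly complex... I have not fully understood it yet.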
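#### 2-3 DBSCAN (density-based clustering with noise)
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm in which the number of clusters is determined automatically. Points in low-density regions are treated as noise and ignored, so DBSCAN does not produce a complete clustering.
Definitions of common terms:
- Radius (Eps): a user-specified distance.
- Core point: a point in the interior of a density-based cluster. A point's neighborhood is determined by the distance function together with the user-specified distance Eps. A point is a core point if the number of points in its neighborhood is at least the user-specified threshold MinPts.
- Border point: a point that is not a core point but falls within the neighborhood of a core point.
- Noise point: a point that is neither a core point nor a border point.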
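The three definitions can be checked directly with a short sketch. It assumes Euclidean distance, counts a point as part of its own neighborhood, and treats a neighborhood of at least `min_pts` points as dense; the function name `classify_points` and the toy points are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (illustrative sketch)."""
    def neighbours(i):
        # Indices of all points within eps of point i, the point itself included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    core = {i for i in range(len(points)) if len(neighbours(i)) >= min_pts}
    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours(i)):
            labels.append("border")   # not core, but inside a core point's neighborhood
        else:
            labels.append("noise")
    return labels

# Toy points (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
print(classify_points(points, eps=1.0, min_pts=4))
```

With these settings only `(1, 1)` has four points within distance 1, so it is the single core point; its direct neighbors become border points, and `(0, 0)` and `(10, 10)` are noise because no core point lies within `eps` of them.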
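DBSCAN algorithm description:
Input: a database of n objects, radius e, minimum count MinPts;
Output: all generated clusters that meet the density requirement.
(1) REPEAT
(2) Take an unprocessed point from the database;
(3) IF the point is a core point THEN find all objects density-reachable from it, forming a cluster;
(4) ELSE the point is a border point (a non-core object); skip this iteration and move on to the next point;
(5) UNTIL all points have been processed.
DBSCAN is sensitive to its user-defined parameters: small changes can lead to very different results, and there is no general rule for choosing them, so in practice they are set by experience.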
Its pseudocode is as follows:
// Input: data set D, radius Eps, density threshold MinPts
// Output: clustering C
DBSCAN(D, Eps, MinPts){
    for each unvisited point p in D{
        mark p as visited;                 // mark p as visited
        N = getNeighbours(p, Eps);         // N is the candidate set for expansion
        if sizeOf(N) < MinPts then
            mark p as Noise;               // too few neighbours: mark p as noise
        else{
            C = next cluster;              // create a new cluster C
            ExpandCluster(p, N, C, Eps, MinPts);
        }
    }
}
// The ExpandCluster procedure is as follows:
ExpandCluster(p, N, C, Eps, MinPts){
    add p to cluster C;                    // first add the core point to C
    for each point p' in N{
        if p' is not visited{
            mark p' as visited;            // mark as visited
            N' = getNeighbours(p', Eps);   // radius check for each point in N
            if sizeOf(N') >= MinPts then
                N = N + N';                // p' is a core point: enlarge the candidate set
        }
        // if p' does not belong to any cluster yet, add it to the current one
        if p' is not member of any cluster
            add p' to cluster C;
    }
}
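The pseudocode translates fairly directly into runnable Python. The version below is a sketch under assumptions, not the Weka implementation: it uses a brute-force neighbor search with Euclidean distance, counts a point in its own neighborhood, and encodes noise as the label `-1`.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)          # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE              # may be relabelled later as a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)                # candidate set to expand
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue                   # already assigned to a cluster
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:       # j is itself a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Toy data (made up): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (20, 20)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters (two here) is never passed in; it falls out of `eps` and `min_pts`, which is also why the result is so sensitive to those two parameters.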
Reference: Baidu Baike: DBSCAN
### 3 Weka clustering examples
#### 3-1 SimpleKMeans algorithm
weka.clusterers.SimpleKMeans
Running it on the data in the weather.numeric.arff file gives the following result:
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 3
Within cluster sum of squared errors: 16.237456311387238
Initial starting points (random):
Cluster 0: rainy,75,80,FALSE,yes
Cluster 1: overcast,64,65,TRUE,yes
Missing values globally replaced with mean/mode
Final cluster centroids:
Cluster#
Attribute Full Data 0 1
(14.0) (9.0) (5.0)
==============================================
outlook sunny sunny overcast
temperature 73.5714 75.8889 69.4
humidity 81.6429 84.1111 77.2
windy FALSE FALSE TRUE
play yes yes yes
Time taken to build model (full training data) : 0 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 9 ( 64%)
1 5 ( 36%)
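The clustering result is shown as a table: rows correspond to attribute names and columns to cluster centroids. For a numeric attribute the cell shows the mean; for a nominal attribute it shows the most frequent label (the mode) in that cluster.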
Attribute | Full Data | 0 | 1 |
---|---|---|---|
- | (14.0) | (9.0) | (5.0) |
outlook | sunny | sunny | overcast |
temperature | 73.5714 | 75.8889 | 69.4 |
humidity | 81.6429 | 84.1111 | 77.2 |
windy | FALSE | FALSE | TRUE |
play | yes | yes | yes |
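The mean-or-mode rule behind each centroid column can be sketched as follows. The helper name `centroid` and the three rows are made up for illustration; they are not the actual weather data or Weka's code.

```python
from collections import Counter

def centroid(rows, nominal):
    """Column-wise centroid: mode for nominal columns, mean for numeric ones."""
    result = []
    for idx, column in enumerate(zip(*rows)):
        if idx in nominal:
            result.append(Counter(column).most_common(1)[0][0])  # most frequent label
        else:
            result.append(sum(column) / len(column))             # arithmetic mean
    return result

# Hypothetical toy rows in the shape (outlook, temperature, windy).
rows = [("sunny", 85, "FALSE"), ("sunny", 80, "TRUE"), ("rainy", 70, "FALSE")]
print(centroid(rows, nominal={0, 2}))
```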
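#### 3-2 EM algorithm
Unlike the output above, the table header here does not show the number of instances; the parentheses in the header show each cluster's prior probability instead. Table cells show the normal-distribution parameters for numeric attributes or the frequency counts for nominal attributes. The fractional counts reveal the "soft" nature of EM: any instance can be split across several clusters. The end of the output shows the model's log-likelihood, measured on the training data.
The run output is as follows: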
=== Run information ===
Scheme: weka.clusterers.EM -I 100 -N 2 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data
=== Clustering model (full training set) ===
EM
==
Number of clusters: 2
Number of iterations performed: 7
Cluster
Attribute 0 1
(0.35) (0.65)
==============================
outlook
sunny 3.8732 3.1268
overcast 1.7746 4.2254
rainy 2.1889 4.8111
[total] 7.8368 12.1632
temperature
mean 76.9173 71.8054
std. dev. 5.8302 5.8566
humidity
mean 90.1132 77.1719
std. dev. 3.8066 9.1962
windy
TRUE 3.14 4.86
FALSE 3.6967 6.3033
[total] 6.8368 11.1632
play
yes 2.1227 8.8773
no 4.7141 2.2859
[total] 6.8368 11.1632
Time taken to build model (full training data) : 0 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 4 ( 29%)
1 10 ( 71%)
// Log-likelihood value
Log likelihood: -9.13037
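#### 3-3 DBSCAN (density-based clustering with noise)
DBSCAN uses the Euclidean distance metric to determine which instances belong to the same cluster. Unlike partitioning methods, DBSCAN determines the number of clusters automatically, finds clusters of arbitrary shape, and introduces the notion of outliers. Clustering is carried out under two user-specified constraints: the neighborhood radius $\varepsilon$ and the minimum number of points minPts. Instances that do not belong to any cluster are called outliers.
OPTICS extends DBSCAN toward hierarchical clustering. OPTICS imposes an ordering on the instances; visualized in two dimensions, this ordering reveals the hierarchical structure of the clusters. The ordering is driven by the distance metric, so that the instances closest to each other end up adjacent in the list.
The final output of OPTICS is an ordering from which clusters can be extracted for any freely chosen reachability distance.
OPTICS additionally stores the core distance and reachability distance of each object.
Clusters are then extracted from the ordering information that OPTICS produces.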
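The algorithm is described as follows:
Algorithm: OPTICS
Input: sample set D, neighborhood radius E, and MinPts, the minimum number of points in the E-neighborhood for a point to be a core object
Output: an ordering of the sample points with reachability-distance information
Method:
1. Create two queues, an ordered queue and a result queue. (The ordered queue stores core objects and their directly density-reachable objects, sorted by reachability distance in ascending order; the result queue stores the output order of the sample points.)
2. If all points in the sample set D have been processed, the algorithm ends. Otherwise, select an unprocessed sample point (one not yet in the result queue) that is a core object, find all sample points directly density-reachable from it, and put each one that is not already in the result queue into the ordered queue, sorted by reachability distance;
3. If the ordered queue is empty, go back to step 2. Otherwise, take the first sample point from the ordered queue (the one with the smallest reachability distance) for expansion, and save it to the result queue if it is not already there.
3.1 Check whether the expansion point is a core object. If it is not, return to step 3; otherwise find all points directly density-reachable from it;
3.2 Check whether each directly density-reachable point is already in the result queue. If so, do nothing; otherwise continue to the next step;
3.3 If the point is already in the ordered queue and its new reachability distance is smaller than the old one, replace the old reachability distance with the new one and re-sort the ordered queue;
3.4 If the point is not in the ordered queue, insert it and re-sort the ordered queue;
4. The algorithm ends; output the ordered sample points in the result queue.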
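The steps above can be sketched in Python with a heap playing the role of the ordered queue. This is an illustration under assumptions, not Weka's code: distances are Euclidean, a point counts toward its own neighborhood, an undefined distance is represented as `None`, and a reduced reachability distance is handled by pushing a new heap entry and skipping the stale one later. Unlike step 2, the sketch also emits non-core points with an undefined reachability distance, as in the standard formulation of OPTICS.

```python
import heapq
import math

def optics(points, eps, min_pts):
    """Return (order, reach): the output ordering and each point's reachability distance."""
    n = len(points)
    reach = [None] * n                     # None stands for UNDEFINED
    processed = [False] * n
    order = []                             # the result queue

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        if len(nbrs) < min_pts:
            return None                    # not a core object
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(i, nbrs, core, seeds):
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, math.dist(points[i], points[j]))
            if reach[j] is None or new_reach < reach[j]:
                reach[j] = new_reach       # keep the smaller reachability distance
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        nbrs = neighbours(start)
        core = core_distance(start, nbrs)
        if core is None:
            continue
        seeds = []                         # the ordered queue
        update(start, nbrs, core, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)    # smallest reachability distance first
            if processed[j]:
                continue                   # stale entry left by an earlier update
            processed[j] = True
            order.append(j)
            nbrs_j = neighbours(j)
            core_j = core_distance(j, nbrs_j)
            if core_j is not None:
                update(j, nbrs_j, core_j, seeds)
    return order, reach

# Toy data (made up): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (20, 20)]
order, reach = optics(points, eps=2.0, min_pts=3)
print(order)
print(reach)
```

In the printed reachability list, the first point of each dense group and the isolated point keep an undefined value, which is how boundaries between clusters show up in the ordering.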
OPTICS results in WEKA
=== Run information ===
Scheme: weka.clusterers.OPTICS -E 0.2 -M 5 -A "weka.core.EuclideanDistance -R first-last" -db-output .
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
Ignored:
class
Test mode: evaluate on training data
=== Clustering model (full training set) ===
OPTICS clustering results
============================================================================================
Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.2; minPoints: 5
Write results to file: no
Distance-type:
Number of generated clusters: 0
Elapsed time: .02
( 0.) 5.1,3.5,1.4,0.2 --> c_dist: 0.05 r_dist: UNDEFINED
( 17.) 5.1,3.5,1.4,0.3 --> c_dist: 0.061 r_dist: 0.05
( 39.) 5.1,3.4,1.5,0.2 --> c_dist: 0.05 r_dist: 0.05
( 4.) 5,3.6,1.4,0.2 --> c_dist: 0.071 r_dist: 0.05
( 27.) 5.2,3.5,1.5,0.2 --> c_dist: 0.053 r_dist: 0.05
( 28.) 5.2,3.4,1.4,0.2 --> c_dist: 0.058 r_dist: 0.05
( 7.) 5,3.4,1.5,0.2 --> c_dist: 0.058 r_dist: 0.05
( 40.) 5,3.5,1.3,0.3 --> c_dist: 0.068 r_dist: 0.053
( 49.) 5,3.3,1.4,0.2 --> c_dist: 0.069 r_dist: 0.053
( 11.) 4.8,3.4,1.6,0.2 --> c_dist: 0.077 r_dist: 0.058
( 35.) 5,3.2,1.2,0.2 --> c_dist: 0.083 r_dist: 0.069
( 26.) 5,3.4,1.6,0.4 --> c_dist: 0.085 r_dist: 0.073
( 20.) 5.4,3.4,1.7,0.2 --> c_dist: 0.09 r_dist: 0.075
( 24.) 4.8,3.4,1.9,0.2 --> c_dist: 0.107 r_dist: 0.077
( 6.) 4.6,3.4,1.4,0.3 --> c_dist: 0.103 r_dist: 0.077
( 34.) 4.9,3.1,1.5,0.1 --> c_dist: 0.053 r_dist: 0.083
( 12.) 4.8,3,1.4,0.1 --> c_dist: 0.053 r_dist: 0.053
( 37.) 4.9,3.1,1.5,0.1 --> c_dist: 0.053 r_dist: 0.053
( 9.) 4.9,3.1,1.5,0.1 --> c_dist: 0.053 r_dist: 0.053
( 30.) 4.8,3.1,1.6,0.2 --> c_dist: 0.053 r_dist: 0.053
( 29.) 4.7,3.2,1.6,0.2 --> c_dist: 0.053 r_dist: 0.053
( 2.) 4.7,3.2,1.3,0.2 --> c_dist: 0.071 r_dist: 0.053
( 3.) 4.6,3.1,1.5,0.2 --> c_dist: 0.06 r_dist: 0.053
( 47.) 4.6,3.2,1.4,0.2 --> c_dist: 0.058 r_dist: 0.053
( 1.) 4.9,3,1.4,0.2 --> c_dist: 0.06 r_dist: 0.053
( 42.) 4.4,3.2,1.3,0.2 --> c_dist: 0.083 r_dist: 0.058
( 25.) 5,3,1.6,0.2 --> c_dist: 0.067 r_dist: 0.06
( 45.) 4.8,3,1.4,0.3 --> c_dist: 0.083 r_dist: 0.06
( 38.) 4.4,3,1.3,0.2 --> c_dist: 0.083 r_dist: 0.077
( 13.) 4.3,3,1.1,0.1 --> c_dist: 0.123 r_dist: 0.083
( 8.) 4.4,2.9,1.4,0.2 --> c_dist: 0.126 r_dist: 0.083
( 23.) 5.1,3.3,1.7,0.5 --> c_dist: 0.128 r_dist: 0.085
( 48.) 5.3,3.7,1.5,0.2 --> c_dist: 0.088 r_dist: 0.088
( 10.) 5.4,3.7,1.5,0.2 --> c_dist: 0.1 r_dist: 0.088
( 19.) 5.1,3.8,1.5,0.3 --> c_dist: 0.081 r_dist: 0.088
( 21.) 5.1,3.7,1.5,0.4 --> c_dist: 0.095 r_dist: 0.081
( 44.) 5.1,3.8,1.9,0.4 --> c_dist: 0.099 r_dist: 0.081
( 46.) 5.1,3.8,1.6,0.2 --> c_dist: 0.095 r_dist: 0.081
( 36.) 5.5,3.5,1.3,0.2 --> c_dist: 0.095 r_dist: 0.09
( 31.) 5.4,3.4,1.5,0.4 --> c_dist: 0.103 r_dist: 0.09
( 43.) 5,3.5,1.6,0.6 --> c_dist: 0.132 r_dist: 0.093
( 5.) 5.4,3.9,1.7,0.4 --> c_dist: 0.108 r_dist: 0.099
( 18.) 5.7,3.8,1.7,0.3 --> c_dist: 0.129 r_dist: 0.108
( 16.) 5.4,3.9,1.3,0.4 --> c_dist: 0.123 r_dist: 0.108
( 22.) 4.6,3.6,1,0.2 --> c_dist: 0.143 r_dist: 0.115
( 14.) 5.8,4,1.2,0.2 --> c_dist: 0.168 r_dist: 0.129
( 32.) 5.2,4.1,1.5,0.1 --> c_dist: 0.164 r_dist: 0.136
( 33.) 5.5,4.2,1.4,0.2 --> c_dist: 0.154 r_dist: 0.154
( 15.) 5.7,4.4,1.5,0.4 --> c_dist: UNDEFINED r_dist: 0.154
(100.) 6.3,3.3,6,2.5 --> c_dist: 0.153 r_dist: UNDEFINED
(115.) 6.4,3.2,5.3,2.3 --> c_dist: 0.119 r_dist: 0.153
(136.) 6.3,3.4,5.6,2.4 --> c_dist: 0.127 r_dist: 0.119
(140.) 6.7,3.1,5.6,2.4 --> c_dist: 0.095 r_dist: 0.119
(120.) 6.9,3.2,5.7,2.3 --> c_dist: 0.108 r_dist: 0.095
(143.) 6.8,3.2,5.9,2.3 --> c_dist: 0.103 r_dist: 0.095
(145.) 6.7,3,5.2,2.3 --> c_dist: 0.114 r_dist: 0.095
(144.) 6.7,3.3,5.7,2.5 --> c_dist: 0.122 r_dist: 0.095
(124.) 6.7,3.3,5.7,2.1 --> c_dist: 0.13 r_dist: 0.103
(139.) 6.9,3.1,5.4,2.1 --> c_dist: 0.11 r_dist: 0.108
(102.) 7.1,3,5.9,2.1 --> c_dist: 0.144 r_dist: 0.11
(112.) 6.8,3,5.5,2.1 --> c_dist: 0.106 r_dist: 0.11
(104.) 6.5,3,5.8,2.2 --> c_dist: 0.114 r_dist: 0.106
(147.) 6.5,3,5.2,2 --> c_dist: 0.11 r_dist: 0.106
(141.) 6.9,3.1,5.1,2.3 --> c_dist: 0.11 r_dist: 0.11
(110.) 6.5,3.2,5.1,2 --> c_dist: 0.132 r_dist: 0.11
(116.) 6.5,3,5.5,1.8 --> c_dist: 0.11 r_dist: 0.11
(103.) 6.3,2.9,5.6,1.8 --> c_dist: 0.128 r_dist: 0.11
( 77.) 6.7,3,5,1.7 --> c_dist: 0.133 r_dist: 0.11
(137.) 6.4,3.1,5.5,1.8 --> c_dist: 0.119 r_dist: 0.11
(128.) 6.4,2.8,5.6,2.1 --> c_dist: 0.119 r_dist: 0.114
(132.) 6.4,2.8,5.6,2.2 --> c_dist: 0.141 r_dist: 0.114
(111.) 6.4,2.7,5.3,1.9 --> c_dist: 0.11 r_dist: 0.119
(123.) 6.3,2.7,4.9,1.8 --> c_dist: 0.123 r_dist: 0.11
(146.) 6.3,2.5,5,1.9 --> c_dist: 0.163 r_dist: 0.11
(126.) 6.2,2.8,4.8,1.8 --> c_dist: 0.117 r_dist: 0.117
(127.) 6.1,3,4.9,1.8 --> c_dist: 0.102 r_dist: 0.117
(138.) 6,3,4.8,1.8 --> c_dist: 0.1 r_dist: 0.102
(149.) 5.9,3,5.1,1.8 --> c_dist: 0.128 r_dist: 0.1
( 70.) 5.9,3.2,4.8,1.8 --> c_dist: 0.131 r_dist: 0.1
(148.) 6.2,3.4,5.4,2.3 --> c_dist: 0.175 r_dist: 0.119
( 83.) 6,2.7,5.1,1.6 --> c_dist: 0.129 r_dist: 0.12
(133.) 6.3,2.8,5.1,1.5 --> c_dist: 0.13 r_dist: 0.129
(134.) 6.1,2.6,5.6,1.4 --> c_dist: 0.193 r_dist: 0.129
( 54.) 6.5,2.8,4.6,1.5 --> c_dist: 0.103 r_dist: 0.13
( 58.) 6.6,2.9,4.6,1.3 --> c_dist: 0.097 r_dist: 0.103
( 74.) 6.4,2.9,4.3,1.3 --> c_dist: 0.106 r_dist: 0.097
( 75.) 6.6,3,4.4,1.4 --> c_dist: 0.083 r_dist: 0.097
( 65.) 6.7,3.1,4.4,1.4 --> c_dist: 0.103 r_dist: 0.083
( 86.) 6.7,3.1,4.7,1.5 --> c_dist: 0.099 r_dist: 0.083
( 76.) 6.8,2.8,4.8,1.4 --> c_dist: 0.136 r_dist: 0.097
( 52.) 6.9,3.1,4.9,1.5 --> c_dist: 0.11 r_dist: 0.099
( 51.) 6.4,3.2,4.5,1.5 --> c_dist: 0.11 r_dist: 0.099
( 50.) 7,3.2,4.7,1.4 --> c_dist: 0.148 r_dist: 0.102
( 97.) 6.2,2.9,4.3,1.3 --> c_dist: 0.084 r_dist: 0.106
( 63.) 6.1,2.9,4.7,1.4 --> c_dist: 0.093 r_dist: 0.084
( 71.) 6.1,2.8,4,1.3 --> c_dist: 0.112 r_dist: 0.084
( 91.) 6.1,3,4.6,1.4 --> c_dist: 0.097 r_dist: 0.084
( 78.) 6,2.9,4.5,1.5 --> c_dist: 0.106 r_dist: 0.093
( 73.) 6.1,2.8,4.7,1.2 --> c_dist: 0.123 r_dist: 0.093
( 61.) 5.9,3,4.2,1.5 --> c_dist: 0.108 r_dist: 0.097
( 66.) 5.6,3,4.5,1.5 --> c_dist: 0.11 r_dist: 0.108
( 96.) 5.7,2.9,4.2,1.3 --> c_dist: 0.066 r_dist: 0.108
( 55.) 5.7,2.8,4.5,1.3 --> c_dist: 0.106 r_dist: 0.066
( 88.) 5.6,3,4.1,1.3 --> c_dist: 0.094 r_dist: 0.066
( 95.) 5.7,3,4.2,1.2 --> c_dist: 0.106 r_dist: 0.066
( 99.) 5.7,2.8,4.1,1.3 --> c_dist: 0.073 r_dist: 0.066
( 82.) 5.8,2.7,3.9,1.2 --> c_dist: 0.09 r_dist: 0.073
( 94.) 5.6,2.7,4.2,1.3 --> c_dist: 0.086 r_dist: 0.073
( 90.) 5.5,2.6,4.4,1.2 --> c_dist: 0.107 r_dist: 0.086
( 92.) 5.8,2.6,4,1.2 --> c_dist: 0.095 r_dist: 0.088
( 67.) 5.8,2.7,4.1,1 --> c_dist: 0.114 r_dist: 0.09
( 89.) 5.5,2.5,4,1.3 --> c_dist: 0.094 r_dist: 0.094
( 53.) 5.5,2.3,4,1.3 --> c_dist: 0.141 r_dist: 0.094
( 69.) 5.6,2.5,3.9,1.1 --> c_dist: 0.089 r_dist: 0.094
( 80.) 5.5,2.4,3.8,1.1 --> c_dist: 0.099 r_dist: 0.089
( 81.) 5.5,2.4,3.7,1 --> c_dist: 0.141 r_dist: 0.089
( 79.) 5.7,2.6,3.5,1 --> c_dist: 0.119 r_dist: 0.094
( 64.) 5.6,2.9,3.6,1.3 --> c_dist: 0.12 r_dist: 0.094
( 84.) 5.4,3,4.5,1.5 --> c_dist: 0.144 r_dist: 0.11
( 56.) 6.3,3.3,4.7,1.6 --> c_dist: 0.146 r_dist: 0.11
( 59.) 5.2,2.7,3.9,1.4 --> c_dist: 0.154 r_dist: 0.126
( 72.) 6.3,2.5,4.9,1.5 --> c_dist: 0.145 r_dist: 0.13
( 85.) 6,3.4,4.5,1.6 --> c_dist: 0.181 r_dist: 0.131
(142.) 5.8,2.7,5.1,1.9 --> c_dist: 0.135 r_dist: 0.135
(101.) 5.8,2.7,5.1,1.9 --> c_dist: 0.135 r_dist: 0.135
(113.) 5.7,2.5,5,2 --> c_dist: 0.172 r_dist: 0.135
(121.) 5.6,2.8,4.9,2 --> c_dist: 0.148 r_dist: 0.135
( 68.) 6.2,2.2,4.5,1.5 --> c_dist: UNDEFINED r_dist: 0.145
( 87.) 6.3,2.3,4.4,1.3 --> c_dist: 0.17 r_dist: 0.145
(130.) 7.4,2.8,6.1,1.9 --> c_dist: 0.155 r_dist: 0.148
(108.) 6.7,2.5,5.8,1.8 --> c_dist: UNDEFINED r_dist: 0.151
(119.) 6,2.2,5,1.5 --> c_dist: UNDEFINED r_dist: 0.151
(125.) 7.2,3.2,6,1.8 --> c_dist: 0.181 r_dist: 0.154
(105.) 7.6,3,6.6,2.1 --> c_dist: 0.164 r_dist: 0.155
(107.) 7.3,2.9,6.3,1.8 --> c_dist: 0.158 r_dist: 0.155
(122.) 7.7,2.8,6.7,2 --> c_dist: 0.16 r_dist: 0.155
(129.) 7.2,3,5.8,1.6 --> c_dist: 0.184 r_dist: 0.158
( 93.) 5,2.3,3.3,1 --> c_dist: 0.16 r_dist: 0.16
( 57.) 4.9,2.4,3.3,1 --> c_dist: 0.18 r_dist: 0.16
( 60.) 5,2,3.5,1 --> c_dist: UNDEFINED r_dist: 0.16
( 98.) 5.1,2.5,3,1.1 --> c_dist: 0.18 r_dist: 0.16
(118.) 7.7,2.6,6.9,2.3 --> c_dist: UNDEFINED r_dist: 0.16
(135.) 7.7,3,6.1,2.3 --> c_dist: UNDEFINED r_dist: 0.164
( 62.) 6,2.2,4,1 --> c_dist: 0.173 r_dist: 0.17
(114.) 5.8,2.8,5.1,2.4 --> c_dist: UNDEFINED r_dist: 0.179
(109.) 7.2,3.6,6.1,2.5 --> c_dist: UNDEFINED r_dist: 0.199
(106.) 4.9,2.5,4.5,1.7 --> c_dist: UNDEFINED r_dist: 0.2
(117.) 7.7,3.8,6.7,2.2 --> c_dist: UNDEFINED r_dist: UNDEFINED
(131.) 7.9,3.8,6.4,2 --> c_dist: UNDEFINED r_dist: UNDEFINED
( 41.) 4.5,2.3,1.3,0.3 --> c_dist: UNDEFINED r_dist: UNDEFINED
Time taken to build model (full training data) : 0.17 seconds
=== Model and evaluation on training set ===
Clustered Instances
Unclustered instances : 150
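With a core distance (c_dist) and a reachability distance (r_dist) for every instance, this ordering conveys more hierarchical clustering information than DBSCAN does.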