Example 1: Dynamic clustering of randomly generated data
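set.seed(1234) # fix the random seed so the example is reproducible
# 50 points around (0,0) with sd 0.2, stacked on 50 points around (1,1) with sd 0.3
dat<-rbind(matrix(rnorm(100,mean=0,sd=0.2),ncol=2),
matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
colnames(dat)<-c("x","y")
plot(dat)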
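In the plot above the points clearly fall into roughly two groups. Next, apply K-means clustering to them:
(kmeans.1<-kmeans(dat,2)) # cluster the raw data into two groups; the outer parentheses print the result
-------Result-------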
K-means clustering with 2 clusters of sizes 50, 50
Cluster means:
x y
1 -0.0906106 0.02790591
2 1.0059934 1.01875248
Clustering vector:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[29] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[57] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[85] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Within cluster sum of squares by cluster:
[1] 3.643813 9.488740
(between_SS / total_SS = 80.6 %)
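The percentage on the last line of the output is the share of the total sum of squares that is explained by the separation between clusters. As a minimal sketch of where it comes from, the quantities can be recomputed by hand and compared with the `totss`, `withinss` and `betweenss` components of the fitted object. The names `fit`, `total_ss`, `within_ss`, `between_ss` and `ratio` are illustrative and do not appear in the original example.

```r
# Regenerate the example data so this sketch runs on its own
set.seed(1234)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.2), ncol = 2),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(dat) <- c("x", "y")

fit <- kmeans(dat, 2)  # `fit` is an illustrative name

# Total sum of squares: squared distances of all points to the overall mean
total_ss <- sum(scale(dat, center = TRUE, scale = FALSE)^2)

# Within-cluster sum of squares, recomputed for each cluster
within_ss <- sapply(1:2, function(k) {
  pts <- dat[fit$cluster == k, , drop = FALSE]
  sum(sweep(pts, 2, fit$centers[k, ])^2)  # subtract the centroid column-wise
})

# Between-cluster sum of squares is what remains
between_ss <- total_ss - sum(within_ss)
ratio <- between_ss / total_ss

# Compare with the values stored by kmeans()
all.equal(total_ss, fit$totss)
all.equal(between_ss, fit$betweenss)
```

A ratio close to 1 means most of the variation lies between clusters rather than inside them.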
Plot the clustering result and mark the cluster centroids:
plot(dat,col=kmeans.1$cluster,main="Two clusters")
points(kmeans.1$centers,col=3:4,pch=8,cex=2)
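The centroids drawn by `points()` are the per-cluster means of the data. The following sketch, which assumes the default algorithm has converged, recomputes them with `colMeans()` and compares them with the `centers` component. The names `fit` and `manual_centers` are illustrative only.

```r
# Regenerate the example data so this sketch runs on its own
set.seed(1234)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.2), ncol = 2),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(dat) <- c("x", "y")

fit <- kmeans(dat, 2)  # `fit` is an illustrative name

# Column means of the points assigned to each cluster
manual_centers <- rbind(
  colMeans(dat[fit$cluster == 1, , drop = FALSE]),
  colMeans(dat[fit$cluster == 2, , drop = FALSE])
)

# Should agree with the stored centroids up to numerical tolerance
all.equal(manual_centers, fit$centers, check.attributes = FALSE)
```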
Alternatively, try three clusters:
(kmeans.2<-kmeans(dat,3))
plot(dat,col=kmeans.2$cluster,main="Three clusters")
points(kmeans.2$centers,col=3:5,pch=8,cex=2)
-------Result-------
K-means clustering with 3 clusters of sizes 31, 50, 19
Within cluster sum of squares by cluster:
[1] 3.439893 3.643813 2.320827
(between_SS / total_SS = 86.1 %)
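The ratio rises from 80.6 % to 86.1 % when a third cluster is added, but it generally rises whenever clusters are added, so it cannot settle the choice of cluster count by itself. The sketch below tabulates the ratio for several values of k. The range `2:5` and `nstart = 25` (25 random starts, keeping the best) are assumed values chosen for illustration, and `ks` and `ratios` are illustrative names.

```r
# Regenerate the example data so this sketch runs on its own
set.seed(1234)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.2), ncol = 2),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(dat) <- c("x", "y")

ks <- 2:5  # assumed range of cluster counts to try

# For each k, fit K-means from several random starts and record the ratio
ratios <- sapply(ks, function(k) {
  fit <- kmeans(dat, centers = k, nstart = 25)
  fit$betweenss / fit$totss
})
names(ratios) <- ks

round(ratios, 3)
```

Because K-means starts from random centers, using several starts makes the comparison across k less sensitive to one unlucky initialization.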
Both clusterings are plausible. To compare their quality, we introduce an additional measure: