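> # Read the data; read.table() names the columns V1, V2, ... when the file has no header
> cereal <- read.table("cereal.DAT")
> # Keep columns 3 to 10
> cereal1 <- cereal[, 3:10]
> # Drop the 6th retained column (V8) and standardize the remaining seven
> cereal2 <- scale(cereal1[, -6])
> # Euclidean distances between the standardized rows
> x <- dist(cereal2, method = "euclidean")
> # Single-linkage dendrogram
> plot(cs <- hclust(x, method = "single"), hang = -1, main = "Single", xlab = "Names")
> # Complete-linkage dendrogram (this overwrites cs)
> plot(cs <- hclust(x, method = "complete"), hang = -1, main = "Complete", xlab = "Names")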
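The dendrograms only draw the trees. To compare them with the k-means partitions, each tree can be cut into a fixed number of groups with `cutree()`. The following is a minimal sketch, not the original analysis: `cereal.DAT` is not reproduced here, so the built-in `USArrests` data stands in for it, and the choice of 3 groups and the names `dat`, `d`, `hc_single`, `hc_complete`, `grp_single` and `grp_complete` are illustrative assumptions.

```r
# Stand-in data (assumption): built-in USArrests instead of cereal.DAT
dat <- scale(USArrests)
d <- dist(dat, method = "euclidean")

# Fit both linkages and keep them under separate names
hc_single <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")

# Cut each tree into 3 groups; the result is one integer label per row
grp_single <- cutree(hc_single, k = 3)
grp_complete <- cutree(hc_complete, k = 3)

# Cross-tabulate the two partitions to see where the linkages disagree
print(table(single = grp_single, complete = grp_complete))
```

Single linkage tends to chain observations together and often leaves one large group plus a few near-singletons, which is usually visible in the cross-tabulation.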
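> # k-means with k = 2, 3 and 4; the starting centers are random, so labels and sizes can differ between runs
> kmeans_2 <- kmeans(cereal2, centers = 2)
> kmeans_3 <- kmeans(cereal2, centers = 3)
> kmeans_4 <- kmeans(cereal2, centers = 4)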
> kmeans_2
K-means clustering with 2 clusters of sizes 15, 28
Cluster means:
V3 V4 V5 V6 V7 V9 V10
1 0.3212195 0.9834243 0.6113040 0.06987299 0.9259502 -0.13329153 1.0172199
2 -0.1720819 -0.5268344 -0.3274843 -0.03743196 -0.4960447 0.07140617 -0.5449392
Clustering vector: # cluster index assigned to each observation
[1] 2 1 2 2 2 2 2 2 2 1 1 2 1 1 2 1 2 1 2 2 2 1 2 2 2 2 1 2 1 2 1 2 2 1 2 2
[37] 1 2 2 1 2 2 1
Within cluster sum of squares by cluster:
[1] 100.1913 116.4371
(between_SS / total_SS = 26.3 %) # between-cluster sum of squares / total sum of squares; measures how tightly the points cluster
Available components:
[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
> kmeans_3
K-means clustering with 3 clusters of sizes 14, 17, 12
Cluster means: # coordinates of the cluster centers
V3 V4 V5 V6 V7 V9 V10
1 -0.6427892 0.08701265 -0.7731117 0.066266097 -0.1983071 -1.03081221 -0.3693651
2 0.2033765 -0.71769444 0.0290176 -0.050427365 -0.5929554 0.78738501 -0.5740162
3 0.4618041 0.91521903 0.8608553 -0.005871679 1.0713785 0.08715215 1.2441156
Clustering vector:
[1] 2 3 2 2 2 2 1 2 1 3 3 1 3 1 2 1 2 3 2 1 2 3 1 2 2 1 3 2 3 2 3 1 1 3 1 2 1 2 2 3 1 1 3
Within cluster sum of squares by cluster: # withinss, the sum of squares within each cluster
[1] 67.37272 28.64366 80.74875
(between_SS / total_SS = 39.9 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
> kmeans_4
K-means clustering with 4 clusters of sizes 4, 14, 17, 8
Cluster means:
V3 V4 V5 V6 V7 V9 V10
1 0.2421406 0.64239810 -0.2829216 0.56221331 2.2431414 0.6933723 2.4290156
2 -0.6427892 0.08701265 -0.7731117 0.06626610 -0.1983071 -1.0308122 -0.3693651
3 0.2033765 -0.71769444 0.0290176 -0.05042736 -0.5929554 0.7873850 -0.5740162
4 0.5716358 1.05162949 1.4327438 -0.28991417 0.4854970 -0.2159579 0.6516657
Clustering vector:
[1] 3 4 3 3 3 3 2 3 2 4 4 2 1 2 3 2 3 1 3 2 3 4 2 3 3 2 1 3 4 3 4 2 2 1 2 3 2 3 3 4 2 2 4
Within cluster sum of squares by cluster: # withinss, the sum of squares within each cluster
[1] 18.43124 67.37272 28.64366 32.92862
(between_SS / total_SS = 49.9 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
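In the three fits above, between_SS / total_SS rises from 26.3 % (k = 2) to 39.9 % (k = 3) and 49.9 % (k = 4). Because `kmeans()` starts from random centers, a single run per k can land in a poor local optimum, so it may help to fix a seed and use several restarts before comparing values of k. A minimal sketch under stated assumptions: the built-in `USArrests` data stands in for the scaled cereal data, and the seed value, `nstart = 25`, and the names `dat`, `ks`, `fits` and `ratio` are illustrative choices, not part of the original session.

```r
set.seed(1)                # arbitrary seed, only so repeated runs agree
dat <- scale(USArrests)    # stand-in for cereal2 (assumption)

ks <- 2:4
# nstart = 25 reruns k-means from 25 random starts and keeps the best fit
fits <- lapply(ks, function(k) kmeans(dat, centers = k, nstart = 25))

# Share of the total sum of squares explained by the partition, per k
ratio <- sapply(fits, function(f) f$betweenss / f$totss)
names(ratio) <- paste0("k=", ks)
print(round(100 * ratio, 1))
```

The ratio normally keeps growing as k increases, so a large value alone does not justify a larger k; what matters is how much each additional cluster adds.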
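Interpreting the components of the returned object:
cluster, the cluster index assigned to each observation
centers, the coordinates of the cluster centers
totss, the total sum of squares
withinss, the within-cluster sum of squares, one value per cluster
tot.withinss, the total within-cluster sum of squares, sum(withinss)
betweenss, the between-cluster sum of squares, totss - tot.withinss
size, the number of observations in each cluster
iter, the number of iterations performed
ifault, an integer flag for a possible algorithm problem (0 indicates none)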
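The relationships among these components can be checked directly. For the k = 2 output above, the 43 observations on 7 standardized variables give totss = 42 * 7 = 294, tot.withinss = 100.1913 + 116.4371 = 216.6284, and betweenss = 294 - 216.6284 = 77.3716, which is 26.3 % of 294 and matches the printed ratio. The sketch below verifies the same identities in code; it assumes the built-in `USArrests` data as a stand-in for the cereal data, and the seed, `nstart` value and the names `dat` and `fit` are illustrative.

```r
set.seed(1)                # arbitrary seed, for repeatability only
dat <- scale(USArrests)    # stand-in for cereal2 (assumption)
fit <- kmeans(dat, centers = 3, nstart = 25)

# tot.withinss is the sum of the per-cluster within sums of squares
stopifnot(isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss))))

# betweenss is what remains of the total after removing the within part
stopifnot(isTRUE(all.equal(fit$betweenss, fit$totss - fit$tot.withinss)))

# After scale(), every column has sum of squares n - 1, so totss = (n - 1) * p
stopifnot(isTRUE(all.equal(fit$totss, (nrow(dat) - 1) * ncol(dat))))

# size counts how many observations received each cluster label
stopifnot(all(fit$size == as.vector(table(fit$cluster))))
```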