Implementing k-means clustering in R (CSDN blog)

Original article: https://blog.csdn.net/qq_43684686/article/details/105586973

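# Read the data; the file has no header row, so R names the columns V1, V2, ...
cereal <- read.table("cereal.DAT")
# Keep columns V3 to V10
cereal1 <- cereal[, 3:10]
# Drop the 6th kept column (V8) and standardize each variable to mean 0, sd 1
cereal2 <- scale(cereal1[, -6])
# Euclidean distance matrix for hierarchical clustering
x <- dist(cereal2, method = "euclidean")
# Dendrograms with single and complete linkage
plot(hclust(x, method = "single"), hang = -1, main = "Single", xlab = "Names")
plot(hclust(x, method = "complete"), hang = -1, main = "Complete", xlab = "Names")
# k-means with 2, 3 and 4 clusters; kmeans() starts from random centers,
# so results can differ between runs unless set.seed() is called first
kmeans_2 <- kmeans(cereal2, centers = 2)
kmeans_3 <- kmeans(cereal2, centers = 3)
kmeans_4 <- kmeans(cereal2, centers = 4)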
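scale() standardizes each column by subtracting the column mean and dividing by the column standard deviation, so variables measured in different units contribute comparably to the Euclidean distances used by dist() and kmeans(). A minimal sketch on a small made-up data frame (`toy` is hypothetical, since cereal.DAT is not reproduced here):

```r
# Hypothetical toy data: two variables on very different scales
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(100, 300, 200, 500, 400))

z <- scale(toy)

# After scaling, each column has mean 0 and standard deviation 1
print(round(colMeans(z), 10))
print(apply(z, 2, sd))

# The same result by hand: (value - column mean) / column sd
manual <- sweep(sweep(as.matrix(toy), 2, colMeans(toy)), 2, apply(toy, 2, sd), "/")
print(all.equal(z, manual, check.attributes = FALSE))

# scale() keeps the means and sds it used as attributes
print(attr(z, "scaled:center"))
print(attr(z, "scaled:scale"))
```

The stored attributes are what you would use to map standardized values, such as cluster centers, back to the original units.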
> kmeans_2
K-means clustering with 2 clusters of sizes 15, 28
Cluster means:
          V3         V4         V5          V6         V7          V9        V10
1  0.3212195  0.9834243  0.6113040  0.06987299  0.9259502 -0.13329153  1.0172199
2 -0.1720819 -0.5268344 -0.3274843 -0.03743196 -0.4960447  0.07140617 -0.5449392
Clustering vector:   # cluster index assigned to each observation
 [1] 2 1 2 2 2 2 2 2 2 1 1 2 1 1 2 1 2 1 2 2 2 1 2 2 2 2 1 2 1 2 1 2 2 1 2 2
[37] 1 2 2 1 2 2 1
Within cluster sum of squares by cluster:
[1] 100.1913 116.4371
 (between_SS / total_SS =  26.3 %)   # between-cluster SS / total SS: the share of total variation captured by the grouping; higher means tighter clusters
Available components:
[1] "cluster"      "centers"      "totss"        "withinss"
[5] "tot.withinss" "betweenss"    "size"         "iter"
[9] "ifault"
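The between_SS / total_SS percentage can be recomputed from the betweenss and totss components of the returned object, and it equals 1 - tot.withinss / totss. A minimal sketch on simulated data (the matrix `pts` and the seed are made up for illustration):

```r
set.seed(1)  # kmeans() picks random starting centers
# Hypothetical data: two well-separated groups of 20 points in 2 dimensions
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))

km <- kmeans(pts, centers = 2)

ratio <- km$betweenss / km$totss
# The same quantity expressed through the within-cluster sums of squares
ratio_alt <- 1 - km$tot.withinss / km$totss

# Formatted like the line printed by print(km)
cat(sprintf(" (between_SS / total_SS = %5.1f %%)\n", 100 * ratio))
```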
> kmeans_3
K-means clustering with 3 clusters of sizes 14, 17, 12
Cluster means:     # coordinates of the cluster centers (on the standardized scale)
          V3          V4         V5           V6         V7          V9        V10
1 -0.6427892  0.08701265 -0.7731117  0.066266097 -0.1983071 -1.03081221 -0.3693651
2  0.2033765 -0.71769444  0.0290176 -0.050427365 -0.5929554  0.78738501 -0.5740162
3  0.4618041  0.91521903  0.8608553 -0.005871679  1.0713785  0.08715215  1.2441156
Clustering vector:
 [1] 2 3 2 2 2 2 1 2 1 3 3 1 3 1 2 1 2 3 2 1 2 3 1 2 2 1 3 2 3 2 3 1 1 3 1 2 1 2 2 3 1 1 3
Within cluster sum of squares by cluster:     # withinss: within-cluster sum of squares
[1] 67.37272 28.64366 80.74875
 (between_SS / total_SS =  39.9 %)
Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"
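Each withinss value is the sum of squared Euclidean distances from the points in one cluster to that cluster's center. A sketch that recomputes it by hand on simulated data (the data and the seed are hypothetical):

```r
set.seed(42)
# Hypothetical data: 60 points in 3 dimensions
pts <- matrix(rnorm(180), ncol = 3)
km <- kmeans(pts, centers = 3)

# For each cluster k: sum over its points of the squared distance to centers[k, ]
manual_withinss <- sapply(seq_len(nrow(km$centers)), function(k) {
  members <- pts[km$cluster == k, , drop = FALSE]
  diffs <- sweep(members, 2, km$centers[k, ])  # subtract the center from each row
  sum(diffs^2)
})

print(rbind(kmeans = km$withinss, manual = manual_withinss))
```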
> kmeans_4
K-means clustering with 4 clusters of sizes 4, 14, 17, 8
Cluster means:
          V3          V4         V5          V6         V7         V9        V10
1  0.2421406  0.64239810 -0.2829216  0.56221331  2.2431414  0.6933723  2.4290156
2 -0.6427892  0.08701265 -0.7731117  0.06626610 -0.1983071 -1.0308122 -0.3693651
3  0.2033765 -0.71769444  0.0290176 -0.05042736 -0.5929554  0.7873850 -0.5740162
4  0.5716358  1.05162949  1.4327438 -0.28991417  0.4854970 -0.2159579  0.6516657
Clustering vector:
 [1] 3 4 3 3 3 3 2 3 2 4 4 2 1 2 3 2 3 1 3 2 3 4 2 3 3 2 1 3 4 3 4 2 2 1 2 3 2 3 3 4 2 2 4
Within cluster sum of squares by cluster:   # withinss: within-cluster sum of squares
[1] 18.43124 67.37272 28.64366 32.92862
 (between_SS / total_SS =  49.9 %)  
Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"
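Across the three fits, between_SS / total_SS rises from 26.3 % (k = 2) to 39.9 % (k = 3) and 49.9 % (k = 4). Adding clusters can only lower the best achievable within-cluster sum of squares, so this ratio tends to grow with k and is more useful when compared over a range of k than when simply maximized. kmeans() returns a local optimum, so an individual fit may not follow the trend exactly. A sketch of such a comparison on simulated data (the data, the seed, the range 2:6 and nstart = 10 are illustrative assumptions, not from the original):

```r
set.seed(123)
# Hypothetical data: three groups of 30 points in 2 dimensions
pts <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 4), ncol = 2),
             matrix(rnorm(60, mean = 8), ncol = 2))

ks <- 2:6
# nstart = 10 runs each fit from 10 random starts and keeps the best one
ratios <- sapply(ks, function(k) {
  km <- kmeans(pts, centers = k, nstart = 10)
  km$betweenss / km$totss
})

print(data.frame(k = ks, between_over_total = round(ratios, 3)))
```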

Components of the kmeans object:
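cluster: the cluster assigned to each point
centers: coordinates of the cluster centers
totss: total sum of squares
withinss: within-cluster sum of squares, one value per cluster
tot.withinss: total within-cluster sum of squares, sum(withinss)
betweenss: between-cluster sum of squares, totss - tot.withinss
size: number of points in each cluster
iter: number of iterations
ifault: integer indicator of a possible algorithm problem (intended for experts)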
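The relationships listed above can be checked directly on any kmeans result. A minimal sketch using the built-in iris measurements (iris and the seed are illustrative choices, not part of the original analysis):

```r
set.seed(2024)
x <- scale(iris[, 1:4])   # standardized numeric columns of the built-in iris data
km <- kmeans(x, centers = 3)

# totss: sum of squared distances from every point to the overall column means
totss_manual <- sum(sweep(x, 2, colMeans(x))^2)

checks <- c(
  totss        = isTRUE(all.equal(km$totss, totss_manual)),
  tot.withinss = isTRUE(all.equal(km$tot.withinss, sum(km$withinss))),
  betweenss    = isTRUE(all.equal(km$betweenss, km$totss - km$tot.withinss)),
  size         = all(km$size == tabulate(km$cluster, nbins = nrow(km$centers)))
)
print(checks)

# iter and ifault are diagnostics of the fitting run
cat("iter:", km$iter, " ifault:", km$ifault, "\n")
```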