R聚类分析@;

原始数据:13个品牌的鼠标参数,如下表。
brand Touch Chips Driver Compatibility Game
1 Brand1 7.5 17.5 7.0 8.0 8.0
2 Brand2 7.5 19.5 7.0 7.0 9.0
3 Brand3 8.5 18.0 8.5 8.0 9.5
4 Brand4 9.0 18.5 8.5 8.0 9.5
5 Brand5 7.0 14.0 6.5 7.0 7.5
6 Brand6 7.0 16.0 6.5 7.5 8.0
7 Brand7 7.5 17.0 8.0 7.5 8.0
8 Brand8 8.0 17.5 8.5 7.5 8.5
9 Brand9 7.0 16.5 6.0 8.0 7.0
10 Brand10 7.5 17.0 7.5 8.5 8.0
11 Brand11 8.0 16.0 6.5 7.0 7.0
12 Brand12 7.0 15.5 6.0 8.0 7.0
13 Brand13 7.5 17.0 8.0 7.0 7.0

一、层次聚类分析:
这里采用Ward方法。
步骤如下:
1、数据读入,

> m<-read.csv(choose.files())

2、选择数据,

> library(dplyr)
> m<-select(m,Touch:Game)
> m
   Touch Chips Driver Compatibility Game
1    7.5  17.5    7.0           8.0  8.0
2    7.5  19.5    7.0           7.0  9.0
3    8.5  18.0    8.5           8.0  9.5
4    9.0  18.5    8.5           8.0  9.5
5    7.0  14.0    6.5           7.0  7.5
6    7.0  16.0    6.5           7.5  8.0
7    7.5  17.0    8.0           7.5  8.0
8    8.0  17.5    8.5           7.5  8.5
9    7.0  16.5    6.0           8.0  7.0
10   7.5  17.0    7.5           8.5  8.0
11   8.0  16.0    6.5           7.0  7.0
12   7.0  15.5    6.0           8.0  7.0
13   7.5  17.0    8.0           7.0  7.0

3、数据标准化

> m.s<-scale(m)
> m.s
           Touch       Chips     Driver Compatibility       Game
 [1,] -0.1868322  0.41293963 -0.2836931     0.7595545  0.0000000
 [2,] -0.1868322  1.84446367 -0.2836931    -1.2152872  1.0954451
 [3,]  1.4323802  0.77082064  1.2968829     0.7595545  1.6431677
 [4,]  2.2419864  1.12870165  1.2968829     0.7595545  1.6431677
 [5,] -0.9964384 -2.09222745 -0.8105518    -1.2152872 -0.5477226
 [6,] -0.9964384 -0.66070340 -0.8105518    -0.2278664  0.0000000
 [7,] -0.1868322  0.05505862  0.7700242    -0.2278664  0.0000000
 [8,]  0.6227740  0.41293963  1.2968829    -0.2278664  0.5477226
 [9,] -0.9964384 -0.30282239 -1.3374105     0.7595545 -1.0954451
[10,] -0.1868322  0.05505862  0.2431655     1.7469754  0.0000000
[11,]  0.6227740 -0.66070340 -0.8105518    -1.2152872 -1.0954451
[12,] -0.9964384 -1.01858441 -1.3374105     0.7595545 -1.0954451
[13,] -0.1868322  0.05505862  0.7700242    -1.2152872 -1.0954451
attr(,"scaled:center")
        Touch         Chips        Driver Compatibility          Game 
     7.615385     16.923077      7.269231      7.615385      8.000000 
attr(,"scaled:scale")
        Touch         Chips        Driver Compatibility          Game 
    0.6175842     1.3971124     0.9490211     0.5063697     0.9128709 

4、计算相异度

> m.dist<-dist(m.s) #返回包含相异度信息的特殊类,传给下面的hclust()函数做聚类分析,必不可少的步骤,
> class(m.dist)
[1] "dist"
> m.dist
           1         2         3         4         5         6         7         8         9        10        11        12
2  2.6738102                                                                                                              
3  2.8192460 3.2361673                                                                                                    
4  3.4073003 3.6207314 0.8851785                                                                                          
5  3.3777070 4.3738518 5.2189544 5.8312143                                                                                
6  1.7495004 3.0633159 4.0080796 4.6696517 1.8232556                                                                      
7  1.4877497 2.5470160 2.6620939 3.3172954 3.0065795 1.9146796                                                            
8  2.1044436 2.5451805 1.7200410 2.3041623 3.9387805 2.9181911 1.1666709                                                  
9  1.8649659 3.8828623 4.6358786 5.1938577 2.7712002 1.6064429 2.7202614 3.7079193                                        
10 1.1750144 3.6680445 2.8141578 3.4405217 3.9308549 2.4855779 2.0439129 2.4685747 2.3359712                              
11 2.6806252 3.4653866 4.3065072 4.6545784 2.2295986 2.1901709 2.4168157 3.0445017 2.6320159 3.5004425                    
12 2.2394293 4.3200494 4.8518689 5.4348628 2.3728231 1.6064429 2.9024908 3.9096818 0.7157620 2.5458578 2.6320159          
13 2.5176177 3.0186571 3.8486029 4.3277534 2.8398099 2.4168157 1.4747881 2.1762631 3.0207321 3.2019650 1.9146796 3.1858206

转化成矩阵,结果同上,对角线为0。

> as.matrix(m.dist)
          1        2         3         4        5        6        7        8        9       10       11       12       13
1  0.000000 2.673810 2.8192460 3.4073003 3.377707 1.749500 1.487750 2.104444 1.864966 1.175014 2.680625 2.239429 2.517618
2  2.673810 0.000000 3.2361673 3.6207314 4.373852 3.063316 2.547016 2.545181 3.882862 3.668045 3.465387 4.320049 3.018657
3  2.819246 3.236167 0.0000000 0.8851785 5.218954 4.008080 2.662094 1.720041 4.635879 2.814158 4.306507 4.851869 3.848603
4  3.407300 3.620731 0.8851785 0.0000000 5.831214 4.669652 3.317295 2.304162 5.193858 3.440522 4.654578 5.434863 4.327753
5  3.377707 4.373852 5.2189544 5.8312143 0.000000 1.823256 3.006579 3.938781 2.771200 3.930855 2.229599 2.372823 2.839810
6  1.749500 3.063316 4.0080796 4.6696517 1.823256 0.000000 1.914680 2.918191 1.606443 2.485578 2.190171 1.606443 2.416816
7  1.487750 2.547016 2.6620939 3.3172954 3.006579 1.914680 0.000000 1.166671 2.720261 2.043913 2.416816 2.902491 1.474788
8  2.104444 2.545181 1.7200410 2.3041623 3.938781 2.918191 1.166671 0.000000 3.707919 2.468575 3.044502 3.909682 2.176263
9  1.864966 3.882862 4.6358786 5.1938577 2.771200 1.606443 2.720261 3.707919 0.000000 2.335971 2.632016 0.715762 3.020732
10 1.175014 3.668045 2.8141578 3.4405217 3.930855 2.485578 2.043913 2.468575 2.335971 0.000000 3.500443 2.545858 3.201965
11 2.680625 3.465387 4.3065072 4.6545784 2.229599 2.190171 2.416816 3.044502 2.632016 3.500443 0.000000 2.632016 1.914680
12 2.239429 4.320049 4.8518689 5.4348628 2.372823 1.606443 2.902491 3.909682 0.715762 2.545858 2.632016 0.000000 3.185821
13 2.517618 3.018657 3.8486029 4.3277534 2.839810 2.416816 1.474788 2.176263 3.020732 3.201965 1.914680 3.185821 0.000000

5、执行层次聚类分析

> m.h<-hclust(m.dist,method='ward.D2')
> class(m.h)
[1] "hclust"
> m.h

Call:
hclust(d = m.dist, method = "ward.D2")

Cluster method   : ward.D2 
Distance         : euclidean 
Number of objects: 13 

> str(m.h) #structrue函数
List of 7
 $ merge      : int [1:12, 1:2] -9 -3 -7 -1 -6 -11 3 -5 -2 6 ...
 $ height     : num [1:12] 0.716 0.885 1.167 1.175 1.808 ...
 $ order      : int [1:13] 11 13 5 6 9 12 3 4 2 7 ...
 $ labels     : NULL
 $ method     : chr "ward.D2"
 $ call       : language hclust(d = m.dist, method = "ward.D2")
 $ dist.method: chr "euclidean"
 - attr(*, "class")= chr "hclust"
> m.h$merge
      [,1] [,2]
 [1,]   -9  -12
 [2,]   -3   -4
 [3,]   -7   -8
 [4,]   -1  -10
 [5,]   -6    1
 [6,]  -11  -13
 [7,]    3    4
 [8,]   -5    5
 [9,]   -2    7
[10,]    6    8
[11,]    2    9
[12,]   10   11

6、树状图(Dendomgram)呈现

> m.cnum<-cutree(m.h,k=4)
> m.cnum
 [1] 1 1 2 2 3 3 1 1 3 1 4 3 4
> plot(m.h)![在这里插入图片描述](https://img-blog.csdnimg.cn/2019122012043216.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTM4NzE2MA==,size_16,color_FFFFFF,t_70)

在这里插入图片描述

> rect.hclust(m.h,k=3)

在这里插入图片描述

7、可视化进阶(待完善)

> install.packages('dendextend')
> library(dendextend).
> m.dend<-as.dendrogram(m.h) # 聚类分析结果转化为dendrogram类
> install.packages('colorspace') #
> library(colorspace)
> m.dend<-color_branches(m.dend,k=4)
> m.dend
'dendrogram' with 2 branches and 13 members total, at height 7.174631 
> class(m.dend)
[1] "dendrogram"
> labels_colors(m.dend)<-rainbow_hcl(4)[sort_levels_values(m.cnum[order.dendrogram(m.dend)])]#使品牌标签变为与各自聚类相同的颜色
> plot(m.dend)

在这里插入图片描述

> install.packages('gplots') #绘制热图捕捉各个聚类的视觉特征,使用heatmap.2()函数
> library(gplots)
> heatmap.2(m.s,main='不同品牌鼠标属性热 图',dendrogram='row',Rowv=m.dend,Colv=FALSE,col=colorspace::diverge_hcl(256),RowSideColors=labels_colors(m.dend)[order(order.dendrogram(m.dend))]) #注意参数m.s,是用来做聚类分析的标准化数据,并非dist()函数转化的结果。

在这里插入图片描述

二、非层次聚类分析步骤:(为何结果每次不同)
这里采用K-means方法。步骤如下:

> set.seed(100)
> m.km<-kmeans(m.s,center=4)
> m.km
K-means clustering with 4 clusters of sizes 2, 4, 5, 2

Cluster means:
        Touch      Chips     Driver Compatibility       Game
1 -0.18683220  0.2339991 -0.0202638     1.2532650  0.0000000
2  0.01556935  0.5918801  0.6383096    -0.7215768  0.1369306
3 -0.67259591 -0.9470082 -1.0212953    -0.2278664 -0.7668116
4  1.83718328  0.9497611  1.2968829     0.7595545  1.6431677

Clustering vector:
 [1] 1 2 4 4 3 3 2 2 3 1 3 3 2

Within cluster sum of squares by cluster:
[1] 0.6903294 7.5874419 9.1861416 0.3917705
 (between_SS / total_SS =  70.2 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"      
> m.km$cluster
 [1] 1 2 4 4 3 3 2 2 3 1 3 3 2
> tapply(row.names(m),m.km$cluster,unique) #显示各聚类包含的鼠标品牌
$`1`
[1] "1"  "10"

$`2`
[1] "2"  "7"  "8"  "13"

$`3`
[1] "5"  "6"  "9"  "11" "12"

$`4`
[1] "3" "4"
> install.packages('GGally')
> library(GGally)
> m.km.df<-as.data.frame(cbind(m.s,cluster=m.km$cluster))
> m.km.df
        Touch       Chips     Driver Compatibility       Game cluster
1  -0.1868322  0.41293963 -0.2836931     0.7595545  0.0000000       1
2  -0.1868322  1.84446367 -0.2836931    -1.2152872  1.0954451       2
3   1.4323802  0.77082064  1.2968829     0.7595545  1.6431677       4
4   2.2419864  1.12870165  1.2968829     0.7595545  1.6431677       4
5  -0.9964384 -2.09222745 -0.8105518    -1.2152872 -0.5477226       3
6  -0.9964384 -0.66070340 -0.8105518    -0.2278664  0.0000000       3
7  -0.1868322  0.05505862  0.7700242    -0.2278664  0.0000000       2
8   0.6227740  0.41293963  1.2968829    -0.2278664  0.5477226       2
9  -0.9964384 -0.30282239 -1.3374105     0.7595545 -1.0954451       3
10 -0.1868322  0.05505862  0.2431655     1.7469754  0.0000000       1
11  0.6227740 -0.66070340 -0.8105518    -1.2152872 -1.0954451       3
12 -0.9964384 -1.01858441 -1.3374105     0.7595545 -1.0954451       3
13 -0.1868322  0.05505862  0.7700242    -1.2152872 -1.0954451       2
> class(m.km.df)
[1] "data.frame"
> m.km.df$cluster<-factor(m.km.df$cluster)
> ggparcoord(m.km.df,columns=1:6,groupColumn='cluster')+facet_wrap(~cluster)+theme_bw()

在这里插入图片描述

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值