R聚类分析@;

最新推荐文章于 2024-06-13 09:47:29 发布

倚东

最新推荐文章于 2024-06-13 09:47:29 发布

阅读量495

点赞数

分类专栏： R语言数据分析及挖掘数据分析文章标签：数据分析

本文链接：https://blog.csdn.net/weixin_45387160/article/details/103334518

版权

数据分析同时被 3 个专栏收录

9 篇文章 2 订阅

订阅专栏

R语言

8 篇文章 1 订阅

订阅专栏

数据分析及挖掘

5 篇文章 1 订阅

订阅专栏

原始数据：13个品牌的鼠标参数，如下表。
brand Touch Chips Driver Compatibility Game
1 Brand1 7.5 17.5 7.0 8.0 8.0
2 Brand2 7.5 19.5 7.0 7.0 9.0
3 Brand3 8.5 18.0 8.5 8.0 9.5
4 Brand4 9.0 18.5 8.5 8.0 9.5
5 Brand5 7.0 14.0 6.5 7.0 7.5
6 Brand6 7.0 16.0 6.5 7.5 8.0
7 Brand7 7.5 17.0 8.0 7.5 8.0
8 Brand8 8.0 17.5 8.5 7.5 8.5
9 Brand9 7.0 16.5 6.0 8.0 7.0
10 Brand10 7.5 17.0 7.5 8.5 8.0
11 Brand11 8.0 16.0 6.5 7.0 7.0
12 Brand12 7.0 15.5 6.0 8.0 7.0
13 Brand13 7.5 17.0 8.0 7.0 7.0

一、层次聚类分析：
这里采用Ward方法。
步骤如下:
1、数据读入，

> m<-read.csv(choose.files())

2、选择数据，

> library(dplyr)
> m<-select(m,Touch:Game)
> m
   Touch Chips Driver Compatibility Game
1    7.5  17.5    7.0           8.0  8.0
2    7.5  19.5    7.0           7.0  9.0
3    8.5  18.0    8.5           8.0  9.5
4    9.0  18.5    8.5           8.0  9.5
5    7.0  14.0    6.5           7.0  7.5
6    7.0  16.0    6.5           7.5  8.0
7    7.5  17.0    8.0           7.5  8.0
8    8.0  17.5    8.5           7.5  8.5
9    7.0  16.5    6.0           8.0  7.0
10   7.5  17.0    7.5           8.5  8.0
11   8.0  16.0    6.5           7.0  7.0
12   7.0  15.5    6.0           8.0  7.0
13   7.5  17.0    8.0           7.0  7.0

3、数据标准化

> m.s<-scale(m)
> m.s
           Touch       Chips     Driver Compatibility       Game
 [1,] -0.1868322  0.41293963 -0.2836931     0.7595545  0.0000000
 [2,] -0.1868322  1.84446367 -0.2836931    -1.2152872  1.0954451
 [3,]  1.4323802  0.77082064  1.2968829     0.7595545  1.6431677
 [4,]  2.2419864  1.12870165  1.2968829     0.7595545  1.6431677
 [5,] -0.9964384 -2.09222745 -0.8105518    -1.2152872 -0.5477226
 [6,] -0.9964384 -0.66070340 -0.8105518    -0.2278664  0.0000000
 [7,] -0.1868322  0.05505862  0.7700242    -0.2278664  0.0000000
 [8,]  0.6227740  0.41293963  1.2968829    -0.2278664  0.5477226
 [9,] -0.9964384 -0.30282239 -1.3374105     0.7595545 -1.0954451
[10,] -0.1868322  0.05505862  0.2431655     1.7469754  0.0000000
[11,]  0.6227740 -0.66070340 -0.8105518    -1.2152872 -1.0954451
[12,] -0.9964384 -1.01858441 -1.3374105     0.7595545 -1.0954451
[13,] -0.1868322  0.05505862  0.7700242    -1.2152872 -1.0954451
attr(,"scaled:center")
        Touch         Chips        Driver Compatibility          Game 
     7.615385     16.923077      7.269231      7.615385      8.000000 
attr(,"scaled:scale")
        Touch         Chips        Driver Compatibility          Game 
    0.6175842     1.3971124     0.9490211     0.5063697     0.9128709

4、计算相异度

> m.dist<-dist(m.s) #返回包含相异度信息的特殊类，传给下面的hclust（）函数做聚类分析，必不可少的步骤，
> class(m.dist)
[1] "dist"
> m.dist
           1         2         3         4         5         6         7         8         9        10        11        12
2  2.6738102                                                                                                              
3  2.8192460 3.2361673                                                                                                    
4  3.4073003 3.6207314 0.8851785                                                                                          
5  3.3777070 4.3738518 5.2189544 5.8312143                                                                                
6  1.7495004 3.0633159 4.0080796 4.6696517 1.8232556                                                                      
7  1.4877497 2.5470160 2.6620939 3.3172954 3.0065795 1.9146796                                                            
8  2.1044436 2.5451805 1.7200410 2.3041623 3.9387805 2.9181911 1.1666709                                                  
9  1.8649659 3.8828623 4.6358786 5.1938577 2.7712002 1.6064429 2.7202614 3.7079193                                        
10 1.1750144 3.6680445 2.8141578 3.4405217 3.9308549 2.4855779 2.0439129 2.4685747 2.3359712                              
11 2.6806252 3.4653866 4.3065072 4.6545784 2.2295986 2.1901709 2.4168157 3.0445017 2.6320159 3.5004425                    
12 2.2394293 4.3200494 4.8518689 5.4348628 2.3728231 1.6064429 2.9024908 3.9096818 0.7157620 2.5458578 2.6320159          
13 2.5176177 3.0186571 3.8486029 4.3277534 2.8398099 2.4168157 1.4747881 2.1762631 3.0207321 3.2019650 1.9146796 3.1858206

转化成矩阵,结果同上，对角线为0。

> as.matrix(m.dist)
          1        2         3         4        5        6        7        8        9       10       11       12       13
1  0.000000 2.673810 2.8192460 3.4073003 3.377707 1.749500 1.487750 2.104444 1.864966 1.175014 2.680625 2.239429 2.517618
2  2.673810 0.000000 3.2361673 3.6207314 4.373852 3.063316 2.547016 2.545181 3.882862 3.668045 3.465387 4.320049 3.018657
3  2.819246 3.236167 0.0000000 0.8851785 5.218954 4.008080 2.662094 1.720041 4.635879 2.814158 4.306507 4.851869 3.848603
4  3.407300 3.620731 0.8851785 0.0000000 5.831214 4.669652 3.317295 2.304162 5.193858 3.440522 4.654578 5.434863 4.327753
5  3.377707 4.373852 5.2189544 5.8312143 0.000000 1.823256 3.006579 3.938781 2.771200 3.930855 2.229599 2.372823 2.839810
6  1.749500 3.063316 4.0080796 4.6696517 1.823256 0.000000 1.914680 2.918191 1.606443 2.485578 2.190171 1.606443 2.416816
7  1.487750 2.547016 2.6620939 3.3172954 3.006579 1.914680 0.000000 1.166671 2.720261 2.043913 2.416816 2.902491 1.474788
8  2.104444 2.545181 1.7200410 2.3041623 3.938781 2.918191 1.166671 0.000000 3.707919 2.468575 3.044502 3.909682 2.176263
9  1.864966 3.882862 4.6358786 5.1938577 2.771200 1.606443 2.720261 3.707919 0.000000 2.335971 2.632016 0.715762 3.020732
10 1.175014 3.668045 2.8141578 3.4405217 3.930855 2.485578 2.043913 2.468575 2.335971 0.000000 3.500443 2.545858 3.201965
11 2.680625 3.465387 4.3065072 4.6545784 2.229599 2.190171 2.416816 3.044502 2.632016 3.500443 0.000000 2.632016 1.914680
12 2.239429 4.320049 4.8518689 5.4348628 2.372823 1.606443 2.902491 3.909682 0.715762 2.545858 2.632016 0.000000 3.185821
13 2.517618 3.018657 3.8486029 4.3277534 2.839810 2.416816 1.474788 2.176263 3.020732 3.201965 1.914680 3.185821 0.000000

5、执行层次聚类分析

> m.h<-hclust(m.dist,method='ward.D2')
> class(m.h)
[1] "hclust"
> m.h

Call:
hclust(d = m.dist, method = "ward.D2")

Cluster method   : ward.D2 
Distance         : euclidean 
Number of objects: 13 

> str(m.h) #structrue函数
List of 7
 $ merge      : int [1:12, 1:2] -9 -3 -7 -1 -6 -11 3 -5 -2 6 ...
 $ height     : num [1:12] 0.716 0.885 1.167 1.175 1.808 ...
 $ order      : int [1:13] 11 13 5 6 9 12 3 4 2 7 ...
 $ labels     : NULL
 $ method     : chr "ward.D2"
 $ call       : language hclust(d = m.dist, method = "ward.D2")
 $ dist.method: chr "euclidean"
 - attr(*, "class")= chr "hclust"
> m.h$merge
      [,1] [,2]
 [1,]   -9  -12
 [2,]   -3   -4
 [3,]   -7   -8
 [4,]   -1  -10
 [5,]   -6    1
 [6,]  -11  -13
 [7,]    3    4
 [8,]   -5    5
 [9,]   -2    7
[10,]    6    8
[11,]    2    9
[12,]   10   11

6、树状图（Dendomgram）呈现

> m.cnum<-cutree(m.h,k=4)
> m.cnum
 [1] 1 1 2 2 3 3 1 1 3 1 4 3 4
> plot(m.h)![在这里插入图片描述](https://img-blog.csdnimg.cn/2019122012043216.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTM4NzE2MA==,size_16,color_FFFFFF,t_70)

在这里插入图片描述

> rect.hclust(m.h,k=3)

在这里插入图片描述

7、可视化进阶（待完善）

> install.packages('dendextend')
> library(dendextend).
> m.dend<-as.dendrogram(m.h) # 聚类分析结果转化为dendrogram类
> install.packages('colorspace') #
> library(colorspace)
> m.dend<-color_branches(m.dend,k=4)
> m.dend
'dendrogram' with 2 branches and 13 members total, at height 7.174631 
> class(m.dend)
[1] "dendrogram"
> labels_colors(m.dend)<-rainbow_hcl(4)[sort_levels_values(m.cnum[order.dendrogram(m.dend)])]#使品牌标签变为与各自聚类相同的颜色
> plot(m.dend)

在这里插入图片描述

> install.packages('gplots') #绘制热图捕捉各个聚类的视觉特征，使用heatmap.2()函数
> library(gplots)
> heatmap.2(m.s,main='不同品牌鼠标属性热 图',dendrogram='row',Rowv=m.dend,Colv=FALSE,col=colorspace::diverge_hcl(256),RowSideColors=labels_colors(m.dend)[order(order.dendrogram(m.dend))]) #注意参数m.s，是用来做聚类分析的标准化数据，并非dist（）函数转化的结果。

在这里插入图片描述

二、非层次聚类分析步骤：(为何结果每次不同)
这里采用K-means方法。步骤如下：

> set.seed(100)
> m.km<-kmeans(m.s,center=4)
> m.km
K-means clustering with 4 clusters of sizes 2, 4, 5, 2

Cluster means:
        Touch      Chips     Driver Compatibility       Game
1 -0.18683220  0.2339991 -0.0202638     1.2532650  0.0000000
2  0.01556935  0.5918801  0.6383096    -0.7215768  0.1369306
3 -0.67259591 -0.9470082 -1.0212953    -0.2278664 -0.7668116
4  1.83718328  0.9497611  1.2968829     0.7595545  1.6431677

Clustering vector:
 [1] 1 2 4 4 3 3 2 2 3 1 3 3 2

Within cluster sum of squares by cluster:
[1] 0.6903294 7.5874419 9.1861416 0.3917705
 (between_SS / total_SS =  70.2 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"      
> m.km$cluster
 [1] 1 2 4 4 3 3 2 2 3 1 3 3 2
> tapply(row.names(m),m.km$cluster,unique) #显示各聚类包含的鼠标品牌
$`1`
[1] "1"  "10"

$`2`
[1] "2"  "7"  "8"  "13"

$`3`
[1] "5"  "6"  "9"  "11" "12"

$`4`
[1] "3" "4"
> install.packages('GGally')
> library(GGally)
> m.km.df<-as.data.frame(cbind(m.s,cluster=m.km$cluster))
> m.km.df
        Touch       Chips     Driver Compatibility       Game cluster
1  -0.1868322  0.41293963 -0.2836931     0.7595545  0.0000000       1
2  -0.1868322  1.84446367 -0.2836931    -1.2152872  1.0954451       2
3   1.4323802  0.77082064  1.2968829     0.7595545  1.6431677       4
4   2.2419864  1.12870165  1.2968829     0.7595545  1.6431677       4
5  -0.9964384 -2.09222745 -0.8105518    -1.2152872 -0.5477226       3
6  -0.9964384 -0.66070340 -0.8105518    -0.2278664  0.0000000       3
7  -0.1868322  0.05505862  0.7700242    -0.2278664  0.0000000       2
8   0.6227740  0.41293963  1.2968829    -0.2278664  0.5477226       2
9  -0.9964384 -0.30282239 -1.3374105     0.7595545 -1.0954451       3
10 -0.1868322  0.05505862  0.2431655     1.7469754  0.0000000       1
11  0.6227740 -0.66070340 -0.8105518    -1.2152872 -1.0954451       3
12 -0.9964384 -1.01858441 -1.3374105     0.7595545 -1.0954451       3
13 -0.1868322  0.05505862  0.7700242    -1.2152872 -1.0954451       2
> class(m.km.df)
[1] "data.frame"
> m.km.df$cluster<-factor(m.km.df$cluster)
> ggparcoord(m.km.df,columns=1:6,groupColumn='cluster')+facet_wrap(~cluster)+theme_bw()

在这里插入图片描述

倚东

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
R聚类分析@;

原始数据：13个品牌的鼠标参数，如下表。brand Touch Chips Driver Compatibility Game1 Brand1 7.5 17.5 7.0 8.0 8.02 Brand2 7.5 19.5 7.0 7.0 9.03 Brand3 8.5 18.0 8.5 ...
复制链接

扫一扫