[R] Drawing PCA (principal component analysis) plots with the factoextra package

noob_k

Article link: https://blog.csdn.net/qq_40625733/article/details/126283825

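Plotting a PCA (principal component analysis)

1. Load the packages

Three packages are needed: "ggplot2", "factoextra", and "FactoMineR".

> remove(list = ls()) #clear all objects from the workspace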
> library(ggplot2)
> library(factoextra)
> library(FactoMineR)

2. Import the data

> df1 <- read.csv("C:/Users/luokai/Desktop/PCA1.csv",header = FALSE,row.names = 1)
> head(df1)
                          V2          V3          V4          V5          V6          V7          V8          V9         V10
group                     S1          S1          S1          S2          S2          S2          S3          S3          S3
Lysobacter       15.67017534 13.61251717 15.31950199 0.620339447 1.028696844 0.931545775 9.590293125 12.01051385 13.18866976
Sphingopyxis     1.421585551 2.066318096  1.89594359 1.184185059 2.530712182 2.001358393 19.49599794 11.86851337 5.062409126
Luteimonas       10.25218367 5.792161251  7.84162888 0.188194401  0.25226339 0.301435814 4.858893936 6.421368599 7.970115437
Sphingomonas     1.763317345  2.20514914 1.944134178 3.473059503 8.116277912 7.852562178 3.111211348 5.399206852 4.232195358
Stenotrophomonas 5.698662704 4.113037875 4.859814216 0.565634948 0.714519085 0.692012891 3.851443855 4.370827154  4.76256787

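The data fall into three groups (group): S1, S2, and S3, with three samples in each.

3. Prepare the data set

> df1 <- as.data.frame(t(df1)) #transpose so that each sample becomes a row
> df2 <- subset(df1, select = -group)  #drop the group column, which holds labels rather than measurements; otherwise the PCA below fails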
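
Before running the PCA it can help to confirm what type the columns ended up with. The sketch below rebuilds the same layout from a small inline CSV and prints the class of each column. The values are rounded from the head(df1) output above, and the names csv_text, toy, toy_t, and toy_num are made up for illustration. It assumes R 4.0 or later, where strings are not turned into factors by default.

```r
# Toy CSV with the same layout as PCA1.csv: the first row holds the group
# labels, the remaining rows are taxa (values rounded from head(df1)).
csv_text <- "group,S1,S1,S1,S2,S2,S2,S3,S3,S3
Lysobacter,15.67,13.61,15.32,0.62,1.03,0.93,9.59,12.01,13.19
Sphingopyxis,1.42,2.07,1.90,1.18,2.53,2.00,19.50,11.87,5.06
Luteimonas,10.25,5.79,7.84,0.19,0.25,0.30,4.86,6.42,7.97"

toy <- read.csv(text = csv_text, header = FALSE, row.names = 1)
toy_t <- as.data.frame(t(toy))             # transpose: samples become rows
toy_num <- subset(toy_t, select = -group)  # drop the label column

# Class of every remaining column; the label row in the file means each
# imported column was read as text, and the transpose keeps that type.
col_classes <- sapply(toy_num, class)
print(col_classes)
```

If any column is reported as something other than numeric, it needs converting before it is passed to PCA.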

4. PCA

> df2.pca <- PCA(df2,scale.unit = T,graph = F)
Error in PCA(df2, scale.unit = T, graph = F) : 
The following variables are not quantitative:  Lysobacter
The following variables are not quantitative:  Sphingopyxis
The following variables are not quantitative:  Luteimonas
The following variables are not quantitative:  Sphingomonas
The following variables are not quantitative:  Stenotrophomonas
The following variables are not quantitative:  Xanthomonas

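The analysis uses the PCA function from the "FactoMineR" package.
The call fails here. Inspecting df2 shows that every column is character rather than numeric: the group row in the file made each imported column a character vector, and the transpose kept that type. Converting the columns to numeric with the command below fixes the error.

> df2[] <- lapply(df2, as.numeric) #convert every column (1761 in this data set) to numeric
> df2.pca <- PCA(df2,scale.unit = TRUE,graph = FALSE) #graph = FALSE: do not draw the default plots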
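
The percentages that appear on the axis labels of the plot come from the eigenvalues of the PCA. With a FactoMineR result they are stored in df2.pca$eig, and factoextra's get_eigenvalue() returns the same table. As a rough illustration of what scale.unit = TRUE implies, the sketch below uses base R's prcomp as a stand-in (not the FactoMineR call used in this post) on a toy matrix with values rounded from head(df1). The names x, pca, eigenvalues, and pct_var are made up for the example.

```r
# Toy numeric matrix: 9 samples x 3 taxa, values rounded from head(df1).
x <- cbind(
  Lysobacter   = c(15.67, 13.61, 15.32, 0.62, 1.03, 0.93, 9.59, 12.01, 13.19),
  Sphingopyxis = c(1.42, 2.07, 1.90, 1.18, 2.53, 2.00, 19.50, 11.87, 5.06),
  Luteimonas   = c(10.25, 5.79, 7.84, 0.19, 0.25, 0.30, 4.86, 6.42, 7.97)
)

# Base R stand-in for a scaled PCA: centre each column and scale it to
# unit variance before extracting the components.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2                        # variance carried by each component
pct_var <- 100 * eigenvalues / sum(eigenvalues)  # share of total variance, in percent
print(round(pct_var, 1))
```

Because every variable is scaled to unit variance, the eigenvalues add up to the number of variables, so each percentage is simply that component's eigenvalue divided by the variable count.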

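5. Plot

> fviz_pca_ind(df2.pca,
             geom.ind = "point", #show points only
             mean.point=FALSE, #TRUE would add a larger mean (center) point for each group
             pointsize =5,
             pointshape = 21,
             fill.ind = df1$group, #fill color by group
             palette = c("#894329","#12727D","#FCB886"), #one color per group; there are three groups here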
             legend.title = "Groups",
             title="") +
  theme_bw()
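
Since fviz_pca_ind returns a ggplot object, the figure can be stored in a variable and written to disk with ggplot2's ggsave. The sketch below is one possible way to do that, not part of the original workflow. It uses the built-in iris data as a stand-in for PCA1.csv, and the names res, p, and out_file as well as the output path and size are assumptions for illustration.

```r
library(ggplot2)
library(factoextra)
library(FactoMineR)

# Stand-in data: the built-in iris set replaces the author's PCA1.csv.
res <- PCA(iris[, 1:4], scale.unit = TRUE, graph = FALSE)

# Same style of call as above, stored in a variable instead of printed.
p <- fviz_pca_ind(res,
                  geom.ind = "point",
                  mean.point = FALSE,
                  pointsize = 3,
                  pointshape = 21,
                  fill.ind = iris$Species,   # fill color by group
                  legend.title = "Groups",
                  title = "") +
  theme_bw()

# Hypothetical output location and size; adjust to taste.
out_file <- file.path(tempdir(), "pca_ind.png")
ggsave(out_file, plot = p, width = 6, height = 5, dpi = 300)
```

Passing the plot explicitly through the plot argument avoids relying on whichever figure happened to be displayed last.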