PCA (Principal Component Analysis) Plotting
1. Load the packages
Three packages are needed: "ggplot2", "factoextra", and "FactoMineR".
> remove(list = ls()) #clear all objects from the workspace
> library(ggplot2)
> library(factoextra)
> library(FactoMineR)
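If any of the three packages is not installed yet, library() will stop with an error. The following is a minimal sketch of one way to install only the missing ones first; the variable names and the choice of CRAN mirror are my own assumptions, not part of the original workflow.

```r
# Packages used in this post
pkgs <- c("ggplot2", "factoextra", "FactoMineR")

# requireNamespace() returns FALSE when a package cannot be loaded
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing (the mirror URL is an assumed choice; any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```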
2. Import the data
> df1 <- read.csv("C:/Users/luokai/Desktop/PCA1.csv",header = F,row.names = 1)
> head(df1)
V2 V3 V4 V5 V6 V7 V8 V9 V10
group S1 S1 S1 S2 S2 S2 S3 S3 S3
Lysobacter 15.67017534 13.61251717 15.31950199 0.620339447 1.028696844 0.931545775 9.590293125 12.01051385 13.18866976
Sphingopyxis 1.421585551 2.066318096 1.89594359 1.184185059 2.530712182 2.001358393 19.49599794 11.86851337 5.062409126
Luteimonas 10.25218367 5.792161251 7.84162888 0.188194401 0.25226339 0.301435814 4.858893936 6.421368599 7.970115437
Sphingomonas 1.763317345 2.20514914 1.944134178 3.473059503 8.116277912 7.852562178 3.111211348 5.399206852 4.232195358
Stenotrophomonas 5.698662704 4.113037875 4.859814216 0.565634948 0.714519085 0.692012891 3.851443855 4.370827154 4.76256787
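My data has three groups (the group row): S1, S2, and S3, with three samples in each.
3. Prepare the dataset
> df1 <- as.data.frame(t(df1)) #transpose so that samples are rows and taxa are columns
> df2 <- subset(df1, select = -group) #drop the group column, which is a label rather than a measurement; otherwise the PCA step will fail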
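A quick type check after transposing can be useful, because t() on a table that mixes text labels with numbers returns a character matrix. The sketch below reproduces the layout on a small toy table; the object names and the values are made up for illustration and are not the real dataset.

```r
# Toy stand-in for the imported table: first row holds the group labels,
# so every column is read as text (values are invented for illustration)
toy <- data.frame(
  V2 = c("S1", "15.7", "1.4"),
  V5 = c("S2", "0.6", "1.2"),
  V8 = c("S3", "9.6", "19.5"),
  row.names = c("group", "Lysobacter", "Sphingopyxis"),
  stringsAsFactors = FALSE
)

# Transpose: samples become rows, taxa (and group) become columns
toy_t <- as.data.frame(t(toy), stringsAsFactors = FALSE)

# Drop the label column, as done for df2
toy_num <- subset(toy_t, select = -group)

# Inspect the column classes; here they are all "character"
col_classes <- vapply(toy_num, class, character(1))
print(col_classes)
```

On the real data, the same check would be `vapply(df2, class, character(1))` or `str(df2)`.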
4. Run the PCA
> df2.pca <- PCA(df2,scale.unit = T,graph = F)
Error in PCA(df2, scale.unit = T, graph = F) :
The following variables are not quantitative: Lysobacter
The following variables are not quantitative: Sphingopyxis
The following variables are not quantitative: Luteimonas
The following variables are not quantitative: Sphingomonas
The following variables are not quantitative: Stenotrophomonas
The following variables are not quantitative: Xanthomonas
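The analysis uses the PCA() function from the "FactoMineR" package.
The call above fails. On inspection, every column in df2 is character rather than numeric, which is why PCA() rejects the variables. Convert the data type with the command below and run the analysis again.
> df2[,1:1761] <- as.numeric(unlist(df2[,1:1761])) #this dataset has 1761 columns; convert all of them to numeric
> df2.pca <- PCA(df2,scale.unit = T,graph = F) #graph = F: do not draw the default plots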
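The hard-coded 1:1761 range only fits this particular dataset. As an alternative sketch (not the original author's command), the conversion can be written so that it does not depend on the column count. The toy data frame and its values below are invented for illustration.

```r
# Toy stand-in for df2: numbers stored as text (values are invented)
toy <- data.frame(
  Lysobacter   = c("15.6", "0.6", "9.6"),
  Sphingopyxis = c("1.4", "1.2", "19.5"),
  row.names = c("V2", "V5", "V8"),
  stringsAsFactors = FALSE
)

# toy[] <- ... replaces the columns in place, keeping the data-frame shape and row names.
# as.character() first makes this safe if a column is a factor, where as.numeric()
# alone would return the internal level codes instead of the values.
toy[] <- lapply(toy, function(x) as.numeric(as.character(x)))

# Fail early if any column is still not numeric
stopifnot(all(vapply(toy, is.numeric, logical(1))))
```

Applied to the real data this would read `df2[] <- lapply(df2, function(x) as.numeric(as.character(x)))`. Any entry that cannot be parsed as a number becomes NA with a warning, so it is worth checking `anyNA(df2)` afterwards.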
5. Plot
> fviz_pca_ind(df2.pca,
geom.ind = "point", #show points only
mean.point=F, #setting this to TRUE adds a larger mean (center) point to the plot
pointsize =5,
pointshape = 21,
fill.ind = df1$group, #color the points by group
palette = c("#894329","#12727D","#FCB886"), #one color per group; there are three groups here
legend.title = "Groups",
title="") +
theme_bw()
Exporting the figure:
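One common way to write the figure to disk is ggsave() from ggplot2, since fviz_pca_ind() returns a ggplot object. This is a minimal sketch under my own assumptions: the file name, size, and resolution are hypothetical, and a small stand-in plot is used so that the block runs on its own.

```r
library(ggplot2)

# Stand-in plot so the example is self-contained; in the workflow of this post,
# p would be the object returned by fviz_pca_ind(...) + theme_bw()
p <- ggplot(data.frame(x = 1:3, y = c(2, 1, 3)), aes(x, y)) +
  geom_point(size = 5, shape = 21, fill = "#12727D") +
  theme_bw()

# Hypothetical output path; written to a temporary directory here
out_file <- file.path(tempdir(), "pca_plot.png")

# Width and height are in inches by default; dpi sets the raster resolution
ggsave(out_file, plot = p, width = 6, height = 5, dpi = 300)
```

Changing the file extension (for example to .pdf or .tiff) makes ggsave() pick the matching graphics device.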