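Principal component analysis (PCA) with the R function prcomp

Principal component analysis (PCA) is a dimensionality-reduction method, commonly used to reduce the number of dimensions in large data sets. A PCA plot can be used to inspect how the samples are distributed, for example to check whether a batch effect is present.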

 prcomp {stats}

Description

Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.
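
As a minimal sketch of what the returned object holds, the following runs prcomp on the built-in USArrests data set and inspects its components. USArrests and the names pca_demo and reconstructed are chosen here only for illustration; they do not come from the original post.

```r
# Illustration only: USArrests is a built-in data set, not used in the original post.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

class(pca_demo)         # the result is an object of class "prcomp"
names(pca_demo)         # "sdev" "rotation" "center" "scale" "x"

dim(pca_demo$x)         # scores: one row per observation, one column per PC
dim(pca_demo$rotation)  # loadings: one row per variable, one column per PC

# The scores are the centered and scaled data multiplied by the loadings.
reconstructed <- scale(USArrests) %*% pca_demo$rotation
all.equal(unname(reconstructed), unname(pca_demo$x))
```

The last check makes the link between rotation and x explicit: x is the preprocessed data expressed in the coordinate system of the principal components.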

Usage

prcomp(x, ...)

## S3 method for class 'formula'
prcomp(formula, data = NULL, subset, na.action, ...)

## Default S3 method:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
       tol = NULL, rank. = NULL, ...)

## S3 method for class 'prcomp'
predict(object, newdata, ...)
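
The three call forms above can be compared side by side. This is a sketch under assumptions: the USArrests data set, the 40/10 row split, and the names train, new_rows, fit_default, fit_formula and new_scores are all hypothetical and used only for illustration.

```r
# Illustration only: USArrests is a built-in data set, not from the original post.
train    <- USArrests[1:40, ]
new_rows <- USArrests[41:50, ]

# Default method: pass the numeric data directly.
fit_default <- prcomp(train, scale. = TRUE)

# Formula method: no response on the left-hand side; scale. is passed through '...'.
fit_formula <- prcomp(~ Murder + Assault + UrbanPop + Rape,
                      data = train, scale. = TRUE)

# Both interfaces give the same standard deviations.
all.equal(fit_default$sdev, fit_formula$sdev)

# predict() projects new rows using the centering, scaling and rotation
# estimated from the training rows.
new_scores <- predict(fit_default, newdata = new_rows)
dim(new_scores)   # 10 new observations by 4 principal components
```

Because train has column names, new_rows must carry the same column names for predict() to match the variables.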

Arguments

formula

a formula with no response variable, referring only to numeric variables.

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit.

...

arguments passed to or from other methods. If x is a formula one might specify scale. or tol.

x

a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.

retx

a logical value indicating whether the rotated variables should be returned.

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternately, a vector of length equal the number of columns of x can be supplied. The value is passed to scale.

scale.

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal the number of columns of x can be supplied. The value is passed to scale.

tol

a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.) With the default null setting, no components are omitted (unless rank. is specified less than min(dim(x)).). Other settings for tol could be tol = 0 or tol = sqrt(.Machine$double.eps), which would omit essentially constant components.

rank.

optionally, a number specifying the maximal rank, i.e., maximal number of principal components to be used. Can be set as alternative or in addition to tol, useful notably when the desired rank is considerably smaller than the dimensions of the matrix.

object

object of class inheriting from "prcomp"

newdata

An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, newdata must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
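
The effect of scale., rank. and tol can be seen on a small example. This is only a sketch: the USArrests data set, the value tol = 0.5 and all object names below are assumptions made for illustration, not settings recommended by the original post.

```r
# Illustration only: USArrests is a built-in data set, not from the original post.

# Without scaling, the variable with the largest variance (Assault) dominates PC1.
pca_raw    <- prcomp(USArrests, scale. = FALSE)
pca_scaled <- prcomp(USArrests, scale. = TRUE)
round(pca_raw$rotation[, 1], 3)
round(pca_scaled$rotation[, 1], 3)

# rank. caps the number of components that are returned.
pca_rank2 <- prcomp(USArrests, scale. = TRUE, rank. = 2)
ncol(pca_rank2$x)   # 2

# tol drops components whose standard deviation is at most
# tol times that of the first component.
pca_tol <- prcomp(USArrests, scale. = TRUE, tol = 0.5)
ratio <- pca_scaled$sdev / pca_scaled$sdev[1]
sum(ratio > 0.5)    # number of components expected to survive tol = 0.5
ncol(pca_tol$x)     # matches the count above
```

Comparing the two loading vectors shows why the help page advises scaling when variables are measured on very different scales.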

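R code

### 1. Gene expression data
library(Biobase)            # provides sample.ExpressionSet, exprs() and phenoData()
data()                      # list the available data sets
data(sample.ExpressionSet)  # load the example data set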
#featureNames(sample.ExpressionSet)
#sampleNames(sample.ExpressionSet)

## Expression matrix
exprs_matrix <- exprs(sample.ExpressionSet)
# After transposing, rows are samples and columns are features
m_matrix <- t(exprs_matrix)
#head(m_matrix)
#dim(m_matrix)

## Phenotype data
pdata <- phenoData(sample.ExpressionSet)
class(pdata) #[1] "AnnotatedDataFrame"
# Convert the AnnotatedDataFrame to a data.frame
p_df <- as(pdata, "data.frame")
#colnames(p_df)
#rownames(p_df)

### 2. Principal component analysis
# Run PCA on the expression profiles, with one row per sample
pca <- prcomp(m_matrix,center = TRUE,scale. = TRUE)
class(pca) #"prcomp"
# pca$sdev,pca$center,pca$scale,pca$rotation, pca$x

# PCA scores for each sample
m_df <- as.data.frame(pca$x) 
summ <- summary(pca)
# summ$importance,summ$sdev,summ$center,summ$scale,summ$rotation, summ$x

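# Extract the proportion of variance explained by each component and build the axis titles
xlab <- paste0("PC1(",round(summ$importance[2,1]*100,2),"%)")
ylab <- paste0("PC2(",round(summ$importance[2,2]*100,2),"%)")

## Combine the PCA scores with the phenotype data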
final_df<-cbind(m_df,p_df)

### 3. Draw the PCA plot
library(ggplot2)
p.pca <- ggplot(data = final_df,aes(x = PC1,y = PC2,color = type))+
  stat_ellipse(aes(fill = type),
               type = "norm",geom = "polygon",alpha = 0.25,color = NA)+ # add confidence ellipses
  geom_point(size = 3.5)+
  # color = "Condition" sets the legend title
  labs(x = xlab,y = ylab,color = "Condition",title = "PCA Scores Plot")+
  guides(fill = "none")+
  theme_bw()+
  scale_fill_manual(values = c("blue","red"))+
  scale_colour_manual(values = c("blue","red"))+
  theme(plot.title = element_text(hjust = 0.5,size = 15),
        axis.text = element_text(size = 11),axis.title = element_text(size = 13),
        legend.text = element_text(size = 11),legend.title = element_text(size = 13),
        plot.margin = unit(c(0.4,0.4,0.4,0.4),'cm'))
p.pca
#ggsave(p.pca,filename = "PCA.pdf") # save the plot to a file
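
The proportion of variance used for the axis titles above can also be derived directly from sdev, which makes clear where the numbers in summ$importance come from. The sketch below uses the built-in USArrests data set as a stand-in for the expression matrix so that it runs without Biobase; pca_demo, var_explained, summ_demo and axis_titles are illustrative names, not part of the original script.

```r
# Illustration only: USArrests stands in for the expression matrix used above.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Each component's variance is its squared standard deviation.
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)

# summary() reports the same values, rounded to 5 decimal places,
# in the "Proportion of Variance" row of $importance.
summ_demo <- summary(pca_demo)
all.equal(unname(summ_demo$importance["Proportion of Variance", ]),
          round(var_explained, 5))

# Axis titles in the same style as the plotting code above.
axis_titles <- sprintf("PC%d (%.2f%%)", 1:2, var_explained[1:2] * 100)
axis_titles
```

Indexing the importance matrix by row name rather than by position (row 2) is slightly more verbose but easier to read.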
