limma利用广义线性模型进行差异表达基因分析,主要用于分析微阵列数据。
Data analysis, linear models and differential expression for microarray data.
1. 读入基因表达谱数据
exprSet <- read.delim(file)
# read.csv,read.table 或其他方式导入
2. 设计矩阵
group <- c(rep('normal',5),rep('tumor',6)) # mock
group = factor(group)
design <- model.matrix(~0+group)
## design <- model.matrix(~group)
#含有差异比较,不需要比较矩阵(makeContrasts生成,不需要下面的fit2步骤
rownames(design) = colnames(exprSet)
colnames(design) <- levels(group)
3. 生成DGEList对象
dgelist <- DGEList(counts = exprSet, group = group)
4. 数据标准化、筛选
# 计算比例因子以将原始库大小转换为有效库大小。
dgelist <- calcNormFactors(dgelist)
# 筛选
keep_gene <- rowSums(cpm(dgelist) > 1 ) >= 2 # 自定义
table(keep_gene)
dgelist <- dgelist[keep_gene, , keep.lib.sizes = FALSE]
5. 差异表达基因分析
# RNA_seq数据需要counts转化,芯片数据不需要这一步
v <- voom(DGElist, design, plot = TRUE, normalize = "quantile")
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(contrasts = c('grouptumor-groupnormal'), levels = design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
nrDEG_limma_voom = topTable(fit2, coef = 'tumor-normal', n = Inf)
nrDEG_limma_voom = na.omit(nrDEG_limma_voom)
head(nrDEG_limma_voom)
voom: 将计数数据转换log2-counts per million (logCPM),估计均值-方差关系,并使用此关系计算适当的观察水平权重。然后,数据就可以进行线性建模了。
6. 提取差异表达基因
padj = 0.01 # 自定义
foldChange= 2 # 自定义
nrDEG_limma_voom_signif = nrDEG_limma_voom[(nrDEG_limma_voom$adj.P.Val < padj &
(nrDEG_limma_voom$logFC>foldChange | nrDEG_limma_voom$logFC<(-foldChange))),]
nrDEG_limma_voom_signif = nrDEG_limma_voom_signif[order(nrDEG_limma_voom_signif$logFC),]
save(nrDEG_limma_voom_signif, file = 'nrDEG_limma_voom_signif')