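References:
RNA-seq (7): Screening and annotating differentially expressed genes with DESeq2
Transcriptomics primer 7: Differential expression analysis with DESeq2
Analyzing RNA-seq data with DESeq2
RNA-seq practice, part 3 (screening differentially expressed genes with DESeq2, visualization)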
1. Overview of DESeq2
A basic task in the analysis of count data from RNA-seq is the detection of differentially expressed genes. The count data are presented as a table which reports, for each sample, the number of sequence fragments that have been assigned to each gene. Analogous data also arise for other assay types, including comparative ChIP-Seq, HiC, shRNA screening, and mass spectrometry. An important analysis question is the quantification and statistical inference of systematic changes between conditions, as compared to within-condition variability. The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions.
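To make the "negative binomial" part of this description concrete, here is a minimal sketch in base R. It does not use DESeq2; it only simulates counts for one hypothetical gene and shows that their variance exceeds their mean, which is the overdispersion that the dispersion parameter captures. The values of mu and dispersion are arbitrary assumptions chosen for illustration.

```r
# Simulate counts for a single hypothetical gene from a negative binomial
# distribution. mu and dispersion are made-up values, not estimates.
set.seed(1)
mu <- 100          # assumed mean count
dispersion <- 0.1  # assumed dispersion (alpha)

# rnbinom() uses size = 1 / dispersion in its mean parameterization.
counts <- rnbinom(10000, mu = mu, size = 1 / dispersion)

# Under this model the variance is mu + dispersion * mu^2,
# so it is larger than the mean (a Poisson model would give variance = mean).
summary_stats <- c(
  mean = mean(counts),
  variance = var(counts),
  expected_variance = mu + dispersion * mu^2
)
print(summary_stats)
```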
2. Installing the DESeq2 package
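# For R 3.6 and later, install BiocManager first, then use it to install DESeq2
> install.packages("BiocManager")
> BiocManager::install("DESeq2")
# Dependencies are installed at the same time
# Check that the installation succeeded by loading the package
> library(DESeq2)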
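If you are not sure whether the packages are already present, a quick check before installing can save time. The sketch below only reports what is installed and does not install anything itself; the variable names pkgs and installed are chosen here for illustration.

```r
# Report which of the two packages can be loaded in this R session.
pkgs <- c("BiocManager", "DESeq2")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# A FALSE entry means that package still needs to be installed, e.g.:
# if (!installed[["BiocManager"]]) install.packages("BiocManager")
# if (!installed[["DESeq2"]]) BiocManager::install("DESeq2")
```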
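3. Required input data structures
- Count matrix (countData): the matrix produced earlier by computing read counts and merging the per-sample results. Rows are genes, columns are samples, and each value is an integer count of reads or fragments.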
gene_id control_1 control_2 sh_1_1 sh_1_2 sh_2_2 sh_2_3
0610005C13Rik 0 0 4 5 1 0
0610007P14Rik 230 0 1119 1197 1868 1439
0610009B22Rik 46 0 225 272 285 228
0610009L18Rik 3 0 12 12 16 16
0610009O20Rik 157 1 684 702 499 636
0610010B08Rik 0 0 0 0 0 0
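- Sample information table (colData): a data frame in which the first column holds the sample names (stored as row names in the example below) and the second holds each sample's treatment status (control, treated, and so on), called condition. condition is a factor.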
condition
control_1 control
control_2 control
sh_2_1 sh_2
sh_2_2 sh_2
sh_2_3 sh_2
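- Design (design): tells the differential analysis function which variables to compare. Put simply, it states which samples are controls and which are treated. In DESeq2 it is written as a formula such as ~ condition.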
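The three inputs described above can be sketched together as follows. This is a toy example under stated assumptions: the gene names (geneA to geneC) and all counts are made up, and only the sample names and condition levels follow the colData example in this section. The DESeq2 call runs only if the package is installed.

```r
# Toy count matrix: genes in rows, samples in columns.
# Gene names and counts are made-up values for illustration only.
countData <- matrix(
  c(230L, 210L, 1119L, 1197L, 1868L,
     46L,  50L,  225L,  272L,  285L,
      3L,   2L,   12L,   12L,   16L),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("control_1", "control_2", "sh_2_1", "sh_2_2", "sh_2_3")
  )
)

# Sample table: one row per sample, with condition as a factor whose
# first level ("control") serves as the reference level.
condition <- factor(c(rep("control", 2), rep("sh_2", 3)),
                    levels = c("control", "sh_2"))
colData <- data.frame(condition = condition,
                      row.names = colnames(countData))

# Design formula: compare samples by condition.
design <- ~ condition

# The rows of colData must be in the same order as the columns of countData.
stopifnot(identical(rownames(colData), colnames(countData)))

# Build the DESeq2 dataset only when DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countData,
                                        colData = colData,
                                        design = design)
  print(dds)
}
```

Checking that the sample order matches before building the dataset is worthwhile, because a mismatch would assign the wrong condition to a sample.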
4. Loading the data
Workflow code:
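# Load the DESeq2 package
library(DESeq2)
# Set the working directory
setwd("G:/zhaoxiujuan/Rtreatment")
# Read the count table into mycounts (first column becomes the row names)
mycounts <- read.table("G:/zhaoxiujuan/Rtreatment/raw_count_file", header = TRUE, row.names = 1)
# Show the first rows of mycounts
head(mycounts)
# Define the sample groups and the number of replicates in each
condition <- factor(c(rep("control", 2), rep("sh_2", 3)), levels = c("control","sh_2"))
# Show the condition factor
condition
# Build colData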
colData <- data.frame(row.names = colnames