RNA-seq Workflow Study Notes (15): Differential Gene Expression Analysis with DESeq2

Latest recommended article published on 2025-02-27 15:30:10

垚垚爸爱学习

Categories: RNA-seq study notes, R study notes

Article link: https://blog.csdn.net/xiaomotong123/article/details/106900481

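This article walks through RNA-seq data analysis with DESeq2: an overview, installation, the input data structures, building the dds object, normalization, extracting the results, and filtering for significantly differentially expressed genes. DESeq2 is based on negative binomial generalized linear models and is used to detect differentially expressed genes in RNA-seq data. The article also covers the criteria used to select differentially expressed genes.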

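Reference articles:
RNA-seq (7): Screening and annotating differentially expressed genes with DEseq2
Getting started with transcriptomics 7: Differential expression analysis with DESeq2
Analyzing RNA-seq data with DESeq2
RNA-seq exercise, part 3 (screening differentially expressed genes with DEseq2, visualization)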

1. Overview of DESeq2

A basic task in the analysis of count data from RNA-seq is the detection of differentially expressed genes. The count data are presented as a table which reports, for each sample, the number of sequence fragments that have been assigned to each gene. Analogous data also arise for other assay types, including comparative ChIP-Seq, HiC, shRNA screening, and mass spectrometry. An important analysis question is the quantification and statistical inference of systematic changes between conditions, as compared to within-condition variability. The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions.
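To make "negative binomial" concrete, here is a minimal base-R sketch that simulates counts for a single gene and compares the observed variance with the value expected under the mean-dispersion parameterization, variance = mu + alpha * mu^2. The mean (100) and dispersion (0.2) are assumed illustration values, not numbers from this article's data.

```r
# Simulate counts for one gene under a negative binomial model.
# mu and alpha are assumed values chosen only for illustration.
set.seed(1)
mu    <- 100   # mean count
alpha <- 0.2   # dispersion

# rnbinom() with `mu` and `size`: size = 1 / dispersion
counts <- rnbinom(n = 10000, mu = mu, size = 1 / alpha)

obs_mean     <- mean(counts)
obs_var      <- var(counts)
expected_var <- mu + alpha * mu^2   # negative binomial variance

# The variance is far larger than the mean (overdispersion),
# which a Poisson model (variance == mean) cannot capture.
cat("observed mean:    ", obs_mean, "\n")
cat("observed variance:", obs_var, "\n")
cat("expected variance:", expected_var, "\n")
```

The point of the comparison is that biological replicates vary more than Poisson sampling alone would predict, which is why a per-gene dispersion has to be estimated.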

2. Installing the DESeq2 package

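# For R 3.6 and later, install BiocManager first, then use it to install DESeq2
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("DESeq2")
# Dependencies are installed at the same time
# Load the package to check that the installation succeeded
library(DESeq2)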

3. Required input data structures

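Count matrix (expression matrix), countData: the matrix we generated earlier by computing read counts and merging them across samples. Rows are genes, columns are samples, and each cell holds the integer number of reads or fragments counted.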

      gene_id  control_1  control_2  sh_1_1  sh_1_2  sh_2_2  sh_2_3
0610005C13Rik          0          0       4       5       1       0
0610007P14Rik        230          0    1119    1197    1868    1439
0610009B22Rik         46          0     225     272     285     228
0610009L18Rik          3          0      12      12      16      16
0610009O20Rik        157          1     684     702     499     636
0610010B08Rik          0          0       0       0       0       0
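As a small illustration of this structure, the six rows printed above can be typed in directly and checked in base R. The values and column names are copied from the table as shown. The check for all-zero rows is an added illustration, not a step taken from the original text.

```r
# Rebuild the six example rows shown above as a data frame.
mycounts <- data.frame(
  control_1 = c(0L, 230L, 46L, 3L, 157L, 0L),
  control_2 = c(0L, 0L, 0L, 0L, 1L, 0L),
  sh_1_1    = c(4L, 1119L, 225L, 12L, 684L, 0L),
  sh_1_2    = c(5L, 1197L, 272L, 12L, 702L, 0L),
  sh_2_2    = c(1L, 1868L, 285L, 16L, 499L, 0L),
  sh_2_3    = c(0L, 1439L, 228L, 16L, 636L, 0L),
  row.names = c("0610005C13Rik", "0610007P14Rik", "0610009B22Rik",
                "0610009L18Rik", "0610009O20Rik", "0610010B08Rik")
)

# Genes as rows, samples as columns, non-negative integers in the cells.
count_matrix <- as.matrix(mycounts)
stopifnot(is.integer(count_matrix), all(count_matrix >= 0))

# Illustration only: find genes with no reads in any sample.
all_zero <- rowSums(count_matrix) == 0
print(names(which(all_zero)))
```

Genes with zero counts everywhere carry no information for a differential test, so it can be useful to know how many there are before running the analysis.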

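Sample information table, colData: a data frame (dataframe). The sample names come first (as row names in the example below), followed by a column describing how each sample was treated (control, treatment, and so on), i.e. condition. The condition column is a factor.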

                 condition
       control_1   control
       control_2   control
       sh_2_1         sh_2
       sh_2_2         sh_2
       sh_2_3         sh_2
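A minimal base-R sketch of building this table, using the five sample names printed above. The placeholder count matrix `counts_stub` is hypothetical and exists only to demonstrate the name check. The DESeq2 vignette stresses that the rows of colData must be in the same order as the columns of the count matrix.

```r
# Sample names as printed in the colData example above.
sample_names <- c("control_1", "control_2", "sh_2_1", "sh_2_2", "sh_2_3")

# condition is a factor; listing "control" first makes it the reference level.
condition <- factor(c(rep("control", 2), rep("sh_2", 3)),
                    levels = c("control", "sh_2"))

colData <- data.frame(row.names = sample_names, condition = condition)

# Hypothetical placeholder standing in for the real count matrix.
counts_stub <- matrix(0L, nrow = 2, ncol = 5,
                      dimnames = list(c("geneA", "geneB"), sample_names))

# Rows of colData should match the columns of the counts, in the same order.
stopifnot(identical(rownames(colData), colnames(counts_stub)))
print(colData)
```

Running the same `identical()` check against the real count matrix before building the DESeq2 object is a cheap way to catch mislabeled or reordered samples.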

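Comparison design, design: tells the differential analysis function which variables to compare, in short which samples are controls and which are treated. In DESeq2 this is given as a formula such as ~ condition.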
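To see what a design such as `~ condition` encodes, base R's `model.matrix()` can expand it into the underlying indicator columns. This is a sketch for intuition using the condition factor from this article. It is not a step that has to be run when using DESeq2, which takes the formula itself.

```r
# The same condition factor used in this article: 2 controls, 3 sh_2 samples.
condition <- factor(c(rep("control", 2), rep("sh_2", 3)),
                    levels = c("control", "sh_2"))

# The design is written as a one-sided formula.
design <- ~ condition

# Expand the formula into a model matrix: an intercept column (the control
# baseline) and an indicator column that is 1 for sh_2 samples.
mm <- model.matrix(design, data = data.frame(condition = condition))
print(mm)
```

The coefficient on the `conditionsh_2` column is the effect of sh_2 relative to the reference level, which is why the order of the factor levels matters.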

4. Loading the data

Workflow code:

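# Load the DESeq2 package
library(DESeq2)
# Set the working directory
setwd("G:/zhaoxiujuan/Rtreatment")
# Read the count table into mycounts (first column becomes the row names)
mycounts <- read.table("G:/zhaoxiujuan/Rtreatment/raw_count_file", header = TRUE, row.names = 1)
# Show the first rows of mycounts
head(mycounts)
# Define the sample groups and the number of replicates in each
condition <- factor(c(rep("control", 2), rep("sh_2", 3)), levels = c("control","sh_2"))
# Show the condition factor
condition
# Build colData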
colData <- data.frame(row.names = colnames
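The listing above is cut off at the colData line. As a hedged sketch of how the steps named in the summary (building dds, normalization, extracting results, filtering) could look, the block below runs the standard DESeq2 calls on simulated counts. Everything specific here is an assumption: the simulated `mycounts`, the completed `colData` line, and the cutoffs `padj < 0.05` and `|log2FoldChange| > 1`, which are common conventions and not necessarily the ones the author used.

```r
library(DESeq2)

# Simulated stand-in for the real count table (assumption, not the author's data).
set.seed(42)
n_genes      <- 1000
sample_names <- c("control_1", "control_2", "sh_2_1", "sh_2_2", "sh_2_3")
gene_mu      <- 2^runif(n_genes, min = 4, max = 12)   # per-gene mean count
mycounts <- matrix(rnbinom(n_genes * 5, mu = rep(gene_mu, times = 5), size = 5),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)), sample_names))
storage.mode(mycounts) <- "integer"

condition <- factor(c(rep("control", 2), rep("sh_2", 3)),
                    levels = c("control", "sh_2"))

# Assumed completion of the truncated line: one row per sample.
colData <- data.frame(row.names = colnames(mycounts), condition)

# Build the dds object from counts, sample table and design formula.
dds <- DESeqDataSetFromMatrix(countData = mycounts,
                              colData   = colData,
                              design    = ~ condition)

# Estimate size factors and dispersions, then fit the model and test.
dds <- DESeq(dds)

# Counts after size-factor normalization.
norm_counts <- counts(dds, normalized = TRUE)

# Extract results for sh_2 versus control.
res    <- results(dds, contrast = c("condition", "sh_2", "control"))
res_df <- as.data.frame(res)

# Assumed cutoffs: adjusted p-value < 0.05 and at least a two-fold change.
sig <- subset(res_df, !is.na(padj) & padj < 0.05 & abs(log2FoldChange) > 1)
cat("genes passing the assumed cutoffs:", nrow(sig), "\n")
```

Because the simulated counts contain no real group difference, few or no genes should pass the filter. With real data, `mycounts`, `condition` and `colData` would come from the loading code above instead of the simulation.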
