Harvard University: Differential Expression Analysis (12) Visualization

零级伪码农

Category: RNA-seq notes. Tags: bioinformatics, R, data analysis

Article link: https://blog.csdn.net/weixin_46585008/article/details/109546509

Contents

Learning objectives
Visualizing the results
- Plotting significant DE genes

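Learning objectives

Explore expression data using data visualization
Use volcano plots to evaluate relationships among the DEG statistics
Plot the expression of significant genes using heatmaps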

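Visualizing the results

When working with large amounts of data, displaying the information graphically helps us gain deeper insight. In this lesson we will get you started with some basic and more advanced plots commonly used for exploring differential gene expression data; many of these plots are also useful for visualizing other types of data.
We will use three data objects that we created in earlier lessons:

Metadata for our samples (a data frame): meta
Normalized expression counts for every gene in each sample (a matrix): normalized_counts
Tibble versions of the DESeq2 results generated in the previous lesson: res_tableOE_tb and res_tableKD_tb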

First, let's create a metadata tibble from the data frame (taking care not to lose the row names!)

mov10_meta <- meta %>% 
              rownames_to_column(var="samplename") %>% 
              as_tibble()
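
To see why the row names have to be moved into a column first, here is a minimal sketch on a toy data frame. The object `toy_meta`, its sample names, and its `sampletype` column are made up for illustration and are not the real `meta` object. Calling `as_tibble()` directly on a data frame drops the row names by default, while `rownames_to_column()` stores them in a regular column beforehand.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the `meta` data frame: sample names live in the row names
toy_meta <- data.frame(
  sampletype = c("control", "control", "MOV10_overexpression"),
  row.names  = c("Irrel_kd_1", "Irrel_kd_2", "Mov10_oe_1"),
  stringsAsFactors = FALSE
)

# Converting directly: the row names are discarded
dropped <- as_tibble(toy_meta)

# Moving the row names into a "samplename" column first keeps them
kept <- toy_meta %>%
  rownames_to_column(var = "samplename") %>%
  as_tibble()

print(colnames(dropped))
print(colnames(kept))
```

Having the sample names in a regular column also makes it possible to join the metadata to other tables by sample later on.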

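Next, let's bring a column of gene symbols into the normalized_counts object so that we can use them to label our plots. Ensembl IDs are useful in many ways, but as biologists we find gene symbols much easier to recognize.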

# The DESeq2 counts() function returns a matrix
## First convert normalized_counts to a data frame and move the row names into a new column called "gene"
normalized_counts <- counts(dds, normalized=TRUE) %>% 
                     data.frame() %>%
                     rownames_to_column(var="gene") 
  
# Next, take the subset of the tx2gene annotations that we need (only the columns for Ensembl gene IDs and gene symbols), so it can be merged with the normalized counts by Ensembl ID
grch38annot <- tx2gene %>% 
               dplyr::select(ensgene, symbol) %>% 
               dplyr::distinct()

## This will bring in a column of gene symbols
normalized_counts <- merge(normalized_counts, grch38annot, by.x="gene", by.y="ensgene")

# Now create a tibble for the normalized counts
normalized_counts <- normalized_counts %>%
                     as_tibble()
  
normalized_counts

> normalized_counts
# A tibble: 57,761 x 10
   gene      Irrel_kd_1 Irrel_kd_2 Irrel_kd_3 Mov10_kd_2 Mov10_kd_3 Mov10_oe_1 Mov10_oe_2 Mov10_oe_3 symbol
   <chr>          <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl> <chr> 
 1 ENSG0000~   3925.       3794.      3961.      3952.      3941.      2727.       2730.     3178.   TSPAN6
 2 ENSG0000~     24.2        30.2       30.7       23.7       13.9       20.4        33.3      33.6  TNMD  
 3 ENSG0000~   1326.       1341.      1180.      1515.      1432.      1542.       1548.     1943.   DPM1  
 4 ENSG0000~    456.        422.       476.       597.       610.       527.        518.      541.   SCYL3 
 5 ENSG0000~   1250.       1212.      1134.      1389.      1300.       965.        999.     1029.   C1orf~
 6 ENSG0000~      0.897       1.04       0          1.28       0          0           0         0    FGR   
 7 ENSG0000~     19.7        18.7        9.34      10.2        2.14       7.34       17.5       6.11 CFH   
 8 ENSG0000~   2922.       2705.      2642.      2942.      2971.      2397.       2786.     2544.   FUCA2 
 9 ENSG0000~   2691.       2538.      2625.      3008.      3
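
One thing worth checking after this step: `merge()` performs an inner join by default, so any gene ID without a match in the annotation table is dropped from the result. The sketch below illustrates this with made-up data; `toy_counts`, `toy_annot`, and the IDs in them are hypothetical and are not the course objects. It also shows `dplyr::left_join()` as one possible alternative that keeps every gene, and `dplyr::anti_join()` as a way to list the genes that have no annotation.

```r
library(dplyr)

# Hypothetical normalized counts with a "gene" ID column
toy_counts <- data.frame(
  gene     = c("ENSG_A", "ENSG_B", "ENSG_C"),
  sample_1 = c(10, 0, 250),
  sample_2 = c(12, 1, 300),
  stringsAsFactors = FALSE
)

# Hypothetical annotation table that is missing ENSG_B
toy_annot <- data.frame(
  ensgene = c("ENSG_A", "ENSG_C"),
  symbol  = c("GENE_A", "GENE_C"),
  stringsAsFactors = FALSE
)

# Inner join (merge default): ENSG_B disappears
inner <- merge(toy_counts, toy_annot, by.x = "gene", by.y = "ensgene")

# Left join: every gene is kept, missing symbols become NA
left <- left_join(toy_counts, toy_annot, by = c("gene" = "ensgene"))

# Genes that have no annotation at all
lost <- anti_join(toy_counts, toy_annot, by = c("gene" = "ensgene"))

print(nrow(inner))
print(nrow(left))
print(lost$gene)
```

Whether dropping unannotated genes is acceptable depends on the analysis; comparing the row counts before and after the merge is a quick way to see how many genes were affected.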
