ggpubr: Publication Ready Plots (an R package for publication-quality plots)

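In genomics research, it is common to explore the expression profile of one gene, or a set of genes, associated with a pathway of interest. Here I introduce ggpubr, a practical and polished package that produces publication-quality plots and can directly apply the color palettes of specific journals, which makes exploratory data analysis (EDA) easier for life scientists.

[Figure: an example plot]

Doesn't a figure like this look professional? Want to make one yourself? I will walk through it step by step below. To be clear up front, all of these plots can be created with the very flexible ggplot2 R package. However, for beginners the syntax needed to customize a ggplot can seem opaque, which raises the barrier for researchers without advanced R programming skills. ggpubr is a wrapper around ggplot2 that provides easy-to-use functions for creating ggplot2-based publication-ready plots. We will use ggpubr functions to visualize gene expression profiles from the TCGA genomic data sets.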

Contents:

  • Prerequisites: ggpubr package, TCGA data
  • Gene expression data
  • Box plots
  • Violin plots
  • Stripcharts and dot plots
  • Density plots
  • Histogram plots
  • Empirical cumulative density function
  • Quantile - Quantile plot

Prerequisites

1. ggpubr package: it can be installed from CRAN with the following command.

install.packages("ggpubr")

Alternatively, install the latest development version from GitHub.

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

Then load the package.

library(ggpubr)

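2. TCGA data

The Cancer Genome Atlas (TCGA) is a publicly available resource containing clinical and genomic data for 33 cancer types. The data include gene expression, CNV profiles, SNP genotypes, DNA methylation, miRNA profiles, exome sequencing, and other data types. The RTCGA package, developed by Marcin et al., offers a convenient way to access the clinical and genomic data available in TCGA. For installation instructions, see the Bioconductor repository or my other post, "RTCGA: the ultimate tool for TCGA data mining". The R code below requires the core RTCGA package as well as the clinical and mRNA gene expression data packages.

To see which data types are available for each cancer type, use the following commands:

library(RTCGA)
infoTCGA()

[Figure: data types available for each cancer type]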

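3. Gene expression data

The expressionsTCGA() function in the RTCGA package makes it easy to extract the expression values of genes of interest in one or more cancer types. In the R code below, we extract the mRNA expression of five genes of interest (GATA3, PTEN, XBP1, ESR1, and MUC1) from three data sets: breast invasive carcinoma (BRCA), ovarian serous cystadenocarcinoma (OV), and lung squamous cell carcinoma (LUSC).

library(RTCGA)
library(RTCGA.mRNA)
expr

[Figure: extracted mRNA expression values]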
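
The extraction call shown above is cut off in this copy of the post. A minimal sketch of what it could look like, assuming the expressionsTCGA() function from RTCGA with its extract.cols argument and the BRCA.mRNA, OV.mRNA and LUSC.mRNA objects shipped with RTCGA.mRNA. The choice of arguments is an assumption based on the description above, not the author's confirmed code:

```r
library(RTCGA)
library(RTCGA.mRNA)

# Assumption: expressionsTCGA() stacks the listed data sets and keeps only
# the requested gene columns plus the sample identifier columns.
genes <- c("GATA3", "PTEN", "XBP1", "ESR1", "MUC1")
expr <- expressionsTCGA(BRCA.mRNA, OV.mRNA, LUSC.mRNA,
                        extract.cols = genes)

# Inspect the first rows
head(expr)
```

The result is expected to hold one row per sample, with the columns bcr_patient_barcode and dataset that the later steps refer to.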

To display the number of samples in each data set, type the following.

nb_samples

[Figure: number of samples per data set]
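
The counting command above is cut off in this copy. One common way to count samples per data set is the base R function table(). The sketch below uses a small made-up data frame (toy) in place of the real expr table:

```r
# Toy stand-in for the real expr table (values are made up)
toy <- data.frame(
  dataset = c("BRCA.mRNA", "BRCA.mRNA", "BRCA.mRNA", "OV.mRNA", "LUSC.mRNA"),
  GATA3   = c(2.9, 3.1, 2.4, 0.5, -0.2)
)

# Count rows (samples) per data set
nb_samples <- table(toy$dataset)
nb_samples
```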

We can simplify the data set names by removing the "mRNA" tag. This can be done with the base R function gsub().

expr$dataset 
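
The gsub() call above is cut off in this copy. A minimal sketch on a made-up vector, assuming the names carry a ".mRNA" suffix (for example "BRCA.mRNA"):

```r
# Example data set names (assumed format)
dataset <- c("BRCA.mRNA", "OV.mRNA", "LUSC.mRNA")

# fixed = TRUE matches ".mRNA" literally instead of as a regular expression
simplified <- gsub(".mRNA", "", dataset, fixed = TRUE)
simplified
```

Applied to the real table, the same idea would be written as an assignment back into expr$dataset.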

Let's also simplify the patient barcode column. The R code below changes the barcodes to BRCA1, BRCA2, ..., OV1, OV2, ..., and so on.

expr$bcr_patient_barcode

[Figure: the data set after simplifying the labels]
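
The relabeling command above is cut off in this copy. One possible way to build such labels, sketched on a made-up vector, is to number the samples within each data set with ave() and paste the index onto the data set name. This is an illustration, not necessarily the approach used in the original post:

```r
# Example data set column (made up)
dataset <- c("BRCA", "BRCA", "OV", "OV", "LUSC")

# Running index within each data set: restarts at 1 for every group
idx <- ave(seq_along(dataset), dataset, FUN = seq_along)

# Combine data set name and index into a short label
barcode <- paste0(dataset, idx)
barcode
```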

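The data set used in this demonstration has also been prepared online and is available for download. You need this data to practice the R code in this tutorial. If you run into problems installing the RTCGA packages, you can simply load the data as follows: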

expr 
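
The load command is cut off in this copy and the download location is not shown, so it is not reproduced here. As a sketch, assuming the table has been saved locally as a tab-delimited text file, it could be read with read.delim(). The file name expr_tcga.txt and the toy rows below are hypothetical:

```r
# Hypothetical local file; a temporary copy is written first so that the
# sketch runs on its own
path <- file.path(tempdir(), "expr_tcga.txt")
toy <- data.frame(
  bcr_patient_barcode = c("BRCA1", "OV1"),
  dataset = c("BRCA", "OV"),
  GATA3 = c(2.87, 0.41)
)
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-delimited file, keeping text columns as character
expr_local <- read.delim(path, stringsAsFactors = FALSE)
str(expr_local)
```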

Box plots

Create box plots of the gene expression profiles, colored by group (here, data set/cancer type):

library(ggpubr)
# GATA3
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco")
# PTEN
ggboxplot(expr, x = "dataset", y = "PTEN",
          title = "PTEN", ylab = "Expression",
          color = "dataset", palette = "jco")

The palette argument selects the color palette. I plan to cover palettes systematically in a later post. For now, it is enough to know that ggpubr can directly use the scientific journal palettes from the ggsci package, for example "npg", "aaas", "lancet", "jco", and "ucscgb". The code above uses the palette styled after the journal JCO, which looks clean and attractive.
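
As an illustration of the palette argument, the sketch below draws the same box plot twice: once with another ggsci journal palette and once with custom colors. The toy data frame and its values are made up, and the three hex colors are arbitrary choices:

```r
library(ggpubr)

# Made-up expression values, for illustration only
set.seed(1)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = c(rnorm(20, 3), rnorm(20, 0), rnorm(20, -1))
)

# A ggsci journal palette, referred to by its lower-case name
p_npg <- ggboxplot(toy, x = "dataset", y = "GATA3",
                   color = "dataset", palette = "npg")

# A custom palette: one color per group
p_custom <- ggboxplot(toy, x = "dataset", y = "GATA3",
                      color = "dataset",
                      palette = c("#00AFBB", "#E7B800", "#FC4E07"))
```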

Instead of repeating the same R code for each gene, you can create a list of plots in one call, as follows:

# Create a list of plots
p
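
The call that builds the list is cut off in this copy. A minimal sketch, assuming ggboxplot() is given a vector of gene names for y; a toy data frame with made-up values stands in for expr:

```r
library(ggpubr)

# Made-up expression values, for illustration only
set.seed(1)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3 = rnorm(60), PTEN = rnorm(60), XBP1 = rnorm(60)
)

genes <- c("GATA3", "PTEN", "XBP1")

# With several y variables and neither combine nor merge set,
# one plot per gene is returned in a list
p <- ggboxplot(toy, x = "dataset", y = genes,
               title = genes, ylab = "Expression",
               color = "dataset", palette = "jco")

# View a single plot by gene name
p$GATA3
```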

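Note that when the argument y contains multiple variables (here, multiple gene names), the arguments title, xlab, and ylab can also be character vectors of the same length as y.

To add p-values and significance levels to the box plots, you can do the following:

my_comparisons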
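
The code for this step is cut off in this copy. A sketch of one way to do it, assuming the stat_compare_means() layer from ggpubr with its comparisons argument. The pairs in my_comparisons and the toy data are illustrative choices:

```r
library(ggpubr)

# Made-up expression values, for illustration only
set.seed(1)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = c(rnorm(20, 3), rnorm(20, 0), rnorm(20, -1))
)

# Pairs of groups to compare (chosen here for illustration)
my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))

# Box plot with a bracket and p-value for each listed pair
p <- ggboxplot(toy, x = "dataset", y = "GATA3",
               title = "GATA3", ylab = "Expression",
               color = "dataset", palette = "jco") +
  stat_compare_means(comparisons = my_comparisons)
p
```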

For each gene, you can compare the different groups as follows:

compare_means(c(GATA3, PTEN, XBP1) ~ dataset, data = expr)

[Figure: pairwise comparisons of each gene between cancer types]

To select which items (here, cancer types) to display, or to remove specific items from the plot, use the arguments select and remove, as follows:

# Select BRCA and OV cancer types
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          select = c("BRCA", "OV"))
# or remove BRCA
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          remove = "BRCA")

To change the order of the data sets on the x axis, use the argument order. For example, order = c("LUSC", "OV", "BRCA"):

# Order data sets
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          order = c("LUSC", "OV", "BRCA"))

[Figure: data sets reordered on the x axis]

To create a horizontal plot, use the argument rotate = TRUE:

ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          rotate = TRUE)

[Figure: horizontal box plot]

To combine the three gene expression plots into a multi-panel plot, use the argument combine = TRUE:

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          ylab = "Expression",
          color = "dataset", palette = "jco")

[Figure: multi-panel plot]

You can also merge the three plots with the argument merge = TRUE or merge = "asis":

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = TRUE,
          ylab = "Expression",
          palette = "jco")

[Figure: merged box plots, with cancer type on the x axis]

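In the plot above, it is easy to compare visually the expression levels of the different genes within each cancer type. But you might want to put the genes (the y variables) on the x axis in order to compare expression levels across the different cell subpopulations (here, cancer types). In that case, the y variables (the genes) become the x tick labels and the x variable (the data set) becomes the grouping variable. To do this, use the argument merge = "flip":

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = "flip",
          ylab = "Expression",
          palette = "jco")

[Figure: genes on the x axis, with data set as the grouping variable]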

You may want to add jittered points to the box plot. Each point corresponds to an individual observation. To add jittered points, use the argument add = "jitter", as shown below. To customize the added element, specify the argument add.params:

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "jitter",                              # Add jittered points
          add.params = list(size = 0.1, jitter = 0.2)  # Point size and the amount of jittering
          )

[Figure: box plots with jittered points]

Note that with ggboxplot(), sensible values for the argument add are "jitter" and "dotplot". If you choose add = "dotplot" and the dot plot turns out dense, you can adjust dotsize and binwidth. You can add and adjust a dot plot as follows.

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "dotplot",                              # Add dotplot
          add.params = list(binwidth = 0.1, dotsize = 0.3)
          )

[Figure: box plots with a dot plot added]

You may want to label the samples with the top n highest or lowest values in the box plot. In that case, you can use the following arguments:

label: the name of the column containing the point labels.

label.select: can take two formats:

  • A character vector specifying some labels to show.
  • A list containing one or a combination of the following components:
      • top.up and top.down: display the labels of the top up/down points. For example, label.select = list(top.up = 10, top.down = 4).
      • criteria: to filter, for example, by the x and y variable values, use label.select = list(criteria = "`y` > 3.9 & `y` < 5 & `x` %in% c('BRCA', 'OV')").

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "jitter",                               # Add jittered points
          add.params = list(size = 0.1, jitter = 0.2),  # Point size and the amount of jittering
          label = "bcr_patient_barcode",                # column containing point labels
          label.select = list(top.up = 2, top.down = 2),# Select some labels to display
          font.label = list(size = 9, face = "italic"), # label font
          repel = TRUE                                  # Avoid label text overplotting
          )

[Figure: labels for the top samples]

Complex label selections can be specified as follows.

label.select.criteria <- list(criteria = "`y` > 3.9 & `x` %in% c('BRCA', 'OV')")
ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          label = "bcr_patient_barcode",              # column containing point labels
          label.select = label.select.criteria,       # Select some labels to display
          font.label = list(size = 9, face = "italic"), # label font
          repel = TRUE                                # Avoid label text overplotting
          )

[Figure: labels selected with custom criteria]

Violin plots

The R code below draws violin plots with box plots inside.

ggviolin(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "boxplot")

[Figure: violin plots with box plots inside]

Instead of adding a box plot inside the violin plot, you can add the median + interquartile range as follows.

ggviolin(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "median_iqr")

[Figure: violin plots with median + interquartile range]

With ggviolin(), sensible values for the argument add include "mean", "mean_se", "mean_sd", "mean_ci", "mean_range", "median", "median_iqr", "median_mad", and "median_range". You can also add "jitter" points and a "dotplot" inside the violin plot, as described earlier.
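
As a small illustration of one of these summary options, the sketch below adds the mean with a standard deviation range to a violin plot. The toy data frame and its values are made up:

```r
library(ggpubr)

# Made-up expression values, for illustration only
set.seed(1)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = c(rnorm(20, 3), rnorm(20, 0), rnorm(20, -1))
)

# Violin plot with mean +/- standard deviation drawn inside each violin
p <- ggviolin(toy, x = "dataset", y = "GATA3",
              color = "dataset", palette = "jco",
              ylab = "Expression",
              add = "mean_sd")
p
```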

Stripcharts and dot plots

To draw a stripchart, type the following.

ggstripchart(expr, x = "dataset",
             y = c("GATA3", "PTEN", "XBP1"),
             combine = TRUE,
             color = "dataset", palette = "jco",
             size = 0.1, jitter = 0.2,
             ylab = "Expression",
             add = "median_iqr",
             add.params = list(color = "gray"))

[Figure: stripchart]

For a dot plot, use the code below.

ggdotplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          fill = "white",
          binwidth = 0.1,
          ylab = "Expression",
          add = "median_iqr",
          add.params = list(size = 0.9))

[Figure: dot plot]

Density plots

To visualize the distribution as a density plot, use the function ggdensity(), as follows.

# Basic density plot
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE                       # Add marginal rug
)

[Figure: basic density plot]

# Change color and fill by dataset
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       color = "dataset",
       fill = "dataset",
       palette = "jco"
)

[Figure: density plots colored and filled by data set]

# Merge the 3 plots
# and use y = "..count.." instead of "..density.."
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       palette = "jco"                  # Change color palette
)

[Figure: the three density plots merged]

# color and fill by x variables
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.",     # color and fill by x variables
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       palette = "jco"                  # Change color palette
)

[Figure: merged density plot colored and filled by x variable]

# Facet by "dataset"
ggdensity(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.",
       facet.by = "dataset",            # Split by "dataset" into multi-panel
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       palette = "jco"                  # Change color palette
)

[Figure: density plots faceted by data set]

Histogram plots

To visualize the distribution as a histogram, use the function gghistogram(), as follows.

# Basic histogram plot
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE                       # Add marginal rug
)

[Figure: basic histogram]

# Change color and fill by dataset
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..density..",
       combine = TRUE,                  # Combine the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       color = "dataset",
       fill = "dataset",
       palette = "jco"
)

[Figure: histograms colored and filled by data set]

# Merge the 3 plots
# and use y = "..count.." instead of "..density.."
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       palette = "jco"                  # Change color palette
)

[Figure: merged histogram]

# color and fill by x variables
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.",     # color and fill by x variables
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       palette = "jco"                  # Change color palette
)

[Figure: merged histogram colored and filled by x variable]

# Facet by "dataset"
gghistogram(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       y = "..count..",
       color = ".x.", fill = ".x.",
       facet.by = "dataset",            # Split by "dataset" into multi-panel
       merge = TRUE,                    # Merge the 3 plots
       xlab = "Expression",
       add = "median",                  # Add median line.
       rug = TRUE,                      # Add marginal rug
       palette = "jco"                  # Change color palette
)

[Figure: histograms faceted by data set]

Empirical cumulative density function

# Basic ECDF plot
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE,
       xlab = "Expression", ylab = "F(expression)")

[Figure: basic ECDF]
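
For readers unfamiliar with the ECDF (more commonly called the empirical cumulative distribution function): its value at x is the fraction of observations less than or equal to x. A base R sketch with ecdf() on made-up values shows the quantity that such a plot displays on the y axis:

```r
# Four made-up observations
x <- c(1, 2, 3, 4)

# ecdf() returns a step function built from the data
Fn <- ecdf(x)

Fn(2)         # 0.5: two of the four values are <= 2
Fn(3.5)       # 0.75: three of the four values are <= 3.5
mean(x <= 2)  # the same fraction computed by hand
```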

# Change color by dataset
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE,
       xlab = "Expression", ylab = "F(expression)",
       color = "dataset", palette = "jco")

[Figure: ECDF colored by data set]

# Merge the 3 plots and color by x variables
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE,
       xlab = "Expression", ylab = "F(expression)",
       color = ".x.", palette = "jco")

[Figure: merged ECDF]

# Merge the 3 plots and color by x variables
# facet by "dataset" into multi-panel
ggecdf(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE,
       xlab = "Expression", ylab = "F(expression)",
       color = ".x.", palette = "jco",
       facet.by = "dataset")

[Figure: ECDF faceted by data set]

Quantile - Quantile plot

# Basic Q-Q plot
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE, size = 0.5)

[Figure: basic Q-Q plot]

# Change color by dataset
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       combine = TRUE, color = "dataset", palette = "jco",
       size = 0.5)

[Figure: Q-Q plot colored by data set]

# Merge the 3 plots and color by x variables
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE,
       color = ".x.", palette = "jco")

[Figure: merged Q-Q plot]

# Merge the 3 plots and color by x variables
# facet by "dataset" into multi-panel
ggqqplot(expr,
       x = c("GATA3", "PTEN",  "XBP1"),
       merge = TRUE, size = 0.5,
       color = ".x.", palette = "jco",
       facet.by = "dataset")

[Figure: Q-Q plot faceted by data set]

After seeing the demonstrations above, are you eager to try them yourself? Stay tuned for my upcoming posts!
