数据分析服务请访问以下链接:
数据和代码获取:请查看主页个人信息
大家好!本期来介绍ggplot2包绘制火山图的方法,火山图通常用于生物学、医学、环境科学等领域,以展示基因表达、蛋白质含量、化学成分等在不同实验条件下的变化情况,适用于两组样本直接的比较。
本期教学使用的数据由Deseq2包差异分析得到:
载入R包和数据
rm(list=ls())
pacman::p_load(tidyverse, ggrepel)
DEG_limma_voom <- read.table("DEG_limma_voom.txt", header=T, sep="\t")
head(DEG_limma_voom)
根据logFC和Pvalue进行分组
# 分组信息
data <-
DEG_limma_voom %>%
mutate(change = as.factor(ifelse(P.Value < 0.05 & abs(logFC) > 1,
ifelse(logFC > 1 ,'Up','Down'),'No change'))) %>%
rownames_to_column('gene')
table(data)
这里我们常使用P.Value小于0.05,logFC值=1进行分组,可以看到上调基因有3032个,下调基因有3413个,差异不显著的基因有14051个。
绘图:按照logFC和pvalue的阈值进行关键基因标记
ggplot(data,aes(logFC, -log10(adj.P.Val)))+
geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "#999999")+
geom_vline(xintercept = c(-1.2,1.2), linetype = "dashed", color = "#999999")+
geom_point(aes(color = change),
size = 0.2,
alpha = 0.5) +
theme_bw(base_size = 12)+
ggsci::scale_color_jama() +
theme(panel.grid = element_blank(),
legend.position = 'right') +
# 添加标签:
geom_text_repel(data = filter(data, abs(logFC) > 1 & -log10(adj.P.Val) > 38),
max.overlaps = getOption("ggrepel.max.overlaps", default = 20),
aes(label = gene,
color = change),
size = 2) +
xlab("Log2FC")+
ylab("-Log10(FDR q-value)")
这里我们使用ggrepel包对abs(logFC) > 1 且 -log10(adj.P.Val) > 38的基因进行标记,散点的颜色随-log10(adj.P.Val)值渐变。
你也可以标记你想要的目标基因,接下来我们随机筛选几个作为目标基因。
set.seed(123)
core_gene <- data$gene[sample(300)[1:15]]
core_gene
这里我们将火山图的散点颜色进行个性化,图片风格模仿Nature communication的一篇文章
# 出图
ggplot(data,aes(logFC, -log10(adj.P.Val)))+
geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "black")+
geom_vline(xintercept = c(-1.2,1.2), linetype = "dashed", color = "black")+
geom_point(aes(size = -log10(adj.P.Val),
color = -log10(adj.P.Val)))+
scale_color_gradientn(values = seq(0,1,0.2),
colors = c("#39489f","#39bbec","#f9ed36","#f38466","#b81f25"))+
scale_size_continuous(range = c(0,1))+
theme_bw(base_size = 12)+
theme(panel.grid = element_blank(),
legend.position = 'right',
legend.justification = c(0,1))+
# 设置图例
guides(col =
guide_colorbar(title = "-Log10_q-value",
ticks.colour = NA,
reverse = T,
title.vjust = 0.8,
barheight = 8,
barwidth = 1),
size = "none") +
# 添加标签:
geom_text_repel(data = filter(data, gene %in% core_gene),
max.overlaps = getOption("ggrepel.max.overlaps", default = 20),
# 这里的filter很关键,筛选你想要标记的基因
aes(label = gene),
size = 2,
color = 'black') +
xlab("Log2FC")+
ylab("-Log10(FDR q-value)")
关键词“火山图”获取数据和代码