目录
前言
之前学习了如何 使用itol.toolkit添加进化树注释。本文主要使用r语言中的ape
包来实现icamp结果可视化
本文思路参考及主要代码来源于Nature Microbiology树形图的R语言可视化代码
数据来源(由于文章还未发表,需要请私信)
复现目标依然是Ning D, Wang Y, Fan Y, et al. Environmental stress mediates groundwater microbial community assembly中的Fig.4
一、ape包介绍
APE (Analysis of Phylogenetics and Evolution) 是1个用R语言编写的用于分子进化和系统发育分析的免费软件包(ape包原始文章)
ape包有两个优点,一是可将树文件直接输入,并提供相关的统计分析函数;二是绘图自由度高,可利用它绘出高质量的树形图
这里推荐一个很详细的ape包介绍微信文章系列:R包神器 | 系统发育和进化分析 - ape (一)
本文主要用到了ape包中的getMRCA()和plot.phylo()等函数
二、绘图步骤
1.导入库
library(tidyverse)
library(magrittr)
library(ape)
library(base)
library(vegan)
library(phytools)
library(scales)
后面三个包用于画图
2.读入数据
分别读入nwk树文件(可选),分类信息文件,注释信息文件
otu_tree <- read.tree("Galaxy9-[FastTree.nwk].nwk")#读取进化树
otutab <- read.csv("otu_taxon.csv", row.names = 1)#读otu分类信息
data <- read.csv("data.csv" )#读取注释信息
具体文件格式请查看数据来源,保证自己的数据与样例数据的格式完全一致,包括文件名,变量名,otu序列命名方式(这个后面会提到见)
虽然可以根据分类信息来生成树,仍然建议读入nwk格式的树文件
3. 使用vegan包中的taxa2dist函数建树(可选)
taxonomy_info <- otutab %>%
select(taxonomy) %>%
separate(taxonomy, into = letters[1:6] , sep=";")
tree <- as.phylo(hclust(vegan::taxa2dist(taxonomy_info, varstep = TRUE)))
根据otutable中的分类信息直接建树,具体实现细节可以看这篇文章。
对比其他软件建的树(左图)和用taxa2dist函数建的树(右图)可发现taxa2dist函数建树在距离上较为笼统,不具有发表质量。此外,这个函数还有错误分类的问题(见文章)。
因此,尽管使用taxa2dist函数可以实现不使用nwk树文件就绘制出进化树,仍推荐先使用其他软件建树,以获得高质量的树形图。
4.分组
classifier <- otutab$taxonomy
classifier <- as.character(classifier)
cat1 = sapply(strsplit(classifier, ";"), function(x) x[2])
cat2 = sapply(strsplit(classifier, ";"), function(x) x[3])
cat1 <- gsub(" D_1__","",cat1)
cat2 <- gsub(" D_2__","",cat2)
cat1 <- gsub("p__","",cat1)
cat2 <- gsub("c__","",cat2)
groups <- cat2#cat1按门分类,cat2按纲分类
names(groups) <- dimnames(otutab)[[1]]
OTU_taxa <- groups %>% enframe("OTU", "class")
核对一下分组信息:
classes <- OTU_taxa %>% pull(class) %>% unique()
> classes
[1] "Nitrososphaeria" "Bathyarchaeia" "Thermoplasmata"
[4] "Methanosarcinia" "AUnclassified" "Lokiarchaeia"
[7] "Methanomicrobia" "Halobacteria" "Nanoarchaeia"
[10] "Methanobacteria" "Campylobacteria" "Gammaproteobacteria"
[13] "Anaerolineae" "Desulfobulbia" "Alphaproteobacteria"
[16] "Nitrospiria" "Thermoanaerobaculia" "Acidimicrobiia"
[19] "Deltaproteobacteria_bacterium_CSP1_8" "Desulfobacteria" "Holophagae"
[22] "Clostridia" "Bacteroidia" "Desulfuromonadia"
[25] "Thermoleophilia" "BUnclassified" "Actinobacteria"
> OTU_taxa %>%
+ group_by(class) %>%
+ summarise(n())
# A tibble: 27 × 2
class `n()`
<chr> <int>
1 AUnclassified 2
2 Acidimicrobiia 3
3 Actinobacteria 1
4 Alphaproteobacteria 4
5 Anaerolineae 6
6 BUnclassified 1
7 Bacteroidia 4
8 Bathyarchaeia 20
9 Campylobacteria 2
10 Clostridia 1
设定颜色,这里把所有归类为others的类都设定成"#A3A3A3"
class_color <- tribble(
~class, ~color,
"Anaerolineae", "#0072B2",
"Thermoleophilia", "#A3A3A3",
"Bacteroidia", "#006FA0",
"Desulfobulbia", "#E69F00",
"Desulfobacteria", "#009E73",
"Thermoanaerobaculia","#8064A2",
"Acidimicrobiia","#BD79BC",
"Campylobacteria","#A3A3A3",
"Desulfuromonadia","#A3A3A3",
"Alphaproteobacteria","#FC7E15",
"Deltaproteobacteria_bacterium_CSP1_8","#A3A3A3",
"Gammaproteobacteria","#D55E00",
"BUnclassified","#918C76",
"Bathyarchaeia","#37B9FF",
"Nitrososphaeria", "#5EDEFC",
"Methanosarcinia", "#D8B900",
"Thermoplasmata", "#92D050",
"Lokiarchaeia", "#E85E00",
"Nanoarchaeia", "#943DC1",
"Halobacteria", "#005786",
"Methanobacteria", "#BD79BC",
"Methanomicrobia", "#CC79A7",
"AUnclassified", "#918C76",
"Clostridia","#A3A3A3",
"Nitrospiria","#A3A3A3",
"Holophagae","#A3A3A3",
"Actinobacteria","#A3A3A3",
) %>%
deframe()
5.给树枝上色
edge_color <- as.data.frame(otu_tree$edge)
edge_color$color <- '#000000'#默认为黑色
classes <- OTU_taxa %>% pull(class) %>% unique()
for (class_ in classes){
Ancestor <- getMRCA(otu_tree,OTU_taxa %>% filter(class_==class) %>% pull(OTU))
if(is.null(Ancestor)){
Children <- match(OTU_taxa %>% filter(class_==class) %>% pull(OTU),otu_tree$tip.label)
}else{
Children <- getDescendants(otu_tree,node=Ancestor)
}
edge_color[edge_color$V2 %in% Children, 'color'] <- class_color[class_]
}
树枝上色实现方案:用getMRCA()
根据同纲内所有的tip.label
返回最小共同父节点,然后用getDescendants()
返回父节点之下的所有子节点。
> tree[["tip.label"]]
[1] "Archaea_OTU_1" "Archaea_OTU_10" "Archaea_OTU_102" "Archaea_OTU_11" "Archaea_OTU_129" "Archaea_OTU_13"
[7] "Archaea_OTU_131" "Archaea_OTU_14" "Archaea_OTU_146" "Archaea_OTU_148" "Archaea_OTU_16" "Archaea_OTU_17"
[13] "Archaea_OTU_2" "Archaea_OTU_20" "Archaea_OTU_214" "Archaea_OTU_23" "Archaea_OTU_247" "Archaea_OTU_252"
请确保otu_table中的otu序列命名格式和
tip.label
格式是一样的,这一步需要将二者对应起来
> tree[["edge"]]
[,1] [,2]
[1,] 105 106
[2,] 106 125
[3,] 125 35
[4,] 125 146
[5,] 146 17
[6,] 146 147
[7,] 147 26
树文件中"edge"
类实际上存放的是树枝两端的两个节点,[,1]
是更靠内的节点(父节点),[,2]
是靠外的节点(子节点)。
根据这个特性,用getDescendants()
返回父节点之下的所有子节点后,便可对edge上色。
6. 绘制树形图以及注释
先统一颜色
# 保证data的otu顺序和树的tip.label顺序相同
data <- data[match(otu_tree$tip.label, data$OTU),]
# 获得分组颜色
OTU_taxa %<>% left_join(class_color %>% enframe('class', 'color'))
# 再处理一下data
data %<>% left_join(OTU_taxa)
画树
# 注意绘图顺序,避免图层覆盖
##plot.phylo画树
plot.phylo(otu_tree, type = "radial", show.tip.label=F, x.lim = c(-3,3), y.lim=c(-3,3), edge.color = edge_color$color ,cex = 1)
plot.phylo()函数参数
type = "radial"
才能画出这种按照外圆圈对齐的layout
multibar
## multi bar数据准备
val_mul <- data %>% as.data.frame() %>% `row.names<-`(.$OTU) %>% select(HeS,HoS,DL,HD,DR)
val_mul <- data.matrix(val_mul)
val_mul <- scales::rescale(val_mul, c(0,0.25)) #标准化到0~0.3
col1 <- c("#9180AC", "#D9BDD8","#8AB1D2","#F99391","#9DD0C7")
##画multibar
ring(val_mul, otu_tree, offset = 0.08, col = col1)
simplebar和背景颜色
## simple bar数据准备
val1 <- data %>% as.data.frame() %>% `row.names<-`(.$OTU) %>% select(DP)
val1 <- scales::rescale(val1$DP, c(0,0.3)) #标准化到0~0.3
val2 <- data %>% as.data.frame() %>% `row.names<-`(.$OTU) %>% select(SP)
val2 <- scales::rescale(val2$SP, c(0,0.3)) #标准化到0~0.3
col2 <- data$color
## 先增背景色
ring(0.32, otu_tree, offset = 0.38, col = "#ddf3ff")
## 再增柱形图
ring(val1, otu_tree, "ring", offset = 0.38, col = col2, bd_color = "#ddf3ff")
## 先增背景色
ring(0.32, otu_tree, offset = 0.72, col = "#fbf9d9")
## 再增柱形图
ring(val2, otu_tree, offset = 0.72, col = col2, bd_color = "#fbf9d9")
strip
ring(0.02, otu_tree, style='r', col = col2, offset = 1.15) #0.02也是根据树的半径来的
每个部分的大小可自行调整,需要先进行计算,计算方式点这里。
用ai加上donuts&bar plot并美化后最终结果如图
总结
通过ape包可以无需树文件利用分类学信息直接建树,这一点非常的方便。此外,如果需要对数据进行事先分析的话,ape包也提供了相应的函数。功能基本满足绘制进化树注释的需要了。
此外,十分感谢公众号ttfriends的文章Nature Microbiology树形图的R语言可视化代码以及周通大佬对于本人学习过程中的帮助!!!