graphlan/iTOL画进化树记录

最新推荐文章于 2024-05-06 11:36:40 发布

土豆西红柿青椒

最新推荐文章于 2024-05-06 11:36:40 发布

阅读量3.7k

点赞数 2

分类专栏：生物信息文章标签：生物信息学

本文链接：https://blog.csdn.net/weixin_43151909/article/details/114275656

版权

生物信息专栏收录该内容

24 篇文章

订阅专栏

最近遇到一个问题，需要将进化树展示出来，并对特定物种进行标记。

我的数据是来自于人类的gut microbiome数据，分析流程是metaphlan3的分析流程，基于此输出，以及几天的调研，我下面总结了自己如何一步步画出进化树的以供参考。

metaphlan3 输入文件

我有一个来自于metaphlan3的输出文件，这里使用来自metaphlan3官网的例子文件

wget https://github.com/biobakery/biobakery/raw/master/demos/biobakery_demos/data/metaphlan3/output/merged_abundance_table.txt

进化树文件（metaphlan3提供的）

进入这个网址 https://github.com/biobakery/MetaPhlAn/tree/master/metaphlan/utils

mpa_v30_CHOCOPhlAn_201901_species_tree.nwk 进化树文件, 把这个仓库下载到本地

git clone https://github.com/biobakery/MetaPhlAn.git

或者直接download zip，在本地解压即可

可以画树的工具

iTOL ; ggtree; grahplan，目前我只尝试了iTOL和graphlan两种，itol功能多且全，比graphlan画出来的进化树漂亮很多。最下面的ref有很多有用的帖子

graphlan

https://huttenhower.sph.harvard.edu/graphlan

这里面有非常详细的画图过程

https://github.com/biobakery/biobakery/wiki/metaphlan3#create-a-cladogram-with-graphlan

进入到预先安装好的conda 环境

conda activate graphlan

生成 GraPhlAn 需要的 input files

merged_abundance_table.txt 例子：

# 从第二行开始直到文件末尾（不要#mpa_v30... 的注释）
# 截掉第二列 tax_id 

tail -n +2 merged_abundance_table.txt | cut -f1,3- > merged_abundance_table_reformatted.txt

# 生成tree文件，graphlan需要2个文件，第一个是树文件，第二个是注释文件；
下面这个export2graphlan.py就是生产这两个文件的脚本。

GraPhlAn requires two inputs: (i) a tree structure to represent and (ii) graphical annotation options for the tree.
Run the following command to generate the two input files for GraPhlAn (the tree and annotation files) providing the abundance table (merged_abundance_table.txt) created in prior tutorial steps: :

export2graphlan.py --skip_rows 1 -i merged_abundance_table_reformatted.txt --tree merged_abundance.tree.txt --annotation merged_abundance.annot.txt --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1

参数介绍：

skip_rows : 跳过第一行（header）
most_abundant 100 选取abudance最大的100个highlight
abudance_threshlod abundance 阈值
least_biomarkers 最少的biomarker（这个不太清楚-应该是和lefse有关的）
annotations 注释taxonomic levels5， 6
external_annotation 外部legend的设置的是taxonomic level7

生成xml树文件并且生成图片

external legends for level7

iTOL 画图

iTOL 本身是一个在线画图工具，随着使用人数的增加，目前保存等功能都需要付费了。

iTOL 中对进化树的注释，需要依赖注释文件，生成注释文件是非常耗时麻烦的，基于此，推荐一个好用的r package ： table2itol https://github.com/mgoeker/table2itol

这个工具也在iTOL官网被推荐了。https://itol.embl.de/help.cgi#external

part2 会使用itol 工具画出满足下面要求的species tree of life 图

1. 保留tips ( 16 species )

2. 画柱状图(table2itol output)

3. 标记部分物种(table2itol output)

4. 热图(table2itol output)

5. 上传到itol网站进化树

6. 拖拽注释

输入文件还是上文用到的merge_table.txt

首先下载这个代码仓库：

git clone https://github.com/mgoeker/table2itol.git 
chmod +x table2itol/table2itol.R #对此脚本增加可执行权限
./table2itol/table2itol.R #查看帮助信息

我本次选择的树是从metaphlan3的github下载的进化树nwk文件

https://github.com/biobakery/MetaPhlAn/tree/master/metaphlan/utils

mpa_v30_CHOCOPhlAn_201901_species_tree.nwk

只保留在merge_table_species 里面有的16个物种（species level）：

#!/usr/bin/R 
# selected tips : this is Rscript 

library(ape)
library(tidyverse)
library(stringi)

# read the total tree: 读入总树
tree <- read.tree("metaphlan/utils/mpa_v30_CHOCOPhlAn_201901_species_tree.nwk")


tree$tip.label <- str_remove_all(tree$tip.label, "GCA_[0-9]+\\|")

# read the merge table to get the selected species 只保留merge table里有的物种
merge_table <- read_delim("merged_abundance_table.txt", delim = "\t", skip = 1)


selected_species <- merge_table$clade_name

selected_species <- intersect(selected_species, tree$tip.label) 

length(selected_species) #16 species

species <- merge_table %>%
  filter(., str_detect(clade_name, "s__"))

nrow(species)

length(selected_species)

tree <- tree %>% keep.tip(selected_species)
write.tree(tree, "tree.selected.nwk")

生成树文件tree.selected.nwk

下面就是生成各种注释文件：

注释文件1: 标记不同的phylum，label标记成species

#!/usr/bin/R 

# 承接上面的R脚本

# annotation1 : 制作table2itol的输入文件：phylum和species label column


annotation1 <- tibble(clade_name = selected_species,
                      Phylum = str_split(selected_species, "\\|", simplify = T)[,2],
                      Species = str_split(selected_species, "\\|", simplify = T)[,7])
              
        
write_delim(annotation1, "annotation1.txt", delim = "\t")        

# annotation2: targeted species 用符号标记感兴趣的species

annotation2 <- tibble(clade_name = selected_species, 
                      targeted_species = c(rep("TRUE",4), rep("FALSE",1),
                    rep("TRUE",1),rep("FALSE",3),rep("TRUE",2),rep("FALSE",5)),
                    interested_species = c(rep("FALSE",12), rep("TRUE",2),
                                           rep("FALSE",2)))


write_delim(annotation2, "annotation2.txt", delim = "\t")   

# annotation3 : calculate mean abundance of each sample 生成柱状图注释

annotation3 <- merge_table %>%
  filter(., str_detect(clade_name, "s__")) %>%
  mutate(rowsum = rowSums(.[,3:8])) %>% # total 560 samples 
  add_column(Mean_abundance = .$rowsum/6) %>%
  select(clade_name, Mean_abundance)

write_delim(annotation3, "annotation3.txt", delim = "\t") 


# annotation4 : make heatmap 生成热图注释

annotation4 <- merge_table[,c(1,3:8)]
write_delim(annotation4, "annotation4.txt", delim = "\t")

生成这四个注释文件之后，run table2itol脚本生成itol可以识别的注释文件

# label the species and phylum 
Rscript table2itol/table2itol.R -D test1/ -i clade_name -l Species -b Phylum -t %s -w 0.5 annotation1.txt 

# label the targeted species 
Rscript table2itol/table2itol.R -a -D test2/ -i clade_name -t %s -w 0.5 annotation2.txt -l clade_name

# i wanna make the above NO-targeted or NO-interested species : No shape 
# after reading the description in iTOL website 

# extract header 
head -n15 test2/iTOL_binary-targeted_species.txt > test2/iTOL_binary-targeted_species-change-shape.txt

# change the FALSE/0 -> -1 (-1 means no shape) 
tail -n +16 test2/iTOL_binary-targeted_species.txt | sed 's/\t0/\t-1/g' >> test2/iTOL_binary-targeted_species-change-shape.txt  

 
# label the abundance barplot 

Rscript table2itol/table2itol.R -c double -D test3/ -i clade_name -t %s annotation3.txt 

# heatmap 

Rscript table2itol/table2itol.R -a -d -c none -D test4/ -i clade_name -l clade_name -t %s -w 0.5 annotation4.txt

附上一个itol官网给的注释类型：

https://itol.embl.de/help.cgi#datasets （上面手动用sed 更改的就是参考这里binary 类型改的。）

正式画图：

s1》打开itol官网，注册一个账号，登陆账号，上传刚刚的tree.selected.nwk （16 species），打开这个tree

display mode 选择circular

s2》拖拽annotation文件到itol我们打开的tree

顺序依次为 : test1 里的注释文件：iTOL_labels-Species.txt; iTOL_treecolors-Phylum.txt

test2 : iTOL_binary-interested_species.txt; iTOL_binary-targeted_species-change-shape.txt

test4: *profile.txt

test3: iTOL_gradient-Mean_abundance.txt

拖拽完毕，就可以export了。由于我不是付费会员，只能一次性调好，然后点击export

用inkscape打开这个svg，稍微调整legend的位置，就可以生成下面的图：（是不是很简单:-) )

微调的系数，后面再更新

小tips：

如何把phyloxml格式的进化树转换成nwk格式的tree:

library(rphyloxml)
library(ape)
fn <- system.file("merged_abundance.xml", package="rphyloxml")
xmltree <- read_phyloxml(fn)
test <- read_phyloxml("merged_abundance_noannotation.xml")
apetree <- phyloxml2phylo(test)[[1]] # class(apetree)
write.tree(apetree,"merged_abundance.nwk")

如有问题，请多多指教~

Ref: 感谢下面这些帖子！

https://github.com/biobakery/biobakery/wiki/metaphlan3

https://mp.weixin.qq.com/s/pR7uKYxlXad9fBp7o-vkMA

https://blog.csdn.net/woodcorpse/article/details/104547961/

https://huttenhower.sph.harvard.edu/graphlan

https://forum.biobakery.org/t/metaphlaan3-genome-list/970/6