Reference-based metagenomic analysis with the biobakery workflows

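I have recently been exploring reference-based metagenomics. Building a pipeline yourself is fairly hard, especially wiring the steps together and managing compute resources. Standing on the shoulders of giants helps you avoid bad parameter choices and also improves reproducibility. I had looked at the nf-core pipeline before; this time I look at biobakery. The Docker part follows a post by Bao Zhiwei (鲍志炜) of 生信菜鸟团, published on the WeChat account BioLinkX 生物信息学小组.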

1. Installation

1) Docker: the hassle-free option

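# Reference-based metagenomic analysis
# Installing Docker is not covered here; follow the official guide
# Build the image, following Bao Zhiwei's post on BioLinkX 生物信息学小组
# Write a Dockerfile based on that post, then build it
docker build -t zjd/biobakery:0.15.1 .
# Quick test
docker run --rm -it \
-v /media/:/tmp_data zjd/biobakery:0.15.1 /bin/bash
exit # leave the container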
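The test above opens an interactive shell. For non-interactive runs, a small wrapper can execute one command in the same image with the same mount. This is a minimal sketch: the function name `bb_run` and the `DRY_RUN` switch are hypothetical, and it assumes your data lives under `/media/` on the host, as in the test command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a single command inside the image built above,
# mounting /media/ (host) at /tmp_data (container), as in the test run.
# Set DRY_RUN to any non-empty value to print the docker command instead of running it.
BB_IMAGE="${BB_IMAGE:-zjd/biobakery:0.15.1}"

bb_run() {
  # Translate the current host directory into its path inside the container;
  # outside /media/ fall back to the mount point itself.
  local ctr_dir=/tmp_data
  case "$PWD" in
    /media/*) ctr_dir="/tmp_data/${PWD#/media/}" ;;
  esac
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
  ${DRY_RUN:+echo} docker run --rm -v /media/:/tmp_data -w "$ctr_dir" \
    "$BB_IMAGE" "$@"
}
```

Usage: `cd /media/project/metagenomics_test && bb_run biobakery_workflows wmgx --help`. The `-it` flags are dropped because no interactive terminal is needed.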

2) Alternatively, conda works well too

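# Installing conda itself is also skipped
# Create the environment
conda create -n biobakery
# Activate it
conda activate biobakery
# Install the workflow via the BFSU mirror for speed. Versions 2 and 3 are available; I chose 3, but it is still somewhat slow
conda install -c https://mirrors.bfsu.edu.cn/anaconda/cloud/biobakery biobakery_workflows=3.0.0a7
# kneaddata turned out to be incompatible, so install it again manually
pip install kneaddata # this seems to reinstall its dependencies as well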
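Because kneaddata broke after the conda install, it is worth confirming that every executable the workflow calls is on `PATH` before starting a run that takes hours. A minimal sketch: `check_tools` is a hypothetical helper, and its default list (the bioBakery tools plus `bowtie2` and `diamond`, which KneadData, MetaPhlAn and HUMAnN call) is an assumption to adjust for your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each executable is on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
  # Default list (an assumption): bioBakery tools plus the aligners they rely on
  [ "$#" -gt 0 ] || set -- biobakery_workflows kneaddata metaphlan humann bowtie2 diamond
  local status=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `conda activate biobakery && check_tools`.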

2. Database preparation

# Download from the National Microbiology Data Center (NMDC): much faster, though occasionally unstable
# https://nmdc.cn/datadownload
# kneaddata (quote the wildcard so wget, not the shell, expands it)
wget -c "ftp://download.nmdc.cn/tools/kneaddata/human_genome/*"
# humann3
wget -c ftp://download.nmdc.cn/tools/humann3/full_chocophlan.v296_201901.tar.gz
wget -c ftp://download.nmdc.cn/tools/humann3/uniref90_annotated_v201901.tar.gz
wget -c ftp://download.nmdc.cn/tools/humann3/full_mapping_v201901.tar.gz
# metaphlan3
wget -c "ftp://download.nmdc.cn/tools/humann3/metaphlan_databases/*"
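The HUMAnN archives still have to be unpacked into the folders that `humann_config` points at in the next step. A minimal sketch, assuming the three tarballs sit in one download folder and each one unpacks its files directly without an extra top-level folder (check with `tar -tzf` first); the function name and the archive-to-folder mapping are my own, chosen to match the `chocophlan`, `uniref` and `utility_mapping` paths used below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack the HUMAnN 3 archives downloaded above.
#   $1 = folder holding the *.tar.gz files
#   $2 = database root, e.g. /tmp_data/db/humann3
unpack_humann_dbs() {
  local src="$1" root="$2" pair archive sub
  # archive:subfolder pairs; subfolder names match the humann_config paths
  for pair in \
    full_chocophlan.v296_201901.tar.gz:chocophlan \
    uniref90_annotated_v201901.tar.gz:uniref \
    full_mapping_v201901.tar.gz:utility_mapping
  do
    archive="${pair%%:*}"
    sub="${pair#*:}"
    mkdir -p "$root/$sub" || return 1
    echo "unpacking $archive -> $root/$sub"
    tar -xzf "$src/$archive" -C "$root/$sub" || return 1
  done
}
```

Usage, from the download folder: `unpack_humann_dbs . /tmp_data/db/humann3`.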

3. Running the workflow

# export STRAINPHLAN_DB_MARKERS=/tmp_data/biobakery_wmgx_demo/strainphlan_db_markers
# export STRAINPHLAN_DB_REFERENCE=/tmp_data/biobakery_wmgx_demo/strainphlan_db_reference
export KNEADDATA_DB_HUMAN_GENOME=/tmp_data/db/kneaddata/human_genome
humann_config --update database_folders utility_mapping /tmp_data/db/humann3/utility_mapping
humann_config --update database_folders protein /tmp_data/db/humann3/uniref
humann_config --update database_folders nucleotide /tmp_data/db/humann3/chocophlan
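Before launching, it can save hours to confirm that the folders registered above exist and are not empty; `humann_config --print` then shows the settings HUMAnN will actually use. A minimal sketch: `check_db_dirs` is a hypothetical helper, and "not empty" is only a rough proxy for a correctly unpacked database.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any database folder is missing or empty.
check_db_dirs() {
  local status=0 dir
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "ok: $dir"
    else
      echo "missing or empty: $dir"
      status=1
    fi
  done
  return "$status"
}
```

Usage, with the paths above: `check_db_dirs "$KNEADDATA_DB_HUMAN_GENOME" /tmp_data/db/humann3/{chocophlan,uniref,utility_mapping} && humann_config --print`.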

cd /tmp_data/project/metagenomics_test
# Run the workflow; append --dry-run to only print the commands without executing them
time biobakery_workflows wmgx --input ./ --output output_data \
--bypass-strain-profiling --local-jobs 2 --threads 8 \
--pair-identifier _1 --remove-intermediate-output
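`--pair-identifier _1` tells the workflow how read-1 files are named. My understanding (check `biobakery_workflows wmgx --help`) is that the mate is looked up under the same name with `_2`, and the sample names seen in the log below, such as `s5-2`, are presumably the part before `_1`. A quick pre-flight check, as a minimal sketch assuming gzipped FASTQ named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`; `check_pairs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: every <sample>_1.<ext> needs a <sample>_2.<ext> mate.
#   $1 = input folder (default .), $2 = extension (default fastq.gz)
check_pairs() {
  local dir="${1:-.}" ext="${2:-fastq.gz}" r1 base status=0
  for r1 in "$dir"/*_1."$ext"; do
    # An unmatched glob stays literal, which means there are no read-1 files at all
    [ -e "$r1" ] || { echo "no *_1.$ext files in $dir"; return 1; }
    base="${r1%"_1.$ext"}"
    if [ -e "${base}_2.$ext" ]; then
      echo "pair: $(basename "$base")"
    else
      echo "missing mate for: $r1"
      status=1
    fi
  done
  return "$status"
}
```

Usage, in the input folder: `check_pairs . fastq.gz`.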

Here is an example run:

biobakery_workflows wmgx --input ./ --output output_data --bypass-strain-profiling --local-jobs 5 --threads 8 #--dry-run
(Jan 23 07:27:31) [ 0/28 -   0.00%] **Ready    ** Task  4: kneaddata____s5-3
(Jan 23 07:27:31) [ 0/28 -   0.00%] **Ready    ** Task  0: kneaddata____s5-2
(Jan 23 07:27:32) [ 0/28 -   0.00%] **Started  ** Task  4: kneaddata____s5-3
(Jan 23 07:27:32) [ 0/28 -   0.00%] **Started  ** Task  0: kneaddata____s5-2

(Jan 23 08:33:31) [ 1/28 -   3.57%] **Completed** Task  0: kneaddata____s5-2
(Jan 23 08:33:31) [ 1/28 -   3.57%] **Ready    ** Task  8: metaphlan____s5-2
(Jan 23 08:33:31) [ 1/28 -   3.57%] **Started  ** Task  8: metaphlan____s5-2
(Jan 23 08:35:42) [ 2/28 -   7.14%] **Completed** Task  4: kneaddata____s5-3
(Jan 23 08:35:42) [ 2/28 -   7.14%] **Ready    ** Task 10: metaphlan____s5-3
(Jan 23 08:35:42) [ 2/28 -   7.14%] **Ready    ** Task  7: kneaddata_read_count_table
(Jan 23 08:35:42) [ 2/28 -   7.14%] **Started  ** Task  7: kneaddata_read_count_table
(Jan 23 08:35:42) [ 2/28 -   7.14%] **Started  ** Task 10: metaphlan____s5-3
(Jan 23 08:35:42) [ 3/28 -  10.71%] **Completed** Task  7: kneaddata_read_count_table
(Jan 23 08:49:19) [ 4/28 -  14.29%] **Completed** Task  8: metaphlan____s5-2
(Jan 23 08:49:19) [ 4/28 -  14.29%] **Ready    ** Task 13: humann____s5-2
(Jan 23 08:49:19) [ 4/28 -  14.29%] **Started  ** Task 13: humann____s5-2
(Jan 23 08:52:07) [ 5/28 -  17.86%] **Completed** Task 10: metaphlan____s5-3
(Jan 23 08:52:07) [ 5/28 -  17.86%] **Ready    ** Task 15: humann____s5-3
(Jan 23 08:52:07) [ 5/28 -  17.86%] **Ready    ** Task 11: metaphlan_join_taxonomic_profiles
(Jan 23 08:52:07) [ 5/28 -  17.86%] **Started  ** Task 15: humann____s5-3
(Jan 23 08:52:07) [ 5/28 -  17.86%] **Started  ** Task 11: metaphlan_join_taxonomic_profiles
(Jan 23 08:52:07) [ 6/28 -  21.43%] **Completed** Task 11: metaphlan_join_taxonomic_profiles
(Jan 23 08:52:07) [ 6/28 -  21.43%] **Ready    ** Task 12: metaphlan_count_species
(Jan 23 08:52:07) [ 6/28 -  21.43%] **Started  ** Task 12: metaphlan_count_species
(Jan 23 08:52:07) [ 7/28 -  25.00%] **Completed** Task 12: metaphlan_count_species

(Jan 23 14:19:38) [ 8/28 -  28.57%] **Completed** Task 15: humann____s5-3
(Jan 23 14:19:38) [ 8/28 -  28.57%] **Ready    ** Task 18: humann_regroup_UniRef2EC____s5-3
(Jan 23 14:19:38) [ 8/28 -  28.57%] **Ready    ** Task 23: humann_renorm_genes_relab____s5-3
(Jan 23 14:19:38) [ 8/28 -  28.57%] **Ready    ** Task 27: humann_renorm_pathways_relab____s5-3
(Jan 23 14:19:38) [ 8/28 -  28.57%] **Started  ** Task 18: humann_regroup_UniRef2EC____s5-3
(Jan 23 14:19:39) [ 8/28 -  28.57%] **Started  ** Task 23: humann_renorm_genes_relab____s5-3
(Jan 23 14:19:39) [ 8/28 -  28.57%] **Started  ** Task 27: humann_renorm_pathways_relab____s5-3
(Jan 23 14:19:39) [ 9/28 -  32.14%] **Completed** Task 27: humann_renorm_pathways_relab____s5-3
(Jan 23 14:19:43) [10/28 -  35.71%] **Completed** Task 23: humann_renorm_genes_relab____s5-3
(Jan 23 14:19:47) [11/28 -  39.29%] **Completed** Task 18: humann_regroup_UniRef2EC____s5-3
(Jan 23 14:19:47) [11/28 -  39.29%] **Ready    ** Task 25: humann_renorm_ecs_relab____s5-3
(Jan 23 14:19:47) [11/28 -  39.29%] **Started  ** Task 25: humann_renorm_ecs_relab____s5-3
(Jan 23 14:19:48) [12/28 -  42.86%] **Completed** Task 25: humann_renorm_ecs_relab____s5-3
(Jan 23 14:24:40) [13/28 -  46.43%] **Completed** Task 13: humann____s5-2
(Jan 23 14:24:40) [13/28 -  46.43%] **Ready    ** Task 16: humann_count_alignments_species
(Jan 23 14:24:40) [13/28 -  46.43%] **Ready    ** Task 17: humann_regroup_UniRef2EC____s5-2
(Jan 23 14:24:40) [13/28 -  46.43%] **Started  ** Task 16: humann_count_alignments_species
(Jan 23 14:24:40) [13/28 -  46.43%] **Ready    ** Task 19: humann_join_tables_genefamilies
(Jan 23 14:24:40) [13/28 -  46.43%] **Started  ** Task 17: humann_regroup_UniRef2EC____s5-2
(Jan 23 14:24:40) [13/28 -  46.43%] **Ready    ** Task 21: humann_join_tables_pathabundance
(Jan 23 14:24:40) [13/28 -  46.43%] **Started  ** Task 19: humann_join_tables_genefamilies
(Jan 23 14:24:40) [13/28 -  46.43%] **Ready    ** Task 22: humann_renorm_genes_relab____s5-2
(Jan 23 14:24:40) [13/28 -  46.43%] **Started  ** Task 21: humann_join_tables_pathabundance
(Jan 23 14:24:40) [13/28 -  46.43%] **Ready    ** Task 26: humann_renorm_pathways_relab____s5-2
(Jan 23 14:24:40) [13/28 -  46.43%] **Started  ** Task 22: humann_renorm_genes_relab____s5-2
(Jan 23 14:24:40) [14/28 -  50.00%] **Completed** Task 16: humann_count_alignments_species
(Jan 23 14:24:40) [14/28 -  50.00%] **Started  ** Task 26: humann_renorm_pathways_relab____s5-2
(Jan 23 14:24:41) [15/28 -  53.57%] **Completed** Task 21: humann_join_tables_pathabundance
(Jan 23 14:24:41) [16/28 -  57.14%] **Completed** Task 26: humann_renorm_pathways_relab____s5-2
(Jan 23 14:24:41) [16/28 -  57.14%] **Ready    ** Task 30: humann_join_tables_pathways_relab
(Jan 23 14:24:41) [16/28 -  57.14%] **Started  ** Task 30: humann_join_tables_pathways_relab
(Jan 23 14:24:41) [17/28 -  60.71%] **Completed** Task 30: humann_join_tables_pathways_relab
(Jan 23 14:24:41) [17/28 -  60.71%] **Ready    ** Task 33: humann_count_features_pathways
(Jan 23 14:24:41) [17/28 -  60.71%] **Started  ** Task 33: humann_count_features_pathways
(Jan 23 14:24:41) [18/28 -  64.29%] **Completed** Task 33: humann_count_features_pathways
(Jan 23 14:24:45) [19/28 -  67.86%] **Completed** Task 22: humann_renorm_genes_relab____s5-2
(Jan 23 14:24:45) [19/28 -  67.86%] **Ready    ** Task 28: humann_join_tables_genes_relab
(Jan 23 14:24:45) [19/28 -  67.86%] **Started  ** Task 28: humann_join_tables_genes_relab
(Jan 23 14:24:45) [20/28 -  71.43%] **Completed** Task 19: humann_join_tables_genefamilies
(Jan 23 14:24:49) [21/28 -  75.00%] **Completed** Task 17: humann_regroup_UniRef2EC____s5-2
(Jan 23 14:24:49) [21/28 -  75.00%] **Ready    ** Task 20: humann_join_tables_ecs
(Jan 23 14:24:49) [21/28 -  75.00%] **Ready    ** Task 24: humann_renorm_ecs_relab____s5-2
(Jan 23 14:24:49) [21/28 -  75.00%] **Started  ** Task 20: humann_join_tables_ecs
(Jan 23 14:24:49) [21/28 -  75.00%] **Started  ** Task 24: humann_renorm_ecs_relab____s5-2
(Jan 23 14:24:49) [22/28 -  78.57%] **Completed** Task 20: humann_join_tables_ecs
(Jan 23 14:24:49) [23/28 -  82.14%] **Completed** Task 24: humann_renorm_ecs_relab____s5-2
(Jan 23 14:24:49) [23/28 -  82.14%] **Ready    ** Task 29: humann_join_tables_ecs_relab
(Jan 23 14:24:49) [23/28 -  82.14%] **Started  ** Task 29: humann_join_tables_ecs_relab
(Jan 23 14:24:50) [24/28 -  85.71%] **Completed** Task 29: humann_join_tables_ecs_relab
(Jan 23 14:24:50) [24/28 -  85.71%] **Ready    ** Task 32: humann_count_features_ecs
(Jan 23 14:24:50) [25/28 -  89.29%] **Completed** Task 28: humann_join_tables_genes_relab
(Jan 23 14:24:50) [25/28 -  89.29%] **Started  ** Task 32: humann_count_features_ecs
(Jan 23 14:24:50) [25/28 -  89.29%] **Ready    ** Task 31: humann_count_features_genes
(Jan 23 14:24:50) [25/28 -  89.29%] **Started  ** Task 31: humann_count_features_genes
(Jan 23 14:24:50) [26/28 -  92.86%] **Completed** Task 32: humann_count_features_ecs
(Jan 23 14:24:50) [27/28 -  96.43%] **Completed** Task 31: humann_count_features_genes
(Jan 23 14:24:50) [27/28 -  96.43%] **Ready    ** Task 34: humann_merge_feature_counts
(Jan 23 14:24:50) [27/28 -  96.43%] **Started  ** Task 34: humann_merge_feature_counts
(Jan 23 14:24:51) [28/28 - 100.00%] **Completed** Task 34: humann_merge_feature_counts
Run Finished
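To see where the time goes, the Started and Completed timestamps in this log can be turned into per-task wall-clock durations. A minimal sketch, assuming the console output was saved to a file (for example by appending `2>&1 | tee run.log` to the command; `run.log` is a hypothetical name) and that each task starts and finishes on the same day, since only the HH:MM:SS part is used; `task_durations` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read progress lines like the ones above on stdin and print
# "<task name><TAB>HH:MM:SS" for every task with both a Started and a Completed line.
# Assumes start and end fall on the same calendar day (the date is ignored).
task_durations() {
  awk '
    # "(Jan 23 07:27:32)" splits into $1="(Jan", $2="23", $3="07:27:32)"
    function secs(field,   t, a) {
      t = field
      gsub(/[^0-9:]/, "", t)          # drop the closing parenthesis
      split(t, a, ":")
      return a[1] * 3600 + a[2] * 60 + a[3]
    }
    /\*\*Started/ { start[$NF] = secs($3) }
    /\*\*Completed/ && ($NF in start) {
      d = secs($3) - start[$NF]
      printf "%s\t%02d:%02d:%02d\n", $NF, int(d / 3600), int(d % 3600 / 60), d % 60
    }
  '
}
```

Usage: `task_durations < run.log | sort -k2,2r` lists the slowest tasks first. Zero-padded durations sort correctly as text for runs under 100 hours.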

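With 8 threads this works out to roughly ten hours per sample. That is on the slow side, but given how large metagenomic data sets are, long processing times are unavoidable. In the run above, most of the time went to HUMAnN (about 5.5 hours per sample), against a little over an hour for KneadData and about 16 minutes for MetaPhlAn. The workflow can also run further analyses: a profiling table is only the starting point, and comparing groups and calculating diversity are what most studies actually need.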
