MetaPhlAn 3 and StrainPhlAn 3 run log

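This post records the problems one user ran into while analyzing fastq files with MetaPhlAn 3, including an error after pinning the database version and trouble installing a newer release. The fixes involve updating the software to 3.0.8 or later to get past the bug and using mamba to install the environment. During the run, StrainPhlAn also reported that at least 4 samples are required, which was resolved by adding more samples.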


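My conda environment has MetaPhlAn 3.0.7, and all my earlier data were processed with the v30 database. MetaPhlAn has since moved on to v31, so running metaphlan without specifying a database automatically downloads the v31 database.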

Problem 1: an error after specifying the database version

command:

https://forum.biobakery.org/t/install-metaphlan-3/369

(humann) user@d079e601f094:/analysis$ metaphlan P10E0.fastq.gz \
-x mpa_v30_CHOCOPhlAn_201901 \
--input_type fastq -s P10E0.new.sam.bz2 \
--bowtie2out P10E0.new.bowtie2.bz2 \
-o P10E0_new_profiled.tsv

Error: BowTie2 output file detected: /sbidata/projects/lzhang/202012_small_molecule/Analysis/metaphlan_human3/Result/StrainPhlAn/test_project/bowtie2/P10E0.new.bowtie2.bz2

Please provide the size of the metagenome using the --nreads parameter when running MetaPhlAn using SAM files as input
Exiting...

WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
 

My input is a fastq file, so why does it complain about bowtie2 output? After some searching, it turned out to be a bug in the software itself.

https://forum.biobakery.org/t/metaphlan3-bowtie2db-output-files-need-the-size-of-the-metagenome-using-the-nreads-parameter/2006/11

The fix, from the forum thread: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0

fbeghini (Segata Lab member), May '21:

I merged the PR, it should not take long to be available now

To summarize the fix: since there is an error in 3.0.7, I will install 3.0.8 in my own conda environment, i.e. set up a fresh environment with 3.0.8.
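Before re-running anything, it can help to confirm which MetaPhlAn version the active environment actually provides. The sketch below assumes GNU `sort -V` is available and that `metaphlan --version` prints a banner containing an X.Y.Z version number. The helper names `version_at_least` and `extract_version` are made up for illustration, and the 3.0.8 threshold is simply the version named above.

```shell
# Succeed (exit 0) if version $1 is at least version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: pull the first X.Y.Z token out of a version banner.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

required="3.0.8"   # version named in this post; adjust as needed

# Only query metaphlan if it is actually on PATH.
if command -v metaphlan >/dev/null 2>&1; then
    installed=$(extract_version "$(metaphlan --version 2>&1)")
    if version_at_least "$installed" "$required"; then
        echo "MetaPhlAn ${installed}: at least ${required}"
    else
        echo "MetaPhlAn ${installed}: older than ${required}, consider upgrading"
    fi
fi
```

Version sort is used instead of plain string comparison so that a release such as 3.0.10 is ordered after 3.0.9.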

Problem 2: installing the older MetaPhlAn 3 environment (the current release is already version 4)

Installing with conda was too slow on my machine (Ubuntu), so I used mamba instead.

conda create --name mpa_strainphlan3

conda activate mpa_strainphlan3

conda install -c conda-forge mamba

mamba install -c bioconda metaphlan=3.0.8 python=3.10.8

 Updating specs:

   - metaphlan=3.0.8
 

Re-running the command above still gives the error:

Please provide the size of the metagenome using the --nreads parameter when running MetaPhlAn using SAM files as input
Exiting...

WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
 

Trying to install 3.0.9:

mamba install -c bioconda metaphlan=3.0.9
Traceback (most recent call last):
  File "/home/l/miniconda3/envs/mpa_strainphlan3/bin/mamba", line 7, in <module>
    from mamba.mamba import main
  File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/mamba/mamba.py", line 49, in <module>
    import libmambapy as api
  File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/__init__.py", line 7, in <module>
    raise e
  File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/__init__.py", line 4, in <module>
    from libmambapy.bindings import *  # noqa: F401,F403
ImportError: /home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/../../../libmamba.so.2: undefined symbol: archive_write_add_filter_zstd
 

The fix for this error comes from: Undefined symbol: archive_write_add_filter_zstd · Issue #1775 · mamba-org/mamba · GitHub

conda install libarchive==3.5.2 -c conda-forge

Upgrade to MetaPhlAn 3.0.9:

mamba install -c bioconda metaphlan=3.0.9

Step 1: once the environment is installed, run MetaPhlAn 3 first

metaphlan P10E0.new.bowtie2.bz2 \
--bowtie2db HUMAnN3_db/stable_201901b/db_v30/ \
--input_type bowtie2out \
-s P10E0.new.sam.bz2 --bowtie2out P10E0.new.bowtie2.bz2 \
-o P10E0_new_profiled.tsv
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.

The warning is harmless; it only tells you that some of the species found can also go by alternative names.

These are from MetaPhlAn, they just inform you that some species found can have “alternative” taxonomies (the list of species in the additional_species column). All the species listed under additional_species are not represented by any markers but they were found to be <5% ANI distant from the “reference” species (clade_name).

Unexpected output (format) - #2 by fbeghini - MetaPhlAn - The bioBakery help forum
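To see which clades actually carry merged species, the profile can be filtered on its extra column. This is a minimal sketch that assumes the profile is tab-separated with `additional_species` as the fourth column and header lines starting with `#`; the function name `list_merged_species` is made up for illustration.

```shell
# Print clade_name and additional_species for every row whose fourth
# column is present and non-empty. Header lines (starting with #) are skipped.
# Assumption: columns are clade_name, NCBI_tax_id, relative_abundance,
# additional_species, separated by tabs.
list_merged_species() {
    awk -F '\t' '!/^#/ && NF >= 4 && $4 != "" { print $1 "\t" $4 }' "$1"
}

# Run it on the profile from the command above, if that file exists.
if [ -f P10E0_new_profiled.tsv ]; then
    list_merged_species P10E0_new_profiled.tsv
fi
```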

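Instead of pointing --bowtie2db at a local copy, you can also pass --index mpa_v30_CHOCOPhlAn_201901 and the software will download the database automatically. Alternatively, download it yourself from Dropbox, Google Drive, or Zenodo and unpack it; MetaPhlAn builds the index on its own.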

The database download locations are listed at: MetaPhlAn 3.0 · biobakery/MetaPhlAn Wiki · GitHub

For example, from Zenodo:

curl -o mpa_v30_CHOCOPhlAn_201901.tar "https://zenodo.org/record/3957592/files/mpa_v30_CHOCOPhlAn_201901.tar?download=1"
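Once the tarball is downloaded, unpacking it into a dedicated directory is all that is left to do by hand. A minimal sketch follows; the function name `unpack_db` and the directory name `metaphlan_db_v30` are placeholders of mine, not anything MetaPhlAn requires.

```shell
# Unpack a database tarball ($1) into a target directory ($2),
# creating the directory if it does not exist yet.
unpack_db() {
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}

db_dir="metaphlan_db_v30"   # hypothetical directory name

# Only unpack if the tarball from the curl command is present.
if [ -f mpa_v30_CHOCOPhlAn_201901.tar ]; then
    unpack_db mpa_v30_CHOCOPhlAn_201901.tar "$db_dir"
fi

# Afterwards the directory can be passed to MetaPhlAn, for example:
#   metaphlan P10E0.fastq.gz --input_type fastq \
#       --bowtie2db "$db_dir" --index mpa_v30_CHOCOPhlAn_201901 \
#       -o P10E0_new_profiled.tsv
```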

Step 2: StrainPhlAn 3

sample2markers.py -i sams/P10E0.new.sam.bz2 \
	-o consensus_markers/P10E0.new.pkl -n 8

Extract the E. coli marker sequences:

extract_markers.py -d /opt/conda/envs/humann/lib/python3.9/site-packages/metaphlan/metaphlan_databases/mpa_v31_CHOCOPhlAn_201901.pkl -c s__Escherichia_coli -o clade_markers


strainphlan -d shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -s consensus_markers/P10E0.new.pkl -m clade_markers/s__Escherichia_coli.fna -o output -n 8 -c s__Escherichia_coli --mutation_rates


[e] The main inputs samples + references are less than 4
Wed Jan 11 15:09:48 2023: Stop StrainPhlAn 3.0 execution.

This error means StrainPhlAn needs at least 4 inputs (samples plus references) to run, so I added a few more samples and ran it again.
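The same condition can be checked up front by counting the consensus-marker files before calling strainphlan. This is only a sketch: the helper names are made up, it assumes one `.pkl` file per sample in the given directory, and the threshold of 4 is taken from the error message above.

```shell
# Count the consensus-marker .pkl files directly inside directory $1.
count_marker_files() {
    find "$1" -maxdepth 1 -type f -name '*.pkl' 2>/dev/null | wc -l | tr -d ' '
}

# Succeed if samples (.pkl files in $1) plus references ($2, default 0)
# reach the minimum of 4 named in the StrainPhlAn error message.
enough_inputs() {
    n_samples=$(count_marker_files "$1")
    n_refs=${2:-0}
    [ $((n_samples + n_refs)) -ge 4 ]
}

# Check the directory used in this post, if it exists.
if [ -d consensus_markers ]; then
    if enough_inputs consensus_markers 0; then
        echo "enough inputs for strainphlan"
    else
        echo "fewer than 4 inputs: add more samples or references" >&2
    fi
fi
```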

The script for 5 samples is as follows:

# first: cat the fq files together into the merge_fq/test dir
# second: run metaphlan to get the bowtie2 mapping files
cd test_project/
mkdir -p sams/
mkdir -p bowtie2/
mkdir -p profiles/


for f in merge_fq/test/*gz
do
    echo "Running MetaPhlAn on ${f}"
    # strip the directory and the .fastq.gz suffix to get the sample name
    bn=$(basename "${f}" .fastq.gz)
    echo "${bn}"
    metaphlan "${f}" --input_type fastq --bowtie2db /shared2/HUMAnN3_db/stable_201901b/db_v30/ -s "sams/${bn}.sam.bz2" --bowtie2out "bowtie2/${bn}.bowtie2.bz2" -o "profiles/${bn}_profiled.tsv"
done

# third: extract consensus markers
mkdir -p consensus_markers
sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -n 8

# fourth: extract the E. coli marker sequences

mkdir -p clade_markers
extract_markers.py -d /shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -c s__Escherichia_coli -o clade_markers
mkdir -p output
strainphlan -d /shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -s consensus_markers/*.pkl -m clade_markers/s__Escherichia_coli.fna -o output -n 8 -c s__Escherichia_coli --mutation_rates
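
Because the steps depend on each other, a failed metaphlan run for one sample only shows up later as a missing input. A small check between the mapping step and sample2markers.py can catch that earlier. The sketch below assumes the directory layout used in the script (fastq files in `merge_fq/test`, SAM files in `sams`); the function name `missing_sams` is made up for illustration.

```shell
# Print the names of samples that have a <name>.fastq.gz in directory $1
# but no non-empty <name>.sam.bz2 in directory $2.
missing_sams() {
    for fq in "$1"/*.fastq.gz; do
        [ -e "$fq" ] || continue            # the glob matched nothing
        name=$(basename "$fq" .fastq.gz)
        [ -s "$2/${name}.sam.bz2" ] || echo "$name"
    done
}

# Report samples without a SAM file, if the directories from the script exist.
if [ -d merge_fq/test ] && [ -d sams ]; then
    missing=$(missing_sams merge_fq/test sams)
    if [ -n "$missing" ]; then
        echo "no SAM output for: $missing" >&2
    fi
fi
```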
