MetaPhlAn3 and StrainPhlAn3 run notes

土豆西红柿青椒

Last modified 2023-01-11 23:03:40

First published 2023-01-11 23:03:08

Article link: https://blog.csdn.net/weixin_43151909/article/details/128637302

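My conda environment has MetaPhlAn 3.0.7, and all the data I processed earlier used the v30 database. MetaPhlAn has since moved to v31, so running metaphlan without specifying a database automatically downloads the v31 database.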
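A minimal sketch of pinning the database version so that a run does not silently switch to v31. It only assembles and prints the command line for review. The helper name `build_metaphlan_cmd` and the output file naming are hypothetical; `--index` is the long form of the `-x` option used in the command further down.

```shell
# Hypothetical helper: print a metaphlan command line with the database index pinned.
# Nothing is executed here; review the printed line and run it by hand.
build_metaphlan_cmd() {
    local fastq="$1"                               # input reads, e.g. P10E0.fastq.gz
    local index="${2:-mpa_v30_CHOCOPhlAn_201901}"  # database version to pin (default: v30)
    local sample
    sample="$(basename "${fastq}" .fastq.gz)"      # sample name without the suffix
    printf 'metaphlan %s --input_type fastq --index %s -o %s_profiled.tsv\n' \
        "${fastq}" "${index}" "${sample}"
}
```

Calling `build_metaphlan_cmd P10E0.fastq.gz` prints a command that uses the v30 index; a second argument overrides the index name.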

Problem 1: an error after specifying the database version

command:

https://forum.biobakery.org/t/install-metaphlan-3/369

(humann) user@d079e601f094:/analysis$ metaphlan P10E0.fastq.gz \
-x mpa_v30_CHOCOPhlAn_201901 \
--input_type fastq -s P10E0.new.sam.bz2 \
--bowtie2out P10E0.new.bowtie2.bz2 \
-o P10E0_new_profiled.tsv

Error: BowTie2 output file detected: /sbidata/projects/lzhang/202012_small_molecule/Analysis/metaphlan_human3/Result/StrainPhlAn/test_project/bowtie2/P10E0.new.bowtie2.bz2

Please provide the size of the metagenome using the --nreads parameter when running MetaPhlAn using SAM files as input
Exiting...

WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.

My input is a fastq file, so why is the error about bowtie2? After some searching, it turned out to be a bug in the software itself.
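Separately from the bug, the first line of the message names a file that already exists at the `--bowtie2out` path. The sketch below checks for such a leftover before a run. Whether a leftover file contributed to the failure above is an assumption, and the function name `check_stale_bowtie2out` is hypothetical.

```shell
# Hypothetical pre-flight check: report whether a file already sits at the
# path that will be passed to --bowtie2out.
check_stale_bowtie2out() {
    local bowtie2out="$1"
    if [ -e "${bowtie2out}" ]; then
        echo "exists: ${bowtie2out}"   # a previous run left this file behind
        return 1
    fi
    echo "clear: ${bowtie2out}"
}
```

For example, `check_stale_bowtie2out P10E0.new.bowtie2.bz2` returns non-zero when the file is already there, so the decision to remove or reuse it stays with the user.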

https://forum.biobakery.org/t/metaphlan3-bowtie2db-output-files-need-the-size-of-the-metagenome-using-the-nreads-parameter/2006/11

The solution from the thread: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0

fbeghini (Segata Lab member), May '21:

I merged the PR, it should not take long to be available now

To summarize the fix: since there is a bug in 3.0.7, I will install 3.0.8 in my own conda environment, that is, set up a fresh 3.0.8 environment.

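Problem 2: installing an older MetaPhlAn 3 environment (the current release is already version 4)

Installing with conda was too slow on my machine (Ubuntu), so I used mamba instead.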

conda create --name mpa_strainphlan3

conda activate mpa_strainphlan3

conda install -c conda-forge mamba

mamba install -c bioconda metaphlan=3.0.8 python=3.10.8

Updating specs:

- metaphlan=3.0.8

Re-running the command above still fails with the same error:

Please provide the size of the metagenome using the --nreads parameter when running MetaPhlAn using SAM files as input
Exiting...

WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.

Trying to install 3.0.9:

mamba install -c bioconda metaphlan=3.0.9
Traceback (most recent call last):
File "/home/l/miniconda3/envs/mpa_strainphlan3/bin/mamba", line 7, in <module>
from mamba.mamba import main
File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/mamba/mamba.py", line 49, in <module>
import libmambapy as api
File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/__init__.py", line 7, in <module>
raise e
File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/__init__.py", line 4, in <module>
from libmambapy.bindings import * # noqa: F401,F403
ImportError: /home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/../../../libmamba.so.2: undefined symbol: archive_write_add_filter_zstd

The fix for this error comes from: Undefined symbol: archive_write_add_filter_zstd · Issue #1775 · mamba-org/mamba · GitHub

conda install libarchive==3.5.2 -c conda-forge

Upgrade to MetaPhlAn 3.0.9:

mamba install -c bioconda metaphlan=3.0.9
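In these notes both 3.0.7 and 3.0.8 still produced the `--nreads` error, so it may help to confirm the installed version before re-running anything. The sketch below compares version strings; the helper names are hypothetical, `sort -V` assumes GNU coreutils (as on Ubuntu), and the exact text printed by `metaphlan --version` is an assumption.

```shell
# Hypothetical helper: read text on stdin and print the first dotted version number found.
extract_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Hypothetical helper: succeed when version $1 is at least version $2.
# sort -V orders version strings; if the smaller one equals $2, then $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes `metaphlan --version` prints a line containing the version number):
#   installed="$(metaphlan --version | extract_version)"
#   version_ge "${installed}" 3.0.9 || echo "MetaPhlAn ${installed} is older than 3.0.9"
```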

Step 1: once the environment is installed, run MetaPhlAn 3 first

metaphlan P10E0.new.bowtie2.bz2 \
--bowtie2db HUMAnN3_db/stable_201901b/db_v30/ \
--input_type bowtie2out \
-s P10E0.new.sam.bz2 --bowtie2out P10E0.new.bowtie2.bz2 \
-o P10E0_new_profiled.tsv
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.

The warning is harmless: it only tells you that some of the species found may also go by other names.

These are from MetaPhlAn, they just inform you that some species found can have “alternative” taxonomies (the list of species in the additional_species column). All the species listed under additional_species are not represented by any markers but they were found to be <5% ANI distant from the “reference” species (clade_name).

Unexpected output (format) - #2 by fbeghini - MetaPhlAn - The bioBakery help forum
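To see which clades the warning refers to, the rows with a filled-in extra column can be listed. This is a minimal sketch under stated assumptions: the profile is tab-separated, header lines start with `#`, and `additional_species` is the fourth column. The function name `list_additional_species` is hypothetical.

```shell
# Hypothetical helper: print clade_name and additional_species for every row
# whose fourth column is non-empty. Header lines starting with '#' are skipped.
list_additional_species() {
    awk -F'\t' '!/^#/ && NF >= 4 && $4 != "" { print $1 "\t" $4 }' "$1"
}
```

For example, `list_additional_species P10E0_new_profiled.tsv` prints one line per clade that has merged species attached.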

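Instead of pointing --bowtie2db at an existing database, you can pass --index mpa_v30_CHOCOPhlAn_201901 and the software will download the database automatically. Alternatively, download it yourself from Dropbox, Google Drive, or Zenodo and extract it; MetaPhlAn builds the index on its own.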

MetaPhlAn 3.0 · biobakery/MetaPhlAn Wiki · GitHub also lists the database download locations.

For example, the Zenodo copy of the database:

curl -o mpa_v30_CHOCOPhlAn_201901.tar "https://zenodo.org/record/3957592/files/mpa_v30_CHOCOPhlAn_201901.tar?download=1"
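After downloading and extracting the tar, a quick check that the marker database file sits where `--bowtie2db` will point can save a failed run. This is a sketch only: `db_ready` is a hypothetical name, and the `<index>.pkl` file name follows the paths used in the StrainPhlAn commands in these notes.

```shell
# Hypothetical helper: succeed when the marker database pickle for the given
# index exists in the database directory.
db_ready() {
    local db_dir="$1"
    local index="${2:-mpa_v30_CHOCOPhlAn_201901}"
    [ -f "${db_dir%/}/${index}.pkl" ]   # ${db_dir%/} drops a trailing slash
}
```

For example, `db_ready HUMAnN3_db/stable_201901b/db_v30/ || echo "database not extracted yet"`.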

Step 2: StrainPhlAn 3

sample2markers.py -i sams/P10E0.new.sam.bz2 \
	-o consensus_markers/P10E0.new.pkl -n 8

Extract the E. coli marker sequences:

extract_markers.py -d /opt/conda/envs/humann/lib/python3.9/site-packages/metaphlan/metaphlan_databases/mpa_v31_CHOCOPhlAn_201901.pkl -c s__Escherichia_coli -o clade_markers


strainphlan -d shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -s consensus_markers/P10E0.new.pkl -m clade_markers/s__Escherichia_coli.fna -o output -n 8 -c s__Escherichia_coli --mutation_rates


[e] The main inputs samples + references are less than 4
Wed Jan 11 15:09:48 2023: Stop StrainPhlAn 3.0 execution.

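This error means StrainPhlAn needs at least 4 inputs (samples plus references) to run, so I added a few more samples and ran it again.

The script for 5 samples: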
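A minimal sketch of checking the input count before launching StrainPhlAn, based on the threshold of 4 in the error message above. The function names are hypothetical, and the sketch assumes one consensus-marker `.pkl` file per sample in a single directory.

```shell
# Hypothetical helper: count the .pkl files directly inside a directory.
count_marker_files() {
    find "$1" -maxdepth 1 -type f -name '*.pkl' | wc -l | tr -d ' '
}

# Hypothetical helper: succeed when samples plus references reach 4.
# $1 = directory with consensus markers, $2 = number of reference genomes (default 0).
enough_inputs() {
    local n_samples n_refs="${2:-0}"
    n_samples="$(count_marker_files "$1")"
    [ $(( n_samples + n_refs )) -ge 4 ]
}
```

For example, `enough_inputs consensus_markers || echo "fewer than 4 inputs, add samples or references"`.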

# Step 1: concatenate the fastq files into merge_fq/test/ (done beforehand)
# Step 2: run MetaPhlAn on each sample to get the Bowtie2 mapping files
cd test_project/
mkdir -p sams/ bowtie2/ profiles/

for f in merge_fq/test/*gz
do
    echo "Running MetaPhlAn on ${f}"
    # Sample name = file name without the .fastq.gz suffix
    bn=$(basename "${f}" .fastq.gz)
    echo "${bn}"
    metaphlan "${f}" --input_type fastq --bowtie2db /shared2/HUMAnN3_db/stable_201901b/db_v30/ -s "sams/${bn}.sam.bz2" --bowtie2out "bowtie2/${bn}.bowtie2.bz2" -o "profiles/${bn}_profiled.tsv"
done

# Step 3: extract consensus markers from the SAM files
mkdir -p consensus_markers
sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -n 8

# Step 4: extract the E. coli marker sequences, then run StrainPhlAn
mkdir -p clade_markers
extract_markers.py -d /shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -c s__Escherichia_coli -o clade_markers
mkdir -p output
strainphlan -d /shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -s consensus_markers/*.pkl -m clade_markers/s__Escherichia_coli.fna -o output -n 8 -c s__Escherichia_coli --mutation_rates
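Between the MetaPhlAn loop and the marker extraction, it can be useful to confirm that every fastq file actually produced a SAM file, since a missing one lowers the sample count seen by StrainPhlAn. This is a sketch with a hypothetical function name, assuming the `<sample>.fastq.gz` and `<sample>.sam.bz2` naming used in the script above.

```shell
# Hypothetical helper: print the name of every sample that has a fastq file
# in $1 but no matching SAM file in $2.
missing_sams() {
    local fastq_dir="$1" sam_dir="$2" f bn
    for f in "${fastq_dir}"/*.fastq.gz; do
        [ -e "${f}" ] || continue                  # the glob matched nothing
        bn="$(basename "${f}" .fastq.gz)"
        [ -f "${sam_dir}/${bn}.sam.bz2" ] || echo "${bn}"
    done
}
```

For example, `missing_sams merge_fq/test sams` prints nothing when all samples were mapped.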