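My conda environment has MetaPhlAn 3.0.7, and all my earlier data were processed with the v30 database. MetaPhlAn has since moved to v31, so running metaphlan without specifying a database automatically downloads the v31 database.
Problem 1: an error after pinning the database version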
command:
https://forum.biobakery.org/t/install-metaphlan-3/369
(humann) user@d079e601f094:/analysis$ metaphlan P10E0.fastq.gz \
-x mpa_v30_CHOCOPhlAn_201901 \
--input_type fastq -s P10E0.new.sam.bz2 \
--bowtie2out P10E0.new.bowtie2.bz2 \
-o P10E0_new_profiled.tsv
Error: BowTie2 output file detected: /sbidata/projects/lzhang/202012_small_molecule/Analysis/metaphlan_human3/Result/StrainPhlAn/test_project/bowtie2/P10E0.new.bowtie2.bz2
Please provide the size of the metagenome using the --nreads parameter when running MetaPhlAn using SAM files as input
Exiting...
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
My input is a fastq file, so why does it complain about Bowtie2 output? After some searching, it turned out to be a bug in the software itself.
https://forum.biobakery.org/t/metaphlan3-bowtie2db-output-files-need-the-size-of-the-metagenome-using-the-nreads-parameter/2006/11
The fix, from the thread: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0
fbeghini (Segata Lab member): "I merged the PR, it should not take long to be available now"
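To summarize the fix: since there is an error in 3.0.7, I will install 3.0.8 in my own conda environment, i.e. rebuild the environment with 3.0.8.
Problem 2: installing an older MetaPhlAn 3 environment (the current release is already version 4)
Installing with conda was too slow on my machine (Ubuntu), so I used mamba instead.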
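Before re-running anything, it can help to confirm which MetaPhlAn version the active environment actually provides. The following is only a minimal sketch: it assumes `metaphlan --version` prints a banner containing an x.y.z version string and that GNU `sort -V` is available. The helper names `parse_version`, `version_ge` and `check_metaphlan` are hypothetical and not part of MetaPhlAn.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration.

# Extract the first x.y.z token from a version banner.
# Assumption: the banner looks like "MetaPhlAn version 3.0.8 (...)".
parse_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Return 0 when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed MetaPhlAn against a minimum version given as $1.
check_metaphlan() {
    local min="$1" installed
    installed=$(parse_version "$(metaphlan --version 2>&1)")
    if version_ge "$installed" "$min"; then
        echo "MetaPhlAn ${installed} is at least ${min}"
    else
        echo "MetaPhlAn ${installed:-unknown} is older than ${min}" >&2
        return 1
    fi
}
```

After activating the environment, `check_metaphlan 3.0.8` would report whether the installed version meets that minimum.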
conda create --name mpa_strainphlan3
conda activate mpa_strainphlan3
conda install -c conda-forge mamba
mamba install -c bioconda metaphlan=3.0.8 python=3.10.8
Updating specs:
- metaphlan=3.0.8
Re-running the command above still fails:
Please provide the size of the metagenome using the --nreads parameter when running MetaPhlAn using SAM files as input
Exiting...
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
Try installing 3.0.9:
mamba install -c bioconda metaphlan=3.0.9
Traceback (most recent call last):
File "/home/l/miniconda3/envs/mpa_strainphlan3/bin/mamba", line 7, in <module>
from mamba.mamba import main
File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/mamba/mamba.py", line 49, in <module>
import libmambapy as api
File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/__init__.py", line 7, in <module>
raise e
File "/home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/__init__.py", line 4, in <module>
from libmambapy.bindings import * # noqa: F401,F403
ImportError: /home/l/miniconda3/envs/mpa_strainphlan3/lib/python3.10/site-packages/libmambapy/../../../libmamba.so.2: undefined symbol: archive_write_add_filter_zstd
Fix for this error: Undefined symbol: archive_write_add_filter_zstd · Issue #1775 · mamba-org/mamba · GitHub
conda install libarchive==3.5.2 -c conda-forge
Upgrade to MetaPhlAn 3.0.9:
mamba install -c bioconda metaphlan=3.0.9
Step 1: once the environment is installed, run MetaPhlAn 3 first
metaphlan P10E0.new.bowtie2.bz2 \
--bowtie2db HUMAnN3_db/stable_201901b/db_v30/ \
--input_type bowtie2out \
-s P10E0.new.sam.bz2 --bowtie2out P10E0.new.bowtie2.bz2 \
-o P10E0_new_profiled.tsv
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
The warnings are harmless; they only tell you that some species may go by other names.
These are from MetaPhlAn, they just inform you that some species found can have “alternative” taxonomies (the list of species in the additional_species column). All the species listed under additional_species are not represented by any markers but they were found to be <5% ANI distant from the “reference” species (clade_name).
Source: Unexpected output (format) - #2 by fbeghini - MetaPhlAn - The bioBakery help forum
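Instead of --bowtie2db, you can pass --index mpa_v30_CHOCOPhlAn_201901 and MetaPhlAn will download the database automatically. Alternatively, download it yourself from Dropbox, Google Drive, or Zenodo and extract it; MetaPhlAn builds the index on its own.
MetaPhlAn 3.0 · biobakery/MetaPhlAn Wiki · GitHub also lists the database download locations.
For example, from Zenodo: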
curl -o mpa_v30_CHOCOPhlAn_201901.tar "https://zenodo.org/record/3957592/files/mpa_v30_CHOCOPhlAn_201901.tar?download=1"
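The extraction step after a manual download can be sketched as follows. This is a minimal sketch, not an official procedure: `unpack_db` is a hypothetical helper name, and the destination directory is whatever you later pass to --bowtie2db.

```shell
#!/usr/bin/env bash
# Sketch only: unpack_db is a made-up helper, not a MetaPhlAn command.

# Unpack a downloaded database tarball ($1) into its own directory ($2)
# and list what was extracted.
unpack_db() {
    local tarball="$1" dest="$2"
    mkdir -p "$dest" || return 1
    tar -xf "$tarball" -C "$dest" || return 1
    ls -1 "$dest"
}
```

For instance, `unpack_db mpa_v30_CHOCOPhlAn_201901.tar db_v30` followed by `--bowtie2db db_v30` on the metaphlan command line (the directory name db_v30 is only an illustration).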
Step 2: StrainPhlAn 3
sample2markers.py -i sams/P10E0.new.sam.bz2 \
-o consensus_markers/P10E0.new.pkl -n 8
Extract the E. coli marker sequences:
extract_markers.py -d /opt/conda/envs/humann/lib/python3.9/site-packages/metaphlan/metaphlan_databases/mpa_v31_CHOCOPhlAn_201901.pkl -c s__Escherichia_coli -o clade_markers
strainphlan -d shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -s consensus_markers/P10E0.new.pkl -m clade_markers/s__Escherichia_coli.fna -o output -n 8 -c s__Escherichia_coli --mutation_rates
[e] The main inputs samples + references are less than 4
Wed Jan 11 15:09:48 2023: Stop StrainPhlAn 3.0 execution.
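This error means StrainPhlAn needs at least 4 inputs (samples plus references) to run. I added a few more samples and ran it again.
The script for 5 samples: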
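A quick pre-check can catch this before StrainPhlAn starts. The sketch below only counts the consensus-marker .pkl files in a directory and ignores any reference genomes, so it is stricter than the tool's own check; `count_pkl` and `enough_samples` are hypothetical helper names, and the default of 4 is taken from the error message above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration.

# Count the .pkl files directly inside directory $1.
count_pkl() {
    find "$1" -maxdepth 1 -type f -name '*.pkl' | wc -l | tr -d ' '
}

# Return 0 when directory $1 holds at least $2 (default 4) .pkl files.
# Assumption: no reference genomes are supplied, so samples alone must reach the minimum.
enough_samples() {
    local n
    n=$(count_pkl "$1")
    [ "$n" -ge "${2:-4}" ]
}
```

Usage could look like `enough_samples consensus_markers || echo "add more samples first"`.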
# first cat together the fq files into merge_fq/test dir
# second run metaphlan get bowtie mapping file
cd test_project/
mkdir -p sams/
mkdir -p bowtie2/
mkdir -p profiles/
for f in merge_fq/test/*gz
do
    echo "Running MetaPhlAn on ${f}"
    # strip the directory and the .fastq.gz suffix to get the sample name
    bn=$(basename "${f}" .fastq.gz)
    echo "${bn}"
    metaphlan "${f}" --input_type fastq --bowtie2db /shared2/HUMAnN3_db/stable_201901b/db_v30/ -s "sams/${bn}.sam.bz2" --bowtie2out "bowtie2/${bn}.bowtie2.bz2" -o "profiles/${bn}_profiled.tsv"
done
# third : extract consensus markers
mkdir -p consensus_markers
sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -n 8
# fourth : extract ecoli sequence
mkdir -p clade_markers
extract_markers.py -d /shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -c s__Escherichia_coli -o clade_markers
mkdir -p output
strainphlan -d /shared2/HUMAnN3_db/stable_201901b/db_v30/mpa_v30_CHOCOPhlAn_201901.pkl -s consensus_markers/*.pkl -m clade_markers/s__Escherichia_coli.fna -o output -n 8 -c s__Escherichia_coli --mutation_rates
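Since every later step depends on the SAM files written by the MetaPhlAn loop, it may be worth checking that each fastq produced one before calling sample2markers.py. A minimal sketch, assuming the merge_fq/test and sams/ layout used in the script; `missing_sams` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: missing_sams is a made-up helper for illustration.

# Print the sample name of every *.fastq.gz in directory $1 that has no
# non-empty <sample>.sam.bz2 in directory $2.
missing_sams() {
    local fq_dir="$1" sam_dir="$2" f bn
    for f in "$fq_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue            # glob matched nothing
        bn=$(basename "$f" .fastq.gz)
        [ -s "$sam_dir/$bn.sam.bz2" ] || printf '%s\n' "$bn"
    done
}
```

Running `missing_sams merge_fq/test sams` prints nothing when every sample has a SAM file; any names it prints are samples to re-run.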