Introduction to QIIME 2
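QIIME is one of the most widely used analysis pipelines in the microbiome field. QIIME 2 is a powerful, extensible, and decentralized microbiome analysis platform: it starts from raw DNA sequences and carries the analysis through to publication-quality statistics and figures.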
Official installation guide: https://docs.qiime2.org/2024.2/install/native/#miniconda
1. Install QIIME 2 Amplicon
We recommend creating a new conda environment from the yml file provided by QIIME 2. The example below installs version 2024.2.
# Download the environment file (yml)
wget https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2024.2-py38-linux-conda.yml
# Create the conda environment
conda env create -n qiime2-amplicon-2024.2 --file qiime2-amplicon-2024.2-py38-linux-conda.yml
# List conda environments
conda env list
# Activate the QIIME 2 environment
conda activate qiime2-amplicon-2024.2
# Check that QIIME 2 was installed successfully
qiime --help
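The download-and-create steps above can be made repeatable with two small helpers. This is a minimal sketch, assuming the file-naming pattern of the URL above (`qiime2-amplicon-<version>-<python tag>-<platform>-conda.yml`). The `osx` platform suffix and the `py38` default are assumptions that match the 2024.2 release and should be checked against the official install page for other releases. Both function names, `qiime2_yml_url` and `conda_env_exists`, are hypothetical.

```shell
# Hypothetical helper: build the environment-file URL for a release.
# Assumption: same naming pattern as the 2024.2 Linux file; the python tag changes between releases.
qiime2_yml_url() {
  local version="$1" os="${2:-$(uname -s)}" pytag="${3:-py38}" platform
  case "$os" in
    Linux)  platform=linux ;;
    Darwin) platform=osx ;;   # assumption: macOS files use the "osx" suffix
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  echo "https://data.qiime2.org/distro/amplicon/qiime2-amplicon-${version}-${pytag}-${platform}-conda.yml"
}

# Hypothetical helper: succeed if an environment name appears as the first
# column of `conda env list` output read from stdin (comment lines start with "#").
conda_env_exists() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, `conda env list | conda_env_exists qiime2-amplicon-2024.2 || conda env create -n qiime2-amplicon-2024.2 --file qiime2-amplicon-2024.2-py38-linux-conda.yml` creates the environment only when it is not already present.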
2. Download pre-trained classifiers
Source: https://docs.qiime2.org/2024.2/data-resources/
Taxonomy classifiers for use with q2-feature-classifier
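These are naive Bayes classifiers. They perform best when trained for your specific sample preparation and sequencing parameters, including the primers used for amplification and the length of your sequence reads. In general, you should therefore train your own classifier by following the "Training feature classifiers with q2-feature-classifier" tutorial.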
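The training workflow has two steps: extract the amplified region from full-length reference sequences, then fit a naive Bayes classifier on it. A minimal sketch, assuming the pre-formatted SILVA 138 files from section 3 (`silva-138-99-seqs.qza`, `silva-138-99-tax.qza`) as the reference. `extract-reads` and `fit-classifier-naive-bayes` are q2-feature-classifier actions; the wrapper `train_classifier` and the intermediate file name are hypothetical.

```shell
# Hypothetical wrapper around the two q2-feature-classifier training steps.
# Arguments: reference sequences, reference taxonomy, forward primer, reverse primer, output classifier
train_classifier() {
  local seqs="$1" tax="$2" fwd="$3" rev="$4" out="$5" f
  local reads="${out%.qza}-ref-reads.qza"   # intermediate: reference reads cut to the amplicon

  # Fail early if QIIME 2 is not on PATH (e.g. the conda environment is not activated).
  if ! command -v qiime >/dev/null 2>&1; then
    echo "qiime not found: run 'conda activate qiime2-amplicon-2024.2' first" >&2
    return 127
  fi
  for f in "$seqs" "$tax"; do
    [ -f "$f" ] || { echo "missing input: $f" >&2; return 2; }
  done

  # Step 1: extract the region amplified by your primers from the reference sequences.
  qiime feature-classifier extract-reads \
    --i-sequences "$seqs" \
    --p-f-primer "$fwd" \
    --p-r-primer "$rev" \
    --o-reads "$reads" || return 1

  # Step 2: fit a naive Bayes classifier on the extracted reads and their taxonomy labels.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$reads" \
    --i-reference-taxonomy "$tax" \
    --o-classifier "$out"
}
```

Example with the 515F (GTGYCAGCMGCCGCGGTAA) / 806R (GGACTACNVGGGTWTCTAAT) primer pair: `train_classifier silva-138-99-seqs.qza silva-138-99-tax.qza GTGYCAGCMGCCGCGGTAA GGACTACNVGGGTWTCTAAT my-515-806-classifier.qza`. To also match your read length, `extract-reads` accepts `--p-trunc-len`.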
# Silva 138 99% OTUs full-length sequences (MD5: b8609f23e9b17bd4a1321a8971303310)
wget https://data.qiime2.org/2024.2/common/silva-138-99-nb-classifier.qza
# Silva 138 99% OTUs from 515F/806R region of sequences (MD5: e05afad0fe87542704be96ff483824d4)
wget https://data.qiime2.org/2024.2/common/silva-138-99-515-806-nb-classifier.qza
# Greengenes2 2022.10 full length sequences (MD5: 98d34227fe67b34f62b464466cca4ffa)
wget https://data.qiime2.org/classifiers/greengenes/gg_2022_10_backbone_full_length.nb.qza
# Greengenes2 2022.10 from 515F/806R region of sequences (MD5: 43de361005ae6dcae61b078c0c835021)
wget https://data.qiime2.org/classifiers/greengenes/gg_2022_10_backbone.v4.nb.qza
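Each download above lists an MD5 checksum. Comparing it with the local file catches truncated or corrupted downloads before they cause confusing errors later. A minimal sketch, assuming GNU `md5sum` (Linux); on macOS, `md5 -q FILE` prints the bare hash instead. The helper name `verify_md5` is hypothetical.

```shell
# Hypothetical helper: compare a file's MD5 with the published value.
# Returns 0 on match, 1 on mismatch, 2 if the file is missing.
verify_md5() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "MISSING: $file"
    return 2
  fi
  actual=$(md5sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)"
    return 1
  fi
}

# Check whichever of the four classifiers are present (MD5 values copied from the list above).
while read -r md5 file; do
  if [ -f "$file" ]; then
    verify_md5 "$file" "$md5"
  fi
done <<'EOF'
b8609f23e9b17bd4a1321a8971303310 silva-138-99-nb-classifier.qza
e05afad0fe87542704be96ff483824d4 silva-138-99-515-806-nb-classifier.qza
98d34227fe67b34f62b464466cca4ffa gg_2022_10_backbone_full_length.nb.qza
43de361005ae6dcae61b078c0c835021 gg_2022_10_backbone.v4.nb.qza
EOF
```

A `MISMATCH` usually means the download was interrupted. Delete the file and download it again.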
For Silva 138, if you use any of these pre-trained classifiers, please cite the following:
- Michael S Robeson II, Devon R O’Rourke, Benjamin D Kaehler, Michal Ziemski, Matthew R Dillon, Jeffrey T Foster, Nicholas A Bokulich. RESCRIPt: Reproducible sequence taxonomy reference database management for the masses. bioRxiv 2020.10.05.326504; doi: https://doi.org/10.1101/2020.10.05.326504
- Bokulich, N.A., Kaehler, B.D., Rideout, J.R. et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018).
- See the SILVA website for the latest citation information for this reference database.
For Greengenes2, see the following reference:
- McDonald, D. et al. Greengenes2 unifies microbial data in a single reference tree. Nature Biotechnology (2023). https://www.nature.com/articles/s41587-023-01845-1
Weighted Taxonomic Classifiers
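If your samples come from any of the 14 habitat types that were tested, these weighted classifiers should classify reliably. If your samples do not come from one of these habitats, you will need to train weights for your own habitat.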
See the GitHub repository: https://github.com/BenKaehler/readytowear
# Weighted Silva 138 99% OTUs full-length sequences (MD5: 48965bb0a9e63c411452a460d92cfc04)
wget https://data.qiime2.org/2024.2/common/silva-138-99-nb-weighted-classifier.qza
# Weighted Greengenes 13_8 99% OTUs full-length sequences (MD5: 2baf87fce174c5f6c22a4c4086b1f1fe)
wget https://data.qiime2.org/2024.2/common/gg-13-8-99-nb-weighted-classifier.qza
# Weighted Greengenes 13_8 99% OTUs from 515F/806R region of sequences (MD5: 8fb808c4af1c7526a2bdfaafa764e21f)
wget https://data.qiime2.org/2024.2/common/gg-13-8-99-515-806-nb-weighted-classifier.qza
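Once downloaded, a weighted classifier is used in the same way as an unweighted one, through `qiime feature-classifier classify-sklearn`. A minimal sketch, assuming you already have representative sequences in a file named `rep-seqs.qza` (a hypothetical name, e.g. produced by DADA2 or Deblur) that cover the same region as the classifier. The wrapper `classify_reads` is hypothetical.

```shell
# Hypothetical wrapper: assign taxonomy to representative sequences with a pre-trained classifier.
# Arguments: classifier .qza, representative sequences .qza, output taxonomy .qza
classify_reads() {
  local classifier="$1" reads="$2" out="$3" f
  if ! command -v qiime >/dev/null 2>&1; then
    echo "qiime not found: activate the QIIME 2 environment first" >&2
    return 127
  fi
  for f in "$classifier" "$reads"; do
    [ -f "$f" ] || { echo "missing input: $f" >&2; return 2; }
  done
  qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$reads" \
    --o-classification "$out"
}
```

Example: `classify_reads gg-13-8-99-515-806-nb-weighted-classifier.qza rep-seqs.qza taxonomy.qza`. Pre-trained classifiers are tied to the scikit-learn version of the QIIME 2 release they were built for, so use the files under `2024.2/` with the 2024.2 environment. `classify-sklearn` also accepts `--p-n-jobs` to run in parallel.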
3. Download databases
Marker-gene databases
# Greengenes2 (16S rRNA) 2022.10 release files
http://ftp.microbio.me/greengenes_release/2022.10/
# Silva (16S/18S rRNA)
https://www.arb-silva.de/download/archive/qiime
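# SILVA 132 QIIME release (about 2 GB); older than the SILVA 138 files below
# -c resumes an interrupted download; -b runs wget in the background and writes progress to wget-log
wget -c -b https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip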
# Pre-formatted SILVA 138 reference sequences and taxonomy (full-length and 515F/806R region), prepared with RESCRIPt
wget https://data.qiime2.org/2024.2/common/silva-138-99-seqs.qza
wget https://data.qiime2.org/2024.2/common/silva-138-99-tax.qza
wget https://data.qiime2.org/2024.2/common/silva-138-99-seqs-515-806.qza
wget https://data.qiime2.org/2024.2/common/silva-138-99-tax-515-806.qza
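Before using these files for training, you can confirm that each artifact has the expected semantic type with `qiime tools peek`. This is a minimal sketch, assuming that `peek` prints a line of the form `Type:        FeatureData[Sequence]`. The helpers `peek_type` and `check_type` are hypothetical.

```shell
# Hypothetical helper: print the semantic type from `qiime tools peek` output read on stdin.
# Assumes a line of the form "Type:        FeatureData[Sequence]".
peek_type() {
  awk -F': *' '$1 == "Type" { print $2; exit }'
}

# Hypothetical helper: check that an artifact has the expected semantic type.
# Returns 0 on match, 1 on a different type, 2 if the type could not be read.
check_type() {
  local file="$1" expected="$2" actual
  actual=$(qiime tools peek "$file" 2>/dev/null | peek_type)
  if [ -z "$actual" ]; then
    echo "could not read type of $file (is qiime on PATH and the file complete?)" >&2
    return 2
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file is $actual"
  else
    echo "UNEXPECTED: $file is $actual, expected $expected" >&2
    return 1
  fi
}
```

Example: `check_type silva-138-99-seqs.qza 'FeatureData[Sequence]'` and `check_type silva-138-99-tax.qza 'FeatureData[Taxonomy]'`. Quote the type so the shell does not treat the square brackets as a glob.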
SEPP reference databases
# Silva 128 SEPP reference database (MD5: 7879792a6f42c5325531de9866f5c4de)
wget -c -b https://data.qiime2.org/2024.2/common/sepp-refs-silva-128.qza
# Greengenes 13_8 SEPP reference database (MD5: 9ed215415b52c362e25cb0a8a46e1076)
wget -c -b https://data.qiime2.org/2024.2/common/sepp-refs-gg-13-8.qza
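The SEPP reference databases are used by the q2-fragment-insertion plugin, which inserts your representative sequences into the reference phylogeny. A minimal sketch, assuming representative sequences named `rep-seqs.qza` (hypothetical) and the output names shown. The wrapper `insert_fragments` is hypothetical.

```shell
# Hypothetical wrapper around `qiime fragment-insertion sepp`.
# Arguments: representative sequences .qza, SEPP reference .qza, number of threads (default 1)
insert_fragments() {
  local reads="$1" ref="$2" threads="${3:-1}" f
  if ! command -v qiime >/dev/null 2>&1; then
    echo "qiime not found: activate the QIIME 2 environment first" >&2
    return 127
  fi
  for f in "$reads" "$ref"; do
    [ -f "$f" ] || { echo "missing input: $f" >&2; return 2; }
  done
  # Writes the insertion tree and the raw placements to the current directory.
  qiime fragment-insertion sepp \
    --i-representative-sequences "$reads" \
    --i-reference-database "$ref" \
    --p-threads "$threads" \
    --o-tree insertion-tree.qza \
    --o-placements insertion-placements.qza
}
```

Example: `insert_fragments rep-seqs.qza sepp-refs-gg-13-8.qza 4`. Because the commands above download with `-b`, wait until `wget-log` reports that the download is complete and check the MD5 before running the insertion. The existence check in the sketch does not detect a partial file.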
Other database resources
UNITE (fungal ITS), Qiita (public microbiome data)