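Introduction to ANI
Average Nucleotide Identity (ANI) is the mean nucleotide identity between homologous fragments of two microbial genomes. It measures how closely two genomes are related at the nucleotide level and discriminates well between closely related species.
The calculation principle (link) and a summary of calculation methods (link) are covered in previously published posts.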
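Installing pyani
# Create a Python 3.8 environment
conda create -n pyani-env python=3.8
# Activate the environment
conda activate pyani-env
# Install pyani (if running "conda install pyani" fails, use the install command from the conda website, shown below)
conda install bioconda::pyani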
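After installation it can help to confirm that the script and the external aligners are actually on PATH inside the activated environment. The sketch below is only an illustration: check_tools is a hypothetical helper name, and the executable names passed to it (nucmer, delta-filter, blastn, makeblastdb) are assumed from the corresponding --*_exe options in the help text further down.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which of the given executables
# can be found on PATH. Returns 0 if all were found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example calls (run them inside the activated pyani-env environment):
#   check_tools average_nucleotide_identity.py nucmer delta-filter   # needed for ANIm
#   check_tools blastn makeblastdb                                   # needed for ANIb
```

If a tool is reported as missing, its path can be given explicitly with the matching option, for example --nucmer_exe or --blastn_exe.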
Using pyani
average_nucleotide_identity.py -i <input directory> -o <output directory> -m ANIm -g
average_nucleotide_identity.py -i <input directory> -o <output directory> -m ANIb -g
average_nucleotide_identity.py -i <input directory> -o <output directory> -m ANIblastall -g
average_nucleotide_identity.py -i <input directory> -o <output directory> -m TETRA -g
# Use ANIm for closely related strains and ANIb for more distantly related ones
# ANIm: uses MUMmer (NUCmer) to align the input sequences
# ANIb: uses BLASTN+ to align 1020nt fragments of the input sequences
# ANIblastall: uses legacy BLASTN to align 1020nt fragments of the input sequences
# TETRA: calculates tetranucleotide frequencies of each input sequence
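The four commands above differ only in the value passed to -m, so they can be wrapped in a small function. This is a minimal sketch rather than part of pyani: run_ani and the DRY_RUN variable are hypothetical names introduced here, and the directory names in the example call are placeholders.

```shell
#!/bin/sh
# run_ani (hypothetical helper): run_ani <method> <input dir> <output dir>
# Builds the same command line as shown above. With DRY_RUN=1 the command
# is printed instead of executed, which is handy for checking arguments.
run_ani() {
    method=$1
    indir=$2
    outdir=$3
    # Accept only the methods listed for the -m option
    case "$method" in
        ANIm|ANIb|ANIblastall|TETRA) ;;
        *) echo "unknown method: $method" >&2; return 2 ;;
    esac
    set -- average_nucleotide_identity.py -i "$indir" -o "$outdir" -m "$method" -g
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: run two methods on the same input, one output directory each
#   for m in ANIm ANIb; do run_ani "$m" genomes "out_$m"; done
```

Giving each method its own output directory keeps the result tables and heatmaps of the different runs apart.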
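Results
Example: with the ANIm method, pyani writes tab files named ANIm_alignment_coverage, ANIm_alignment_lengths, ANIm_hadamard, ANIm_percentage_identity and ANIm_similarity_errors, together with heatmaps. ANIm_percentage_identity holds the ANI values that are normally used.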
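The identity table is a square matrix, which is awkward to scan by eye once there are more than a few genomes. The sketch below flattens it into one line per genome pair. It assumes the file is tab-separated, with genome names in the first row and first column in the same order; list_ani_pairs is a hypothetical helper name, and the file path in the example call is a placeholder.

```shell
#!/bin/sh
# list_ani_pairs (hypothetical helper): list_ani_pairs <matrix.tab>
# Prints "genome1<TAB>genome2<TAB>value" once per unordered pair, reading
# only the upper triangle of the matrix (cells right of the diagonal).
list_ani_pairs() {
    awk -F '\t' '
        # Header row: remember the genome name of each column
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
        # Data row NR sits on the diagonal at column NR, so start at NR + 1
        {
            for (i = NR + 1; i <= NF; i++)
                print $1 "\t" name[i] "\t" $i
        }
    ' "$1"
}

# Example call (path is a placeholder):
#   list_ani_pairs <output directory>/ANIm_percentage_identity.tab
```

Values are printed exactly as stored in the table. If a matrix is not symmetric, the lower triangle is ignored by this sketch, so both directions would need to be read separately.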
All options
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-o OUTDIRNAME, --outdir OUTDIRNAME
Output directory (required)
-i INDIRNAME, --indir INDIRNAME
Input directory name (required)
-v, --verbose Give verbose output
-f, --force Force file overwriting
-s FRAGSIZE, --fragsize FRAGSIZE
Sequence fragment size for ANIb (default 1020)
-l LOGFILE, --logfile LOGFILE
Logfile location
--skip_nucmer Skip NUCmer runs, for testing (e.g. if output already
present)
--skip_blastn Skip BLASTN runs, for testing (e.g. if output already
present)
--noclobber Don't nuke existing files
--nocompress Don't compress/delete the comparison output
-g, --graphics Generate heatmap of ANI
--gformat GFORMAT Graphics output format(s) [pdf|png|jpg|svg] (default
pdf,png,eps meaning three file formats)
--gmethod {mpl,seaborn}
Graphics output method (default mpl)
--labels LABELS Path to file containing sequence labels
--classes CLASSES Path to file containing sequence classes
-m {ANIm,ANIb,ANIblastall,TETRA}, --method {ANIm,ANIb,ANIblastall,TETRA}
ANI method (default ANIm)
--scheduler {multiprocessing,SGE}
Job scheduler (default multiprocessing, i.e. locally)
--workers WORKERS Number of worker processes for multiprocessing
(default zero, meaning use all available cores)
--SGEgroupsize SGEGROUPSIZE
Number of jobs to place in an SGE array group (default
10000)
--SGEargs SGEARGS Additional arguments for qsub
--maxmatch Override MUMmer to allow all NUCmer matches
--nucmer_exe NUCMER_EXE
Path to NUCmer executable
--filter_exe FILTER_EXE
Path to delta-filter executable
--blastn_exe BLASTN_EXE
Path to BLASTN+ executable
--makeblastdb_exe MAKEBLASTDB_EXE
Path to BLAST+ makeblastdb executable
--blastall_exe BLASTALL_EXE
Path to BLASTALL executable
--formatdb_exe FORMATDB_EXE
Path to BLAST formatdb executable
--write_excel Write Excel format output tables
--rerender Rerender graphics output without recalculation
--subsample SUBSAMPLE
Subsample a percentage [0-1] or specific number (1-n)
of input sequences
--seed SEED Set random seed for reproducible subsampling.
--jobprefix JOBPREFIX
Prefix for SGE jobs (default ANI).