Installing and Using pyani to Calculate ANI

小黑生信笔记

Last modified 2024-03-19 13:29:47


First published 2024-03-19 12:55:48

Article link: https://blog.csdn.net/weixin_47520675/article/details/136836311



Introduction to ANI

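Average Nucleotide Identity (ANI) is the mean base-level identity between the homologous fragments of two microbial genomes. It measures how closely two genomes are related at the nucleotide level and discriminates well between closely related species.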
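
To make the definition concrete, here is a minimal Python sketch that averages per-fragment identities, weighting each fragment by its aligned length. The fragment values and the function name `mean_identity` are made up for illustration, and length weighting is an assumption; this is not pyani's own implementation, whose methods differ in detail.

```python
def mean_identity(fragments):
    """Return the length-weighted mean percent identity of aligned fragments.

    fragments: iterable of (percent_identity, aligned_length) tuples.
    """
    fragments = list(fragments)  # allow generators, since we iterate twice
    total_length = sum(length for _, length in fragments)
    if total_length == 0:
        raise ValueError("no aligned bases")
    weighted = sum(identity * length for identity, length in fragments)
    return weighted / total_length


# Hypothetical alignment results for three homologous fragments
example = [(98.0, 1000), (96.0, 500), (99.0, 1500)]
print(f"ANI = {mean_identity(example):.2f}%")
```

Longer fragments pull the average toward their own identity, so a short, poorly matching fragment has little effect on the final value.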

For the calculation principle (link) and a summary of calculation methods (link), see the previously published posts.

Installing pyani

# Create a Python 3.8 environment
conda create -n pyani-env python=3.8
# Activate the environment
conda activate pyani-env
# Install pyani (if `conda install pyani` fails, use the install command from the conda website, shown below)
conda install bioconda::pyani
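
After installation, it can help to confirm that the script and the aligners it calls are on the PATH of the active environment. The following is a rough sketch under assumptions: the script name is taken from the commands in this post, and `nucmer` and `blastn` are the usual MUMmer and BLAST+ executable names, which may differ on your system.

```python
import shutil

# Executable names are assumptions: the pyani script name comes from the
# commands in this post; nucmer and blastn are the usual MUMmer/BLAST+ names.
REQUIRED = ["average_nucleotide_identity.py", "nucmer", "blastn"]


def find_tools(names=REQUIRED):
    """Map each executable name to its path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Run it inside the activated `pyani-env` environment; any entry reported as NOT FOUND points to a dependency that still needs installing.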

Using pyani

average_nucleotide_identity.py -i <input directory> -o <output directory> -m ANIm -g
average_nucleotide_identity.py -i <input directory> -o <output directory> -m ANIb -g
average_nucleotide_identity.py -i <input directory> -o <output directory> -m ANIblastall -g
average_nucleotide_identity.py -i <input directory> -o <output directory> -m TETRA -g
# Use ANIm for closely related strains and ANIb for more distantly related ones
# ANIm: uses MUMmer (NUCmer) to align the input sequences
# ANIb: uses BLASTN+ to align 1020nt fragments of the input sequences
# ANIblastall: uses legacy BLASTN to align 1020nt fragments of the input sequences
# TETRA: calculates tetranucleotide frequencies of each input sequence
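
When several methods are run on the same input, the commands above can be driven from a script. This is a minimal sketch that uses only the `-i`, `-o`, `-m`, and `-g` flags shown in this post; the helper names `build_command` and `run_all` and the `<prefix>_<method>` output naming are hypothetical.

```python
import subprocess

METHODS = ("ANIm", "ANIb", "ANIblastall", "TETRA")


def build_command(indir, outdir, method="ANIm", graphics=True):
    """Build the argument list for average_nucleotide_identity.py."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    cmd = ["average_nucleotide_identity.py", "-i", indir, "-o", outdir, "-m", method]
    if graphics:
        cmd.append("-g")  # also generate heatmaps
    return cmd


def run_all(indir, out_prefix, methods=("ANIm", "ANIb")):
    """Run each method into its own output directory (hypothetical naming)."""
    for method in methods:
        outdir = f"{out_prefix}_{method}"
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(build_command(indir, outdir, method), check=True)
```

For example, `run_all("genomes", "results")` would write to `results_ANIm` and `results_ANIb`. Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.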

Results

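Example: with the ANIm method, pyani writes .tab files named ANIm_alignment_coverage, ANIm_alignment_lengths, ANIm_hadamard, ANIm_percentage_identity, and ANIm_similarity_errors, together with heatmaps. ANIm_percentage_identity holds the ANI values that are normally used.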
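
The identity table can be loaded for downstream analysis with a few lines of Python. The sketch below assumes the file is a tab-separated square matrix with genome names in the first row and first column; that layout, the genome names, and the values are assumptions for illustration, so check them against your own output.

```python
import csv
import io


def read_identity_matrix(handle):
    """Parse a tab-separated square matrix into {(row, column): value}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)[1:]  # skip the corner cell before the column names
    matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, values = row[0], row[1:]
        for column, value in zip(header, values):
            matrix[(name, column)] = float(value)
    return matrix


# Made-up two-genome table standing in for ANIm_percentage_identity.tab
sample = "\tgenomeA\tgenomeB\ngenomeA\t1.0\t0.97\ngenomeB\t0.97\t1.0\n"
matrix = read_identity_matrix(io.StringIO(sample))
print(matrix[("genomeA", "genomeB")])
```

With a real file, pass an open handle instead of the sample string, for example `open("ANIm_percentage_identity.tab", newline="")`, where the path is whatever your output directory contains.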

All options

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -o OUTDIRNAME, --outdir OUTDIRNAME
                        Output directory (required)
  -i INDIRNAME, --indir INDIRNAME
                        Input directory name (required)
  -v, --verbose         Give verbose output
  -f, --force           Force file overwriting
  -s FRAGSIZE, --fragsize FRAGSIZE
                        Sequence fragment size for ANIb (default 1020)
  -l LOGFILE, --logfile LOGFILE
                        Logfile location
  --skip_nucmer         Skip NUCmer runs, for testing (e.g. if output already
                        present)
  --skip_blastn         Skip BLASTN runs, for testing (e.g. if output already
                        present)
  --noclobber           Don't nuke existing files
  --nocompress          Don't compress/delete the comparison output
  -g, --graphics        Generate heatmap of ANI
  --gformat GFORMAT     Graphics output format(s) [pdf|png|jpg|svg] (default
                        pdf,png,eps meaning three file formats)
  --gmethod {mpl,seaborn}
                        Graphics output method (default mpl)
  --labels LABELS       Path to file containing sequence labels
  --classes CLASSES     Path to file containing sequence classes
  -m {ANIm,ANIb,ANIblastall,TETRA}, --method {ANIm,ANIb,ANIblastall,TETRA}
                        ANI method (default ANIm)
  --scheduler {multiprocessing,SGE}
                        Job scheduler (default multiprocessing, i.e. locally)
  --workers WORKERS     Number of worker processes for multiprocessing
                        (default zero, meaning use all available cores)
  --SGEgroupsize SGEGROUPSIZE
                        Number of jobs to place in an SGE array group (default
                        10000)
  --SGEargs SGEARGS     Additional arguments for qsub
  --maxmatch            Override MUMmer to allow all NUCmer matches
  --nucmer_exe NUCMER_EXE
                        Path to NUCmer executable
  --filter_exe FILTER_EXE
                        Path to delta-filter executable
  --blastn_exe BLASTN_EXE
                        Path to BLASTN+ executable
  --makeblastdb_exe MAKEBLASTDB_EXE
                        Path to BLAST+ makeblastdb executable
  --blastall_exe BLASTALL_EXE
                        Path to BLASTALL executable
  --formatdb_exe FORMATDB_EXE
                        Path to BLAST formatdb executable
  --write_excel         Write Excel format output tables
  --rerender            Rerender graphics output without recalculation
  --subsample SUBSAMPLE
                        Subsample a percentage [0-1] or specific number (1-n)
                        of input sequences
  --seed SEED           Set random seed for reproducible subsampling.
  --jobprefix JOBPREFIX
                        Prefix for SGE jobs (default ANI).
