A brief walkthrough of NCBI's ANI statistics for all prokaryotes

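I have not worked out the full background yet, so I will start from the results. They are published at https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/
The file ANI_report_prokaryotes.txt in that directory holds the final statistics.
According to README_ANI_report_prokaryotes.txt:
1. The file is updated continuously.
2. It contains ANI information for all prokaryotic genome assemblies submitted to GenBank.
3. ANI is computed with the method described in this article.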

ANI

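ANI stands for average nucleotide identity, a measure of how closely two genomes are related at the nucleotide level. It is defined as the mean base identity between homologous fragments of two microbial genomes, and its strength is that it discriminates well between closely related species. [1]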
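As a rough illustration of that definition, ANI can be sketched as the mean identity over homologous fragment pairs. The identities below are made-up numbers and `average_nucleotide_identity` is a hypothetical helper, not NCBI's implementation; real tools typically also filter alignments by identity and coverage before averaging, which this sketch leaves out.

```python
from statistics import mean


def average_nucleotide_identity(fragment_identities):
    """Return the unweighted mean of per-fragment percent identities.

    Assumption: each value is the percent identity of one homologous
    fragment pair; fragments without a homolog are simply left out.
    """
    if not fragment_identities:
        raise ValueError("no homologous fragments to average")
    return mean(fragment_identities)


# Made-up identities for four homologous fragment pairs.
example = [98.0, 97.0, 99.0, 96.0]
print(average_nucleotide_identity(example))
```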

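About the results themselves

First, what each column of the result file means, according to the README:

Columns 0-8: basic information about the assembly
0. GenBank accession of the assembly
1. RefSeq accession of the assembly
2. Taxonomy ID (taxid) of the assembly
3. Taxid of the species the assembly belongs to (differs from [2] when the assembly is at the subspecies level, or comes from an older strain that has its own taxid)
4. Organism name corresponding to [2]
5. Species name corresponding to [3]
6. Assembly name, the identifier for this assembly
7. If the assembly comes from a type strain, its type category: "type", "neotype", "pathovar", "reftype", "syntype", or "suspected-type"; "na" if it does not come from a type strain
8. Reason the assembly is excluded from RefSeq; "na" if the assembly is considered reliable
Column names: genbank-accession, refseq-accession, taxid, species-taxid, organism-name, species-name, assembly-name, assembly-type-category, excluded-from-refseq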

Notes on [7]:
type - the sequences in the genome assembly were derived from type material

neotype - the sequences in the genome assembly were derived from neotype material

pathovar - the sequences in the genome assembly were derived from pathovar material

reftype - the sequences in the genome assembly were derived from reference material where type material never was available and is not likely to ever be available

syntype - the sequences in the genome assembly were derived from synonym type material

suspected-type - the type is one of the types listed above but because it does not match other type-strain assemblies for the same species, or cannot be vetted for some other reason, it is not used to make taxid changes even though it is used to generate ANI data.

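Notes on [7] and [8]:
Any type-strain assembly that is untrustworthy as type will have "na" in the assembly-type-category column.
In other words, some assemblies derived from type strains have a reason in [8] for being left out of RefSeq. When that reason also makes the assembly untrustworthy, [7] is marked "na" as well.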

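Columns 9-14: match against the declared-type-assembly
9. The type-strain assembly of the declared species that best matches this assembly, or "no-type" if the species has no type-strain assembly. If this assembly itself comes from a type strain, the best-matching other type-strain assembly, or "same" if it is the only type-strain assembly
10. Organism name of the assembly in [9]
11. Type category of the assembly in [9], labeled the same way as [7]; "no-type" if the species has no type-strain assembly, or "na" if this assembly is the only type-strain assembly
12. ANI between this assembly and the type-strain assembly of the species; "na" if the species has no type-strain assembly, or if [13] or [14] is below 10%
13. Percentage of this assembly covered by the type-strain assembly in [9]
14. Percentage of the type-strain assembly in [9] covered by this assembly
Column names: declared-type-assembly, declared-type-organism-name, declared-type-category, declared-type-ANI, declared-type-qcoverage, declared-type-scoverage

Columns 15-24: match against the best-match-type-assembly
15. The best-matching type-strain assembly according to ANI; "none-found" if no type-strain assembly matches this assembly
16. Taxid of the species of the assembly in [15]
17. Species name of the assembly in [15]
18. Type category of the assembly in [15], labeled the same way as [7]
19. ANI between this assembly and the assembly in [15]
20. Percentage of this assembly covered by the assembly in [15]
21. Percentage of the assembly in [15] covered by this assembly
22. Status of the best match between the assembly in [15] and this assembly
23. Comment qualifying the status in [22]
24. Taxonomy check status, derived from [22] and [23], with three levels: "OK", "Inconclusive", and "Failed"
Column names: best-match-type-assembly, best-match-species-taxid, best-match-species-name, best-match-type-category, best-match-type-ANI, best-match-type-qcoverage, best-match-type-scoverage, best-match-status, comment, taxonomy-check-status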
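To make the 25 columns easier to work with, here is a minimal parsing sketch. It assumes the report is tab-separated and that any header or comment line starts with `#`; neither is confirmed here, so check the real file first. `read_ani_report` is a hypothetical helper, and the sample row is invented purely for illustration.

```python
import csv
import io

# Column names in the order described above (0-based index = list position).
COLUMNS = [
    "genbank-accession", "refseq-accession", "taxid", "species-taxid",
    "organism-name", "species-name", "assembly-name",
    "assembly-type-category", "excluded-from-refseq",
    "declared-type-assembly", "declared-type-organism-name",
    "declared-type-category", "declared-type-ANI",
    "declared-type-qcoverage", "declared-type-scoverage",
    "best-match-type-assembly", "best-match-species-taxid",
    "best-match-species-name", "best-match-type-category",
    "best-match-type-ANI", "best-match-type-qcoverage",
    "best-match-type-scoverage", "best-match-status", "comment",
    "taxonomy-check-status",
]


def read_ani_report(handle):
    """Yield one dict per data row, keyed by the column names above.

    Assumptions: fields are tab-separated, and header or comment
    lines start with '#'.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        yield dict(zip(COLUMNS, fields))


# Invented row for illustration only; accessions and values are not real.
sample = "# header line\n" + "\t".join([
    "GCA_000000000.1", "na", "1", "1", "Genus sp.", "Genus sp.", "asm1",
    "na", "na", "no-type", "na", "no-type", "na", "na", "na",
    "GCA_111111111.1", "2", "Genus species", "type", "96.5", "85.0",
    "90.0", "genus-match", "na", "OK",
]) + "\n"

rows = list(read_ani_report(io.StringIO(sample)))
print(rows[0]["best-match-status"], rows[0]["taxonomy-check-status"])
```

For the real file, the same function could be given an open file handle, for example `open("ANI_report_prokaryotes.txt", newline="")`.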

Notes on [22]:
Values that indicate the species declared for the query assembly is OK:

  • species-match: the query assembly matches a type-strain assembly for the declared species.
  • subspecies-match: the query assembly matches a type-strain assembly for the declared species and both are the same subspecies.
  • synonym-match: the query assembly matches a type-strain assembly for a synonym of the declared species. A specialized synonymy list is used to handle difficult cases of typing.
  • derived-species-match: the query assembly matches a type-strain assembly for a subspecies of the declared species.
  • genus-match: the query assembly has an informal species name (usually "sp." format), and the best-matching type-strain assembly shares the same genus.
  • approved-mismatch: the query assembly best matches a type-strain assembly from a different species above ANI threshold, but the mismatch was manually reviewed and the declared species was accepted.

Values that indicate the species declared for the query assembly is incorrect:

  • mismatch: the query assembly best matches a type-strain assembly from a different species, above ANI threshold, even though a type-strain assembly for the declared species is available. GenBank will address the mismatch when high coverage values provide high confidence in the mismatch result, i.e. query coverage and subject coverage are both over 80%.

Values that indicate the ANI data are inconclusive:

  • below-threshold-match: the query assembly matches a type-strain assembly for the declared species but the ANI is below the species ANI threshold.
  • below-threshold-mismatch: the query assembly best matches a type-strain assembly from a different species but the ANI is below the species ANI threshold.
  • low-coverage: the query assembly did not match the best-matching type-strain assembly above 10% query-coverage and/or 10% subject-coverage.
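The two coverage figures quoted above (the 10% floor for low-coverage and the 80% level at which GenBank acts on a mismatch) can be written as simple predicates. This is only a sketch: both function names are hypothetical, and how values exactly equal to 10 or 80 are treated is my assumption, since the README text does not say.

```python
def is_low_coverage(qcoverage, scoverage, minimum=10.0):
    """True when query or subject coverage is not above the 10% floor.

    Assumption: a value exactly equal to the floor counts as low.
    """
    return qcoverage <= minimum or scoverage <= minimum


def is_high_confidence_mismatch(status, qcoverage, scoverage, cutoff=80.0):
    """True for a 'mismatch' whose coverages are both over 80%."""
    return status == "mismatch" and qcoverage > cutoff and scoverage > cutoff


print(is_low_coverage(8.5, 40.0))
print(is_high_confidence_mismatch("mismatch", 92.0, 88.0))
print(is_high_confidence_mismatch("mismatch", 92.0, 60.0))
```

Coverage values read from the report are strings, so they would need converting with `float()` first, and "na" entries would need to be skipped.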

Notes on [23]:

  • Assembly is the type-strain, no match is expected: the assembly is the only type-strain assembly for the species, hence it is expected that it may not match any other type-strain assembly.
  • Assembly is the type-strain, mismatch is within genus and expected: the assembly is the only type-strain assembly for the species, hence it is expected that its best match may be to a type-strain assembly from another species on the same genus but with ANI below 98%.
  • Assembly is type-strain, failed to match other type-strains on its species: a type-strain assembly is expected to match all other type-strain assemblies on the species.

Notes on [24]:

  • OK: the ANI result is consistent with the declared species. The best-match-status is species-match, subspecies-match, derived-species-match, synonym-match, genus-match, approved-mismatch, or the comment indicates either that the assembly is the type-strain and no match is expected, or that the assembly is the type-strain, the mismatch is within genus and is expected.
  • Inconclusive: the ANI result is inconclusive. The best-match-status is low-coverage, below-threshold-match, below-threshold-mismatch, na, or the comment indicates that the assembly is a type-strain that failed to match other type-strains on its species.
  • Failed: the ANI result is inconsistent with the declared species. The best-match-status is mismatch and the comment is na.
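The three-level rule above can be sketched as a lookup on best-match-status and comment. Two things here are assumptions rather than anything the README states: that the comment column contains exactly the strings listed in the notes on [23], and the order in which the checks are applied when a row could fit more than one rule. `taxonomy_check_status` is a hypothetical helper, not NCBI code.

```python
OK_STATUSES = {
    "species-match", "subspecies-match", "derived-species-match",
    "synonym-match", "genus-match", "approved-mismatch",
}
INCONCLUSIVE_STATUSES = {
    "low-coverage", "below-threshold-match", "below-threshold-mismatch", "na",
}
# Assumed literal comment strings, copied from the notes on [23].
OK_COMMENTS = {
    "Assembly is the type-strain, no match is expected",
    "Assembly is the type-strain, mismatch is within genus and expected",
}
FAILED_TYPE_COMMENT = (
    "Assembly is type-strain, failed to match other type-strains on its species"
)


def taxonomy_check_status(best_match_status, comment):
    """Derive OK / Inconclusive / Failed from columns [22] and [23].

    Assumption: the OK rules are checked first, then Inconclusive,
    then Failed.
    """
    if best_match_status in OK_STATUSES or comment in OK_COMMENTS:
        return "OK"
    if (best_match_status in INCONCLUSIVE_STATUSES
            or comment == FAILED_TYPE_COMMENT):
        return "Inconclusive"
    if best_match_status == "mismatch" and comment == "na":
        return "Failed"
    raise ValueError(
        f"unrecognised combination: {best_match_status!r}, {comment!r}"
    )


print(taxonomy_check_status("species-match", "na"))
print(taxonomy_check_status("low-coverage", "na"))
print(taxonomy_check_status("mismatch", "na"))
```

Comparing the derived value with the taxonomy-check-status column of the real file would be a quick way to test whether these assumptions hold.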

References

1. 星空Idealist, "基因组相似性计算:ANI" (Genome similarity calculation: ANI).
