The species classification results are written to summary.tsv. TSV stands for tab-separated values; such files can also carry other extensions, such as .txt, and can all be opened in Excel.
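Because the file is plain tab-separated text, it can also be read programmatically instead of in Excel. The following is a minimal sketch in Python using the standard `csv` module. It assumes the file starts with a header row of column names; `read_summary`, `read_summary_file`, and the sample row are illustrative names and made-up data, not part of GTDB-Tk.

```python
import csv
import io


def read_summary(handle):
    """Return one dict per row, keyed by the column names in the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_summary_file(path):
    """Open a summary file on disk and parse it (path is supplied by the caller)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_summary(handle)


# Made-up two-column table standing in for a real summary file.
sample = io.StringIO(
    "user_genome\tclassification\n"
    "genome_1\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__\n"
)

for row in read_summary(sample):
    print(row["user_genome"], row["classification"])
```

Reading rows as dictionaries keeps the code independent of column order, so it only relies on the column names themselves.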
summary.tsv: Classifications provided by GTDB-Tk are in the files <prefix>.bac120.summary.tsv and <prefix>.ar53.summary.tsv for bacterial and archaeal genomes, respectively. These are tab-separated files with the following columns:
-
user_genome: Unique identifier of the query genome, taken from the genome's FASTA file.
-
classification: GTDB taxonomy string inferred by GTDB-Tk. An unassigned species (i.e., s__) indicates either that i) the query genome is placed outside a named genus, or that ii) the ANI to the closest intra-genus reference genome with an AF >= 0.65 is not within the species-specific ANI circumscription radius.
-
fastani_reference: indicates the accession number of the reference genome (species) to which a user genome was assigned based on ANI and AF. ANI values are only calculated when a query genome is placed within a defined genus, and are evaluated against all reference genomes in that genus.
-
fastani_reference_radius: indicates the species-specific ANI circumscription radius of the reference genome, used to determine whether a query genome should be classified to the same species as the reference.
-
fastani_taxonomy: indicates the GTDB taxonomy of the above reference genome.
-
fastani_ani: indicates the ANI between the query and the above reference genome.
-
fastani_af: indicates the alignment fraction (AF) between the query and the above reference genome.
-
closest_placement_reference: indicates the accession number of the reference genome when a genome is placed on a terminal branch.
-
closest_placement_taxonomy: indicates the GTDB taxonomy of the above reference genome.
-
closest_placement_ani: indicates the ANI between the query and the above reference genome.
-
closest_placement_af: indicates the alignment fraction (AF) between the query and the above reference genome.
-
pplacer_taxonomy: indicates the pplacer taxonomy of the query genome.
-
classification_method: indicates the rule used to classify the genome. This field will be one of: i) ANI, indicating a species assignment was based solely on the calculated ANI and AF with a reference genome; ii) ANI/Placement, indicating a species assignment was made based on both ANI and the placement of the genome in the reference tree; iii) taxonomic classification fully defined by topology, indicating that the classification could be determined based solely on the genome's position in the reference tree; or iv) taxonomic novelty determined using RED, indicating that the relative evolutionary divergence (RED) and placement of the genome in the reference tree were used to determine the classification.
-
note: provides additional information regarding the classification of the genome. Currently this field is only filled out when a species determination is made, and indicates whether the placement of the genome in the reference tree and the closest reference according to ANI/AF are the same (congruent) or different (incongruent).
-
other_related_references: lists up to the 100 closest reference genomes based on ANI. ANI calculations are only performed between a query genome and reference genomes in the same genus.
-
msa_percent: indicates the percentage of the MSA spanned by the genome (i.e., the percentage of columns with an amino acid).
-
red_value: indicates, when required, the relative evolutionary divergence (RED) for a query genome. RED is not calculated when a query genome can be classified based on ANI.
-
warnings: indicates unusual characteristics of the query genome that may impact the taxonomic assignment.
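The classification column described above can be split into its individual ranks, which makes it easy to spot genomes left without a species assignment (s__). The sketch below assumes the usual GTDB layout of semicolon-separated ranks, each prefixed with a one-letter code and two underscores (d__, p__, c__, o__, f__, g__, s__). `parse_classification` and `has_unassigned_species` are hypothetical helper names, and the example string is illustrative rather than real output.

```python
# Map the one-letter GTDB rank prefixes to readable rank names.
RANK_PREFIXES = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_classification(taxonomy):
    """Split a taxonomy string such as 'd__...;p__...;s__' into a rank -> name dict."""
    ranks = {}
    for part in taxonomy.split(";"):
        prefix, _, name = part.strip().partition("__")
        if prefix in RANK_PREFIXES:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


def has_unassigned_species(taxonomy):
    """True when the species rank is empty ('s__') or missing altogether."""
    return parse_classification(taxonomy).get("species", "") == ""


# Illustrative string only; real values come from the classification column.
example = (
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__"
)
print(parse_classification(example)["genus"])
print(has_unassigned_species(example))
```

Strings that do not follow the prefix layout simply produce an empty dictionary, so such rows are reported as having no species rather than raising an error.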