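I came across this database while looking into the relationship between diseases and the genome. Some of its score calculations are interesting, so I am recording them here for later reference and study.
The material in this post comes from the About page of DisGeNET - a database of gene-disease associations.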
DisGeNET Metrics
We have developed two scores to rank the gene-disease and the variant-disease associations according to their level of evidence. These scores range from 0 to 1, and take into account the number and type of sources (level of curation, model organisms), and the number of publications supporting the association.
GDA Score
The DisGeNET Score (S) for GDAs is computed according to the following formula (formula image not reproduced here). Vocabulary note: "otherwise" means "if not" or "in any other case".

where:
- N_sources_i is the number of CURATED sources supporting a GDA
- i ∈ CGI, CLINGEN, GENOMICS ENGLAND, CTD, PSYGENET, ORPHANET, UNIPROT
where:
- j ∈ Rat, Mouse from RGD, MGD, and CTD
where:
- k ∈ HPO, CLINVAR, GWASCAT, GWASDB
where:
- N_pubs is the number of publications supporting a GDA in the sources LHGDN and BEFREE
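# Overall, the score tops out at 1. It seems that a higher score means the association has relatively stronger support from the various databases and the literature.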
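The formula image itself did not survive in these notes, so the following is only a minimal sketch of how such a score could be assembled. It assumes an additive, piecewise form S = C + M + I + L, one term for each "where" block above. The specific weights (0.6/0.5/0.3 for curated sources, 0.2 for model-organism sources, 0.1 for the k sources, and 0.01 per publication capped at 0.1) are assumptions recalled from memory, chosen so that the maximum is 1 as noted above; they are not taken from this text and should be checked against the DisGeNET About page. The function name `gda_score` and its parameter names are hypothetical.

```python
def gda_score(n_sources_i: int, n_sources_j: int, n_sources_k: int, n_pubs: int) -> float:
    """Sketch of an additive GDA score S = C + M + I + L.

    All numeric weights below are ASSUMPTIONS (not stated in these notes);
    verify them against the DisGeNET About page before relying on them.
    """
    # C: curated sources (i in CGI, CLINGEN, GENOMICS ENGLAND, CTD, PSYGENET, ORPHANET, UNIPROT)
    if n_sources_i > 2:
        c = 0.6
    elif n_sources_i == 2:
        c = 0.5
    elif n_sources_i == 1:
        c = 0.3
    else:  # "otherwise"
        c = 0.0

    # M: model-organism sources (j in Rat, Mouse from RGD, MGD, CTD)
    m = 0.2 if n_sources_j > 0 else 0.0

    # I: sources k (HPO, CLINVAR, GWASCAT, GWASDB)
    i = 0.1 if n_sources_k > 0 else 0.0

    # L: literature support from LHGDN and BEFREE, capped once there are more than 9 publications
    l = 0.1 if n_pubs > 9 else 0.01 * max(n_pubs, 0)

    # Round to two decimals to hide floating-point noise in the sum
    return round(c + m + i + l, 2)


if __name__ == "__main__":
    print(gda_score(3, 1, 1, 10))  # every term at its assumed maximum
    print(gda_score(1, 0, 0, 5))   # one curated source plus five publications
```

Under these assumed weights, curated sources dominate the score, and literature support alone contributes at most 0.1.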
Distribution of the DisGeNET score for GDAs according to the number of sources reporting the association (figure omitted)
VDA Score
The DisGeNET Score (S) for VDAs is computed according to the following formula (formula image not reproduced here):
where:
- N_sources_i is the number of CURATED sources supporting a VDA
- i ∈ UNIPROT, CLINVAR, GWASCAT, GWASDB
where:
- N_pubs is the number of publications supporting a VDA in the source BeFree
# BeFree seems to be a literature-search approach developed by the people behind this database?
Distribution of the DisGeNET Score for VDAs according to the number of sources reporting the association (figure omitted)
Disease Specificity Index
There are genes (or variants) that are associated with multiple diseases (e.g. TNF), while others are associated with a small set of diseases or even with a single disease. The Disease Specificity Index (DSI) is a measure of this property of the genes (and variants). It reflects whether a gene (or variant) is associated with many or few diseases. It is computed according to the following formula (formula image not reproduced here):
where:
- N_d is the number of diseases associated with the gene/variant
- N_T is the total number of diseases in DisGeNET
The DSI ranges from 0.25 to 1. Example: TNF, associated with more than 1,500 diseases, has a DSI of 0.263, while HCN2 is associated with one disease, with a DSI of 1.
# This suggests that the smaller the DSI, the more diseases the gene or variant is associated with.
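The DSI formula image is also missing from these notes. As a hedged sketch, the code below assumes the form DSI = log2(N_d / N_T) / log2(1 / N_T), which is consistent with the two examples above: a gene with a single disease gets exactly 1, and the value shrinks as N_d grows. The function name `dsi` and the disease counts in the usage lines are hypothetical, illustrative values rather than DisGeNET's actual numbers.

```python
import math


def dsi(n_d: int, n_t: int) -> float:
    """Disease Specificity Index, ASSUMING DSI = log2(N_d / N_T) / log2(1 / N_T).

    n_d: number of diseases associated with the gene/variant
    n_t: total number of diseases in the database
    """
    if n_t < 2 or not (1 <= n_d <= n_t):
        raise ValueError("need n_t >= 2 and 1 <= n_d <= n_t")
    return math.log2(n_d / n_t) / math.log2(1 / n_t)


if __name__ == "__main__":
    total = 30_000                        # hypothetical N_T, for illustration only
    print(dsi(1, total))                  # single-disease gene: exactly 1.0
    print(round(dsi(2_000, total), 3))    # gene linked to many diseases: low DSI
```

With this form, a lower DSI corresponds to a gene or variant linked to more diseases, which matches the reading of the TNF and HCN2 examples.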
If the DSI is empty, it implies that