CCKS 2021: Link Prediction on a Multi-Level Phenotype-Drug-Molecule Knowledge Graph
1. Dataset overview (test set not included)
- schema.json: lists all entity types (6) and all relation types (7).
  { "entity_type": ["disease","drug","symptom","gene/protein", "gene_ontology","pathway"], "relationships": [ ["disease","associated_with","symptom"], ["disease","disease_mapped_to_gene","gene/protein"], ... ] }
- entities.json: contains every entity, with the entities of each type grouped into one list.
  { "drug": ["DB08694", "DB04326", ...], "disease":["C0877661","C0155010",...], ... }
- relationships.json: contains all known triples.
  {"relationships": [ ["DB00855","treats","C0020649"], ["DB00855","treats","C0037274"], ... ] }
- link_prediction.json: the triples to be completed; each one is missing either its head entity or its tail entity, marked with "?".
  { "relationships": [ ["DB07774","treats","?"], ["DB01041","treats","?"], ... ] }
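As a rough illustration of how these four files fit together, the sketch below narrows the candidate set for each query using the schema. It is not part of the official task kit. The helper names (`build_type_index`, `candidate_types`, `candidates`) are hypothetical, the `["drug", "treats", "disease"]` schema row and the head-missing query are assumptions made for the example, and it further assumes that each entity ID belongs to exactly one type. Inline JSON strings stand in for the real files.

```python
import json

# Toy stand-ins for the real files. The IDs are taken from the excerpts above;
# the ["drug", "treats", "disease"] schema row and the second query are assumptions.
SCHEMA_JSON = '{"entity_type": ["disease", "drug"], "relationships": [["drug", "treats", "disease"]]}'
ENTITIES_JSON = '{"drug": ["DB08694", "DB07774"], "disease": ["C0877661", "C0155010"]}'
QUERIES_JSON = '{"relationships": [["DB07774", "treats", "?"], ["?", "treats", "C0155010"]]}'


def build_type_index(entities):
    """Map every entity ID to its type (assumes one type per ID)."""
    return {eid: etype for etype, ids in entities.items() for eid in ids}


def candidate_types(query, schema, type_of):
    """Return the entity types the schema allows for the missing slot."""
    head, relation, tail = query
    types = set()
    for h_type, rel, t_type in schema["relationships"]:
        if rel != relation:
            continue
        if tail == "?" and type_of.get(head) == h_type:
            types.add(t_type)   # tail is missing: collect allowed tail types
        elif head == "?" and type_of.get(tail) == t_type:
            types.add(h_type)   # head is missing: collect allowed head types
    return types


def candidates(query, schema, entities, type_of):
    """List every entity whose type is valid for the missing slot."""
    out = []
    for etype in sorted(candidate_types(query, schema, type_of)):
        out.extend(entities[etype])
    return out


schema = json.loads(SCHEMA_JSON)
entities = json.loads(ENTITIES_JSON)
queries = json.loads(QUERIES_JSON)["relationships"]
type_of = build_type_index(entities)

for q in queries:
    print(q, "->", candidates(q, schema, entities, type_of))
```

With the real data, the same functions could be fed the parsed contents of schema.json, entities.json and link_prediction.json. A model would then only need to score the filtered candidates rather than every entity.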
2. Evaluation metric
- MRR (mean reciprocal rank): for a triple to be completed, if the correct answer is ranked n-th, that triple scores 1/n.
$$MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i}$$
where Q denotes the set of queries in link_prediction.json and |Q| is the total number of triples to be completed.
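The metric can be sketched in a few lines of Python. This is an illustrative implementation, not the organizers' scoring script. The function names are hypothetical, and it assumes that a query whose correct answer does not appear in the submitted list contributes 0, which the description above does not state explicitly.

```python
def reciprocal_rank(ranked, answer):
    """Return 1/n if `answer` is at 1-based position n in `ranked`, else 0.0."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == answer:
            return 1.0 / position
    return 0.0  # assumption: an answer missing from the list scores zero


def mean_reciprocal_rank(predictions, answers):
    """Average the reciprocal rank over all queries (the set Q)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    total = sum(reciprocal_rank(p, a) for p, a in zip(predictions, answers))
    return total / len(answers)


# Three toy queries: answer ranked 1st, ranked 3rd, and absent from the list.
predictions = [["a", "b"], ["x", "y", "z"], ["p"]]
answers = ["a", "z", "q"]
mrr = mean_reciprocal_rank(predictions, answers)
print(round(mrr, 4))  # (1 + 1/3 + 0) / 3
```

In the toy example the three queries score 1, 1/3 and 0, so the mean is 4/9.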
3. Submission format
- For each incomplete triple, submit the top 10 predicted entities for the missing head or tail, ordered from most to least likely.
  { "results": [ [A1, B1, C1, D1, ...], [A2, B2, C2, D2, ...], ... ] }
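A minimal sketch of producing a file in this shape is shown below. The function name `build_submission` and the scores are invented for illustration. It assumes that the i-th list in "results" corresponds to the i-th query in link_prediction.json, and that a model yields one score per candidate entity, with higher meaning more likely.

```python
import json


def build_submission(scored_candidates, k=10):
    """Turn per-query {entity: score} dicts into ranked top-k lists."""
    results = []
    for scores in scored_candidates:
        # Sort entity IDs by descending score and keep at most k of them.
        ranked = sorted(scores, key=scores.get, reverse=True)
        results.append(ranked[:k])
    return {"results": results}


# Illustrative scores only; one dict per query, in query order (assumption).
scored_candidates = [
    {"C0020649": 0.91, "C0037274": 0.40, "C0877661": 0.75},
    {"C0155010": 0.10, "C0020649": 0.60},
]

submission = build_submission(scored_candidates)
submission_text = json.dumps(submission, ensure_ascii=False)
print(submission_text)
```

Writing `submission_text` to a file would give the JSON structure described above. Queries with fewer than 10 candidates simply produce shorter lists in this sketch.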
4. Model
To be updated.