Common Datasets for Knowledge Representation Learning

Dataset   #Relations   #Entities    #Train         #Valid    #Test
WN11      11           38,696       112,581        2,609     10,544
WN18      18           40,943       141,442        5,000     5,000
FB13      13           75,043       316,232        5,908     23,733
FB15K     1,345        14,951       483,142        50,000    59,071
FB1M      23,382       1×10^6       17.5×10^6      50,000    177,404
FB5M      1,192        5,385,322    19,193,556     5,000     59,071
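The counts in the table can be recomputed from the split files. Below is a minimal Python sketch. It assumes each file line is a tab-separated (head, relation, tail) triple, though column order differs between releases, so check yours. The names parse_triples and dataset_stats are illustrative and not part of any dataset release. Entities and relations are counted over the union of the given splits.

```python
from typing import Dict, Iterable, List, Tuple

# A (head, relation, tail) triple of string identifiers.
Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse tab-separated lines into triples.

    Assumption: each line holds head, relation, tail in that order.
    """
    triples: List[Triple] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples


def dataset_stats(splits: Dict[str, List[Triple]]) -> Dict[str, int]:
    """Count distinct relations and entities over all splits, plus triples per split."""
    entities = set()
    relations = set()
    for triples in splits.values():
        for head, relation, tail in triples:
            entities.update((head, tail))
            relations.add(relation)
    stats = {"relations": len(relations), "entities": len(entities)}
    for name, triples in splits.items():
        stats[name] = len(triples)
    return stats


train = parse_triples(["_auto_1\t_has_instance\t_s_u_v_1\n"])
print(dataset_stats({"train": train, "valid": [], "test": []}))
# {'relations': 1, 'entities': 2, 'train': 1, 'valid': 0, 'test': 0}
```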

WN11

(Figure: the 11 relation types in WN11)

  • Source (SE, Structured Embeddings)

Bordes A, Weston J, Collobert R, et al. Learning Structured Embeddings of Knowledge Bases[C]//AAAI. 2011, 6(1): 6.

  • Characteristics
  • As WordNet is composed of words with different meanings, here we term its entities as the concatenation of the word and a number indicating which sense it refers to, e.g. auto_1 is the entity encoding the first meaning of the word “auto”.
  • Example: (_auto_1, _has_instance, _s_u_v_1)
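The WN11 naming scheme described above (word plus sense number) can be undone by splitting on the last underscore. This is a minimal sketch, assuming names always look like _<word>_<sense> as in the example triple. The helper parse_wn11_entity is hypothetical. Because words may themselves contain underscores (e.g. _s_u_v_1), only the final underscore is treated as the separator.

```python
from typing import Tuple


def parse_wn11_entity(name: str) -> Tuple[str, int]:
    """Split a WN11-style entity such as '_auto_1' into ('auto', 1).

    Assumption: the format is '_<word>_<sense>'; the word may contain
    underscores, so only the last underscore separates the sense number.
    """
    word, sense = name.lstrip("_").rsplit("_", 1)
    return word, int(sense)


print(parse_wn11_entity("_s_u_v_1"))  # ('s_u_v', 1)
```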

WN18

(Figure: the 18 relation types in WN18)

  • Source (SME, Semantic Matching Energy)

Bordes A, Glorot X, Weston J, et al. A semantic matching energy function for learning with multi-relational data[J]. Machine Learning, 2014, 94(2): 233-259.

  • Characteristics
  • Entities (termed synsets) correspond to senses, and relation types define lexical relations between those senses.
  • As WordNet is composed of words with different meanings, we describe its entities by the concatenation of the word, its part-of-speech tag (‘NN’ for noun, ‘VB’ for verb, ‘JJ’ for adjective and ‘RB’ for adverb) and a digit indicating which sense it refers to, e.g. _score_NN_1 is the entity encoding the first meaning of the noun “score”. This version of WordNet is different from that used in Bordes et al. (2011) because the original data has been preprocessed differently: this version contains fewer entities but more relation types.
  • Example: (_score_NN_1, _hypernym, _evaluation_NN_1)
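A WN18 name as described above can likewise be split into word, part-of-speech tag, and sense number. This is a minimal sketch, assuming names follow the _<word>_<TAG>_<sense> pattern shown in the example. The names parse_wn18_entity and Synset are hypothetical. Some WN18 releases may store numeric IDs and keep the readable names in a separate mapping file.

```python
from typing import NamedTuple

# The four POS tags listed in the description of WN18 above.
POS_TAGS = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


class Synset(NamedTuple):
    word: str
    pos: str
    sense: int


def parse_wn18_entity(name: str) -> Synset:
    """Split '_score_NN_1' into Synset(word='score', pos='noun', sense=1)."""
    # Split from the right twice so multi-word entries like '_hot_dog_NN_1' keep their word intact.
    word, tag, sense = name.lstrip("_").rsplit("_", 2)
    if tag not in POS_TAGS:
        raise ValueError(f"unknown POS tag {tag!r} in {name!r}")
    return Synset(word, POS_TAGS[tag], int(sense))


print(parse_wn18_entity("_score_NN_1"))  # Synset(word='score', pos='noun', sense=1)
```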

FB13

(Figure: the 13 relation types in FB13)

  • Source (SE, Structured Embeddings)

Bordes A, Weston J, Collobert R, et al. Learning Structured Embeddings of Knowledge Bases[C]//AAAI. 2011, 6(1): 6.

  • Characteristics
  • Example: (_marylin_monroe, _profession, _actress)

FB15K


  • Source (TransE)

Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[C]//Advances in neural information processing systems. 2013: 2787-2795.

  • Characteristics
  • To make a small data set to experiment on, we selected the subset of entities that are also present in the Wikilinks database and that also have at least 100 mentions in Freebase (for both entities and relationships). We also removed relationships like '!/people/person/nationality', which just reverses the head and tail compared to the relationship '/people/person/nationality'.
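The FB15K selection rules quoted above can be sketched as a filter over (head, relation, tail) triples. This is a hypothetical illustration, not the TransE authors' pipeline. The function name filter_fb15k_style, the wikilinks_entities set, treating a leading "!" as the mark of a reversed relation, and counting "mentions" as occurrences within the given triples are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def filter_fb15k_style(
    triples: Iterable[Triple],
    wikilinks_entities: Set[str],
    min_mentions: int = 100,
) -> List[Triple]:
    """Hypothetical single-pass version of the FB15K selection rules."""
    # Drop reversed copies such as '!/people/person/nationality' (assumed '!' prefix).
    kept = [(h, r, t) for h, r, t in triples if not r.startswith("!")]

    # Assumption: a "mention" is one appearance in the remaining triples.
    entity_counts: Counter = Counter()
    relation_counts: Counter = Counter()
    for h, r, t in kept:
        entity_counts[h] += 1
        entity_counts[t] += 1
        relation_counts[r] += 1

    def keep_entity(entity: str) -> bool:
        # Entity must appear in Wikilinks and be mentioned often enough.
        return entity in wikilinks_entities and entity_counts[entity] >= min_mentions

    return [
        (h, r, t)
        for h, r, t in kept
        if keep_entity(h) and keep_entity(t) and relation_counts[r] >= min_mentions
    ]
```

The thresholds are applied in a single pass; since removing triples lowers the remaining counts, a fuller pipeline might repeat the filter until nothing changes.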

FB1M

  • Source (TransE)

Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[C]//Advances in neural information processing systems. 2013: 2787-2795.

  • Characteristics
  • We also wanted to have large-scale data in order to test TransE at scale. Hence, we created another data set from Freebase by selecting the 1 million most frequently occurring entities.
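Selecting the most frequent entities, as described for FB1M above, amounts to a top-k count. This is a hypothetical sketch. Counting frequency as appearances in triples, and keeping only triples whose head and tail are both selected, are assumptions rather than the documented procedure. The name top_k_entity_subset is illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def top_k_entity_subset(triples: Iterable[Triple], k: int = 1_000_000) -> List[Triple]:
    """Keep triples whose head and tail are among the k most frequent entities.

    Assumption: an entity's frequency is its number of appearances as head or tail.
    """
    triples = list(triples)  # materialize so the data can be scanned twice
    counts: Counter = Counter()
    for head, _, tail in triples:
        counts[head] += 1
        counts[tail] += 1
    # most_common(k) sorts by count; ties keep first-encountered order.
    selected = {entity for entity, _ in counts.most_common(k)}
    return [(h, r, t) for h, r, t in triples if h in selected and t in selected]
```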
