Common Datasets for Knowledge Representation Learning

Una_zh

Article link: https://blog.csdn.net/jingOlivia/article/details/85142789

Dataset	#Relations	#Entities	#Train triples	#Valid triples	#Test triples
WN11	11	38696	112581	2609	10544
WN18	18	40943	141442	5000	5000
FB13	13	75043	316232	5908	23733
FB15K	1345	14951	483142	50000	59071
FB1M	23382	1*10^6	17.5*10^6	50000	177404
FB5M	1192	5385322	19193556	5000	59071
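
The columns of this table can be recomputed from the triples themselves. The following is a minimal sketch, assuming each split is already loaded as a list of (head, relation, tail) tuples; the function name `dataset_stats` and the toy triples are hypothetical and not part of any released dataset.

```python
def dataset_stats(train, valid, test):
    """Count distinct relations, distinct entities, and triples per split."""
    entities, relations = set(), set()
    for split in (train, valid, test):
        for head, relation, tail in split:
            entities.update((head, tail))
            relations.add(relation)
    return {
        "relations": len(relations),
        "entities": len(entities),
        "triples": (len(train), len(valid), len(test)),
    }


# Toy data for illustration only (not read from a real dataset file).
train = [
    ("_auto_1", "_has_instance", "_s_u_v_1"),
    ("_score_NN_1", "_hypernym", "_evaluation_NN_1"),
]
valid = [("_marylin_monroe", "_profession", "_actress")]
test = []

stats = dataset_stats(train, valid, test)
print(stats)
```

Entities are counted over all three splits here; whether a given paper counts them over the training split only is not stated in the table.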

WN11

The 11 relations in WN11

Source (SE)

Bordes A, Weston J, Collobert R, et al. Learning Structured Embeddings of Knowledge Bases[C]//AAAI. 2011, 6(1): 6.

Characteristics

As WordNet is composed of words with different meanings, here we term its entities as the concatenation of the word and a number indicating which sense it refers to, i.e. auto_1 is the entity encoding the first meaning of the word “auto”.
Example: (_auto_1, _has_instance, _s_u_v_1)
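
The naming convention above can be unpacked with a small helper. This is only a sketch, assuming entity strings look exactly like the example (a leading underscore, the word, then the sense number); `parse_wn11_entity` is a hypothetical name, not a function shipped with the dataset.

```python
def parse_wn11_entity(name):
    """Split a WN11-style entity such as '_auto_1' into (word, sense_number)."""
    # Drop the leading underscore, then split off the trailing sense number.
    # rsplit keeps underscores that belong to the word itself (e.g. 's_u_v').
    word, sense = name.lstrip("_").rsplit("_", 1)
    return word, int(sense)


head = parse_wn11_entity("_auto_1")
tail = parse_wn11_entity("_s_u_v_1")
print(head, tail)
```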

WN18

Dataset
The 18 relations in WN18

Source (SME)

Bordes A, Glorot X, Weston J, et al. A semantic matching energy function for learning with multi-relational data[J]. Machine Learning, 2014, 94(2): 233-259.

Characteristics

Entities (termed synsets) correspond to senses, and relation types define lexical relations between those senses.
As WordNet is composed of words with different meanings, we describe its entities by the concatenation of the word, its part-of-speech tag (‘NN’ for noun, ‘VB’ for verb, ‘JJ’ for adjective and ‘RB’ for adverb) and a digit indicating which sense it refers to, i.e. _score_NN_1 is the entity encoding the first meaning of the noun “score”. This version of WordNet is different from that used in Bordes et al. (2011) because the original data has been preprocessed differently: this version contains fewer entities but more relation types.
Example: (_score_NN_1, _hypernym, _evaluation_NN_1)
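
The word, part-of-speech tag, and sense digit can be recovered from such a name as follows. This is a sketch that assumes entity strings follow the format in the quoted description; `parse_wn18_entity` and `POS_TAGS` are hypothetical names, and distributed copies of the data may encode entities differently.

```python
# Tag meanings as listed in the quoted description.
POS_TAGS = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def parse_wn18_entity(name):
    """Split '_score_NN_1' into (word, part_of_speech, sense_number)."""
    # The last two underscore-separated fields are the tag and the sense digit;
    # everything before them is the word (which may itself contain underscores).
    word, tag, sense = name.lstrip("_").rsplit("_", 2)
    if tag not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {tag!r}")
    return word, POS_TAGS[tag], int(sense)


parsed = [parse_wn18_entity(e) for e in ("_score_NN_1", "_evaluation_NN_1")]
print(parsed)
```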

FB13

The 13 relations in FB13

Source (SE)

Bordes A, Weston J, Collobert R, et al. Learning Structured Embeddings of Knowledge Bases[C]//AAAI. 2011, 6(1): 6.

Characteristics

Example: (_marylin_monroe, _profession, _actress)

FB15K

Dataset

Source (TransE)

Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[C]//Advances in neural information processing systems. 2013: 2787-2795.

Characteristics

To make a small data set to experiment on, we selected the subset of entities that are also present in the Wikilinks database and that also have at least 100 mentions in Freebase (for both entities and relationships). We also removed relationships like '!/people/person/nationality', which just reverses the head and tail compared to the relationship '/people/person/nationality'.
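
The removal of reversed relationships described above can be sketched as a simple filter. This assumes that reversed relationships are exactly those whose name starts with '!', as in the quoted example; the function name and the entity identifiers `/m/a` and `/m/b` are hypothetical placeholders.

```python
def drop_reversed_relations(triples):
    """Keep only triples whose relation name does not start with '!'."""
    return [(h, r, t) for h, r, t in triples if not r.startswith("!")]


# Toy triples for illustration; the entity ids are made up.
triples = [
    ("/m/a", "/people/person/nationality", "/m/b"),
    ("/m/b", "!/people/person/nationality", "/m/a"),
]
kept = drop_reversed_relations(triples)
print(kept)
```

The Wikilinks check and the 100-mention threshold are separate filtering steps and are not shown here.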

FB1M

Source (TransE)

Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[C]//Advances in neural information processing systems. 2013: 2787-2795.

Characteristics

We also wanted to have large-scale data in order to test TransE at scale. Hence, we created another data set from Freebase, by selecting the most frequently occurring 1 million entities.
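
Selecting the most frequent entities can be sketched with a counter. The quote does not say how frequency is measured or how triples are then filtered, so this sketch makes two labeled assumptions: an entity's frequency is the number of triples it appears in as head or tail, and a triple is kept only if both of its entities are selected. The function names are hypothetical.

```python
from collections import Counter


def most_frequent_entities(triples, k):
    """Return the k entities that occur most often as head or tail (assumption)."""
    counts = Counter()
    for head, _, tail in triples:
        counts[head] += 1
        counts[tail] += 1
    return {entity for entity, _ in counts.most_common(k)}


def restrict_to_entities(triples, keep):
    """Keep triples whose head and tail are both in `keep` (assumption)."""
    return [(h, r, t) for h, r, t in triples if h in keep and t in keep]


# Toy triples; k=2 stands in for the 1 million used for FB1M.
triples = [("a", "r", "b"), ("a", "r", "c"), ("b", "r", "a"), ("a", "r2", "d")]
top = most_frequent_entities(triples, k=2)
subset = restrict_to_entities(triples, top)
print(top, subset)
```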