Each document contains:
{
"vertexSet": [],
"labels": [],
"title": "",
"sents": [[]...[]]
}
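Each mention in vertexSet contains:
"name": the surface text of the mention
"pos": the token span of the mention within its sentence, as [start, end) with the end excluded; e.g. [0, 3] covers tokens 0, 1, 2
"type": the named-entity type of the mention,
one of person, location, organization, time, number, miscellaneous (other entities),
abbreviated as PER, LOC, ORG, TIME, NUM, MISC
"sent_id": the index of the sentence containing the mention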
"vertexSet": [
[
{
"pos": [
0,
3
],
"type": "PER",
"sent_id": 0,
"name": "Miguel Riofrio Sánchez"
},
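The half-open "pos" convention can be checked with a short sketch. The helper name `mention_tokens` is hypothetical, and whitespace tokenization of the example sentence from these notes is an assumption, not necessarily how the dataset was tokenized.

```python
def mention_tokens(sents, mention):
    """Return the tokens covered by a mention.

    `pos` is treated as [start, end) within sents[sent_id].
    """
    start, end = mention["pos"]
    return sents[mention["sent_id"]][start:end]


# Assumption: the sentence is split on whitespace to obtain tokens.
sents = [
    "Lark Force was an Australian Army formation established in March 1941 "
    "during World War II for service in New Britain and New Ireland".split()
]
mention = {"name": "Australian Army", "pos": [4, 6], "sent_id": 0, "type": "ORG"}

print(" ".join(mention_tokens(sents, mention)))  # prints "Australian Army"
```

Slicing with `[start:end]` matches the [0, 3] → 0, 1, 2 reading above, so a mention's "name" should normally equal its tokens joined by spaces.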
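Each item in labels contains (*** note: entity coreference is an issue):
"r": the relation type
"h": the index of the head entity in vertexSet
"t": the index of the tail entity in vertexSet
"evidence": the indices of the sentences that support this relation instance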
"labels": [
{
"r": "P607",
"h": 1,
"t": 3,
"evidence": [
0
]
},
Example:
sent_id = 0:
Lark Force was an Australian Army formation established in March 1941 during World War II for service in New Britain and New Ireland
"r" = P607:
"P607": "conflict",
"h" = 1:
[
{
"name": "Australian Army",
"pos": [
4,
6
],
"sent_id": 0,
"type": "ORG"
}
]
"t" = 3:
[
{
"name": "World War II",
"pos": [
12,
15
],
"sent_id": 0,
"type": "MISC"
}
]
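The lookup walked through in this example can be sketched in a few lines. The function name `describe_label` and the `rel_names` mapping are hypothetical. Entities 0 and 2 are not shown in these notes, so they are left as empty placeholders here.

```python
def describe_label(label, vertex_set, rel_names):
    """Turn one `labels` item into (head name, relation name, tail name)."""
    # Each entity is a list of mentions; the first mention's name is used
    # as a representative, which is a simplification for illustration.
    head = vertex_set[label["h"]][0]["name"]
    tail = vertex_set[label["t"]][0]["name"]
    relation = rel_names.get(label["r"], label["r"])
    return head, relation, tail


vertex_set = [
    [],  # entity 0: not shown in these notes (placeholder)
    [{"name": "Australian Army", "pos": [4, 6], "sent_id": 0, "type": "ORG"}],
    [],  # entity 2: not shown in these notes (placeholder)
    [{"name": "World War II", "pos": [12, 15], "sent_id": 0, "type": "MISC"}],
]
label = {"r": "P607", "h": 1, "t": 3, "evidence": [0]}
rel_names = {"P607": "conflict"}

print(describe_label(label, vertex_set, rel_names))
# prints ('Australian Army', 'conflict', 'World War II')
```

Because "h" and "t" point at entities rather than at single mentions, an entity with several coreferent mentions yields the same triple whichever mention appears in the evidence sentence.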
```
Data Format:
{
'title',
'sents': [
[words in sent 0],
[words in sent 1]
],
'vertexSet': [
[
{ 'name': mention_name,
'sent_id': index of the sentence containing the mention,
'pos': position of the mention in that sentence,
'type': NER_type},
{another mention}
],
[another entity]
],
'labels': [
{
'h': idx of head entity in vertexSet,
't': idx of tail entity in vertexSet,
'r': relation,
'evidence': ids of the evidence sentences
}
]
}
```
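As a rough sanity check on the format above, a document dict can be validated against its own indices. This is only a sketch: `check_document` is a hypothetical helper, and the toy document below reuses the example sentence with its two entities renumbered to 0 and 1.

```python
def check_document(doc):
    """Return a list of human-readable problems found in one document dict."""
    problems = []
    sents = doc["sents"]
    for e_idx, entity in enumerate(doc["vertexSet"]):
        for m in entity:
            start, end = m["pos"]
            if not 0 <= m["sent_id"] < len(sents):
                problems.append(f"entity {e_idx}: sent_id {m['sent_id']} out of range")
            elif not 0 <= start < end <= len(sents[m["sent_id"]]):
                problems.append(f"entity {e_idx}: pos {m['pos']} out of range")
    n_entities = len(doc["vertexSet"])
    # Assumption: 'labels' may be missing, in which case it is treated as empty.
    for l_idx, label in enumerate(doc.get("labels", [])):
        if not (0 <= label["h"] < n_entities and 0 <= label["t"] < n_entities):
            problems.append(f"label {l_idx}: h/t out of range")
        if any(not 0 <= s < len(sents) for s in label["evidence"]):
            problems.append(f"label {l_idx}: evidence sentence out of range")
    return problems


# Toy document: whitespace tokenization and the title are assumptions.
doc = {
    "title": "toy example",
    "sents": [
        "Lark Force was an Australian Army formation established in March 1941 "
        "during World War II for service in New Britain and New Ireland".split()
    ],
    "vertexSet": [
        [{"name": "Australian Army", "pos": [4, 6], "sent_id": 0, "type": "ORG"}],
        [{"name": "World War II", "pos": [12, 15], "sent_id": 0, "type": "MISC"}],
    ],
    "labels": [{"r": "P607", "h": 0, "t": 1, "evidence": [0]}],
}

print(check_document(doc))  # prints []
```

An empty list means every "pos", "sent_id", "h", "t", and "evidence" value points inside the document.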