Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
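Problem addressed: the noisy labels and long-tail relation distribution of distant supervision (DS); building on a pre-trained model improves the stability of relation extraction.
Pipeline: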
Corpus: NYT10. The years 2005–2006 are reserved for training and 2007 for testing; the version of the dataset pre-processed by Lin et al. (2016) is used.
Training data: 522,611 sentences, 281,270 entity pairs and 18,252 relational facts.
Test data: 172,448 sentences, 96,678 entity pairs and 1,950 relational facts.
There are 53 relation types, including NA.
Pre-trained model: GPT
Adam(β1 = 0.9, β2 = 0.999)
batch_size=8
lr=6.25e-5
warm up
3 epochs
dropout: residual and attention dropout at a rate of 0.1, classifier dropout at a rate of 0.2
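The settings above can be collected into a small config, with a sketch of what the warm-up might look like. The notes only say "warm up", so the number of warm-up steps (`warmup_steps`) and the linear decay after the peak are assumptions for illustration, not values taken from the notes. The names `FineTuneConfig` and `learning_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values copied from the notes above.
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    batch_size: int = 8
    peak_lr: float = 6.25e-5
    epochs: int = 3
    residual_dropout: float = 0.1
    attention_dropout: float = 0.1
    classifier_dropout: float = 0.2
    # ASSUMPTION: the notes do not give the warm-up length; this is a placeholder.
    warmup_steps: int = 100


def learning_rate(step: int, total_steps: int, cfg: FineTuneConfig) -> float:
    """Linear warm-up to peak_lr, then linear decay to zero.

    ASSUMPTION: the decay shape is illustrative; the notes only mention warm-up.
    """
    if step < cfg.warmup_steps:
        # Ramp up linearly; the last warm-up step reaches the peak.
        return cfg.peak_lr * (step + 1) / cfg.warmup_steps
    remaining = max(0, total_steps - step)
    return cfg.peak_lr * remaining / max(1, total_steps - cfg.warmup_steps)
```

With a real optimizer, a function like this would typically be wrapped in a per-step scheduler rather than called by hand.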
loss function ?
Input sequence: relation arguments (head entity, sep, tail entity), then the sentence, then clf
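A minimal sketch of how such an input could be assembled, assuming whitespace tokens stand in for the model's real subword tokenization. The token strings `<sep>` and `<clf>`, the function name `build_input`, and the second delimiter between the tail entity and the sentence are assumptions for illustration. The example sentence is made up.

```python
from typing import List

# ASSUMPTION: placeholder strings for the delimiter and classification tokens.
SEP_TOKEN = "<sep>"
CLF_TOKEN = "<clf>"


def build_input(head: List[str], tail: List[str], sentence: List[str]) -> List[str]:
    """Put the relation arguments before the sentence and end with the clf token."""
    # ASSUMPTION: a second delimiter separates the tail entity from the sentence.
    return head + [SEP_TOKEN] + tail + [SEP_TOKEN] + sentence + [CLF_TOKEN]


# Made-up example: head entity, tail entity, and a sentence mentioning both.
tokens = build_input(
    ["Barack", "Obama"],
    ["Hawaii"],
    "Barack Obama was born in Hawaii .".split(),
)
print(tokens)
```

The representation at the final clf position is what a classifier head would read to predict the relation.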