【Part one: Introduction】 Relation Extraction with Distant Supervision (DS)

1. Relation extraction

Extracting relational facts from sentences.

Sentence | Relation

1. Steve Jobs and Wozniak co-founded Apple in 1976. | Founder

2. Washington D.C. is the capital of the United States. | CapitalOf


2. Previous method

Training a relation extractor on a manually labeled (supervised) dataset.


3. Problems:

1) Human annotation is costly.

2) The number of relations and the amount of data are limited.

 

4. Distant supervision in RE:

Mintz et al. were the first to apply the DS method to the RE task. The DS method extracts training instances by aligning a knowledge base (KB) with text, in two steps:

1) Find the target relation and its associated entity pair in the KB.

2) Extract the sentences containing this entity pair in the text.
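
The two steps above can be sketched roughly as follows. This is only an illustration, not the implementation of Mintz et al.: the toy KB, the toy corpus, and the helper name `align_kb_with_text` are assumptions, and entities are matched by naive substring search rather than by a real entity linker.

```python
from collections import defaultdict

# Hypothetical toy KB: (head entity, tail entity) -> relation.
KB = {
    ("Steve Jobs", "Apple"): "Founder",
    ("Washington D.C.", "United States"): "CapitalOf",
}

# Hypothetical toy corpus.
CORPUS = [
    "Steve Jobs and Wozniak co-founded Apple in 1976.",
    "Washington D.C. is the capital of the United States.",
    "Steve Jobs returned to Apple in 1997.",
]

def align_kb_with_text(kb, corpus):
    """Return {(e1, e2, relation): [sentences mentioning both e1 and e2]}."""
    instances = defaultdict(list)
    for (e1, e2), relation in kb.items():      # step 1: take a KB fact
        for sentence in corpus:                # step 2: find co-mentions
            if e1 in sentence and e2 in sentence:
                instances[(e1, e2, relation)].append(sentence)
    return dict(instances)

training_instances = align_kb_with_text(KB, CORPUS)
```

In this toy run, the third sentence is also collected as a Founder instance even though it does not state that relation, because the alignment only checks that both entities are mentioned.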

 


5. Challenges for DS:

1) Finding a suitable KB for open-domain relation extraction.

2) The wrong label problem (discussed in the next section).

3) Error propagation caused by feature engineering with NLP tools.

 

6. Wrong label:

DS extracts sentences from text based on the following assumption: if two entities have a relationship in a known knowledge base, then all sentences that mention these two entities express that relationship in some way. The training data are therefore labeled automatically: for a triplet r(e1, e2) in the KB, all sentences that mention both entities e1 and e2 are regarded as training instances of relation r.

However, a sentence that mentions two entities may not express the relation that links them in the KB. The two entities may appear in the same sentence simply because they are related to the same topic.

When an entity pair does not have any relationship, its relation is defined as NA.

 

In the figure, sentences S2 and S4 both mention Nevada and Las Vegas, but they do not express the relation (/location/location/contains).
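
The labeling rule, including the NA fallback, can be sketched as below. The function name `distant_label` and the toy data are hypothetical, the sketch assumes entity pairs have already been identified in each sentence, and it treats "pair not found in the KB" as "no relationship", which is an assumption of this example.

```python
# Hypothetical toy KB: (e1, e2) -> relation r, i.e. the triplet r(e1, e2).
KB = {("Nevada", "Las Vegas"): "/location/location/contains"}

def distant_label(kb, e1, e2):
    """Label an entity pair with its KB relation, or NA if the KB has none."""
    return kb.get((e1, e2), "NA")

# One entry per sentence: the entity pair that the sentence mentions.
# Every sentence mentioning a pair inherits the same label, whether or not
# the sentence actually expresses the relation.
mentions = [
    ("Nevada", "Las Vegas"),
    ("Nevada", "Las Vegas"),
    ("Nevada", "Apple"),
]
labels = [distant_label(KB, e1, e2) for e1, e2 in mentions]
```

Because the label depends only on the entity pair and not on the sentence, any sentence that merely co-mentions the two entities receives the KB relation, which is how wrong labels arise.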

 

7. Related work:

To address these problems in DS, previous work has proposed many methods, which will be introduced in the second part.

Most of this work uses a public dataset built by aligning Freebase relations with the New York Times (NYT) corpus. The dataset has two versions:

1) The original version (Riedel et al., 2010)

2) The filtered version (Zeng et al.)

A) Remove duplicated sentences in each bag.

B) Remove sentences with more than 40 tokens between the two entities.

C) Remove sentences with entity names that are substrings of other entity names in Freebase.
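
The three filtering rules can be sketched as below. This is an illustration under assumptions, not the preprocessing code of Zeng et al.: the `Instance` record, its token-index fields, the helper names, and the set of entity names passed in are all hypothetical, and each entity is assumed to occupy a single token so that the gap can be computed from two positions. A "bag" here means all instances that share the same entity pair.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    tokens: tuple   # tokenized sentence
    e1: str         # head entity name
    e2: str         # tail entity name
    e1_pos: int     # token index of e1 (assumed single-token)
    e2_pos: int     # token index of e2 (assumed single-token)

def is_substring_of_other(name, all_names):
    """True if `name` is contained in a different entity name."""
    return any(name != other and name in other for other in all_names)

def filter_bag(bag, all_names, max_gap=40):
    """Apply rules A-C to one bag (all instances of one entity pair)."""
    kept, seen = [], set()
    for inst in bag:
        if inst.tokens in seen:                    # A) duplicated sentence
            continue
        seen.add(inst.tokens)
        gap = abs(inst.e1_pos - inst.e2_pos) - 1   # tokens strictly between
        if gap > max_gap:                          # B) entities too far apart
            continue
        if (is_substring_of_other(inst.e1, all_names)
                or is_substring_of_other(inst.e2, all_names)):
            continue                               # C) ambiguous entity name
        kept.append(inst)
    return kept
```

A bag would be passed in as a list of `Instance` objects together with the collection of known entity names; the default `max_gap=40` mirrors the 40-token threshold in rule B.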

 

 

Download URL: http://iesl.cs.umass.edu/riedel/ecml/
