1. Relation extraction:
Extracting relation facts from sentences.
Sentence | Relation |
1. Steve Jobs and Wozniak co-founded Apple in 1976. | Founder |
2. Washington D.C. is the capital of the United States. | CapitalOf |
2. Previous method:
Train a relation extractor on a manually labeled (supervised) dataset.
3. Problem:
1) Human annotation is costly.
2) The approach is limited by the number of relations and the size of the labeled data.
4. Distant supervision in RE:
Mintz et al. were the first to apply the distant supervision (DS) method to the RE task. The DS method
extracts training instances by aligning a KB with text. Two steps:
1) Find the target relation and its associated entity pair in the KB.
2) Extract the sentences in the text that contain this entity pair.
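The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure of Mintz et al.: the function name `distant_label`, the toy KB, and the plain substring matching are assumptions (real pipelines use entity linking or NER), and the second corpus sentence is invented for the example.

```python
from collections import defaultdict

def distant_label(kb_triples, sentences):
    """Group sentences into bags keyed by (e1, e2, relation).

    A sentence joins a bag when it mentions both entities of a KB triple.
    Mentions are detected by naive substring matching (an assumption).
    """
    bags = defaultdict(list)
    for e1, relation, e2 in kb_triples:          # step 1: relation + entity pair from the KB
        for sentence in sentences:               # step 2: sentences containing the pair
            if e1 in sentence and e2 in sentence:
                bags[(e1, e2, relation)].append(sentence)
    return dict(bags)

# Toy KB and corpus (hypothetical data for illustration only).
kb = [("Steve Jobs", "Founder", "Apple")]
corpus = [
    "Steve Jobs and Wozniak co-founded Apple in 1976.",
    "Steve Jobs unveiled a new Apple phone on stage.",
    "Washington D.C. is the capital of the United States.",
]
bags = distant_label(kb, corpus)
```

Both sentences that mention "Steve Jobs" and "Apple" end up in the Founder bag, even though only the first one states the founding relation.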
5. Challenge for DS:
1) Finding a suitable KB for open-domain relation extraction.
2) Wrong label problem (see the next section).
3) Error propagation caused by feature engineering with NLP tools.
6. Wrong label:
Sentences are extracted from the text under the following assumption: if two entities have a relationship in a known knowledge base, then all sentences that mention these two entities express that relationship in some way. The training data are therefore labeled automatically: for a triplet r(e1, e2) in the KB, all sentences that mention both entities e1 and e2 are regarded as training instances of relation r.
However, a sentence that mentions two entities may not express the relation that links them in the KB. The two entities may appear in the same sentence simply because they are related to the same topic.
When an entity pair does not have any relationship, its relation is defined as NA.
In the figure, sentences S2 and S4 both mention Nevada and Las Vegas, but they do not express the relation /location/location/contains.
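The labeling rule and the NA fallback can be illustrated with a small sketch. The helper `ds_label`, the toy KB, and the example sentences are all hypothetical (they are not the sentences S1 to S4 from the figure); the point is only that the rule looks at the entity pair and never at the sentence content.

```python
# Toy KB mapping an entity pair to its relation (hypothetical data).
KB = {("Nevada", "Las Vegas"): "/location/location/contains"}

def ds_label(e1, e2, kb):
    """Return the KB relation for the pair, or "NA" if the pair is not related."""
    return kb.get((e1, e2), "NA")

# (e1, e2, sentence) instances; the sentences are invented for illustration.
instances = [
    ("Nevada", "Las Vegas", "Las Vegas is the largest city in Nevada."),
    ("Nevada", "Las Vegas", "She flew from Las Vegas to a meeting elsewhere in Nevada."),
    ("Nevada", "Apple", "Apple opened a data center in Nevada."),
]

# The label depends only on the entity pair, so the second sentence is
# labeled with the relation even though it does not express it.
labels = [ds_label(e1, e2, KB) for e1, e2, _ in instances]
```

The second instance receives the same label as the first, which is exactly the kind of wrong label described above, while the third pair is absent from the KB and falls back to NA.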
7. Related work:
Previous work has proposed many methods to address the problems in DS; they are introduced in the second part.
These works use a public dataset built by aligning Freebase relations with the New York Times (NYT) corpus. The dataset has two versions:
1) The original version (Riedel et al., 2010)
2) The filtered version (Zeng et al.)
A) Remove duplicated sentences in each bag.
B) Remove sentences with more than 40 tokens between two entities.
C) Remove sentences with entity names that are substrings of other entity names in Freebase.
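The three filtering rules A) to C) could be applied to a single bag roughly as follows. This is a sketch under stated assumptions, not the preprocessing code of Zeng et al.: the instance layout (`tokens`, `e1`, `e2`, `e1_pos`, `e2_pos`), the helper names, and the reading of rule C (drop a sentence if either entity name is a proper substring of another known entity name) are all hypothetical.

```python
def is_substring_of_other(name, all_names):
    """True if `name` occurs inside a different entity name."""
    return any(name != other and name in other for other in all_names)

def filter_bag(bag, all_entity_names, max_gap=40):
    """Apply rules A, B and C to one bag of instances.

    Each instance is a dict with `tokens` (list of str), entity names
    `e1`/`e2`, and token indices `e1_pos`/`e2_pos` (assumed layout).
    """
    seen, kept = set(), []
    for inst in bag:
        text = " ".join(inst["tokens"])
        if text in seen:                                   # A) duplicated sentence
            continue
        seen.add(text)
        gap = abs(inst["e2_pos"] - inst["e1_pos"]) - 1     # tokens between the two positions
        if gap > max_gap:                                  # B) entities too far apart
            continue
        if any(is_substring_of_other(e, all_entity_names)
               for e in (inst["e1"], inst["e2"])):         # C) ambiguous entity name
            continue
        kept.append(inst)
    return kept

# Toy data (hypothetical).
names = {"Nevada", "Las Vegas", "Vegas"}
good = {"tokens": "Las Vegas is in Nevada".split(),
        "e1": "Nevada", "e2": "Las Vegas", "e1_pos": 4, "e2_pos": 0}
far = {"tokens": ["Nevada"] + ["x"] * 41 + ["Las", "Vegas"],
       "e1": "Nevada", "e2": "Las Vegas", "e1_pos": 0, "e2_pos": 42}
ambiguous = {"tokens": "Vegas is in Nevada".split(),
             "e1": "Nevada", "e2": "Vegas", "e1_pos": 3, "e2_pos": 0}
kept = filter_bag([good, dict(good), far, ambiguous], names)
```

Only the first instance survives: its copy is a duplicate, `far` has 41 tokens between the entity positions, and `ambiguous` uses "Vegas", which is contained in "Las Vegas".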
The download URL: http://iesl.cs.umass.edu/riedel/ecml/