Relation Extraction: The SemEval-2010 Task 8 Dataset

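Task Description

For full details on SemEval-2010 Task 8, refer to the official documentation.

Task:

Given a sentence and two annotated nominals, choose the most suitable relation from a given inventory of relations.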
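As a rough illustration of the input format, the sketch below pulls the two marked nominals out of a sentence annotated with `<e1>…</e1>` and `<e2>…</e2>` tags, as in the examples quoted in this post. The helper name `parse_marked_sentence` is hypothetical and not part of any official SemEval tooling.

```python
import re

# Matches the <e1>...</e1> and <e2>...</e2> markers used in the examples.
_ENTITY = re.compile(r"<(e[12])>(.*?)</\1>")


def parse_marked_sentence(sentence):
    """Return (plain_text, e1, e2) for a sentence with <e1>/<e2> markers.

    Hypothetical helper for illustration only.
    """
    entities = {tag: text for tag, text in _ENTITY.findall(sentence)}
    if set(entities) != {"e1", "e2"}:
        raise ValueError("expected both an <e1> and an <e2> span")
    # Drop the tags but keep the text they enclose.
    plain = _ENTITY.sub(lambda m: m.group(2), sentence)
    return plain, entities["e1"], entities["e2"]


plain, e1, e2 = parse_marked_sentence(
    "The <e1>apples</e1> are in the <e2>basket</e2>."
)
print(plain)   # The apples are in the basket.
print(e1, e2)  # apples basket
```

A classifier would then take the plain sentence plus the two nominals (or their positions) as input and predict one relation from the inventory.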

The relation inventory (9+1) is as follows:

Relation / Definition / Example

Cause-Effect

(the relation between a cause and its effect)

Cause-Effect(X, Y)  is true for a sentence S that mentions entities X and Y if and only if

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that X is the cause of Y, or that X causes/makes/produces/emits/... Y.

"A person infected with a particular <e1>flu</e1> <e2>virus</e2> strain develops an antibody against that virus."

Cause-Effect(e2, e1)

Comment: flu is a state, virus is the causal agent, thus (a) is satisfied; the virus is actively involved in causing flu and thus (c) is satisfied.

Instrument-Agency

Instrument-Agency(X, Y) is true of a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is the instrument (tool) of Y or, equivalently, that Y uses X.

Product-Producer

(the relation between a product and its producer)

Product-Producer(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is a product of Y, or Y produces X.

"The <e1>honey</e1> <e2>bee</e2> is the third insect genome published by scientists, after a lab workhorse, the fruit fly, and a health menace, the mosquito."

Product-Producer(e1, e2)

Comment: This is a typical example of Product-Producer. Honey is a tangible concrete object (c), and the bee is actively involved in producing it (a).

Content-Container

Content-Container(X, Y) is true for a sentence S that mentions entities X and Y if and only if

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that X is or was (usually temporarily) stored or carried inside Y.

"The <e1>apples</e1> are in the <e2>basket</e2>."

Content-Container(e1, e2)

Comment: This is a prototypical example of Content-Container.

Entity-Origin

Entity-Origin(X, Y) is true for a sentence S that mentions the entities X and Y if and only if

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that Y is the origin of an entity X (rather than its location), and X is coming or derived from that origin.

"Under state law, minors are not permitted to have <e1>grain</e1> <e2>alcohol</e2>, even if a parent provides it to their children."

Entity-Origin(e2, e1)

Comment: This is a prototypical example of a material Entity-Origin relation. Restriction (b.4) applies.

Entity-Destination

Entity-Destination(X, Y) is true for a sentence S that mentions the entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that Y is the destination of X in the sense of X moving (in a physical or abstract sense) toward Y.

"The <e1>boy</e1> ran into the school <e2>cafeteria</e2>."

Entity-Destination(e1, e2)

Comment: school cafeteria is a spatial/geographical destination.

Component-Whole

Component-Whole(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that X is a component of Y;

(3) X has a functional relation with Y. In other words, X has an operating or usable purpose within Y.

"We don't need Einstein's quantum mechanics to understand why each <e1>hand</e1> has 5 <e2>fingers</e2>, and not 4 or 6."

Component-Whole(e2, e1)

Comment: Fingers are functional, integral parts of the hand.

Member-Collection

Member-Collection(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is a member of Y.

"Italian playing cards most commonly consist of a <e1>deck</e1> of 40 <e2>cards</e2>."

Member-Collection(e2, e1)

Comment: A deck is a collection of cards, cards are different and separable from the deck, not functional to the deck.

Message-Topic

Message-Topic(X, Y) is true for a sentence S that mentions the entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is a communicative message containing information about Y.

"The recommendations contained the following key <e1>points</e1> about the <e2>new politics</e2> of the government."

Message-Topic(e1, e2)

Comment: politics is the topic of the key points.

Other

When the two entities in a sentence do not stand in any of the nine relations above, the label is set to Other.
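Because the nine named relations are directed (the examples above use both `(e1, e2)` and `(e2, e1)`), a classifier that predicts relation and direction together works with 9 × 2 + 1 = 19 labels. The sketch below builds such a label set; the exact label spelling `Relation(e1,e2)` and the names `RELATIONS`, `build_label_set`, and `LABEL_TO_ID` are assumptions made for illustration.

```python
# The nine directed relations listed in this post.
RELATIONS = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]


def build_label_set(relations=RELATIONS):
    """Return the directed labels plus the undirected 'Other' class."""
    labels = []
    for name in relations:
        # Each relation can hold in either argument order.
        labels.append(f"{name}(e1,e2)")
        labels.append(f"{name}(e2,e1)")
    labels.append("Other")
    return labels


LABELS = build_label_set()
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}
print(len(LABELS))  # 19
```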

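The share of each relation class in the data is shown in the figure below:

[Figure: distribution of relation classes in the dataset]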

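Datasets

  1. Trial Dataset: released on August 30, 2009, it contains data for the first five relations. It also includes some instances of the other four relations; within the trial dataset these can simply be treated as Other, with no further processing.
  2. Training Dataset: the training set contains 8000 examples covering the 9+1 relations described above.
  3. Development Dataset: no official development set is provided, but participants can use part of the training data to tune their parameters, for example through cross-validation.
  4. Test Dataset: the test set contains 2717 examples covering the 9+1 relations described above, and was released on March 18, 2010.
  5. Note on WordNet senses: unlike SemEval-2007 Task 4, manually annotated WordNet senses are not provided here, which makes the task more realistic.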
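Since there is no official development set, one option mentioned above is cross-validation over the 8000 training examples. The following is a minimal sketch of a k-fold index split using only the standard library; the function name `k_fold_indices`, the choice of `k=5`, and the fixed seed are illustrative assumptions, not part of the task definition.

```python
import random


def k_fold_indices(n_examples, k=5, seed=0):
    """Yield (train_indices, dev_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        dev = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, dev


for train_idx, dev_idx in k_fold_indices(8000, k=5):
    print(len(train_idx), len(dev_idx))  # 6400 1600 on every fold
```

Each fold serves once as a held-out development set while the remaining folds are used for training, so hyperparameters can be tuned without touching the 2717 test examples.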

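SemEval-2010 Task 8 vs. SemEval-2007 Task 4

  • Whereas the 2007 task provided a separate dataset and a corresponding binary classification task for each relation, the 2010 task provides a single multi-class dataset and a single multi-class classification task.
  • The candidate entities are still provided, but the system being evaluated must decide which slot of the relation each entity fills.
  • WordNet senses and query strings are no longer provided.
  • The dataset is much larger (more than 10,000 annotated sentences).
  • The set of relations is also larger.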
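The slot decision mentioned above shows up in the labels as the argument order, for example `Component-Whole(e2, e1)`. As a small sketch, the hypothetical helpers below (`parse_label` and `fill_slots`, both names assumed for illustration) split such a label into relation and direction and map the two nominals onto the relation's X and Y slots.

```python
import re

# Matches labels of the form "Relation(e1, e2)" or "Relation(e2,e1)".
_LABEL = re.compile(r"^\s*([A-Za-z-]+)\s*\(\s*(e[12])\s*,\s*(e[12])\s*\)\s*$")


def parse_label(label):
    """Split 'Relation(eA, eB)' into (relation, first_slot, second_slot)."""
    if label.strip() == "Other":
        return "Other", None, None
    match = _LABEL.match(label)
    if match is None or match.group(2) == match.group(3):
        raise ValueError(f"unrecognised label: {label!r}")
    return match.group(1), match.group(2), match.group(3)


def fill_slots(label, e1_text, e2_text):
    """Map the two nominals to the relation's (X, Y) argument slots."""
    relation, first, second = parse_label(label)
    if relation == "Other":
        return relation, None, None
    by_tag = {"e1": e1_text, "e2": e2_text}
    return relation, by_tag[first], by_tag[second]


print(fill_slots("Component-Whole(e2, e1)", "hand", "fingers"))
# ('Component-Whole', 'fingers', 'hand')
```

In the example, the fingers fill the component slot X and the hand fills the whole slot Y, matching the comment given for that sentence.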

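Difficulties

The relation inventory contains two groups of closely related relations:

Group 1

  • Component-Whole
  • Member-Collection
  • Both are special cases of Part-Whole.

Group 2

  • Content-Container
  • Entity-Origin
  • Entity-Destination
  • These can be told apart by considering whether the situation expressed is static or dynamic.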
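One way to check whether a model actually struggles with these two groups is to count the errors that stay inside a group. The sketch below is only an illustration: the group name "Containment/Movement" and the names `CONFUSABLE_GROUPS` and `confusion_group` are invented here, and relation names are compared without their direction.

```python
# The two groups of similar relations listed above.
# "Part-Whole" comes from the text; the second group name is made up here.
CONFUSABLE_GROUPS = {
    "Part-Whole": {"Component-Whole", "Member-Collection"},
    "Containment/Movement": {
        "Content-Container", "Entity-Origin", "Entity-Destination",
    },
}


def confusion_group(gold, predicted):
    """Return the group name if gold and predicted are different relations
    from the same group, otherwise None."""
    if gold == predicted:
        return None
    for name, members in CONFUSABLE_GROUPS.items():
        if gold in members and predicted in members:
            return name
    return None


print(confusion_group("Component-Whole", "Member-Collection"))  # Part-Whole
print(confusion_group("Entity-Origin", "Cause-Effect"))         # None
```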

 
