Lecture 15: Coreference Resolution
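Idea: Identify all noun phrases that refer to the same real-world entity. Put plainly, the task is to work out who or what each noun phrase refers to. For example, in "John loves his wife. He prepares breakfast for her every day.", we know that "his" and "He" both co-refer with John.
Noun phrases refer to entities in the world; many pairs of noun phrases co-refer, and some are nested inside others.
Coreference resolution has applications in machine translation, text understanding, and other areas.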
Evaluation
Precision/Recall
These two evaluation metrics are very common.
Precision: of all the samples the model labels as positive, the fraction that are actually positive.
$$P = \frac{TP}{TP+FP}$$
Recall: of all the samples that are actually positive, the fraction that the model correctly labels as positive.
$$R = \frac{TP}{TP+FN}$$
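As a minimal sketch of the two formulas, the snippet below computes precision and recall from a set of predicted items and a set of gold items. Treating individual coreference links as the items being scored is an assumption made for illustration only, and the `gold` and `predicted` links are hypothetical annotations of the "John loves his wife" example; `precision_recall` is not a function from any library.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # predicted positive and actually positive
    fp = len(predicted - gold)  # predicted positive but actually negative
    fn = len(gold - predicted)  # actually positive but missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical coreference links (pairs of mentions), for illustration only.
gold = {("John", "his"), ("John", "He"), ("his wife", "her")}
predicted = {("John", "his"), ("John", "He"), ("John", "her")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here two predicted links are correct, one is wrong, and one gold link is missed, so TP = 2, FP = 1, and FN = 1.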
Kinds of Reference
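- Referring expressions
  - John Smith
  - President Smith
  - the president
- Free variables
  - Smith saw his salary increase
- Bound variables
  - The dancer hurt herself
  (Note: a free variable does not necessarily refer to the nearest noun; its referent depends on the context. In the example above, "his salary" may be Smith's, but if the preceding text included the sentence "John has been working hard recently", the salary could also be John's. A bound variable, by contrast, is unambiguous: "herself" in the example strictly depends on the noun "dancer" mentioned earlier in the same sentence.)
- Not all NPs are referring
  - No dancer twisted her knee.
  - It is raining.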
Coreference: two mentions refer to the same entity
Anaphora: a term (the anaphor) refers to another term (the antecedent), and the interpretation of the anaphor is in some way determined by the interpretation of the antecedent. Traditionally, the antecedent comes first.
Cataphora: the antecedent does not come first. In this case the term that comes first is called the cataphor.
Kinds of Coreference Models
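- Mention Pair models
  - Treat coreference as a set of binary links: make one decision for each pair of mentions, judging whether the two co-refer
- Mention ranking models
  - Given a mention, we want to know which other mentions it co-refers with; the text may contain several candidate mentions, so we rank them and return the result
- Entity-Mention models
  - Output concrete entities rather than links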
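
To make the mention-pair idea concrete, here is a rough sketch under stated assumptions: every pair of mentions is passed to a binary classifier, and the positive links are merged into entities with a small union-find. The merging step is one simple choice made here for illustration, not something the notes prescribe, and `cluster_mentions` and `toy_classifier` are hypothetical names. The hand-written rule stands in for a learned model.

```python
from itertools import combinations


def cluster_mentions(mentions, is_coreferent):
    """Group mentions into entities from pairwise coreference decisions."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # One binary decision per pair of mentions.
    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


def toy_classifier(a, b):
    # Hypothetical stand-in for a trained pairwise model.
    john = {"John", "his", "He"}
    wife = {"his wife", "her"}
    return (a in john and b in john) or (a in wife and b in wife)


mentions = ["John", "his wife", "his", "He", "her"]
clusters = cluster_mentions(mentions, toy_classifier)
print(clusters)
```

Because the links are merged transitively, a single wrong positive decision can join two entities, which is one motivation for the ranking and entity-level models listed above.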