Coreference Resolution: Notes on the Paper "Improving Coreference Resolution by Learning Entity-Level Distributed Representations"

Paper: "Improving Coreference Resolution by Learning Entity-Level Distributed Representations"

Sections:

  2. System Architecture
  3. Building Representations
    3.1. Mention-Pair Encoder
    3.2. Cluster-Pair Encoder
  4. Mention-Ranking Model
  5. Cluster-Ranking Model
    5.1. Cluster-Ranking Policy Network
    5.2. Easy-First Cluster Ranking
    5.3. Deep Learning to Search
  6. Experiments and Results
    6.1. Mention-Ranking Model Experiments
    6.2. Cluster-Ranking Model Experiments

Abstract

A long-standing challenge in coreference resolution has been the incorporation of entity-level information, i.e., features defined over clusters of mentions instead of mention pairs. We present a neural network based coreference system that produces high-dimensional vector representations for pairs of coreference clusters. Using these representations, our system learns when combining clusters is desirable. We train the system with a learning-to-search algorithm that teaches it which local decisions (cluster merges) will lead to a high-scoring final coreference partition. The system substantially outperforms the current state-of-the-art on the English and Chinese portions of the CoNLL 2012 Shared Task dataset despite using few hand-engineered features.

Coreference resolution, the task of identifying which mentions in a text refer to the same real-world entity, is fundamentally a clustering problem. However, many recent state-of-the-art coreference systems operate solely by linking pairs of mentions together (Durrett and Klein, 2013; Martschat and Strube, 2015; Wiseman et al., 2015).

An alternative approach is to use agglomerative clustering, treating each mention as a singleton cluster at the outset and then repeatedly merging clusters of mentions deemed to be referring to the same entity. Such systems can take advantage of entity-level information, i.e., features between clusters of mentions instead of between just two mentions. As an example for why this is useful, it is clear that the clusters {Bill Clinton} and {Clinton, she} are not referring to the same entity, but it is ambiguous whether the pair of mentions Bill Clinton and Clinton are coreferent.

Previous work has incorporated entity-level information through features that capture hard constraints like having gender or number agreement between clusters (Raghunathan et al., 2010; Durrett et al., 2013). In this work, we instead train a deep neural network to build distributed representations of pairs of coreference clusters. This captures entity-level information with a large number of learned, continuous features instead of a small number of hand-crafted categorical ones.

Our system uses little manual feature engineering, which means it is easily extended to multiple languages. We evaluate our system on the English and Chinese portions of the CoNLL 2012 Shared Task dataset. The cluster-ranking model significantly outperforms a mention-ranking model that does not use entity-level information. We also show that using an easy-first strategy improves the performance of the cluster-ranking model. Our final system achieves CoNLL F1 scores of 65.29 for English and 63.66 for Chinese, substantially outperforming other state-of-the-art systems.

2 System Architecture

Our cluster-ranking model is a single neural network that learns which coreference cluster merges are desirable. However, it is helpful to think of the network as being composed of distinct sub-networks. The mention-pair encoder produces distributed representations for pairs of mentions by passing relevant features through a feedforward neural network. The cluster-pair encoder produces distributed representations for pairs of clusters by applying a pooling operation over the representations of relevant mention pairs, i.e., pairs where one mention is in each cluster. The cluster-ranking model then scores pairs of clusters by passing their representations through a single neural network layer.

We also train a mention-ranking model that scores pairs of mentions by passing their representations through a single neural network layer. Its parameters are used to initialize the cluster-ranking model, and the scores it produces are used to prune which candidate cluster merges the cluster-ranking model considers, allowing the cluster-ranking model to run much faster. The system architecture is summarized in Figure 1.
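
As an illustration of how such score-based pruning could work, here is a minimal sketch; the threshold rule and function names are assumptions for illustration, since the paper only states that mention-ranking scores are used to prune candidate cluster merges:

```python
# Minimal sketch: use mention-ranking scores to prune candidate antecedents,
# so the cluster-ranking model only considers merges the cheaper model does
# not rule out. The threshold rule here is illustrative, not the paper's
# exact pruning criterion.
def prune_candidates(mention, candidate_antecedents, score_fn, threshold=-1.5):
    """Keep antecedents whose mention-ranking score s_m(a, m) clears the threshold."""
    return [a for a in candidate_antecedents if score_fn(a, mention) >= threshold]
```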

Common coreference resolution approaches include mention-pair, mention-ranking, and entity/mention-cluster models (a rough sketch of their scoring interfaces follows this list):

1. Mention pair: treat every (candidate antecedent, anaphor) pair as an instance and make a binary coreferent-or-not decision for each pair;

2. Mention ranking: explicitly treat the mention as a query and rank all of its candidate antecedents;

3. Entity mention (cluster level): a more elegant formulation that considers the relation between a mention and a cluster of mentions rather than only mention-to-mention relations; whether a mention corefers with a cluster can be decided by aggregating (e.g., voting over) the pairwise evidence. In practice: find all entities and their discourse context, cluster mentions based on that context, and resolve all mentions in the same cluster to the same entity.
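
To make the contrast concrete, here is a rough Python sketch of the three scoring interfaces; all names and the averaging-based aggregation are illustrative, not from the paper:

```python
from typing import Callable, List

Mention = str            # placeholder type for a mention span
Cluster = List[Mention]  # a coreference cluster is a group of mentions

# 1. Mention pair: binary decision for each (antecedent, mention) pair.
PairClassifier = Callable[[Mention, Mention], bool]

# 2. Mention ranking: score every candidate antecedent of a mention, take the best.
def rank_antecedents(mention: Mention, candidates: List[Mention],
                     score: Callable[[Mention, Mention], float]) -> Mention:
    return max(candidates, key=lambda a: score(a, mention))

# 3. Entity/cluster level: score a merge between two clusters of mentions,
#    aggregating evidence from all cross-cluster mention pairs (e.g., by voting
#    or, as here, averaging).
def cluster_merge_score(c1: Cluster, c2: Cluster,
                        pair_score: Callable[[Mention, Mention], float]) -> float:
    scores = [pair_score(a, m) for a in c1 for m in c2]
    return sum(scores) / len(scores)
```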

3 Building Representations

In this section, we describe the neural networks producing distributed representations of pairs of mentions and pairs of coreference clusters. We assume that a set of mentions has already been extracted from each document using a method such as the one in Raghunathan et al. (2010).

3.1 Mention-Pair Encoder

Note: a mention is either the entity expression itself or a word/phrase referring to it.

Given a mention m and candidate antecedent a, the mention-pair encoder produces a distributed representation of the pair r_m(a, m) ∈ R^d with a feedforward neural network, which is shown in Figure 2. The candidate antecedent may be any mention that occurs before m in the document, or NA, indicating that m has no antecedent. We also experimented with models based on Long Short-Term Memory recurrent neural networks (Hochreiter and Schmidhuber, 1997), but found these to perform slightly worse when used in an end-to-end coreference system due to heavy overfitting to the training data.

My understanding:

The main body of the neural mention-ranking model is a multi-layer feedforward neural network:

[1] The input layer concatenates the mention features, the candidate antecedent features (features of mentions that occur before the current mention), features of the sentence containing the mention, and additional features such as distance and relational features, producing the model input h0.

[2] The hidden layers use ReLU activations, and there are three hidden layers in total (a minimal sketch of this encoder follows below).
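
A minimal NumPy sketch of that encoder, assuming h0 has already been built by concatenating the feature vectors described in the next subsection; the three ReLU hidden layers follow the description above, while the layer sizes and random initialization are placeholders for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MentionPairEncoder:
    """Feedforward encoder: concatenated input features h0 -> r_m(a, m).

    Three ReLU hidden layers as described above; the layer sizes below are
    placeholders, and the random initialization is only for illustration.
    """
    def __init__(self, input_dim, hidden_dims=(1000, 500, 500), seed=0):
        rng = np.random.default_rng(seed)
        dims = [input_dim, *hidden_dims]
        self.weights = [rng.normal(0.0, 0.02, size=(dims[i], dims[i + 1]))
                        for i in range(len(dims) - 1)]
        self.biases = [np.zeros(d) for d in hidden_dims]

    def encode(self, h0):
        h = np.asarray(h0, dtype=float)
        for W, b in zip(self.weights, self.biases):
            h = relu(h @ W + b)
        return h  # r_m(a, m): d-dimensional, d = size of the last hidden layer
```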

Input Layer. For each mention, the model extracts various words and groups of words that are fed into the neural network. Each word is represented by a vector w. Each group of words is represented by the average of the vectors of each word in the group. For each mention and pair of mentions, a small number of binary features and distance features are also extracted. Distances and mention lengths are binned into one of the buckets [0, 1, 2, 3, 4, 5-7, 8-15, 16-31, 32-63, 64+] and then encoded as a one-hot vector in addition to being included as continuous features (a sketch of this encoding follows the feature list below). The full set of features is as follows:


Embedding Features: Word embeddings of the head word, dependency parent, first word, last word, two preceding words, and two following words of the mention. Averaged word embeddings of the five preceding words, five following words, all words in the mention, all words in the mention's sentence, and all words in the mention's document.


Additional Mention Features: The type of the mention (pronoun, nominal, proper, or list), the mention's position (index of the mention divided by the number of mentions in the document), whether the mention is contained in another mention, and the length of the mention in words.


Document Genre: The genre of the mention's document (broadcast news, newswire, web data, etc.).


Distance Features: The distance between the mentions in sentences, the distance between the mentions in intervening mentions, and whether the mentions overlap.


Speaker Features: Whether the mentions have the same speaker and whether one mention is the other mention's speaker, as determined by string matching rules from Raghunathan et al. (2010).


String Matching Features: Head match, exact string match, and partial string match.


The vectors for all of these features are concatenated to produce an I-dimensional vector h0, the input to the neural network. If a = NA, the features defined over mention pairs are not included. For this case, we train a separate network with an identical architecture to the pair network except for the input layer to produce anaphoricity scores.
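
A small sketch of the encoding steps described above: averaging the word vectors of a group of words, binning a distance into the listed buckets with a one-hot encoding, and concatenating everything into h0. The embedding lookup and the feature choices in the example are simplified assumptions:

```python
import numpy as np

# Lower bounds of the buckets [0, 1, 2, 3, 4, 5-7, 8-15, 16-31, 32-63, 64+]
BUCKET_STARTS = [0, 1, 2, 3, 4, 5, 8, 16, 32, 64]

def average_embedding(words, embeddings, dim):
    """Represent a group of words by the average of their word vectors."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

def distance_features(distance):
    """One-hot bucket encoding plus the raw value as a continuous feature."""
    bucket = sum(distance >= start for start in BUCKET_STARTS) - 1
    one_hot = np.zeros(len(BUCKET_STARTS))
    one_hot[bucket] = 1.0
    return np.concatenate([one_hot, [float(distance)]])

def build_h0(feature_vectors):
    """Concatenate all feature vectors into the network input h0."""
    return np.concatenate([np.atleast_1d(v) for v in feature_vectors])

# Example: one averaged-embedding feature plus a sentence-distance feature.
embeddings = {"Clinton": np.array([0.1, 0.2]), "she": np.array([0.0, 0.3])}
h0 = build_h0([
    average_embedding(["Clinton", "she"], embeddings, dim=2),
    distance_features(6),   # falls into the 5-7 bucket
])
```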

Our set of hand-engineered features is much smaller than the dozens of complex features typically used in coreference systems. However, we found these features were crucial for getting good model performance. See Section 6.1 for a feature ablation study.

3.2 Cluster-Pair Encoder

Given two clusters of mentions c_i = {m_1^i, m_2^i, …, m_{|c_i|}^i} and c_j = {m_1^j, m_2^j, …, m_{|c_j|}^j}, the cluster-pair encoder produces a distributed representation r_c(c_i, c_j) ∈ R^{2d}. The architecture of the encoder is summarized in Figure 3.

The cluster-pair encoder first combines the information contained in the matrix of mention-pair representations R_m(c_i, c_j) = [r_m(m_1^i, m_1^j), r_m(m_1^i, m_2^j), …, r_m(m_{|c_i|}^i, m_{|c_j|}^j)] to produce r_c(c_i, c_j). This is done by applying a pooling operation. In particular it concatenates the results of max-pooling and average-pooling, which we found to be slightly more effective than using either one alone:

r_c(c_i, c_j) = [max-pooling(R_m(c_i, c_j)); average-pooling(R_m(c_i, c_j))]
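
A minimal NumPy sketch of this pooling step, assuming the mention-pair representations for all cross-cluster pairs are stacked as the rows of a matrix:

```python
import numpy as np

def cluster_pair_representation(mention_pair_reps):
    """r_c(c_i, c_j): concatenation of element-wise max- and average-pooling over
    R_m(c_i, c_j), whose rows are the r_m vectors for every pair with one mention
    in c_i and one in c_j. Input shape: (|c_i| * |c_j|, d); output dimension: 2d."""
    reps = np.asarray(mention_pair_reps, dtype=float)
    return np.concatenate([reps.max(axis=0), reps.mean(axis=0)])
```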

4 Mention-Ranking Model

Mention ranking: explicitly treat the mention as a query and rank all candidate antecedents.
The mention-ranking model builds training instances and forms coreference chains from the coreference scores between the current mention and the mentions preceding it.

Rather than training a cluster-ranking model from scratch, we first train a mention-ranking model that assigns each mention its highest scoring candidate antecedent. There are two key advantages of doing this. First, it serves as pretraining for the cluster-ranking model; in particular the mention-ranking model learns effective weights for the mention-pair encoder. Second, the scores produced by the mention-ranking model are used to provide a measure of which coreference decisions are easy (allowing for an easy-first clustering strategy) and which decisions are clearly wrong (these decisions can be pruned away, significantly reducing the search space of the cluster-ranking model).

The mention-ranking model assigns a score s_m(a, m) to a mention m and candidate antecedent a representing their compatibility for coreference. This is produced by applying a single fully connected layer of size one to the representation r_m(a, m) produced by the mention-pair encoder:

s_m(a, m) = W_m r_m(a, m) + b_m

where W_m is a 1 x d weight matrix. At test time, the mention-ranking model links each mention with its highest scoring candidate antecedent.
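
A sketch of this scoring layer and the greedy test-time linking, assuming the r_m vectors have already been computed (the bias term b_m follows the usual linear-layer form):

```python
import numpy as np

def mention_pair_score(r_m, W_m, b_m):
    """s_m(a, m): a single fully connected layer of size one over r_m(a, m)."""
    return float(np.dot(W_m, r_m) + b_m)

def link_to_best_antecedent(candidates, W_m, b_m):
    """At test time, link the mention to its highest scoring candidate antecedent.

    candidates: list of (antecedent, r_m(a, m)) pairs, including ('NA', r_NA)."""
    best_antecedent, _ = max(candidates,
                             key=lambda pair: mention_pair_score(pair[1], W_m, b_m))
    return best_antecedent
```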

Training Objective. We train the mention-ranking model with the slack-rescaled max-margin training objective from Wiseman et al. (2015), which encourages separation between the highest scoring true and false antecedents of the current mention. Suppose the training set consists of N mentions m_1, m_2, …, m_N. Let A(m_i) denote the set of candidate antecedents of a mention m_i (i.e., mentions preceding m_i and NA), and T(m_i) denote the set of true antecedents of m_i (i.e., mentions preceding m_i that are coreferent with it, or {NA} if m_i has no antecedent). Let t̂_i be the highest scoring true antecedent of mention m_i:

t̂_i = argmax_{t ∈ T(m_i)} s_m(t, m_i)

Then the loss is given by:

Loss = Σ_{i=1}^{N} max_{a ∈ A(m_i)} Δ(a, m_i) (1 + s_m(a, m_i) - s_m(t̂_i, m_i))

where Δ(a, m_i) is the mistake-specific cost function.
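
A Python sketch of this loss for a single mention m_i, assuming the scores and the cost function Δ are already available; the summation over all N training mentions and the specific cost values are omitted:

```python
def mention_ranking_loss(scores, true_antecedents, cost):
    """Slack-rescaled max-margin loss for one mention m_i.

    scores: dict mapping each candidate antecedent a in A(m_i) to s_m(a, m_i).
    true_antecedents: the set T(m_i) of true antecedents (or {'NA'}).
    cost: function giving the mistake-specific cost Delta(a, m_i) of predicting a.
    """
    # Highest scoring true antecedent t_hat_i.
    t_hat = max(true_antecedents, key=lambda t: scores[t])
    # max over a in A(m_i) of Delta(a, m_i) * (1 + s_m(a, m_i) - s_m(t_hat, m_i)).
    return max(cost(a) * (1.0 + scores[a] - scores[t_hat]) for a in scores)
```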
