Multi-instance Multi-label Learning for Relation Extraction (2012)

Abstract

Distant supervision for relation extraction (RE) – gathering training data by aligning a database of facts with text – is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains.


1 Introduction

Information extraction (IE), defined as the task of extracting structured information (e.g., events, binary relations, etc.) from free text, has received renewed interest in the “big data” era, when petabytes of natural-language text containing thousands of different structure types are readily available. However, traditional supervised methods are unlikely to scale in this context, as training data is either limited or nonexistent for most of these structures. One of the most promising approaches to IE that addresses this limitation is distant supervision, which generates training data automatically by aligning a database of facts with text (Craven and Kumlien, 1999; Bunescu and Mooney, 2007).


In this paper we focus on distant supervision for relation extraction (RE), a subproblem of IE that addresses the extraction of labeled relations between two named entities. Figure 1 shows a simple example for an RE domain with two labels. Distant supervision introduces two modeling challenges, which we highlight in the table. The first challenge is that some training examples obtained through this heuristic are not valid, e.g., the last sentence in Figure 1 is not a correct example for any of the known labels for the tuple. The percentage of such false positives can be quite high. For example, Riedel et al. (2010) report up to 31% false positives in a corpus that matches Freebase relations with New York Times articles. The second challenge is that the same pair of entities may have multiple labels, and it is unclear which label is instantiated by any textual mention of the given tuple. For example, in Figure 1, the tuple (Barack Obama, United States) has two valid labels: BornIn and EmployedBy, each (latently) instantiated in different sentences. In the Riedel corpus, 7.5% of the entity tuples in the training partition have more than one label.


We summarize this multi-instance multi-label (MIML) learning problem in Figure 2. In this paper we propose a novel graphical model, which we call MIML-RE, that targets MIML learning for relation extraction. Our work makes the following contributions: (a) To our knowledge, MIML-RE is the first RE approach that jointly models both multiple instances (by modeling the latent labels assigned to instances) and multiple labels (by providing a simple method to capture dependencies between labels). For example, our model learns that certain labels tend to be generated jointly while others cannot be jointly assigned to the same tuple. (b) We show that MIML-RE performs competitively on two difficult domains.


Figure 2: Overview of multi-instance multi-label learning. In contrast, in traditional supervised learning each object has a single instance and a single label. For relation extraction, an object is a tuple of two named entities, and each mention of this tuple in text generates a distinct instance.

2 Related Work

Distant supervision for IE was introduced by Craven and Kumlien (1999), who focused on the extraction of binary relations between proteins and cells/tissues/diseases/drugs using the Yeast Protein Database as a source of distant supervision. Since then, the approach has grown in popularity (Bunescu and Mooney, 2007; Bellare and McCallum, 2007; Wu and Weld, 2007; Mintz et al., 2009; Riedel et al., 2010; Hoffmann et al., 2011; Nguyen and Moschitti, 2011; Sun et al., 2011; Surdeanu et al., 2011a). However, most of these approaches make one or more approximations in learning. For example, most proposals heuristically transform distant supervision to traditional supervised learning (i.e., single-instance single-label) (Bellare and McCallum, 2007; Wu and Weld, 2007; Mintz et al., 2009; Nguyen and Moschitti, 2011; Sun et al., 2011; Surdeanu et al., 2011a). Bunescu and Mooney (2007) and Riedel et al. (2010) model distant supervision for relation extraction as a multi-instance single-label problem, which allows multiple mentions for the same tuple but disallows more than one label per object. Our work is closest to Hoffmann et al. (2011). They address the same problem we do (binary relation extraction) with a MIML model, but they make two approximations. First, they use a deterministic model that aggregates latent instance labels into a set of labels for the corresponding tuple by OR-ing the classification results. We instead use an object-level classifier that is trained jointly with the classifier that assigns latent labels to instances and can capture dependencies between labels. Second, they use a Perceptron-style additive parameter update approach, whereas we train in a Bayesian framework. We show in Section 5 that these approximations generally have a negative impact on performance.

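To make the first approximation concrete, OR-ing the mention-level classifications simply predicts relation j for a tuple whenever at least one of its mentions was assigned latent label j. A minimal sketch, with an illustrative function name of our own choosing:

```python
def or_aggregate(mention_labels, k):
    """Deterministic aggregation in the style of Hoffmann et al. (2011):
    relation j is predicted for the tuple iff at least one mention in its bag
    received latent label j. mention_labels may also contain NIL entries,
    which are simply ignored."""
    assigned = set(mention_labels)
    return {j for j in range(k) if j in assigned}
```

MIML-RE replaces this fixed rule with trained binary y classifiers (Section 4), which is what allows it to capture dependencies between labels.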

MIML learning has been used in fields other than natural language processing. For example, Zhou and Zhang (2007) use MIML for scene classification. In this problem, each image may be assigned multiple labels corresponding to the different scenes captured. Furthermore, each image contains a set of patches, which forms the bag of instances assigned to the given object (image). Zhou and Zhang propose two algorithms that reduce the MIML problem to a more traditional supervised learning task. In one algorithm, for example, they convert the task to a multi-instance single-label problem by creating a separate bag for each label. Due to this, the proposed approach cannot model inter-label dependencies. Moreover, the authors make a series of approximations, e.g., they assume that each instance in a bag shares the bag’s overall label. We instead model all these issues explicitly in our approach.


In general, our approach belongs to the category of models that learn in the presence of incomplete or incorrect labels. There has been interest among machine learning researchers in the general problem of noisy data, especially in the area of instance-based learning. Brodley and Friedl (1999) summarize past approaches and present a simple, all-purpose method to filter out incorrect data before training. While potentially applicable to our problem, this approach is completely general and cannot incorporate our domain-specific knowledge about how the noisy data is generated.


3 Distant Supervision for Relation Extraction

Here we focus on distant supervision for the extraction of relations between two entities. We define a relation as the construct r(e1, e2), where r is the relation name, e.g., BornIn in Figure 1, and e1 and e2 are two entity names, e.g., Barack Obama and United States. Note that there are entity tuples (e1, e2) that participate in multiple relations, r1, ..., ri. In other words, the tuple (e1, e2) is the object illustrated in Figure 2 and the different relation names are the labels. We define an entity mention as a sequence of text tokens that matches the corresponding entity name in some text, and a relation mention (for a given relation r(e1, e2)) as a pair of entity mentions of e1 and e2 in the same sentence. Relation mentions thus correspond to the instances in Figure 2. As the latter definition indicates, we focus on the extraction of relations expressed in a single sentence. Furthermore, we assume that entity mentions are extracted by a different process, such as a named entity recognizer.


We define the task of relation extraction as a function that takes as input a document collection (C), a set of entity mentions extracted from C (E), a set of known relation labels (L), and an extraction model, and outputs a set of relations (R) such that every extracted relation is supported by at least one sentence in C. To train the extraction model, we use a database of relations (D) that are instantiated at least once in C. Using distant supervision, D is aligned with sentences in C, producing relation mentions for all relations in D.

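To illustrate this alignment concretely, the following is a minimal sketch that assumes a toy in-memory representation of C, E, and D; the names Sentence and align_distant_supervision (and the exact data layout) are illustrative, not part of the model:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Sentence:
    tokens: list            # the tokens of one sentence from the collection C
    entity_mentions: list   # (entity_name, start, end) spans produced by an external NER step (E)

def align_distant_supervision(corpus, database):
    """Align the relation database D with the sentences in C.

    corpus:   list of Sentence objects
    database: dict mapping an entity tuple (e1, e2) to its set of known labels from L
    Returns one bag of relation mentions per entity tuple, plus the positive
    label set Pi for that tuple; the instances themselves carry no labels.
    """
    bags = defaultdict(list)   # (e1, e2) -> list of relation mentions (instances)
    positives = {}             # (e1, e2) -> Pi, the known labels for the tuple

    for sentence in corpus:
        names = {mention[0] for mention in sentence.entity_mentions}
        for (e1, e2), labels in database.items():
            # a relation mention is a co-occurrence of e1 and e2 in the same sentence
            if e1 in names and e2 in names:
                bags[(e1, e2)].append(sentence)
                positives[(e1, e2)] = set(labels)
    return bags, positives
```

Note that a sentence placed in a bag this way need not express any label in Pi; this is exactly the source of the false positives discussed in Section 1, and the reason the instance-level labels must be treated as latent.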

4 Model

Our model assumes that each relation mention involving an entity pair has exactly one label, but allows the pair to exhibit multiple labels across different mentions. Since we do not know the actual relation label of a mention in the distantly supervised setting, we model it using a latent variable z that can take one of the k pre-specified relation labels, as well as an additional NIL label if no relation is expressed by the corresponding mention. We model the multiple relation labels an entity pair can assume using a multi-label classifier that takes as input the latent relation types of all the mentions involving that pair. The two-layer hierarchical model is shown graphically in Figure 3 and is described more formally below; a schematic sketch of the two layers follows the variable list. The model includes one multi-class classifier (for z) and a set of binary classifiers (for each yj). The z classifier assigns latent labels from L to individual relation mentions, or NIL if no relation is expressed by the mention. Each yj classifier decides whether relation j holds for the given entity tuple, using the mention-level classifications as input. Specifically, in the figure:


 

• n is the number of distinct entity tuples in D; 
• Mi is the set of mentions for the ith entity pair;
• x is a sentence and z is the latent relation classification for that sentence; 
• wz is the weight vector for the multi-class mention-level classifier; 
• k is the number of known relation labels in L; 
• yj is the top-level classification decision for the entity pair as to whether the jth relation holds; 
• wj is the weight vector for the binary top-level classifier for the jth relation.

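To make the two layers concrete, below is a minimal, inference-only sketch that assumes linear classifiers and a simple count-based aggregation of the latent mention labels; the class name MimlReSketch and the feature encoding are illustrative, and the joint Bayesian training of wz and the wj (the core of the model) is not shown:

```python
import numpy as np

class MimlReSketch:
    """Schematic two-layer MIML-RE scorer (no training logic).

    w_z : (k + 1, d) weight matrix of the mention-level multi-class classifier,
          one row per relation label in L plus one row for NIL.
    w_y : (k, k + 1) weight matrix; row j is the binary top-level classifier
          for relation j, reading the aggregated latent mention labels.
    """

    def __init__(self, w_z, w_y):
        self.w_z, self.w_y = w_z, w_y

    def classify_tuple(self, mention_features):
        """mention_features: list of d-dimensional feature vectors, one per
        relation mention of the entity tuple (the bag Mi)."""
        # 1) mention level: assign a latent label z to every mention x in the bag
        z = [int(np.argmax(self.w_z @ x)) for x in mention_features]
        # 2) aggregate the latent labels into per-label counts for the tuple
        counts = np.bincount(z, minlength=self.w_z.shape[0])
        # 3) object level: each binary y_j classifier decides whether relation j
        #    holds; because it sees all latent labels, it can model dependencies
        scores = self.w_y @ counts
        return {j for j, s in enumerate(scores) if s > 0}
```

Calling classify_tuple on the feature vectors of a bag returns the set of relation indices j for which yj fires; in the actual model the y layer is trained jointly with the z classifier rather than applied to fixed hard counts.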

Additionally, we define Pi (Ni) as the set of all known positive (negative) relation labels for the ith entity tuple. In this paper, we construct Ni as L \ Pi, but, in general, other scenarios are possible. For example, both Sun et al. (2011) and Surdeanu et al. (2011a) proposed models where Ni for the ith tuple (e1, e2) is defined as: {rj | rj(e1, ek) ∈ D, ek ≠ e2, rj ∉ Pi}, which is a subset of L \ Pi. That is, entity e2 is considered a negative example for relation rj (in the context of entity e1) only if rj exists in the training data with a different value. The addition of the object-level layer (for y) is an important contribution of this work. This layer can capture information that cannot be modeled by the mention-level classifier. For example, it can learn that two relation labels (e.g., BornIn and SpouseOf) cannot be generated jointly for the same entity tuple. So, if the z classifier outputs both these labels for different mentions of the same tuple, the y layer can cancel one of them. Furthermore, the y classifiers can learn when two labels tend to appear jointly, e.g., CapitalOf and Contained between two locations, and use this occurrence as positive reinforcement for these labels. We discuss the features that implement these ideas in Section 5.

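As an illustration, the two constructions of Ni mentioned above can be written as follows (function names are our own; database plays the role of D and labels the role of L):

```python
def negatives_full(labels, positives):
    """Ni = L \\ Pi: every known relation label not observed for the tuple."""
    return set(labels) - set(positives)

def negatives_restricted(labels, positives, e1, e2, database):
    """Variant in the style of Sun et al. (2011) and Surdeanu et al. (2011a):
    rj is a negative for (e1, e2) only if rj holds in D between e1 and some
    other entity ek != e2. This yields a subset of L \\ Pi."""
    seen_with_e1 = {r for (a, b), rels in database.items()
                    if a == e1 and b != e2 for r in rels}
    return (seen_with_e1 & set(labels)) - set(positives)
```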

 
