Extracting Simplified Statements for Factual Question Generation

Abstract

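We address the problem of automatically generating concise factual questions from linguistically complex sentences in reading materials. We discuss the semantic and pragmatic issues that arise in complex sentences, and then present an algorithm for extracting simplified sentences from appositives, subordinate clauses, and other constructions. We conjecture that our method is useful as a preliminary step in a larger question generation process. Experimental results indicate that the method is better suited to factual question generation than a text compression algorithm.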

1 Introduction

This paper addresses question generation (QG) for the purpose of creating reading assessments about the factual information that is present in expository texts such as encyclopedia articles and textbooks. We believe that QG has the potential to help teachers efficiently assess students’ acquisition of basic factual knowledge from reading materials, thereby enabling teachers to focus on more complex questions and learning activities. We also believe that research on factual QG can inform research on the generation of more complex reading questions as well as questions for other applications, some of which are described in [17].

Factual questions about reading materials, whether from teachers or students, are usually short and targeted at a single piece of information. For example, in the Microsoft Research Question-Answering Corpus, which consists of questions generated by students and excerpts from encyclopedia articles that answer them, the average length of questions is 7.2 words. However, sentences in texts are typically much longer, because various constructions such as conjunctions, subordinate clauses, and appositives enable writers to convey multiple pieces of information in a single sentence. Consider Ex. 1, which contains a non-restrictive appositive ("the country’s paramount leader"), a participial phrase ("returning ..."), and two clauses conjoined by "so". An important issue in QG is how to generate concise questions from these sorts of complex sentences.

One possible solution is to use techniques from text compression [11,6]. The goal of compression is to take as input a possibly long and complex sentence, and produce as output a single shortened version that conveys the main piece of information in the input. However, as noted above, sentences often convey multiple pieces of information. In QG, we may want to generate questions not just about the information in the main clause, but also about the information embedded in various nested constructions. For example, from the two sentences in Ex. 1, we might produce the following sentences to allow a QG system to generate a more comprehensive set of questions:

Example:

(1) Prime Minister Vladimir V. Putin, the country’s paramount leader, cut short a trip to Siberia, returning to Moscow to oversee the federal response. Mr. Putin built his reputation in part on his success at suppressing terrorism, so the attacks could be considered a challenge to his stature.

Results:

(2) Prime Minister Vladimir V. Putin is the country’s paramount leader.
(3) Prime Minister Vladimir V. Putin cut short a trip to Siberia.
(4) Prime Minister Vladimir V. Putin returned to Moscow to oversee the federal response.
(5) Mr. Putin built his reputation in part on his success at suppressing terrorism.
(6) The attacks could be considered a challenge to his stature.

In this paper, we present a method for extracting multiple, simple, syntactically and semantically correct factual statements from complex sentences. These simple sentences can be readily transformed into questions. There already exist frameworks where questions are generated through transformations of extracted declarative sentences, notably [8,9]. Our method fits particularly well into the sentence-extraction stage of that approach, but could also serve as a preprocessing step for other QG systems. Our method is linguistically motivated by research on semantic and pragmatic phenomena. We present the results of a small-scale study of the quality of our system's output and compare it to a text compression approach [6].

2 Extraction of Textual Entailments

The task of extracting simple sentences from a complex input sentence is essentially the task of generating a particular subset of the possible sentences that a reader would assume to be true after reading the input. That is, we aim to generate a restricted set of textual entailments, using the informal definition of entailment from the recognizing textual entailment task [7,1]. While it is important to generate outputs that are true given an input sentence, we are not satisfied with outputs that merely preserve meaning. Rather, we seek short sentences such as Exs. 2–6 that are likely to lead to concise questions.

Our method for extracting meaning-preserving, simplified sentences depends on two linguistic phenomena: semantic entailment and presupposition (both of which we subsume into the informal notion of textual entailment).

3 Extraction and Simplification by Semantic Entailment

Semantic entailment is defined as follows: A semantically entails B if and only if for every situation in which A is true, B is also true [12]. This section describes how we extract simplifications from complex sentences by removing adjunct modifiers and discourse connectives and by splitting conjunctions of clauses and verb phrases. These transformations preserve the truth conditions of the original input sentence, while producing more concise sentences from which questions can be generated.
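The definition can be made concrete with a toy model in which a "situation" is an assignment of truth values to atomic propositions. The sketch below only illustrates the definition and is not part of the extraction system; the name `entails`, the atoms `p` and `q`, and the propositional encoding are all assumptions introduced here.

```python
from itertools import product
from typing import Callable, Dict, Sequence

# A situation assigns a truth value to each atomic proposition (toy assumption).
Situation = Dict[str, bool]
Proposition = Callable[[Situation], bool]


def entails(a: Proposition, b: Proposition, atoms: Sequence[str]) -> bool:
    """Return True if b is true in every situation in which a is true."""
    for values in product([False, True], repeat=len(atoms)):
        situation = dict(zip(atoms, values))
        if a(situation) and not b(situation):
            return False  # counterexample: a holds but b does not
    return True


# Hypothetical atoms standing in for two simple statements.
ATOMS = ["p", "q"]


def p_and_q(s: Situation) -> bool:
    return s["p"] and s["q"]


def p_or_q(s: Situation) -> bool:
    return s["p"] or s["q"]


def just_p(s: Situation) -> bool:
    return s["p"]
```

Under this encoding, `entails(p_and_q, just_p, ATOMS)` holds while `entails(p_or_q, just_p, ATOMS)` does not, since the situation with `p` false and `q` true makes the disjunction true but `p` false.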

3.1 Removing Discourse Markers and Adjunct Modifiers

Many sentences can be simplified by removing certain adjunct modifiers from clauses, verb phrases, and noun phrases. For example, from Ex. 7, we can extract Ex. 8 by removing the discourse marker "however" and the relative clause "which restricted trade with Europe".

(7) However, Jefferson did not believe the Embargo Act, which restricted trade with Europe, would hurt the American economy.
(8) Jefferson did not believe the Embargo Act would hurt the American economy.

Ex. 8 is true in all situations where Ex. 7 is true, and is therefore semantically entailed. Discourse markers such as "however" do not affect truth conditions in and of themselves, but rather serve to inform a reader about how the current sentence relates to the preceding discourse. Adjunct modifiers do convey meaning, but again do not affect semantic entailment. Of course, many adjuncts provide useful information that we should preserve for later QG steps. For example, prepositional phrases that identify the locations and times at which events occurred can be the target of "where" and "when" questions, respectively.
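The step from Ex. 7 to Ex. 8 can be approximated with surface patterns. The following is a minimal sketch, not the algorithm of Section 5: it works on raw strings rather than on syntactic structure, the marker list `DISCOURSE_MARKERS` is an illustrative assumption, and the function names are hypothetical. It would misfire on many real sentences (for instance, relative clauses that themselves contain commas).

```python
import re

# Illustrative, non-exhaustive list of discourse markers (an assumption).
DISCOURSE_MARKERS = ("however", "moreover", "therefore", "thus")


def remove_discourse_marker(sentence: str) -> str:
    """Drop a sentence-initial discourse marker that is followed by a comma."""
    pattern = r"^(?:%s),\s+" % "|".join(DISCOURSE_MARKERS)
    stripped = re.sub(pattern, "", sentence, flags=re.IGNORECASE)
    # Re-capitalize the first letter in case a marker was removed.
    return stripped[:1].upper() + stripped[1:]


def remove_nonrestrictive_relative_clause(sentence: str) -> str:
    """Drop a comma-delimited clause introduced by 'which' or 'who'."""
    return re.sub(r",\s+(?:which|who)\b[^,]*,", "", sentence)


def simplify(sentence: str) -> str:
    """Apply both removals in sequence."""
    return remove_nonrestrictive_relative_clause(remove_discourse_marker(sentence))


EX7 = ("However, Jefferson did not believe the Embargo Act, "
       "which restricted trade with Europe, would hurt the American economy.")
EX8 = "Jefferson did not believe the Embargo Act would hurt the American economy."
```

Calling `simplify(EX7)` returns the string in `EX8`. A sentence without a marker or a comma-delimited relative clause passes through unchanged.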

Our algorithm (see Section 5) extracts simplified, entailed statements by removing the following adjuncts and discourse markers:

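We only remove verb phrase modifiers that are offset by commas (e.g., we simplify "When I talked to him, John was busy." but not "John was busy when I talked to him."). Whether to remove other adjunct verbal modifiers is a challenging question. It may be useful for QG to drop other adjunct modifiers from very long sentences, but deciding which to drop and which to keep is left to future work.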

3.2 Splitting Conjunctions

We also split conjunctions of clauses and verb phrases, as in Ex. 5 and Ex. 6 above. In most cases, the conjuncts in these conjunctions are entailed by the original sentence. For example, given "John studied on Monday but went to the park on Tuesday", both "John studied on Monday" and "John went to the park on Tuesday" are entailed. Exceptions where conjuncts are not entailed include the following: conjunctions with "or" and "nor", which we do not split; and conjunctions within downward monotone contexts such as negations or non-factive verbs (see [14, Chapter 6] for a discussion of this phenomenon), which we do split (we leave proper handling of this relatively rare case to future work).
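The splitting rule can be sketched as follows, assuming the sentence has already been analyzed into a subject, a list of conjoined verb phrases, and the conjunction that joins them. That pre-analyzed input, the function name `split_vp_conjunction`, and the set `NON_SPLITTABLE` are assumptions made for illustration. The sketch does not detect downward monotone contexts.

```python
from typing import List

# Conjunctions whose conjuncts are not entailed, so the sentence is left unsplit.
NON_SPLITTABLE = {"or", "nor"}


def split_vp_conjunction(subject: str, verb_phrases: List[str], conjunction: str) -> List[str]:
    """Return one sentence per conjunct, or the unsplit sentence for 'or'/'nor'."""
    if conjunction.lower() in NON_SPLITTABLE:
        joined = f" {conjunction} ".join(verb_phrases)
        return [f"{subject} {joined}."]
    # Each conjunct becomes its own sentence with the shared subject.
    return [f"{subject} {vp}." for vp in verb_phrases]


VPS = ["studied on Monday", "went to the park on Tuesday"]
```

With these inputs, `split_vp_conjunction("John", VPS, "but")` yields the two entailed sentences from the example above, while passing `"or"` returns the single unsplit sentence.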

 

Conclusion

In this paper, we presented an algorithm for extracting simplified declarative sentences from syntactically complex sentences. We motivated our extraction approach by identifying semantic and pragmatic issues that are relevant for QG. We evaluated our approach and showed that it is more suitable for extraction of well-formed entailments than a standard text compression baseline. Our system can be integrated into existing QG frameworks. While we leave formal, extrinsic evaluations within a QG system to future work, we end by reporting that we have incorporated our approach into the QG system described in [8,9]. The integrated QG system successfully simplifies sentences and transforms them into questions. For example, from Ex. 1 in the introduction, it produces concise questions such as the following:

 

Questions:

(11) What did Prime Minister Vladimir V. Putin return to Moscow to oversee?
(12) Who cut short a trip to Siberia?
(13) Who was the country’s paramount leader?
(14) Who built his reputation in part on his success at suppressing terrorism?
