Day 121 of #NLP365: NLP Papers Summary - Concept Pointer Network for Abstractive Summarization

INSIDE AI NLP365

Project #NLP365 (+1) is where I document my NLP learning journey every single day in 2020. Feel free to check out what I have been learning over the last 262 days here. At the end of this article, you can find previous paper summaries grouped by NLP area :)

Today’s NLP paper is Concept Pointer Network for Abstractive Summarization. Below are the key takeaways of the research paper.

Objective and Contribution

The paper proposes the Concept Pointer Network for abstractive summarisation, which uses knowledge-based, context-aware conceptualisation to derive a set of candidate concepts. When generating a summary, the model chooses between this concept set and the original source text. The generated summaries were assessed with both automatic and human evaluation.

The proposed Concept Pointer Network doesn’t simply copy text from the source document; it also generates new abstract concepts from human knowledge, as shown below:

Figure: The Concept Pointer Network Framework [1]

On top of our novel model architecture, we also proposed a distantly supervised learning technique that allows our model to adapt to different datasets. Both automatic and human evaluation showed strong improvements over SOTA baselines.

The Proposed Model

Our model architecture consists of two modules:

  1. Encoder-Decoder
  2. Concept Pointer Generator

Figure: The architecture of Concept Pointer Generator [1]

Encoder-Decoder

The encoder-decoder framework consists of a two-layer bidirectional LSTM-RNN encoder and a one-layer LSTM-RNN decoder with an attention mechanism. Each word in the input sequence is represented by the concatenation of its forward and backward hidden states. The context vector is computed by applying the attention mechanism over these hidden state representations. This context vector is fed to our decoder, which uses it to determine the probability of generating a new word (p_gen) from our vocabulary distribution.
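
To make the attention step concrete, here is a minimal sketch of one decoding step in plain Python. It is not the paper's implementation: the dot-product scoring, the toy vectors, and the parameter names `w_context`, `w_state`, and `bias` are all assumptions made for illustration, and a real model would learn these weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_context(encoder_states, decoder_state):
    """Return (attention weights, context vector) for one decoding step.

    Assumption: dot-product scoring is used for brevity; the scoring
    function used in the paper may differ.
    """
    scores = [dot(h, decoder_state) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

def generation_probability(context, decoder_state, w_context, w_state, bias):
    """p_gen as a sigmoid over a linear function of context and decoder state."""
    z = dot(w_context, context) + dot(w_state, decoder_state) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy input: three source words, each already represented by its
# concatenated forward/backward hidden state (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, 0.5]

attn, ctx = attention_context(encoder_states, decoder_state)
p_gen = generation_probability(ctx, decoder_state, [0.1, -0.2], [0.3, 0.3], 0.0)
```

Here `attn` is the attention distribution over source positions, `ctx` is the context vector, and `p_gen` is the probability of generating from the vocabulary rather than copying.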

Concept Pointer Generator

Firstly, we use the Microsoft Concept Graph to map a word to its related concepts. This knowledge base covers a huge concept space, and the relationships between concepts and entities are probabilistic, depending on how strongly related they are. Essentially, the concept graph takes in a word and estimates the probability that the word belongs to a particular concept, p(c|x). This means that for each word, the concept graph returns a set of concept candidates, each with a different confidence level, that it believes the word belongs to. To help our model select the right concept candidate, for example distinguishing between the fruit and company concepts for the word “apple”, we use the context vector from the encoder-decoder framework.

We use the context vector to update the concept distribution. We compute the updated weights by feeding the current hidden state, the context vector, and the current concept candidate into a softmax classifier. This updated weight is then added to the existing concept probability to factor in the context of the input sequence, which gives us the context-aware concept probability.
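
The update described above can be sketched for the “apple” example as follows. This is only an illustration under stated assumptions: the prior values for p(c|x), the two-dimensional concept vectors, and the dot-product scorer standing in for the learned softmax classifier are all hypothetical, and any further squashing or normalisation of the summed score is left out because it is not described here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_aware_concept_scores(prior, concept_vectors, hidden_state, context):
    """Add a context-dependent softmax weight to each prior p(c|x).

    Assumption: each candidate is scored by the dot product between its
    vector and (hidden_state + context); the real classifier is learned.
    """
    concepts = list(prior)
    query = [h + c for h, c in zip(hidden_state, context)]
    update = softmax([dot(concept_vectors[c], query) for c in concepts])
    return {c: prior[c] + u for c, u in zip(concepts, update)}

# Hypothetical concept-graph output for the word "apple".
prior = {"fruit": 0.6, "company": 0.4}
# Hypothetical concept embeddings and decoder-side vectors.
concept_vectors = {"fruit": [1.0, 0.0], "company": [0.0, 1.0]}
hidden_state = [0.0, 1.0]
context = [0.0, 2.0]  # a context that points towards the "company" direction

scores = context_aware_concept_scores(prior, concept_vectors, hidden_state, context)
best_concept = max(scores, key=scores.get)
```

With these toy numbers the prior alone would favour “fruit”, but the context-dependent weight shifts the choice to “company”, which is the behaviour the update is meant to provide.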

Our concept pointer network consists of the normal pointer to the source document as well as a concept pointer to the relevant concepts given the source document. The concept pointer is scaled element-wise by the attention distribution and added to the normal pointer (the attention distribution). The result is the copy distribution that the model copies from: it includes the concept distribution on top of the usual distribution over the original source text.
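
A rough sketch of how such a copy distribution could be assembled and mixed with the vocabulary distribution is given below. The renormalisation of the copy scores, the mixing by `p_gen` (borrowed from the standard pointer-generator formulation), and all words and numbers are assumptions for illustration rather than the paper's exact equations.

```python
from collections import defaultdict

def copy_distribution(source_words, attention, concepts, concept_probs):
    """Combine the normal pointer and the concept pointer.

    Each source word receives its attention weight; its selected concept
    (if any) receives the attention weight scaled by the concept probability.
    Assumption: the scores are renormalised to sum to 1.
    """
    scores = defaultdict(float)
    for word, a, concept, p_c in zip(source_words, attention, concepts, concept_probs):
        scores[word] += a                 # normal pointer: copy the source word
        if concept is not None:
            scores[concept] += a * p_c    # concept pointer, scaled by attention
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

def final_distribution(p_gen, vocab_dist, copy_dist):
    """Mix generation and copying, as in a standard pointer-generator."""
    words = set(vocab_dist) | set(copy_dist)
    return {w: p_gen * vocab_dist.get(w, 0.0) + (1.0 - p_gen) * copy_dist.get(w, 0.0)
            for w in words}

# Hypothetical source sentence with one selected concept per word (None = no concept).
source_words = ["apple", "unveils", "iphone"]
attention = [0.5, 0.2, 0.3]
concepts = ["company", None, "phone"]
concept_probs = [0.8, 0.0, 0.5]

copy_dist = copy_distribution(source_words, attention, concepts, concept_probs)
vocab_dist = {"company": 0.3, "launches": 0.5, "phone": 0.2}
final = final_distribution(0.4, vocab_dist, copy_dist)
```

Note that “apple” is absent from the toy vocabulary yet still receives probability mass through copying, while “company” can be produced even though it never appears in the source.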

Distant supervision for model adaption

If the summary-document pairs in our training set differ from those in the test set, our model will perform poorly. To counter this, we want to retrain our model so that this dissimilarity is reduced as part of the final loss. To do so, we need labels that indicate how close our training set is to our test set. To create these labels, we use the KL divergence between each training reference summary and a set of documents from the test set. In other words, the training pairs are distantly labelled. The representations of both reference summaries and documents are computed by summing their constituent word embeddings. This KL divergence loss is included in the training process, and it measures the overall distance between the test set and each of our reference summary-document pairs. This allows us to determine whether our training set is relevant or irrelevant for model adaption.
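
The labelling idea can be sketched as follows. The summed-embedding representation follows the description above, but turning each summed vector into a probability distribution with a softmax and averaging the divergence over the test documents are assumptions made so that the KL divergence is well defined; the embeddings and tokens are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sum_embeddings(tokens, embeddings):
    """Represent a text as the sum of its word embeddings (unknown words skipped)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = embeddings.get(token)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distant_label(summary_tokens, test_docs, embeddings):
    """Average KL divergence between a training summary and the test documents.

    Assumption: a softmax maps each summed embedding to a distribution.
    """
    p = softmax(sum_embeddings(summary_tokens, embeddings))
    divergences = [kl_divergence(p, softmax(sum_embeddings(doc, embeddings)))
                   for doc in test_docs]
    return sum(divergences) / len(divergences)

# Hypothetical two-dimensional embeddings.
embeddings = {"stocks": [1.0, 0.0], "fall": [0.5, 0.5],
              "football": [0.0, 1.0], "match": [0.0, 1.5]}
test_docs = [["stocks", "fall"]]

near = distant_label(["stocks", "fall"], test_docs, embeddings)
far = distant_label(["football", "match"], test_docs, embeddings)
```

A training summary that resembles the test documents gets a divergence near zero (`near`), while an unrelated one gets a larger value (`far`), which is the signal a distantly supervised loss term could use.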

Experimental Setup and Results

The model is evaluated on two datasets, Gigaword and DUC-2004, using the ROUGE score as the evaluation metric.
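
For readers unfamiliar with ROUGE, the following is a simplified sketch of ROUGE-N recall (clipped n-gram overlap divided by the number of reference n-grams). It is not the official scoring toolkit: whitespace tokenisation and the example sentences are assumptions, and published results typically also apply further preprocessing and may report F1 rather than recall.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate (clipped)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

reference = "police kill the gunman"
candidate = "police killed the gunman"

rouge_1 = rouge_n_recall(candidate, reference, n=1)  # 3 of 4 unigrams match
rouge_2 = rouge_n_recall(candidate, reference, n=2)  # 1 of 3 bigrams matches
```

RG-2 in the results refers to the bigram variant of this score.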

Models comparison

There are 8 baseline models:

  1. ABS+. Abstractive summarisation model
  2. Luong-NMT. LSTM encoder-decoder
  3. RAS-Elman. CNN for the encoder and RNN with attention for the decoder
  4. Seq2seq+att. BiLSTM encoder and LSTM decoder with attention
  5. Lvt5k-1sent. Uses temporal attention on the decoder to reduce repetition in the summary
  6. SEASS. Uses a selective gate to control the information flowing from encoder to decoder
  7. Pointer-generator. Normal PG
  8. CGU. Uses a convolutional gated unit and self-attention for encoding

Results

Table 1: ROUGE results and comparisons between the Concept Pointer and other benchmark models. Table 2: Out-of-vocabulary problem analysis. Table 3: Measures of abstractiveness [1]

In Table 1, our concept pointer outperformed all the baseline models on all metrics except RG-2 on Gigaword, where CGU scored the highest. In Table 2, we show that the summaries generated by the concept pointer have the lowest percentage of UNK words, which alleviates the OOV problem. In Table 3, we showcase the abstractiveness of our generated summaries: they have a relatively high level of abstractiveness, close to that of the reference summaries.

We experimented with two different training strategies: reinforcement learning (RL) and distant supervision (DS). Applying either training strategy to the concept pointer outperformed the plain concept pointer. Furthermore, on the DUC-2004 dataset, concept pointer + DS consistently outperformed concept pointer + RL, showcasing the effect of distant supervision on model adaption.

Context-aware conceptualisation

We want to measure the impact of the concept update strategy, so we experimented with different numbers of concept candidates. The results are shown below. There is only a small variation in ROUGE scores across the different numbers of concept candidates.

Figure: ROUGE results on Gigaword and DUC-2004 datasets [1]

Human evaluations

We conducted human evaluations where each volunteer has to answer the following questions:

  1. Abstraction: How appropriate are the abstract concepts in the summary?
  2. Overall Quality: How readable, relevant, and informative is the summary?

We randomly selected 20 examples, each with three different summaries (from three models), and scored how often each type of summary was picked. The results are shown below and show that the concept pointer network outperformed both the seq2seq model and the pointer generator. The generated summaries seem fluent and informative; however, they are still not as abstractive as the human reference summaries.

Figure: Human evaluation on abstraction and overall quality [1]

Conclusion and Future Work

On top of our novel model architecture, we also proposed a distantly supervised learning technique that allows our model to adapt to different datasets. Both automatic and human evaluation showed strong improvements over SOTA baselines.

Source:

[1] Wang, W., Gao, Y., Huang, H.Y. and Zhou, Y., 2019, November. Concept pointer network for abstractive summarization. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3067–3076).

Originally published at https://ryanong.co.uk on April 30, 2020.

Aspect Extraction / Aspect-based Sentiment Analysis

Summarisation

Others

Translated from: https://towardsdatascience.com/day-121-of-nlp365-nlp-papers-summary-concept-pointer-network-for-abstractive-summarization-cd55e577f6de
