Text Generation from Knowledge Graphs with Graph Transformers

A summary of the structure

This 2019 paper is a bit of an anachronism, given the speed of transformer model development and the world-changing impact of GPT-3. Yet, as someone with a cognitive science background, I enjoy models which, by their structure rather than raw computational power, could help to peel back the curtain on cognitive processes. This model is an attempt to solve the problem of representing long-term dependencies.

Language generation broadly consists of two components: planning and realisation. Realisation is purely the task of generating grammatical text, regardless of its meaning, its relation to other text, or its overall sense. Planning is the process of ensuring that long-term dependencies between entities in the text are resolved, and that items relate to each other semantically in the proper way. In the following phrase, the key entities are John, London, England, and bartender.

Planning is managing those entities and the relations between them, and realisation is generating the phrase “John, who works as a bartender, lives in London, the capital of England” (Moryossef, 2019).

GraphWriter generates an abstract from the words in the title and the constructed knowledge graph. Contributions of this paper include:

  • A new graph transformer encoder that applies the sequence transformer to graph-structured inputs
  • A demonstration of how IE output can be transformed into a connected, unlabeled graph for use in attention-based encoders
  • A dataset of knowledge graphs paired with scientific texts for further study

Before the input goes into the encoder (more on that later), it has to be arranged in the right way. Input for this model goes in two channels: the title, and a knowledge graph of the entities and relations.

Dataset

For this, the AGENDA dataset was introduced, based on 40k paper titles and abstracts taken from the top 12 AI conferences in the Semantic Scholar Corpus (Ammar et al., 2018). After the graph creation and preprocessing below, the dataset was fed to the model. The full repo, including the dataset, is available here.

Graph Pre-Processing:

Creation of a knowledge graph:

  1. The NER/IE SciIE system of Luan et al. (2017) is applied, which extracts entities and labels them, as well as creating coreference annotations
  2. These annotations are then collapsed into single labelled edges
[Figure: Koncel-Kedziorski (Allen Institute), 2019]

3. Each graph is then converted to a connected graph by adding a global node to which all other nodes are connected.

4. Each labelled edge is then replaced with two nodes representing the relation in each direction, and the new edges are unlabeled. This allows the graph to be represented as an adjacency matrix, which is a necessary precondition for easy processing.

[Figure: Koncel-Kedziorski (Allen Institute), 2019]

One of the key features in the formation of this graph is the addition of a global node G which all entities are connected to, which transforms the disconnected graphs into connected graphs. Thus, the end product is a connected, unlabelled, bipartite graph.
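
To make the preprocessing concrete, here is a minimal sketch of steps 3 and 4 under my own naming: each labelled edge is split into two unlabeled relation nodes (one per direction), a global node is attached to every entity, and the result is returned as an adjacency matrix. This is an illustration of the description above, not the released implementation.

```python
import numpy as np

def build_bipartite_graph(entities, triples):
    """Convert labelled triples (head, relation, tail) into a connected,
    unlabelled bipartite graph with a global node, returned as an
    adjacency matrix. Function and variable names are illustrative only."""
    nodes = list(entities) + ["<GLOBAL>"]
    edges = []

    # Step 4: replace each labelled edge with two relation nodes,
    # one per direction, joined to the entities by unlabeled edges.
    for head, rel, tail in triples:
        fwd = f"{rel}-fwd-{len(nodes)}"
        bwd = f"{rel}-bwd-{len(nodes) + 1}"
        nodes += [fwd, bwd]
        edges += [(head, fwd), (fwd, tail), (tail, bwd), (bwd, head)]

    # Step 3: connect the global node to every entity so the graph is connected.
    edges += [("<GLOBAL>", e) for e in entities]

    index = {name: i for i, name in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
    for a, b in edges:
        adj[index[a], index[b]] = 1.0
        adj[index[b], index[a]] = 1.0
    return nodes, adj

# Example: two entities and one labelled relation.
nodes, adj = build_bipartite_graph(["john", "london"], [("john", "lives-in", "london")])
```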

模型架构: (Model Architecture:)

This model uses an encoder-decoder architecture, with the unlabeled bipartite graph and the title as inputs.

[Figure: Koncel-Kedziorski (Allen Institute), 2019]

Title encoder: The title is encoded with a BiLSTM, using 512-dimensional embeddings. No pre-trained word embeddings were used for this.
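
As a rough illustration of this component (my own sketch; the hidden size and other details are assumptions, not taken from the paper):

```python
import torch.nn as nn

class TitleEncoder(nn.Module):
    """Bidirectional LSTM over title tokens, trained from scratch
    (no pre-trained embeddings). Dimensions here are assumptions."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, title_ids):
        # title_ids: (batch, title_len) integer token ids
        emb = self.embed(title_ids)
        outputs, _ = self.bilstm(emb)   # (batch, title_len, 2 * hidden_dim)
        return outputs
```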

Graph encoder:

First, each vertex representation vi is contextualized by attending to all the other vertices in vi's neighbourhood.

The graph encoder then creates a set of vertex embeddings by concatenating the attention-weighted outputs of N attention heads.
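
A simplified sketch of this neighbourhood attention (my own reading of the description; the real model routes relation nodes and the global node through the same mechanism): each head attends only to a vertex's neighbours, as given by the adjacency matrix, and the heads' outputs are concatenated.

```python
import torch
import torch.nn.functional as F

def graph_attention_head(V, adj, Wq, Wk, Wv):
    """One attention head restricted to graph neighbourhoods.

    V:   (num_nodes, d) vertex representations
    adj: (num_nodes, num_nodes) 0/1 adjacency matrix
    Wq, Wk, Wv: (d, d_head) projection matrices for this head
    """
    Q, K, Vh = V @ Wq, V @ Wk, V @ Wv
    scores = (Q @ K.T) / (K.shape[-1] ** 0.5)
    # Mask non-neighbours so each vertex attends only to its neighbourhood.
    scores = scores.masked_fill(adj == 0, float("-inf"))
    alpha = F.softmax(scores, dim=-1)
    return alpha @ Vh

def multi_head_graph_attention(V, adj, heads):
    """Concatenate the outputs of N heads into new vertex embeddings."""
    return torch.cat([graph_attention_head(V, adj, *h) for h in heads], dim=-1)
```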

These embeddings are then augmented with "block" networks, consisting of multiple stacked blocks of two layers each.
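
The exact block equation appears only as an image in the original post; a plausible transformer-style reading (an assumption on my part) is a two-layer feed-forward network with a residual connection and layer normalization:

```python
import torch.nn as nn

class Block(nn.Module):
    """One stacked 'block': a two-layer feed-forward network over each vertex
    embedding with a residual connection. This form is an assumption; the
    original equation is not reproduced in the text."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, v):
        return self.norm(v + self.ff(v))
```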

The end result is a list of entities, relations, and their context with the global node, called graph contextualized vertex encodings.

Decoder (graph and title)

The decoder is attention-based, with a copy mechanism for copying input from the entities in the knowledge graph and from words in the title. It also uses a hidden state ht at each timestep t.

From the encodings created by the encoder, context vectors Cg and Cs are computed using multi-headed attention.

Here, Cs (title context vector) is computed the same way as Cg. The two are concatenated together to make Ct, the total context vector.

The probability of copying a token from a word in the title or an entity name is then computed from the total context vector multiplied by Wcopy.

The final next-token probability distribution combines copying and generating, weighted by this copy probability.
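
Putting the decoding step together, here is a hedged sketch of the copy switch as described above. The names (w_copy, the candidate scores) and the exact way the two distributions are combined are my reading of the description, not the authors' code:

```python
import torch

def next_token_distribution(c_g, c_s, w_copy, copy_scores, vocab_logits):
    """One decoding step of a copy mechanism (illustrative sketch).

    c_g, c_s:     graph and title context vectors from multi-head attention
    w_copy:       parameter vector for the copy probability
    copy_scores:  scores over copy candidates (entities and title words)
    vocab_logits: scores over the output vocabulary (generation)
    """
    c_t = torch.cat([c_g, c_s], dim=-1)             # total context vector
    p_copy = torch.sigmoid(w_copy @ c_t)            # probability of copying
    copy_dist = torch.softmax(copy_scores, dim=-1)  # over copy candidates
    gen_dist = torch.softmax(vocab_logits, dim=-1)  # over the vocabulary
    # Mix copying and generating, weighted by the copy probability.
    return torch.cat([p_copy * copy_dist, (1 - p_copy) * gen_dist], dim=-1)
```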

Experiments

GraphWriter is compared to GAT (the graph transformer replaced with graph attention), EntityWriter (like GraphWriter, but without using the graph structure), and Rewriter (Wang et al., 2018), which only uses title encodings.

Human and automatic evaluation metrics used:

  • Human evaluation: domain experts judging abstracts for whether they fit the title, using Best-Worst scaling (Louviere and Woodworth, 1991)
  • BLEU (Papineni et al., 2002); a minimal computation example follows this list
  • METEOR (Denkowski and Lavie, 2014): precision and recall over the unigram frequencies of the generated output versus the original abstracts
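
For reference, BLEU can be computed with NLTK; a minimal, illustrative example (the paper's own evaluation scripts may differ) using truncated versions of the two abstracts quoted later in this post:

```python
from nltk.translate.bleu_score import corpus_bleu

reference = "we present a learning architecture for lexical semantic classification problems".split()
hypothesis = "in this paper we present a novel learning architecture for lexical semantic classification problems".split()

# corpus_bleu takes a list of reference-lists and a list of hypotheses.
score = corpus_bleu([[reference]], [hypothesis])
print(f"BLEU: {score:.3f}")
```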

Automatic evaluation:

[Table of automatic evaluation results; shown as an image in the original post]

Human evaluation:

[Table of human evaluation results; shown as an image in the original post]

There was a particular thing about this paper that I found mildly disappointing: the ‘subject matter experts’ were undergrad CS students. While undergrads can know a lot by the time they graduate, I don’t know what familiarity they typically had with academic papers in computer science.

What I thought to be most telling about the human judgments of abstract quality was that while the human output wasn’t always deemed to be the best, it was never the worst. That led me to suspect that there was something about the generated abstracts that the relatively unseasoned undergrads couldn’t quite put their fingers on, but nonetheless tipped them off that something was a bit amiss.

The question, after looking at the decoding mechanism used for generation, is: how much was ultimately copied versus generated? For this, I emailed the author, Rik Koncel-Kedziorski, for copies of the generated abstracts and the original documents, which he graciously provided.

Original abstract:

we present a learning architecture for lexical semantic classification problems that supplements task-specific training data with background data encoding general '' world knowledge '' . the learning architecture compiles knowledge contained in a dictionary-ontology into additional training data , and integrates task-specific and background data through a novel hierarchical learning architecture . experiments on a word sense disambiguation task provide empirical evidence that this '' hierarchical learning architecture '' outperforms a state-of-the-art standard '' flat '' one .

The GraphWriter-generated abstract:

in this paper , we present a novel learning architecture for lexical semantic classification problems , in which a learning architecture can be trained on a large amount of training data . in the proposed learning architecture , a hierarchical learning architecture is learned to select the most informative examples from the source domain to the target domain . the learning architecture is trained in a hierarchical learning architecture , where the background data is learned from the training data in the target domain . the learning architecture is trained on data from the source domain and a target domain in the target domain . experimental results show that the proposed learning architecture is effective in improving the performance of lexical semantic classification problems .

Analysis:

One of the key questions investigated here was whether knowledge helps when it is explicitly encoded, as opposed to something to be absorbed implicitly by a model with many parameters. In this case, it did. Rewriter, an LSTM-based model that just used title words to generate abstracts, performed the worst on every evaluation metric chosen. EntityWriter used more information than GraphWriter and was an effective control, in that it used the same extracted entities but without the context provided by the graph. It performed better than using no knowledge in any form, but was still outperformed by the model that used the context created by the graph.

I think it’s important here not to view GraphWriter as being in direct competition with GPT-3 for plausible human-sounding output; it clearly isn’t, and the non-experts I unscientifically polled had no difficulty telling which abstract was human-written and which clearly was not. But that wasn’t the test, or the goal. The test was whether a system which used structured domain knowledge would do better than one that didn’t, such as EntityWriter, and in that case, it did.

Translated from: https://towardsdatascience.com/text-generation-from-knowledge-graphs-with-graph-transformers-c84156ddd446
