[Paper Notes] Reaching Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

Elffffffff

Last modified: 2022-09-24 11:01:09

Tags: natural language processing

First published: 2022-04-06 16:37:05

Article link: https://blog.csdn.net/elf1110/article/details/123984875

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

Paper link: https://www.microsoft.com/en-us/research/uploads/prod/2021/12/CSQA_KEAR.pdf

Abstract

Most current systems rely on self-attention and the Transformer architecture to improve performance.

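This paper augments the Transformer architecture with an external attention mechanism that combines the input context with external knowledge, so that external information is integrated into the prediction process. The proposed system, Knowledgeable External Attention for commonsense Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark, with an accuracy of 89.4% against a human accuracy of 88.9%.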

Introduction

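As Transformer models have developed, larger models have tended to learn better, especially when paired with large-scale data. Still, many studies show that the understanding and generation abilities of these huge models lag behind those of humans (Bommasani et al., 2021). In addition, their sheer size already raises serious practical challenges in utilization, deployment, interpretation, and environmental impact, so the "scale-up" approach to Transformer-based NLP modeling is being questioned.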

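The self-attention mechanism is meant to let the model analyze the internal structure of its input, and training pushes the model's parameters to absorb and memorize the content and patterns of the training data. When the model receives a new input X, the relevant knowledge stored implicitly in the parameters is activated to help analyze X. This partly explains why larger models pre-trained on more data have a performance advantage.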

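While a Transformer model "looks inward" at its input through self-attention, this paper makes the model "look outward" by supplying it with relevant context and knowledge from various sources. The model then performs self-attention over the input while also computing external attention over the knowledge (see Figure 1).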

(Context and knowledge can usually be stored in non-parametric, symbolic form, e.g., plain text, knowledge graphs, and dictionary entries.)
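To make the "look inward" versus "look outward" distinction concrete, here is a minimal sketch in pure Python. It assumes toy 2-dimensional token vectors and uses them directly as queries and keys (no learned projections, unlike a real Transformer); the function name `split_attention_mass` and all values are hypothetical. It shows that once knowledge tokens are appended to the sequence, ordinary scaled dot-product attention from each input token spreads its weight over both the input tokens (self-attention) and the knowledge tokens (external attention).

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Vector, keys: List[Vector]) -> List[float]:
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def split_attention_mass(
    input_tokens: List[Vector], knowledge_tokens: List[Vector]
) -> List[Tuple[float, float]]:
    """For each input token, return (mass on input, mass on knowledge)."""
    sequence = input_tokens + knowledge_tokens  # text-level concatenation
    n_input = len(input_tokens)
    result = []
    for query in input_tokens:
        weights = attention_weights(query, sequence)
        result.append((sum(weights[:n_input]), sum(weights[n_input:])))
    return result


# Toy vectors: two input tokens and one retrieved knowledge token.
masses = split_attention_mass([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
for self_mass, external_mass in masses:
    print(f"self={self_mass:.3f} external={external_mass:.3f}")
```

Because the knowledge is simply part of the sequence, no new layer is needed: the existing attention computation covers both parts.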

Figure 1: Our proposed method of Knowledgeable External Attention for commonsense Reasoning (KEAR). Related knowledge is retrieved from external sources, e.g., knowledge graph, dictionary and training data, using the input as key and then integrated with the input. While additional external attention layers can be added to the Transformer blocks, we adopt text-level concatenation for external attention, incurring no structural change to the model architecture.

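Given a commonsense question and one candidate choice, knowledge is retrieved from three external sources: a knowledge graph (ConceptNet), a dictionary (Wiktionary), and labeled training data (CommonsenseQA and 16 related QA datasets).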

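The retrieved knowledge is appended directly to the input and fed to the language model, with no change to the underlying architecture.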
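The retrieve-then-concatenate flow described above can be sketched as follows. This is only an illustration under stated assumptions: the three knowledge stores are tiny made-up dictionaries rather than ConceptNet, Wiktionary, or the real training sets; retrieval is plain word matching; and the `[SEP]` separator, the ordering of the pieces, and every function name are hypothetical choices, not the paper's exact input format.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-ins for the three knowledge sources; all entries are invented.
KNOWLEDGE_GRAPH: Dict[Tuple[str, str], str] = {
    ("bird", "sky"): "bird at location sky",
}
DICTIONARY: Dict[str, str] = {
    "sky": "the atmosphere above a given point",
}
TRAINING_DATA: List[Tuple[str, str]] = [
    ("Where do fish swim?", "water"),
    ("Where does a bird fly?", "sky"),
]


def words(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of punctuation-free words."""
    return {w.strip("?.,!").lower() for w in text.split()} - {""}


def retrieve_triples(question: str, choice: str) -> List[str]:
    """Return graph relations linking a question word to a choice word."""
    q, c = words(question), words(choice)
    return [text for (head, tail), text in KNOWLEDGE_GRAPH.items()
            if head in q and tail in c]


def retrieve_definition(choice: str) -> List[str]:
    """Look up the choice in the dictionary, if present."""
    definition = DICTIONARY.get(choice.lower())
    return [f"{choice}: {definition}"] if definition else []


def retrieve_training_example(question: str) -> List[str]:
    """Return the labeled example whose question shares the most words."""
    q = words(question)
    best_question, best_answer = max(
        TRAINING_DATA, key=lambda ex: len(q & words(ex[0])))
    return [f"{best_question} {best_answer}"]


def build_input(question: str, choice: str) -> str:
    """Concatenate question, choice, and retrieved knowledge as plain text."""
    knowledge = (retrieve_triples(question, choice)
                 + retrieve_definition(choice)
                 + retrieve_training_example(question))
    return " [SEP] ".join([question, choice] + knowledge)


text = build_input("Where would a bird fly?", "sky")
print(text)
```

The resulting string would be tokenized and passed to an unmodified language model, once per candidate choice, which is what lets the approach avoid architectural changes.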

Advantages of the proposed method for commonsense reasoning:

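First, external attention greatly reduces the system's dependence on very large models: human parity is reached with models of up to 1.5B parameters. Second, the external information is obtained with computationally cheap methods such as information retrieval and word matching, which add little to the cost of the main model. Finally, text-level concatenation of the input and the knowledge requires no change to the Transformer model, so existing systems can adopt the external attention mechanism easily.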

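Another benefit of external attention is that, because the relevant knowledge is stored outside the model, the knowledge source can easily be updated to change the model's behavior.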
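A small sketch of this last point, under the same kind of assumptions: the dictionary below is an invented stand-in for an external knowledge source, and `build_input` and the `[SEP]` separator are hypothetical. Editing the stored entry changes what the model reads at prediction time, while the model's parameters stay untouched.

```python
from typing import Dict


def build_input(question: str, choice: str, dictionary: Dict[str, str]) -> str:
    """Append the dictionary entry for the choice (if any) to the input text."""
    parts = [question, choice]
    definition = dictionary.get(choice.lower())
    if definition:
        parts.append(f"{choice}: {definition}")
    return " [SEP] ".join(parts)


# Hypothetical knowledge source with an outdated entry.
dictionary = {"tablet": "a flat slab of stone used for writing"}
before = build_input("What do people read news on?", "tablet", dictionary)

# Update the knowledge source only; no retraining is involved.
dictionary["tablet"] = "a portable touchscreen computer"
after = build_input("What do people read news on?", "tablet", dictionary)

print(before)
print(after)
```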