ICLR 2022 Compositional Attention: Disentangling Search and Retrieval

文三路张同学

Article link: https://blog.csdn.net/qq_36160277/article/details/128631385

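The paper points out a parameter-redundancy problem in multi-head attention, specifically in how the search and retrieval operations are paired. To address it, the authors propose Compositional Attention, which lets search and retrieval be composed more flexibly, improving performance and reducing redundancy. By selecting the value matrix dynamically, Compositional Attention also copes better with OOD generalization tasks.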

Mittal S, Raparthy S C, Rish I, et al. Compositional attention: Disentangling search and retrieval[J]. arXiv preprint arXiv:2110.09419, 2021.

Motivation

The authors argue that multi-head attention suffers from redundant parameters. In the paper's motivating figure, for example, the "retrieve location" operation ends up being learned twice, in two different heads, which makes those parameters redundant.

To address this, the authors propose Compositional Attention, where the search and retrieval operations can be flexibly composed: the key-query search mechanism is no longer bound to a fixed value retrieval matrix; instead, the retrieval is dynamically selected from a shared pool of value matrices accessible by several compositional attention heads. This results in increased flexibility and improved performance.
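To make "dynamically selected from a shared pool of value matrices" more concrete, here is a simplified sketch in plain Python. It assumes the search pattern `attn` has already been computed, pairs it with every value matrix in a pool, and then softly chooses among the resulting candidates with a second softmax. The names `compositional_head`, `value_pool`, `w_rq` and `w_rk` (retrieval-query and retrieval-key projections) are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def compositional_head(x, attn, value_pool, w_rq, w_rk):
    """One search pattern softly paired with a shared pool of value matrices.

    x:          (n x d) token features
    attn:       (n x n) search pattern, assumed already computed from queries/keys
    value_pool: list of (d x d_v) value matrices shared by all heads
    w_rq, w_rk: hypothetical projections used to score each candidate retrieval
    """
    # Candidate retrievals: what this search would read out with each value matrix.
    candidates = [matmul(attn, matmul(x, w_v)) for w_v in value_pool]
    r_query = matmul(x, w_rq)
    d_r = len(w_rq[0])
    outputs = []
    for t in range(len(x)):
        # Score every candidate for token t, then mix them with softmax weights.
        scores = []
        for cand in candidates:
            r_key = matmul([cand[t]], w_rk)[0]
            dot = sum(a * b for a, b in zip(r_query[t], r_key))
            scores.append(dot / math.sqrt(d_r))
        sel = softmax(scores)
        d_v = len(candidates[0][t])
        outputs.append([sum(s * cand[t][c] for s, cand in zip(sel, candidates))
                        for c in range(d_v)])
    return outputs
```

With a pool of size one this reduces to ordinary retrieval through a single value matrix; with a larger pool, every search can reach every retrieval, so the same retrieval no longer has to be learned separately per head.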

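Main contributions

The paper makes four main contributions:

It identifies a shortcoming of multi-head attention.
It proposes Compositional Attention as a remedy.
It shows that the proposed method resolves this shortcoming and also performs well on OOD generalization.
It discusses the computational complexity of Compositional Attention.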

Shortcomings of multi-head attention

Key-Value Attention: Given a set of queries and key-value pairs, key-value attention computes a scaled cosine similarity metric between each query and the set of keys. This similarity score determines the contribution of each value in the output for the corresponding query.
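As a minimal sketch of the definition above, the following plain-Python function scores each query against every key, turns the scores into weights with a softmax, and returns the weighted sum of the values. It uses the scaled dot product from the standard Transformer formulation as the similarity score (the vectors are not length-normalized, so it is not a cosine in the strict sense). The function names and the list-of-lists layout are assumptions chosen for readability.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def key_value_attention(queries, keys, values):
    """Return one output vector per query.

    queries: list of d-dimensional vectors
    keys:    list of d-dimensional vectors, one per value
    values:  list of d_v-dimensional vectors
    """
    d = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # The softmax weights decide how much each value contributes.
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(d_v)])
    return outputs
```

For example, `key_value_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])` puts more weight on the first value, because the query matches the first key more closely.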

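At first glance this seems harmless: multi-head attention also computes the match between queries and keys first and then retrieves information through the values. So why would this lead to redundancy?

To see why, we can first split multi-head attention into two stages: search and retrieval.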
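The two stages can be sketched as two small functions, assuming the usual scaled dot-product form of attention: `search` uses the query and key projections to decide where to look, and `retrieve` uses the value projection to decide what to read out. The helper names and the list-of-lists matrices are assumptions for illustration, not code from the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row of a matrix."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def search(x, w_q, w_k):
    """Search stage: an (n x n) matrix saying which tokens each token attends to."""
    q, k = matmul(x, w_q), matmul(x, w_k)
    d = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    return softmax_rows(scores)

def retrieve(attn, x, w_v):
    """Retrieval stage: read out value features according to the search pattern."""
    return matmul(attn, matmul(x, w_v))

def multi_head(x, heads):
    """Standard multi-head attention (before concatenation).

    heads is a list of (w_q, w_k, w_v) triples: search i is hard-wired to
    retrieval i, so two heads that need the same retrieval must each learn it.
    """
    return [retrieve(search(x, w_q, w_k), x, w_v) for (w_q, w_k, w_v) in heads]
```

Written this way, the rigid pairing is visible in `multi_head`: each `(w_q, w_k)` search can only ever be combined with the `w_v` in its own triple.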