Attention on Attention: Architectures for Visual Question Answering (notes on the paper "Attention on Attention: Architectures for VQA")


Tiám青年



Categories: VQA, Computer Vision

Article link: https://blog.csdn.net/xiasli123/article/details/103938396


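Summary: this post walks through a visual question answering paper that proposes a new framework built on thirteen attention mechanisms and a simplified classifier. The model reaches 64.78% accuracy on the validation set, above the previous 63.15%. The post covers the network architecture, the experimental analysis, and the conclusions, and is meant as a guide to how VQA models can be optimized.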


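This is one post in a series of reading notes on visual question answering papers. It is fairly long, so please bear with it; you should come away with something useful. If you spot any shortcomings, feedback and discussion are always welcome.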

I. Paper Abstract

Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring coordination of natural language processing and computer vision modules into a single architecture. We build upon the model which placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. We performed 300 GPU hours of extensive hyperparameter and architecture searches and were able to achieve an evaluation score of 64.78%, outperforming the existing state-of-the-art single model’s validation score of 63.15%.

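In short, the authors observe that visual question answering (VQA) is an increasingly popular topic in deep learning research, one that requires combining natural language processing and computer vision modules in a single architecture. They build on the model that placed first in the VQA Challenge, developing thirteen new attention mechanisms and introducing a simplified classifier. After about 300 GPU hours of hyperparameter and architecture search, they report an evaluation score of 64.78%, exceeding the 63.15% validation score of the previous state-of-the-art single model.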