Paper Reading [TPAMI-2022] Plenty is Plague: Fine-Grained Learning for Visual Question Answering

北岭狼人

Tags: TPAMI, computer vision, machine learning, deep learning, artificial intelligence

Article link: https://blog.csdn.net/weixin_42155685/article/details/124053264

Paper search (studyai.com)

Search for the paper: Plenty is Plague: Fine-Grained Learning for Visual Question Answering

Search link: http://www.studyai.com/search/whole-site/?q=Plenty+is+Plague:+Fine-Grained+Learning+for+Visual+Question+Answering

Keywords

Training; Visualization; Knowledge discovery; Redundancy; Data models; Feature extraction; Training data; Fine-grained learning; visual question answering

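Machine learning; machine vision; natural language processing; reinforcement learning

Actor-Critic; fine-grained vision; NLP question answering; visual question answering; noisy labels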

Abstract

Visual Question Answering (VQA) has attracted extensive research focus recently.

Along with the ever-increasing data scale and model complexity, the enormous training cost has become an emerging challenge for VQA.

In this article, we show that such a massive training cost is indeed a plague.

In contrast, a fine-grained design of the learning paradigm can be extremely beneficial in terms of both training efficiency and model accuracy.

In particular, we argue that there exist two essential and unexplored issues in the existing VQA training paradigm that randomly samples data in each epoch, namely, the “difficulty diversity” and the “label redundancy”.

Concretely, “difficulty diversity” refers to the varying difficulty levels of different question types, while “label redundancy” refers to the redundant and noisy labels contained in individual question types.

To tackle these two issues, in this article we propose a fine-grained VQA learning paradigm with an actor-critic based learning agent, termed FG-A1C.

Instead of using all training data from scratch, FG-A1C includes a learning agent that adaptively and intelligently schedules the most difficult question types in each training epoch.

Subsequently, two curriculum-learning-based schemes are further designed to identify the most useful data to be learned within each individual question type.

We conduct extensive experiments on the VQA2.0 and VQA-CP v2 datasets, which demonstrate the significant benefits of our approach.

For instance, on VQA-CP v2, with less than 75 percent of the training data, our learning paradigms can help the model achieve better performance than using the whole dataset.

Meanwhile, we also show the effectiveness of our method in guiding data labeling.

Finally, the proposed paradigm can be seamlessly integrated with any cutting-edge VQA model without modifying its structure.
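
To make the two ideas in the abstract more concrete, here is a minimal Python sketch of selecting training data by question type and then by sample. It is not FG-A1C itself: the actor-critic agent is replaced by a plain top-k heuristic, the use of loss as a difficulty score is an assumption, and the function names `schedule_question_types` and `select_samples` are hypothetical.

```python
def schedule_question_types(type_losses, k):
    """Pick the k question types with the highest recent loss.

    Assumption: average loss per question type stands in for "difficulty".
    The paper uses a learned actor-critic agent instead of this heuristic.
    """
    return sorted(type_losses, key=type_losses.get, reverse=True)[:k]


def select_samples(samples, scores, keep_ratio):
    """Within one question type, keep the top keep_ratio fraction by score.

    Assumption: a higher score means a more useful sample; the abstract does
    not specify how usefulness is measured.
    """
    n_keep = max(1, int(len(samples) * keep_ratio))
    ranked = sorted(zip(samples, scores), key=lambda pair: pair[1], reverse=True)
    return [sample for sample, _ in ranked[:n_keep]]


# Toy usage with made-up numbers (not results from the paper).
type_losses = {"yes/no": 0.2, "number": 0.9, "other": 0.6}
chosen_types = schedule_question_types(type_losses, k=2)
subset = select_samples(["a", "b", "c", "d"], [0.1, 0.8, 0.5, 0.3], keep_ratio=0.5)
```

In each epoch, a training loop would call the first function to choose question types and the second to trim the data inside each chosen type, which is how a subset smaller than the full dataset could be assembled.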

Authors

Yiyi Zhou, Rongrong Ji, Xiaoshuai Sun, Jinsong Su, Deyu Meng, Yue Gao, Chunhua Shen
