Paper Reading [TPAMI-2022] Plenty is Plague: Fine-Grained Learning for Visual Question Answering

Paper search (studyai.com)

Search for the paper: Plenty is Plague: Fine-Grained Learning for Visual Question Answering

Search for the paper: http://www.studyai.com/search/whole-site/?q=Plenty+is+Plague:+Fine-Grained+Learning+for+Visual+Question+Answering

Keywords

Training; Visualization; Knowledge discovery; Redundancy; Data models; Feature extraction; Training data; Fine-grained learning; visual question answering

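Machine learning; Computer vision; Natural language processing; Reinforcement learning

Actor-Critic; Fine-grained vision; NLP question answering; Visual question answering; Noisy labels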

Abstract

Visual Question Answering (VQA) has attracted extensive research focus recently.

Along with the ever-increasing data scale and model complexity, the enormous training cost has become an emerging challenge for VQA.

In this article, we show that such a massive training cost is indeed a plague.

In contrast, a fine-grained design of the learning paradigm can be extremely beneficial in terms of both training efficiency and model accuracy.

In particular, we argue that there exist two essential and unexplored issues in the existing VQA training paradigm that randomly samples data in each epoch, namely, the “difficulty diversity” and the “label redundancy”.

Concretely, “difficulty diversity” refers to the varying difficulty levels of different question types, while “label redundancy” refers to the redundant and noisy labels contained in each individual question type.

To tackle these two issues, in this article we propose a fine-grained VQA learning paradigm with an actor-critic based learning agent, termed FG-A1C.

Instead of using all training data from scratch, FG-A1C includes a learning agent that adaptively and intelligently schedules the most difficult question types in each training epoch.

Subsequently, two curriculum learning based schemes are further designed to identify the most useful data to be learned within each individual question type.

We conduct extensive experiments on the VQA2.0 and VQA-CP v2 datasets, which demonstrate the significant benefits of our approach.

For instance, on VQA-CP v2, with less than 75 percent of the training data, our learning paradigms can help the model achieve better performance than using the whole dataset.

Meanwhile, we also show the effectiveness of our method in guiding data labeling.

Finally, the proposed paradigm can be seamlessly integrated with any cutting-edge VQA models, without modifying their structures.
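To make the two ideas in the abstract more concrete, here is a minimal Python sketch of one training-epoch selection step. It is not the FG-A1C method: the learned actor-critic agent is replaced by a plain greedy rule that picks the question types with the highest mean loss, and the two curriculum schemes (which the abstract does not describe) are replaced by a generic easy-first filter. The names `pick_hard_types`, `curriculum_subset`, `build_epoch`, the `loss` field, and the parameters `k` and `keep_ratio` are all hypothetical, and treating low-loss samples as the "most useful" ones is an assumption made only for illustration.

```python
from collections import defaultdict


def pick_hard_types(type_loss, k):
    """Greedy stand-in for the learned agent: the k types with highest mean loss."""
    return sorted(type_loss, key=type_loss.get, reverse=True)[:k]


def curriculum_subset(samples, keep_ratio):
    """Easy-first filter (an assumption): keep the lowest-loss fraction of samples."""
    ranked = sorted(samples, key=lambda s: s["loss"])
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]


def build_epoch(dataset, k, keep_ratio):
    """Select hard question types, then a reduced subset inside each type."""
    by_type = defaultdict(list)
    for sample in dataset:
        by_type[sample["qtype"]].append(sample)

    # Mean per-type loss serves as a simple difficulty score.
    type_loss = {
        qtype: sum(s["loss"] for s in group) / len(group)
        for qtype, group in by_type.items()
    }

    chosen = pick_hard_types(type_loss, k)
    epoch = []
    for qtype in chosen:
        epoch.extend(curriculum_subset(by_type[qtype], keep_ratio))
    return chosen, epoch


# Toy data with made-up loss values, only to exercise the functions.
dataset = [
    {"id": 0, "qtype": "yes/no", "loss": 0.2},
    {"id": 1, "qtype": "yes/no", "loss": 0.4},
    {"id": 2, "qtype": "count", "loss": 1.5},
    {"id": 3, "qtype": "count", "loss": 0.9},
    {"id": 4, "qtype": "count", "loss": 2.1},
    {"id": 5, "qtype": "color", "loss": 0.7},
    {"id": 6, "qtype": "color", "loss": 1.1},
]

chosen, epoch = build_epoch(dataset, k=2, keep_ratio=0.5)
print(chosen)
print([s["id"] for s in epoch])
```

In this toy run only a fraction of the samples reaches the model in the epoch, which mirrors the abstract's point that training on a reduced, well-chosen subset can replace training on the whole dataset. A real scheduler would update the difficulty scores as training progresses rather than use fixed losses.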

Authors

Yiyi Zhou, Rongrong Ji, Xiaoshuai Sun, Jinsong Su, Deyu Meng, Yue Gao, Chunhua Shen
