【Visual Question Answering】Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem

Paper title: Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Code: not available
Year: 2022
Venue: TPAMI


Abstract

Several studies have recently pointed out that existing Visual Question Answering (VQA) models heavily suffer from the language prior problem, which refers to capturing superficial statistical correlations between the question type and the answer while ignoring the image contents. Numerous efforts have been dedicated to strengthening the image dependency by creating delicate models or introducing extra visual annotations. However, these methods cannot sufficiently explore how the visual cues explicitly affect the learned answer representation, which is vital for alleviating language reliance. Moreover, they generally emphasize the class-level discrimination of the learned answer representation, which overlooks the more fine-grained instance-level patterns and demands further optimization. In this paper, we propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration, which can better investigate the fine-grained visual effects and mitigate the language prior problem by learning the instance-level characteristics. Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents, based on which the collaborative learning of intra-instance invariance and inter-instance discrimination is implemented by two well-designed discriminators. Besides, we implement an information bottleneck modulator on the latent space for further bias alleviation and representation calibration. We apply our visual perturbation-aware framework to three orthodox baselines, and the experimental results on two diagnostic VQA-CP benchmark datasets clearly demonstrate its effectiveness. In addition, we also justify its robustness on the balanced VQA benchmark.


Background

Existing VQA models tend to exploit superficial shortcuts between the question (i.e., the question type) and the answer. As a result, the role of the visual modality is weakened and VQA degenerates into a pure language-matching problem. Current mainstream approaches alleviate the language prior problem by strengthening the image dependency, and they fall into two categories: methods with visual annotations and annotation-free methods. The former explicitly exploit external visual annotations to guide the learning of visual content, but collecting human annotations is expensive and time-consuming. Consequently, annotation-free methods have become the dominant paradigm.
However, the following limitations remain:
1) Visual augmentation strategies are used to reduce language reliance, but they cannot sufficiently determine how these visual cues affect the learned answer representation.
2) Existing methods usually emphasize class-level discrimination of the answer representation while neglecting finer-grained internal structure, such as inter-instance discrimination and intra-instance invariance, which may lead to inferior performance.

Contributions

This paper proposes a simple yet effective visual perturbation-aware calibration framework to mitigate language reliance in VQA. It is the first attempt to overcome the language prior problem from the perspective of instance-level discriminative feature representation.

  • The framework comprises four components: a mask-based perturbation controller, an information bottleneck modulator, a class-aware discriminator, and a relation-aware discriminator. On top of the original image features, the visual perturbation controller automatically constructs two kinds of curated image features with different perturbation extents. Given the hard-perturbed image features together with the original ones, the class-aware discriminator captures inter-instance discrimination by distinguishing their semantic differences; given the soft-perturbed image features, the relation-aware discriminator learns intra-instance invariant correlations. To make the learned latent representation carry minimal sufficient information and remain free from input bias, a variational information bottleneck modulator is further applied to better facilitate the learning of the two discriminators (see the sketch after this list).
  • The visual perturbation-aware learning strategy is model-agnostic and can easily be plugged into existing state-of-the-art VQA models to reduce language reliance and improve their reasoning performance.
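Since no official code is released, the sketch below illustrates, in PyTorch, one plausible way to realize the four components named above. The masking ratios, network shapes, and loss forms are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerturbationController(nn.Module):
    """Builds hard- and soft-perturbed copies of the region features by random masking.

    The masking ratios are illustrative assumptions, not the paper's values.
    """
    def __init__(self, hard_ratio: float = 0.7, soft_ratio: float = 0.2):
        super().__init__()
        self.hard_ratio, self.soft_ratio = hard_ratio, soft_ratio

    @staticmethod
    def _mask(v: torch.Tensor, ratio: float) -> torch.Tensor:
        # v: (B, K, D) region features; randomly zero out roughly `ratio` of the K regions
        keep = (torch.rand(v.shape[:2], device=v.device) > ratio).float().unsqueeze(-1)
        return v * keep

    def forward(self, v):
        return self._mask(v, self.hard_ratio), self._mask(v, self.soft_ratio)


class IBModulator(nn.Module):
    """Variational information-bottleneck head applied to the fused latent representation."""
    def __init__(self, dim: int):
        super().__init__()
        self.mu, self.logvar = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)            # reparameterisation
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()  # KL to N(0, I)
        return z, kl


class ClassAwareDiscriminator(nn.Module):
    """Separates original from hard-perturbed representations (inter-instance discrimination)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, z_orig, z_hard):
        logits = torch.cat([self.net(z_orig), self.net(z_hard)]).squeeze(-1)
        labels = torch.cat([torch.ones(len(z_orig)), torch.zeros(len(z_hard))]).to(logits.device)
        return F.binary_cross_entropy_with_logits(logits, labels)


def relation_aware_loss(z_orig, z_soft):
    """Intra-instance invariance: a soft perturbation should barely change the representation."""
    return 1.0 - F.cosine_similarity(z_orig, z_soft, dim=-1).mean()
```

During training, the base VQA model would produce fused answer representations for the original, hard-perturbed, and soft-perturbed visual features; the overall objective would then combine the standard VQA classification loss with the class-aware loss, the relation-aware loss, and the KL term of the bottleneck, weighted by trade-off hyperparameters.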

VQA task definition:

Given a batch of data samples consisting of B triplets of an image $V_i$, a question $Q_i$, and a ground-truth answer set $A_i$, denoted as $B = \{V_i, Q_i, A_i\}_{i=1}^{B}$, a VQA model aims to learn a mapping function $H_{vqa}$ that produces accurate answers. It typically involves three parts: the visual and textual encoders, the base VQA model, and the answer classifier.

As the visual and textual encoders, a pre-trained Faster R-CNN model $U_v$ encodes each image $v_i$ into a visual embedding matrix $V_i = U_v(v_i)$, and the textual encoder maps each question into its question embedding in an analogous way.
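To make the three-part pipeline above concrete, here is a minimal, self-contained PyTorch sketch of a generic $H_{vqa}$ with a simple question-guided attention fusion (in the spirit of UpDn-style backbones). The encoder choices, hidden sizes, and answer-vocabulary size are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn


class GenericVQA(nn.Module):
    """Schematic H_vqa: fuse pre-extracted region features with an encoded question,
    then classify over the answer vocabulary. All dimensions are placeholders."""
    def __init__(self, v_dim=2048, q_dim=1024, hid=1024, num_answers=3129, vocab=20000):
        super().__init__()
        self.embed = nn.Embedding(vocab, 300)                     # word embeddings
        self.q_enc = nn.GRU(300, q_dim, batch_first=True)         # textual encoder
        self.v_proj = nn.Linear(v_dim, hid)
        self.q_proj = nn.Linear(q_dim, hid)
        self.classifier = nn.Sequential(nn.Linear(hid, hid), nn.ReLU(),
                                        nn.Linear(hid, num_answers))

    def forward(self, V, q_tokens):
        # V: (B, K, v_dim) Faster R-CNN region features, i.e. V_i = U_v(v_i)
        _, q = self.q_enc(self.embed(q_tokens))                   # q: (1, B, q_dim)
        q = self.q_proj(q.squeeze(0))                             # (B, hid)
        v = self.v_proj(V)                                        # (B, K, hid)
        att = torch.softmax((v * q.unsqueeze(1)).sum(-1), dim=1)  # question-guided attention
        fused = (att.unsqueeze(-1) * v).sum(1) * q                # joint representation
        return self.classifier(fused)                             # answer logits
```

Calling `GenericVQA()(V, q_tokens)` with `V` of shape (B, 36, 2048) and `q_tokens` of shape (B, T) returns a (B, num_answers) score matrix; the perturbation-aware components sketched earlier would then be attached on top of the fused representation.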
