VQA Paper Collection


Awesome Text VQA

Text-related VQA is a fine-grained direction of the VQA task that focuses only on questions requiring the model to read the textual content shown in the input image.

Datasets

| Dataset | #Train+Val Img | #Train+Val Que | #Test Img | #Test Que | Image Source | Language |
|---|---|---|---|---|---|---|
| Text-VQA | 25,119 | 39,602 | 3,353 | 5,734 | [1] | EN |
| ST-VQA | 19,027 | 26,308 | 2,993 | 4,163 | [2, 3, 4, 5, 6, 7, 8] | EN |
| OCR-VQA | 186,775 | 901,717 | 20,797 | 100,429 | [9] | EN |
| EST-VQA | 17,047 | 19,362 | 4,000 | 4,525 | [4, 5, 8, 10, 11, 12, 13] | EN+CH |
| DOC-VQA | 11,480 | 44,812 | 1,287 | 5,188 | [14] | EN |
| VisualMRC | 7,960 | 23,854 | 2,237 | 6,708 | self-collected webpage screenshots | EN |

Image Source:

[1] OpenImages: A public dataset for large-scale multi-label and multi-class image classification (v3) [dataset]

[2] Imagenet: A large-scale hierarchical image database [dataset]

[3] Vizwiz grand challenge: Answering visual questions from blind people [dataset]

[4] ICDAR 2013 robust reading competition [dataset]

[5] ICDAR 2015 competition on robust reading [dataset]

[6] Visual Genome: Connecting language and vision using crowdsourced dense image annotations [dataset]

[7] Image retrieval using textual cues [dataset]

[8] Coco-text: Dataset and benchmark for text detection and recognition in natural images [dataset]

[9] Judging a book by its cover [dataset]

[10] Total Text [dataset]

[11] SCUT-CTW1500 [dataset]

[12] MLT [dataset]

[13] Chinese Street View Text [dataset]

[14] UCSF Industry Document Library [dataset]

Related Challenges

ICDAR 2021 Competition on Document Visual Question Answering (DocVQA). Submission Deadline: 31 March 2021 [Challenge]

Document Visual Question Answering, CVPR 2020 Workshop on Text and Documents in the Deep Learning Era. Submission Deadline: 30 April 2020 [Challenge]

Papers

2021

  • [VisualMRC] VisualMRC: Machine Reading Comprehension on Document Images (AAAI) [Paper][Project]
  • [SSBaseline] Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps (AAAI) [Paper][code]

2020

  • [SA-M4C] Spatially Aware Multimodal Transformers for TextVQA (ECCV) [Paper][Project][Code]
  • [EST-VQA] On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering (CVPR) [Paper]
  • [M4C] Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA (CVPR) [Paper][Project]
  • [LaAP-Net] Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering (COLING) [Paper]
  • [CRN] Cascade Reasoning Network for Text-based Visual Question Answering (ACM MM) [Paper][Project]

2019

  • [Text-VQA/LoRRA] Towards VQA Models That Can Read (CVPR) [Paper][Code]
  • [ST-VQA] Scene Text Visual Question Answering (ICCV) [Paper]
  • [Text-KVQA] From Strings to Things: Knowledge-enabled VQA Model that can Read and Reason (ICCV) [Paper]
  • [OCR-VQA] OCR-VQA: Visual Question Answering by Reading Text in Images (ICDAR) [Paper]

Technical Reports

  • [TAP] TAP: Text-Aware Pre-training for Text-VQA and Text-Caption [Report]
  • [RUArt] RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering [Report]
  • [SMA] Structured Multimodal Attentions for TextVQA [Report][Slides][Video]
  • [DiagNet] DiagNet: Bridging Text and Image [Report][Code]
  • [DCD_ZJU] Winner of 2019 Text-VQA challenge [Slides]
  • [Schwail] Runner-up of 2019 Text-VQA challenge [Slides]

Benchmark

Acc. : Accuracy
I. E. : Image Encoder
Q. E. : Question Encoder
O. E. : OCR Token Encoder
Ensem. : Ensemble
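
Text-VQA reports the soft VQA-style accuracy, which scores a prediction by its agreement with the 10 human-annotated answers. A minimal sketch of that metric, assuming the standard leave-one-out formulation (the official evaluation also applies answer normalization such as punctuation and article stripping, omitted here):

```python
def vqa_accuracy(pred: str, gt_answers: list[str]) -> float:
    """Soft VQA accuracy: for each annotator, a prediction counts as fully
    correct if at least 3 of the *other* annotators gave the same answer;
    the per-annotator scores min(#matches/3, 1) are then averaged."""
    pred = pred.strip().lower()
    gts = [a.strip().lower() for a in gt_answers]
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]          # leave annotator i out
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)
```

For example, a prediction matching 3 of 10 annotators scores 0.9 rather than 0.3, since the 7 non-matching annotators each still see 3 agreeing answers among the remaining 9.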

Text-VQA

[official leaderboard(2019)]
[official leaderboard(2020)]

| Y-C./J. | Methods | Acc. | I. E. | Q. E. | OCR | O. E. | Output | Ensem. |
|---|---|---|---|---|---|---|---|---|
| 2019–CVPR | LoRRA | 26.64 | Faster R-CNN | GloVe | Rosetta-ml | FastText | Classification | N |
| 2019–N/A | DCD_ZJU | 31.44 | Faster R-CNN | BERT | Rosetta-ml | FastText | Classification | Y |
| 2020–CVPR | M4C | 40.46 | Faster R-CNN (ResNet-101) | BERT | Rosetta-en | FastText | Decoder | N |
| 2020–Challenge | Xiangpeng | 40.77 | | | | | | |
| 2020–Challenge | colab_buaa | 44.73 | | | | | | |
| 2020–Challenge | CVMLP(SAM) | 44.80 | | | | | | |
| 2020–Challenge | NWPU_Adelaide_Team(SMA) | 45.51 | Faster R-CNN | BERT | BDN | Graph Attention | Decoder | N |
| 2020–ECCV | SA-M4C | 44.6* | Faster R-CNN (ResNext-152) | BERT | Google-OCR | FastText+PHOC | Decoder | N |
| 2020–arXiv | TAP | 53.97* | Faster R-CNN (ResNext-152) | BERT | Microsoft-OCR | FastText+PHOC | Decoder | N |

* Using external data for training.

ST-VQA

[official leaderboard]

T1 : Strongly Contextualised Task
T2 : Weakly Contextualised Task
T3 : Open Dictionary
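
The ST-VQA evaluation protocol scores answers with Average Normalized Levenshtein Similarity (ANLS) rather than exact match, so near-miss OCR readings still receive partial credit. A minimal sketch, assuming the standard definition with threshold τ = 0.5 (scores below the threshold are zeroed to penalize wrong answers that happen to be lexically close):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # deletion
                        dp[j - 1] + 1,                  # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[len(b)]

def anls(pred: str, gt_answers: list[str], tau: float = 0.5) -> float:
    """Best normalized similarity against any ground-truth answer,
    zeroed when it falls below the threshold tau."""
    best = 0.0
    for gt in gt_answers:
        p, g = pred.strip().lower(), gt.strip().lower()
        if not p and not g:
            s = 1.0
        else:
            s = 1.0 - levenshtein(p, g) / max(len(p), len(g))
        best = max(best, s if s >= tau else 0.0)
    return best
```

The dataset-level score is the mean of `anls` over all questions; e.g. predicting "helo" against ground truth "hello" scores 0.8 instead of 0.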

| Y-C./J. | Methods | Acc. (T1/T2/T3) | I. E. | Q. E. | OCR | O. E. | Output | Ensem. |
|---|---|---|---|---|---|---|---|---|
| 2020–CVPR | M4C | na/na/0.4621 | Faster R-CNN (ResNet-101) | BERT | Rosetta-en | FastText | Decoder | N |
| 2020–Challenge | SMA | 0.5081/0.3104/0.4659 | Faster R-CNN | BERT | BDN | Graph Attention | Decoder | N |
| 2020–ECCV | SA-M4C | na/na/0.5042 | Faster R-CNN (ResNext-152) | BERT | Google-OCR | FastText+PHOC | Decoder | N |
| 2020–arXiv | TAP | na/na/0.5967 | Faster R-CNN (ResNext-152) | BERT | Microsoft-OCR | FastText+PHOC | Decoder | N |

OCR-VQA

| Y-C./J. | Methods | Acc. | I. E. | Q. E. | OCR | O. E. | Output | Ensem. |
|---|---|---|---|---|---|---|---|---|
| 2020–CVPR | M4C | 63.9 | Faster R-CNN | BERT | Rosetta-en | FastText | Decoder | N |