Papers
Vivinia_Vivinia
WeChat ID: healer_healer
Paper - "From Recognition to Cognition: Visual Commonsense Reasoning" Notes
Paper download. Abstract: Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer peo... Original · 2019-11-29 18:40:35 · 1269 views · 2 comments
Paper - "GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering"
Paper download. Abstract: We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have develop... Original · 2019-11-26 11:46:36 · 2067 views · 0 comments
Paper - "Visual Question Answering as Reading Comprehension Hui" Notes
Paper download. Abstract: Visual question answering (VQA) demands simultaneous comprehension of both the image visual content and natural language questions. In some cases, the reasoning needs the help of common sen... Original · 2019-11-17 11:47:40 · 462 views · 0 comments
Paper - "Answer Them All! Toward Universal Visual Question Answering Models" Notes
Key translations and extensions. Paper download. Abstract: Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic dat... Original · 2019-11-05 20:24:00 · 806 views · 0 comments
Paper - "Answer Them All! Toward Universal Visual Question Answering Models" Key Translations + Extensions
The projector F is modeled as a 4-layer MLP with 1024 units with swish non-linear activation functions [45]. Notes: 1. MLP: the perceptron is the algorithm from which neural networks (deep learning) originated; it has several inputs and one output... Original · 2019-11-05 10:40:27 · 543 views · 0 comments
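A minimal sketch of the kind of projector the quoted sentence describes: a 4-layer MLP with 1024 units per layer and swish activations. Only the layer count, width, and activation come from the quoted text. The function names `swish`, `init_mlp`, and `project`, the uniform weight initialisation, the zero biases, and applying swish after the final layer are all assumptions made for illustration, not details from the paper.

```python
import math
import random


def swish(x):
    """Swish activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def init_mlp(in_dim, hidden=1024, layers=4, seed=0):
    """Return one (weights, biases) pair per layer.

    The initialisation scheme is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    params = []
    fan_in = in_dim
    for _ in range(layers):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(hidden)]
        params.append((weights, [0.0] * hidden))
        fan_in = hidden
    return params


def project(params, x):
    """Apply each linear layer followed by swish (assumed on every layer)."""
    for weights, biases in params:
        x = [swish(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x
```

For a quick check, a small width keeps the pure-Python version fast, for example `project(init_mlp(3, hidden=8), [0.1, -0.2, 0.3])`; the default `hidden=1024` matches the quoted sentence but is slow without a numerical library.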
Paper - "MUREL: Multimodal Relational Reasoning for Visual Question Answering Remi" Key Translations + Extensions
Paper notes. Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. In this paper, we propo... Original · 2019-10-25 18:48:49 · 737 views · 0 comments
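Paper - "MUREL: Multimodal Relational Reasoning for Visual Question Answering Remi" Notes
Key translations and extensions. Abstract: Multimodal attention networks currently perform best on VQA tasks involving real images, but this simple mechanism is not sufficient to model complex reasoning features or high-level tasks. We therefore propose MuRel, a multimodal relational network that learns end-to-end reasoning over real images. Our contributions are mainly twofold: first, we introduce the MuRel cell, a structure that reasons automatically about the interactions between the question and image regions through rich vector representations, and that models pairwise relations between regions; second, ... Original · 2019-10-26 08:28:33 · 1864 views · 0 comments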
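Paper - "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" Notes
Key translations and extensions. Paper download. Topic: We propose combining top-down and bottom-up attention mechanisms to compute salient image regions at the level of objects. Body: 1. Outline of the new method: the bottom-up mechanism (based on Faster R-CNN) extracts image regions, each associated with a feature vector; the top-down mechanism determines the feature weights (that is, how important each feature is). 2. What methods VQA and image captioning usually use, and what their shortcomings are: VQA and Image ... Original · 2019-10-17 08:49:26 · 496 views · 0 comments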
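The weighting step summarised above can be sketched roughly as follows: each image region has a feature vector, a top-down signal scores every region, the scores are normalised with softmax, and the region features are averaged with those weights. This is only an illustration under stated assumptions. The dot-product scoring, the `query` vector, and the names `softmax` and `attend` are hypothetical stand-ins and are not the scoring function from the paper; in the real pipeline the region features would come from Faster R-CNN.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Weight region features by their relevance to a top-down query.

    regions: list of feature vectors, one per image region.
    query:   task-context vector (assumed here; same length as a region).
    Returns the attention weights and the weighted-average feature.
    """
    # Dot-product scoring is an assumption made for illustration.
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    pooled = [sum(w * region[i] for w, region in zip(weights, regions))
              for i in range(dim)]
    return weights, pooled
```

With a zero query every region gets the same weight, so the pooled feature is the plain mean of the region features; a query aligned with one region shifts the weight toward that region.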
Paper - "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" Key Translations + Extensions
Paper notes. Paper download. Abstract: Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through f... Original · 2019-10-16 17:23:13 · 768 views · 0 comments
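Paper - "Visual Question Answering A tutorial" Notes
Key translations and extensions. Paper download. Topic: This article mainly introduces ongoing work in the field and current deep-learning-based approaches to VQA. Body: 1. Reasons for studying VQA: (1) In computer vision, algorithms need to extract high-level information from images and reason over it; VQA emerged as an alternative task to the original Turing test or to image captioning. (2) Once mature, VQA could be applied on its own in everyday life. 2. Forms of VQA answers: (1) open-ended answers, which include fairly complex sentence structures,... Original · 2019-10-11 20:59:53 · 500 views · 0 comments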
Paper - "Visual Question Answering A tutorial" Key Translations + Extensions
Paper notes. Paper download. Abstract: Tremendous advances have been seen in the field of computer vision due to the success of deep learning, in particular on low- and midlevel tasks, such as image segmentation or... Original · 2019-10-11 18:30:52 · 1540 views · 0 comments