Visual grounding
小仙女呀灬
Paper: Language-Aware Fine-Grained Object Representation for Referring Expression Comprehension
Abstract: Referring expression comprehension expects to accurately locate an object described by a language expression, which requires precise language-aware visual object representations. However, existing methods usually use rectangular object repres…
Original · 2022-02-27 11:01:10 · 452 views · 1 comment
Paper: Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Abstract: Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence. Multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either ins…
Original · 2021-12-19 11:09:23 · 490 views · 0 comments
Paper: A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Abstract: Referring expression comprehension aims to localize the object instance described by a natural language expression. Current referring expression methods have achieved good performance. However, none of them is able to achieve real-time infer…
Original · 2021-12-17 00:46:26 · 427 views · 0 comments
Paper: Exploring Phrase Grounding without Training: Contextualisation and Extension to Text-Based Image
Abstract: Grounding phrases in images links the visual and the textual modalities and is useful for many image understanding and multimodal tasks. All known models heavily rely on annotated data and complex trainable systems to perform phrase grounding – exc…
Original · 2021-12-14 00:45:03 · 605 views · 0 comments
Paper: Zero-Shot Grounding of Objects from Natural Language Queries
Abstract: A phrase grounding system localizes a particular object in an image referred to by a natural language query. In previous work, the phrases were restricted to have nouns that were encountered in training, we extend the task to Zero-Shot Grounding (ZS…
Original · 2021-12-09 09:21:59 · 382 views · 0 comments
Paper: Real-Time Referring Expression Comprehension by Single-Stage Grounding Network
Abstract: In this paper, we propose a novel end-to-end model, namely Single-Stage Grounding network (SSG), to localize the referent given a referring expression within an image. Different from previous multi-stage models which rely on object proposals or de…
Original · 2021-12-01 19:54:43 · 1002 views · 0 comments
Onestage Grounding
1. Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding (CVPR 2019)
   Paper: http://openaccess.thecvf.com/content_CVPR_2019/papers.
   Code: https://github.com/hassanhub/MultiGrounding.
2. Real-Time Referring Expression Comprehension by Single-Stage…
Original · 2021-11-30 11:07:43 · 269 views · 0 comments
Paper: Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding (CVPR 2019)
Abstract: We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. This common space is instantiated at multiple layers of a Deep Convolutional Neural Network by exploiting its fe…
Original · 2021-11-29 20:46:34 · 600 views · 0 comments
Paper: Improving One-stage Visual Grounding by Recursive Sub-query Construction
Abstract: We improve one-stage visual grounding by addressing current limitations on grounding long and complex queries. Existing one-stage methods encode the entire language query as a single sentence embedding vector, e.g., taking the embedding from BERT or…
Original · 2021-11-22 20:42:30 · 621 views · 0 comments
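Paper: Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
Abstract: In this paper, we tackle the weakly supervised referring expression grounding task: localizing the referent object in an image according to a query sentence, where the mapping between image regions and queries is not available during training. Traditional methods first select the object region that best matches the referring expression and then reconstruct the query sentence from the selected region, with the reconstruction difference serving as the loss for back-propagation. Existing methods, however, ignore the fact that the correctness of the matching is unknown, so both matching and reconstruction are performed only approximately. To overcome this limitation, we design a discriminative triad as the basis of the solution, through which a query can be converted into one or more discriminative triads in a very scalable way. Building on the discriminative triad, we further propose triad…
Original · 2021-11-17 00:26:03 · 464 views · 0 comments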
Paper: Disentangled Motif-aware Graph Learning for Phrase Grounding
Original · 2021-11-12 21:52:09 · 2747 views · 2 comments
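Paper: Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Abstract: The prevailing framework for referring expression grounding is a two-stage process: 1) detect proposals with an object detector, and 2) ground the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expression with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals based solely on detection confidence (i.e., expression-agnostic), hoping that the proposals contain all the right instances in the expression (i.e., expression…
Original · 2021-11-09 20:06:22 · 802 views · 0 comments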
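The generic two-stage pipeline described in this excerpt can be sketched as follows. This is only an illustration of the proposal-then-ground baseline and the mismatch the abstract points out, not the Ref-NMS method itself. The `Proposal` class, the 2-D toy features, the cosine-similarity scoring, and the `top_k` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass
class Proposal:
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2), hypothetical coordinates
    det_score: float                 # detector confidence, independent of the query
    feature: Sequence[float]         # hypothetical region embedding


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def stage1_select(proposals: List[Proposal], top_k: int) -> List[Proposal]:
    """Stage 1 (expression-agnostic): keep the top_k proposals by detection score."""
    return sorted(proposals, key=lambda p: p.det_score, reverse=True)[:top_k]


def stage2_ground(proposals: List[Proposal], expr: Sequence[float]) -> Proposal:
    """Stage 2 (expression-aware): pick the proposal most similar to the query."""
    return max(proposals, key=lambda p: cosine(p.feature, expr))


def ground(proposals: List[Proposal], expr: Sequence[float], top_k: int) -> Proposal:
    return stage2_ground(stage1_select(proposals, top_k), expr)


# Toy data: the region that matches the query has the lowest detection score.
proposals = [
    Proposal((0, 0, 50, 50), 0.95, (1.0, 0.0)),
    Proposal((60, 0, 120, 50), 0.90, (0.8, 0.2)),
    Proposal((10, 60, 40, 90), 0.30, (0.0, 1.0)),
]
query = (0.0, 1.0)  # hypothetical expression embedding

print(ground(proposals, query, top_k=3).box)  # matching region survives stage 1
print(ground(proposals, query, top_k=2).box)  # matching region was filtered out
```

With `top_k=3` the matching region reaches stage 2 and is selected. With `top_k=2` it is discarded in stage 1 because of its low detection score, so stage 2 can only choose among wrong candidates, which is the bottleneck the abstract describes.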
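Paper: TransVG: End-to-End Visual Grounding with Transformers
Abstract: In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region of an image. State-of-the-art methods, whether two-stage or one-stage, rely on a complex module with manually designed mechanisms to perform query reasoning and multi-modal fusion. However, involving mechanisms such as query decomposition and image scene graphs in the fusion module design makes the models prone to overfitting to datasets with specific scenarios and limits the full interaction between visual and linguistic context. To avoid this caveat, we propose to establish the multi-modal correspondence by leveraging transformers, and empirically show that complex fusion modules (e.g., modular attention…
Original · 2021-11-02 10:34:47 · 3064 views · 1 comment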
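Paper: Visual Grounding with Transformers
Abstract: In this paper, we propose a transformer-based approach for visual grounding. Unlike previous proposal-and-rank frameworks, which rely heavily on pretrained object detectors, or proposal-free frameworks, which upgrade an off-the-shelf one-stage detector by fusing textual embeddings, our approach is built on top of a transformer encoder-decoder and is independent of any pretrained detector or word embedding model. Termed VGTR (Visual Grounding with TRansformers), our approach is designed to learn semantic-discriminative visual features under the guidance of the textual description without harming their localization ability. This information flow enables VGTR, for the vision and language modalities, to capture context-level…
Original · 2021-10-28 17:01:43 · 3140 views · 0 comments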
Paper: MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
Original · 2021-10-24 11:16:25 · 682 views · 0 comments