I recently read a CVPR 2018 paper, "IQA: Visual Question Answering in Interactive Environments", which describes an agent that interacts with a visual environment in order to answer questions grounded in that environment. I had not read work in this area before, and I could not find any online write-ups of this paper, so I am recording some personal notes here. If there are any mistakes, I welcome corrections!
A long-standing goal of the artificial intelligence community is to create agents that can perform manual tasks in the real world and communicate with humans through natural language. For example, a household robot might be asked: "Do we need to buy more milk?" Answering this would require it to navigate to the kitchen, open the refrigerator, and look at the milk carton. Or: "How many boxes of cookies do we have?" This would require the agent to navigate to the cupboards, open several of them, and count the cookie boxes. Toward this goal, Visual Question Answering (VQA), the task of answering questions about visual content, has received great attention from the computer vision and natural language processing communities. Although VQA has made substantial progress, research has focused mainly on passively answering questions about visual content, without the ability to interact with the environment that generates that content; an agent that can only passively answer questions is limited in its ability to help humans complete tasks.
1. Paper Overview
Abstract:
We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: "Are there any apples in the fridge?" The agent must navigate around the scene, acquire visual understanding of scene elements, interact with objects (e.g. open refrigerators) and plan for a series of actions conditioned on the question. Popular reinforcement learning approaches with a single controller perform poorly on IQA owing to the large and diverse state space. We propose the Hierarchical Interactive Memory Network (HIMN), consisting of a factorized set of controllers, allowing the system to operate at multiple levels of temporal abstraction. To evaluate HIMN, we introduce IQUAD V1, a new dataset built upon AI2-THOR [35], a simulated photo-realistic environment of configurable indoor scenes with interactive objects. IQUAD V1 has 75,000 questions, each paired with a unique scene configuration. Our experiments show that our proposed model outperforms popular single controller based methods on IQUAD V1.
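The core idea in the abstract — a factorized set of controllers operating at multiple levels of temporal abstraction — can be illustrated with a minimal two-level control loop. This is only a hedged sketch of the general hierarchical-controller pattern; the class names (`Planner`, `SubController`) and subtasks are hypothetical and are not the paper's actual HIMN implementation.

```python
from typing import List


class SubController:
    """Low-level controller: executes primitive actions for one subtask."""

    def __init__(self, name: str, actions: List[str]):
        self.name = name
        self.actions = actions

    def run(self) -> List[str]:
        # Run this subtask's primitive actions until the subtask terminates.
        return [f"{self.name}:{a}" for a in self.actions]


class Planner:
    """High-level controller: picks subtasks at a coarser time scale."""

    def __init__(self, plan: List[SubController]):
        self.plan = plan  # ordered subtasks chosen by the planner

    def run_episode(self) -> List[str]:
        trace = []
        for sub in self.plan:        # one planner step spans many env steps
            trace.extend(sub.run())  # temporal abstraction
        return trace


# Hypothetical episode for a question like "Are there any apples in the fridge?"
plan = [
    SubController("navigate", ["forward", "forward", "turn_left"]),
    SubController("manipulate", ["open_fridge"]),
    SubController("answer", ["say_yes"]),
]
trace = Planner(plan).run_episode()
print(trace)
```

The key property is that the planner reasons over a handful of subtask decisions while each sub-controller handles the long, diverse sequences of primitive actions, which is what makes the state space tractable compared to a single flat controller.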