Paper reading: [TPAMI-2022] Space-Time Memory Networks for Video Object Segmentation With User Guidance

智尊宝人工智能社区

Article tags: artificial intelligence, computer vision, CVPR, deep learning, machine learning

Article link: https://blog.csdn.net/weixin_42155685/article/details/123983608

Paper search (studyai.com)

Search for the paper: Space-Time Memory Networks for Video Object Segmentation With User Guidance

Search link: http://www.studyai.com/search/whole-site/?q=Space-Time+Memory+Networks+for+Video+Object+Segmentation+With+User+Guidance

Keywords

Object segmentation; Task analysis; Learning systems; Feature extraction; Runtime; Detectors; Visualization; Video object segmentation; user-guided video object segmentation; semi-supervised video object segmentation; interactive video object segmentation; memory ne

Machine learning; machine vision

Semi-supervised learning; detection and segmentation

Abstract

We propose a novel and unified solution for user-guided video object segmentation tasks.

In this work, we consider two scenarios of user-guided segmentation: semi-supervised and interactive segmentation.

Due to the nature of the problem, available cues – video frame(s) with object masks (or scribbles) – become richer with the intermediate predictions (or additional user inputs).

However, the existing methods make it impossible to fully exploit this rich source of information.

We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources.

In the semi-supervised scenario, the previous frames with object masks form an external memory, and the current frame as the query is segmented using the information in the memory.
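
In the semi-supervised setting, the practical question is which frames actually enter the memory. Below is a minimal, hypothetical simulation of one such policy, loosely modeled on the original STM conference paper: the first frame (with its given mask) is always kept, the previous frame (with its predicted mask) is always used, and every `save_every`-th frame is stored permanently. The function name `semi_supervised_memory_schedule` and the default `save_every=5` are illustrative assumptions, not the authors' code, and the sketch tracks frame indices only, not features.

```python
from typing import Dict, List


def semi_supervised_memory_schedule(num_frames: int, save_every: int = 5) -> Dict[int, List[int]]:
    """Hypothetical memory policy (assumption): for each query frame t >= 1,
    return the indices of frames whose (image, mask) pairs are in memory
    when frame t is segmented."""
    # Frame 0 comes with the user-provided ground-truth mask and is always kept.
    persistent: List[int] = [0]
    schedule: Dict[int, List[int]] = {}
    for t in range(1, num_frames):
        memory = list(persistent)
        # The previous frame, with its *predicted* mask, is added as a temporary entry.
        if t - 1 not in memory:
            memory.append(t - 1)
        schedule[t] = memory
        # (Frame t would be segmented here by reading from `memory`.)
        # Every `save_every`-th frame is stored permanently once its mask is predicted.
        if t % save_every == 0:
            persistent.append(t)
    return schedule
```

Under this policy the memory grows with video length, so a larger `save_every` means fewer stored frames and less computation per memory read.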

Similarly, to work with user interactions, the frames that are given user inputs form the memory that guides segmentation.
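
For the interactive scenario, the memory is filled with the frames the user has annotated instead of with predictions. This hypothetical sketch covers only the bookkeeping across interaction rounds: each annotated frame (its image plus a mask derived from the user's scribbles) is written to memory, re-annotating a frame overwrites its earlier entry, and every remaining frame is a query to be re-segmented. The overwrite rule and the name `interactive_session` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def interactive_session(num_frames: int, annotated_frames: List[int]) -> List[Tuple[List[int], int]]:
    """Hypothetical simulation: per round, return (frames in memory, number of query frames)."""
    memory: Dict[int, int] = {}  # frame index -> round in which the user last annotated it
    log: List[Tuple[List[int], int]] = []
    for round_no, frame in enumerate(annotated_frames, start=1):
        # Write the user-guided frame into memory; a newer input for the
        # same frame replaces the old entry (assumed rule).
        memory[frame] = round_no
        # All other frames are queries, re-segmented by reading the whole memory.
        queries = [t for t in range(num_frames) if t not in memory]
        log.append((sorted(memory), len(queries)))
    return log
```

For example, a 100-frame video annotated at frames 30, 5, 30 and 60 ends with frames 5, 30 and 60 in memory and 97 frames left to propagate to.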

Internally, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion.
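
The dense space-time matching can be written, for query location i, as y_i = [ vQ_i , sum_j softmax_j(kQ_i · kM_j) vM_j ], where j runs over every pixel of every memory frame. The plain-Python sketch below follows that formulation, as I understand it from the STM line of work, under stated assumptions: keys and values are plain lists, similarity is an unscaled dot product followed by a softmax over all memory locations, and the query's own value is concatenated with the retrieved value. All function names are hypothetical; a real implementation would use batched tensor operations.

```python
import math
from typing import List

Vector = List[float]


def flatten_space_time(feature_maps: List[List[List[Vector]]]) -> List[Vector]:
    """Flatten T x H x W per-pixel feature vectors into one list of T*H*W locations."""
    return [vec for frame in feature_maps for row in frame for vec in row]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def memory_read(query_keys: List[Vector], query_values: List[Vector],
                memory_keys: List[Vector], memory_values: List[Vector]) -> List[Vector]:
    """For each query pixel, attend over ALL memory pixels (every frame, every position)."""
    value_dim = len(memory_values[0])
    outputs: List[Vector] = []
    for k_q, v_q in zip(query_keys, query_values):
        # Dense matching: similarity of this query pixel to every space-time memory pixel.
        weights = softmax([sum(a * b for a, b in zip(k_q, k_m)) for k_m in memory_keys])
        # Softmax-weighted sum of memory values = information retrieved for this pixel.
        read = [sum(w * v_m[c] for w, v_m in zip(weights, memory_values)) for c in range(value_dim)]
        # Concatenate the query's own value with the retrieved value.
        outputs.append(list(v_q) + read)
    return outputs
```

Because j ranges over all pixels of all memory frames, each query pixel can pull information from any stored frame, not just the previous one. That is what helps after occlusions, at a cost that grows with the number of memory frames times H×W.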

The abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions.

We validate our method on the latest benchmark sets and achieve state-of-the-art performance along with a fast runtime…