Abstract
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment, in all video frames, the target object referred to by a language expression.
In this work, we propose a simple and unified framework built upon Transformer, termed ReferFormer.
It views the language as queries and directly attends to the most relevant regions in the video frames.
Concretely, we introduce a small set of object queries conditioned on the language as the input to the Transformer.
In this manner, all the queries are obligated to find the referred objects only.
They are eventually transformed into dynamic kernels which capture crucial object-level information and play the role of convolution filters to generate the segmentation masks from the feature maps.
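The dynamic-kernel idea above can be illustrated with a minimal sketch: each query vector produced by the Transformer is reshaped into a 1x1 convolution filter and applied to the frame's feature map, yielding one mask-logit map per query. All shapes and names here are illustrative assumptions, not the paper's actual implementation (which involves additional conditioning and decoding steps).

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the paper)
num_queries = 5    # small set of language-conditioned object queries
channels = 16      # feature channels
h, w = 32, 32      # spatial size of the frame's feature map

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((channels, h, w))   # per-frame features
queries = rng.standard_normal((num_queries, channels))  # query embeddings after the Transformer

# Each query acts as a 1x1 convolution kernel over the feature map:
# for every spatial location, take the dot product between the query
# vector and the channel vector at that location.
mask_logits = np.einsum("qc,chw->qhw", queries, feature_map)

print(mask_logits.shape)  # one logit map per query: (5, 32, 32)
```

Because the kernels are derived from the language-conditioned queries, each resulting logit map scores how strongly every location matches one candidate for the referred object.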