Video Instance Segmentation

I. Introduction

1. What is image instance segmentation?

In the image domain, instance segmentation, i.e. the simultaneous detection and segmentation of object instances in images, was first proposed by Hariharan et al. [11] and has since attracted a tremendous amount of attention in computer vision because of its importance.

2. What is video instance segmentation?

Unlike image instance segmentation, video instance segmentation aims at the simultaneous detection, segmentation and tracking of object instances in videos.

3. What are the application scenarios of video instance segmentation?

The new task opens up possibilities for applications that require video-level object masks, such as video editing, autonomous driving and augmented reality.

4. What challenges does video instance segmentation pose compared with image instance segmentation?

①Video instance segmentation is more challenging than image instance segmentation because it requires not only instance segmentation on individual frames but also the tracking of instances across frames.

②On the other hand, video content carries richer information than a single image, such as the motion patterns of different objects and temporal consistency, and thus provides more cues for object recognition and segmentation.

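5. Which existing tasks are related to video instance segmentation, and how do they differ from it?

〇See also: an article explaining the differences between the video instance segmentation task (VIS) and VOS, MOTS, etc. (Zhihu)

For example: ①Video object segmentation aims at segmenting and tracking objects in videos, but does not require recognition of object categories.

②Video object detection aims at detecting and tracking objects, but does not deal with object segmentation.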

6. What video segmentation datasets already exist?

①CityScapes. ②DAVIS 2017. ③YouTube-VOS.

7. What does the video instance segmentation task require of a dataset?

Given a video, the task requires labels for both the masks of all instances belonging to a predefined category set and the identities of those instances across frames.
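To make the labeling requirement concrete, here is a minimal sketch of how one video's annotation could be represented and sanity-checked. The class `VideoInstance`, its field names and the helper `check_video_annotation` are hypothetical names chosen for illustration; they are not the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Mask = Set[Tuple[int, int]]  # set of (row, col) pixels covered by the instance


@dataclass
class VideoInstance:
    instance_id: int               # identity shared by all frames of this instance
    category: str                  # must come from the predefined category set
    masks: List[Optional[Mask]]    # one entry per frame; None when not visible


def check_video_annotation(instances: List[VideoInstance],
                           num_frames: int,
                           categories: Set[str]) -> bool:
    """Raise ValueError if the annotation violates the requirements, else return True."""
    ids = [inst.instance_id for inst in instances]
    if len(ids) != len(set(ids)):
        raise ValueError("instance identities must be unique within a video")
    for inst in instances:
        if inst.category not in categories:
            raise ValueError(f"unknown category: {inst.category}")
        if len(inst.masks) != num_frames:
            raise ValueError("every instance needs one mask slot per frame")
    return True
```

Keeping one mask slot per frame (with `None` for frames where the instance is not visible) is one simple way to keep the identity of an instance tied to its masks over time.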

8. What are the characteristics of YouTube-VIS, the large-scale video instance segmentation dataset?

①The new dataset contains 2883 high-resolution YouTube videos, a 40-category label set covering common objects such as people, animals and vehicles, 4883 unique video instances and 131k high-quality masks.

②Dataset URL: https://youtube-vos.org/dataset/vis.

9. What are the characteristics of MaskTrack R-CNN, the video instance segmentation algorithm?

①It builds on Mask R-CNN, a state-of-the-art method for image instance segmentation, and adds a new branch to the framework for tracking instances across video frames.

②Predicted instances are stored in an external memory and matched with objects in later frames.

③Code URL: https://github.com/youtubevos/MaskTrackRCNN.

④The proposed video instance segmentation task requires not only segmenting object instances in each frame, but also determining the correspondence of objects across frames.

10. What will you learn?

①In Section 2 we briefly state the differences between related tasks and our new task.

②In Section 3 we formally introduce the video instance segmentation problem and its evaluation metrics.

③Our new dataset and algorithm are elaborated in Sections 4 and 5, respectively.

④Finally, experimental results are presented in Section 6.

II. Related Work

1. Image Instance Segmentation

2. Video Object Tracking

①Detection-based tracking: simultaneously detects and tracks video objects, usually following the “tracking-by-detection” strategy. For example: multi-object tracking.

②Detection-free tracking: tracks objects given their initial bounding boxes in the first frame. For example: Siamese networks.

③MOTS: a multi-object tracking and segmentation dataset.

3. Video Object Detection

①Video object detection aims at detecting objects in videos.

②The evaluation metric is limited to per-frame detection and does not require joint object detection and tracking.

4. Video Semantic Segmentation

①Video semantic segmentation does not require explicit matching of object instances across frames.

②By contrast, video object segmentation algorithms treat the target objects as generic objects and do not care about their semantic categories.

III. Video Instance Segmentation

  • Problem definition

  • Evaluation metric

The proposed IoU measures the spatio-temporal consistency between the predicted and ground-truth segmentations. If an algorithm detects object masks successfully but fails to track the objects across frames, it gets a low IoU.
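As a rough illustration of that behavior, the sketch below computes an IoU over whole mask sequences by summing per-frame intersections and unions before dividing. The representation of a mask as a set of pixel coordinates and the function name `video_iou` are assumptions made for this example only.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]        # (row, col)
Mask = Optional[Set[Pixel]]    # None means the instance is absent in that frame


def video_iou(pred: List[Mask], gt: List[Mask]) -> float:
    """Spatio-temporal IoU of two mask sequences covering the same frames."""
    if len(pred) != len(gt):
        raise ValueError("both sequences must cover the same frames")
    inter = 0
    union = 0
    for p, g in zip(pred, gt):
        p = p or set()   # treat a missing mask as an empty mask
        g = g or set()
        inter += len(p & g)
        union += len(p | g)
    return inter / union if union else 0.0


gt = [{(0, 0), (0, 1)}, {(0, 0), (0, 1)}]
good_track = [{(0, 0), (0, 1)}, {(0, 0)}]
broken_track = [{(0, 0), (0, 1)}, None]   # identity lost in the second frame

print(video_iou(good_track, gt))    # 0.75
print(video_iou(broken_track, gt))  # 0.5
```

Because the sums run over all frames, a track that loses its object in later frames is penalized even when every per-frame mask it does produce is perfect.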

IV. YouTube-VIS

Dataset collection process:

Data distribution of the dataset:

Significance of the dataset:

It will serve as a useful benchmark for various pixel-level video understanding tasks.

V. MaskTrack R-CNN

  • Idea:

①Add a fourth branch, together with an external memory, to Mask R-CNN to track object instances across frames.

②For inference, MaskTrack R-CNN processes video frames sequentially in an online fashion.
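The online procedure can be sketched as a loop that matches each new detection against the features held in memory. This is a simplified sketch rather than the released implementation: it assumes each detection is summarized by a feature vector, that the assignment is a softmax over dot products with a fixed logit of 0 standing for "new instance", and that memory entries are overwritten by the latest feature. The names `assignment_probs` and `track_online` are hypothetical.

```python
import math
from typing import List, Sequence


def assignment_probs(new_feat: Sequence[float],
                     memory_feats: Sequence[Sequence[float]]) -> List[float]:
    """Softmax over [new-instance logit 0, dot product with each stored instance]."""
    logits = [0.0] + [sum(a * b for a, b in zip(new_feat, m)) for m in memory_feats]
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def track_online(frames: List[List[List[float]]]) -> List[List[int]]:
    """Assign an identity to every detection, processing frames one by one."""
    memory: List[List[float]] = []          # external memory: one feature per identity
    all_ids: List[List[int]] = []
    for detections in frames:
        snapshot = [list(m) for m in memory]   # match against earlier frames only
        frame_ids: List[int] = []
        for feat in detections:
            probs = assignment_probs(feat, snapshot)
            best = max(range(len(probs)), key=probs.__getitem__)
            if best == 0:                   # index 0 means "a new instance"
                memory.append(list(feat))
                frame_ids.append(len(memory) - 1)
            else:                           # refresh the stored feature
                memory[best - 1] = list(feat)
                frame_ids.append(best - 1)
        all_ids.append(frame_ids)
    return all_ids


frames = [[[3.0, 0.0], [0.0, 3.0]],   # two new instances
          [[0.0, 2.5], [2.5, 0.0]],   # same two instances, detected in swapped order
          [[-3.0, -3.0]]]             # unlike anything in memory
print(track_online(frames))           # [[0, 1], [1, 0], [2]]
```

The sketch does not resolve the case where two detections in one frame pick the same identity; a fuller version would keep only the better-scoring one.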

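  • Overall architecture:

  • Learning the tracking branch:

  • Update strategy for the external memory

  • Training method for the tracking branch

  • Post-processing to improve tracking accuracy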

①Intuitively, the bounding box IoU correlates to the spatial relationship between instances, which is a strong prior in many cases.

②The category consistency also provides a very strong constraint because the category label of an instance should not change in a video.
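One way to fold these cues into the matching step is to add them to the log of the tracking branch's assignment probability, as in the sketch below. The additive form follows my reading of the method; the function names and the weight defaults `alpha`, `beta` and `gamma` are illustrative placeholders rather than values taken from the text above.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matching_score(assign_prob: float, det_conf: float,
                   det_box: Box, mem_box: Box,
                   det_cat: str, mem_cat: str,
                   alpha: float = 1.0, beta: float = 2.0,
                   gamma: float = 10.0) -> float:
    """Combine the tracking probability with confidence, box IoU and category cues."""
    eps = 1e-12                                   # guard against log(0)
    same_cat = 1.0 if det_cat == mem_cat else 0.0
    return (math.log(max(assign_prob, eps))
            + alpha * math.log(max(det_conf, eps))
            + beta * box_iou(det_box, mem_box)
            + gamma * same_cat)
```

With a large `gamma`, a candidate whose category disagrees with the stored instance rarely wins, which mirrors the observation that an instance's category should not change within a video.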
 

  • Inference

VI. Experiments

  • Dataset split

  • Implementation details

  • Baseline

We incorporate two types of algorithms for the baselines.

①The first type uses the object masks detected in the first frame of the video as initial guidance and applies video object segmentation algorithms to propagate the masks (mask propagation algorithms). For example: OSMN, FEELVOS.

②The second type follows the “tracking-by-detection” idea, which is very popular in multi-object tracking. The basic idea is to run an image detection method on each frame independently and then link the detections across frames with various tracking methods. For example: IoUTracker+, OSMN, DeepSORT, SeqTracker.
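The tracking-by-detection idea can be illustrated with a minimal greedy linker that connects boxes in consecutive frames by their IoU. It is a toy sketch and not the implementation of any baseline named above; the function `link_by_iou` and the threshold of 0.5 are assumptions for this example.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_by_iou(frames: List[List[Box]], threshold: float = 0.5) -> List[List[int]]:
    """Give each per-frame detection a track id by greedy IoU matching."""
    next_id = 0
    prev: List[Tuple[int, Box]] = []      # tracks seen in the previous frame only
    result: List[List[int]] = []
    for boxes in frames:
        current: List[Tuple[int, Box]] = []
        used: Set[int] = set()            # each previous track is matched at most once
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = threshold
            for track_id, prev_box in prev:
                if track_id in used:
                    continue
                iou = box_iou(box, prev_box)
                if iou >= best_iou:
                    best_id, best_iou = track_id, iou
            if best_id is None:           # no sufficient overlap: start a new track
                best_id = next_id
                next_id += 1
            used.add(best_id)
            current.append((best_id, box))
        prev = current
        result.append([track_id for track_id, _ in current])
    return result


frames = [[(0, 0, 10, 10), (20, 20, 30, 30)],
          [(21, 20, 31, 30), (1, 0, 11, 10)],
          [(50, 50, 60, 60)]]
print(link_by_iou(frames))   # [[0, 1], [1, 0], [2]]
```

Because only the previous frame is consulted, a track that goes undetected for one frame is lost here and restarts under a new id, a weakness that more elaborate trackers try to address.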

  • Main results

  • Offline vs. online methods

①An offline method requires instance segmentation results to be precomputed for all frames.

②The other methods, including MaskTrack R-CNN, are online methods, which produce instance tracks sequentially.

  • Oracle Results

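①Image-level prediction matters more than the prediction of the tracking branch.

②Designing suitable spatio-temporal features can improve image-level prediction.

VII. Conclusion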

 
