[Paper Reading, CVPR 2019, Object Tracking] Fast Online Object Tracking and Segmentation: A Unifying Approach

lingqing97

Article link: https://blog.csdn.net/qq_39621037/article/details/115559296

Introduction

Paper: Fast Online Object Tracking and Segmentation: A Unifying Approach

Code: foolwood/SiamMask

Reference: [CVPR2019] 我对Siamese网络的一点思考（SiamMask） (in English: Some Thoughts on Siamese Networks)

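This paper proposes a tracking model that handles both single object tracking (SOT) and video object segmentation (VOS). Current SOT algorithms are generally initialized with the bbox from the first frame, but a bbox alone sometimes cannot describe the target accurately. In addition, current VOS algorithms are generally slow and cannot meet real-time requirements.

To address these problems, the authors try to bridge the gap between SOT and VOS and propose SiamMask. Compared with typical trackers, its most distinctive feature is an additional branch that predicts a mask. In tracking, the target is localized through the mask, so the tracker is no longer limited to the initial bbox representation; the same mask can also be used for the VOS task.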

We aim at retaining the offline trainability and online speed of these methods while at the same time significantly refining their representation of the target object, which is limited to a simple axis-aligned bounding box.

Main Content

[Figure: SiamMask framework overview]

The figure above shows the main framework of SiamMask. As it shows, the mask branch is the model's most distinctive feature.

Fully-convolutional Siamese networks

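This paper borrows the Siamese structure of SiamFC and the RPN structure of SiamRPN and fuses the two into one model.

The response map is computed in the same way as in SiamFC, namely: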

$g_{\theta}^{n}(z, x)=f_{\theta}(z) \star f_{\theta}(x)$

where $z$ is the exemplar image and $x$ is the search image. As in SiamFC, $z$ has size $127 \times 127 \times 3$ and $x$ has size $255 \times 255 \times 3$.

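Unlike SiamFC, whose response map is a two-dimensional (single-channel) matrix, this paper replaces plain cross correlation with depth-wise cross correlation, which correlates the two feature maps channel by channel and yields a response map of size $17 \times 17 \times 256$.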
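To make the depth-wise cross correlation concrete, here is a minimal pure-Python sketch. It is an illustration only, not the authors' implementation: the function names `depthwise_xcorr` and `response_size` are hypothetical, and the feature-map sizes 31 and 15 used in the size check are an assumption about the backbone outputs for $x$ and $z$ that is consistent with the $17 \times 17$ response map.

```python
def depthwise_xcorr(search, kernel):
    """Depth-wise cross correlation (no padding, stride 1).

    search: list of C channels, each an Hs x Ws list of lists (features of x).
    kernel: list of C channels, each an Hk x Wk list of lists (features of z).
    Channel c of the search features is correlated only with channel c of the
    exemplar features, so the output keeps all C channels.
    """
    assert len(search) == len(kernel), "channel counts must match"
    out = []
    for s_ch, k_ch in zip(search, kernel):
        hs, ws = len(s_ch), len(s_ch[0])
        hk, wk = len(k_ch), len(k_ch[0])
        resp = [
            [
                sum(
                    s_ch[i + di][j + dj] * k_ch[di][dj]
                    for di in range(hk)
                    for dj in range(wk)
                )
                for j in range(ws - wk + 1)
            ]
            for i in range(hs - hk + 1)
        ]
        out.append(resp)
    return out


def response_size(search_size, kernel_size):
    """Spatial size of the correlation output along one axis."""
    return search_size - kernel_size + 1


# Tiny example: 2 channels, 4x4 search features, 2x2 exemplar features.
search = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
]
kernel = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
response = depthwise_xcorr(search, kernel)  # 2 channels, each 3x3
```

Each spatial position of `response` holds one value per channel; in the full model that is a 256-dimensional vector. Under the assumed feature sizes, `response_size(31, 15)` gives the 17 seen in the response map.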

In addition, the authors refer to each spatial element of the response map as a RoW (response of a candidate window).

we refer to each spatial element of the response map as response of a candidate window (RoW)

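The response map is then fed into the mask branch, the bbox prediction branch, and the score prediction branch.

For the bbox and score prediction branches, each RoW corresponds to k anchor boxes and their object/background scores.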

Each RoW encodes a set of k anchor box proposals and corresponding object/background scores.

SiamMask

The mask branch is the focus and highlight of this paper. The authors implement it in two ways: a base version and a refinement version.

The mask branch can be written as:

$m_{n}=h_{\phi}\left(g_{\theta}^{n}(z, x)\right)$

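For the base version, the response map is processed by two $1\times1$ convolution layers, the first with $256$ output channels and the second with $63^{2}$ output channels. As shown in the framework figure above, each RoW therefore maps to a vector of length $63^{2}$, which is the flattened $63 \times 63$ mask for that RoW.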
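The base mask head can be sketched for a single RoW as follows. A $1\times1$ convolution acts on each spatial position independently, so for one RoW it reduces to a fully connected layer. This is a minimal sketch with toy dimensions: the helper names are hypothetical, and the ReLU between the two layers is an assumption, since the text above only gives the channel counts.

```python
def linear(vec, weight, bias):
    """Compute W @ vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def base_mask_head(row, w1, b1, w2, b2, side):
    """Map one RoW (a C-dim vector) to a side x side grid of mask logits."""
    hidden = relu(linear(row, w1, b1))   # first 1x1 conv (ReLU is assumed)
    flat = linear(hidden, w2, b2)        # second 1x1 conv: side * side outputs
    assert len(flat) == side * side
    # Reshape the flat vector into a 2-D mask, row by row.
    return [flat[r * side:(r + 1) * side] for r in range(side)]


# Toy sizes: C = 3 input channels, 2 hidden channels, 2x2 mask.
row = [1.0, -2.0, 0.5]
w1 = [[1, 0, 2], [0, 1, 0]]
b1 = [0, 0]
w2 = [[1, 0], [0, 1], [-1, 0], [0.5, 0.5]]
b2 = [0, 0, 0, 1]
mask_logits = base_mask_head(row, w1, b1, w2, b2, side=2)
```

In the full model the input would have 256 channels and `side` would be 63, giving $63^{2} = 3969$ outputs per RoW.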

For the refinement version, the authors borrow the idea of SharpMask to improve the accuracy of the predicted mask.

With the aim of producing a more accurate object mask, we follow the strategy of Learning to refine object segments, which merges low and high resolution features using multiple refinement modules made of upsampling layers and skip connections.

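In the refinement version, the mask is predicted with the more complex model shown in the figure below (see the original paper and SharpMask for details):

[Figure: mask refinement architecture]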

Box generation

[Figure]

The SOT task requires a predicted bbox, whereas this paper predicts a mask, so the authors consider three ways of generating a bbox from the mask:

axis-aligned bounding rectangle (Min-max)
rotated minimum bounding rectangle (MBR)
the optimisation strategy used for the automatic bounding box generation proposed in VOT-2016 (Opt)

After comparing them experimentally, the authors conclude that MBR offers the best trade-off between accuracy and cost.
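As an illustration of the simplest of the three strategies, the Min-max box can be computed directly from a binary mask. The sketch below is a minimal pure-Python version with a hypothetical function name `min_max_box`. It does not cover MBR, which needs a rotated-rectangle fit (for example a routine such as OpenCV's `cv2.minAreaRect`), nor the Opt strategy.

```python
def min_max_box(mask):
    """Axis-aligned bounding rectangle of the foreground pixels.

    mask: list of rows containing 0/1 values.
    Returns (x_min, y_min, x_max, y_max) as inclusive pixel indices,
    or None if the mask has no foreground pixel.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
box = min_max_box(mask)  # (1, 1, 3, 3)
```

Because the box is axis-aligned, it cannot follow the orientation of an elongated, rotated object, which is the case a rotated rectangle such as MBR handles better.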

Loss function

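The authors implement a three-branch and a two-branch version of SiamMask. The two-branch version has no bbox prediction branch, and its score branch predicts only a single score for each RoW. (In the three-branch version, the score branch outputs object/background scores.)

During training, the loss for the mask branch is defined as follows. Because the factor $\frac{1+y_{n}}{2}$ equals 1 when $y_n = 1$ and 0 when $y_n = -1$, $\mathcal{L}_{mask}$ is computed only for positive samples: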

$\mathcal{L}_{mask}(\theta, \phi)=\sum_{n}\left(\frac{1+y_{n}}{2 w h} \sum_{i j} \log \left(1+e^{-c_{n}^{i j} m_{n}^{i j}}\right)\right)$

Each RoW is labelled with a ground-truth binary label $y_n \in \{\pm 1\}$ and is also associated with a pixel-wise ground-truth mask $c_n$ of size $w \times h$. A RoW is considered positive ($y_n = 1$) if one of its anchor boxes has an IoU of at least $0.6$ with the ground-truth box, and negative ($y_n = -1$) otherwise.
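The mask loss above can be checked numerically with a small sketch. It assumes that the ground-truth mask entries $c_{n}^{ij}$ take values in $\{\pm 1\}$, which is what the logistic form of the loss implies; the function name `mask_loss` and the toy inputs are hypothetical.

```python
import math


def mask_loss(labels, gt_masks, pred_masks):
    """Sum over RoWs of (1 + y_n) / (2wh) * sum_ij log(1 + exp(-c_ij * m_ij)).

    labels:     list of y_n in {+1, -1}
    gt_masks:   list of grids with entries c_ij in {+1, -1} (assumed encoding)
    pred_masks: list of grids with real-valued mask logits m_ij
    """
    total = 0.0
    for y, c, m in zip(labels, gt_masks, pred_masks):
        h, w = len(c), len(c[0])
        pixel_sum = sum(
            math.log1p(math.exp(-c[i][j] * m[i][j]))
            for i in range(h)
            for j in range(w)
        )
        # (1 + y) / 2 is 1 for a positive RoW and 0 for a negative one.
        total += (1 + y) / (2 * w * h) * pixel_sum
    return total


gt = [[1, -1], [-1, 1]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
confident = [[5.0, -5.0], [-5.0, 5.0]]

loss_mixed = mask_loss([1, -1], [gt, gt], [zeros, zeros])
loss_negative_only = mask_loss([-1], [gt], [zeros])
loss_confident = mask_loss([1], [gt], [confident])
```

With all-zero logits, every pixel of the positive RoW contributes $\log 2$, so `loss_mixed` equals $\log 2$ after averaging over the $wh$ pixels, while the negative RoW contributes nothing. Logits that agree in sign with the ground truth, as in `confident`, drive the loss towards zero.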

Finally, the total loss for the two variants is:

$\begin{array}{c} \mathcal{L}_{2B}=\lambda_{1} \cdot \mathcal{L}_{mask}+\lambda_{2} \cdot \mathcal{L}_{sim} \\ \mathcal{L}_{3B}=\lambda_{1} \cdot \mathcal{L}_{mask}+\lambda_{2} \cdot \mathcal{L}_{score}+\lambda_{3} \cdot \mathcal{L}_{box} \end{array}$
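The two combined losses are plain weighted sums, as the short sketch below shows. The default weights ($\lambda_1 = 32$, $\lambda_2 = \lambda_3 = 1$) are what I recall the paper using and are not stated in this note, so treat them as assumptions; the function names are hypothetical.

```python
def loss_two_branch(l_mask, l_sim, lambda1=32.0, lambda2=1.0):
    """L_2B = lambda1 * L_mask + lambda2 * L_sim (weights are assumed defaults)."""
    return lambda1 * l_mask + lambda2 * l_sim


def loss_three_branch(l_mask, l_score, l_box,
                      lambda1=32.0, lambda2=1.0, lambda3=1.0):
    """L_3B = lambda1 * L_mask + lambda2 * L_score + lambda3 * L_box."""
    return lambda1 * l_mask + lambda2 * l_score + lambda3 * l_box


total_2b = loss_two_branch(l_mask=0.5, l_sim=2.0)
total_3b = loss_three_branch(l_mask=0.5, l_score=2.0, l_box=1.0)
```

A large $\lambda_1$ relative to the other weights makes the mask term dominate the gradient, which fits the note's emphasis on the mask branch.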

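During tracking, the two-branch version fits the output mask with a Min-max box and uses that box as the reference for cropping the search region of the next frame. According to the paper, the three-branch version instead uses the highest-scoring output of the box branch as this reference.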

Experimental Results

[Figure: experimental results]

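Summary

Overall, this paper is quite innovative in unifying the SOT and VOS tasks. Moreover, deriving the bbox from the mask improves prediction accuracy, which suggests a point worth reflecting on: the information provided by a bbox alone is incomplete.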
