[Short Post] A Quick Read of the CVPR 2019 Paper Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers


mobai-ch


Article link: https://blog.csdn.net/qq_31622541/article/details/104066193


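Below I describe the paper's main content based on my own understanding and on the original text. If I have misunderstood anything, please point it out so we can all learn together. The author information and the links to the paper and the GitHub code are shared at the end of this post.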

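First, let me explain roughly what the authors did. They propose an unsupervised MOT (multi-object tracking) method. It works as follows: features are extracted from the input frame, a set of trackers each obtains the information of one object and assigns it to a layer, and finally all of this information is combined to reconstruct the frame. If the original frame can be reconstructed, the trackers are working.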

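In other words, the core idea is this: we set up I trackers, the information each tracker extracts is assigned to a layer, and an RNN computes each tracker's state at the current frame from its state at the previous frame (together with the current frame's features). Each frame is rendered with K layers, across which the I tracked objects are distributed. We compute what these objects look like and where they have moved in the new frame, and finally compose them back into an image that is compared with the real frame.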
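To make the layered rendering step concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each tracked object is already summarized as a small appearance patch, a soft shape mask, an integer top-left position, a layer index and a confidence; the paper places objects with differentiable spatial transformers, so the integer pasting here is a simplification. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def render_frame(background, objects):
    """Simplified TBA-style renderer: paste objects onto the background layer by layer.

    background: float array of shape (H, W, 3).
    objects: list of dicts with hypothetical keys
        'appearance': (h, w, 3) colour patch,
        'mask':       (h, w) soft shape mask in [0, 1],
        'pos':        (row, col) of the patch's top-left corner (assumed fully inside the frame),
        'layer':      int layer index; higher layers occlude lower ones,
        'conf':       float in [0, 1], the tracker's confidence that the object exists.
    """
    canvas = background.astype(float).copy()
    # Drawing in ascending layer order with the "over" operator composes the layers back to front.
    for obj in sorted(objects, key=lambda o: o['layer']):
        r, c = obj['pos']
        h, w = obj['mask'].shape
        # A low-confidence tracker contributes (almost) nothing to the reconstruction.
        alpha = (obj['conf'] * obj['mask'])[..., None]  # (h, w, 1), broadcast over RGB
        region = canvas[r:r + h, c:c + w]
        canvas[r:r + h, c:c + w] = alpha * obj['appearance'] + (1.0 - alpha) * region
    return canvas

# Usage: a red square on layer 0 partly covered by a blue square on layer 1.
bg = np.zeros((8, 8, 3))
red = {'appearance': np.tile([1.0, 0.0, 0.0], (4, 4, 1)), 'mask': np.ones((4, 4)),
       'pos': (0, 0), 'layer': 0, 'conf': 1.0}
blue = {'appearance': np.tile([0.0, 0.0, 1.0], (4, 4, 1)), 'mask': np.ones((4, 4)),
        'pos': (2, 2), 'layer': 1, 'conf': 1.0}
frame = render_frame(bg, [blue, red])  # list order does not matter, layer order does
print(frame[2, 2])  # → [0. 0. 1.] (blue is on the higher layer)
```

Sorting by layer and alpha-blending one object at a time is equivalent to composing K separate layer images back to front as long as objects within the same layer do not overlap.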

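This is just like making an animation. The authors' assumption is: with a good tracker, if we turn the elements of the first frame into trackable animation elements, then when they move and are composed together, the resulting images should be the subsequent frames.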

Everything that follows serves to make this assumption hold. Since this assumption needs no labeled data, the method is naturally unsupervised.
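Because the training signal is only the reconstruction error, the loss can be written without any labels. The sketch below assumes a plain per-pixel mean squared error averaged over a clip of T frames; the paper's full objective may include extra terms, so treat this as an illustration of the idea rather than the exact loss. In a real model it would be computed in an autodiff framework so that gradients flow back into the trackers.

```python
import numpy as np

def reconstruction_loss(frames, reconstructions):
    """Mean squared reconstruction error averaged over a clip.

    frames, reconstructions: arrays of shape (T, H, W, C).
    Only the raw video is needed: no bounding boxes, no identities.
    """
    frames = np.asarray(frames, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    if frames.shape != reconstructions.shape:
        raise ValueError("frames and reconstructions must have the same shape")
    per_frame = ((reconstructions - frames) ** 2).mean(axis=(1, 2, 3))  # MSE of each frame
    return per_frame.mean()  # average over the T frames

# Usage: a perfect reconstruction costs nothing.
clip = np.random.rand(4, 16, 16, 3)
print(reconstruction_loss(clip, clip))  # → 0.0
```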

Key points: multi-object tracking, unsupervised learning, layered animation

Abstract: Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require the labeled data (e.g., bounding boxes), which is expensive for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it considers the task as detection and tracking, but not jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose a Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model.

Contributions:

1. We propose a Tracking-by-Animation (TBA) framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation.

2. We propose a Reprioritized Attentive Tracking (RAT) to mitigate overfitting and disrupted tracking, improving the robustness of data association.

3. We evaluate our model on two synthetic datasets (MNIST-MOT and Sprites-MOT) and one real dataset (DukeMTMC [49]), showing its potential.

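The computations are mainly in Sections 2 and 3. Section 2 explains how the frame is reconstructed and how the final loss is computed. Section 3 explains how each object's state vector at the current frame is computed from the previous frame's state, which fills in the part of the Section 2 formulas that was left incomplete.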
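As a rough illustration of the Section 3 state update, the sketch below assumes each tracker turns its previous state into an address key, reads from the current frame's feature map with softmax attention over cosine similarities, and feeds the read vector into a simple tanh RNN cell. The parameter names (`W_key`, `W_h`, `W_r`, `b`, `beta`) and the plain tanh cell are assumptions for illustration, not the recurrent cell or exact equations from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def attentive_read(features, key, beta):
    """Content-based attention over a feature map.

    features: (M, N, S) feature map of the current frame (M*N locations, S channels).
    key: (S,) address key produced by the tracker.
    beta: positive sharpness; larger values give a more peaked attention map.
    Returns the (M, N) attention map and the (S,) read vector.
    """
    M, N, S = features.shape
    flat = features.reshape(M * N, S)
    # Cosine similarity between the key and every location's feature vector.
    sims = flat @ key / (np.linalg.norm(flat, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(beta * sims)
    read = weights @ flat  # attention-weighted sum of feature vectors
    return weights.reshape(M, N), read

def tracker_step(h_prev, features, params):
    """One hypothetical tracker update: previous state -> key -> read -> new state."""
    key = params['W_key'] @ h_prev
    attn, read = attentive_read(features, key, params['beta'])
    h_new = np.tanh(params['W_h'] @ h_prev + params['W_r'] @ read + params['b'])
    return h_new, attn

# Usage with random toy parameters.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 5))  # toy 3x4 feature map with 5 channels
H = 6                                   # hypothetical state size
params = {'W_key': rng.standard_normal((5, H)), 'W_h': rng.standard_normal((H, H)),
          'W_r': rng.standard_normal((H, 5)), 'b': np.zeros(H), 'beta': 5.0}
h, attn = tracker_step(rng.standard_normal(H), feats, params)
print(h.shape, attn.shape)  # → (6,) (3, 4)
```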

A figure in the paper clearly illustrates the computation process of the Section 2 algorithm.

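The pseudocode in the paper clearly explains the logic of Section 3. It also introduces the idea behind RAT, i.e., how the authors improve tracking robustness as mentioned in the abstract, which partly alleviates the problems described in the Introduction: an uncertain number of objects, frequent occlusions, changes in position, and background noise. For the specific computations and reasoning, see Sections 3.1, 3.2 and 3.3.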
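The intuition behind reprioritized attentive tracking can be sketched under my own simplifying assumptions: trackers are processed in descending order of their confidence from the previous frame, and after a tracker reads, the locations it attended to are erased from a working copy of the feature map, so lower-priority trackers are less likely to lock onto the same object. Sorting by previous confidence, the dot-product attention and the multiplicative erase rule are assumptions for illustration; see Sections 3.1 to 3.3 of the paper for the actual equations. The function name and parameters are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def reprioritized_reads(features, keys, prev_conf, erase_strength=1.0, beta=10.0):
    """Hypothetical sketch of prioritized attentive reading with erasure.

    features: (L, S) feature vectors at L spatial locations of the current frame.
    keys: (I, S) one address key per tracker.
    prev_conf: (I,) each tracker's confidence from the previous frame.
    Returns the processing order, attention weights (I, L) and read vectors (I, S).
    """
    memory = features.astype(float).copy()  # working copy that earlier trackers erase from
    I, L = keys.shape[0], features.shape[0]
    weights = np.zeros((I, L))
    reads = np.zeros((I, features.shape[1]))
    order = np.argsort(-prev_conf, kind='stable')  # most confident tracker reads first
    for i in order:
        w = softmax(beta * (memory @ keys[i]))  # dot-product attention over locations
        weights[i] = w
        reads[i] = w @ memory
        # Erase what this tracker attended to, so later trackers see a weaker copy of that object.
        memory = memory * (1.0 - erase_strength * w)[:, None]
    return order, weights, reads

# Usage: two trackers with the same key compete for one strong object at location 0.
feats = np.array([[5.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
keys = np.array([[1.0, 0.0], [1.0, 0.0]])
order, w, _ = reprioritized_reads(feats, keys, prev_conf=np.array([0.2, 0.9]))
print(order, w.argmax(axis=1))  # → [1 0] [2 0]
```

The more confident tracker 1 claims location 0 first; after erasure, tracker 0 moves to another location instead of duplicating the track. With `erase_strength=0.0` both trackers would attend to location 0.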

Finally, in the Conclusion the authors describe their future work: making the model perform better on dynamic backgrounds.

Paper: https://arxiv.org/pdf/1809.03137.pdf

Code: https://github.com/zhen-he/tracking-by-animation

Author information:

Zhen He 1,2,3∗ Jian Li 2 Daxue Liu 2 Hangen He 2 David Barber 3,4

1 Academy of Military Medical Sciences

2 National University of Defense Technology

3 University College London

4 The Alan Turing Institute
