Online Object Tracking: A Benchmark
Abstract
Object tracking is one of the most important components in numerous applications of computer vision.
While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art.
After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform.
The test image sequences are annotated with different attributes for performance evaluation and analysis.
By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
1. Introduction
Paragraph 1: the definition of object tracking and progress in the field
Object tracking is one of the most important components in a wide range of applications in computer vision, such as surveillance, human computer interaction, and medical imaging [60, 12].
References: computer vision applications
[60] A. Yilmaz, O. Javed, and M. Shah. Object Tracking: A Survey. ACM Computing Surveys, 38(4):1–45, 2006.
[12] K. Cannons. A Review of Visual Tracking. Technical Report CSE2008-07, York University, Canada, 2008.
Given the initialized state (e.g., position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the subsequent frames.
Although object tracking has been studied for several decades, and much progress has been made in recent years [28, 16, 47, 5, 40, 26, 19], it remains a very challenging problem.
References: progress in object tracking
[28] M. Isard and A. Blake. CONDENSATION–Conditional Density Propagation for Visual Tracking. IJCV, 29(1):5–28, 1998.
[16] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-Based Object Tracking. PAMI, 25(5):564–577, 2003.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[40] X. Mei and H. Ling. Robust Visual Tracking using L1 Minimization. In ICCV, 2009.
[26] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured Output Tracking with Kernels. In ICCV, 2011.
[19] J. Fan, X. Shen, and Y. Wu. Scribble Tracker: A Matting-based Approach for Robust Tracking. PAMI, 34(8):1633–1644, 2012.
Numerous factors affect the performance of a tracking algorithm, such as illumination variation, occlusion, as well as background clutters, and there exists no single tracking approach that can successfully handle all scenarios.
Therefore, it is crucial to evaluate the performance of state-of-the-art trackers to demonstrate their strengths and weaknesses and help identify future research directions in this field for designing more robust algorithms.
Paragraph 2: introduction to tracking datasets
For comprehensive performance evaluation, it is critical to collect a representative dataset.
There exist several datasets for visual tracking in surveillance scenarios, such as the VIVID [14], CAVIAR [21], and PETS databases.
References: tracking datasets
[14] R. Collins, X. Zhou, and S. K. Teh. An Open Source Tracking Testbed and Evaluation Web Site. In PETS, 2005.
[21] R. B. Fisher. The PETS04 Surveillance Ground-Truth Data Sets. In PETS, 2004.
However, the target objects are usually humans or cars of small size in these surveillance sequences, and the background is usually static.
Although some tracking datasets [47, 5, 33] for generic scenes are annotated with bounding box, most of them are not.
References: tracking datasets
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[33] J. Kwon and K. M. Lee. Visual Tracking Decomposition. In CVPR, 2010.
For sequences without labeled ground truth, it is difficult to evaluate tracking algorithms as the reported results are based on inconsistently annotated object locations.
Paragraph 3: introduction to tracking datasets and tracking algorithms
Recently, more tracking source codes have been made publicly available, e.g., the OAB [22], IVT [47], MIL [5], L1 [40], and TLD [31] algorithms, which have been commonly used for evaluation.
References: tracking algorithms with publicly available source code
[22] H. Grabner, M. Grabner, and H. Bischof. Real-Time Tracking via On-line Boosting. In BMVC, 2006.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[40] X. Mei and H. Ling. Robust Visual Tracking using L1 Minimization. In ICCV, 2009.
[31] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. In CVPR, 2010.
However, the input and output formats of most trackers are different and thus it is inconvenient for large scale performance evaluation.
In this work, we build a code library that includes most publicly available trackers and a test dataset with ground-truth annotations to facilitate the evaluation task.
Additionally each sequence in the dataset is annotated with attributes that often affect tracking performance, such as occlusion, fast motion, and illumination variation.
Paragraph 4: criteria for evaluating algorithm performance
One common issue in assessing tracking algorithms is that the results are reported based on just a few sequences with different initial conditions or parameters.
Thus, the results do not provide a holistic view of these algorithms.
For fair and comprehensive performance evaluation, we propose to perturb the initial state spatially and temporally from the ground-truth target locations.
While the robustness to initialization is a well-known problem in the field, it is seldom addressed in the literature.
To the best of our knowledge, this is the first comprehensive work to address and analyze the initialization problem of object tracking.
We use precision plots based on the location error metric and success plots based on the overlap metric to analyze the performance of each algorithm.
Paragraph 5: contributions of this work
The contribution of this work is three-fold:
Dataset. We build a tracking dataset with 50 fully annotated sequences to facilitate tracking evaluation.
Code library. We integrate most publicly available trackers in our code library with uniform input and output formats to facilitate large scale performance evaluation. At present, it includes 29 tracking algorithms.
Robustness evaluation. The initial bounding boxes for tracking are sampled spatially and temporally to evaluate the robustness and characteristics of trackers. Each tracker is extensively evaluated by analyzing more than 660,000 bounding box outputs.
Paragraph 6: scope and availability of this work
This work mainly focuses on the online tracking of single target. The code library, annotated dataset and all the tracking results are available on the website http://visualtracking.net .
2. Related Work
In this section, we review recent algorithms for object tracking in terms of several main modules: target representation scheme, search mechanism, and model update.
In addition, some methods have been proposed that build on combining several trackers or mining context information.
Representation Scheme. Object representation is one of the major components in any visual tracker, and numerous schemes have been presented [35].
References: object representation schemes
[35] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. Hengel. A Survey of Appearance Models in Visual Object Tracking. TIST, 2013, in press.
Since the pioneering work of Lucas and Kanade [37, 8], holistic templates (raw intensity values) have been widely used for tracking [25, 39, 2].
References: pioneering work on holistic templates
[37] B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with An Application to Stereo Vision. In IJCAI, 1981.
[8] S. Baker and I. Matthews. Lucas-Kanade 20 Years On: A Unifying Framework. IJCV, 56(3):221–255, 2004.
References: holistic templates
[25] G. D. Hager and P. N. Belhumeur. Efficient Region Tracking With Parametric Models of Geometry and Illumination. PAMI, 20(10):1025–1039, 1998.
[39] I. Matthews, T. Ishikawa, and S. Baker. The Template Update Problem. PAMI, 26(6):810–815, 2004.
[2] N. Alt, S. Hinterstoisser, and N. Navab. Rapid Selection of Reliable Templates for Visual Tracking. In CVPR, 2010.
Subsequently, subspace-based tracking approaches [11, 47] have been proposed to better account for appearance changes.
References: subspace-based tracking approaches
[11] M. J. Black. EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation. IJCV, 26(1):63–84, 1998.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
Furthermore, Mei and Ling [40] proposed a tracking approach based on sparse representation to handle the corrupted appearance and recently it has been further improved [41, 57, 64, 10, 55, 42].
References: sparse-representation-based tracking
[40] X. Mei and H. Ling. Robust Visual Tracking using L1 Minimization. In ICCV, 2009.
[41] X. Mei, H. Ling, Y. Wu, E. Blasch, and L. Bai. Minimum Error Bounded Efficient L1 Tracker with Occlusion Detection. In CVPR, 2011.
[57] Y. Wu, H. Ling, J. Yu, F. Li, X. Mei, and E. Cheng. Blurred Target Tracking by Blur-driven Tracker. In ICCV, 2011.
[64] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust Visual Tracking via Multi-task Sparse Learning. In CVPR, 2012.
[10] C. Bao, Y. Wu, H. Ling, and H. Ji. Real Time Robust L1 Tracker Using Accelerated Proximal Gradient Approach. In CVPR, 2012.
[55] D. Wang, H. Lu, and M.-H. Yang. Online Object Tracking with Sparse Prototypes. TIP, 22(1):314–325, 2013.
[42] X. Mei, H. Ling, Y. Wu, E. Blasch, and L. Bai. Efficient Minimum Error Bounded Particle Resampling L1 Tracker with Occlusion Detection. TIP, 2013, in press.
In addition to template, many other visual features have been adopted in tracking algorithms, such as color histograms [16], histograms of oriented gradients (HOG) [17, 52], covariance region descriptor [53, 46, 56] and Haar-like features [54, 22].
References: visual features used in tracking
[16] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-Based Object Tracking. PAMI, 25(5):564–577, 2003.
[17] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In CVPR, 2005.
[52] F. Tang, S. Brennan, Q. Zhao, and H. Tao. Co-Tracking Using Semi-Supervised Support Vector Machines. CVPR, 2007.
[53] O. Tuzel, F. Porikli, and P. Meer. Region Covariance: A Fast Descriptor for Detection and Classification. In ECCV, 2006.
[46] F. Porikli, O. Tuzel, and P. Meer. Covariance Tracking using Model Update based on Lie Algebra. In CVPR, 2006.
[56] Y. Wu, J. Cheng, J. Wang, H. Lu, J. Wang, H. Ling, E. Blasch, and L. Bai. Real-time Probabilistic Covariance Tracking with Efficient Model Update. TIP, 21(5):2824–2837, 2012.
[54] P. Viola and M. J. Jones. Robust Real-Time Face Detection. IJCV, 57(2):137–154, 2004.
[22] H. Grabner, M. Grabner, and H. Bischof. Real-Time Tracking via On-line Boosting. In BMVC, 2006.
Recently, the discriminative model has been widely adopted in tracking [15, 4], where a binary classifier is learned online to discriminate the target from the background.
References: discriminative models
[15] R. T. Collins, Y. Liu, and M. Leordeanu. Online Selection of Discriminative Tracking Features. PAMI, 27(10):1631–1643, 2005.
[4] S. Avidan. Ensemble Tracking. PAMI, 29(2):261–271, 2008.
Numerous learning methods have been adapted to the tracking problem, such as SVM [3], structured output SVM [26], ranking SVM [7], boosting [4, 22], semi-boosting [23] and multi-instance boosting [5].
References: learning methods in tracking
[3] S. Avidan. Support Vector Tracking. PAMI, 26(8):1064–1072, 2004.
[26] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured Output Tracking with Kernels. In ICCV, 2011.
[7] Y. Bai and M. Tang. Robust Tracking via Weakly Supervised Ranking SVM. In CVPR, 2012.
[4] S. Avidan. Ensemble Tracking. PAMI, 29(2):261–271, 2008.
[22] H. Grabner, M. Grabner, and H. Bischof. Real-Time Tracking via On-line Boosting. In BMVC, 2006.
[23] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised On-Line Boosting for Robust Tracking. In ECCV, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
To make trackers more robust to pose variation and partial occlusion, an object can be represented by parts where each one is represented by descriptors or histograms.
In [1], several local histograms are used to represent the object in a pre-defined grid structure.
References: local histograms
[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust Fragments-based Tracking using the Integral Histogram. In CVPR, 2006.
Kwon and Lee [32] propose an approach to automatically update the topology of local patches to handle large pose changes.
References: automatically updating the topology of local patches
[32] J. Kwon and K. M. Lee. Tracking of a Non-Rigid Object via Patch-based Dynamic Appearance Modeling and Adaptive Basin Hopping Monte Carlo Sampling. In CVPR, 2009.
To better handle appearance variations, some approaches regarding integration of multiple representation schemes have recently been proposed [62, 51, 33].
References: integrating multiple representation schemes
[62] L. Yuan, A. Haizhou, T. Yamashita, L. Shihong, and M. Kawade. Tracking in Low Frame Rate Video: A Cascade Particle Filter with Discriminative Observers of Different Life Spans. PAMI, 30(10):1728–1740, 2008.
[51] B. Stenger, T. Woodley, and R. Cipolla. Learning to Track with Multiple Observers. In CVPR, 2009.
[33] J. Kwon and K. M. Lee. Visual Tracking Decomposition. In CVPR, 2010.
Search Mechanism. To estimate the state of the target objects, deterministic or stochastic methods have been used.
When the tracking problem is posed within an optimization framework, assuming the objective function is differentiable with respect to the motion parameters, gradient descent methods can be used to locate the target efficiently [37, 16, 20, 49].
References: locating the target with gradient descent
[37] B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with An Application to Stereo Vision. In IJCAI, 1981.
[16] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-Based Object Tracking. PAMI, 25(5):564–577, 2003.
[20] J. Fan, Y. Wu, and S. Dai. Discriminative Spatial Attention for Robust Tracking. In ECCV, 2010.
[49] L. Sevilla-Lara and E. Learned-Miller. Distribution Fields for Tracking. In CVPR, 2012.
However, these objective functions are usually nonlinear and contain many local minima.
To alleviate this problem, dense sampling methods have been adopted [22, 5, 26] at the expense of high computational load.
On the other hand, stochastic search algorithms such as particle filters [28, 44] have been widely used since they are relatively insensitive to local minima and computationally efficient [47, 40, 30].
References: stochastic search algorithms
[28] M. Isard and A. Blake. CONDENSATION–Conditional Density Propagation for Visual Tracking. IJCV, 29(1):5–28, 1998.
[44] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-Based Probabilistic Tracking. In ECCV, 2002.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[40] X. Mei and H. Ling. Robust Visual Tracking using L1 Minimization. In ICCV, 2009.
[30] X. Jia, H. Lu, and M.-H. Yang. Visual Tracking via Adaptive Structural Local Sparse Appearance Model. In CVPR, 2012.
Model Update. It is crucial to update the target representation or model to account for appearance variations.
Matthews et al. [39] address the template update problem for the Lucas-Kanade algorithm [37] where the template is updated with the combination of the fixed reference template extracted from the first frame and the result from the most recent frame.
References: model update
[39] I. Matthews, T. Ishikawa, and S. Baker. The Template Update Problem. PAMI, 26(6):810–815, 2004.
[37] B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with An Application to Stereo Vision. In IJCAI, 1981.
Effective update algorithms have also been proposed via online mixture model [29], online boosting [22], and incremental subspace update [47].
References: effective model update algorithms
[29] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi. Robust Online Appearance Models for Visual Tracking. PAMI, 25(10):1296–1311, 2003.
[22] H. Grabner, M. Grabner, and H. Bischof. Real-Time Tracking via On-line Boosting. In BMVC, 2006.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
For discriminative models, the main issue has been improving the sample collection part to make the online-trained classifier more robust [23, 5, 31, 26].
References: discriminative models
[23] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised On-Line Boosting for Robust Tracking. In ECCV, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[31] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. In CVPR, 2010.
[26] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured Output Tracking with Kernels. In ICCV, 2011.
While much progress has been made, it is still difficult to develop an adaptive appearance model that avoids drift.
Context and Fusion of Trackers. Context information is also very important for tracking.
Recently some approaches have been proposed by mining auxiliary objects or local visual information surrounding the target to assist tracking [59, 24, 18].
References: auxiliary information to assist tracking
[59] M. Yang, Y. Wu, and G. Hua. Context-Aware Visual Tracking. PAMI, 31(7):1195–1209, 2008.
[24] H. Grabner, J. Matas, L. V. Gool, and P. Cattin. Tracking the Invisible: Learning Where the Object Might be. In CVPR, 2010.
[18] T. B. Dinh, N. Vo, and G. Medioni. Context Tracker: Exploring Supporters and Distracters in Unconstrained Environments. In CVPR, 2011.
The context information is especially helpful when the target is fully occluded or leaves the image region [24].
References: context information
[24] H. Grabner, J. Matas, L. V. Gool, and P. Cattin. Tracking the Invisible: Learning Where the Object Might be. In CVPR, 2010.
To improve the tracking performance, some tracker fusion methods have been proposed recently.
Santner et al. [48] proposed an approach that combines static, moderately adaptive and highly adaptive trackers to account for appearance changes.
References: combining multiple appearance models
[48] J. Santner, C. Leistner, A. Saffari, T. Pock, and H. Bischof. PROST: Parallel Robust Online Simple Tracking. In CVPR, 2010.
Even multiple trackers [34] or multiple feature sets [61] are maintained and selected in a Bayesian framework to better account for appearance changes.
References: multiple trackers and multiple features
[34] J. Kwon and K. M. Lee. Tracking by Sampling Trackers. In ICCV, 2011.
[61] J. H. Yoon, D. Y. Kim, and K.-J. Yoon. Visual Tracking via Adaptive Tracker Selection with Multiple Features. In ECCV, 2012.
3. Evaluated Algorithms and Datasets
For fair evaluation, we test the tracking algorithms whose original source or binary codes are publicly available, as all implementations inevitably involve technical details and specific parameter settings.
Note: [36, 58] were obtained by contacting the authors, and [44, 16] were implemented by us.
Table 1 shows the list of the evaluated tracking algorithms.
Table 1. Evaluated tracking algorithms (MU: model update, FPS: frames per second). For representation schemes, L: local, H: holistic, T: template, IH: intensity histogram, BP: binary pattern, PCA: principal component analysis, SPCA: sparse PCA, SR: sparse representation, DM: discriminative model, GM: generative model. For search mechanism, PF: particle filter, MCMC: Markov Chain Monte Carlo, LOS: local optimum search, DS: dense sampling search. For the model update, N: No, Y: Yes. In the Code column, M: Matlab, C: C/C++, MC: mixture of Matlab and C/C++, suffix E: executable binary code.
We also evaluate the trackers in the VIVID testbed [14] including the mean shift (MS-V), template matching (TM-V), ratio shift (RS-V) and peak difference (PD-V) methods.
References: VIVID testbed
[14] R. Collins, X. Zhou, and S. K. Teh. An Open Source Tracking Testbed and Evaluation Web Site. In PETS, 2005.
In recent years, many benchmark datasets have been developed for various vision problems, such as the Berkeley segmentation [38], FERET face recognition [45] and optical flow dataset [9].
References: benchmark datasets for various vision problems
[38] D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues. PAMI, 26(5):530–49, 2004.
[45] P. Phillips, H. Moon, S. Rizvi, and P. Rauss. The FERET Evaluation Methodology for Face-Recognition Algorithms. PAMI, 22(10):1090–1104, 2000.
[9] S. Baker, S. Roth, D. Scharstein, M. J. Black, J. Lewis, and R. Szeliski. A Database and Evaluation Methodology for Optical Flow. In ICCV, 2007.
There exist some datasets for the tracking in the surveillance scenario, such as the VIVID [14] and CAVIAR [21] datasets.
References: tracking datasets
[14] R. Collins, X. Zhou, and S. K. Teh. An Open Source Tracking Testbed and Evaluation Web Site. In PETS, 2005.
[21] R. B. Fisher. The PETS04 Surveillance Ground-Truth Data Sets. In PETS, 2004.
For generic visual tracking, more sequences have been used for evaluation [47, 5].
References: more sequences for evaluation
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
However, most sequences do not have the ground truth annotations, and the quantitative evaluation results may be generated with different initial conditions.
To facilitate fair performance evaluation, we have collected and annotated most commonly used tracking sequences.
Figure 1 shows the first frame of each sequence where the target object is initialized with a bounding box.
Figure 1. Tracking sequences for evaluation. The first frame with the bounding box of the target object is shown for each sequence. The sequences are ordered based on our ranking results (See supplementary material): the ones on the top left are more difficult for tracking than the ones on the bottom right. Note that we annotated two targets for the jogging sequence.
Attributes of a test sequence. Evaluating trackers is difficult because many factors can affect the tracking performance.
For better evaluation and analysis of the strength and weakness of tracking approaches, we propose to categorize the sequences by annotating them with the 11 attributes shown in Table 2.
Table 2. List of the attributes annotated to the test sequences; the threshold values used in this work are also shown.
The attribute distribution in our dataset is shown in Figure 2(a). Some attributes occur more frequently, e.g., OPR and IPR, than others.
It also shows that one sequence is often annotated with several attributes.
Aside from summarizing the performance on the whole dataset, we also construct several subsets corresponding to attributes to report specific challenging conditions.
For example, the OCC subset contains 29 sequences which can be used to analyze the performance of trackers to handle occlusion.
The attribute distribution within the OCC subset is shown in Figure 2(b), and the others are available in the supplemental material.
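The per-attribute subsets can be built by a simple filter over the sequence annotations. A minimal sketch follows; the annotation table and sequence names are illustrative, not the benchmark's actual data (attribute codes such as IV, SV, DEF, OCC follow the paper's abbreviations):

```python
# Hypothetical annotation table: sequence name -> set of attribute codes
annotations = {
    "david":   {"IV", "OPR", "IPR", "OCC"},
    "jogging": {"OCC", "DEF"},
    "car4":    {"IV", "SV"},
}

def attribute_subset(annotations, attribute):
    """All sequences annotated with the given attribute, e.g. the OCC
    subset used to analyze how trackers handle occlusion."""
    return sorted(name for name, attrs in annotations.items()
                  if attribute in attrs)

occ_subset = attribute_subset(annotations, "OCC")  # ["david", "jogging"]
```

Reporting results on such subsets isolates the effect of one challenging condition, at the cost that a sequence annotated with several attributes contributes to several subsets.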
4. Evaluation Methodology
In this work, we use the precision and success rate for quantitative analysis. In addition, we evaluate the robustness of tracking algorithms in two aspects.
Precision plot. One widely used evaluation metric on tracking precision is the center location error, which is defined as the average Euclidean distance between the center locations of the tracked targets and the manually labeled ground truths.
Then the average center location error over all the frames of one sequence is used to summarize the overall performance for that sequence.
However, when the tracker loses the target, the output location can be random and the average error value may not measure the tracking performance correctly [6].
References: average center location error
[6] B. Babenko, M.-H. Yang, and S. Belongie. Robust Object Tracking with Online Multiple Instance Learning. PAMI, 33(7):1619–1632, 2011.
Recently the precision plot [6, 27] has been adopted to measure the overall tracking performance.
References: precision plot
[6] B. Babenko, M.-H. Yang, and S. Belongie. Robust Object Tracking with Online Multiple Instance Learning. PAMI, 33(7):1619–1632, 2011.
[27] F. Henriques, R. Caseiro, P. Martins, and J. Batista. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In ECCV, 2012.
It shows the percentage of frames whose estimated location is within the given threshold distance of the ground truth.
As the representative precision score for each tracker, we use the score at the threshold of 20 pixels [6].
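As a concrete sketch (not taken from the paper's code library), the precision plot can be computed from per-frame center locations as follows; the sample centers below are made up for illustration:

```python
import numpy as np

def precision_plot(pred_centers, gt_centers, max_threshold=50):
    """Fraction of frames whose center location error (Euclidean distance
    between tracked and ground-truth centers) is within each threshold."""
    errors = np.linalg.norm(np.asarray(pred_centers, dtype=float)
                            - np.asarray(gt_centers, dtype=float), axis=1)
    # precision[t] = fraction of frames with error <= t pixels
    return np.array([(errors <= t).mean()
                     for t in range(max_threshold + 1)])

# Illustrative centers; the representative score uses the 20-pixel threshold.
gt = [(10, 10), (12, 11), (30, 30)]
pred = [(11, 10), (40, 40), (31, 29)]
curve = precision_plot(pred, gt)
score_at_20 = curve[20]  # 2 of the 3 frames are within 20 pixels
```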
Success plot. Another evaluation metric is the bounding box overlap. Given the tracked bounding box r_t and the ground truth bounding box r_a, the overlap score is defined as S = |r_t ∩ r_a| / |r_t ∪ r_a|, where ∩ and ∪ represent the intersection and union of two regions, respectively, and |·| denotes the number of pixels in the region.
To measure the performance on a sequence of frames, we count the number of successful frames whose overlap S is larger than the given threshold t_o.
The success plot shows the ratio of successful frames as the threshold varies from 0 to 1.
Using one success rate value at a specific threshold (e.g., t_o = 0.5) for tracker evaluation may not be fair or representative.
Instead, we use the area under the curve (AUC) of each success plot to rank the tracking algorithms.
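A minimal sketch of the overlap score and the success plot, assuming boxes are given as (x, y, w, h); the AUC here is approximated as the mean success ratio over uniformly spaced thresholds, which yields the same ranking as the exact area:

```python
import numpy as np

def overlap(rt, ra):
    """Overlap score S = |rt ∩ ra| / |rt ∪ ra| for boxes (x, y, w, h)."""
    ix = max(0, min(rt[0] + rt[2], ra[0] + ra[2]) - max(rt[0], ra[0]))
    iy = max(0, min(rt[1] + rt[3], ra[1] + ra[3]) - max(rt[1], ra[1]))
    inter = ix * iy
    union = rt[2] * rt[3] + ra[2] * ra[3] - inter
    return inter / union if union > 0 else 0.0

def success_plot(tracked, ground_truth, thresholds=np.linspace(0, 1, 101)):
    """Ratio of frames whose overlap S exceeds each threshold t_o,
    plus the area under the curve used for ranking."""
    s = np.array([overlap(p, g) for p, g in zip(tracked, ground_truth)])
    curve = np.array([(s > t).mean() for t in thresholds])
    auc = curve.mean()  # approximates AUC for uniform thresholds on [0, 1]
    return curve, auc
```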
Robustness Evaluation. The conventional way to evaluate trackers is to run them throughout a test sequence with initialization from the ground truth position in the first frame and report the average precision or success rate.
We refer to this as one-pass evaluation (OPE). However, a tracker may be sensitive to initialization, and its performance with a different initialization at a different start frame may be much worse or better.
Therefore, we propose two ways to analyze a tracker’s robustness to initialization, by perturbing the initialization temporally (i.e., start at different frames) and spatially (i.e., start by different bounding boxes).
These tests are referred to as temporal robustness evaluation (TRE) and spatial robustness evaluation (SRE), respectively.
The proposed test scenarios occur frequently in real-world applications, as a tracker is often initialized by an object detector, which is likely to introduce initialization errors in terms of position and scale.
In addition, an object detector may be used to re-initialize a tracker at different time instances.
By investigating a tracker’s characteristic in the robustness evaluation, more thorough understanding and analysis of the tracking algorithm can be carried out.
Temporal Robustness Evaluation. Given one initial frame together with the ground-truth bounding box of target, one tracker is initialized and runs to the end of the sequence, i.e., one segment of the entire sequence.
The tracker is evaluated on each segment, and the overall statistics are tallied.
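Assuming the 20 TRE segments simply begin at evenly spaced frames (the text does not spell out the exact partitioning), the start frames can be generated as:

```python
def tre_start_frames(num_frames, num_segments=20):
    """Evenly spaced start frames for temporal robustness evaluation.
    The tracker is initialized with the ground-truth box at each start
    frame and run to the end of the sequence."""
    step = num_frames / num_segments
    return [int(round(i * step)) for i in range(num_segments)]

starts = tre_start_frames(400)  # [0, 20, 40, ..., 380]
```

Each run then contributes per-frame results only for its own segment, and the overall statistics are tallied over all segments.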
Spatial Robustness Evaluation. We sample the initial bounding box in the first frame by shifting or scaling the ground truth.
Here, we use 8 spatial shifts including 4 center shifts and 4 corner shifts, and 4 scale variations (supplement).
The amount of shift is 10% of the target size, and the scale ratio varies among 0.8, 0.9, 1.1 and 1.2 of the ground truth.
Thus, we evaluate each tracker 12 times for SRE.
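The 12 SRE initializations can be sketched as below; the exact shift directions are only described in the paper's supplement, so the particular center/corner offsets chosen here (each 10% of the target size) are an assumption:

```python
def sre_initializations(gt_box, shift_frac=0.1, scales=(0.8, 0.9, 1.1, 1.2)):
    """12 perturbed initial boxes for spatial robustness evaluation:
    4 center shifts, 4 corner shifts, and 4 scale variations."""
    x, y, w, h = gt_box
    dx, dy = shift_frac * w, shift_frac * h
    boxes = []
    # 4 center shifts (left, right, up, down) then 4 corner shifts
    for sx, sy in [(-dx, 0), (dx, 0), (0, -dy), (0, dy),
                   (-dx, -dy), (dx, -dy), (-dx, dy), (dx, dy)]:
        boxes.append((x + sx, y + sy, w, h))
    # 4 scale variations about the box center
    cx, cy = x + w / 2, y + h / 2
    for s in scales:
        boxes.append((cx - s * w / 2, cy - s * h / 2, s * w, s * h))
    return boxes

inits = sre_initializations((100, 100, 40, 20))
```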
5. Evaluation Results
For each tracker, the default parameters provided with the source code are used in all evaluations.
Table 1 lists the average FPS of each tracker in OPE running on a PC with an Intel i7 3770 CPU (3.4GHz).
More detailed speed statistics, such as minimum and maximum, are available in the supplement.
For OPE, each tracker is tested on more than 29,000 frames.
For SRE, each tracker is evaluated 12 times on each sequence, where more than 350,000 bounding box results are generated.
For TRE, each sequence is partitioned into 20 segments and thus each tracker is run on around 310,000 frames.
To the best of our knowledge, this is the largest scale performance evaluation of visual tracking.
We report the most important findings in this manuscript; more details and figures can be found in the supplement.
5.1. Overall Performance
The overall performance for all the trackers is summarized by the success and precision plots shown in Figure 3, where only the top 10 algorithms are presented for clarity; the complete plots are displayed in the supplementary material.
For success plots, we use AUC scores to summarize and rank the trackers, while for precision plots we use the results at an error threshold of 20 pixels for ranking.
In the precision plots, the rankings of some trackers differ slightly from the rankings in the success plots, as the two plots are based on different metrics that measure different characteristics of trackers.
Because the AUC score of the success plot measures the overall performance, which is more accurate than the score at a single threshold, in the following we mainly analyze the rankings based on success plots and use the precision plots as auxiliary.
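To make the two ranking criteria concrete, the sketch below computes them under the usual definitions: the success plot is based on bounding-box overlap (intersection over union) and summarized by its AUC, while the precision plot is based on the center location error at a 20-pixel threshold. The number of overlap-threshold samples is an assumption; the benchmark code may use a different sampling.

```python
import numpy as np

def overlap(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(overlaps, thresholds=np.linspace(0, 1, 21)):
    """AUC of the success plot: mean success rate over overlap thresholds."""
    rates = [(np.asarray(overlaps) > t).mean() for t in thresholds]
    return float(np.mean(rates))

def precision_at(center_errors, threshold=20.0):
    """Fraction of frames whose center location error is within
    `threshold` pixels (the paper ranks at 20 pixels)."""
    return float((np.asarray(center_errors) <= threshold).mean())
```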
The average TRE performance is higher than that of OPE because the number of frames decreases from the first to the last segment of TRE.
As trackers tend to perform well in shorter sequences, the average of all the results in TRE tends to be higher.
On the other hand, the average performance of SRE is lower than that of OPE.
The initialization errors tend to cause trackers to update with imprecise appearance information, thereby causing gradual drift.
In the success plots, the top ranked tracker SCM in OPE outperforms Struck by 2.6% but is 1.9% below Struck in SRE.
The results also show that OPE is not the best performance indicator, as OPE is one trial of SRE or TRE.
The ranking of TLD in TRE is lower than in OPE and SRE.
This is because TLD performs well in long sequences with a re-detection module, while there are numerous short segments in TRE.
The success plots of Struck in TRE and SRE show that the success rate of Struck is higher than SCM and ASLA when the overlap threshold is small, but lower than SCM and ASLA when the overlap threshold is large.
This is because Struck only estimates the location of the target and does not handle scale variation.
Sparse representations are used in SCM, ASLA, LSK, MTT and L1APG.
These trackers perform well in SRE and TRE, which suggests sparse representations are effective models to account for appearance change (e.g., occlusion).
We note that SCM, ASLA and LSK outperform MTT and L1APG.
The results suggest that local sparse representations are more effective than the ones with holistic sparse templates.
The AUC score of ASLA decreases less than those of the other top 5 trackers from OPE to SRE, and the ranking of ASLA also increases.
This indicates that the alignment-pooling technique adopted by ASLA is more robust to misalignments and background clutter.
Among the top 10 trackers, CSK has the highest speed, where the proposed circulant structure plays a key role.
The VTD and VTS methods adopt mixture models to improve the tracking performance.
Compared with other higher ranked trackers, their performance bottleneck can be attributed to the adopted representation based on sparse principal component analysis, where holistic templates are used.
Due to space limitations, the plots of SRE are presented for analysis in the following sections, and more results are included in the supplement.
5.2. Attribute-based Performance Analysis
By annotating the attributes of each sequence, we construct subsets with different dominant attributes, which facilitates analyzing the performance of trackers for each challenging factor.
Due to space limitations, we only illustrate and analyze the success plots and precision plots of SRE for the attributes OCC, SV, and FM, as shown in Figure 4; more results are presented in the supplementary material.
When an object moves fast, dense sampling based trackers (e.g., Struck, TLD and CXT) perform much better than others.
One reason is that the search ranges are large and the discriminative models are able to discriminate the targets from the background clutter.
However, the stochastic search based trackers with high overall performance (e.g., SCM and ASLA) do not perform well in this subset due to their poor dynamic models.
If the parameters of the dynamic model are set to large values to cover fast motion, more particles are required to keep the tracker stable.
These trackers can be further improved with dynamic models employing more effective particle filters.
On the OCC subset, the Struck, SCM, TLD, LSK and ASLA methods outperform others.
The results suggest that structured learning and local sparse representations are effective in dealing with occlusions.
On the SV subset, ASLA, SCM and Struck perform best.
The results show that trackers with affine motion models (e.g., ASLA and SCM) often handle scale variation better than others that are designed to account for only translational motion, with a few exceptions such as Struck.
5.3. Initialization with Different Scale
It is known that trackers are often sensitive to initialization variations.
Figure 5 and Figure 6 show the summarized tracking performance with initialization at different scales.
When computing the overlap score, we rescale the tracking results so that the performance summary is comparable with the original scale, i.e., the plots of OPE in Figure 3.
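One plausible reading of this rescaling step, sketched below, is to undo the initialization scale factor on each tracked box (keeping the box center fixed) before computing overlap against the original-scale ground truth. `rescale_box` is a hypothetical helper name; the exact normalization in the benchmark code may differ.

```python
def rescale_box(box, scale):
    """Undo an initialization scale factor on a tracked (x, y, w, h) box,
    keeping the box center fixed, so overlap can be computed against
    the original-scale ground truth."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w / scale, h / scale
    return (cx - nw / 2, cy - nh / 2, nw, nh)
```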
Figure 6 illustrates the average performance of all trackers for each scale, which shows that performance often decreases significantly when the scale factor is large (e.g., ×1.2), as many background pixels are inevitably included in the initial representations.
The performance of TLD, CXT, DFT and LOT decreases with the increase of the initialization scale.
This indicates these trackers are more sensitive to background clutter.
Some trackers perform better when the scale factor is smaller, such as L1APG, MTT, LOT and CPF.
One reason for this, in the case of L1APG and MTT, is that the templates have to be warped to fit the size of the usually smaller canonical template, so that if the initial template is small, more appearance details will be kept in the model.
On the other hand, some trackers perform well or even better when the initial bounding box is enlarged, such as Struck, OAB, SemiT, and BSBT.
This indicates that the Haar-like features are somewhat robust to background clutter due to the summation operations when computing features.
Overall, Struck is less sensitive to scale variation than other well-performing methods.
6. Concluding Remarks
In this paper, we carry out large scale experiments to evaluate the performance of recent online tracking algorithms.
Based on our evaluation results and observations, we highlight some tracking components which are essential for improving tracking performance.
First, background information is critical for effective tracking.
It can be exploited by using advanced learning techniques to encode the background information in the discriminative model implicitly (e.g., Struck), or by serving as the tracking context explicitly (e.g., CXT).
Second, local models are important for tracking, as shown by the performance improvement of local sparse representations (e.g., ASLA and SCM) over holistic sparse representations (e.g., MTT and L1APG).
They are particularly useful when the appearance of the target is partially changed, such as in partial occlusion or deformation.
Third, the motion model or dynamic model is crucial for object tracking, especially when the motion of the target is large or abrupt.
However, most of the evaluated trackers do not focus on this component.
Good location prediction based on the dynamic model could reduce the search range and thus improve tracking efficiency and robustness.
Improving these components will further advance the state of the art of online object tracking.
The evaluation results show that significant progress in the field of object tracking has been made in the last decade.
We propose and demonstrate evaluation metrics for in-depth analysis of tracking algorithms from several perspectives.
This large scale performance evaluation facilitates a better understanding of the state-of-the-art online object tracking approaches, and provides a platform for gauging new algorithms.
Our ongoing work focuses on extending the dataset and code library to include more fully annotated sequences and trackers.
Acknowledgment.
We thank the reviewers for valuable comments and suggestions.
The work is supported partly by NSF CAREER Grant #1149783 and NSF IIS Grant #1152576.
Wu is also with Nanjing University of Information Science and Technology, China and supported partly by NSFC Grant #61005027.
References
[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust Fragments-based Tracking using the Integral Histogram. In CVPR, 2006.
[2] N. Alt, S. Hinterstoisser, and N. Navab. Rapid Selection of Reliable Templates for Visual Tracking. In CVPR, 2010.
[3] S. Avidan. Support Vector Tracking. PAMI, 26(8):1064–1072, 2004.
[4] S. Avidan. Ensemble Tracking. PAMI, 29(2):261–271, 2008.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[6] B. Babenko, M.-H. Yang, and S. Belongie. Robust Object Tracking with Online Multiple Instance Learning. PAMI, 33(7):1619–1632, 2011.
[7] Y. Bai and M. Tang. Robust Tracking via Weakly Supervised Ranking SVM. In CVPR, 2012.
[8] S. Baker and I. Matthews. Lucas-Kanade 20 Years On: A Unifying Framework. IJCV, 56(3):221–255, 2004.
[9] S. Baker, S. Roth, D. Scharstein, M. J. Black, J. Lewis, and R. Szeliski. A Database and Evaluation Methodology for Optical Flow. In ICCV, 2007.
[10] C. Bao, Y. Wu, H. Ling, and H. Ji. Real Time Robust L1 Tracker Using Accelerated Proximal Gradient Approach. In CVPR, 2012.
[11] M. J. Black. EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation. IJCV, 26(1):63– 84, 1998.
[12] K. Cannons. A Review of Visual Tracking. Technical Report CSE2008-07, York University, Canada, 2008.
[13] R. Collins. Mean-shift Blob Tracking through Scale Space. In CVPR, 2003.
[14] R. Collins, X. Zhou, and S. K. Teh. An Open Source Tracking Testbed and Evaluation Web Site. In PETS, 2005.
[15] R. T. Collins, Y. Liu, and M. Leordeanu. Online Selection of Discriminative Tracking Features. PAMI, 27(10):1631–1643, 2005.
[16] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-Based Object Tracking. PAMI, 25(5):564–577, 2003.
[17] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In CVPR, 2005.
[18] T. B. Dinh, N. Vo, and G. Medioni. Context Tracker: Exploring Supporters and Distracters in Unconstrained Environments. In CVPR, 2011.
[19] J. Fan, X. Shen, and Y. Wu. Scribble Tracker: A Matting-based Approach for Robust Tracking. PAMI, 34(8):1633–1644, 2012.
[20] J. Fan, Y. Wu, and S. Dai. Discriminative Spatial Attention for Robust Tracking. In ECCV, 2010.
[21] R. B. Fisher. The PETS04 Surveillance Ground-Truth Data Sets. In PETS, 2004.
[22] H. Grabner, M. Grabner, and H. Bischof. Real-Time Tracking via On-line Boosting. In BMVC, 2006.
[23] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised On-Line Boosting for Robust Tracking. In ECCV, 2008.
[24] H. Grabner, J. Matas, L. V. Gool, and P. Cattin. Tracking the Invisible: Learning Where the Object Might be. In CVPR, 2010.
[25] G. D. Hager and P. N. Belhumeur. Efficient Region Tracking With Parametric Models of Geometry and Illumination. PAMI, 20(10):1025–1039, 1998.
[26] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured Output Tracking with Kernels. In ICCV, 2011.
[27] F. Henriques, R. Caseiro, P. Martins, and J. Batista. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In ECCV, 2012.
[28] M. Isard and A. Blake. CONDENSATION–Conditional Density Propagation for Visual Tracking. IJCV, 29(1):5–28, 1998.
[29] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi. Robust Online Appearance Models for Visual Tracking. PAMI, 25(10):1296–1311, 2003.
[30] X. Jia, H. Lu, and M.-H. Yang. Visual Tracking via Adaptive Structural Local Sparse Appearance Model. In CVPR, 2012.
[31] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. In CVPR, 2010.
[32] J. Kwon and K. M. Lee. Tracking of a Non-Rigid Object via Patch-based Dynamic Appearance Modeling and Adaptive Basin Hopping Monte Carlo Sampling. In CVPR, 2009.
[33] J. Kwon and K. M. Lee. Visual Tracking Decomposition. In CVPR, 2010.
[34] J. Kwon and K. M. Lee. Tracking by Sampling Trackers. In ICCV, 2011.
[35] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. Hengel. A Survey of Appearance Models in Visual Object Tracking. TIST, 2013, in press.
[36] B. Liu, J. Huang, L. Yang, and C. Kulikowsk. Robust Tracking using Local Sparse Appearance Model and K-Selection. In CVPR, 2011.
[37] B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with An Application to Stereo Vision. In IJCAI, 1981.
[38] D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues. PAMI, 26(5):530–549, 2004.
[39] I. Matthews, T. Ishikawa, and S. Baker. The Template Update Problem. PAMI, 26(6):810–815, 2004.
[40] X. Mei and H. Ling. Robust Visual Tracking using L1 Minimization. In ICCV, 2009.
[41] X. Mei, H. Ling, Y. Wu, E. Blasch, and L. Bai. Minimum Error Bounded Efficient L1 Tracker with Occlusion Detection. In CVPR, 2011.
[42] X. Mei, H. Ling, Y. Wu, E. Blasch, and L. Bai. Efficient Minimum Error Bounded Particle Resampling L1 Tracker with Occlusion Detection. TIP, 2013, in press.
[43] S. Oron, A. Bar-Hillel, D. Levi, and S. Avidan. Locally Orderless Tracking. In CVPR, 2012.
[44] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-Based Probabilistic Tracking. In ECCV, 2002.
[45] P. Phillips, H. Moon, S. Rizvi, and P. Rauss. The FERET Evaluation Methodology for Face-Recognition Algorithms. PAMI, 22(10):1090–1104, 2000.
[46] F. Porikli, O. Tuzel, and P. Meer. Covariance Tracking using Model Update based on Lie Algebra. In CVPR, 2006.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[48] J. Santner, C. Leistner, A. Saffari, T. Pock, and H. Bischof. PROST: Parallel Robust Online Simple Tracking. In CVPR, 2010.
[49] L. Sevilla-Lara and E. Learned-Miller. Distribution Fields for Tracking. In CVPR, 2012.
[50] S. Stalder, H. Grabner, and L. van Gool. Beyond Semi-Supervised Tracking: Tracking Should Be as Simple as Detection, but not Simpler than Recognition. In ICCV Workshop, 2009.
[51] B. Stenger, T. Woodley, and R. Cipolla. Learning to Track with Multiple Observers. In CVPR, 2009.
[52] F. Tang, S. Brennan, Q. Zhao, and H. Tao. Co-Tracking Using Semi-Supervised Support Vector Machines. CVPR, 2007.
[53] O. Tuzel, F. Porikli, and P. Meer. Region Covariance: A Fast Descriptor for Detection and Classification. In ECCV, 2006.
[54] P. Viola and M. J. Jones. Robust Real-Time Face Detection. IJCV, 57(2):137–154, 2004.
[55] D. Wang, H. Lu, and M.-H. Yang. Online Object Tracking with Sparse Prototypes. TIP, 22(1):314–325, 2013.
[56] Y. Wu, J. Cheng, J. Wang, H. Lu, J. Wang, H. Ling, E. Blasch, and L. Bai. Real-time Probabilistic Covariance Tracking with Efficient Model Update. TIP, 21(5):2824–2837, 2012.
[57] Y. Wu, H. Ling, J. Yu, F. Li, X. Mei, and E. Cheng. Blurred Target Tracking by Blur-driven Tracker. In ICCV, 2011.
[58] Y. Wu, B. Shen, and H. Ling. Online Robust Image Alignment via Iterative Convex Optimization. In CVPR, 2012.
[59] M. Yang, Y. Wu, and G. Hua. Context-Aware Visual Tracking. PAMI, 31(7):1195–1209, 2008.
[60] A. Yilmaz, O. Javed, and M. Shah. Object Tracking: A Survey. ACM Computing Surveys, 38(4):1–45, 2006.
[61] J. H. Yoon, D. Y. Kim, and K.-J. Yoon. Visual Tracking via Adaptive Tracker Selection with Multiple Features. In ECCV, 2012.
[62] L. Yuan, A. Haizhou, T. Yamashita, L. Shihong, and M. Kawade. Tracking in Low Frame Rate Video: A Cascade Particle Filter with Discriminative Observers of Different Life Spans. PAMI, 30(10):1728–1740, 2008.
[63] K. Zhang, L. Zhang, and M.-H. Yang. Real-time Compressive Tracking. In ECCV, 2012.
[64] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust Visual Tracking via Multi-task Sparse Learning. In CVPR, 2012.
[65] W. Zhong, H. Lu, and M.-H. Yang. Robust Object Tracking via Sparsity-based Collaborative Model. In CVPR, 2012.