[Information Technology] [2008.06] Context-Aware and Attentional Visual Object Tracking

梅花香——苦寒来

Original article: https://mp.weixin.qq.com/s?__biz=MzUxMTk0OTA3Nw==&mid=2247493211&idx=1&sn=298ca4d7b91428955f08fbac45daf0e0&chksm=f9694621ce1ecf37a5af280e9f88db736901d07f3e0151bd906bb24ffd7775c4659fccc93aab&token=1177789462&lang=zh_CN#rd

This is a PhD dissertation from Northwestern University, USA (author: Ming Yang), 141 pages in total.

Visual object tracking, i.e. consistently inferring the motion of a desired target from image sequences, is a must-have component to bridge low-level image processing techniques and high-level video content analysis. This has been an active and fruitful research topic in the computer vision community for decades due to both its versatile applications in practice, e.g. in human-computer interaction, security surveillance, robotics, medical imaging and multimedia applications, and its diverse impacts in theory, e.g. Bayesian inference on graphical models, particle filtering, kernel density estimation, and machine learning algorithms. However, long-term robust tracking in unconstrained environments remains a very challenging task, and the difficulties in reality are far from being conquered.

The two core challenges of the visual object tracking task are the computational efficiency constraint and the enormous unpredictable variations in targets due to lighting changes, deformations, partial occlusions, camouflage, quick motion, imperfect image quality, etc. More critically, the tracking algorithms have to deal with these variations in an unsupervised manner. All the target variations in on-line applications are unpredictable, so it is extremely hard, if not impossible, to design universal target-specific or non-specific observation models in advance. Therefore, these challenges call for non-stationary target observation models and agile motion estimation paradigms that are intelligent and adaptive to different scenarios.

In the thesis, we mainly focus on how to enhance the generality and reliability of object-level visual tracking, which strives to handle enormous variations while also taking the computational efficiency constraint into consideration. We first present an in-depth analysis of the chicken-and-egg nature of on-line adaptation of target observation models directly using the previous tracking results: the model is updated with the tracker's own estimates, whose accuracy in turn depends on that model.
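To make this chicken-and-egg loop concrete, here is a toy simulation. It is a hypothetical setup for illustration, not the analysis carried out in the thesis. Each frame offers two candidates, the true target and one distractor, and each candidate is reduced to a single scalar appearance value. The tracker picks the candidate closer to its model, then blends the picked appearance into the model with a hypothetical update rate `alpha`. The function and variable names are assumptions.

```python
# Toy illustration (hypothetical setup, not from the thesis) of why on-line
# model adaptation from the tracker's own results is a chicken-and-egg problem.


def track(target_app, distractor_app, alpha, model=1.0):
    """Return 'T' (target picked) or 'D' (distractor picked) for each frame."""
    picks = []
    for a_target, a_distractor in zip(target_app, distractor_app):
        # Localization step: depends on the current appearance model.
        if abs(a_target - model) <= abs(a_distractor - model):
            picks.append("T")
            observed = a_target
        else:
            picks.append("D")
            observed = a_distractor
        # Adaptation step: the tracker's own pick is treated as ground truth,
        # so a wrong pick contaminates the model used in the next frame.
        model = (1.0 - alpha) * model + alpha * observed
    return picks


# Scenario A: a brief disturbance on the target (e.g. a passing shadow).
transient = [1.0] * 3 + [0.4] * 3 + [1.0] * 4
distractor_a = [0.6] * 10

# Scenario B: a lasting, gradual appearance change (e.g. lighting).
gradual = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.3, 0.3]
distractor_b = [1.45] * 10

for alpha in (0.0, 0.5):
    print(alpha, "".join(track(transient, distractor_a, alpha)),
          "".join(track(gradual, distractor_b, alpha)))
# 0.0 TTTDDDTTTT TTTTTDDDDD
# 0.5 TTTDDDDDDD TTTTTTTTTT
```

With `alpha = 0` the model never adapts. It recovers after the brief disturbance but loses a target whose appearance changes for good. With `alpha = 0.5` it follows the gradual change, but the wrong picks during the disturbance pull the model toward the distractor and the tracker never recovers. Neither fixed choice handles both cases, which is the dilemma that motivates the thesis.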
Then, we propose two novel ideas to combat unpredictable variations: context-aware tracking and attentional tracking. In context-aware tracking, the tracker automatically discovers some auxiliary objects that have short-term motion correlation with the target. These auxiliary objects are regarded as spatial contexts to enhance the target observation model and verify the tracking results. The attentional tracking algorithms enhance the robustness of the observation models by selectively focusing on some discriminative regions inside the targets, or by adaptively tuning the feature granularity and model elasticity. Context-aware tracking aims to search for external informative contexts of targets; in contrast, attentional tracking tries to identify internal discriminative characteristics of targets, so the two are complementary to each other in some sense. The proposed approaches can tolerate many typical difficult variations, thus greatly enhancing the robustness of region-based object trackers.

Besides single object tracking, we also introduce a new view of multiple target tracking from a game-theoretic perspective. This view bridges joint motion estimation and the Nash equilibrium of a particular game, and it has linear complexity with respect to the number of targets. Extensive experiments on challenging real-world test video sequences demonstrate excellent and promising results of the proposed object-level visual tracking algorithms.
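The context-discovery step of context-aware tracking, which finds auxiliary objects whose short-term motion correlates with the target's, can be sketched as follows. This is a minimal illustration under assumptions made here, not the thesis's method. Candidate objects are assumed to be tracked already as lists of (x, y) centers. Motion correlation is approximated by the cosine similarity of recent frame-to-frame displacements. The function names, the window length and the threshold are all hypothetical.

```python
import math


def displacements(track, window):
    """Flattened frame-to-frame (dx, dy) over the last `window` steps of a track."""
    recent = track[-(window + 1):]
    out = []
    for (x0, y0), (x1, y1) in zip(recent, recent[1:]):
        out.extend((x1 - x0, y1 - y0))
    return out


def motion_similarity(u, v):
    """Cosine similarity of two displacement sequences; 0.0 if either is static."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def discover_auxiliary_objects(target_track, candidate_tracks, window=5, threshold=0.8):
    """Hypothetical helper: names of candidates that recently moved with the target."""
    reference = displacements(target_track, window)
    return [
        name
        for name, track in candidate_tracks.items()
        if motion_similarity(reference, displacements(track, window)) >= threshold
    ]


target = [(0, 0), (2, 0), (4, 1), (6, 1), (8, 2), (10, 2)]
candidates = {
    "co_moving": [(5, 5), (7, 5), (9, 6), (11, 6), (13, 7), (15, 7)],
    "static": [(20, 20)] * 6,
    "opposite": [(50, 0), (47, 0), (44, -1), (41, -1), (38, -2), (35, -2)],
}
print(discover_auxiliary_objects(target, candidates))  # ['co_moving']
```

Only the selection step is shown. In the thesis, the discovered auxiliary objects also serve as spatial context that enhances the observation model and verifies the tracking results. Because the window is short, a candidate that stops co-moving drops out of the context set after a few frames, which matches the "short-term" correlation described above.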

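1 Introduction
2 Related Work
3 On-line Adaptation of Appearance Models
4 Context-Aware Visual Tracking
5 Attentional Visual Tracking
6 Game-Theoretic Multiple Target Tracking
7 Conclusion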
