Reading Notes on the Survey Paper "Multiple Object Tracking: A Literature Review"


Paper link

[1409.7618 Multiple Object Tracking: A Literature Review - arXiv]

Abstract

The abstract introduces the contributions this survey makes to the MOT field.

1. Introduction

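This survey focuses only on pedestrian tracking, for three reasons:

  1. Pedestrians are non-rigid objects, which makes them ideal subjects for the MOT problem.
  2. Pedestrian tracking has a wide range of applications and high commercial value.
  3. 70% of the work in the MOT field targets pedestrians.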

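Compared with single object tracking (SOT), MOT additionally has to deal with:

  1. Frequent occlusions.
  2. Initialization and termination of tracks.
  3. Similar appearance.
  4. Interactions among objects.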

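The contributions of the survey can be summarized as:

  1. A unified formulation, and two different ways of categorizing MOT algorithms.
  2. A survey and analysis of the different key components.
  3. Experimental results of different methods on commonly used datasets.
  4. Open problems and a discussion of future directions.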

2. MOT Problem

2.1 Problem Formulation

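The MOT problem is defined from a probabilistic point of view. In short: given the observations, find the sequence of states that maximizes the posterior probability.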
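
Written out, the objective is a MAP estimate. The symbol names here are assumptions chosen for these notes: $S_{1:t}$ denotes the states of all objects from the first frame to frame $t$, and $O_{1:t}$ the observations collected over the same frames.

```latex
\hat{S}_{1:t} = \arg\max_{S_{1:t}} P\left(S_{1:t} \mid O_{1:t}\right)
```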

2.2 MOT Categorization

Methods are categorized by three criteria, following the order in which the task is processed:

a) initialization method, b) processing mode, and c) type of output.

2.2.1 Initialization Method

Most existing MOT works can be grouped into two sets [51], depending on how objects are initialized: Detection-Based Tracking (DBT) and Detection-Free Tracking (DFT).

DBT is the more general of the two, because DFT cannot handle objects that newly appear or disappear.

2.2.2 Processing Mode

MOT can also be categorized into online tracking and offline tracking. The difference is whether observations from future frames are utilized when handling the current frame.

2.2.3 Type of Output

This criterion classifies MOT methods into deterministic and probabilistic (stochastic) ones, depending on the randomness of the output.

  • Probabilistic (stochastic) tracking. The output varies from one run to the next.
  • Deterministic tracking. The output is identical when the method is run multiple times.

3. MOT Component

3.1 Appearance Model

Technically, an appearance model includes two components: visual representation and statistical measuring.

3.1.1 Visual Representation

  • Local features. KLT features, optical flow.
  • Region features. Here, "order" refers to the order of the differences used when computing the representation.
    • Zero-order. Color histogram, raw pixel template.
    • First-order. Gradient-based representations such as HOG, level-set formulation.
    • Up-to-second-order. Region covariance matrix.
  • Others.

3.1.2 Statistical Measuring

Based on the visual representation, a statistical measure computes the affinity between two observations.
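
As an illustration, one common choice for histogram-based representations is the Bhattacharyya coefficient between two normalized color histograms. The sketch below is a minimal version under that assumption. The function names are made up for these notes, and a real tracker would typically combine several cues.

```python
import math

def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]

def bhattacharyya_affinity(hist_a, hist_b):
    """Bhattacharyya coefficient in [0, 1]; 1 means identical distributions."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(hist_a), normalize(hist_b)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```

Two observations with the same color distribution score close to 1, and observations with no overlapping bins score 0.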

3.2 Motion Model

The motion model captures the dynamic behavior of an object. It estimates the potential position of objects in the future frames, thereby reducing the search space.

3.2.1 Linear Motion Model

  • Velocity smoothness is modeled by forcing the velocity of an object to change smoothly across successive frames.
  • Position smoothness directly penalizes the discrepancy between the observed position and the estimated position.
  • Acceleration smoothness is sometimes considered in addition to position and velocity smoothness.
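
The smoothness terms above can be sketched as simple cost functions. This is a minimal illustration that assumes 2D point positions, one position per frame, and squared-error penalties. The function names are hypothetical and not taken from the survey.

```python
def predict_position(prev_pos, velocity, dt=1.0):
    """Constant-velocity prediction of the next (x, y) position."""
    return (prev_pos[0] + velocity[0] * dt, prev_pos[1] + velocity[1] * dt)

def position_cost(observed, predicted):
    """Squared distance between the observed and the predicted position."""
    return (observed[0] - predicted[0]) ** 2 + (observed[1] - predicted[1]) ** 2

def velocity_smoothness_cost(positions):
    """Sum of squared velocity changes along a trajectory of (x, y) points."""
    # Frame-to-frame velocities (assumes a unit time step between frames).
    velocities = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    # Penalize the change between consecutive velocities.
    return sum(
        (vx1 - vx0) ** 2 + (vy1 - vy0) ** 2
        for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:])
    )
```

A trajectory moving at constant velocity has zero velocity smoothness cost, and any sudden change of speed or direction increases it.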

3.2.2 Non-linear Motion Model

3.3 Interaction Model

The interaction model, also known as the mutual motion model, captures the influence of an object on other objects.

3.3.1 Social Force Models

Social force models are also known as group models. In these models, each object is considered to be dependent on other objects and environmental factors.

  • Individual force.
    • fidelity, which means one should not change one's desired destination
    • constancy, which means one should not suddenly change one's momentum, including speed and direction
  • Group force.
    • attraction, which means individuals moving together as a group should stay close
    • repulsion, which means that individuals moving together as a group should keep some distance away from others to make all members comfortable
    • coherence, which means individuals moving together as a group should move with similar velocity

3.3.2 Crowd Motion Pattern Models

Inspired by the crowd simulation literature [24], motion patterns are introduced to alleviate the difficulty of tracking an individual object in the crowd.

3.4 Exclusion Model

Exclusion is a constraint employed to avoid physical collisions when seeking a solution to the MOT problem. It arises from the fact that two distinct objects cannot occupy the same physical space in the real world.

3.4.1 Detection-level Exclusion Modeling

Two different detection responses in the same frame cannot be assigned to the same target.

  • “Soft” modeling. Detection-level exclusion is “softly” modeled by adding a cost term that penalizes violations, which is then minimized.
  • “Hard” modeling. “Hard” modeling of detection-level exclusion is implemented by imposing explicit constraints.
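
To make the "hard" variant concrete, the sketch below enforces a one-to-one assignment between tracks and detections in a single frame by enumerating all valid assignments. It is a brute-force illustration for small problems only. The function name and the cost-matrix layout are assumptions made for these notes, and practical systems usually use an efficient solver such as the Hungarian algorithm instead.

```python
from itertools import permutations

def assign_with_exclusion(cost):
    """Find the cheapest one-to-one assignment of detections to tracks.

    cost[i][j] is the cost of assigning detection j to track i.
    Returns (assignment, total_cost), where assignment[i] is the detection
    index chosen for track i. No detection is used twice, and each track
    receives exactly one detection, which is the hard exclusion constraint.
    """
    n_tracks = len(cost)
    n_dets = len(cost[0]) if cost else 0
    if n_tracks > n_dets:
        raise ValueError("this sketch assumes at least one detection per track")
    best, best_cost = None, float("inf")
    # Each permutation picks n_tracks distinct detections, so the
    # exclusion constraint holds by construction.
    for perm in permutations(range(n_dets), n_tracks):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return list(best), best_cost
```

When two tracks both prefer the same detection, the constraint forces one of them onto its second choice rather than letting them share it.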

3.4.2 Trajectory-level Exclusion Modeling

Generally, trajectory-level exclusion is modeled by penalizing the case that two close detection hypotheses have different trajectory labels. This will suppress one trajectory label.

3.5 Occlusion Handling

Occlusion is perhaps the most critical challenge in MOT. It is a primary cause for ID switches or fragmentation of trajectories.

3.5.1 Part-to-whole

This strategy is built on the assumption that a part of the object is still visible when an occlusion happens. The tracker detects the occlusion and uses only the unoccluded parts for estimation. Specifically, parts are derived by dividing objects uniformly into grids [54], or by fitting multiple parts to a specific kind of object such as a human, e.g. 15 non-overlapping parts as in [51], and parts detected by the DPM detector [123] in [81, 124].

3.5.2 Hypothesize-and-test

This strategy sidesteps challenges from occlusion by hypothesizing proposals and testing the proposals according to observations at hand.

3.5.3 Buffer-and-recover

This strategy buffers observations when occlusion happens and remembers states of objects before occlusion.
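
A minimal sketch of the bookkeeping behind this idea is shown below. It is a simplification that stores only the last known state of each lost track and forgets it after a fixed number of frames. The class name, the `max_age` parameter, and the state format are all hypothetical and not taken from the survey.

```python
class TrackBuffer:
    """Remembers the last state of occluded tracks for a limited time."""

    def __init__(self, max_age=30):
        self.max_age = max_age
        self.lost = {}  # track_id -> (last_state, frames_since_seen)

    def mark_lost(self, track_id, last_state):
        """Store the state a track had just before it became occluded."""
        self.lost[track_id] = (last_state, 0)

    def step(self):
        """Advance one frame and drop tracks lost for more than max_age frames."""
        self.lost = {
            tid: (state, age + 1)
            for tid, (state, age) in self.lost.items()
            if age + 1 <= self.max_age
        }

    def recover(self, track_id):
        """Return and forget the stored state, or None if it has expired."""
        entry = self.lost.pop(track_id, None)
        return None if entry is None else entry[0]
```

When a detection reappears and is matched to a buffered identity, `recover` hands back the pre-occlusion state so the track can continue with the same ID.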

3.5.4 Others

The strategies described above may not cover all the tactics explored in the community.

3.6 Inference

3.6.1 Probabilistic Inference

Approaches based on probabilistic inference typically represent the states of objects as a distribution with uncertainty. The goal of a tracking algorithm is to estimate the probability distribution of the target state from the existing observations, using a variety of probabilistic reasoning methods.

  • Kalman filter. In the case of a linear system and Gaussian-distributed object states, the Kalman filter [39] is proven to be the optimal estimator. It has been applied in [37].
  • Extended Kalman filter. To handle the non-linear case, the extended Kalman filter is one possible solution. It approximates the non-linear system by a Taylor expansion [36].
  • Particle filter. Monte Carlo sampling based models have also become popular in tracking, especially after the introduction of the particle filter [132, 133, 134, 54, 105, 34, 35, 10]. This strategy models the underlying distribution by a set of weighted particles, which makes it possible to drop any assumptions about the distribution itself [105, 34, 35, 38].
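
To illustrate the predict and update cycle of the Kalman filter, here is a deliberately small scalar version. It assumes a one-dimensional state that stays roughly constant between frames (a random-walk model) and is observed directly with Gaussian noise. The function name and default noise values are assumptions for illustration, and a real tracker would use a multi-dimensional state such as position and velocity.

```python
def kalman_1d(measurements, process_var=1e-3, meas_var=1.0, x0=0.0, p0=1e3):
    """Scalar Kalman filter with a random-walk state model.

    Returns a list of (estimate, variance) pairs, one per measurement.
    """
    x, p = x0, p0  # state estimate and its variance
    history = []
    for z in measurements:
        # Predict: the state is assumed unchanged, uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        history.append((x, p))
    return history
```

With a large initial variance `p0`, the first measurement dominates the estimate, and the variance shrinks as more measurements arrive.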

3.6.2 Deterministic Optimization

As opposed to the probabilistic inference methods, approaches based on deterministic optimization aim to find the maximum a posteriori (MAP) solution to MOT. To that end, the task of inferring data association, the target states, or both is typically cast as an optimization problem.

3.6.3 Discussion

In practice, deterministic optimization (energy minimization) is more widely used than probabilistic approaches.

3.7 Summary

  • It is important to note that not all existing MOT methods have all the components.
  • In general, appearance, motion and inference are mandatory in most methods.
  • It is also notable that these components are not orthogonal to each other.

4. MOT Evaluation

For a given MOT approach, metrics and datasets are required to evaluate its performance quantitatively.

4.1 Metrics

4.1.1 Metrics for Detection

  • Accuracy. FAF, FPPI, MODA
  • Precision. MODP

4.1.2 Metrics for Tracking

  • Accuracy. IDs, MOTA
  • Precision. MOTP, TDE, OSPA
  • Completeness. MT, PT, ML, FM
  • Robustness. RS, RL
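
As a worked example for one of these metrics, MOTA in its standard CLEAR MOT form combines false negatives, false positives, and ID switches, all summed over the whole sequence and divided by the total number of ground-truth objects. The function and parameter names below are chosen for these notes.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDs) / GT, with all counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt

# Example: 10 misses, 5 false alarms, 5 ID switches, 100 ground-truth boxes.
score = mota(10, 5, 5, 100)
```

Because the errors are not bounded by the number of ground-truth objects, MOTA can be negative for a tracker that produces many false positives.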

4.2 Datasets

4.3 Public Algorithms

4.4 Benchmark Results

Strictly speaking, in order to make a direct and fair comparison, one needs to fix all the other components while varying the one under consideration.

5. Summary

This paper has described methods and problems related to the task of Multiple Object Tracking (MOT) in videos.

5.1 Existing Issues

  • One major issue in MOT research is that the performance of an MOT method depends heavily on the object detector.
  • Another nuisance is that a complicated MOT algorithm comes with many parameters to tune.

5.2 Future Directions

  • MOT with video adaptation. Customizing the object detector to the target video is necessary to improve MOT performance. One solution, proposed by Shu et al. [192], adapts a generic pedestrian detector to a specific video by progressively refining it.
  • MOT under multiple cameras. There are two typical configurations. In the first, multiple cameras record the same scene, i.e., multiple views. In the second, each camera records a different scene, i.e., a non-overlapping multi-camera network.
  • Multiple 3D object tracking. 3D tracking requires camera calibration, or has to overcome other challenges in estimating camera poses and scene layout. 3D model design is a further issue that does not arise in 2D MOT.
  • MOT with scene understanding. The results of scene understanding can provide contextual information and scene structure, which is very helpful for tracking if it is well incorporated into an MOT algorithm.
  • MOT with deep learning. Deep learning based models have emerged as an extremely powerful framework for many kinds of vision problems, including image classification [198], object detection [186, 187, 188], and, more relevantly, single object tracking [184].
  • MOT with other computer vision tasks. Possible combinations include object segmentation [206, 207, 208, 209], re-identification [210, 194, 211], human pose estimation [18, 212, 213, 214, 215], and action recognition [19].