Reading Notes on the Multi-Object Tracking Survey "Multiple Object Tracking: A Literature Review"

凉拌西瓜炒鸡腿

Last modified: 2022-07-29 10:59:34

Tags: computer vision, object tracking, artificial intelligence

First published: 2022-07-29 10:59:15

Article link: https://blog.csdn.net/AndrewGuo0930/article/details/126050765

Paper link

[1409.7618 Multiple Object Tracking: A Literature Review - arXiv]

Abstract

The abstract introduces the survey's contributions to the MOT field.

1. Introduction

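The survey focuses only on pedestrian tracking, for three reasons:

Pedestrians are non-rigid objects, which makes them an ideal subject for the MOT problem.
Pedestrian tracking has broad applications and high commercial value.
70% of the work in the MOT field targets pedestrians.

Compared with SOT (single object tracking), MOT must additionally deal with:

Frequent occlusions.
Initialization and termination of trajectories.
Similar appearance.
Interactions among objects.

The contributions of the survey can be summarized as:

A unified formulation and two different ways of categorizing MOT algorithms.
A survey and analysis of the different key components.
Experimental results of different methods on commonly used datasets.
Open problems and a discussion of future directions.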

2. MOT Problem

2.1 Problem Formulation

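The MOT problem is defined from a probabilistic perspective: in short, given the observations, find the sequence of states that maximizes the posterior probability.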
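
As a hedged restatement in symbols, the formulation can be written as a maximum a posteriori (MAP) estimate. The notation here is an assumption chosen for illustration and may differ from the paper's: S_{1:t} stands for the states of all objects from frame 1 to frame t, and O_{1:t} for the corresponding observations.

```latex
\hat{S}_{1:t} = \arg\max_{S_{1:t}} \; P\left(S_{1:t} \mid O_{1:t}\right)
```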

2.2 MOT Categorization

MOT methods are categorized by three criteria, following the order in which the task is processed:

a) initialization method, b) processing mode, and c) type of output.

2.2.1 Initialization Method

Most existing MOT works can be grouped into two sets [51], depending on how objects are initialized: Detection-Based Tracking (DBT) and Detection-Free Tracking (DFT).

DBT is the more general of the two, because DFT cannot handle new objects appearing or existing objects disappearing.

2.2.2 Processing Mode

MOT can also be categorized into online tracking and offline tracking. The difference is whether observations from future frames are utilized when handling the current frame.
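
The distinction can be illustrated with a toy smoothing task. This is only a sketch under simplifying assumptions: a single 1-D position per frame stands in for a full tracker, and the function names `online_estimates` and `offline_estimates` are hypothetical. The online version sees only frames up to t, while the offline version may also read frames after t.

```python
from typing import List


def online_estimates(obs: List[float]) -> List[float]:
    """Causal smoothing: the estimate for frame t uses observations 0..t only."""
    out = []
    for t in range(len(obs)):
        past = obs[: t + 1]
        out.append(sum(past) / len(past))
    return out


def offline_estimates(obs: List[float], half_window: int = 1) -> List[float]:
    """Batch smoothing: the estimate for frame t may also use frames after t."""
    out = []
    for t in range(len(obs)):
        lo = max(0, t - half_window)
        hi = min(len(obs), t + half_window + 1)
        window = obs[lo:hi]
        out.append(sum(window) / len(window))
    return out
```

On a steadily moving object the causal estimate lags behind, whereas the windowed one can use the next frame to correct itself, at the price of not being available in real time.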

2.2.3 Type of Output

This criterion classifies MOT methods into deterministic ones and probabilistic (stochastic) ones, depending on the randomness of the output.
Stochastic Tracking. The output of stochastic tracking can differ from one run to the next.
Deterministic Tracking. The output of deterministic tracking is the same every time the method is run.

3. MOT Component

3.1 Appearance Model

Technically, an appearance model includes two components: visual representation and statistical measuring.

3.1.1 Visual Representation

Local features. KLT, optical flow
Region features. Here, order means the order of discrepancy when computing the representation.
- Zero-order. color histogram, raw pixel template
- First-order. Gradient-based representations like HOG, level-set formulation
- Up-to-second-order. Region covariance matrix
Others.
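
As an illustration of a zero-order region feature, the following is a minimal sketch of a normalized color histogram. The function name `color_histogram`, the pixel format (a sequence of RGB tuples), and the default bin count are assumptions made for this example, not details from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram of a region, flattened to a list."""
    if not pixels:
        raise ValueError("region must contain at least one pixel")
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to its bin index.
        ri, gi, bi = int(r // step), int(g // step), int(b // step)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]
```

The histogram discards spatial layout entirely, which is why it is called zero-order: no differences between neighboring pixels enter the representation.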

3.1.2 Statistical Measuring

Based on visual representation, statistical measure computes the affinity between two observations.
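
One commonly used measure for histogram-type representations is the Bhattacharyya coefficient. The sketch below assumes both inputs are normalized histograms with the same number of bins; the function name is hypothetical, and other measures could be substituted.

```python
import math
from typing import Sequence


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Affinity in [0, 1] between two normalized histograms (1 means identical)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```

A value near 1 suggests the two observations look alike, and a value near 0 suggests their color distributions barely overlap.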

3.2 Motion Model

The motion model captures the dynamic behavior of an object. It estimates the potential position of objects in the future frames, thereby reducing the search space.

3.2.1 Linear Motion Model

Velocity smoothness is modeled by enforcing the velocity values of an object in successive frames to change smoothly.
Position smoothness directly penalizes the discrepancy between the observed position and the estimated position.
Acceleration smoothness takes acceleration into account in addition to position and velocity.
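
The linear motion terms above can be sketched as follows. This is an illustrative simplification, assuming 2-D point positions sampled once per frame; the function names and the squared-difference form of the cost are assumptions, not the paper's exact definitions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def predict_constant_velocity(prev: Point, curr: Point) -> Point:
    """Extrapolate the next position assuming the last displacement repeats."""
    vx, vy = curr[0] - prev[0], curr[1] - prev[1]
    return (curr[0] + vx, curr[1] + vy)


def velocity_smoothness_cost(track: List[Point]) -> float:
    """Sum of squared changes in velocity between successive frames."""
    velocities = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(track, track[1:])]
    return sum(
        (v2[0] - v1[0]) ** 2 + (v2[1] - v1[1]) ** 2
        for v1, v2 in zip(velocities, velocities[1:])
    )
```

A track moving at constant velocity has zero cost, and the predicted position narrows down where to look for the object in the next frame.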

3.2.2 Non-linear Motion Model

3.3 Interaction Model

Interaction model, also known as mutual motion model, captures the influence of an object on other objects.

3.3.1 Social Force Models

Social force models are also known as group models. In these models, each object is considered to be dependent on other objects and environmental factors.

Individual force.
- fidelity, which means one should not change his desired destination
- constancy, which means one should not suddenly change his momentum, including speed and direction
Group force.
- attraction, which means individuals moving together as a group should stay close
- repulsion, which means that individuals moving together as a group should keep some distance away from others to make all members comfortable
- coherence, which means individuals moving together as a group should move with similar velocity

3.3.2 Crowd Motion Pattern Models

Inspired by the crowd simulation literature [24], motion patterns are introduced to alleviate the difficulty of tracking an individual object in the crowd.

3.4 Exclusion Model

Exclusion is a constraint employed to avoid physical collisions when seeking a solution to the MOT problem. It arises from the fact that two distinct objects cannot occupy the same physical space in the real world.

3.4.1 Detection-level Exclusion Modeling

Two different detection responses in the same frame cannot be assigned to the same target.

“Soft” modeling. Detection-level exclusion is “softly” modeled by minimizing a cost term to penalize the case of violation.
“Hard” modeling. “Hard” modeling of detection-level exclusion is implemented by applying explicit constraint.
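
One way to picture "hard" modeling is as a one-to-one assignment between targets and detections. The sketch below is not the method of any particular paper: it assumes a small square cost matrix and enumerates all permutations, which enforces the constraint by construction but is only practical for tiny inputs. The function name `assign_one_to_one` is hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def assign_one_to_one(cost: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """Return (assignment, total_cost); assignment[i] is the detection for target i.

    Every detection index appears exactly once, so no two targets can
    share a detection. Brute force: suitable for illustration only.
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")
    best_perm: Tuple[int, ...] = ()
    best_cost = float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost
```

For example, with `cost = [[0.1, 1.0], [0.2, 5.0]]` both targets individually prefer detection 0, but the constraint forces them onto different detections. A "soft" variant would instead allow the conflict and add a penalty term to the total cost.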

3.4.2 Trajectory-level Exclusion Modeling

Generally, trajectory-level exclusion is modeled by penalizing the case that two close detection hypotheses have different trajectory labels. This will suppress one trajectory label.

3.5 Occlusion Handling

Occlusion is perhaps the most critical challenge in MOT. It is a primary cause for ID switches or fragmentation of trajectories.

3.5.1 Part-to-whole

This strategy is built on the assumption that a part of the object is still visible when an occlusion happens.
The tracker would be aware of this and adopt only the unoccluded parts for estimation. Specifically, parts are derived by dividing objects into uniform grids [54], or by fitting multiple parts to a specific kind of object such as a human, e.g. the 15 non-overlapping parts in [51] and the parts detected by the DPM detector [123] in [81, 124].

3.5.2 Hypothesize-and-test

This strategy sidesteps challenges from occlusion by hypothesizing proposals and testing the proposals according to observations at hand.

3.5.3 Buffer-and-recover

This strategy buffers observations when occlusion happens and remembers states of objects before occlusion.
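
A minimal sketch of the "remember the state, then recover" half of this idea is shown below. Everything here is hypothetical and for illustration only: the class name `OcclusionBuffer`, the `max_age` parameter, and the (x, y) state are assumptions, and buffering of the observations themselves is omitted.

```python
from typing import Dict, Optional, Tuple

State = Tuple[float, float]  # hypothetical state: (x, y) position


class OcclusionBuffer:
    """Remembers the last state of occluded tracks for up to max_age frames."""

    def __init__(self, max_age: int = 30) -> None:
        self.max_age = max_age
        self._lost: Dict[int, Tuple[State, int]] = {}  # id -> (state, age)

    def mark_occluded(self, track_id: int, last_state: State) -> None:
        """Store the state the track had just before it became occluded."""
        self._lost[track_id] = (last_state, 0)

    def step(self) -> None:
        """Advance one frame and drop tracks that stayed occluded too long."""
        aged = {tid: (s, age + 1) for tid, (s, age) in self._lost.items()}
        self._lost = {tid: v for tid, v in aged.items() if v[1] <= self.max_age}

    def recover(self, track_id: int) -> Optional[State]:
        """Return and forget the buffered state if the track is still held."""
        entry = self._lost.pop(track_id, None)
        return entry[0] if entry is not None else None
```

When a detection reappears and is matched to a buffered identity, `recover` hands back the pre-occlusion state so the trajectory can continue under the same ID instead of starting a new one.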

3.5.4 Others

The strategies described above may not cover all the tactics explored in the community.

3.6 Inference

3.6.1 Probabilistic Inference

Approaches based on probabilistic inference typically represent states of objects as a distribution with uncertainty. The goal of a tracking algorithm is to estimate the probabilistic distribution of target state by a variety of probability reasoning methods based on existing observations.

Kalman filter. In the case of a linear system and Gaussian-distributed object states, the Kalman filter [39] is proven to be the optimal estimator. It has been applied in [37].
Extended Kalman filter. To include the non-linear case, the extended Kalman filter is one possible solution. It approximates the non-linear system by a Taylor expansion [36].
Particle filter. Monte Carlo sampling based models have also become popular in tracking, especially after the introduction of the particle filter [132, 133, 134, 54, 105, 34, 35, 10]. This strategy models the underlying distribution by a set of weighted particles, which makes it possible to drop any assumptions about the distribution itself [105, 34, 35, 38].
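
To make the predict/update cycle concrete, here is a minimal scalar Kalman filter. It is a sketch under strong assumptions: the state is a single, nearly static value (identity state transition), and the noise variances `process_var` and `meas_var` are illustrative defaults rather than values from any cited work.

```python
from typing import Iterable, List, Tuple


def kalman_1d(
    measurements: Iterable[float],
    x0: float = 0.0,
    p0: float = 1.0,
    process_var: float = 1e-3,
    meas_var: float = 1e-1,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter; returns the (mean, variance) after each measurement."""
    x, p = x0, p0
    out = []
    for z in measurements:
        # Predict: the state model is identity, so only the uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        out.append((x, p))
    return out
```

The state is carried as a mean and a variance, which is the "distribution with uncertainty" mentioned above. A tracker would typically use a vector state (position and velocity) with matrix versions of the same two steps.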

3.6.2 Deterministic Optimization

As opposed to the probabilistic inference methods, approaches based on deterministic optimization aim to find the maximum a posteriori (MAP) solution to MOT. To that end, the task of inferring the data association, the target states, or both is typically cast as an optimization problem.

3.6.3 Discussion

In practice, deterministic optimization or energy minimization is more widely used than probabilistic approaches.

3.7 Summary

It is important to note that not all existing MOT methods have all the components.
In general, appearance, motion and inference are mandatory in most methods.
It is also notable that these components are not orthogonal to each other.

4. MOT Evaluation

For a given MOT approach, metrics and datasets are required to evaluate its performance quantitatively.

4.1 Metrics

4.1.1 Metrics for Detection

Accuracy. FAF, FPPI, MODA
Precision. MODP

4.1.2 Metrics for Tracking

Accuracy. IDs, MOTA
Precision. MOTP, TDE, OSPA
Completeness. MT, PT, ML, FM
Robustness. RS, RL
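
Two of these metrics are simple enough to sketch directly. MOTA follows the usual CLEAR MOT definition, with counts summed over all frames. For the completeness labels, the 80% and 20% thresholds are the commonly used ones, but the handling of the exact boundaries below is an assumption, as are the function names.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDs) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def completeness_label(tracked_ratio: float) -> str:
    """Label one ground-truth trajectory by the fraction of its length that is tracked."""
    if tracked_ratio >= 0.8:
        return "MT"  # mostly tracked
    if tracked_ratio < 0.2:
        return "ML"  # mostly lost
    return "PT"  # partially tracked
```

Note that MOTA can be negative when the total number of errors exceeds the number of ground-truth objects.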

4.2 Datasets

4.3 Public Algorithms

4.4 Benchmark Results

Strictly speaking, in order to make a direct and fair comparison, one needs to fix all the other components while varying the one under consideration.

5. Summary

This paper has described methods and problems related to the task of Multiple Object Tracking (MOT) in videos.

5.1 Existing Issues

One major issue in MOT research is that the performance of an MOT method depends heavily on the object detector.
Another nuisance is that an overly complicated MOT solution comes with many parameters.

5.2 Future Directions

MOT with video adaptation. Customizing the object detector is necessary to improve MOT performance. One solution, proposed by Shu et al. [192], adapts a generic pedestrian detector to a specific video by progressively refining it.
MOT under multiple cameras. There are two multi-camera configurations. In the first, multiple cameras record the same scene, i.e., multiple views. In the second, each camera records a different scene, i.e., a non-overlapping multi-camera network.
Multiple 3D object tracking. 3D tracking requires camera calibration, or has to overcome other challenges in estimating camera poses and scene layout. Meanwhile, 3D model design is another issue that does not arise in 2D MOT.
MOT with scene understanding. The results of scene understanding can provide contextual information and scene structure, which is very helpful to tracking if it is better incorporated into an MOT algorithm.
MOT with deep learning. Deep learning based models have emerged as an extremely powerful framework to deal with different kinds of vision problems including image classification [198], object detection [186, 187, 188], and more relevantly single object tracking [184].
MOT with other computer vision tasks. Possible combinations include object segmentation [206, 207, 208, 209], re-identification [210, 194, 211], human pose estimation [18, 212, 213, 214, 215], and action recognition [19].
