DeepSORT Paper Translation (Chinese-English Parallel Text)

ambm29

Tags: tracking algorithms, multiple object tracking, convolutional neural networks, deep learning

Article link: https://blog.csdn.net/ambm29/article/details/102585436

SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC

Simple online and real-time tracking with a deep association metric

Abstract:

Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches. In the spirit of the original framework we place much of the computational complexity into an offline pre-training stage where we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.

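Simple Online and Realtime Tracking (SORT) is a practical multiple object tracking method that emphasizes simple, effective algorithms. In this paper, we integrate appearance information to improve the performance of SORT. Thanks to this extension, we can track objects through longer periods of occlusion, which effectively reduces the number of identity switches. In the spirit of the original framework, we move much of the computational complexity into an offline pre-training stage, in which we learn a deep association metric on a large-scale person re-identification dataset. During online application, we establish measurement-to-track associations using nearest-neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45% while achieving competitive overall performance at high frame rates.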

1. Introduction:

Due to recent progress in object detection, tracking-by-detection has become the leading paradigm in multiple object tracking. Within this paradigm, object trajectories are usually found in a global optimization problem that processes entire video batches at once. For example, flow network formulations [1, 2, 3] and probabilistic graphical models [4, 5, 6, 7] have become popular frameworks of this type. However, due to batch processing, these methods are not applicable in online scenarios where a target identity must be available at each time step. More traditional methods are Multiple Hypothesis Tracking (MHT) [8] and the Joint Probabilistic Data Association Filter (JPDAF) [9]. These methods perform data association on a frame-by-frame basis. In the JPDAF, a single state hypothesis is generated by weighting individual measurements by their association likelihoods. In MHT, all possible hypotheses are tracked, but pruning schemes must be applied for computational tractability. Both methods have recently been revisited in a tracking-by-detection scenario [10, 11] and have shown promising results. However, the performance of these methods comes at increased computational and implementation complexity.

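Owing to recent progress in object detection, tracking-by-detection has become the dominant paradigm in multiple object tracking. Within this paradigm, object trajectories are usually found by solving a global optimization problem that processes an entire video batch at once. For example, flow network formulations [1, 2, 3] and probabilistic graphical models [4, 5, 6, 7] have become popular frameworks of this kind. Because they rely on batch processing, however, these methods do not apply to online scenarios in which a target identity must be available at every time step. More traditional methods are Multiple Hypothesis Tracking (MHT) [8] and the Joint Probabilistic Data Association Filter (JPDAF) [9], both of which perform data association frame by frame. In the JPDAF, a single state hypothesis is generated by weighting individual measurements by their association likelihoods. In MHT, all possible hypotheses are tracked, but pruning schemes must be applied to keep the computation tractable. Both methods have recently been revisited in a tracking-by-detection scenario [10, 11] and have shown promising results. However, their performance comes at the cost of higher computational and implementation complexity.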

Simple online and realtime tracking (SORT) [12] is a much simpler framework that performs Kalman filtering in image space and frame-by-frame data association using the Hungarian method with an association metric that measures bounding box overlap. This simple approach achieves favorable performance at high frame rates. On the MOT challenge dataset [13], SORT with a state-of-the-art people detector [14] ranks on average higher than MHT on standard detections. This not only underlines the influence of object detector performance on overall tracking results, but is also an important insight from a practitioner's point of view.

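Simple Online and Realtime Tracking (SORT) [12] is a much simpler framework. It performs Kalman filtering in image space and associates data frame by frame using the Hungarian method, with an association metric that measures bounding box overlap. This simple approach achieves good performance at high frame rates. On the MOT challenge dataset [13], SORT with a state-of-the-art people detector [14] ranks on average higher than MHT on standard detections. This not only underlines how strongly object detector performance influences overall tracking results, but is also an important insight for practitioners.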
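The overlap-based association described above can be illustrated with a small sketch. It is not the SORT authors' code. It assumes boxes in (x1, y1, x2, y2) corner format, uses a hypothetical `min_iou` threshold of 0.3, and replaces the Hungarian method with a brute-force search over permutations. The brute-force search finds the same optimal assignment but is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs that maximize total IoU.

    Brute-force stand-in for the Hungarian method (assumption: tiny inputs).
    Pairs whose overlap is below min_iou are discarded after the assignment.
    """
    n, m = len(tracks), len(detections)
    scores = [[iou(t, d) for d in detections] for t in tracks]
    if n <= m:
        # Every track gets a distinct detection.
        candidates = (list(zip(range(n), p)) for p in permutations(range(m), n))
    else:
        # Every detection gets a distinct track.
        candidates = (list(zip(p, range(m))) for p in permutations(range(n), m))
    best = max(candidates, key=lambda pairs: sum(scores[i][j] for i, j in pairs))
    return [(i, j) for i, j in best if scores[i][j] >= min_iou]


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches = associate(tracks, detections)
```

For real workloads, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` applied to the negated IoU matrix would be the usual choice.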

While achieving overall good performance in terms of tracking precision and accuracy, SORT returns a relatively high number of identity switches. This is because the employed association metric is only accurate when state estimation uncertainty is low. Therefore, SORT has a deficiency in tracking through occlusions as they typically appear in frontal-view camera scenes. We overcome this issue by replacing the association metric with a more informed metric that combines motion and appearance information. In particular, we apply a convolutional neural network (CNN) that has been trained to discriminate pedestrians on a large-scale person re-identification dataset. Through integration of this network we increase robustness against misses and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios. Our code and a pre-trained CNN model are made publicly available to facilitate research experimentation and practical application development.

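Although it achieves good overall performance in terms of tracking precision and accuracy, SORT returns a relatively high number of identity switches. This is because the association metric it uses is accurate only when state estimation uncertainty is low. As a result, SORT is deficient at tracking through occlusions, which typically appear in frontal-view camera scenes. We overcome this issue by replacing the association metric with a more informed metric that combines motion and appearance information. In particular, we apply a convolutional neural network (CNN) that has been trained to discriminate pedestrians on a large-scale person re-identification dataset. By integrating this network, we increase robustness against missed detections and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios. Our code and a pre-trained CNN model are publicly available to facilitate research experimentation and practical application development.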

2. SORT with a Deep Association Metric

We adopt a conventional single hypothesis tracking methodology with recursive Kalman filtering and frame-by-frame data association. In the following section we describe the core components of this system in greater detail.

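We adopt a conventional single-hypothesis tracking methodology with recursive Kalman filtering and frame-by-frame data association. The following section describes the core components of this system in greater detail.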

2.1. Track Handling and State Estimation

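Our track handling and Kalman filtering framework is mostly identical to the original formulation in [12]. We assume a very general tracking scenario in which the camera is uncalibrated and no ego-motion information is available. While these circumstances pose a challenge to the filtering framework, this is the most common setup considered in recent multiple object tracking benchmarks [15]. Our tracking scenario is therefore defined on the eight-dimensional state space (u, v, γ, h, ẋ, ẏ, γ̇, ḣ), which contains the bounding box center position (u, v), the aspect ratio γ, the height h, and their respective velocities in image coordinates. We use a standard Kalman filter with a constant-velocity motion model and a linear observation model, in which the bounding box coordinates (u, v, γ, h) are taken as direct observations of the object state.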
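To make the state layout concrete, the following minimal sketch shows the mean prediction of a constant-velocity model and the linear observation model on the eight-dimensional state. It is illustrative only: the function names are hypothetical, the input box is assumed to be in top-left (x, y, w, h) format, the aspect ratio is assumed to be width divided by height, and the covariance update of a full Kalman filter is omitted.

```python
def box_to_measurement(x, y, w, h):
    """Convert a top-left (x, y, w, h) box to (u, v, gamma, h).

    (u, v) is the box center, gamma the aspect ratio (assumed w / h),
    and h the box height.
    """
    return (x + w / 2.0, y + h / 2.0, w / h, h)


def predict(state, dt=1.0):
    """Constant-velocity prediction of the state mean.

    state = (u, v, gamma, h, du, dv, dgamma, dh). Each of the first four
    components advances by dt times its velocity; velocities stay unchanged.
    """
    pos, vel = state[:4], state[4:]
    return tuple(p + dt * v for p, v in zip(pos, vel)) + tuple(vel)


def observe(state):
    """Linear observation model: the measurement is (u, v, gamma, h)."""
    return tuple(state[:4])


measurement = box_to_measurement(10, 20, 4, 8)
state = measurement + (1.0, -2.0, 0.0, 0.5)  # hypothetical initial velocities
predicted = predict(state)
expected_measurement = observe(predicted)
```

In matrix form, `predict` applies F = [[I, dt·I], [0, I]] and `observe` applies H = [I, 0], where I is the 4×4 identity matrix.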

For each track k we count the number of frames since the last successful measurement association a_k. This counter is incremented during Kalman filter prediction and reset to 0 when the track has been associated with a measurement. Tracks that exceed a predefined maximum age A_max are considered to have left the scene and are delete