[Paper Translation] SORT: SIMPLE ONLINE AND REALTIME TRACKING

望天边星宿

Article link: https://blog.csdn.net/See_Star/article/details/105723105

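I have seen some translations of this paper online, but most are machine translations, and many sentences and technical terms are badly rendered. To learn SORT, I translated the original paper by combining Google Translate with my own understanding. There are many places I did not fully understand, so corrections to the translation are welcome.

Reading the PDF together with CopyTranslator lets you select a sentence or paragraph with the mouse and translate it quickly. Readers whose English is as weak as mine may want to try it.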

Paper download: https://arxiv.org/abs/1602.00763

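Overview of the algorithm:
[Figure: flowchart of the algorithm]
The algorithm can be split into two parts: the matching process, and the Kalman filter prediction and update process.

First, the Hungarian algorithm matches the detections against the Kalman filter predictions by IOU, which produces three kinds of results: unmatched tracks, unmatched detections, and matched tracks. Unmatched tracks are deleted directly, unmatched detections create new tracks, and matched tracks are updated with the Kalman filter.

The updated tracks and the newly created tracks then go through the Kalman filter for a new round of prediction and are matched against the next set of detections.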

ABSTRACT

This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260Hz which is over 20x faster than other state-of-the-art trackers.

Index Terms— Computer Vision, Multiple Object Tracking, Detection, Data Association

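This paper studies a practical approach to multiple object tracking whose aim is to associate objects efficiently for online, real-time use. Detection quality is identified as a key factor in tracking performance: changing the detector can improve tracking by up to 18.9%. Although the tracking components use only familiar techniques such as the Kalman filter and the Hungarian algorithm, the accuracy of the method is comparable to state-of-the-art online trackers. Moreover, because the tracking method is so simple, the tracker updates at 260 Hz, more than 20 times faster than other state-of-the-art trackers.

Keywords: computer vision, multiple object tracking, detection, data association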

1. INTRODUCTION

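[Para. 1] This paper presents a lean implementation of a tracking-by-detection framework for the problem of multiple object tracking (MOT) where objects are detected each frame and represented as bounding boxes. In contrast to many batch based tracking approaches, this work is primarily targeted towards online tracking where only detections from the previous and the current frame are presented to the tracker.

Additionally, a strong emphasis is placed on efficiency for facilitating realtime tracking and to promote greater uptake in applications such as pedestrian tracking for autonomous vehicles.

This paper presents a lean implementation of a tracking-by-detection framework for the multiple object tracking (MOT) problem, in which objects are detected in every frame and represented as bounding boxes (bbox). In contrast to many batch-based tracking approaches, the method mainly targets online tracking, where only the detections from the previous and current frames are presented to the tracker.

In addition, strong emphasis is placed on efficiency, to enable real-time tracking and to encourage wider adoption in applications such as pedestrian tracking for autonomous vehicles.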

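[Para. 2] The MOT problem can be viewed as a data association problem where the aim is to associate detections across frames in a video sequence. To aid the data association process, trackers use various methods for modelling the motion and appearance of objects in the scene. The methods employed by this paper were motivated through observations made on a recently established visual MOT benchmark.

Firstly, there is a resurgence of mature data association techniques including Multiple Hypothesis Tracking (MHT) and Joint Probabilistic Data Association (JPDA) which occupy many of the top positions of the MOT benchmark. Secondly, the only tracker that does not use the Aggregate Channel Filter (ACF) detector is also the top ranked tracker, suggesting that detection quality could be holding back the other trackers.

Furthermore, the trade-off between accuracy and speed appears quite pronounced, since the speed of most accurate trackers is considered too slow for realtime applications (see Fig. 1). With the prominence of traditional data association techniques among the top online and batch trackers along with the use of different detections used by the top tracker, this work explores how simple MOT can be and how well it can perform.

The MOT problem can be viewed as a data association problem whose goal is to associate detections across the frames of a video sequence. To aid data association, trackers model the motion and appearance of objects in the scene in various ways. The methods used in this paper were motivated by observations made on a recently established visual MOT benchmark.

First, mature data association techniques are resurging, including Multiple Hypothesis Tracking (MHT) and Joint Probabilistic Data Association (JPDA), which occupy many of the top positions on the MOT benchmark. Second, the only tracker that does not use the Aggregate Channel Filter (ACF) detector is also the top-ranked tracker, which suggests that detection quality may be holding the other trackers back.

Furthermore, the trade-off between accuracy and speed appears quite pronounced, since the most accurate trackers are considered too slow for real-time applications (see Fig. 1). Given the prominence of traditional data association techniques among the top online and batch trackers, and the different detections used by the top tracker, this work explores how simple MOT can be and how well it can perform.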
[Figure 1: image not reproduced]
Fig. 1. Benchmark performance of the proposed method (SORT) in relation to several baseline trackers. Each marker indicates a tracker's accuracy and speed measured in frames per second (FPS) [Hz], i.e. higher and more right is better.

Fig. 1. Benchmark performance of SORT compared with several baseline trackers. Each marker shows a tracker's accuracy and its speed in frames per second (FPS) [Hz]; higher and further right is better.

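[Para. 3] Keeping in line with Occam’s Razor, appearance features beyond the detection component are ignored in tracking and only the bounding box position and size are used for both motion estimation and data association.

Furthermore, issues regarding short-term and long-term occlusion are also ignored, as they occur very rarely and their explicit treatment introduces undesirable complexity into the tracking framework. We argue that incorporating complexity in the form of object re-identification adds significant overhead into the tracking framework – potentially limiting its use in realtime applications.

In keeping with Occam's Razor ("entities should not be multiplied beyond necessity", i.e. the principle of simplicity), appearance features beyond the detection component are ignored during tracking; only the position and size of the bounding box are used for motion estimation and data association.

Issues of short-term and long-term occlusion are also ignored, because they occur very rarely and handling them explicitly adds undesirable complexity to the tracking framework. We argue that adding complexity in the form of object re-identification introduces significant overhead into the tracking framework, which may limit its use in real-time applications.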

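[Para. 4] This design philosophy is in contrast to many proposed visual trackers that incorporate a myriad of components to handle various edge cases and detection errors. This work instead focuses on efficient and reliable handling of the common frame-to-frame associations. Rather than aiming to be robust to detection errors, we instead exploit recent advances in visual object detection to solve the detection problem directly.

This is demonstrated by comparing the common ACF pedestrian detector with a recent convolutional neural network (CNN) based detector. Additionally, two classical yet extremely efficient methods, Kalman filter and Hungarian method, are employed to handle the motion prediction and data association components of the tracking problem respectively. This minimalistic formulation of tracking facilitates both efficiency and reliability for online tracking, see Fig. 1.

In this paper, this approach is only applied to tracking pedestrians in various environments, however due to the flexibility of CNN based detectors, it naturally can be generalized to other object classes.

This design philosophy contrasts sharply with many proposed visual trackers that incorporate a myriad of components to handle various edge cases and detection errors. This work instead focuses on efficient and reliable handling of the common frame-to-frame associations. Rather than aiming to be robust to detection errors, we exploit recent advances in visual object detection to solve the detection problem directly.

This is demonstrated by comparing the common ACF pedestrian detector with a recent detector based on convolutional neural networks (CNN). In addition, two classical yet extremely efficient methods, the Kalman filter and the Hungarian method, handle the motion prediction and data association components of the tracking problem respectively. This minimal formulation of tracking promotes both efficiency and reliability for online tracking; see Fig. 1.

In this paper the approach is applied only to tracking pedestrians in various environments, but because CNN-based detectors are flexible, it generalises naturally to other object classes.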

The main contributions of this paper are:

We leverage the power of CNN based detection in the context of MOT.
A pragmatic tracking approach based on the Kalman filter and the Hungarian algorithm is presented and evaluated on a recent MOT benchmark.
Code will be open sourced to help establish a baseline method for research experimentation and uptake in collision avoidance applications.

This paper is organised as follows: Section 2 provides a short review of related literature in the area of multiple object tracking. Section 3 describes the proposed lean tracking framework before the effectiveness of the proposed framework on standard benchmark sequences is demonstrated in Section 4. Finally, Section 5 provides a summary of the learnt outcomes and discusses future improvements.

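The main contributions of this paper are:

We leverage CNN-based detection in the context of MOT.
A practical tracking approach based on the Kalman filter and the Hungarian algorithm is presented and evaluated on a recent MOT benchmark.
The code will be open sourced to help establish a baseline method for research experiments and for uptake in collision avoidance applications.

The paper is organised as follows:

Section 2 briefly reviews related literature in the area of multiple object tracking.
Section 3 describes the proposed lean tracking framework.
Section 4 demonstrates the effectiveness of the framework on standard benchmark sequences.
Section 5 summarises the learnt outcomes and discusses future improvements.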

2. LITERATURE REVIEW

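[Para. 1] Traditionally MOT has been solved using Multiple Hypothesis Tracking (MHT) or the Joint Probabilistic Data Association (JPDA) filters, which delay making difficult decisions while there is high uncertainty over the object assignments. The combinatorial complexity of these approaches is exponential in the number of tracked objects making them impractical for realtime applications in highly dynamic environments.

Recently, Rezatofighi et al. revisited the JPDA formulation in visual MOT with the goal to address the combinatorial complexity issue with an efficient approximation of the JPDA by exploiting recent developments in solving integer programs. Similarly, Kim et al. used an appearance model for each target to prune the MHT graph to achieve state-of-the-art performance. However, these methods still delay the decision making which makes them unsuitable for online tracking.

Traditionally, MOT has been solved with Multiple Hypothesis Tracking (MHT) or Joint Probabilistic Data Association (JPDA) filters, which postpone difficult decisions while the object assignments are highly uncertain. The combinatorial complexity of these approaches grows exponentially with the number of tracked objects, which makes them impractical for real-time applications in highly dynamic environments.

Recently, Rezatofighi et al. revisited the JPDA formulation in visual MOT, aiming to address the combinatorial complexity with an efficient approximation of JPDA that exploits recent advances in solving integer programs. Similarly, Kim et al. used an appearance model for each target to prune the MHT graph and achieve state-of-the-art performance. However, these methods still delay decision making, so they are unsuitable for online tracking.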

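[Para. 3] Many online tracking methods aim to build appearance models of either the individual objects themselves or a global model through online learning. In addition to appearance models, motion is often incorporated to assist associating detections to tracklets. When considering only one-to-one correspondences modelled as bipartite graph matching, globally optimal solutions such as the Hungarian algorithm can be used.

Many online tracking methods aim to build appearance models, either of the individual objects themselves or as a global model, through online learning. Besides appearance models, motion is often incorporated to help associate detections with tracklets. When only one-to-one correspondences are considered and modelled as bipartite graph matching, globally optimal solvers such as the Hungarian algorithm can be used.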

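[Para. 4] The method by Geiger et al. uses the Hungarian algorithm in a two stage process. First, tracklets are formed by associating detections across adjacent frames where both geometry and appearance cues are combined to form the affinity matrix. Then, the tracklets are associated to each other to bridge broken trajectories caused by occlusion, again using both geometry and appearance cues. This two step association method restricts this approach to batch computation. Our approach is inspired by the tracking component of , however we simplify the association to a single stage with basic cues as described in the next section.

The method of Geiger et al. uses the Hungarian algorithm in a two-stage process. First, tracklets are formed by associating detections across adjacent frames, with geometry and appearance cues combined to form the affinity matrix. Then the tracklets are associated with each other to bridge trajectories broken by occlusion, again using both geometry and appearance cues. This two-step association restricts the approach to batch computation. Our approach is inspired by the tracking component of , but we simplify the association to a single stage with basic cues, as described in the next section.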

3. METHODOLOGY

The proposed method is described by the key components of detection, propagating object states into future frames, associating current detections with existing objects, and managing the lifespan of tracked objects.

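The proposed method is described by its key components: detection, propagating object states into future frames, associating current detections with existing objects, and managing the lifespan of tracked objects.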

3.1 Detection

[Para. 1] To capitalise on the rapid advancement of CNN based detection, we utilise the Faster Region CNN (FrRCNN) detection framework. FrRCNN is an end-to-end framework that consists of two stages. The first stage extracts features and proposes regions for the second stage which then classifies the object in the proposed region. The advantage of this framework is that parameters are shared between the two stages creating an efficient framework for detection. Additionally, the network architecture itself can be swapped to any design which enables rapid experimentation of different architectures to improve the detection performance.

To take advantage of the rapid progress in CNN-based detection, we use the Faster Region CNN (FrRCNN) detection framework. FrRCNN is an end-to-end framework with two stages. The first stage extracts features and proposes regions; the second stage classifies the object in each proposed region. The advantage of this framework is that parameters are shared between the two stages, which makes detection efficient. In addition, the network architecture itself can be swapped for any design, which allows rapid experimentation with different architectures to improve detection performance.

[Para. 2] Here we compare two network architectures provided with FrRCNN, namely the architecture of Zeiler and Fergus (FrRCNN(ZF)) and the deeper architecture of Simonyan and Zisserman (FrRCNN(VGG16)). Throughout this work, we apply the FrRCNN with default parameters learnt for the PASCAL VOC challenge. As we are only interested in pedestrians we ignore all other classes and only pass person detection results with output probabilities greater than 50% to the tracking framework.

Here we compare two network architectures provided with FrRCNN: the architecture of Zeiler and Fergus (FrRCNN(ZF)) and the deeper architecture of Simonyan and Zisserman (FrRCNN(VGG16)). Throughout this work we apply FrRCNN with the default parameters learnt for the PASCAL VOC challenge. Because we are only interested in pedestrians, we ignore all other classes and pass only person detections with an output probability greater than 50% to the tracking framework.
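The filtering step described above (keep only person detections with probability above 50%) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `Detection` record, its field names, and the sample values are assumptions made up for the example.

```python
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    # Hypothetical record layout; a real detector's output format will differ.
    label: str                               # class name reported by the detector
    score: float                             # output probability in [0, 1]
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


def keep_pedestrians(detections, min_score=0.5):
    """Keep only person detections whose probability exceeds min_score."""
    return [d for d in detections if d.label == "person" and d.score > min_score]


# Made-up detections for one frame.
dets = [
    Detection("person", 0.92, (10, 20, 50, 120)),
    Detection("person", 0.41, (200, 30, 240, 130)),  # dropped: score too low
    Detection("car", 0.88, (300, 80, 420, 160)),     # dropped: not a person
]
kept = keep_pedestrians(dets)
```

Only the first detection survives and would be handed to the tracker.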

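Table 1. Comparison of tracking performance when switching the detector component, evaluated on the validation sequences.
[Table 1: image not reproduced]
[Para. 3] In our experiments, we found that the detection quality has a significant impact on tracking performance when comparing the FrRCNN detections to ACF detections. This is demonstrated using a validation set of sequences applied to both an existing online tracker MDP and the tracker proposed here. Table 1 shows that the best detector (FrRCNN(VGG16)) leads to the best tracking accuracy for both MDP and the proposed method.

In our experiments we found that detection quality has a significant impact on tracking performance when FrRCNN detections are compared with ACF detections. This is demonstrated on a set of validation sequences applied to both an existing online tracker, MDP, and the tracker proposed here. Table 1 shows that the best detector (FrRCNN(VGG16)) gives the best tracking accuracy for both MDP and the proposed method.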

3.2 Estimation Model

Here we describe the object model, i.e. the representation and the motion model used to propagate a target’s identity into the next frame. We approximate the inter-frame displacements of each object with a linear constant velocity model which is independent of other objects and camera motion. The state of each target is modelled as:

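Here we describe the object model, i.e. the representation and the motion model used to propagate a target's identity into the next frame. We approximate the inter-frame displacement of each object with a linear constant velocity model that is independent of other objects and of camera motion. The state of each target is modelled as: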
$\mathbf{x}=[u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^{T}$

where $u$ and $v$ represent the horizontal and vertical pixel location of the centre of the target, while the scale $s$ and $r$ represent the scale (area) and the aspect ratio of the target’s bounding box respectively. Note that the aspect ratio is considered to be constant. When a detection is associated to a target, the detected bounding box is used to update the target state where the velocity components are solved optimally via a Kalman filter framework. If no detection is associated to the target, its state is simply predicted without correction using the linear velocity model.

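where $u$ and $v$ are the horizontal and vertical pixel location of the centre of the target, and $s$ and $r$ are the scale (area) and the aspect ratio of the target's bounding box respectively. Note that the aspect ratio is considered constant. When a detection is associated with a target, the detected bounding box is used to update the target state, and the velocity components are solved optimally via the Kalman filter framework. If no detection is associated with the target, its state is simply predicted with the linear velocity model, without correction.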
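As a rough illustration of this state representation, the sketch below converts a corner-format box into the observed part $[u, v, s, r]$ of the state and applies one constant-velocity prediction step. The function names are hypothetical, the aspect ratio is assumed to be width divided by height, and the covariance propagation that a full Kalman filter performs is deliberately left out.

```python
import math


def box_to_z(box):
    """Convert (x1, y1, x2, y2) to the observation [u, v, s, r]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # u, v: box centre; s: area; r: aspect ratio (assumed here to be w / h)
    return [x1 + w / 2.0, y1 + h / 2.0, w * h, w / h]


def z_to_box(z):
    """Invert box_to_z: recover (x1, y1, x2, y2) from [u, v, s, r]."""
    u, v, s, r = z
    w = math.sqrt(s * r)   # s * r = (w * h) * (w / h) = w ** 2
    h = s / w
    return (u - w / 2.0, v - h / 2.0, u + w / 2.0, v + h / 2.0)


def predict(x):
    """One constant-velocity step on x = [u, v, s, r, du, dv, ds].

    Only the state mean is propagated; the aspect ratio r has no velocity
    term because it is treated as constant.
    """
    u, v, s, r, du, dv, ds = x
    return [u + du, v + dv, s + ds, r, du, dv, ds]


z = box_to_z((0, 0, 20, 40))            # [10.0, 20.0, 800.0, 0.5]
x_next = predict(z + [1.0, -2.0, 5.0])  # centre moves by (1, -2), area grows by 5
```

A new track would start from `box_to_z(detection)` with the three velocity terms set to zero, as Section 3.4 describes.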

3.3 Data Association

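[Para. 1] In assigning detections to existing targets, each target’s bounding box geometry is estimated by predicting its new location in the current frame. The assignment cost matrix is then computed as the intersection-over-union (IOU) distance between each detection and all predicted bounding boxes from the existing targets. The assignment is solved optimally using the Hungarian algorithm. Additionally, a minimum IOU is imposed to reject assignments where the detection to target overlap is less than $IOU_{min}$.

When detections are assigned to existing targets, each target's bounding box is estimated by predicting its new location in the current frame. The assignment cost matrix is then computed as the intersection-over-union (IOU) distance between each detection and every predicted bounding box of the existing targets. The assignment is solved optimally with the Hungarian algorithm. In addition, a minimum IOU is imposed, so that assignments where the detection-to-target overlap is less than $IOU_{min}$ are rejected.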
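The association step can be sketched as follows. To stay dependency-free, this example finds the optimal one-to-one assignment by brute force over permutations rather than with the Hungarian algorithm; for small inputs it yields the same optimum, but it is far slower on large ones. The function names, the `iou_min=0.3` default, and the sample boxes are assumptions made for illustration.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(detections, predictions, iou_min=0.3):
    """Return (matches, unmatched_detections, unmatched_tracks) as index lists.

    Brute-force stand-in for the Hungarian algorithm: every one-to-one
    assignment is tried and the one with the highest total IOU is kept.
    """
    n_d, n_t = len(detections), len(predictions)
    ious = [[iou(d, t) for t in predictions] for d in detections]
    if n_d <= n_t:
        candidates = (list(zip(range(n_d), p)) for p in permutations(range(n_t), n_d))
    else:
        candidates = (list(zip(p, range(n_t))) for p in permutations(range(n_d), n_t))
    best = max(candidates, key=lambda pairs: sum(ious[d][t] for d, t in pairs))
    # Gate: reject assignments whose overlap is below the minimum IOU.
    matches = [(d, t) for d, t in best if ious[d][t] >= iou_min]
    matched_d = {d for d, _ in matches}
    matched_t = {t for _, t in matches}
    unmatched_d = [d for d in range(n_d) if d not in matched_d]
    unmatched_t = [t for t in range(n_t) if t not in matched_t]
    return matches, unmatched_d, unmatched_t


predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(21, 21, 31, 31), (1, 0, 11, 10), (100, 100, 110, 110)]
matches, new_dets, lost_tracks = associate(detected, predicted)
```

Here detection 1 pairs with track 0, detection 0 pairs with track 1, and detection 2 overlaps nothing, so it is left unmatched. Maximising total IOU is equivalent to minimising total IOU distance because every candidate assignment contains the same number of pairs.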

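[Para. 2] We found that the IOU distance of the bounding boxes implicitly handles short term occlusion caused by passing targets. Specifically, when a target is covered by an occluding object, only the occluder is detected, since the IOU distance appropriately favours detections with similar scale. This allows both the occluder target to be corrected with the detection while the covered target is unaffected as no assignment is made.

We found that the IOU distance between bounding boxes implicitly handles short-term occlusion caused by passing targets. Specifically, when a target is covered by an occluding object, only the occluder is detected, because the IOU distance appropriately favours detections of similar scale. The occluder target can therefore be corrected with the detection, while the covered target is unaffected because no assignment is made to it.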

3.4 Creation and Deletion of Track Identities

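[Para. 1] When objects enter and leave the image, unique identities need to be created or destroyed accordingly. For creating trackers, we consider any detection with an overlap less than $IOU_{min}$ to signify the existence of an untracked object. The tracker is initialised using the geometry of the bounding box with the velocity set to zero. Since the velocity is unobserved at this point the covariance of the velocity component is initialised with large values, reflecting this uncertainty. Additionally, the new tracker then undergoes a probationary period where the target needs to be associated with detections to accumulate enough evidence in order to prevent tracking of false positives.

When objects enter and leave the image, unique identities must be created or destroyed accordingly. To create trackers, we treat any detection with an overlap of less than $IOU_{min}$ as evidence of an untracked object. The tracker is initialised from the geometry of the bounding box with the velocity set to zero. Because the velocity has not been observed at this point, the covariance of the velocity component is initialised with large values to reflect this uncertainty. In addition, the new tracker goes through a probationary period in which the target must be associated with detections to accumulate enough evidence, which prevents tracking of false positives.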

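[Para. 2] Tracks are terminated if they are not detected for $T_{Lost}$ frames. This prevents an unbounded growth in the number of trackers and localisation errors caused by predictions over long durations without corrections from the detector. In all experiments $T_{Lost}$ is set to 1 for two reasons. Firstly, the constant velocity model is a poor predictor of the true dynamics and secondly we are primarily concerned with frame-to-frame tracking where object re-identification is beyond the scope of this work. Additionally, early deletion of lost targets aids efficiency. Should an object reappear, tracking will implicitly resume under a new identity.

A track is terminated if it is not detected for $T_{Lost}$ frames. This prevents unbounded growth in the number of trackers, and it prevents the localisation errors caused by predicting over long durations without correction from the detector. In all experiments $T_{Lost}$ is set to 1, for two reasons: first, the constant velocity model is a poor predictor of the true dynamics; second, we are mainly concerned with frame-to-frame tracking, and object re-identification is beyond the scope of this work. In addition, deleting lost targets early helps efficiency. If an object reappears, tracking implicitly resumes under a new identity.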
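The creation and deletion rules can be sketched as a small bookkeeping routine. This is a simplified illustration under stated assumptions: the `Track` class and `update_tracks` function are hypothetical, the matched box simply replaces the stored one instead of going through a Kalman update, and `min_hits=3` is an invented probation length, since the text does not give one.

```python
from itertools import count

_ids = count()  # source of unique track identities


class Track:
    def __init__(self, box):
        self.id = next(_ids)
        self.box = box
        self.hits = 1     # frames in which a detection was associated
        self.misses = 0   # consecutive frames without a detection

    def is_confirmed(self, min_hits=3):
        # min_hits is a hypothetical probation length, not a value from the paper.
        return self.hits >= min_hits


def update_tracks(tracks, detections, matches, unmatched_dets, t_lost=1):
    """Apply one frame of matching results and return the surviving tracks."""
    matched = {t: d for d, t in matches}
    for t, trk in enumerate(tracks):
        if t in matched:
            trk.box = detections[matched[t]]  # stand-in for the Kalman update
            trk.hits += 1
            trk.misses = 0
        else:
            trk.misses += 1
    # Terminate tracks that have gone undetected for t_lost frames.
    survivors = [trk for trk in tracks if trk.misses < t_lost]
    # Every unmatched detection starts a new identity.
    survivors += [Track(detections[d]) for d in unmatched_dets]
    return survivors


tracks = [Track((0, 0, 10, 10)), Track((20, 20, 30, 30))]
detections = [(1, 0, 11, 10), (50, 50, 60, 60)]
tracks = update_tracks(tracks, detections, matches=[(0, 0)], unmatched_dets=[1])
```

With `t_lost=1`, the second track is dropped as soon as it misses one frame, and the unmatched detection starts a new track that is still on probation.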

4. EXPERIMENTS

We evaluate the performance of our tracking implementation on a diverse set of testing sequences as set by the MOT benchmark database which contains both moving and static camera sequences. For tuning the initial Kalman filter covariances, $IOU_{min}$, and $T_{Lost}$ parameters, we use the same training/validation split as reported in [12]. The detection architecture used is the FrRCNN(VGG16). Source code and sample detections from [22] are available online.

We evaluate the tracking implementation on the diverse set of test sequences defined by the MOT benchmark database, which contains both moving-camera and static-camera sequences. To tune the initial Kalman filter covariances, $IOU_{min}$, and $T_{Lost}$, we use the same training/validation split as reported in [12]. The detection architecture used is FrRCNN(VGG16). Source code and sample detections from [22] are available online.

4.1 Metrics

Since it is difficult to use one single score to evaluate multi-target tracking performance, we utilise the evaluation metrics defined in [24], along with the standard MOT metrics:

MOTA(↑): Multi-object tracking accuracy.
MOTP(↑): Multi-object tracking precision.
FAF(↓): number of false alarms per frame.
MT(↑): number of mostly tracked trajectories. i.e. target has the same label for at least 80% of its life span.
ML(↓): number of mostly lost trajectories. i.e. target is not tracked for at least 20% of its life span.
FP(↓): number of false detections.
FN(↓): number of missed detections.
ID sw(↓): number of times an ID switches to a different previously tracked object.
Frag(↓): number of fragmentations where a track is interrupted by miss detection.

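Because it is difficult to evaluate multi-target tracking performance with a single score, we use the evaluation metrics defined in [24] together with the standard MOT metrics:

MOTA (↑): multi-object tracking accuracy. (Accuracy is the degree to which the mean of repeated measurements under given conditions agrees with the true value, expressed as an error; it indicates the size of the systematic error.)
MOTP (↑): multi-object tracking precision. (Precision is the degree to which repeated measurements of the same quantity agree with each other; it characterises the size of the random error in the measurement process.)
FAF (↓): number of false alarms per frame.
MT (↑): number of mostly tracked trajectories, i.e. the target has the same label for at least 80% of its life span.
ML (↓): number of mostly lost trajectories, i.e. the target is not tracked for at least 20% of its life span.
FP (↓): number of false detections.
FN (↓): number of missed detections.
ID sw (↓): number of times an ID switches to a different previously tracked object.
Frag (↓): number of fragmentations, where a track is interrupted by a missed detection.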

Evaluation measures with (↑), higher scores denote better performance; while for evaluation measures with (↓), lower scores denote better performance. True positives are considered to have at least 50% overlap with the corresponding ground truth bounding box. Evaluation codes were downloaded from [6].

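For evaluation measures marked (↑), higher scores denote better performance; for measures marked (↓), lower scores denote better performance. True positives are considered to have at least 50% overlap with the corresponding ground truth bounding box. The evaluation code was downloaded from [6].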
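The list above names MOTA without giving its formula. As a rough guide, the widely used CLEAR MOT definition combines three of the listed error counts, summed over all frames, and normalises by the total number of ground-truth objects: MOTA = 1 - (FN + FP + ID sw) / GT. The sketch below assumes that definition; the function name and the counts in the example are made up.

```python
def mota(fn, fp, id_sw, gt):
    """MOTA under the standard CLEAR MOT definition.

    fn, fp, id_sw: missed detections, false detections and identity switches,
    each summed over all frames; gt: total number of ground-truth objects
    over all frames.
    """
    if gt <= 0:
        raise ValueError("gt must be positive")
    return 1.0 - (fn + fp + id_sw) / gt


# Made-up counts: 420 errors against 1000 ground-truth objects, roughly 0.58.
score = mota(fn=300, fp=100, id_sw=20, gt=1000)
```

Because the error counts can exceed the number of ground-truth objects, MOTA can be negative.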
[Image not reproduced]

4.2 Performance Evaluation

Tracking performance is evaluated using the MOT benchmark test server where the ground truth for 11 sequences is withheld. Table 2 compares the proposed method SORT with several other baseline trackers. For brevity, only the most relevant trackers, which are state-of-the-art online trackers in terms of accuracy, such as (TDAM, MDP), the fastest batch based tracker (DP NMS), and all round near online method (NOMT) are listed. Additionally, methods which inspired this approach (TBD, ALEx-TRAC, and SMOT) are also listed. Compared to these other methods, SORT achieves the highest MOTA score for the online trackers and is comparable to the state-of-the-art method NOMT which is significantly more complex and uses frames in the near future. Additionally, as SORT aims to focus on frame-to-frame associations the number of lost targets (ML) is minimal despite having similar false negatives to other trackers. Furthermore, since SORT focuses on frame-to-frame associations to grow tracklets, it has the lowest number of lost targets in comparison to the other methods.

Tracking performance is evaluated on the MOT benchmark test server, where the ground truth for 11 sequences is withheld. Table 2 compares the proposed method, SORT, with several other baseline trackers. For brevity, only the most relevant trackers are listed: the state-of-the-art online trackers in terms of accuracy (TDAM, MDP), the fastest batch-based tracker (DP NMS), and the all-round near-online method (NOMT). The methods that inspired this approach (TBD, ALEx-TRAC, and SMOT) are also listed. Compared with these methods, SORT achieves the highest MOTA score among the online trackers and is comparable to the state-of-the-art method NOMT, which is significantly more complex and uses frames from the near future. In addition, because SORT focuses on frame-to-frame associations, the number of lost targets (ML) is minimal even though its false negatives are similar to those of other trackers. Furthermore, since SORT focuses on frame-to-frame associations to grow tracklets, it has the lowest number of lost targets of all the methods compared.

4.3 Runtime

Most MOT solutions aim to push performance towards greater accuracy, often, at the cost of runtime performance. While slow runtime may be tolerated in offline processing tasks, for robotics and autonomous vehicles, realtime performance is essential. Fig. 1 shows a number of trackers on the MOT benchmark in relation to both their speed and accuracy. This shows that methods which achieve the best accuracy also tend to be the slowest (bottom right in Figure 1). On the opposite end of the spectrum the fastest methods tend to have lower accuracy (top left corner in Figure 1). SORT combines the two desirable properties, speed and accuracy, without the typical drawbacks (top right in Figure 1). The tracking component runs at 260Hz on single core of an Intel i7 2.5GHz machine with 16 GB memory.

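Most MOT solutions aim for greater accuracy, often at the cost of runtime performance. Slow runtimes may be tolerated in offline processing, but for robotics and autonomous vehicles real-time performance is essential. Fig. 1 shows a number of trackers on the MOT benchmark in terms of both speed and accuracy. It shows that the methods with the best accuracy also tend to be the slowest (bottom right of Fig. 1). At the opposite end of the spectrum, the fastest methods tend to have lower accuracy (top left corner of Fig. 1). SORT combines the two desirable properties, speed and accuracy, without the typical drawbacks (top right of Fig. 1). The tracking component runs at 260 Hz on a single core of an Intel i7 2.5 GHz machine with 16 GB of memory.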

5. CONCLUSION

In this paper, a simple online tracking framework is presented that focuses on frame-to-frame prediction and association. We showed that the tracking quality is highly dependent on detection performance and by capitalising on recent developments in detection, state-of-the-art tracking quality can be achieved with only classical tracking methods. The presented framework achieves best in class performance with respect to both speed and accuracy, while other methods typically sacrifice one for the other. The presented framework’s simplicity makes it well suited as a baseline, allowing for new methods to focus on object re-identification to handle long term occlusion. As our experiments highlight the importance of detection quality in tracking, future work will investigate a tightly coupled detection and tracking framework.

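This paper presents a simple online tracking framework that focuses on frame-to-frame prediction and association. We showed that tracking quality depends heavily on detection performance, and that by capitalising on recent developments in detection, state-of-the-art tracking quality can be achieved with classical tracking methods alone. The presented framework achieves best-in-class performance in both speed and accuracy, whereas other methods typically sacrifice one for the other. The simplicity of the framework makes it well suited as a baseline, which lets new methods focus on object re-identification to handle long-term occlusion. Because our experiments highlight the importance of detection quality in tracking, future work will investigate a tightly coupled detection and tracking framework.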
