[ICCV 2021] A 15.2% gain in precision! Well ahead of SC3D and P2B! Box-Aware Feature Enhancement for SOT


Paper navigation
📁 Clear-Structure Papers:

Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds: https://arxiv.org/pdf/2108.04728v2.pdf

   The SOT (single object tracking) problem: tackling the heavy-occlusion and sparse-point-cloud cases where SC3D and P2B break down.

Contents

1. Overview of Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

2. Abstract and Introduction

3. Method (condensed; see the paper link above for details)

4. Experiments

5. Results and Conclusion


1. Overview of Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds

Date: 2021.10.31
Authors: Chaoda Zheng, Xu Yan, et al.
Summary

1. Research background and challenges

>> LiDAR is a popular 3D sensor thanks to its precise measurements, reasonable cost, and insensitivity to ambient-light changes.

>> However, because of the dynamic environment and self-occlusion, the point clouds produced by LiDAR are typically irregular and incomplete, which makes the SOT task harder.
2. Traditional feature-comparison methods

>> These methods generate candidate shapes for the current frame via exhaustive search or a Kalman filter, then compare them against the template with a Siamese network.

>> However, they have shortcomings: they can encode neither the object's size nor the explicit part structure inside the object.
3. Building the Box-Aware Tracker (BAT)

>> By integrating the BoxCloud and BAFF modules into the existing P2B model, the authors construct a superior box-aware tracker (BAT).

>> BAT takes the auxiliary 3D BBox as input to capture extra shape constraints and part-aware information, achieving effective and robust tracking on LiDAR point clouds.
4. Exploiting the BBox in 3D

>> In 3D SOT, although the first frame provides the object's BBox, previous methods merely enlarge the BBox by a fixed ratio and crop the enclosed points as the template.

>> In 3D detection, some works use free part supervision from ground-truth 3D BBoxes to predict the intra-object part location of every 3D point, but they focus only on part-aware information and ignore the BBox's size.

2. Abstract and Introduction

Current 3D single object tracking approaches track the target based on a feature comparison between the target template and the search area. However, due to the common occlusion in LiDAR scans, it is non-trivial to conduct accurate feature comparisons on severe sparse and incomplete shapes. In this work, we exploit the ground truth bounding box given in the first frame as a strong cue to enhance the feature description of the target object, enabling a more accurate feature comparison in a simple yet effective way. In particular, we first propose the BoxCloud, an informative and robust representation, to depict an object using the point-to-box relation. We further design an efficient box-aware feature fusion module, which leverages the aforementioned BoxCloud for reliable feature matching and embedding. Integrating the proposed general components into an existing model P2B, we construct a superior box-aware tracker (BAT). Experiments confirm that our proposed BAT outperforms the previous state-of-the-art by a large margin on both KITTI and NuScenes benchmarks, achieving a 15.2% improvement in terms of precision while running ∼20% faster.

Single object tracking (SOT) in 3D scenes has a broad spectrum of practical applications, such as autonomous driving, semantic understanding, and assistive robotics. Given a 3D bounding box (BBox) of an object as the template in the first frame, the SOT task is to keep track of this object across all frames. In real scenes, LiDAR becomes a popular 3D sensor due to its precise measurement, reasonable cost, and insensitivity to ambient light variations. In this paper, we focus on SOT on LiDAR data, which can be viewed as 3D point clouds in general. Due to the moving environment and self-occlusion, point clouds generated by a LiDAR system are inevitably irregular and incomplete, making the SOT task very challenging.


3. Method (condensed; see the paper link at the top for details)

3.1 Problem Statement

In 3D single object tracking, given the target BBox in the first frame, $B_t = (x, y, z, w, l, h, \theta) \in \mathbb{R}^7$, the tracker must locate the same target across subsequent frames. We propose a box-aware tracker $\mathrm{track}_{\text{box-aware}}$ that enhances feature matching by utilizing the target's BBox information, formalized as:

$\mathrm{track}_{\text{box-aware}}: \mathbb{R}^7 \times \mathbb{R}^{N_t \times 3} \times \mathbb{R}^{N_s \times 3} \to \mathbb{R}^4$

$\mathrm{track}_{\text{box-aware}}(B_t, P_t, P_s) \to (x, y, z, \theta)$

3.2 Box-Aware Tracker (BAT)

BAT is based on the P2B model, extracting features through PointNet++ and generating target proposals with VoteNet. The improved target-specific search-area feature is generated by

$\hat{F}_s = \mathrm{BAFF}(C_t, F_t, F_s)$

where $C_t$ is the BoxCloud of the template, and $F_t$ and $F_s$ are the features of the template and search area, respectively.

3.3 BoxCloud Representation

BoxCloud is defined by the distances from each point $p_i$ in the point cloud to the BBox's eight corners and its center $q_1, \dots, q_9$, formalized as:

$C = \{\, c_i \in \mathbb{R}^9 \mid c_{ij} = \lVert p_i - q_j \rVert_2,\ \forall j \in [1, 9] \,\}$
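A minimal NumPy sketch may make the representation concrete. It is an illustration under stated assumptions, not the paper's code: the rotation by the heading angle θ is omitted, and the corner ordering is an arbitrary choice here.

```python
import numpy as np

def box_corners_and_center(center, size):
    """Eight corners of an axis-aligned box plus its center, as a (9, 3) array.
    (The paper's boxes also carry a heading angle theta, omitted here.)"""
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    corners = np.asarray(center) + 0.5 * signs * np.asarray(size)
    return np.vstack([corners, np.asarray(center)])

def boxcloud(points, center, size):
    """C[i, j] = ||p_i - q_j||_2 over the 8 corners and the center (Sec. 3.3)."""
    q = box_corners_and_center(center, size)   # (9, 3)
    diff = points[:, None, :] - q[None, :, :]  # (N, 9, 3)
    return np.linalg.norm(diff, axis=-1)       # (N, 9)
```

For a point lying at the box center, the last column is 0 and every corner distance equals half the box diagonal, which is what makes the representation sensitive to box size.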

3.4 Box-Aware Feature Fusion (BAFF)

The BAFF module generates an enhanced target-specific search area through BoxCloud comparison and feature aggregation. BoxCloud comparison uses the pairwise L2 distance

$\mathrm{Dist} \in \mathbb{R}^{M_1 \times M_2},\quad \mathrm{Dist} = \mathrm{Pairwise}(C_t, C_s)$

Feature aggregation employs a mini-PointNet:

$f_i^s = \mathrm{MaxPool}\big(\{\mathrm{MLP}([\,p_j^t, f_j^t, c_j^t, f_i^s\,])\}_{j=1}^{k}\big)$
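The two stages above can be sketched in NumPy. This is a simplified stand-in, not the paper's implementation: the shared MLP of the mini-PointNet is omitted, so aggregation reduces to concatenation plus max-pooling, and the k nearest neighbours are found with a plain argsort.

```python
import numpy as np

def baff_match(template_bc, search_bc, k=4):
    """BoxCloud comparison: pairwise L2 distance Dist in R^{M1 x M2},
    then the k nearest template points for every search point."""
    dist = np.linalg.norm(
        template_bc[:, None, :] - search_bc[None, :, :], axis=-1)  # (M1, M2)
    topk = np.argsort(dist, axis=0)[:k].T                          # (M2, k)
    return dist, topk

def aggregate(template_pts, template_feat, template_bc, search_feat, topk):
    """Feature aggregation: stack [p_t, f_t, c_t, f_s] over the k neighbours
    of each search point and max-pool (the shared MLP is omitted here)."""
    m2, k = topk.shape
    f_s = np.repeat(search_feat[:, None, :], k, axis=1)            # (M2, k, D)
    grouped = np.concatenate(
        [template_pts[topk], template_feat[topk],
         template_bc[topk], f_s], axis=-1)                         # (M2, k, 3+D+9+D)
    return grouped.max(axis=1)                                     # (M2, 3+2D+9)
```

Matching on BoxClouds rather than raw features is the key design choice: two points with similar point-to-box distance patterns occupy similar parts of their boxes, so the comparison is robust to sparsity.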

3.5 Implementation

During training, we generate training samples from consecutive frames and optimize the loss function $L = L_{bc} + \lambda L_{rpn}$. At test time, we update the template and search area frame by frame, using the trained BAT to track the target.
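As a hedged illustration of the combined objective, the sketch below assumes a smooth-L1 regression for $L_{bc}$ (the BoxCloud prediction loss) and treats $L_{rpn}$ as an externally computed scalar; the exact loss terms follow the paper.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Elementwise smooth-L1 (Huber-style) penalty."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def total_loss(pred_bc, gt_bc, rpn_loss, lam=1.0):
    """L = L_bc + lambda * L_rpn, with L_bc regressing the predicted
    BoxCloud of the search area toward the ground truth."""
    l_bc = smooth_l1(pred_bc - gt_bc).mean()
    return l_bc + lam * rpn_loss
```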


4. Experiments

Table 1: performance comparison between BAT and state-of-the-art methods on the KITTI (left) and NuScenes (right) datasets.

We conduct extensive experiments on two widely-adopted datasets (i.e., KITTI and NuScenes) to validate the effectiveness of the proposed BAT method. Both datasets contain point clouds scanned by LiDAR sensors. For the KITTI dataset, we report the published results from corresponding papers. For the NuScenes dataset, since there are no published results, we evaluate SC3D and P2B using their official open-source codes. The left part of Table 1 records the results on the KITTI dataset, where BAT shows a significant performance gain over existing methods, outperforming P2B by over 10% on average. For the pedestrian category, BAT even achieves about 25% improvement over P2B. Furthermore, as shown in the right part of Table 1, BAT is consistently superior to the other two competitors on four main classes in NuScenes.

Figure: cases where BAT outperforms P2B.

Figures: number of points on the car in the first frame (left); performance under different k values (right).

5. Results and Conclusion

 In the experimental section, our method achieved significant results on the KITTI and NuScenes datasets. For the KITTI dataset, our method outperformed existing techniques in all categories with higher success and precision rates. Particularly in the pedestrian category, our method achieved about a 25% higher success rate compared to P2B. On the NuScenes dataset, despite the more complex scenes and a higher number of objects, our method still maintained its advantage in all major categories, demonstrating better robustness and accuracy.

 This paper presents a novel box-aware feature enhancement method for single object tracking on point clouds. By introducing the BoxCloud representation and box-aware feature fusion module, our method demonstrates higher accuracy and robustness when dealing with sparse and incomplete shapes. The experimental results show that our method achieves significant performance improvements on both KITTI and NuScenes benchmarks, especially when handling sparse data. We believe that the BoxCloud representation provides a flexible and powerful tool for future target tracking tasks, with the potential to further enhance tracking performance.

