[CVPR 2020] A Roundup of 3D Object Detection Papers

Table of Contents

1. 3D Object Detection - Outdoor

1. Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection
  • Abstract

    Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques. Owing to severe spatial occlusion and the inherent variance of point density with distance to the sensor, the appearance of the same object varies greatly in point cloud data. Designing a feature representation robust to such appearance changes is hence the key issue in a 3D object detection method. In this paper, we innovatively propose a domain-adaptation-like approach to enhance the robustness of the feature representation. More specifically, we bridge the gap between the perceptual domain, where the feature comes from a real scene, and the conceptual domain, where the feature is extracted from an augmented scene consisting of non-occluded point clouds rich in detailed information. This domain adaptation approach mimics the functionality of the human brain when performing object perception. Extensive experiments demonstrate that our simple yet effective approach fundamentally boosts the performance of 3D point cloud object detection and achieves state-of-the-art results.


  • Overall Architecture
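The perceptual-to-conceptual association described in the abstract can be sketched as a feature-consistency objective. This is an illustrative simplification, not the paper's implementation: the function name and the plain L2 loss are assumptions, and the real model aligns deep voxel features from two network branches rather than raw vectors.

```python
import numpy as np

def feature_alignment_loss(perceptual_feat, conceptual_feat):
    """L2 consistency loss pulling perceptual features (from the real,
    occluded scene) toward conceptual features (from the augmented,
    occlusion-free scene). The conceptual branch is treated as a fixed
    target, acting like a teacher in domain adaptation."""
    # Stop-gradient on the conceptual branch: only the perceptual
    # extractor is pushed toward the complete-shape features.
    target = conceptual_feat.copy()
    diff = perceptual_feat - target
    return float(np.mean(diff ** 2))

# Toy example: two 4-dim feature vectors for the same object.
p = np.array([0.2, 0.1, 0.5, 0.3])  # from the occluded scene
c = np.array([0.4, 0.0, 0.6, 0.3])  # from the augmented scene
loss = feature_alignment_loss(p, c)  # → 0.015
```

At convergence, features extracted from a heavily occluded object should resemble those of its complete counterpart, which is what makes the representation robust to appearance changes.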

2. Structure Aware Single-stage 3D Object Detection from Point Cloud
  • Abstract

    3D object detection from point cloud data plays an essential role in autonomous driving. Current single-stage detectors are efficient by progressively downscaling the 3D point clouds in a fully convolutional manner. However, the downscaled features inevitably lose spatial information and cannot make full use of the structure information of 3D point cloud, degrading their localization precision. In this work, we propose to improve the localization precision of single-stage detectors by explicitly leveraging the structure information of 3D point cloud. Specifically, we design an auxiliary network which converts the convolutional features in the backbone network back to point-level representations. The auxiliary network is jointly optimized, by two point-level supervisions, to guide the convolutional features in the backbone network to be aware of the object structure. The auxiliary network can be detached after training and therefore introduces no extra computation in the inference stage. Besides, considering that single-stage detectors suffer from the discordance between the predicted bounding boxes and corresponding classification confidences, we develop an efficient part-sensitive warping operation to align the confidences to the predicted bounding boxes. Our proposed detector ranks at the top of KITTI 3D/BEV detection leaderboards and runs at 25 FPS for inference.


  • Overall Architecture
  • Results

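The detachable auxiliary network idea can be illustrated with a toy sketch: the point-level head runs only during training, so removing it afterwards leaves inference cost unchanged. All class and method names here are hypothetical stand-ins, and the pooled-feature "backbone" merely imitates where sparse 3D convolutions would sit.

```python
import numpy as np

class SingleStageDetector:
    """Toy sketch of a structure-aware single-stage detector: backbone
    features feed the detection head always, and an auxiliary point-level
    head only in training mode. Detaching the auxiliary head adds no
    inference-time computation."""

    def __init__(self, train_mode=True):
        self.train_mode = train_mode

    def backbone(self, points):
        # Stand-in for sparse 3D convolutions: a single pooled feature.
        return points.mean(axis=0)

    def detection_head(self, feat):
        return {"box": feat, "score": float(feat.sum())}

    def auxiliary_head(self, feat, points):
        # Point-level supervision target: per-point outputs recovered
        # from the convolutional features (sketched as a broadcast of
        # the pooled feature back to every input point).
        return np.broadcast_to(feat, points.shape)

    def forward(self, points):
        feat = self.backbone(points)
        out = self.detection_head(feat)
        if self.train_mode:
            out["aux"] = self.auxiliary_head(feat, points)
        return out

pts = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
train_out = SingleStageDetector(train_mode=True).forward(pts)   # has "aux"
infer_out = SingleStageDetector(train_mode=False).forward(pts)  # no "aux"
```

The point-level supervision shapes the backbone features during training; at inference only the detection path executes, which is how the paper keeps 25 FPS.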

3. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop
  • Abstract

    We tackle the problem of joint perception and motion forecasting in the context of self-driving vehicles. Towards this goal we propose PnPNet, an end-to-end model that takes as input sequential sensor data, and outputs at each time step object tracks and their future trajectories. The key component is a novel tracking module that generates object tracks online from detections and exploits trajectory level features for motion forecasting. Specifically, the object tracks get updated at each time step by solving both the data association problem and the trajectory estimation problem. Importantly, the whole model is end-to-end trainable and benefits from joint optimization of all tasks. We validate PnPNet on two large-scale driving datasets, and show significant improvements over the state-of-the-art with better occlusion recovery and more accurate future prediction.

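The tracking-in-the-loop behaviour can be approximated with a minimal sketch: at each time step, detections are associated to existing tracks, and trajectory-level state drives the forecast. The greedy nearest-neighbour matching and constant-velocity model below are deliberate simplifications of PnPNet's learned association and forecasting modules; the function names are assumptions.

```python
import math

def update_tracks(tracks, detections, max_dist=2.0):
    """One tracking step: greedily match each existing track to its
    nearest detection (within max_dist), extend matched tracks, and
    start new tracks for unmatched detections."""
    unmatched = list(detections)
    for track in tracks:
        if not unmatched:
            break
        last = track[-1]
        best = min(unmatched, key=lambda d: math.dist(d, last))
        if math.dist(best, last) <= max_dist:
            track.append(best)
            unmatched.remove(best)
    tracks.extend([[d] for d in unmatched])
    return tracks

def forecast(track, horizon=3):
    """Constant-velocity motion forecast from trajectory-level state."""
    if len(track) < 2:
        return [track[-1]] * horizon
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * t, y1 + vy * t) for t in range(1, horizon + 1)]
```

In PnPNet both steps are differentiable and trained jointly with detection, which is what lets trajectory-level features improve the forecasts.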

4. DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes
  • Abstract

    We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. Previous methods often make domain-specific design decisions, for example projecting points into a bird's-eye view image in autonomous driving scenarios. In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes. The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes. 3D bounding box parameters are estimated in one pass for every point, aggregated through graph convolutions, and fed into a branch of the network that predicts latent codes representing the shape of each detected object. The latent shape space and shape decoder are learned on a synthetic dataset and then used as supervision for the end-to-end training of the 3D object detection pipeline. Thus our model is able to extract shapes without access to ground-truth shape information in the target dataset. During experiments, we find that our proposed method achieves state-of-the-art results by ∼5% on object detection in ScanNet scenes, and it gets top results by 3.4% in the Waymo Open Dataset, while reproducing the shapes of detected cars.


  • Overall Architecture

  • Results

  • Paper: DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes
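The per-point box prediction and aggregation step can be sketched as follows, using a radius-neighbourhood average as a stand-in for the paper's graph convolutions. The function name, the averaging scheme, and the O(n²) neighbour search are assumptions for illustration only.

```python
import numpy as np

def aggregate_point_predictions(points, per_point_boxes, radius=1.0):
    """DOPS-style aggregation sketch: every point predicts box parameters
    in a single pass; predictions are then smoothed by averaging over
    each point's radius neighbourhood before downstream use (e.g. the
    shape-decoding branch)."""
    aggregated = np.empty_like(per_point_boxes)
    for i in range(len(points)):
        # Neighbours within `radius` of point i (including itself).
        d = np.linalg.norm(points - points[i], axis=1)
        mask = d <= radius
        aggregated[i] = per_point_boxes[mask].mean(axis=0)
    return aggregated

# Toy example: three points, scalar "box parameter" per point.
pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
boxes = np.array([[1.0], [3.0], [10.0]])
agg = aggregate_point_predictions(pts, boxes, radius=1.0)
```

Points belonging to the same object end up with consistent box estimates, while the isolated third point keeps its own prediction.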

5. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
  • Abstract

    In this paper, we propose a graph neural network to detect objects from a LiDAR point cloud. Towards this end, we encode the point cloud efficiently in a fixed radius near-neighbors graph. We design a graph neural network, named Point-GNN, to predict the category and shape of the object that each vertex in the graph belongs to. In Point-GNN, we propose an auto-registration mechanism to reduce translation variance, and also design a box merging and scoring operation to combine detections from multiple vertices accurately. Our experiments on the KITTI benchmark show the proposed approach achieves leading accuracy using the point cloud alone and can even surpass fusion-based algorithms. Our results demonstrate the potential of using the graph neural network as a new approach for 3D object detection.


  • Overall Architecture

  • Results

  • Paper: Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud

  • Code: ht
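The fixed-radius near-neighbour graph and one message-passing step can be sketched as below. This is a minimal illustration: the element-wise max aggregation stands in for Point-GNN's learned vertex update, and the paper's auto-registration offsets and box merging are omitted.

```python
import numpy as np

def build_radius_graph(points, radius):
    """Fixed-radius near-neighbour graph: connect every pair of points
    within `radius` of each other by an undirected edge."""
    edges = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= radius:
                edges.append((i, j))
    return edges

def gnn_step(states, edges):
    """One message-passing iteration: each vertex aggregates its
    neighbours' states with an element-wise max."""
    new_states = states.copy()
    for i, j in edges:
        new_states[i] = np.maximum(new_states[i], states[j])
        new_states[j] = np.maximum(new_states[j], states[i])
    return new_states

# Toy example: three points, the first two within radius of each other.
pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
edges = build_radius_graph(pts, radius=1.0)
states = gnn_step(np.array([[1.0], [2.0], [3.0]]), edges)
```

After several such iterations each vertex's state summarizes its local neighbourhood, from which Point-GNN predicts a category and box per vertex.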
