PointPillars: Fast Encoders for Object Detection from Point Clouds
CVPR 2019
Object detection in point clouds is an important component of many robotics applications such as autonomous driving. In this paper, we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. Recent literature suggests two types of encoders: fixed encoders tend to be fast but sacrifice accuracy, while encoders learned from data are more accurate but slower. In this work, we propose PointPillars, a novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). While the encoded features can be used with any standard 2D convolutional detection network, we further propose a lean downstream network. Extensive experimentation shows that PointPillars outperforms previous encoders with respect to both speed and accuracy. Despite only using lidar, our pipeline even outperforms fusion methods on both the 3D and BEV KITTI benchmarks. This detection performance is achieved while running at 62 Hz: a 2-4 fold runtime improvement. A faster version of our method runs at 105 Hz. These benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds.
Problem
The authors argue that 3D convolutions slow down inference and waste computational resources.
Contributions
- We propose a novel point cloud encoder and network, PointPillars, that operates on the point cloud to enable end-to-end training of a 3D object detection network.
- We show how all computations on pillars can be posed as dense 2D convolutions which enables inference at 62 Hz; a factor of 2-4 times faster than other methods.
- We conduct experiments on the KITTI dataset and demonstrate state of the art results on cars, pedestrians, and cyclists on both BEV and 3D benchmarks.
- We conduct several ablation studies to examine the key factors that enable a strong detection performance.
Dataset
- KITTI
KITTI data augmentation:
First, following SECOND [2], we create a lookup table of the ground truth 3D boxes for all classes and the associated point clouds that fall inside these 3D boxes. Then for each sample, we randomly select 15, 0, and 8 ground truth samples for cars, pedestrians, and cyclists respectively and place them into the current point cloud. We found these settings to perform better than the proposed settings.
Next, all ground truth boxes are individually augmented. Each box is rotated (uniformly drawn from [-π/20,π/20]) and translated (x, y, and z independently drawn from N(0, 0.25)) to further enrich the training set.
Finally, we perform two sets of global augmentations that are jointly applied to the point cloud and all boxes. First, we apply a random mirroring flip along the x axis, then a global rotation and scaling. Finally, we apply a global translation with x, y, z drawn from N(0, 0.2) to simulate localization noise.
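The global augmentation steps above could be sketched as follows. This is a minimal illustration, not the authors' code: the rotation and scaling ranges are assumptions based on common settings in SECOND-style pipelines, and the box layout `(x, y, z, w, l, h, theta)` and function name are illustrative.

```python
import numpy as np

def global_augment(points, boxes, rng=np.random.default_rng()):
    """Jointly augment a point cloud (N, 4) and its 3D boxes (M, 7).

    Box layout assumed: (x, y, z, w, l, h, theta). Rotation range
    [-pi/4, pi/4] and scaling range [0.95, 1.05] are assumptions.
    """
    # Random mirroring flip along the x axis (negates y and heading).
    if rng.random() < 0.5:
        points[:, 1] = -points[:, 1]
        boxes[:, 1] = -boxes[:, 1]
        boxes[:, 6] = -boxes[:, 6]
    # Global rotation about the z axis.
    theta = rng.uniform(-np.pi / 4, np.pi / 4)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    points[:, :2] = points[:, :2] @ rot.T
    boxes[:, :2] = boxes[:, :2] @ rot.T
    boxes[:, 6] += theta
    # Global scaling of coordinates and box sizes.
    scale = rng.uniform(0.95, 1.05)
    points[:, :3] *= scale
    boxes[:, :6] *= scale
    # Global translation with x, y, z drawn from N(0, 0.2)
    # to simulate localization noise.
    t = rng.normal(0.0, 0.2, size=3)
    points[:, :3] += t
    boxes[:, :3] += t
    return points, boxes
```

Applying the flip, rotation, scale, and translation to points and boxes jointly keeps the labels consistent with the augmented scene.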
Network architecture
(1) A feature encoder that converts the point cloud into a sparse pseudo-image;
(2) A 2D convolutional backbone that processes the pseudo-image into a high-level representation;
(3) A detection head (SSD [1]) that regresses 3D boxes.
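The key step that links stage (1) to stage (2) is scattering the per-pillar feature vectors back to their grid locations to form the dense pseudo-image. A minimal NumPy sketch, assuming per-pillar features of shape (P, C) and pillar grid coordinates of shape (P, 2) as (row, col) (function and argument names are illustrative):

```python
import numpy as np

def scatter_to_pseudo_image(pillar_features, coords, H, W):
    """Scatter encoded pillar features (P, C) onto a dense (C, H, W)
    canvas at their (row, col) grid cells, producing the pseudo-image
    that a standard 2D convolutional backbone can consume."""
    C = pillar_features.shape[1]
    canvas = np.zeros((C, H, W), dtype=pillar_features.dtype)
    # Empty cells stay zero; occupied cells receive the pillar's features.
    canvas[:, coords[:, 0], coords[:, 1]] = pillar_features.T
    return canvas
```

Because only non-empty pillars are encoded, this scatter step is what lets the rest of the network use dense 2D convolutions instead of 3D or sparse convolutions.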
Loss
Same as SECOND [2].
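Concretely, the loss adopted from SECOND [2] combines weighted localization, classification, and direction terms (weights and residual definitions as given in the PointPillars paper):

```latex
\mathcal{L} = \frac{1}{N_{pos}}\left(\beta_{loc}\,\mathcal{L}_{loc}
  + \beta_{cls}\,\mathcal{L}_{cls} + \beta_{dir}\,\mathcal{L}_{dir}\right),
\qquad \beta_{loc}=2,\; \beta_{cls}=1,\; \beta_{dir}=0.2
```

where the localization loss is a SmoothL1 over the box residuals,

```latex
\Delta x = \frac{x^{gt}-x^{a}}{d^{a}},\quad
\Delta y = \frac{y^{gt}-y^{a}}{d^{a}},\quad
\Delta z = \frac{z^{gt}-z^{a}}{h^{a}},\quad
d^{a} = \sqrt{(w^{a})^{2}+(l^{a})^{2}},
```

```latex
\Delta w = \log\frac{w^{gt}}{w^{a}},\quad
\Delta l = \log\frac{l^{gt}}{l^{a}},\quad
\Delta h = \log\frac{h^{gt}}{h^{a}},\quad
\Delta\theta = \sin(\theta^{gt}-\theta^{a}),
```

so that \(\mathcal{L}_{loc} = \sum_{b \in (x,y,z,w,l,h,\theta)} \text{SmoothL1}(\Delta b)\), \(\mathcal{L}_{cls}\) is the focal loss with \(\alpha = 0.25\), \(\gamma = 2\), and \(\mathcal{L}_{dir}\) is a softmax loss on discretized heading direction (needed because \(\Delta\theta\) cannot distinguish flipped boxes).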
Experimental results
Links
References
[1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
[2] Y. Yan, Y. Mao, and B. Li. SECOND: Sparsely embedded convolutional detection. Sensors, 18(10), 2018.