PointPillars

PointPillars: Fast Encoders for Object Detection from Point Clouds

CVPR 2019


​ Object detection in point clouds is an important part of many robotics applications such as autonomous driving. In this paper we consider the problem of encoding a point cloud into a format suitable for a downstream detection pipeline. Recent literature suggests two types of encoders: fixed encoders tend to be fast but sacrifice accuracy, while encoders that are learned from data are more accurate but slower. In this work we propose PointPillars, a novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). While the encoded features can be used with any standard 2D convolutional detection network, we further propose a lean downstream network. Extensive experiments show that PointPillars outperforms previous encoders in both speed and accuracy. Despite only using lidar, our pipeline even outperforms fusion methods on the 3D and BEV KITTI benchmarks. This detection performance is achieved while running at 62 Hz: a 2-4x runtime improvement. A faster version of our method runs at 105 Hz. These benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds.

Problem

The authors argue that 3D convolutions slow down inference and waste computational resources.

Contributions

  • We propose a novel point cloud encoder and network, PointPillars, that operates on the point cloud to enable end-to-end training of a 3D object detection network.
  • We show how all computations on pillars can be posed as dense 2D convolutions, which enables inference at 62 Hz; a factor of 2-4 faster than other methods.
  • We conduct experiments on the KITTI dataset and demonstrate state-of-the-art results on cars, pedestrians, and cyclists on both BEV and 3D benchmarks.
  • We conduct several ablation studies to examine the key factors that enable strong detection performance.

Dataset

  • KITTI data augmentation:

    First, following SECOND [2], we create a lookup table of the ground truth 3D boxes for all classes and the associated point clouds that fall inside these 3D boxes. Then for each sample, we randomly select 15, 0, and 8 ground truth samples for cars, pedestrians, and cyclists respectively and place them into the current point cloud. We found these settings to perform better than the proposed settings.
    Next, all ground truth boxes are individually augmented. Each box is rotated (uniformly drawn from [-π/20, π/20]) and translated (x, y, and z independently drawn from N(0, 0.25)) to further enrich the training set.
    Finally, we perform two sets of global augmentations that are jointly applied to the point cloud and all boxes. First, we apply a random mirroring flip along the x axis, then a global rotation and scaling. Finally, we apply a global translation with x, y, z drawn from N(0, 0.2) to simulate localization noise.
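The per-box augmentation step above can be sketched in numpy as follows. This is a minimal sketch, not the authors' code: the function name and the (x, y, z, w, l, h, yaw) box layout are assumptions, and N(0, 0.25) is interpreted as a standard deviation of 0.25.

```python
import numpy as np

def augment_gt_box(points, box, rng):
    """Rotate and translate one ground-truth box and the points inside it.

    points: (N, 3) points inside the box
    box: (7,) = (x, y, z, w, l, h, yaw)  -- hypothetical layout
    Rotation is drawn uniformly from [-pi/20, pi/20]; translation per axis
    from a normal with std 0.25, following the paper's description.
    """
    theta = rng.uniform(-np.pi / 20, np.pi / 20)
    t = rng.normal(0.0, 0.25, size=3)
    c, s = np.cos(theta), np.sin(theta)
    # rotation about the vertical (z) axis
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    center = box[:3]
    # rotate the interior points about the box center, then translate
    points = (points - center) @ rot.T + center + t
    box = box.copy()
    box[:3] += t
    box[6] += theta
    return points, box
```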

Network Architecture

[Figure: PointPillars network overview]

(1) A feature encoder that converts the point cloud into a sparse pseudo-image;

(2) a 2D convolutional backbone that processes the pseudo-image into a high-level representation;

(3) a detection head that regresses 3D boxes (SSD [1]).
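Step (1) ends with scattering each pillar's learned feature vector back onto a BEV grid to form the pseudo-image consumed by the 2D backbone. A minimal numpy sketch of that scatter step (function name and the (C, H, W) layout are assumptions; the PointNet that produces the pillar features is omitted):

```python
import numpy as np

def scatter_to_pseudo_image(pillar_features, pillar_xy, grid_shape):
    """Scatter per-pillar feature vectors onto a dense 2D BEV grid.

    pillar_features: (P, C) one learned feature vector per non-empty pillar
    pillar_xy: (P, 2) integer (x, y) grid index of each pillar
    grid_shape: (W, H) size of the BEV grid
    Returns a (C, H, W) dense "pseudo image" for the 2D CNN backbone.
    """
    _, C = pillar_features.shape
    W, H = grid_shape
    canvas = np.zeros((C, H, W), dtype=pillar_features.dtype)
    # empty pillars stay zero; only non-empty pillar cells are written
    canvas[:, pillar_xy[:, 1], pillar_xy[:, 0]] = pillar_features.T
    return canvas
```

Because most pillars are empty, this scatter is what lets the expensive part of the network run as dense 2D convolutions instead of 3D ones.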

Loss

Identical to the loss used in SECOND [2].
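SECOND's loss combines a focal loss for classification with a Smooth-L1 loss on the box regression residuals (plus a direction classifier, omitted here). A minimal numpy sketch of the two main terms, with the standard alpha = 0.25 and gamma = 2 defaults:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Smooth-L1 (Huber) loss on box regression residuals x."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss on predicted probability p for labels y in {0, 1}.

    Down-weights easy examples by the (1 - pt)^gamma modulating factor.
    """
    pt = np.where(y == 1, p, 1.0 - p)
    a = np.where(y == 1, alpha, 1.0 - alpha)
    return -a * (1.0 - pt) ** gamma * np.log(np.clip(pt, 1e-12, None))
```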

Experimental Results

[Figures: quantitative results on the KITTI BEV and 3D benchmarks]

Links

Project page

References

[1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In ECCV, 2016.

[2] Y. Yan, Y. Mao, and B. Li. SECOND: Sparsely embedded convolutional detection. Sensors, 18(10), 2018.
