Part-A^2 Paper Study Notes

Citation

S. Shi, Z. Wang, X. Wang, and H. Li. Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud. arXiv preprint arXiv:1907.03670, 2019.

Abstract

Part-A^2 is a 3D object detection network for point clouds from The Chinese University of Hong Kong and SenseTime.
The network consists of a part-aware stage and a part-aggregation stage. The part-aware stage simultaneously predicts coarse 3D proposals and accurate intra-object part locations, using supervision derived automatically from the 3D GT boxes (i.e., at no extra annotation cost). The intra-object part locations predicted within the same proposal are grouped by the paper's newly proposed RoI-aware point cloud pooling module; the part-aggregation stage then re-scores each box and refines its location based on the pooled part locations.

Introduction

With the rapid development of autonomous driving and robotics, 3D object detection from LiDAR point clouds has attracted increasing attention. The authors note that although 2D image-based object detection has made great progress, directly extending 2D detection algorithms to 3D does not work well, because LiDAR point clouds are sparse and irregular. How to extract discriminative, task-relevant features from irregular point clouds remains an open and challenging problem.
Existing 3D detection algorithms follow three rough lines of attack:

  • Project the 3D point cloud onto 2D feature maps and perform 3D detection with a 2D CNN.
  • Voxelize the unordered 3D points and extract features with a 3D CNN or a 3D sparse convolution network for 3D detection.
  • Jointly process 2D images and 3D point clouds. For example, F-PointNet first detects boxes in the 2D image, then finds the corresponding points via the projection relationship, and finally regresses 3D boxes with a point cloud model (e.g., PointNet).

The authors point out that these approaches either suffer from information loss during projection and quantization, or depend heavily on the performance of the 2D detector.
Unlike methods that detect 3D objects from BV maps or 2D images, Shi et al. generate 3D boxes directly from the point cloud by segmenting foreground points, with the segmentation labels derived from the 3D box annotations (S. Shi, X. Wang, and H. Li. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019). The authors then observe that 3D box annotations provide not only segmentation masks but also the intra-object part location of every point inside each 3D box. This distinguishes 3D box annotations from 2D ones: objects in 3D space are naturally separated, whereas objects in 2D images frequently occlude one another, so intra-object locations derived from 2D GT boxes for each pixel are inaccurate, while 3D intra-object part locations are accurate and informative. Yet 3D intra-object part locations had not previously been exploited for 3D object detection.
Based on this observation, the authors propose Part-A^2, which consists of a part-aware stage and a part-aggregation stage.
Figure 1. Intra-object part locations and segmentation masks can be robustly predicted by the proposed part-aware and aggregation network even when objects are partially occluded. Such part locations can assist accurate 3D object detection. Best viewed in color.

As shown in Figure 1, in the part-aware stage the network estimates an intra-object part location for every foreground point. The GT part location annotations and segmentation masks are likewise generated from the GT 3D box annotations. Point cloud features are extracted via voxelization and sparse convolution, and an RPN head is added to generate 3D proposals so that the part-aggregation stage can group the predicted parts.
Given all the points inside a 3D proposal, the part-aggregation stage should be able to assess the quality of that proposal, and refine it, by learning the spatial relationship of those points' intra-object part locations. To this end, the authors propose a new RoI-aware point cloud pooling module that eliminates the ambiguity of point cloud region pooling. Traditional pooling operates only on the points or the non-empty voxels; RoI-aware pooling covers all voxels inside the 3D RoI (both non-empty and empty voxels). The authors argue that the empty voxels also contribute to the box's geometry information: keeping them lets the pooled grid encode the box's shape, which point-only pooling cannot, and this makes RoI-aware pooling crucial for box scoring and location refinement. After RoI-aware pooling, the network aggregates the part location information with sparse convolutions and pooling. Experiments show that the aggregated part features significantly improve the quality of the predicted 3D boxes.
Main contributions of the paper:

  • We propose a novel part-aware and aggregation neural network for 3D object detection from point cloud. With only the 3D box annotations as supervision, our proposed method can predict the intra-object 3D part locations accurately, which are then aggregated by our part-aggregation network to learn the spatial relationship between these parts for predicting accurate 3D object locations and confidences.
  • We present the differentiable RoI-aware point cloud pooling module to eliminate the ambiguity in point cloud region pooling by encoding the position-specific features of 3D proposals. The experiments show that the pooled feature representation benefits the part-aggregation stage significantly.
  • Our proposed part-aware and aggregation method outperforms all published methods with remarkable margins on the challenging 3D detection benchmark of the KITTI dataset as of July 9, 2019, which demonstrates the effectiveness of our method.

Related Work

3D object detection from multiple sensors

Here, multi-sensor information generally means point clouds plus camera images.

  • X. Chen, H. Ma, J. Wan, B. Li, and T. Xia. Multi-view 3D object detection network for autonomous driving. In CVPR, 2017.
    J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. Waslander. Joint 3D proposal generation and object detection from view aggregation. In IROS, 2018.
    Main work: Project the point cloud onto a BV feature map, extract features from the BV feature map and the image separately, then project each 3D proposal onto the corresponding 2D feature maps to crop and fuse the features for 3D detection.
  • M. Liang, B. Yang, S. Wang, and R. Urtasun. Deep continuous fusion for multi-sensor 3D object detection. In ECCV, 2018.
    Main work: Further explores feature fusion schemes between the two modalities.
  • C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas. Frustum PointNets for 3D object detection from RGB-D data. arXiv preprint arXiv:1711.08488, 2017.
    D. Xu, D. Anguelov, and A. Jain. PointFusion: Deep sensor fusion for 3D bounding box estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 244–253, 2018.
    Main work: First detect 2D boxes in the image, then crop out the corresponding point cloud region, and finally predict 3D boxes with a point cloud model (e.g., PointNet).

Part-A^2 uses only the point cloud; it does not use images or any other sensor modality.

3D object detection from point clouds only

  • Y. Zhou and O. Tuzel. VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4490–4499, 2018.
    Main work: VoxelNet; extracts features with Voxel Feature Encoding (VFE) layers.
  • Y. Yan, Y. Mao, and B. Li. SECOND: Sparsely embedded convolutional detection. Sensors, 18(10):3337, 2018.
    Main work: Builds on VoxelNet by introducing sparse convolution for feature extraction.
  • B. Yang, W. Luo, and R. Urtasun. PIXOR: Real-time 3D object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7652–7660, 2018.
    B. Yang, M. Liang, and R. Urtasun. HDNET: Exploiting HD maps for 3D object detection. In 2nd Conference on Robot Learning (CoRL), 2018.
    A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom. PointPillars: Fast encoders for object detection from point clouds. In CVPR, 2019.
    Main work: Project the 3D point cloud into a 2D representation such as a BV map, then perform 3D detection with a 2D CNN.
  • S. Shi, X. Wang, and H. Li. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–779, 2019.
    Main work: Operates directly on the raw points, detecting objects by segmenting foreground points, with segmentation labels generated from the 3D GT box annotations.

Building on PointRCNN, the authors additionally use the 3D GT box annotations to supervise the prediction of intra-object part locations, which improves 3D detection accuracy.

Point cloud feature learning for 3D object detection

Among existing feature extraction approaches for point cloud data, the following three are the most common:

  • Project the 3D point cloud to 2D and extract point cloud features with a 2D CNN.
  • Extract point cloud features with PointNet-family networks.
  • Extract point cloud features VoxelNet-style, accelerated with sparse convolution.

Method

Figure 2. The overall framework of our part-aware and aggregation neural network for 3D object detection. It consists of two stages: (a) The first part-aware stage estimates intra-object part locations accurately and generates 3D proposals by feeding the raw point cloud to our newly designed backbone network. (b) The second part-aggregation stage conducts the proposed RoI-aware point cloud pooling operation to group the part information from each 3D proposal, then the part-aggregation network is utilized to score boxes and refine locations based on the part features and information.

The paper's main starting point is to generate intra-object part locations and segmentation masks from the 3D GT box annotations ("Our key observation is that, the ground-truth boxes of 3D object detection automatically provide accurate intra-object part locations and segmentation mask for each 3D point since objects in the 3D space are naturally separated.").

Learning to estimate intra-object part locations

Efficient point-wise feature learning via sparse convolution

Segmenting foreground points and predicting 3D intra-object part locations both require discriminative point-wise features. Here the authors borrow from VoxelNet and SECOND: the scene is first voxelized, then a sparse convolution network extracts features from the non-empty voxels, with each voxel's center serving as a point of a new, regularized point cloud. Because the voxel size used (5cm × 5cm × 10cm) is small relative to the scene extent (70m × 80m × 4m), the voxel centers are a nearly lossless proxy for the raw points; in KITTI, each scene has roughly 16,000 non-empty voxels in this range (a number that naturally depends on the voxel size and the sensor).
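As a concrete illustration, here is a minimal voxelization sketch in numpy; the detection-range values are the common KITTI settings and are an assumption of this sketch, not something stated in these notes:

```python
# Quantize points into 5cm x 5cm x 10cm voxels over a ~70m x 80m x 4m range
# and use the non-empty voxel centers as the regularized "point cloud".
import numpy as np

def voxelize(points,
             pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0),  # assumed KITTI range
             voxel_size=(0.05, 0.05, 0.1)):
    """points: (N, 3) array of x, y, z; returns non-empty voxel coords and centers."""
    lo = np.asarray(pc_range[:3], dtype=np.float32)
    hi = np.asarray(pc_range[3:], dtype=np.float32)
    vs = np.asarray(voxel_size, dtype=np.float32)
    # Keep only points inside the detection range.
    mask = np.all((points >= lo) & (points < hi), axis=1)
    pts = points[mask]
    # Integer voxel coordinates; deduplicate to get the non-empty voxels.
    coords = np.unique(((pts - lo) / vs).astype(np.int32), axis=0)
    # Voxel centers serve as the downsampled points fed to the sparse CNN.
    centers = (coords.astype(np.float32) + 0.5) * vs + lo
    return coords, centers
```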
The authors design a UNet-like architecture to extract point-wise features, shown in Figure 2. Three sparse convolutions with stride 2 downsample the spatial resolution by 8x, each followed by several submanifold sparse convolution layers. The authors also design a sparse up-sampling-like block that fuses and refines features while saving computation, shown in Figure 3.
Figure 3. Sparse up-sampling and feature refinement block. This module is adopted in the decoder of our sparse convolution based UNet backbone. The lateral features and bottom features are first fused and transformed by sparse convolution. The fused feature is then up-sampled by the sparse inverse convolution.
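A rough sketch of this block using the spconv library's v2-style API; the channel sizes and indice keys are illustrative assumptions, not the authors' exact configuration:

```python
# Sparse up-sampling + refinement block: fuse lateral (encoder) and bottom
# (decoder) features with a submanifold conv, then up-sample with a sparse
# inverse convolution that reuses the matching stride-2 conv's indice key.
import torch
import spconv.pytorch as spconv

class SparseUpBlock(torch.nn.Module):
    def __init__(self, lateral_ch, bottom_ch, out_ch, down_key):
        super().__init__()
        self.fuse = spconv.SubMConv3d(lateral_ch + bottom_ch, out_ch, 3,
                                      padding=1, indice_key="subm_" + down_key)
        self.up = spconv.SparseInverseConv3d(out_ch, out_ch, 3,
                                             indice_key=down_key)

    def forward(self, lateral, bottom):
        # lateral and bottom share the same sparse indices at this scale,
        # so their feature tensors can be concatenated channel-wise.
        fused = bottom.replace_feature(
            torch.cat([lateral.features, bottom.features], dim=1))
        return self.up(self.fuse(fused))
```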

Semantic segmentation and intra-object part location prediction

Intra-object part information is fundamental to how a network recognizes and detects objects; the authors illustrate this with car bodies and wheels. By learning to predict a foreground segmentation mask and an intra-object part location for every point, the network can infer an object's shape and pose, which is very helpful for 3D detection.
On top of the sparse-convolution UNet backbone shown in Figure 2, two branches are added: foreground point segmentation and intra-object part location prediction. The segmentation branch is trained with the focal loss, with points inside GT boxes as positives and the rest as negatives.
The 3D GT boxes provide the labels for the 3D intra-object part locations. For a foreground point $(p_x, p_y, p_z)$, the part label $(O_x, O_y, O_z)$ consists of three continuous values ("continuous" in the sense that they are real-valued regression targets in $[0, 1]$, not discrete bins) that describe the point's relative position inside its object. With a 3D box parameterized as $(C_x, C_y, C_z, h, w, l, \theta)$, the part location label can be written (reconstructed here from the definitions; which box dimension normalizes which axis depends on the paper's box convention) as:
$$[t_x,\ t_y] = [\,p_x - C_x,\ p_y - C_y\,]\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix},\qquad O_x = \frac{t_x}{d_x} + 0.5,\quad O_y = \frac{t_y}{d_y} + 0.5,\quad O_z = \frac{p_z - C_z}{h} + 0.5$$
where $(d_x, d_y)$ are the box extents along the canonical horizontal axes (i.e., $l$ and $w$, depending on the convention). The offset from the box center is rotated into the box's canonical (yaw-free) frame and normalized by the box size, so $O_x, O_y, O_z \in [0, 1]$ and the part location of the object center is $(0.5, 0.5, 0.5)$. In all coordinates here, the $z$ axis is perpendicular to the ground and the $x, y$ axes are parallel to the horizontal plane.
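A minimal sketch of generating these part labels (a hypothetical helper, not the authors' code; the l/w-to-axis pairing is an assumption of the sketch):

```python
# Compute intra-object part location labels for the foreground points of one box.
import numpy as np

def part_location_labels(points, box):
    """points: (N, 3) foreground xyz; box: (cx, cy, cz, h, w, l, theta)."""
    cx, cy, cz, h, w, l, theta = box
    off = points - np.array([cx, cy, cz], dtype=np.float32)
    # Rotate the horizontal offset into the box's canonical (yaw-free) frame.
    c, s = np.cos(theta), np.sin(theta)
    tx = off[:, 0] * c + off[:, 1] * s
    ty = -off[:, 0] * s + off[:, 1] * c
    # Normalize by the box size and shift the center to (0.5, 0.5, 0.5).
    # Assumed pairing: l along the canonical x axis, w along y.
    ox = tx / l + 0.5
    oy = ty / w + 0.5
    oz = off[:, 2] / h + 0.5
    return np.stack([ox, oy, oz], axis=1)  # each component lies in [0, 1]
```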
The paper uses binary cross-entropy as the loss along each of the three dimensions (reconstructed from the description; the notation may differ slightly from the paper):
$$L_{\text{part}}(O_u) = -\big(O_u \log \tilde{O}_u + (1 - O_u)\log(1 - \tilde{O}_u)\big),\quad u \in \{x, y, z\}$$
where $\tilde{O}_u$ is the (sigmoid-activated) predicted part location and $O_u$ is the label. Part location prediction is applied only to foreground points.
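A minimal PyTorch sketch of this foreground-masked part loss, assuming the network outputs raw (pre-sigmoid) logits:

```python
# Binary cross-entropy over the three part dimensions, foreground points only.
import torch
import torch.nn.functional as F

def part_location_loss(pred_logits, part_labels, fg_mask):
    """pred_logits, part_labels: (N, 3); fg_mask: (N,) boolean foreground mask."""
    loss = F.binary_cross_entropy_with_logits(pred_logits, part_labels,
                                              reduction="none")
    # Zero out background points and normalize by the foreground count.
    loss = (loss * fg_mask[:, None].float()).sum()
    return loss / fg_mask.float().sum().clamp(min=1.0)
```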

3D proposal generation

To aggregate the intra-object part information predicted above, the network generates 3D proposals that group the part information of the foreground points belonging to the same object. As shown in Figure 2, the paper adopts the RPN head from SECOND, applied to the feature map produced by the sparse encoder. This feature map has been downsampled 8x, and the features at different heights of the same bird's-view position are stacked along the channel dimension to form the 2D BV feature map used for 3D proposal generation; a sketch of this height-to-channel stacking follows.
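A minimal sketch of the height-to-channel stacking (the "height compression" used by SECOND-style detectors; shapes are illustrative):

```python
# Collapse the height dimension of a dense 3D feature volume into channels
# to obtain the 2D bird's-view feature map fed to the RPN head.
import torch

def to_bev(features_3d):
    """features_3d: (N, C, D, H, W), e.g. from a spconv tensor's .dense()."""
    n, c, d, h, w = features_3d.shape
    return features_3d.reshape(n, c * d, h, w)  # (N, C*D, H, W)
```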

RoI-aware point cloud feature pooling

Given the predicted intra-object part locations and the 3D proposals, the network scores each box and refines each proposal by aggregating the part information of all points inside the same proposal.
Figure 4. Illustration of RoI-aware point cloud feature pooling. Due to the ambiguity shown in the above BEV figure, previous point cloud pooling methods cannot recover the original box shape. Our proposed RoI-aware point cloud pooling method encodes the box shape by keeping the empty voxels, which can be efficiently processed by the following sparse convolution.

The authors argue that the point cloud region pooling used in the second stage of their earlier PointRCNN, which pools only the points inside each proposal, is ambiguous: as Figure 4 shows, different 3D proposals can yield identical pooled results, which hurts the second-stage refinement network.
To fix this, the paper proposes RoI-aware point cloud pooling. Each 3D proposal is evenly divided into $H \times W \times L$ voxels (height, width, length), and the features of the points falling in each voxel are aggregated by max-pooling or average-pooling. Empty voxels get zero features and are marked as empty. Because the empty voxels are kept, the pooled grid preserves the proposal's shape (the same interior points pooled inside a larger box leave more empty cells), which is why the empty voxels also contribute to the box's geometry information. The authors note that the RoI-aware pooling module is differentiable, so the whole network can be trained end to end.
The RoI-aware point cloud pooling module normalizes different 3D proposals into the same canonical local coordinate system, so each voxel encodes the features of a fixed position inside its 3D proposal. The authors emphasize again that this design matters, and that the pooled features are very effective for the subsequent box scoring and location refinement; a pooling sketch follows.
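A minimal single-proposal sketch of RoI-aware pooling (not the authors' CUDA implementation), assuming the proposal's points have already been transformed into the box's canonical frame with coordinates in $[0, 1]^3$:

```python
# Pool point features into a fixed (H, W, L) grid; empty voxels stay zero,
# which is exactly what lets the pooled grid encode the box shape.
import torch

def roi_aware_pool(canonical_xyz, point_feats, grid=(14, 14, 14), mode="max"):
    """canonical_xyz: (N, 3) in [0, 1]; point_feats: (N, C); returns (H, W, L, C)."""
    h, w, l = grid
    n, c = point_feats.shape
    idx = (canonical_xyz * torch.tensor([h, w, l], dtype=torch.float32)).long()
    idx = torch.minimum(idx.clamp(min=0), torch.tensor([h - 1, w - 1, l - 1]))
    flat = idx[:, 0] * (w * l) + idx[:, 1] * l + idx[:, 2]  # (N,) voxel ids
    out = torch.zeros(h * w * l, c)
    if mode == "max":
        out.scatter_reduce_(0, flat[:, None].expand(-1, c), point_feats,
                            reduce="amax", include_self=False)
    else:  # average-pooling, used for the part locations
        out.index_add_(0, flat, point_feats)
        counts = torch.zeros(h * w * l).index_add_(0, flat, torch.ones(n))
        out = out / counts.clamp(min=1.0)[:, None]
    return out.view(h, w, l, c)
```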

Part location aggregation for 3D box refinement

The spatial distribution of the intra-object part locations of all points inside a 3D proposal can be used to assess the proposal's parameters. The authors note that this could be cast as an optimization problem: fit the 3D bounding box parameters to the predicted part locations of all points inside the proposal. However, such an optimization-based method is very sensitive to outliers and to errors in the predicted part locations.
To address this, the paper proposes a learning-based aggregation of the part location information. For each 3D proposal, RoI-aware point cloud pooling is applied separately to the point-wise part locations (average-pooling) and the point-wise features (max-pooling), yielding two feature maps of size $14 \times 14 \times 14 \times 4$ and $14 \times 14 \times 14 \times C$. In the pooled part location map, the 4 channels are the predicted $x, y, z$ part coordinates plus the foreground segmentation score.
As shown in Figure 2, after pooling, a part-aggregation network learns the spatial distribution of the predicted intra-object part locations in a hierarchical fashion. First, sparse convolution layers with $3 \times 3 \times 3$ kernels map the two pooled feature maps to the same feature dimension. After concatenating them, four sparse convolution layers with $3 \times 3 \times 3$ kernels aggregate the part information; after the second of these, a sparse max-pooling reduces the resolution. The result is finally flattened into a feature vector, followed by the two branches for final box scoring and location refinement; a rough sketch follows.
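A rough dense-convolution analogue of this aggregation head (the paper uses sparse convolutions; the channel widths, ReLUs, and exact layer arrangement are assumptions of this sketch, while the $14^3$ grid, pooling position, and branch structure follow the text):

```python
# Part-aggregation head: embed the two pooled maps to a common dimension,
# concatenate, aggregate with 3x3x3 convs (pooling after the second one),
# then flatten into the box-scoring and location-refinement branches.
import torch
import torch.nn as nn

class PartAggregationHead(nn.Module):
    def __init__(self, part_ch=4, feat_ch=128, mid_ch=128):
        super().__init__()
        self.part_embed = nn.Conv3d(part_ch, mid_ch // 2, 3, padding=1)
        self.feat_embed = nn.Conv3d(feat_ch, mid_ch // 2, 3, padding=1)
        self.agg = nn.Sequential(
            nn.Conv3d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),  # 14^3 -> 7^3
            nn.Conv3d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),
        )
        self.score = nn.Linear(mid_ch * 7 ** 3, 1)   # box scoring (IoU)
        self.refine = nn.Linear(mid_ch * 7 ** 3, 7)  # (x, y, z, h, w, l, theta)

    def forward(self, part_map, feat_map):
        # part_map: (B, 4, 14, 14, 14); feat_map: (B, C, 14, 14, 14).
        x = torch.cat([self.part_embed(part_map), self.feat_embed(feat_map)], dim=1)
        x = self.agg(x).flatten(1)
        return self.score(x), self.refine(x)
```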

Loss function

In the part-aware stage, the loss function has three terms:

  1. focal loss (foreground point segmentation)
  2. binary cross-entropy loss (regression of intra-object part locations)
  3. smooth-L1 loss (3D proposal generation)

In the part-aggregation stage, the loss function has two terms (a sketch of how the stage losses combine follows the list):

  1. binary cross-entropy loss (IoU-guided box score regression)
  2. smooth-L1 loss (location refinement)
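A minimal sketch of how the per-stage terms combine, assuming each term is computed elsewhere and equal weighting (the paper's actual weights may differ):

```python
# Total losses of the two stages as plain sums of their terms.
def part_aware_loss(seg_focal, part_bce, proposal_smooth_l1):
    return seg_focal + part_bce + proposal_smooth_l1

def part_aggregation_loss(iou_bce, refine_smooth_l1):
    return iou_bce + refine_smooth_l1
```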

Experiments
