1. Motivation
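At the time, most point cloud research relied on handcrafted features or 3D CNNs. This paper proposes applying deep learning directly to point clouds for 3D classification and segmentation.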
Most existing features for point cloud are handcrafted towards specific tasks
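Because point clouds are irregular data, earlier work converted them into regular 3D voxel grids or collections of images before feeding them into a neural network. This makes the data unnecessarily large and can obscure its natural invariances.
This data representation transformation, however, renders the resulting data unnecessarily voluminous, while also introducing quantization artifacts that can obscure natural invariances of the data.
Basic definitions for point clouds:
Representation: an N x D array, where N is the number of unordered points and each point is a D-dimensional vector (e.g. D = 3 for the x, y, z coordinates, or more dimensions for color (3 RGB channels), normals, etc.).
Properties of point clouds: permutation invariance and transformation invariance.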
-
permutation invariance: point cloud is a set of unordered points.
-
transformation invariance: point cloud rotations should not alter classification results.
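To make the N x D representation and permutation invariance concrete, here is a minimal NumPy sketch. The array values and the names `points` and `reordered` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy point cloud: N = 5 points, D = 3 coordinates (x, y, z).
points = rng.normal(size=(5, 3))

# Reversing the row order gives a different array but the same point set.
reordered = points[::-1]

# An order-dependent operation (flattening) changes under the reordering...
flatten_changed = not np.array_equal(points.ravel(), reordered.ravel())

# ...while a symmetric operation (per-column max) does not.
max_unchanged = np.array_equal(points.max(axis=0), reordered.max(axis=0))

print(flatten_changed, max_unchanged)
```

Any function of the point set that a network computes should behave like the per-column max here, not like the flattening.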
Point cloud tasks: Classification, Part Segmentation, Semantic Segmentation:
2. Introduction and Related Work
2.1 Introduction
-
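PointNet's tasks can be summarized as follows. For classification, it assigns one label to the entire input (analogous to 2D image classification). For part segmentation and semantic segmentation, it assigns a label to every input point (per point segment/part label).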
Our PointNet is a unified architecture that directly takes point clouds as input and outputs either class labels for the entire input or per point segment/part labels for each point of the input
-
In the basic setting, each point contains only its three coordinates (x, y, z). Additional information can be attached as follows:
In the basic setting each point is represented by just its three coordinates (x, y, z).
Additional dimensions may be added by computing normals and other local or global features
-
The key to the method is the use of a symmetric function, max pooling.
Key to our approach is the use of a single symmetric function, max pooling.
-
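Before the point cloud is fed into PointNet, the paper adds a data-dependent spatial transformer network that canonicalizes the data to further improve the results.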
2.2 Related Work
-
Point Cloud Features
Most existing features for point cloud are handcrafted towards specific tasks
-
Deep Learning on 3D Data
Volumetric CNNs: [28, 17, 18] are the pioneers applying 3D convolutional neural networks on voxelized shapes.
Because 3D convolution is computationally expensive and the data is sparse, the input is limited in resolution.
However, volumetric representation is constrained by its resolution due to data sparsity and computation cost of 3D convolution.
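The resolution constraint can be illustrated with a small voxelization sketch. It assumes points lie in the unit cube and uses a dense boolean occupancy grid; the function name `voxelize` and the sample sizes are hypothetical choices for illustration, not the pipeline of the cited works.

```python
import numpy as np

def voxelize(points, resolution):
    """Map points in the unit cube [0, 1)^3 to a dense boolean occupancy grid."""
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

rng = np.random.default_rng(0)
points = rng.random((1024, 3))  # 1024 points, uniform in the unit cube

for resolution in (16, 32, 64):
    grid = voxelize(points, resolution)
    # grid.size grows as resolution**3, but at most 1024 cells can be occupied.
    print(resolution, grid.size, int(grid.sum()))
```

The number of cells grows cubically with the resolution while the number of occupied cells is bounded by the number of points, so higher resolutions are mostly empty space.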
3. Contribution
-
The paper proposes PointNet.
We design a novel deep net architecture suitable for consuming unordered point sets in 3D.
-
It can be applied to multiple point cloud tasks.
We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks.
-
The paper analyzes and justifies the theory behind the method (symmetry).
We provide thorough empirical and theoretical analysis on the stability and efficiency of our method.
-
We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.
4. Method
4.1. Properties of Point Sets in $\mathbb{R}^n$
The input is a set of points from a Euclidean space, and it has three main properties:
-
Unordered (as in the first figure: the N point vectors can be arranged in N! different orders)
In other words, a network that consumes N 3D point sets needs to be invariant to N! permutations of the input set in data feeding order.
-
Interaction among points
-
Invariance under transformations.
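This is similar to CNNs: the learned representation of the point cloud should not be affected by transformations, so both classification and segmentation should still predict the correct categories.
For the first property, the intuitive idea is to construct a symmetric function, such as mean or max pooling, whose result does not depend on the order of the input. The authors first map each point's low-dimensional feature (dimension D) to a high-dimensional feature with an MLP. The purpose is to use redundant high-dimensional information so that enough of the point cloud's information survives the aggregation; a further network γ then extracts features from the aggregated result.
Therefore g, γ and h are combined into a single expression, $f(x_1, \dots, x_n) = \gamma(g(h(x_1), \dots, h(x_n)))$. If g is symmetric, then f is symmetric as well.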
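The composition of h, g and γ can be checked numerically. The following is a minimal NumPy sketch, assuming single-layer stand-ins for h and γ with randomly initialised weights and arbitrary layer widths (64 and 10); it is not the paper's trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights stand in for trained parameters (assumption of this sketch).
W_h = rng.normal(size=(3, 64))       # h: per-point map, 3 -> 64
W_gamma = rng.normal(size=(64, 10))  # gamma: aggregated feature -> 10 outputs

def h(points):
    """Shared per-point layer with ReLU, applied to every row independently."""
    return np.maximum(points @ W_h, 0.0)

def g(features):
    """Symmetric aggregation: element-wise max over the point axis."""
    return features.max(axis=0)

def gamma(global_feature):
    """Further processing of the aggregated feature."""
    return global_feature @ W_gamma

def f(points):
    return gamma(g(h(points)))

points = rng.normal(size=(100, 3))
shuffled = points[rng.permutation(100)]

# Because g is symmetric, f gives the same output for any ordering of the points.
same = np.allclose(f(points), f(shuffled))
print(same)
```

Replacing `g` with an order-dependent operation, for example taking the first row, would break this property.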
4.2 PointNet Architecture
Figure 2 shows the PointNet architecture. The input to the network is a set of 3D points $\{P_i \mid i = 1, \dots, n\}$, where each $P_i$ is a vector of its $(x, y, z)$ coordinates plus extra feature channels.
The PointNet network has three core modules:
-
the max pooling layer as a symmetric function to aggregate information from all the points
-
a local and global information combination structure
-
two joint alignment networks that align both input points and point features.
PointNet has two kinds of output, one for classification and one for segmentation:
-
Output for the classification task, $k$ scores:
Our proposed deep network outputs k scores for all the k candidate classes.
-
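Output for the semantic segmentation task, $n \times m$ scores:
Our model will output n × m scores for each of the n points and each of the m semantic sub-categories.
Symmetric function:
The authors use an MLP as the function h, and compose g from a single variable function and a max pooling function.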
- Local and Global Information Aggregation
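The segmentation network has to predict a class for every point. As the authors explain in their talk, local and global information are fused: the N x 64 per-point features are concatenated with the 1024-dimensional global feature to produce an n x 1088 feature, which acts as a kind of lookup in which each point queries the global feature.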
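The shape bookkeeping behind the n x 1088 feature can be sketched in NumPy. Random arrays stand in for the real activations, and n = 2048 and the variable names are assumptions for illustration.

```python
import numpy as np

n = 2048
rng = np.random.default_rng(0)

local_features = rng.normal(size=(n, 64))    # per-point features, n x 64
deep_features = rng.normal(size=(n, 1024))   # per-point features, n x 1024

# Max pooling over the point axis gives one global feature vector.
global_feature = deep_features.max(axis=0)   # shape (1024,)

# Repeat the global vector for every point and append it to the local features.
tiled = np.broadcast_to(global_feature, (n, 1024))
combined = np.concatenate([local_features, tiled], axis=1)

print(combined.shape)
```

Every row of `combined` carries the same 1024 global values next to its own 64 local values, so a per-point classifier can use both.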
- Joint Alignment Network
To make the point cloud representation invariant to transformations, the authors use a simple affine transformation matrix.
We predict an affine transformation matrix by a mini-network (T-net in Fig 2) and directly apply this transformation to the coordinates of input points.
The appendix gives the structure of the mini-PointNet used for the transformation:
It’s composed of a shared MLP(64, 128, 1024) network (with layer output sizes 64, 128, 1024) on each point, a max pooling across points and two fully connected layers with output sizes 512, 256.
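The network then outputs two transformation matrices, 3x3 and 64x64, which are applied to the N x 3 input points and the N x 64 point features respectively by plain matrix multiplication.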
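The forward pass of such a transform network can be sketched in NumPy using the layer sizes quoted above. The helper `dense`, the random weights, and the identity initialisation of the final layer are assumptions of this sketch, and the network is untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, out_dim):
    """Hypothetical helper: a randomly initialised linear layer followed by ReLU."""
    w = rng.normal(scale=0.1, size=(x.shape[-1], out_dim))
    return np.maximum(x @ w, 0.0)

def t_net(points, k=3):
    """Predict a k x k transform from an (N, k) input."""
    x = points
    for width in (64, 128, 1024):   # shared MLP applied to each point
        x = dense(x, width)
    x = x.max(axis=0)               # max pooling across points -> (1024,)
    for width in (512, 256):        # two fully connected layers
        x = dense(x, width)
    # Final linear layer: zero weights plus an identity offset, so the
    # initial prediction is the identity matrix (assumption of this sketch).
    w_out = np.zeros((256, k * k))
    return (x @ w_out).reshape(k, k) + np.eye(k)

points = rng.normal(size=(1024, 3))
transform = t_net(points)           # shape (3, 3)
aligned = points @ transform        # shape (1024, 3)

print(transform.shape, aligned.shape)
```

Calling `t_net(features, k=64)` on an N x 64 feature array gives the 64x64 feature transform in the same way.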
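During optimization the authors want the feature transformation matrix to stay close to an orthogonal matrix, so they add the regularization term $L_{reg} = \| I - A A^T \|_F^2$, where $A$ is the feature alignment matrix predicted by the mini-network.
Figure 5 shows that this regularization improves the network's performance.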
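A regularizer that pushes a matrix toward orthogonality can be written as the squared Frobenius norm of I - AA^T. The NumPy sketch below evaluates it on an orthogonal and a random matrix; the function name `orthogonality_loss` and the test matrices are illustrative assumptions.

```python
import numpy as np

def orthogonality_loss(A):
    """Squared Frobenius norm of (I - A A^T); zero exactly when A is orthogonal."""
    k = A.shape[0]
    diff = np.eye(k) - A @ A.T
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)

# The Q factor of a QR decomposition is orthogonal, so its loss is near zero.
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
loss_orthogonal = orthogonality_loss(Q)

# A random Gaussian matrix is far from orthogonal, so its loss is large.
loss_random = orthogonality_loss(rng.normal(size=(64, 64)))

print(loss_orthogonal, loss_random)
```

In training, this term would be added to the task loss with a small weight so that the 64x64 feature transform does not drift far from an orthogonal matrix.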