PU-Net: A Deep Learning Network for 3D Point Cloud Upsampling


Brief

Notes on 3D point cloud upsampling, based on Xianzhi Li’s work:

Detail

Background

Due to the sparseness and irregularity of point clouds, learning a deep network on them remains challenging. Earlier work tried to accomplish upsampling based on prior knowledge and assumptions, sometimes requiring extra input such as normal vectors. Moreover, methods that extract features directly from the point cloud often lose semantic information and end up with a point cloud whose shape differs from the original. Since semantic information can be captured by a deep network, the authors reasoned that a breakthrough in point upsampling might be achieved by using a deep network to extract features from the target point cloud.

PU-Net

Challenges in learning features from a point cloud with a deep network:

  • How to prepare enough training data
  • How to expand the number of points
  • How to design the loss function

Dataset Preparation

Since only mesh data are available, they create their training data from those meshes:

  • As upsampling can be treated as an operation on local regions (as in image super-resolution), they first split each mesh into several separate parts, regarding each as a patch
  • Then they convert each mesh surface into a dense point cloud via Poisson disk sampling; this yields the ground truth
  • Last, they produce inputs from the ground truth. Since more than one valid input corresponds to each ground truth, inputs are generated on the fly by randomly sampling from the ground-truth point sets with a fixed downsampling rate r.

Figure.1 From mesh to dense point cloud as ground truth

With 40 meshes split into 1,000 patches, a 4k-sized training dataset is obtained, each sample consisting of an input and its corresponding ground truth.
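The on-the-fly input generation described above can be sketched as follows; this is a minimal numpy illustration, and `make_training_pair` is a hypothetical helper name, not from the paper:

```python
import numpy as np

def make_training_pair(gt_patch, r=4, seed=None):
    """Given a dense ground-truth patch (rN points, e.g. from Poisson disk
    sampling of a mesh), randomly pick N = len(gt_patch) // r points as the
    sparse network input. Drawing a fresh random subset each epoch lets one
    patch yield many input / ground-truth pairs."""
    rng = np.random.default_rng(seed)
    n_input = len(gt_patch) // r
    idx = rng.choice(len(gt_patch), size=n_input, replace=False)
    return gt_patch[idx], gt_patch

# toy usage: a "patch" of 4096 random points, downsampling rate r = 4
patch = np.random.rand(4096, 3)
inp, gt = make_training_pair(patch, r=4)
print(inp.shape, gt.shape)  # (1024, 3) (4096, 3)
```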

Expansion of Points in Point Cloud

First of all, features need to be extracted from local regions of the point cloud, and the extraction must be performed for each point, since local features are what the upsampling problem requires. The network they construct is similar to PointNet++. As shown in the following figure, features are extracted at different resolutions of the point cloud, generated by exponentially downsampling the original one. In each layer, green points are generated by interpolation from the nearest red points. PointNet++ accepts only the output of the last layer; however, as mentioned above, local features are required for upsampling, so PU-Net concatenates the feature maps obtained at every layer to produce the final output of this hierarchical feature learning network.

Figure.2 PU-Net hierarchical feature learning, which is more helpful for the upsampling task
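The interpolation step (propagating features from the subsampled "red" points back to the dense "green" points, in the spirit of PointNet++) can be sketched with inverse-distance weighting over the k nearest neighbors. The function and array names here are illustrative, not the authors' code:

```python
import numpy as np

def interpolate_features(query_xyz, known_xyz, known_feat, k=3, eps=1e-8):
    """Inverse-distance-weighted interpolation: each query point receives a
    feature that is the weighted average of the features of its k nearest
    known points at the coarser level."""
    # pairwise squared distances: (Q, K)
    d2 = ((query_xyz[:, None, :] - known_xyz[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]               # k nearest indices
    w = 1.0 / (np.take_along_axis(d2, nn, axis=1) + eps)
    w /= w.sum(axis=1, keepdims=True)                # normalize the weights
    return (known_feat[nn] * w[..., None]).sum(axis=1)

# toy usage: 8 coarse points with 4-d features, interpolated to 32 queries
coarse = np.random.rand(8, 3)
feats = np.random.rand(8, 4)
dense = np.random.rand(32, 3)
out = interpolate_features(dense, coarse, feats)
print(out.shape)  # (32, 4)
```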

The expansion is carried out in feature space. That is, instead of directly expanding points in the point cloud according to the extracted features, they expand the feature map using different convolutional kernels in feature space, reshape the result, and finally regress the 3D coordinates with fully connected layers. The expansion is shown in the following picture:

Figure.3 Expansion in feature space from N×C to N×rC

The expansion operation can be represented as the following function:
$f'=\mathcal{RS}([\mathcal{C}_{1}^{2}(\mathcal{C}_{1}^{1}(f)),\dots,\mathcal{C}_{r}^{2}(\mathcal{C}_{r}^{1}(f))])$
in which:

  • $\mathcal{RS}$ is the reshape function
  • $\mathcal{C}_{i}^{1},\mathcal{C}_{i}^{2}$ are the first and second convolutions of the $i$-th branch, $i=1,\dots,r$, each with its own kernel

Note that two convolutions are performed per branch in order to break the correlation between points: points generated from the same feature map tend to gather together even when different convolutional kernels are applied, so applying a second convolution with another kernel yields a much more uniform generation.
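The expansion operation above can be sketched in numpy, emulating each 1×1 convolution as a per-point linear map; the random weights and the function name `expand_features` are illustrative assumptions, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def expand_features(f, r=4, c2=None):
    """Feature expansion sketch: r independent branches, each applying two
    point-wise (1x1) convolutions -- here plain per-point linear maps with
    random weights and a ReLU in between -- then the r branch outputs are
    concatenated and reshaped from (N, r*C2) to (r*N, C2)."""
    n, c = f.shape
    c2 = c2 or c
    branches = []
    for _ in range(r):
        w1 = rng.standard_normal((c, c2))   # first 1x1 conv of this branch
        w2 = rng.standard_normal((c2, c2))  # second conv breaks correlation
        branches.append(np.maximum(f @ w1, 0) @ w2)
    concat = np.concatenate(branches, axis=1)  # (N, r*C2)
    return concat.reshape(n * r, c2)           # (r*N, C2): r points per input point

feat = rng.standard_normal((128, 32))  # N = 128 points, C = 32 features
expanded = expand_features(feat, r=4)
print(expanded.shape)  # (512, 32)
```

The 3D coordinates would then be regressed from these r·N feature vectors by fully connected layers.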

Construction of Loss Function

Two basic requirements:

  • points generated should be uniform
  • points generated should be informative and should not cluster

Two loss functions are designed to enforce the point distribution described above.

The first one is called the reconstruction loss and uses the Earth Mover’s Distance (EMD), which is well known for measuring the least cost of transforming one distribution into another. Under this measure, generated points are encouraged to lie on the surface and outliers are penalized, gradually moving towards the surface over the iterations. The loss function can be represented as follows:
$L_{rec}=d_{EMD}(S_{p},S_{gt})=\min_{\phi:S_{p}\rightarrow S_{gt}}\sum_{x_{i}\in S_{p}}\|x_{i}-\phi(x_{i})\|_{2}$
with:

  • $S_{p}$ is the predicted point set and $S_{gt}$ is the ground-truth point set
  • $\phi:S_{p}\rightarrow S_{gt}$ indicates the bijection mapping
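For equally sized point sets, the optimal bijection $\phi$ can be computed exactly with the Hungarian algorithm; this is a small numpy/scipy sketch of the loss (practical implementations use a fast GPU approximation instead):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_loss(pred, gt):
    """Reconstruction loss: find the bijection phi: S_p -> S_gt minimizing the
    summed point-to-point distances, solved exactly here via the Hungarian
    algorithm on the pairwise distance matrix."""
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, N)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].sum()

pts = np.random.rand(64, 3)
print(emd_loss(pts, pts))           # identical sets -> 0.0
print(emd_loss(pts, pts + 0.01) > 0)  # True: any displacement is penalized
```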

The second one is called the repulsion loss, which penalizes clustered points to ensure a much more uniform distribution. The loss function can be represented as follows:
$L_{rep}=\sum_{i=0}^{\hat{N}}\sum_{i'\in K(i)}\eta(\|x_{i'}-x_{i}\|)\,w(\|x_{i'}-x_{i}\|)$
with:

  • $\hat N$ is the number of output points
  • $K(i)$ is the index set of the $k$ nearest neighbors of $x_{i}$
  • repulsion term: $\eta(r)=-r$
  • fast-decaying weight function: $w(r)=e^{-r^{2}/h^{2}}$, which decays quickly as the distance $r$ grows
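The repulsion loss can be sketched directly from the formula; `k` and the bandwidth `h` are illustrative values here, not the paper's hyperparameters. Since every term $-r\,e^{-r^2/h^2}$ is non-positive and equals 0 at $r=0$, a fully collapsed cloud scores the worst (highest) possible value, and spreading neighbors apart lowers the loss:

```python
import numpy as np

def repulsion_loss(points, k=5, h=0.03):
    """L_rep = sum_i sum_{i' in K(i)} eta(||x_i' - x_i||) * w(||x_i' - x_i||)
    with eta(r) = -r and w(r) = exp(-r^2 / h^2). The term is minimized when
    neighbors sit roughly h/sqrt(2) apart, so coincident points are pushed
    away from each other."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude each point itself
    nn = np.sort(d2, axis=1)[:, :k]         # k smallest squared distances
    r = np.sqrt(nn)
    return (-r * np.exp(-nn / h ** 2)).sum()

collapsed = np.zeros((100, 3))              # fully clustered cloud: loss is 0
g = np.linspace(0.0, 0.1, 5)
grid = np.stack(np.meshgrid(g, g, g), axis=-1).reshape(-1, 3)  # spacing 0.025
print(repulsion_loss(grid) < repulsion_loss(collapsed))  # True
```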

Contribution

Their work is the first to apply a deep network to point cloud upsampling; it captures many more features of the point cloud and presents a better solution than traditional methods that extract features directly from the point cloud.

Consideration

As Xianzhi Li explained in GAMES Webinar 120, PU-Net, although performing well in upsampled point cloud generation, lacks the capability of edge detection, resulting in rough surfaces on regular objects such as the legs of a chair. That is why they proposed a follow-up work in the same year called EC-Net, namely the Edge-aware Point set Consolidation Network, which was accepted at ECCV 2018.
