3D Point Cloud Processing (7): Deep Learning


PhD.Prince


Category: 3D Point Cloud Processing

Article link: https://blog.csdn.net/weixin_44554973/article/details/107126979


Introduction


Perceptron Optimization (Neural Network)

A perceptron computes $\hat{y}=w^Tx+b$. Training modifies the weights $w$ so that the prediction $\hat{y}$ gets closer to the ground truth $y$.

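Deep learning ultimately solves one problem: minimizing a loss over the model parameters. For a single perceptron $f(x)=w^Tx+b$ with a squared-error loss, this is a linear regression problem (the bias $b$ can be absorbed into $w$ by appending a constant 1 to each $x_i$):
$w=\arg\min_w\sum_{i=1}^M\frac{1}{2}(w^Tx_i-y_i)^2$
- Features $x_i\in\R^n$
- Ground truth $y_i\in\R$
This is a typical least-squares $Ax=b$ problem, but it can also be solved by gradient descent. Writing $x$ for the variable being optimized (here the weights $w$) and $F$ for the objective:
$x^*=\arg\min_xF(x),\quad x_{n+1}=x_n-\gamma\nabla{F(x_n)}$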
- $\gamma$ is the step size
- $\nabla{F(x_n)}$ is the gradient of $F$ at $x_n$
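The update rule above can be sketched in NumPy for the linear regression objective. This is a minimal illustration, not the course's reference code: the synthetic data, the step size `0.1`, the iteration count, and the function name `fit_linear_gd` are all assumptions. The objective is divided by $M$ (a mean instead of a sum) so that a fixed step size behaves well; the minimizer is unchanged.

```python
import numpy as np

def fit_linear_gd(X, y, step=0.1, iters=500):
    """Minimize F(w) = 1/(2M) * sum_i (w^T x_i - y_i)^2 by gradient descent."""
    M, n = X.shape
    w = np.zeros(n)
    for _ in range(iters):
        residual = X @ w - y           # prediction error, shape (M,)
        grad = X.T @ residual / M      # gradient of F at the current w
        w = w - step * grad            # w_{n+1} = w_n - gamma * grad F(w_n)
    return w

# Synthetic, noise-free data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = np.hstack([X, np.ones((200, 1))])  # constant column absorbs the bias b
w_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ w_true

w_gd = fit_linear_gd(X, y)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # closed-form least-squares solution
```

On this well-conditioned problem both `w_gd` and `w_ls` should recover `w_true`; gradient descent matters when the model is no longer linear and no closed form exists.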

Chain Rule

$f(x)=g(u),u=h(x)\rightarrow{f'(x)}=g'(u)h'(x)$

Matrix Calculus

Denominator layout:
$\frac{\partial\mathbf{y}}{\partial{x}}\in\R^{1\times{m}},\frac{\partial{y}}{\partial\mathbf{x}}\in\R^n$
Numerator layout:
$\frac{\partial\mathbf{y}}{\partial{x}}\in\R^m,\frac{\partial{y}}{\partial\mathbf{x}}\in\R^{1\times{n}}$
The matrices below are written in denominator layout, with
$\mathbf{x}\in\R^n,\mathbf{y}\in\R^m,\mathbf{X}\in\R^{n\times{m}},\mathbf{Y}\in\R^{m\times{n}}$
$\frac{\partial{y}}{\partial\mathbf{x}}= \left[\begin{array}{c} \frac{\partial{y}}{\partial{x_1}} \\ \frac{\partial{y}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{y}}{\partial{x_n}} \end{array}\right], \frac{\partial\mathbf{y}}{\partial{x}}= \left[\begin{array}{cccc} \frac{\partial{y_1}}{\partial{x}} & \frac{\partial{y_2}}{\partial{x}} & \cdots & \frac{\partial{y_m}}{\partial{x}} \end{array}\right]$
$\frac{\partial\mathbf{y}}{\partial\mathbf{x}}= \left[\begin{array}{cccc} \frac{\partial{y_1}}{\partial{x_1}} & \frac{\partial{y_2}}{\partial{x_1}} & \cdots & \frac{\partial{y_m}}{\partial{x_1}}\\ \frac{\partial{y_1}}{\partial{x_2}} & \frac{\partial{y_2}}{\partial{x_2}} & \cdots & \frac{\partial{y_m}}{\partial{x_2}}\\ \vdots & \vdots & \ddots &\vdots\\ \frac{\partial{y_1}}{\partial{x_n}} & \frac{\partial{y_2}}{\partial{x_n}} & \cdots & \frac{\partial{y_m}}{\partial{x_n}}\\ \end{array}\right]$
$\frac{\partial{y}}{\partial\mathbf{X}}= \left[\begin{array}{cccc} \frac{\partial{y}}{\partial{x_{11}}} & \frac{\partial{y}}{\partial{x_{12}}} & \cdots & \frac{\partial{y}}{\partial{x_{1m}}}\\ \frac{\partial{y}}{\partial{x_{21}}} & \frac{\partial{y}}{\partial{x_{22}}} & \cdots & \frac{\partial{y}}{\partial{x_{2m}}}\\ \vdots & \vdots & \ddots &\vdots\\ \frac{\partial{y}}{\partial{x_{n1}}} & \frac{\partial{y}}{\partial{x_{n2}}} & \cdots & \frac{\partial{y}}{\partial{x_{nm}}}\\ \end{array}\right], \frac{\partial\mathbf{Y}}{\partial{x}}= \left[\begin{array}{cccc} \frac{\partial{y_{11}}}{\partial{x}} &\frac{\partial{y_{21}}}{\partial{x}} & \cdots & \frac{\partial{y_{m1}}}{\partial{x}}\\ \frac{\partial{y_{12}}}{\partial{x}} &\frac{\partial{y_{22}}}{\partial{x}} & \cdots & \frac{\partial{y_{m2}}}{\partial{x}}\\ \vdots & \vdots & \ddots &\vdots\\ \frac{\partial{y_{1n}}}{\partial{x}} &\frac{\partial{y_{2n}}}{\partial{x}} & \cdots & \frac{\partial{y_{mn}}}{\partial{x}}\\ \end{array}\right]$

Loss

Regression


L1 Loss:
$l(\hat{y},y)=|\hat{y}-y|$
L2 Loss:
$l(\hat{y},y)=(\hat{y}-y)^2$

Classification

Cross Entropy Loss (Negative Log Softmax)
$H(p,q)=-\sum_{y_i}p(y_i)\log{q(y_i)},\quad\sum_{y_i}p(y_i)=1,\quad\sum_{y_i}q(y_i)=1$
When the ground truth is one-hot, i.e. $p(y_{i^*})=1$ for the correct category $i^*$, and $q$ is the softmax of the network scores $s_j$, this reduces to
$L=H(p,q)=-\log\frac{e^{s_{i^*}}}{\sum_je^{s_j}}$
- $p(y_i)$ is the ground-truth probability of category $i$
- $q(y_i)$ is the predicted probability of category $i$
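As a small worked example of the negative log softmax, the sketch below evaluates the loss for one sample. The function name `softmax_cross_entropy` and the example scores are assumptions made for illustration; subtracting the maximum score is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax_cross_entropy(scores, target):
    """Return -log softmax(scores)[target] for a single sample."""
    shifted = scores - np.max(scores)                      # avoids overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))  # log of the softmax
    return -log_probs[target]

scores = np.array([2.0, 1.0, 0.1])       # hypothetical class scores s_j
loss = softmax_cross_entropy(scores, 0)  # category 0 is the ground truth i*

# With identical scores the prediction is uniform, so the loss is log(#classes).
uniform_loss = softmax_cross_entropy(np.zeros(3), 1)
```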

Activation Function (Non-linear)

Rectified Linear Unit (ReLU):
$f(x)=\left\{ \begin{array}{ll} 0, & \text{for }x\le0\\ x, & \text{for }x>0 \end{array} \right.$
Exponential Linear Unit (ELU):
$f(x)=\left\{ \begin{array}{ll} \alpha(e^x-1), & \text{for }x\le0\\ x, & \text{for }x>0 \end{array} \right.$
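Both activations are one-liners in NumPy. The sketch below is only an illustration; the default `alpha=1.0` is an assumption, since the notes do not fix a value.

```python
import numpy as np

def relu(x):
    """0 for x <= 0, x for x > 0."""
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    """alpha * (exp(x) - 1) for x <= 0, x for x > 0."""
    x = np.asarray(x, dtype=float)
    # Clamp the exponent input so the unused branch cannot overflow for large x.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

x = np.array([-1.0, 0.0, 2.0])
relu_out = relu(x)
elu_out = elu(x)
```

Unlike ReLU, ELU stays smooth and slightly negative for negative inputs, saturating at $-\alpha$.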

Multi-Layer Perceptron (MLP)

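An MLP with a non-linear activation and at least one hidden layer can approximate ANY continuous function $f(x)$ of the input $x$ to arbitrary accuracy, given enough hidden units.
- Gradients are computed by back propagation (the chain rule) and used for gradient descent
- The step is taken by an optimizer such as SGD or Adam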

CNN

Features can be extracted in a local neighborhood.
CNN vs. MLP:
1. Sparse connections vs. dense connections
2. Weight sharing vs. unique weights
3. Locally invariant vs. location dependent:
  - Features should not depend on the location within the image
  - Make the same prediction no matter where the object is in the image
Output length $o$
$o=\left\lfloor\frac{n+p-k}{s}\right\rfloor+1$
- Kernel size $k$: the kernel entries are the unknown/trainable parameters
- Input length $n$
- Padding $p$: total number of padded elements (both ends combined)
- Stride $s$: less compute and a faster-growing receptive field
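The output-length formula can be checked against a naive 1D convolution. This is a sketch under the assumption, implied by the formula, that $p$ counts the total padding on both ends; the helper names `conv_output_length` and `conv1d` are hypothetical.

```python
import numpy as np

def conv_output_length(n, k, p=0, s=1):
    """o = floor((n + p - k) / s) + 1, with p the total padding."""
    return (n + p - k) // s + 1

def conv1d(x, w, p=0, s=1):
    """Naive 1D cross-correlation; the total zero padding p is split across both ends."""
    left = p // 2
    x = np.pad(x, (left, p - left))
    k = len(w)
    o = (len(x) - k) // s + 1
    return np.array([np.dot(x[i * s:i * s + k], w) for i in range(o)])

x = np.arange(10.0)
w = np.ones(3)
same = conv1d(x, w, p=2, s=1)     # padding k - 1 keeps the length at 10
strided = conv1d(x, w, p=0, s=2)  # stride 2 roughly halves the length
```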

2D-CNN

Single input channel, single kernel, with strides $s_h,s_w$:
$Y_{i,j}=\sum^{k_h-1}_{a=0}\sum^{k_w-1}_{b=0}X_{i\cdot{s_h}+a,\,j\cdot{s_w}+b}\times{W_{a,b}}$

Multiple features $\rightarrow$ multiple kernels ($c_o$ output channels) and multiple input channels ($c_i$), shown here with stride 1:
$Y_{l,i,j}=\sum^{c_i-1}_{d=0}\sum^{k_h-1}_{a=0}\sum^{k_w-1}_{b=0}X_{d,i+a,j+b}\times{W_{l,d,a,b}}+b_l,\quad l\in[0,c_o)$
Parameter size $c_o\times{c_i}\times{k_h}\times{k_w}$ weights plus $c_o$ biases
Output shape $(c_o,o_h,o_w)=(c_o,\left\lfloor\frac{n_h+p_h-k_h}{s_h}\right\rfloor+1,\left\lfloor\frac{n_w+p_w-k_w}{s_w}\right\rfloor+1)$
Computation cost $O((c_o\times{c_i}\times{k_h}\times{k_w})\times(o_h\times{o_w}))$
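The multi-channel formula translates directly into loops. The sketch below is a deliberately naive NumPy version (stride 1, no padding) meant to make the index ranges and the parameter count concrete; the function name `conv2d` and the test tensor sizes are assumptions.

```python
import numpy as np

def conv2d(X, W, b):
    """X: (c_i, n_h, n_w), W: (c_o, c_i, k_h, k_w), b: (c_o,). Stride 1, no padding."""
    c_o, c_i, k_h, k_w = W.shape
    _, n_h, n_w = X.shape
    o_h, o_w = n_h - k_h + 1, n_w - k_w + 1
    Y = np.zeros((c_o, o_h, o_w))
    for l in range(c_o):              # one output channel per kernel
        for i in range(o_h):
            for j in range(o_w):
                # Sum over input channels d and kernel offsets (a, b).
                Y[l, i, j] = np.sum(X[:, i:i + k_h, j:j + k_w] * W[l]) + b[l]
    return Y

X = np.ones((3, 5, 5))        # c_i = 3 input channels
W = np.ones((2, 3, 3, 3))     # c_o = 2 kernels of size 3 x 3 over 3 channels
b = np.array([0.0, 1.0])
Y = conv2d(X, W, b)
num_weights = W.size          # c_o * c_i * k_h * k_w
```

Each output element costs $c_i\times{k_h}\times{k_w}$ multiply-adds, which is where the overall cost estimate comes from.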

3D-CNN


Natural extension of 2D Convolution
Input: Each small cube contains $d$ features/channels
Kernel: Each small cube contains $d$ weights
Output: Each small cube is a scalar

Pooling (Matrix $\rightarrow$ Vector)

Aggregate information in each receptive field
- Max
- Average
No trainable parameters
Uses the same padding/stride scheme as convolution

Deep Learning for Point Cloud


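Three ways to apply deep learning to point clouds:
- Voxelize the point cloud and run 3D convolution
- Project onto images from multiple views and run 2D convolution
- Run 1D/2D convolution or even an MLP directly on the list of points (the result then depends on the point order)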

VoxNet (3D convolution)¹


Content of each grid cell:
- Binary
- Number of points
- Probability
- etc. (TSDF or TDF)
Accuracy on ModelNet40: 83%
Layer notation Conv(o, k, s):
- o: number of kernels
- k: size of kernel (same for x/y/z)
- s: stride (same for x/y/z)
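The first two cell contents (binary occupancy and point count) can be produced with a few lines of NumPy. This is a hedged sketch of generic voxelization, not VoxNet's own preprocessing: the function name `voxelize`, the cubic bounding box, and `grid_size=32` are assumptions for illustration.

```python
import numpy as np

def voxelize(points, grid_size=32):
    """Map an (N, 3) point cloud to a binary grid and a point-count grid."""
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()   # cubic bounding box edge
    extent = extent if extent > 0 else 1.0       # guard against a single point
    idx = np.floor((points - mins) / extent * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)         # points on the max face fall in the last cell
    counts = np.zeros((grid_size,) * 3, dtype=int)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)  # accumulate duplicates
    return counts > 0, counts

rng = np.random.default_rng(0)
cloud = rng.uniform(size=(1000, 3))   # synthetic point cloud
binary, counts = voxelize(cloud)
```

The resulting grid has one channel; it would be fed to the 3D convolution layers in the Conv(o, k, s) notation above.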

MVCNN (Multi-view)²

- ModelNet40 Accuracy 90.1%

Point CNN (MLP)


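A single neuron with activation function $f$ applied to an ordered input $x=(x_1,x_2,x_3)$:
$h_{W,b}(x)=f(W^Tx+b)=f(\sum^3_{i=1}w_ix_i+b)=f(w_1x_1+w_2x_2+w_3x_3+b)$
Not permutation invariant: reordering the inputs generally changes the output
$f(w_1x_3+w_2x_1+w_3x_2+b)\ne{h_{W,b}(x)}$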

PointNet³


PointNet (PN) = shared MLP + max pooling
T-Net is itself a small PointNet
Shared MLP: process each point (feature) independently, $n\times{C_1}\rightarrow{n}\times{C_2}$
Pooling: use max/average to pool the features over all points, $n\times{C}\rightarrow{1}\times{C}$
Universal approximation: PointNet can approximate any continuous set function on the point cloud
$|f(S)-\gamma(MAX(h(x_1),\cdots,h(x_n)))|<\epsilon$
- $\forall\epsilon>0,\exists{h}:\R^m\rightarrow\R^{m'}$ and $\gamma:\R^{m'}\rightarrow\R$ such that the bound above holds; $h\rightarrow$ shared MLP, $\gamma\rightarrow$ MLP for the global feature
- Input points $S=\{x_1,\cdots,x_n\},x_i\in[0,1]^m$. Denote the space of $S$ as $\chi$, i.e., $S\in\chi$
- A continuous function $f:\chi\rightarrow\R$
- MAX function: takes $n$ vectors and gives their element-wise maximum
- $h(\cdot)$ maps $x_i$ to a deterministic position in a huge vector, e.g. by voxel grid quantization
- MAX $(h(x_1),\cdots,h(x_n))$ then simply builds a voxel grid representation; there will be many 0 elements because of empty cells in the voxel grid
- $\gamma(\cdot)$ reconstructs the points from that representation and then applies $f(\cdot)$
Critical point set and upper-bound shape: the critical point set is the subset of points that determine the max-pooled global feature; the upper-bound shape is the largest point set that yields the same global feature
Limitations of PointNet
- A CNN has multiple layers with increasing receptive fields
- PointNet has a single receptive field: all points
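The "shared MLP + max pooling" recipe and its permutation invariance can be demonstrated with random weights. This is a toy sketch, not the PointNet architecture: the layer widths (3 → 16 → 32), the random untrained weights, and the names `shared_mlp` and `pointnet_global_feature` are assumptions. A dense layer on the flattened point list is included for contrast.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, W1, b1, W2, b2):
    """Apply the same two-layer MLP to every point: (n, 3) -> (n, 32)."""
    hidden = np.maximum(points @ W1 + b1, 0.0)
    return np.maximum(hidden @ W2 + b2, 0.0)

def pointnet_global_feature(points, params):
    """Max pool the per-point features over the point axis: (n, C) -> (C,)."""
    return shared_mlp(points, *params).max(axis=0)

params = (rng.normal(size=(3, 16)), np.zeros(16),
          rng.normal(size=(16, 32)), np.zeros(32))
points = rng.uniform(size=(100, 3))
perm = rng.permutation(100)

feat = pointnet_global_feature(points, params)
feat_perm = pointnet_global_feature(points[perm], params)  # same points, new order

# Contrast: one dense neuron on the flattened list depends on the order.
w_dense = rng.normal(size=300)
dense = points.reshape(-1) @ w_dense
dense_perm = points[perm].reshape(-1) @ w_dense
```

`feat` and `feat_perm` agree because the max over the point axis ignores row order, while `dense` and `dense_perm` differ.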

Voxel Feature Encoding (VFE)


PointNet++


In each set abstraction:
- Sampling: farthest point sampling (FPS)
  - Point #: $N_{i-1}\rightarrow{N_i}$
- Grouping:
  - Radius neighbors (+ random sampling to a fixed group size)
  - K nearest neighbors
- PointNet
  - Point #: $N_i$
  - Channel #: $C_{i-1}\rightarrow{C_i}$
  - Concatenate with coordinates, so $d+C_{i-1}\rightarrow{C_i}$
  - Normalize the point coordinates within each group
  - Center each group on its sampled node
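The sampling and grouping steps can be sketched as follows. This is an unoptimized NumPy illustration, not the PointNet++ implementation: the function names, the radius `0.3`, the group size `k=8`, and sampling with replacement when a ball holds fewer than `k` points are all assumptions.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Pick m indices so that each new point is farthest from those already chosen."""
    n = points.shape[0]
    selected = np.zeros(m, dtype=int)
    selected[0] = start
    dist = np.full(n, np.inf)   # distance from each point to the selected set
    for i in range(1, m):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        selected[i] = int(np.argmax(dist))
    return selected

def ball_group(points, centers, radius, k, rng):
    """Radius neighbors + random sampling to k points, centered on each node."""
    groups = []
    for c in centers:
        idx = np.flatnonzero(np.linalg.norm(points - c, axis=1) <= radius)
        # The node itself is always inside its ball, so idx is never empty.
        idx = rng.choice(idx, size=k, replace=len(idx) < k)
        groups.append(points[idx] - c)   # coordinates relative to the node
    return np.stack(groups)

rng = np.random.default_rng(0)
points = rng.uniform(size=(256, 3))
node_idx = farthest_point_sampling(points, 16)             # N_{i-1}=256 -> N_i=16
groups = ball_group(points, points[node_idx], 0.3, 8, rng)  # (16, 8, 3)
```

Each of the 16 groups would then be passed through a PointNet to produce one $C_i$-dimensional feature per node.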
Multi-scale grouping (MSG)
- Multiple grouping + PointNet branches
  - $r = 0.1$ grouping + PN
  - $r = 0.2$ grouping + PN
  - $r = 0.4$ grouping + PN
  - This is compute intensive
- Concatenate the multi-scale feature vectors
Multi-resolution grouping (MRG)
- Combine features from the previous level with features from the level before it (the previous-previous level)
- Still increases compute
Interpolation
- Upsample the features from the previous layer
- $x\in\R^3$: point coordinates at the upsampled level ($N_1$ points)
- $f\in\R^{C_2}$: interpolated features
- $x_i\in\R^3$: point coordinates at the previous level ($N_2$ points)
- $w_i\in\R$: inverse-distance weight, the reciprocal of $d(x,x_i)^p$
- $f_i\in\R^{C_2}$: point features at the previous level
  $f^{(j)}(x)=\frac{\Sigma^k_{i=1}w_i(x)f^{(j)}_i}{\Sigma^k_{i=1}w_i(x)}\quad{where}\quad{w_i(x)}=\frac{1}{d(x,x_i)^p},\quad j=1,\cdots,C_2$
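The interpolation formula is an inverse-distance weighted average over the $k$ nearest points of the previous level. Below is a brute-force NumPy sketch; the defaults `k=3` and `p=2`, the small `eps` that avoids division by zero, and the function name `interpolate_features` are assumptions for illustration.

```python
import numpy as np

def interpolate_features(x_up, x_prev, f_prev, k=3, p=2, eps=1e-8):
    """x_up: (N1, 3), x_prev: (N2, 3), f_prev: (N2, C2) -> (N1, C2)."""
    d = np.linalg.norm(x_up[:, None, :] - x_prev[None, :, :], axis=2)  # (N1, N2)
    nn = np.argsort(d, axis=1)[:, :k]             # k nearest previous-level points
    d_nn = np.take_along_axis(d, nn, axis=1)      # their distances, (N1, k)
    w = 1.0 / (d_nn ** p + eps)                   # w_i = 1 / d(x, x_i)^p
    w = w / w.sum(axis=1, keepdims=True)          # normalize by the sum of weights
    return np.einsum('nk,nkc->nc', w, f_prev[nn])

x_prev = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
f_prev = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0]])

# A query that coincides with a previous-level point recovers its feature.
at_node = interpolate_features(x_prev[:1], x_prev, f_prev)
# A query elsewhere gets a blend of its three nearest neighbors.
blended = interpolate_features(np.array([[0.4, 0.4, 0.0]]), x_prev, f_prev)
```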
Data Augmentation
- Normalization: zero-mean and centered on the node
- Input point dropout (DP)
- Gaussian noise
- Rotation (less overfitting, but worse performance)
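A possible augmentation pipeline covering the four items is sketched below. None of the specifics come from the notes: the rotation axis (z), the noise level `sigma=0.01`, the dropout ratio `0.2`, and the function name `augment` are assumptions chosen for illustration.

```python
import numpy as np

def augment(points, rng, dropout_ratio=0.2, sigma=0.01):
    """Return a randomly augmented copy of an (N, 3) point cloud."""
    pts = points - points.mean(axis=0)              # zero-mean normalization
    theta = rng.uniform(0.0, 2.0 * np.pi)           # random rotation about the z axis
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pts = pts @ R.T
    pts = pts + rng.normal(scale=sigma, size=pts.shape)  # Gaussian noise
    keep = rng.uniform(size=len(pts)) >= dropout_ratio   # input point dropout
    keep[0] = True                                       # never return an empty cloud
    return pts[keep]

rng = np.random.default_rng(0)
cloud = rng.uniform(size=(100, 3))
augmented = augment(cloud, rng)
```

Whether to include rotation is a trade-off noted above: it reduces overfitting but can lower accuracy on datasets whose objects are consistently oriented.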

1. D. Maturana et al., "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition."
2. Hang Su et al., "Multi-view Convolutional Neural Networks for 3D Shape Recognition."
3. C. R. Qi et al., "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation," CVPR 2017.
