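Paper title: PU-GAN: a Point Cloud Upsampling Adversarial Network
Tags: supervised | point cloud upsampling
Let's start by unpacking the title: PU-GAN: a Point Cloud Upsampling Adversarial Network
PU stands for Point Upsampling: the task addressed here is point cloud upsampling. I introduced point cloud upsampling in my earlier article on PU-Net; see:
Liu Xinchen (刘昕宸), "Reading papers slowly: PU-Net, the groundbreaking point cloud upsampling network" (细嚼慢咽读论文:点云上采样网络开天辟地PU-Net), zhuanlan.zhihu.com
GAN is the well-known generative adversarial network: the paper relies on a GAN to perform point cloud upsampling. Upsampling is itself a generative task, so trying a GAN is a natural idea. For the basic principles of GANs, see:
Liu Xinchen (刘昕宸), "GAN in plain terms (1): explaining GAN clearly" (通俗理解GAN(一):把GAN给你讲得明明白白), zhuanlan.zhihu.com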
1 motivation
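I covered why upsampling matters in detail in the PU-Net article:
Point cloud processing is very challenging, largely because point cloud data is sparse and irregular.
The upsampling task studied here targets exactly this sparsity, providing higher-quality data for downstream feature learning tasks.
Put simply, point cloud upsampling takes a point cloud as input and generates a denser point cloud that preserves the underlying shape.
As far as upsampling quality goes, earlier deep-learning methods such as PU-Net and MPU achieve only limited results on point clouds scanned from real scenes. See the figure at the start of the PU-GAN paper (tested on the KITTI dataset):
Point cloud upsampling is essentially a generative task, and for generative tasks in vision the natural thought is: why not try a GAN?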
2 contribution
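- Proposes a GAN-based solution for point cloud upsampling and achieves very good results. (The paper names the main obstacle to getting a GAN to work here: "the difficulty to balance between the generator and discriminator and to avoid the tendency of poor convergence.")
- Novel local network designs: for example, the up-down-up unit for expanding point features, and the self-attention unit for improving feature integration quality.
- A compound loss, most notably the uniform loss, which constrains the upsampled points to be uniformly distributed.
- Experiments not only on ordinary point cloud models but also on real scanned scenes such as KITTI, where PU-GAN still performs very well. This further supports its generalization ability.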
3 solution
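The goal is upsampling: given a sparse point set $P = \{p_i\}_{i=1}^{N}$ of N points, we want to generate a dense point set $Q = \{q_i\}_{i=1}^{rN}$ of rN points.
Q need not be a superset of P, but it must satisfy two conditions:
- Q should describe the same underlying geometry of the latent target object as P.
- The points in Q should be uniformly distributed on the target object surface, even when the input P is non-uniform.
The PU-GAN network architecture is shown below:
Because it is a GAN, the network is split into a Generator and a Discriminator.
The Generator produces the dense point cloud Q from the sparse point cloud P.
The Discriminator distinguishes real dense point clouds from those produced by the generator.
3.1 Generator
As you can see, the generator's overall framework still follows PU-Net: patch --> feature extraction --> feature expansion --> coordinate reconstruction ;-)
The full Generator code: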
class Generator(object):
    def __init__(self, opts,is_training, name="Generator"):
        self.opts = opts
        self.is_training = is_training
        self.name = name
        self.reuse = False
        self.num_point = self.opts.patch_num_point
        self.up_ratio = self.opts.up_ratio
        self.up_ratio_real = self.up_ratio + self.opts.more_up
        self.out_num_point = int(self.num_point*self.up_ratio)

    def __call__(self, inputs):
        with tf.variable_scope(self.name, reuse=self.reuse):
            features = ops.feature_extraction(inputs, scope='feature_extraction', is_training=self.is_training, bn_decay=None)
            H = ops.up_projection_unit(features, self.up_ratio_real, scope="up_projection_unit", is_training=self.is_training, bn_decay=None)
            coord = ops.conv2d(H, 64, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=self.is_training,
                               scope='fc_layer1', bn_decay=None)
            coord = ops.conv2d(coord, 3, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=self.is_training,
                               scope='fc_layer2', bn_decay=None,
                               activation_fn=None, weight_decay=0.0)
            outputs = tf.squeeze(coord, [2])
            outputs = gather_point(outputs, farthest_point_sample(self.out_num_point, outputs))
        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.name)
        return outputs
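A note on the last step of `__call__`: the generator expands by `up_ratio_real = up_ratio + more_up` and then trims the result back to `out_num_point` points with farthest point sampling. In the repository `farthest_point_sample` is a compiled TensorFlow op, so the function below is only a hypothetical pure-Python stand-in (`fps_indices` is my own name, with its own signature) to show what the greedy selection does:

```python
import math

def fps_indices(points, m, start=0):
    """Greedy farthest point sampling: return indices of m points (m <= len(points))."""
    n = len(points)
    # min_dist[i] is the distance from point i to its nearest already-selected point
    min_dist = [math.inf] * n
    selected = []
    current = start
    for _ in range(m):
        selected.append(current)
        for i, p in enumerate(points):
            d = math.dist(p, points[current])
            if d < min_dist[i]:
                min_dist[i] = d
        # the next pick is the point farthest from everything selected so far
        current = max(range(n), key=min_dist.__getitem__)
    return selected

pts = [(0, 0, 0), (0.1, 0, 0), (1, 0, 0), (0.9, 0, 0), (0.5, 0, 0)]
print(fps_indices(pts, 3))
```

Because each pick maximizes the distance to the already chosen points, trimming a slightly over-generated set this way tends to drop points that crowd their neighbors.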
Ⅰ Patch Extraction
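For each 3D mesh, randomly pick 200 seed points on the surface, grow a patch around each seed point by geodesic distance, and normalize each patch into a unit sphere.
For each patch, use Poisson disk sampling to generate $\hat{Q}$, the target point set of rN points.
During training, N points are randomly sampled from $\hat{Q}$ on the fly to form the input point cloud P.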
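The two data-preparation steps (normalize a patch into the unit sphere, then randomly draw the N input points from the rN target points) can be sketched as follows. This is a minimal illustration, not the repository's data pipeline: the helper names are hypothetical, and centering on the centroid before scaling is one common convention that I am assuming here.

```python
import math
import random

def normalize_to_unit_sphere(patch):
    """Center a patch on its centroid and scale it so the farthest point has norm 1."""
    n = len(patch)
    centroid = [sum(p[a] for p in patch) / n for a in range(3)]
    centered = [[p[a] - centroid[a] for a in range(3)] for p in patch]
    scale = max(math.sqrt(sum(c * c for c in q)) for q in centered)
    return [[c / scale for c in q] for q in centered]

def sample_input(target, n_in, rng):
    """Draw n_in distinct points from the target patch without replacement."""
    return rng.sample(target, n_in)

rng = random.Random(0)
raw_patch = [[rng.uniform(-3.0, 5.0) for _ in range(3)] for _ in range(1024)]
target = normalize_to_unit_sphere(raw_patch)   # plays the role of the rN-point target
sparse = sample_input(target, 256, rng)        # plays the role of the N-point input P
```

Drawing a fresh random subset each time a patch is used means the network sees a different sparse input for the same target, which acts as a form of data augmentation.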
Ⅱ Feature Extraction
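This module extracts point-wise features:
Input: point cloud N*d (d is the dimensionality of the raw per-point data, such as coordinates, color, and normals; usually d = 3). Output: point-wise features N*C.
The module directly borrows the feature extraction method of the paper Patch-based progressive 3D point set upsampling (MPU), using dense connections to integrate features from different layers.
The network structure is shown below; the processing flow is quite clear:
Let's look at the code to deepen our understanding: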
def feature_extraction(inputs, scope='feature_extraction2', is_training=True, bn_decay=None):
    with tf.variable_scope(scope,reuse=tf.AUTO_REUSE):
        use_bn = False
        use_ibn = False
        growth_rate = 24
        dense_n = 3
        knn = 16
        comp = growth_rate*2
        l0_features = tf.expand_dims(inputs, axis=2)
        l0_features = conv2d(l0_features, 24, [1, 1],
                             padding='VALID', scope='layer0', is_training=is_training, bn=use_bn, ibn=use_ibn,
                             bn_decay=bn_decay, activation_fn=None)
        l0_features = tf.squeeze(l0_features, axis=2)
        # encoding layer
        l1_features, l1_idx = dense_conv(l0_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer1", is_training=is_training, bn=use_bn, ibn=use_ibn,
                                         bn_decay=bn_decay)
        l1_features = tf.concat([l1_features, l0_features], axis=-1)  # (24+24*3)+24=120
        l2_features = conv1d(l1_features, comp, 1,  # 48
                             padding='VALID', scope='layer2_prep', is_training=is_training, bn=use_bn, ibn=use_ibn,
                             bn_decay=bn_decay)
        l2_features, l2_idx = dense_conv(l2_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer2", is_training=is_training, bn=use_bn, bn_decay=bn_decay)
        l2_features = tf.concat([l2_features, l1_features], axis=-1)  # 120+(48+24*3)=240
        l3_features = conv1d(l2_features, comp, 1,  # 48
                             padding='VALID', scope='layer3_prep', is_training=is_training, bn=use_bn, ibn=use_ibn,
                             bn_decay=bn_decay)  # 48
        l3_features, l3_idx = dense_conv(l3_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer3", is_training=is_training, bn=use_bn, bn_decay=bn_decay)
        l3_features = tf.concat([l3_features, l2_features], axis=-1)  # 240+(48+24*3)=360
        l4_features = conv1d(l3_features, comp, 1,  # 48
                             padding='VALID', scope='layer4_prep', is_training=is_training, bn=use_bn, ibn=use_ibn,
                             bn_decay=bn_decay)  # 48
        l4_features, l3_idx = dense_conv(l4_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                         scope="layer4", is_training=is_training, bn=use_bn, bn_decay=bn_decay)
        l4_features = tf.concat([l4_features, l3_features], axis=-1)  # 360+(48+24*3)=480
        l4_features = tf.expand_dims(l4_features, axis=2)
    return l4_features
The core dense_conv implementation:
def dense_conv(feature, n=3,growth_rate=64, k=16, scope='dense_conv',**kwargs):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        y, idx = get_edge_feature(feature, k=k, idx=None)  # [B N K 2*C]
        for i in range(n):
            if i == 0:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, **kwargs),
                    tf.tile(tf.expand_dims(feature, axis=2), [1, 1, k, 1])], axis=-1)
            elif i == n-1:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, activation_fn=None, **kwargs),
                    y], axis=-1)
            else:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, **kwargs),
                    y], axis=-1)
        y = tf.reduce_max(y, axis=-2)
        return y, idx
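To keep track of how wide the features get, here is a small channel-counting sketch. It is my own reading of the two functions above, not code from the repository: each of the `n` convolutions in `dense_conv` contributes `growth_rate` channels on top of the `C`-channel input that is concatenated at layer 0, and `feature_extraction` concatenates every block's output with the running skip feature.

```python
def dense_conv_channels(c_in, growth_rate=24, n=3):
    """Output width of one dense_conv block: input channels plus n * growth_rate."""
    return c_in + n * growth_rate

def feature_extraction_channels(growth_rate=24, n=3, c0=24, blocks=4):
    """Width of l1..l4 after each concat with the previous skip feature."""
    comp = growth_rate * 2       # width after each 1x1 "prep" convolution
    widths = []
    skip = c0                    # l0_features
    block_in = c0                # the first block consumes l0 directly
    for _ in range(blocks):
        skip = dense_conv_channels(block_in, growth_rate, n) + skip
        widths.append(skip)
        block_in = comp          # later blocks consume the compressed feature
    return widths

print(feature_extraction_channels())                # growth_rate=24, as in the code above
print(feature_extraction_channels(growth_rate=12))  # a smaller growth rate, for comparison
```

With `growth_rate = 24` the final point-wise feature is 480 channels wide; with `growth_rate = 12` the same arithmetic gives 84, 144, 204, 264.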
Ⅲ Feature Expansion
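Like PU-Net, PU-GAN designs its own feature expansion module, which is arguably the core of an upsampling algorithm.
PU-Net simply duplicates the point features and then processes each copy independently with a separate MLP.
Even with constraints such as the repulsion loss, this way of upsampling still tends to produce expanded point features that are too close to each other, which hurts upsampling quality.
Input: point-wise features N*C. Output: expanded point features rN*C'.
PU-GAN further designs an up-down-up expansion unit to strengthen feature expansion, "enabling the generator to produce more diverse point distributions."
The network structure is shown below, including the structures of the Up-feature operator and the Down-feature operator:
The code makes the flow clear. Using the variable names from the code below: the point features L are fed to the Up-feature operator to produce H0, which is then fed to the Down-feature operator to downsample it back to L0.
Compute the difference between the downsampled point features and the original input, E0 = L0 - L.
Feed E0 to the Up-feature operator to produce H1, then use H1 as an offset to H0, giving H2 = H0 + H1.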
def up_projection_unit(inputs,up_ratio,scope="up_projection_unit",is_training=True,bn_decay=None):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        L = conv2d(inputs, 128, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv0', bn_decay=bn_decay)
        H0 = up_block(L,up_ratio,is_training=is_training,bn_decay=bn_decay,scope='up_0')
        L0 = down_block(H0,up_ratio,is_training=is_training,bn_decay=bn_decay,scope='down_0')
        E0 = L0-L
        H1 = up_block(E0,up_ratio,is_training=is_training,bn_decay=bn_decay,scope='up_1')
        H2 = H0+H1
    return H2
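The data flow of `up_projection_unit` can be isolated from the learned layers. The sketch below is a toy under stated assumptions: `up_down_up` and `make_toy_ops` are hypothetical names, features are plain lists, a single `up` callable stands in for both `up_0` and `up_1` (which have separate weights in the real network), and the toy operators are plain tiling and averaging instead of the attention and convolution layers.

```python
def up_down_up(L, up, down):
    """Mirror the up-down-up flow: H2 = up(L) + up(down(up(L)) - L)."""
    H0 = up(L)                                                     # N -> rN features
    L0 = down(H0)                                                  # rN -> N features
    E0 = [[a - b for a, b in zip(x, y)] for x, y in zip(L0, L)]    # residual L0 - L
    H1 = up(E0)                                                    # expand the residual
    return [[a + b for a, b in zip(x, y)] for x, y in zip(H0, H1)]

def make_toy_ops(r):
    """Toy operators: up tiles the whole set r times, down averages the r copies."""
    def up(feats):
        return [list(f) for _ in range(r) for f in feats]
    def down(feats):
        n = len(feats) // r
        return [[sum(feats[c * n + i][k] for c in range(r)) / r
                 for k in range(len(feats[0]))] for i in range(n)]
    return up, down

up, down = make_toy_ops(4)
L = [[1.0, 2.0], [3.0, 4.0]]
H2 = up_down_up(L, up, down)
```

With these lossless toy operators the residual E0 is zero and H2 equals H0. In the trained network the down operator does not reproduce L exactly, and the expanded residual H1 is what corrects H0.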
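Up-feature operator:
Unlike PU-Net's plain duplication, PU-GAN uses a grid structure when duplicating point features (see FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation), which is equivalent to adding new points near the input points.
A self-attention mechanism then integrates the duplicated point features.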
def up_block(inputs, up_ratio, scope='up_block', is_training=True, bn_decay=None):
    with tf.variable_scope(scope,reuse=tf.AUTO_REUSE):
        net = inputs
        dim = inputs.get_shape()[-1]
        out_dim = dim*up_ratio
        grid = gen_grid(up_ratio)
        grid = tf.tile(tf.expand_dims(grid, 0), [tf.shape(net)[0], 1,tf.shape(net)[1]])  # [batch_size, num_point*4, 2])
        grid = tf.reshape(grid, [tf.shape(net)[0], -1, 1, 2])
        #grid = tf.expand_dims(grid, axis=2)
        net = tf.tile(net, [1, up_ratio, 1, 1])
        net = tf.concat([net, grid], axis=-1)
        net = attention_unit(net, is_training=is_training)
        net = conv2d(net, 256, [1, 1],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv1', bn_decay=bn_decay)
        net = conv2d(net, 128, [1, 1],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv2', bn_decay=bn_decay)
    return net
1) grid mechanism
A unique 2D vector is generated for each feature-map copy and concatenated to every point feature in that copy.
Because of this 2D vector, the duplicated point features differ slightly from one another.
def gen_grid(up_ratio):
    import math
    """
    output [num_grid_point, 2]
    """
    sqrted = int(math.sqrt(up_ratio))+1
    for i in range(1,sqrted+1).__reversed__():
        if (up_ratio%i) == 0:
            num_x = i
            num_y = up_ratio//i
            break
    grid_x = tf.lin_space(-0.2, 0.2, num_x)
    grid_y = tf.lin_space(-0.2, 0.2, num_y)
    x, y = tf.meshgrid(grid_x, grid_y)
    grid = tf.reshape(tf.stack([x, y], axis=-1), [-1, 2])  # [2, 2, 2] -> [4, 2]
    return grid
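To see concretely which 2D codes `gen_grid` hands out, here is a TensorFlow-free re-implementation of the same logic. It is a sketch for illustration: `grid_shape`, `linspace`, and `gen_grid_py` are hypothetical helper names, and I am assuming the default `meshgrid` ordering (x varies fastest).

```python
import math

def grid_shape(up_ratio):
    """Factor up_ratio as num_x * num_y, trying divisors from int(sqrt)+1 downward."""
    start = int(math.sqrt(up_ratio)) + 1
    for i in range(start, 0, -1):
        if up_ratio % i == 0:
            return i, up_ratio // i

def linspace(lo, hi, num):
    """num evenly spaced values from lo to hi inclusive (just [lo] when num == 1)."""
    if num == 1:
        return [lo]
    step = (hi - lo) / (num - 1)
    return [lo + k * step for k in range(num)]

def gen_grid_py(up_ratio):
    """One (x, y) code per feature copy, all inside [-0.2, 0.2]^2."""
    num_x, num_y = grid_shape(up_ratio)
    xs = linspace(-0.2, 0.2, num_x)
    ys = linspace(-0.2, 0.2, num_y)
    return [(x, y) for y in ys for x in xs]

for code in gen_grid_py(4):
    print(code)
```

For `up_ratio = 4` this yields the four corners of a small square, so each of the four feature copies is tagged with a different corner.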
2) attention mechanism
def attention_unit(inputs, scope='attention_unit',is_training=True):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        dim = inputs.get_shape()[-1].value
        layer = dim//4
        f = conv2d(inputs,layer, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv_f', bn_decay=None)
        g = conv2d(inputs, layer, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv_g', bn_decay=None)
        h = conv2d(inputs, dim, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=is_training,
                   scope='conv_h', bn_decay=None)
        s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)  # # [bs, N, N]
        beta = tf.nn.softmax(s, axis=-1)  # attention map
        o = tf.matmul(beta, hw_flatten(h))  # [bs, N, N]*[bs, N, c]->[bs, N, c]
        gamma = tf.get_variable("gamma", [1], initializer=tf.constant_initializer(0.0))
        o = tf.reshape(o, shape=inputs.shape)  # [bs, h, w, C]
        x = gamma * o + inputs
    return x
Down-feature operator:
The Down structure is fairly simple:
it downsamples the expanded features by reshaping them, then uses a series of MLPs to regress the original features.
def down_block(inputs,up_ratio,scope='down_block',is_training=True,bn_decay=None):
    with tf.variable_scope(scope,reuse=tf.AUTO_REUSE):
        net = inputs
        net = tf.reshape(net,[tf.shape(net)[0],up_ratio,-1,tf.shape(net)[-1]])
        net = tf.transpose(net, [0, 2, 1, 3])
        net = conv2d(net, 256, [1, up_ratio],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv1', bn_decay=bn_decay)
        net = conv2d(net, 128, [1, 1],
                     padding='VALID', stride=[1, 1],
                     bn=False, is_training=is_training,
                     scope='conv2', bn_decay=bn_decay)
    return net
Ⅳ Coordinate Reconstruction
Finally, coordinate reconstruction (an excerpt from Generator.__call__ above):
coord = ops.conv2d(H, 64, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=self.is_training,
                   scope='fc_layer1', bn_decay=None)
coord = ops.conv2d(coord, 3, [1, 1],
                   padding='VALID', stride=[1, 1],
                   bn=False, is_training=self.is_training,
                   scope='fc_layer2', bn_decay=None,
                   activation_fn=None, weight_decay=0.0)
outputs = tf.squeeze(coord, [2])
3.2 Discriminator
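The Discriminator's goal is to tell whether an upsampled point cloud was produced by the Generator.
It first uses a lightweight network to combine local and global information and extract a global feature.
It also uses a self-attention unit to "enhance the feature integration and improve the subsequent feature extraction capability."
Finally, an MLP and pooling produce the confidence value, which can be read as how likely the Discriminator considers the input to be a real dense point cloud.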
class Discriminator(object):
    def __init__(self, opts,is_training, name="Discriminator"):
        self.opts = opts
        self.is_training = is_training
        self.name = name
        self.reuse = False
        self.bn = False
        self.start_number = 32
        #print('start_number:',self.start_number)

    def __call__(self, inputs):
        with tf.variable_scope(self.name, reuse=self.reuse):
            inputs = tf.expand_dims(inputs,axis=2)
            with tf.variable_scope('encoder_0', reuse=tf.AUTO_REUSE):
                features = ops.mlp_conv(inputs, [self.start_number, self.start_number * 2])
                features_global = tf.reduce_max(features, axis=1, keep_dims=True, name='maxpool_0')
                features = tf.concat([features, tf.tile(features_global, [1, tf.shape(inputs)[1],1, 1])], axis=-1)
                features = ops.attention_unit(features, is_training=self.is_training)
            with tf.variable_scope('encoder_1', reuse=tf.AUTO_REUSE):
                features = ops.mlp_conv(features, [self.start_number * 4, self.start_number * 8])
                features = tf.reduce_max(features, axis=1, name='maxpool_1')
            with tf.variable_scope('decoder', reuse=tf.AUTO_REUSE):
                outputs = ops.mlp(features, [self.start_number * 8, 1])
                outputs = tf.reshape(outputs, [-1, 1])
        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.name)
        return outputs
3.3 Loss
Because it is a GAN, PU-GAN has quite a few loss terms, split into the Generator loss and the Discriminator loss:
Ⅰ discriminator loss
self.D_loss = discriminator_loss(self.D,self.input_y,self.G_y)
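discriminator_loss contains only the adversarial loss.
The adversarial loss in discriminator_loss is simple, a least-squares GAN loss (the code below omits the constant factor 1/2):
$L_{gan}(D) = \frac{1}{2}\left[D(Q)^2 + (D(\hat{Q}) - 1)^2\right]$
Here $\hat{Q}$ is the real point cloud, Q is the fake point cloud produced by the generator, and D(Q) is the confidence value output by the discriminator.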
def discriminator_loss(D, input_real, input_fake, Ra=False, gan_type='lsgan'):
    real = D(input_real)
    fake = D(input_fake)
    real_loss = tf.reduce_mean(tf.square(real - 1.0))
    fake_loss = tf.reduce_mean(tf.square(fake))
    loss = real_loss + fake_loss
    return loss
Ⅱ generator loss
self.dis_loss = self.opts.fidelity_w * pc_distance(self.G_y, self.input_y, radius=self.pc_radius)
if self.opts.use_repulse:
    self.repulsion_loss = self.opts.repulsion_w*get_repulsion_loss(self.G_y)
else:
    self.repulsion_loss = 0
self.uniform_loss = self.opts.uniform_w * get_uniform_loss(self.G_y)
self.pu_loss = self.dis_loss + self.uniform_loss + self.repulsion_loss + tf.losses.get_regularization_loss()
self.G_gan_loss = self.opts.gan_w*generator_loss(self.D,self.G_y)
self.total_gen_loss = self.G_gan_loss + self.pu_loss
The generator loss consists of the reconstruction loss, repulsion loss, uniform loss, and adversarial loss.
1) adversarial loss
This closely mirrors the Discriminator's adversarial loss above:
def generator_loss(D,input_fake):
    fake = D(input_fake)
    fake_loss = tf.reduce_mean(tf.square(fake - 1.0))
    return fake_loss
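To get a feel for the two least-squares objectives, here they are on plain lists of discriminator scores. This is a framework-free sketch: `lsgan_d_loss` and `lsgan_g_loss` are hypothetical names that mirror `discriminator_loss` and `generator_loss` above, taking the scores directly instead of calling `D`.

```python
def lsgan_d_loss(real_scores, fake_scores):
    """Discriminator: push scores of real samples toward 1 and of fake samples toward 0."""
    real = sum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    fake = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return real + fake

def lsgan_g_loss(fake_scores):
    """Generator: push the discriminator's scores on generated samples toward 1."""
    return sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)

print(lsgan_d_loss([1.0, 1.0], [0.0, 0.0]))  # a perfect discriminator
print(lsgan_g_loss([0.0, 0.0]))              # the generator fools nobody
print(lsgan_d_loss([0.5], [0.5]))            # the discriminator cannot tell them apart
```

The two losses pull the score on generated samples in opposite directions (0 for the discriminator, 1 for the generator), which is the adversarial game in its simplest form.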
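2) reconstruction loss
PU-GAN uses the EMD loss by default. For a detailed explanation of EMD, see:
Liu Xinchen (刘昕宸), "Point cloud distance metrics: a complete analysis of EMD (Earth Mover's Distance)" (点云距离度量:完全解析EMD距离(Earth Mover's Distance)), zhuanlan.zhihu.com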
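As a quick reminder of what EMD measures between two equally sized point sets: it is the cost of the cheapest one-to-one matching. The brute-force sketch below is for intuition only. `emd_bruteforce` is a hypothetical helper that tries every permutation, so it is usable only for a handful of points; practical implementations rely on approximate matching, and how the repository's `pc_distance` normalizes the result is not shown here.

```python
import itertools
import math

def emd_bruteforce(a, b):
    """Mean matched distance under the best bijection between a and b (tiny sets only)."""
    assert len(a) == len(b), "EMD as used here needs equally sized point sets"
    n = len(a)
    best = min(
        sum(math.dist(a[i], b[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n

a = [(0, 0, 0), (1, 0, 0)]
b = [(1, 0, 2), (0, 0, 2)]   # the same two points lifted by 2 along z, listed in swapped order
print(emd_bruteforce(a, b))
```

Because the matching is one-to-one, EMD penalizes outputs that pile many points onto a few target locations, which a nearest-neighbor distance would not notice.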
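3) repulsion loss
The repulsion loss comes from PU-Net; for details, see:
Liu Xinchen (刘昕宸), "Reading papers slowly: PU-Net, the groundbreaking point cloud upsampling network" (细嚼慢咽读论文:点云上采样网络开天辟地PU-Net), zhuanlan.zhihu.com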
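4) uniform loss
One major contribution of PU-GAN is the uniform loss, designed to control how uniformly the generated points are distributed.
PU-Net introduced the NUC metric to measure the uniformity of generated point clouds, but that metric neglects the local clutter of points and is therefore not adopted here.
What does "neglects the local clutter of points" mean?
The three disks below contain the same number of points (so their NUC is identical), yet their uniformity clearly differs. NUC fails here most likely because it cannot capture how uniformly the points are distributed locally.
The uniform loss, by contrast, is designed to account for both global and local uniformity.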
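First term:
For a point set Q of rN points (in the experiments, one patch):
step 1. Use farthest point sampling (FPS) to pick M seed points.
step 2. Around each seed point, run a ball query of radius $r_d$ to obtain a point subset $S_j$, $j = 1, \dots, M$.
Analysis:
$S_j$ lies roughly on a local disk of area $\pi r_d^2$ on the underlying surface of Q.
Remember Ⅰ Patch Extraction above? Patches are extracted by geodesic distance and then normalized into a unit sphere, so the surface area of a patch is about $\pi \cdot 1^2$.
The expected percentage of points p in $S_j$ should therefore be $p = \pi r_d^2 / (\pi \cdot 1^2) = r_d^2$,
and the expected number of points in $S_j$ should be $\hat{n} = rNp$.
Following the chi-square model, the first term of the uniform loss measures the deviation of $|S_j|$ from $\hat{n}$:
$U_{imbalance}(S_j) = \frac{(|S_j| - \hat{n})^2}{\hat{n}}$
Second term:
To account for local point clutter, for each point in $S_j$ find its nearest neighbor and compute the distance $d_{j,k}$ (k indexes the k-th point in $S_j$).
Imagine that $S_j$ is uniformly distributed; the points would then be spaced as in the figure (a hexagonal arrangement):
In that case the expected point-to-neighbor distance is $\hat{d} = \sqrt{\frac{2\pi r_d^2}{|S_j|\sqrt{3}}}$.
Again following the chi-square model, the second term of the uniform loss measures the deviation of $d_{j,k}$ from $\hat{d}$:
$U_{clutter}(S_j) = \sum_{k=1}^{|S_j|} \frac{(d_{j,k} - \hat{d})^2}{\hat{d}}$
The final uniform loss is therefore:
$L_{uni} = \sum_{j=1}^{M} U_{imbalance}(S_j) \cdot U_{clutter}(S_j)$
Implementation: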
def get_uniform_loss(pcd, percentages=[0.004,0.006,0.008,0.010,0.012], radius=1.0):
    B,N,C = pcd.get_shape().as_list()
    npoint = int(N * 0.05)
    loss=[]
    for p in percentages:
        nsample = int(N*p)
        r = math.sqrt(p*radius)
        disk_area = math.pi *(radius ** 2) * p/nsample
        #print(npoint,nsample)
        new_xyz = gather_point(pcd, farthest_point_sample(npoint, pcd))  # (batch_size, npoint, 3)
        idx, pts_cnt = query_ball_point(r, nsample, pcd, new_xyz)#(batch_size, npoint, nsample)
        #expect_len = tf.sqrt(2*disk_area/1.732)#using hexagon
        expect_len = tf.sqrt(disk_area)  # using square
        grouped_pcd = group_point(pcd, idx)
        grouped_pcd = tf.concat(tf.unstack(grouped_pcd, axis=1), axis=0)
        var, _ = knn_point(2, grouped_pcd, grouped_pcd)
        uniform_dis = -var[:, :, 1:]
        uniform_dis = tf.sqrt(tf.abs(uniform_dis+1e-8))
        uniform_dis = tf.reduce_mean(uniform_dis,axis=[-1])
        uniform_dis = tf.square(uniform_dis - expect_len) / (expect_len + 1e-8)
        uniform_dis = tf.reshape(uniform_dis, [-1])
        mean, variance = tf.nn.moments(uniform_dis, axes=0)
        mean = mean*math.pow(p*100,2)
        #nothing 4
        loss.append(mean)
    return tf.add_n(loss)/len(percentages)
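The TensorFlow code above takes a somewhat different route (a fixed list of percentages, and an expected spacing derived from a square cell), so it can help to see the two chi-square terms on their own. The sketch below follows my reading of the paper's definitions: expected count n_hat = (total points) * r_d^2 for a unit-radius patch, and expected nearest-neighbor spacing d_hat = sqrt(2*pi*r_d^2 / (|S_j|*sqrt(3))) under a hexagonal arrangement. All function names are hypothetical, and each subset must hold at least two points.

```python
import math

def u_imbalance(subset_size, r_d, total_points):
    """Chi-square deviation of the subset size from the expected count."""
    n_hat = total_points * r_d ** 2          # expected percentage p = r_d^2
    return (subset_size - n_hat) ** 2 / n_hat

def expected_spacing(subset_size, r_d):
    """Expected nearest-neighbor distance for a hexagonal arrangement in the disk."""
    return math.sqrt(2 * math.pi * r_d ** 2 / (subset_size * math.sqrt(3)))

def u_clutter(subset, r_d):
    """Chi-square deviation of each point's nearest-neighbor distance from d_hat."""
    d_hat = expected_spacing(len(subset), r_d)
    total = 0.0
    for k, p in enumerate(subset):
        d = min(math.dist(p, q) for j, q in enumerate(subset) if j != k)
        total += (d - d_hat) ** 2 / d_hat
    return total

def uniform_loss(subsets, r_d, total_points):
    """Sum over subsets of the product of the global and the local term."""
    return sum(u_imbalance(len(s), r_d, total_points) * u_clutter(s, r_d)
               for s in subsets)

d_hat = expected_spacing(2, 0.1)
well_spaced = [(0.0, 0.0, 0.0), (d_hat, 0.0, 0.0)]
cluttered = [(0.0, 0.0, 0.0), (d_hat / 10, 0.0, 0.0)]
print(u_clutter(well_spaced, 0.1), u_clutter(cluttered, 0.1))
```

The imbalance term only looks at how many points fall into each ball, while the clutter term looks at how they are arranged inside it; multiplying them means a subset is penalized most when it is off on both counts.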
4 dataset and experiments
4.1 dataset
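147 models were selected from the PU-Net and MPU datasets and the Visionair repository, covering as many object types as possible. 120 models are used for training and 27 for testing.
Training data preparation: because PU-GAN is patch-based, patches must first be extracted from each model. 200 patches are extracted per training model, so the 120 models yield 24,000 training patches in total. Each patch is an (input patch, ground-truth patch) pair: the input patch has 256 points and the ground-truth patch has 1024 points.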
4.2 metrics
Four kinds of evaluation metrics are used:
- point-to-surface (P2F) distance
- Chamfer distance (CD)
- Hausdorff distance (HD)
- uniform metrics: evaluated in a way similar to the uniform loss above
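For the two point-set metrics in this list, a minimal sketch may help. Conventions for Chamfer distance vary (squared or unsquared distances, sum or mean), so the version below, which averages unsquared nearest-neighbor distances in both directions, is an assumption of mine and not necessarily what the paper's evaluation scripts use. The function names are hypothetical.

```python
import math

def _nn_dists(a, b):
    """For every point in a, the distance to its nearest neighbor in b."""
    return [min(math.dist(p, q) for q in b) for p in a]

def chamfer_distance(a, b):
    """Mean nearest-neighbor distance from a to b plus the same from b to a."""
    return sum(_nn_dists(a, b)) / len(a) + sum(_nn_dists(b, a)) / len(b)

def hausdorff_distance(a, b):
    """Worst-case nearest-neighbor distance, taken over both directions."""
    return max(max(_nn_dists(a, b)), max(_nn_dists(b, a)))

pred = [(0, 0, 0), (1, 0, 0)]
gt = [(0, 0, 0), (1, 0, 0), (1, 0, 3)]
print(chamfer_distance(pred, gt), hausdorff_distance(pred, gt))
```

CD averages the error over all points, while HD reports the single worst point, so HD is far more sensitive to outliers.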
4.3 experiments
1. Quantitative comparison
2. Qualitative comparison
3. Upsampling of real scanned scenes
On the KITTI dataset:
4. Ablation study
5 conclusion
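PU-GAN largely follows the approach of PU-Net but makes clear improvements in many details (gridding, self-attention, dense connection, up-down-up, uniform loss, and so on), and it does achieve stronger results.
More notably, PU-GAN is also evaluated on KITTI, a dataset of real scanned point clouds, and still performs well. This both supports the network's generalization ability and suggests that PU-GAN has considerable practical value, which matters a great deal.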