[Repost] Reading Papers Slowly and Carefully: Point Cloud Upsampling with a GAN in Practice (PU-GAN)

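Paper title: PU-GAN: a Point Cloud Upsampling Adversarial Network

Tags: supervised | point cloud upsampling

Let's start by unpacking the title: PU-GAN: a Point Cloud Upsampling Adversarial Network

PU stands for Point Upsampling: the task this paper tackles is point cloud upsampling. I introduced point cloud upsampling in my article on PU-Net, which you can refer to: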

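Liu Xinchen: Reading Papers Slowly and Carefully: PU-Net, the Groundbreaking Point Cloud Upsampling Network (zhuanlan.zhihu.com)

GAN is the now-famous generative adversarial network: the paper relies on a GAN to perform point cloud upsampling. Upsampling is itself a generative task, so trying a GAN is a natural idea. For the basic principles of GANs, see:

Liu Xinchen: GAN in Plain Language (Part 1): GAN Explained Clearly (zhuanlan.zhihu.com)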

1 motivation

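I covered why upsampling matters in detail in the PU-Net article:

Point cloud processing is very challenging, in large part because point cloud data is sparse and irregular.
The upsampling task addressed here targets exactly that sparsity, so that downstream feature learning tasks get higher-quality data.
Put simply, point cloud upsampling takes a point cloud as input and generates a denser point cloud that preserves its basic shape.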

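In terms of upsampling quality alone, earlier deep learning methods such as PU-Net and MPU achieve only limited results on point clouds scanned from real scenes. The figure at the start of the PU-GAN paper (tested on the KITTI dataset) illustrates this.

Point cloud upsampling is in essence a generative task, and for generative tasks in vision the natural thought is: why not try a GAN?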

2 contribution

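Proposes a GAN-based solution for point cloud upsampling and achieves very good results. (The paper notes the challenge involved: the difficulty to balance between the generator and discriminator and to avoid the tendency of poor convergence.)
The local network design is inventive: for example, the up-down-up unit expands point features, and the self-attention unit improves feature integration quality.
Designs a compound loss, in particular the uniform loss that constrains the upsampled points to be uniformly distributed, which is a real highlight.
PU-GAN is evaluated not only on ordinary point cloud models but also on real scanned scene point clouds such as KITTI, where it still performs very well, which further supports PU-GAN's strong generalization ability.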

3 solution

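The goal is upsampling: given a sparse point set $P=\left \{ p_{i} \right \}_{i=1}^{N}$ with N points, we want to generate a dense point set $Q=\left \{ q_{i} \right \}_{i=1}^{rN}$ with rN points.

Q does not need to be a superset of P, but it must satisfy two conditions:

Q should describe the same underlying geometry of the latent target object as P.
The points in Q should be uniformly distributed on the target object surface, even when the input P is non-uniform.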

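The network architecture of PU-GAN is as follows.

Because it is a GAN, the network consists of two parts, a Generator and a Discriminator.

The Generator produces the dense point cloud Q from the sparse point cloud P.

The Discriminator distinguishes real dense point clouds from those produced by the generator.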

3.1 Generator

As you may have noticed, the overall generator framework is still the PU-Net pipeline: patch --> feature extraction --> feature expansion --> coordinate reconstruction ;-)

Full Generator code:

class Generator(object):
    def __init__(self, opts,is_training, name="Generator"):
        self.opts = opts
        self.is_training = is_training
        self.name = name
        self.reuse = False
        self.num_point = self.opts.patch_num_point
        self.up_ratio = self.opts.up_ratio
        self.up_ratio_real = self.up_ratio + self.opts.more_up
        self.out_num_point = int(self.num_point*self.up_ratio)

    def __call__(self, inputs):
        with tf.variable_scope(self.name, reuse=self.reuse):

            features = ops.feature_extraction(inputs, scope='feature_extraction', is_training=self.is_training, bn_decay=None)

            H = ops.up_projection_unit(features, self.up_ratio_real, scope="up_projection_unit", is_training=self.is_training, bn_decay=None)

            coord = ops.conv2d(H, 64, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=self.is_training,
                               scope='fc_layer1', bn_decay=None)

            coord = ops.conv2d(coord, 3, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=self.is_training,
                               scope='fc_layer2', bn_decay=None,
                               activation_fn=None, weight_decay=0.0)
            outputs = tf.squeeze(coord, [2])



            outputs = gather_point(outputs, farthest_point_sample(self.out_num_point, outputs))
        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.name)
        return outputs
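One detail in the Generator code is easy to miss: features are expanded by up_ratio_real = up_ratio + more_up, and farthest_point_sample then trims the result back to out_num_point = N * up_ratio points. The repo's farthest_point_sample is a compiled TF op; the NumPy version below is only a minimal sketch of the greedy idea, and the values 4 and 6 for the two ratios are assumptions chosen for illustration.

```python
import numpy as np

def farthest_point_sample(points, m, start=0):
    """Greedy farthest point sampling: indices of m points from an (n, 3) array."""
    chosen = np.zeros(m, dtype=np.int64)
    chosen[0] = start
    # Squared distance from every point to its nearest already-chosen point.
    min_d2 = np.sum((points - points[start]) ** 2, axis=1)
    for i in range(1, m):
        chosen[i] = int(np.argmax(min_d2))
        d2 = np.sum((points - points[chosen[i]]) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)
    return chosen

rng = np.random.default_rng(0)
dense = rng.normal(size=(6 * 256, 3))   # stand-in for up_ratio_real * N generated points
idx = farthest_point_sample(dense, 4 * 256)
kept = dense[idx]                        # the up_ratio * N points that are returned
```

Generating more points than needed and then keeping the most spread-out subset is a cheap way to nudge the output toward uniformity before any loss is applied.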

Ⅰ Patch Extraction

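For each 3D mesh, 200 seed points are chosen at random on the surface; a patch is grown around each seed point by geodesic distance, and each patch is normalized into a unit sphere.

For each patch, Poisson disk sampling generates $\hat{Q}$, the target point cloud with rN points.

The input point cloud P is generated on the fly by randomly sampling N points from $\hat{Q}$.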
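The last two steps (normalize the patch, then draw a fresh sparse input from the dense target) can be sketched in a few lines of NumPy. This is a rough illustration, not the authors' data pipeline: centering on the centroid is an assumption, the helper names are made up, and random Gaussian points stand in for a Poisson-disk-sampled patch.

```python
import numpy as np

def normalize_to_unit_sphere(patch):
    """Center an (n, 3) patch and scale it so the farthest point has norm 1."""
    centered = patch - patch.mean(axis=0, keepdims=True)
    scale = np.max(np.linalg.norm(centered, axis=1))
    return centered / scale

def sample_input(target, n, rng):
    """Pick n distinct points of the target patch as the sparse input P."""
    idx = rng.choice(target.shape[0], size=n, replace=False)
    return target[idx]

rng = np.random.default_rng(0)
q_hat = normalize_to_unit_sphere(rng.normal(size=(1024, 3)))  # stand-in for the target patch
p = sample_input(q_hat, 256, rng)                              # resampled for every training step
```

Because P is redrawn each time, the network sees many different sparse inputs for the same target patch.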

Ⅱ Feature Extraction

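This module extracts point-wise features:

The input is a point cloud of size N*d (d covers the raw per-point attributes such as coordinates, color and normals; d is usually 3), and the output is the point-wise feature of size N*C.

The module directly borrows the feature extraction method of the paper Patch-based progressive 3D point set upsampling, using dense connections to integrate features from different layers.

The network structure is as follows, and the processing flow is quite clear:

Let's look at the code to deepen our understanding: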

def feature_extraction(inputs, scope='feature_extraction2', is_training=True, bn_decay=None):
    with tf.variable_scope(scope,reuse=tf.AUTO_REUSE):

        use_bn = False
        use_ibn = False
        growth_rate = 24

        dense_n = 3
        knn = 16
        comp = growth_rate*2
        l0_features = tf.expand_dims(inputs, axis=2)
        l0_features = conv2d(l0_features, 24, [1, 1],
                                     padding='VALID', scope='layer0', is_training=is_training, bn=use_bn, ibn=use_ibn,
                                     bn_decay=bn_decay, activation_fn=None)
        l0_features = tf.squeeze(l0_features, axis=2)

        # encoding layer
        l1_features, l1_idx = dense_conv(l0_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                                  scope="layer1", is_training=is_training, bn=use_bn, ibn=use_ibn,
                                                  bn_decay=bn_decay)
        l1_features = tf.concat([l1_features, l0_features], axis=-1)  # (24+3*24)+24=120

        l2_features = conv1d(l1_features, comp, 1,  # 48
                                     padding='VALID', scope='layer2_prep', is_training=is_training, bn=use_bn, ibn=use_ibn,
                                     bn_decay=bn_decay)
        l2_features, l2_idx = dense_conv(l2_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                                  scope="layer2", is_training=is_training, bn=use_bn, bn_decay=bn_decay)
        l2_features = tf.concat([l2_features, l1_features], axis=-1)  # (48+3*24)+120=240

        l3_features = conv1d(l2_features, comp, 1,  # 48
                                     padding='VALID', scope='layer3_prep', is_training=is_training, bn=use_bn, ibn=use_ibn,
                                     bn_decay=bn_decay)  # 48
        l3_features, l3_idx = dense_conv(l3_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                                  scope="layer3", is_training=is_training, bn=use_bn, bn_decay=bn_decay)
        l3_features = tf.concat([l3_features, l2_features], axis=-1)  # (48+3*24)+240=360

        l4_features = conv1d(l3_features, comp, 1,  # 48
                                     padding='VALID', scope='layer4_prep', is_training=is_training, bn=use_bn, ibn=use_ibn,
                                     bn_decay=bn_decay)  # 48
        l4_features, l4_idx = dense_conv(l4_features, growth_rate=growth_rate, n=dense_n, k=knn,
                                                  scope="layer4", is_training=is_training, bn=use_bn, bn_decay=bn_decay)
        l4_features = tf.concat([l4_features, l3_features], axis=-1)  # (48+3*24)+360=480

        l4_features = tf.expand_dims(l4_features, axis=2)

    return l4_features
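To keep track of how wide the features get, the channel arithmetic can be traced with plain Python. The sketch below only mirrors the concatenations in the code as shown (growth_rate = 24, dense_n = 3, a 24-channel layer0 and a 2*growth_rate bottleneck before blocks 2 to 4); the function names are mine, not from the repo.

```python
def dense_conv_channels(c_in, growth_rate=24, n=3):
    """Channels after dense_conv: the input feature plus one growth_rate slice per layer."""
    return c_in + n * growth_rate

def feature_extraction_channels(growth_rate=24, n=3, num_blocks=4):
    comp = growth_rate * 2          # width of each layerX_prep bottleneck
    width = 24                      # layer0 output
    widths = []
    for block in range(num_blocks):
        c_in = width if block == 0 else comp
        # Each block concatenates its dense_conv output with everything before it.
        width = dense_conv_channels(c_in, growth_rate, n) + width
        widths.append(width)
    return widths

widths = feature_extraction_channels()   # [120, 240, 360, 480]
```

So with these settings each point ends up with a 480-channel feature before feature expansion.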

The implementation of the core dense_conv:

def dense_conv(feature, n=3,growth_rate=64, k=16, scope='dense_conv',**kwargs):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        y, idx = get_edge_feature(feature, k=k, idx=None)  # [B N K 2*C]
        for i in range(n):
            if i == 0:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, **kwargs),
                    tf.tile(tf.expand_dims(feature, axis=2), [1, 1, k, 1])], axis=-1)
            elif i == n-1:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, activation_fn=None, **kwargs),
                    y], axis=-1)
            else:
                y = tf.concat([
                    conv2d(y, growth_rate, [1, 1], padding='VALID', scope='l%d' % i, **kwargs),
                    y], axis=-1)
        y = tf.reduce_max(y, axis=-2)
        return y, idx
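dense_conv starts from get_edge_feature, which is not shown in this article. Judging from the [B N K 2*C] shape comment, I assume it is the DGCNN-style edge feature: for each point, the k nearest neighbors in feature space, encoded as the point's own feature concatenated with (neighbor minus point). A single-sample NumPy sketch under that assumption:

```python
import numpy as np

def edge_feature(feature, k):
    """feature: (n, c). Returns (n, k, 2c) edge features and (n, k) neighbor indices."""
    d2 = np.sum((feature[:, None, :] - feature[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                    # exclude each point itself
    idx = np.argsort(d2, axis=1)[:, :k]             # k nearest neighbors in feature space
    central = np.repeat(feature[:, None, :], k, axis=1)
    neighbors = feature[idx]
    return np.concatenate([central, neighbors - central], axis=-1), idx

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 24))
edges, idx = edge_feature(feat, k=16)
pooled = edges.max(axis=1)   # the same max over the K axis that dense_conv applies at the end
```

The final max over neighbors is what turns the per-edge tensor back into one feature per point.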

Ⅲ Feature Expansion

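Like PU-Net, PU-GAN designs its own feature expansion module, which is arguably the core of an upsampling algorithm.

PU-Net simply duplicates the point features and then processes each copy independently with a separate MLP.
Even with constraints such as the repulsion loss, this way of upsampling still leaves the expanded point features too close to one another, which hurts upsampling quality.

The input is the point-wise feature N*C and the output is $rN*{C}'$

PU-GAN designs an up-down-up expansion unit to strengthen feature expansion, with the aim of enabling the generator to produce more diverse point distributions.

The network structure is shown below, including the structure of the Up-feature operator and the Down-feature operator:

The code makes this clear: the point features are fed to the Up-feature operator to produce $H_{0}$, which is then fed to the Down-feature operator to downsample it back to $L_{0}$ .

The difference $E_{0}$ between the downsampled point features and the original input is computed.

$E_{0}$ is fed to the Up-feature operator to produce $H_{1}$ , which acts as an offset to $H_{0}$ , giving $H_{2}=H_{0}+H_{1}$ .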

def up_projection_unit(inputs,up_ratio,scope="up_projection_unit",is_training=True,bn_decay=None):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        L = conv2d(inputs, 128, [1, 1],
                               padding='VALID', stride=[1, 1],
                               bn=False, is_training=is_training,
                               scope='conv0', bn_decay=bn_decay)

        H0 = up_block(L,up_ratio,is_training=is_training,bn_decay=bn_decay,scope='up_0')

        L0 = down_block(H0,up_ratio,is_training=is_training,bn_decay=bn_decay,scope='down_0')
        E0 = L0-L
        H1 = up_block(E0,up_ratio,is_training=is_training,bn_decay=bn_decay,scope='up_1')
        H2 = H0+H1
    return H2
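To see the data flow without TensorFlow, here is a toy NumPy version of the up-down-up pattern. The up and down operators are replaced by random linear maps (purely hypothetical stand-ins for up_block and down_block), so only the shapes and the residual-correction structure carry over.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, r = 8, 16, 4                        # points, channels, upsampling ratio (toy sizes)
w_up = rng.normal(size=(r, c, c)) * 0.1   # one hypothetical linear map per feature copy
w_down = rng.normal(size=(r * c, c)) * 0.1

def up(x):
    """(n, c) -> (r*n, c): r transformed copies of every point feature, copy-major order."""
    return np.concatenate([x @ w_up[i] for i in range(r)], axis=0)

def down(h):
    """(r*n, c) -> (n, c): gather the r copies of each point and mix them back together."""
    grouped = h.reshape(r, n, c).transpose(1, 0, 2).reshape(n, r * c)
    return grouped @ w_down

L = rng.normal(size=(n, c))
H0 = up(L)
L0 = down(H0)
E0 = L0 - L          # what the up/down round trip changed
H1 = up(E0)
H2 = H0 + H1         # expanded features corrected by the upsampled difference
```

The pattern is the same back-projection idea used in image super-resolution: project up, project back down, and feed the difference through the up path once more.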

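Up-feature operator:

Unlike PU-Net's plain duplication, PU-GAN uses a grid structure when duplicating point features (see FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation), which is equivalent to adding some new points near the input points.

A self-attention mechanism is used to integrate the duplicated point features.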

def up_block(inputs, up_ratio, scope='up_block', is_training=True, bn_decay=None):
    with tf.variable_scope(scope,reuse=tf.AUTO_REUSE):
        net = inputs
        dim = inputs.get_shape()[-1]
        out_dim = dim*up_ratio
        grid = gen_grid(up_ratio)
        grid = tf.tile(tf.expand_dims(grid, 0), [tf.shape(net)[0], 1,tf.shape(net)[1]])  # [batch_size, up_ratio, 2*num_point]
        grid = tf.reshape(grid, [tf.shape(net)[0], -1, 1, 2])  # [batch_size, up_ratio*num_point, 1, 2]
        #grid = tf.expand_dims(grid, axis=2)

        net = tf.tile(net, [1, up_ratio, 1, 1])
        net = tf.concat([net, grid], axis=-1)

        net = attention_unit(net, is_training=is_training)

        net = conv2d(net, 256, [1, 1],
                                 padding='VALID', stride=[1, 1],
                                 bn=False, is_training=is_training,
                                 scope='conv1', bn_decay=bn_decay)
        net = conv2d(net, 128, [1, 1],
                          padding='VALID', stride=[1, 1],
                          bn=False, is_training=is_training,
                          scope='conv2', bn_decay=bn_decay)

    return net

1) The grid mechanism

A unique 2D vector is generated for each feature-map copy and then concatenated to every point in that copy.

Thanks to this 2D vector, the duplicated point features differ slightly from one another.

import math

def gen_grid(up_ratio):
    """
    output [num_grid_point, 2]
    """
    # Take the largest divisor of up_ratio that is <= int(sqrt(up_ratio)) + 1,
    # so that num_x * num_y == up_ratio and the grid is close to square.
    sqrted = int(math.sqrt(up_ratio))+1
    for i in reversed(range(1, sqrted+1)):
        if (up_ratio%i) == 0:
            num_x = i
            num_y = up_ratio//i
            break
    grid_x = tf.lin_space(-0.2, 0.2, num_x)
    grid_y = tf.lin_space(-0.2, 0.2, num_y)

    x, y = tf.meshgrid(grid_x, grid_y)
    grid = tf.reshape(tf.stack([x, y], axis=-1), [-1, 2])  # [2, 2, 2] -> [4, 2]
    return grid
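If you want to poke at the grid codes without a TF session, the same logic ports directly to NumPy. The sketch below mirrors gen_grid and also shows how the codes line up with the tiled features in up_block (copy i of every point gets grid code i); gen_grid_np and attach_grid are names made up for this illustration.

```python
import math
import numpy as np

def gen_grid_np(up_ratio):
    """NumPy counterpart of gen_grid: an (up_ratio, 2) array of grid codes."""
    sqrted = int(math.sqrt(up_ratio)) + 1
    for i in reversed(range(1, sqrted + 1)):
        if up_ratio % i == 0:
            num_x, num_y = i, up_ratio // i
            break
    x, y = np.meshgrid(np.linspace(-0.2, 0.2, num_x), np.linspace(-0.2, 0.2, num_y))
    return np.stack([x, y], axis=-1).reshape(-1, 2)

def attach_grid(features, up_ratio):
    """features: (n, c) -> (up_ratio*n, c+2); copy i of every point carries grid code i."""
    grid = gen_grid_np(up_ratio)
    n = features.shape[0]
    tiled = np.tile(features, (up_ratio, 1))    # copies laid out one after another
    codes = np.repeat(grid, n, axis=0)          # one 2D code shared by a whole copy
    return np.concatenate([tiled, codes], axis=-1)

grid4 = gen_grid_np(4)          # the four corners (+-0.2, +-0.2)
expanded = attach_grid(np.ones((5, 3)), 4)
```

With up_ratio = 4 the codes are the corners of a small square, so the four copies of a point start out as four slightly different features.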

2) The attention mechanism

def attention_unit(inputs, scope='attention_unit',is_training=True):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        dim = inputs.get_shape()[-1].value
        layer = dim//4
        f = conv2d(inputs,layer, [1, 1],
                              padding='VALID', stride=[1, 1],
                              bn=False, is_training=is_training,
                              scope='conv_f', bn_decay=None)

        g = conv2d(inputs, layer, [1, 1],
                            padding='VALID', stride=[1, 1],
                            bn=False, is_training=is_training,
                            scope='conv_g', bn_decay=None)

        h = conv2d(inputs, dim, [1, 1],
                            padding='VALID', stride=[1, 1],
                            bn=False, is_training=is_training,
                            scope='conv_h', bn_decay=None)


        s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)  # [bs, N, N]

        beta = tf.nn.softmax(s, axis=-1)  # attention map

        o = tf.matmul(beta, hw_flatten(h))   # [bs, N, N]*[bs, N, c]->[bs, N, c]
        gamma = tf.get_variable("gamma", [1], initializer=tf.constant_initializer(0.0))

        o = tf.reshape(o, shape=inputs.shape)  # [bs, h, w, C]
        x = gamma * o + inputs

    return x
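Stripped of the TF plumbing, the attention unit is ordinary self-attention over the points with a learnable residual weight gamma. The single-sample NumPy sketch below uses plain linear maps with no bias or activation, which is a simplification of the conv2d layers above; the weight names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_f, w_g, w_h, gamma):
    """x: (n, c). w_f, w_g: (c, c//4). w_h: (c, c). Returns the (n, c) output and the map."""
    f, g, h = x @ w_f, x @ w_g, x @ w_h
    beta = softmax(g @ f.T, axis=-1)          # (n, n) attention map, each row sums to 1
    return gamma * (beta @ h) + x, beta

rng = np.random.default_rng(0)
n, c = 10, 8
x = rng.normal(size=(n, c))
w_f, w_g = rng.normal(size=(c, c // 4)), rng.normal(size=(c, c // 4))
w_h = rng.normal(size=(c, c))
out0, beta = self_attention(x, w_f, w_g, w_h, gamma=0.0)
out1, _ = self_attention(x, w_f, w_g, w_h, gamma=0.5)
```

Since gamma is initialized to 0.0 in the TF code, the unit starts out as the identity (out0 equals x here) and only gradually mixes in attended features as gamma is learned.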

Down-feature operator:

The Down structure is fairly simple:

The expanded features are downsampled: the features are reshaped, and then a series of MLPs fits the original features.

def down_block(inputs,up_ratio,scope='down_block',is_training=True,bn_decay=None):
    with tf.variable_scope(scope,reuse=tf.AUTO_REUSE):
        net = inputs
        net = tf.reshape(net,[tf.shape(net)[0],up_ratio,-1,tf.shape(net)[-1]])
        net = tf.transpose(net, [0, 2, 1, 3])

        net = conv2d(net, 256, [1, up_ratio],
                                 padding='VALID', stride=[1, 1],
                                 bn=False, is_training=is_training,
                                 scope='conv1', bn_decay=bn_decay)
        net = conv2d(net, 128, [1, 1],
                          padding='VALID', stride=[1, 1],
                          bn=False, is_training=is_training,
                          scope='conv2', bn_decay=bn_decay)

    return net
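The reshape, transpose and [1, up_ratio] convolution in down_block are easier to follow on a single sample. In the NumPy sketch below (toy sizes, random weights standing in for the conv kernel, bias and activation left out), the r copies of each point are lined up side by side and then collapsed in one contraction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, c, c_out = 6, 4, 8, 5
h = rng.normal(size=(r * n, c))            # expanded features, copy-major as produced by up_block
kernel = rng.normal(size=(r, c, c_out))    # hypothetical weights of a [1, r] convolution

grouped = h.reshape(r, n, c).transpose(1, 0, 2)     # (n, r, c): the r copies of each point
down = np.einsum('nrc,rco->no', grouped, kernel)    # VALID [1, r] conv collapses the copy axis

# The same result written as an explicit sum over the r copies.
check = sum(h[i * n:(i + 1) * n] @ kernel[i] for i in range(r))
```

In other words, the downsampling is a learned weighted combination of the r copies of each point, not a pooling step.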

Ⅳ Coordinate Reconstruction

Finally, coordinate reconstruction:

coord = ops.conv2d(H, 64, [1, 1],
    padding='VALID', stride=[1, 1],
    bn=False, is_training=self.is_training,
    scope='fc_layer1', bn_decay=None)

coord = ops.conv2d(coord, 3, [1, 1],
    padding='VALID', stride=[1, 1],
    bn=False, is_training=self.is_training,
    scope='fc_layer2', bn_decay=None,
    activation_fn=None, weight_decay=0.0)

outputs = tf.squeeze(coord, [2])

3.2 Discriminator

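The Discriminator's goal is to tell whether an upsampled point cloud was produced by the Generator.

It first uses a lightweight network structure to combine local and global information and extract a global feature.

The Discriminator also uses a self-attention unit to enhance the feature integration and improve the subsequent feature extraction capability.

Finally, MLPs and pooling produce the final confidence value, which can be read as how likely the Discriminator considers the input to be a real dense point cloud.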

class Discriminator(object):
    def __init__(self, opts,is_training, name="Discriminator"):
        self.opts = opts
        self.is_training = is_training
        self.name = name
        self.reuse = False
        self.bn = False
        self.start_number = 32
        #print('start_number:',self.start_number)

    def __call__(self, inputs):
        with tf.variable_scope(self.name, reuse=self.reuse):
            inputs = tf.expand_dims(inputs,axis=2)
            with tf.variable_scope('encoder_0', reuse=tf.AUTO_REUSE):
                features = ops.mlp_conv(inputs, [self.start_number, self.start_number * 2])
                features_global = tf.reduce_max(features, axis=1, keep_dims=True, name='maxpool_0')
                features = tf.concat([features, tf.tile(features_global, [1, tf.shape(inputs)[1],1, 1])], axis=-1)
                features = ops.attention_unit(features, is_training=self.is_training)
            with tf.variable_scope('encoder_1', reuse=tf.AUTO_REUSE):
                features = ops.mlp_conv(features, [self.start_number * 4, self.start_number * 8])
                features = tf.reduce_max(features, axis=1, name='maxpool_1')

            with tf.variable_scope('decoder', reuse=tf.AUTO_REUSE):
                outputs = ops.mlp(features, [self.start_number * 8, 1])
                outputs = tf.reshape(outputs, [-1, 1])

        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, self.name)

        return outputs
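The "combine local and global information" step in encoder_0 is just a max-pool over points followed by a broadcast concat. A NumPy sketch for one sample, with random numbers standing in for the mlp_conv output (64 channels, matching start_number * 2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
per_point = rng.normal(size=(n, 64))                  # stand-in for the encoder_0 MLP output
global_feat = per_point.max(axis=0, keepdims=True)    # (1, 64): max-pool over all points
combined = np.concatenate([per_point, np.repeat(global_feat, n, axis=0)], axis=-1)

# Max-pooling ignores point order, so shuffling the points leaves the global feature unchanged.
shuffled = per_point[rng.permutation(n)]
same_global = np.array_equal(shuffled.max(axis=0, keepdims=True), global_feat)
```

Every point now carries its own 64 local channels plus the 64 shared global ones before the attention unit is applied.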

3.3 Loss

Because it is a GAN, PU-GAN has a fair number of loss terms, which fall into a Generator loss and a Discriminator loss:

Ⅰ discriminator loss

self.D_loss = discriminator_loss(self.D,self.input_y,self.G_y)

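discriminator_loss only contains the adversarial loss.

The adversarial loss in discriminator_loss is a simple least-squares loss (the code below omits the constant factor 1/2):

$L_{gan}(D)=\frac{1}{2}\left [ D(Q)^{2}+(D(\hat{Q})-1)^{2} \right ]$

$\hat{Q}$ is the real point cloud, Q is the fake point cloud produced by the generator, and D(Q) is the confidence value output by the discriminator.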

def discriminator_loss(D, input_real, input_fake, Ra=False, gan_type='lsgan'):
    real = D(input_real)
    fake = D(input_fake)
    real_loss = tf.reduce_mean(tf.square(real - 1.0))
    fake_loss = tf.reduce_mean(tf.square(fake))

    loss = real_loss + fake_loss

    return loss

Ⅱ generator loss

self.dis_loss = self.opts.fidelity_w * pc_distance(self.G_y, self.input_y, radius=self.pc_radius)

if self.opts.use_repulse:
    self.repulsion_loss = self.opts.repulsion_w*get_repulsion_loss(self.G_y)
else:
    self.repulsion_loss = 0

self.uniform_loss = self.opts.uniform_w * get_uniform_loss(self.G_y)
self.pu_loss = self.dis_loss + self.uniform_loss + self.repulsion_loss + tf.losses.get_regularization_loss()

self.G_gan_loss = self.opts.gan_w*generator_loss(self.D,self.G_y)
self.total_gen_loss = self.G_gan_loss + self.pu_loss

The generator loss consists of the reconstruction loss, the repulsion loss, the uniform loss and the adversarial loss.

1) adversarial loss

It closely mirrors the Discriminator's adversarial loss above:

def generator_loss(D,input_fake):
    fake = D(input_fake)

    fake_loss = tf.reduce_mean(tf.square(fake - 1.0))
    return fake_loss
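A quick numeric check makes the two least-squares losses concrete. The NumPy functions below mirror discriminator_loss and generator_loss from the code above (without the discriminator network itself); the confidence values are made up.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def g_loss(d_fake):
    """Least-squares generator loss: push D(fake) toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

d_real = np.array([0.9, 0.8])     # discriminator outputs on two real patches
d_fake = np.array([0.2, 0.4])     # discriminator outputs on two generated patches
loss_d = d_loss(d_real, d_fake)   # (0.01 + 0.04)/2 + (0.04 + 0.16)/2 = 0.125
loss_g = g_loss(d_fake)           # (0.64 + 0.36)/2 = 0.5
```

The two objectives pull D(fake) in opposite directions, which is exactly the adversarial game.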

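2) reconstruction loss

PU-GAN uses the EMD loss by default; for a detailed explanation of EMD, see this article:

Liu Xinchen: Point Cloud Distance Metrics: A Complete Analysis of the EMD Distance (Earth Mover's Distance) (zhuanlan.zhihu.com)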
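As a reminder of what EMD measures: it is the cost of the cheapest one-to-one matching between two point sets of equal size. The brute-force sketch below tries every bijection, so it is only usable on tiny sets; real training code relies on an approximate solver, and the summed (unnormalized) cost used here is my own choice of convention.

```python
from itertools import permutations
import numpy as np

def emd_bruteforce(a, b):
    """Exact EMD between two equal-size point sets by trying every bijection (tiny sets only)."""
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # (n, n) pairwise distances
    n = len(a)
    return min(sum(cost[i, perm[i]] for i in range(n)) for perm in permutations(range(n)))

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
b = a[[2, 0, 1]] + np.array([0.0, 0.0, 0.1])    # same shape, reordered and shifted by 0.1 in z
emd = emd_bruteforce(a, b)                       # each point matches its shifted copy: 3 * 0.1
```

Note that the reordering of b does not matter: the matching is found by the metric itself, which is why EMD suits unordered point sets.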

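3) repulsion loss

The repulsion loss design comes from PU-Net; for details, see this article:

Liu Xinchen: Reading Papers Slowly and Carefully: PU-Net, the Groundbreaking Point Cloud Upsampling Network (zhuanlan.zhihu.com)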

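4) uniform loss

A major contribution of PU-GAN is the uniform loss, which controls how uniformly the generated points are distributed.

PU-Net introduced the NUC metric to measure the uniformity of generated point clouds, but this metric ignores the local clutter of points and so is no longer a good choice.

What does "ignoring the local clutter of points" mean?

Consider three disks that contain the same number of points (so their NUC is identical) but clearly differ in how uniform they are. NUC fails here most likely because it cannot capture how uniformly the points are distributed locally.

The uniform loss is designed to account for both the global and the local aspect!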

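First term:

For a point set Q with rN points (in the experiments this is one patch):

step 1. Use farthest point sampling (FPS) to pick M seed points.

step 2. With each seed point as the center, run a ball query of radius $r_{d}$ to obtain the point subsets $S_{j}, j = 1...M$ .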

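Analysis:

If $r_{d}$ is small, $S_{j}$ lies roughly on a local disk of area $\pi r_{d}^{2}$ on the surface underlying Q.

Recall Ⅰ Patch Extraction above: patches are extracted by geodesic distance and then normalized, so each patch already sits inside a unit sphere and its surface area is roughly $\pi 1^{2}$

So the expected percentage of points p in $S_{j}$ should be $p=\pi r_{d}^{2}/\pi 1^{2}=r_{d}^{2}$

and the expected number of points $\hat{n}$ in $S_{j}$ should be rNp.

Following the chi-square model, the first term of the uniform loss measures the deviation of $\left | S_{j} \right |$ from $\hat{n}$ :

$U_{imbalance}(S_{j})=\frac{(\left | S_{j} \right |-\hat{n})^{2}}{\hat{n}}$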

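Second term:

To account for local point clutter, for each point in $S_{j}$ we find its nearest neighbor and compute the distance $d_{j,k}$ (k indexes the k-th point in $S_{j}$ ).

Suppose $S_{j}$ is uniformly distributed: if $S_{j}$ is flat and all point-to-neighbor distances are equal, the points are arranged in a hexagonal pattern.

In that case the expected point-to-neighbor distance is roughly $\hat{d}=\sqrt{\frac{2\pi r_{d}^{2}}{\left | S_{j} \right |\sqrt{3}}}$

Again following the chi-square model, the second term of the uniform loss measures the deviation of $d_{j,k}$ from $\hat{d}$ :

$U_{clutter}(S_{j})=\sum_{k=1}^{\left | S_{j} \right |}\frac{(d_{j,k}-\hat{d})^{2}}{\hat{d}}$

The final uniform loss is therefore:

$L_{uni}=\sum_{j=1}^{M}U_{imbalance}(S_{j})\cdot U_{clutter}(S_{j})$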
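Written out literally, the loss is: for every ball $S_{j}$, multiply a count term (how far $\left | S_{j} \right |$ is from the expected count $\hat{n}$) by a spacing term (how far each nearest-neighbor distance is from the hexagonal spacing $\hat{d}$), then sum over balls. The NumPy sketch below follows that description for a single patch and a single radius. It is a reading aid, not the training code: seeds are passed in as indices instead of being chosen by FPS, balls with fewer than two points are skipped, and the test patch is a random flat disk.

```python
import numpy as np

def uniform_loss(points, seeds_idx, r_d):
    """points: (n, 3) patch inside the unit sphere; seeds_idx: indices of the M seed points."""
    n = points.shape[0]
    n_hat = n * r_d ** 2                        # expected points per ball: n * p with p = r_d^2
    total = 0.0
    for j in seeds_idx:
        in_ball = np.linalg.norm(points - points[j], axis=1) <= r_d
        s = points[in_ball]
        if len(s) < 2:
            continue                            # no neighbor distance to measure
        imbalance = (len(s) - n_hat) ** 2 / n_hat
        d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nearest = d.min(axis=1)                 # distance to the nearest neighbor inside S_j
        d_hat = np.sqrt(2 * np.pi * r_d ** 2 / (len(s) * np.sqrt(3)))
        clutter = np.sum((nearest - d_hat) ** 2 / d_hat)
        total += imbalance * clutter
    return total

rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(4000, 2))
xy = xy[np.linalg.norm(xy, axis=1) <= 1][:1024]          # random samples on a flat unit disk
patch = np.concatenate([xy, np.zeros((len(xy), 1))], axis=1)
loss = uniform_loss(patch, seeds_idx=range(0, len(patch), 20), r_d=0.1)
```

Because the two factors are multiplied, a ball contributes nothing if its count is exactly as expected, and contributes a lot when it is both over- or under-populated and unevenly spaced.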

Implementation:

def get_uniform_loss(pcd, percentages=[0.004,0.006,0.008,0.010,0.012], radius=1.0):
    B,N,C = pcd.get_shape().as_list()
    npoint = int(N * 0.05)
    loss=[]
    for p in percentages:
        nsample = int(N*p)
        r = math.sqrt(p*radius)
        disk_area = math.pi *(radius ** 2) * p/nsample
        #print(npoint,nsample)
        new_xyz = gather_point(pcd, farthest_point_sample(npoint, pcd))  # (batch_size, npoint, 3)
        idx, pts_cnt = query_ball_point(r, nsample, pcd, new_xyz)#(batch_size, npoint, nsample)

        #expect_len =  tf.sqrt(2*disk_area/1.732)#using hexagon
        expect_len = tf.sqrt(disk_area)  # using square

        grouped_pcd = group_point(pcd, idx)
        grouped_pcd = tf.concat(tf.unstack(grouped_pcd, axis=1), axis=0)

        var, _ = knn_point(2, grouped_pcd, grouped_pcd)
        uniform_dis = -var[:, :, 1:]
        uniform_dis = tf.sqrt(tf.abs(uniform_dis+1e-8))
        uniform_dis = tf.reduce_mean(uniform_dis,axis=[-1])
        uniform_dis = tf.square(uniform_dis - expect_len) / (expect_len + 1e-8)
        uniform_dis = tf.reshape(uniform_dis, [-1])

        mean, variance = tf.nn.moments(uniform_dis, axes=0)
        mean = mean*math.pow(p*100,2)
        #nothing 4
        loss.append(mean)
    return tf.add_n(loss)/len(percentages)
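A few things in this TensorFlow function are worth spelling out. As written, it evaluates the nearest-neighbor spacing term at five scales p, uses a square-cell expected spacing (the hexagonal variant is commented out), weights each scale by (p*100)^2, and does not compute the count term explicitly. The plain-Python sketch below just reproduces the per-scale constants for a 1024-point patch so you can see what the ball query is asked for; scale_parameters is a name made up for this illustration.

```python
import math

def scale_parameters(n, p, radius=1.0):
    nsample = int(n * p)                                  # points requested per ball at this scale
    r = math.sqrt(p * radius)                             # ball-query radius
    area_per_point = math.pi * radius ** 2 * p / nsample  # disk_area in the code
    expect_len = math.sqrt(area_per_point)                # square-cell spacing used in the code
    return nsample, r, expect_len

percentages = [0.004, 0.006, 0.008, 0.010, 0.012]
params = {p: scale_parameters(1024, p) for p in percentages}
nsamples = [params[p][0] for p in percentages]            # [4, 6, 8, 10, 12]
```

So at the smallest scale each ball has radius of about 0.063 and is expected to hold only 4 points, which is why the loss is averaged over several scales rather than relying on one.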

4 dataset and experiments

4.1 dataset

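147 models were selected from the PU-Net and MPU datasets and the Visionair repository, covering as many different types as possible. Of these, 120 models are used for training and 27 for testing.

Preparing the training data: because PU-GAN is patch-based, patches must first be extracted from each model. 200 patches are extracted per training model, so the 120 models yield 24,000 patches in total. Each patch is an (input patch, groundtruth patch) pair: the input patch has 256 points and the groundtruth patch has 1024 points.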