Paper Reading: PointNet (Analysis + Coding)

Content

Background

Contribution

 PointNet

Main Problems Addressed

PointNet Architecture

Baseline

Symmetry Function:

Local and Global Information Aggregation:

Alignment Network:

Loss Computation

Experiment

Classification Results

Segmentation Results

My Own Results

Coding

Training

T-Net

Classification

Segmentation

Reference


Background

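Typical convolutional architectures require highly regular input formats, such as image grids or 3D voxels, so that weights can be shared and kernels optimized. As a result, most earlier work first converted point clouds into regular 3D voxel grids or collections of images, which adds a considerable amount of extra work. PointNet avoids this by applying deep learning directly to the point cloud.

For further improvements on PointNet, see PointNet++ (Analysis & Coding).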

Contribution

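  • Designs a deep network architecture that directly consumes unordered point sets in 3D space;
  • Applies PointNet to classification, segmentation, and scene semantic parsing;
  • Provides a thorough analysis of the stability and effectiveness of the method.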

 PointNet

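Main Problems Addressed

  • Unordered input: a point cloud has no inherent order, and reordering its points does not change the overall structure. A network that consumes N 3D points therefore has to be invariant to all N! orderings of its input. PointNet handles this with a symmetric function.
  • Interaction among points: points do not exist in isolation; only a point together with its neighbors forms a meaningful subset. A deep network for point clouds therefore has to extract local features as well as global ones. PointNet handles this with local and global feature aggregation.
  • Invariance under transformations: because a point cloud is a geometric object, the learned representation should be invariant to certain transformations. For example, rotating or translating the whole cloud should change neither its category nor its segmentation. PointNet handles this with an alignment network.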
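To make the third point concrete, here is a minimal NumPy sketch (not taken from the paper or its repository) showing that a rigid motion changes every coordinate of a cloud while leaving its shape, measured by pairwise distances, untouched. The helper names `rotation_about_z` and `pairwise_distances` and the random test cloud are assumptions made for this illustration only.

```python
import numpy as np

def rotation_about_z(theta):
    """3x3 rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def pairwise_distances(points):
    """All point-to-point Euclidean distances of an (N, 3) cloud, shape (N, N)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))          # illustrative random cloud

# Rigid motion: rotate every point, then translate the whole cloud.
moved = cloud @ rotation_about_z(np.pi / 3).T + np.array([0.5, -1.0, 2.0])

coords_changed = not np.allclose(cloud, moved)
shape_preserved = np.allclose(pairwise_distances(cloud), pairwise_distances(moved))
print("coordinates changed:", coords_changed)
print("pairwise distances preserved:", shape_preserved)
```

A network that looks only at raw coordinates sees two very different inputs here, which is the motivation for aligning the input before feature extraction.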

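PointNet Architecture

Baseline

PointNet pipeline

  1. Augment the point cloud (flipping, rotation, scaling, etc.) and use the result as input;
  2. Point_Encoder
    1. The input is an n*3 tensor. A T-Net predicts a 3*3 matrix that normalizes the cloud, i.e. every cloud is passed through the same network and mapped into a common canonical space. The T-Net is itself a small PointNet, and the network contains two of them: an input transform and a feature transform. The input T-Net rotates the cloud to a pose that is easier to segment; the feature T-Net aligns the features;
    2. A series of MLPs and the second T-Net then produce the final n*1024 features;
    3. Max pooling is applied. This is the highlight of the paper: used as a symmetric function, max pooling handles the unordered input, because the pooled result is the same no matter in what order the points are fed in;
    4. The segmentation network combines global and local features by concatenating the features from before and after max pooling. The global feature is repeated n times so that it can be concatenated with the n*64 per-point local features;
  3. Point-Decoder: task-specific heads for the downstream tasks.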
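Step 2.3 above can be sketched in a few lines of NumPy: a shared per-point MLP followed by a max over the point axis gives the same global feature for any ordering of the points. The layer widths (3 -> 16 -> 32) and the random weights are assumptions chosen to keep the example small; they are not the sizes PointNet uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny shared MLP: 3 -> 16 -> 32 (PointNet itself uses wider layers).
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def shared_mlp(points):
    """Apply the same MLP to every point independently: (N, 3) -> (N, 32)."""
    hidden = np.maximum(points @ W1 + b1, 0.0)    # ReLU
    return np.maximum(hidden @ W2 + b2, 0.0)

def global_feature(points):
    """Max over the point axis, so the result does not depend on point order."""
    return shared_mlp(points).max(axis=0)          # (32,)

cloud = rng.normal(size=(1024, 3))
shuffled = cloud[rng.permutation(len(cloud))]      # same points, different order

same = np.allclose(global_feature(cloud), global_feature(shuffled))
print("global feature unchanged by shuffling:", same)
```

Replacing `max` with `sum` or `mean` would also be permutation-invariant; max pooling is the choice described in the text.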

Symmetry Function

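To make the network invariant to input permutation, the authors consider three strategies:

  1. Sort the input points into a canonical order
  2. Treat the input as a sequence and train an RNN
  3. Use a simple symmetric function

Strategy 1 does not work because no ordering in high-dimensional space is stable with respect to point perturbations. If such an ordering existed, it would define a map to a lower-dimensional space that preserves spatial proximity, which cannot be achieved in general.

Strategy 2 falls short because an RNN is reasonably robust to input order only for short sequences, whereas a point cloud usually has thousands of elements.

Strategy 3 approximates a general function defined on a point set by applying a symmetric function to transformed elements of the set. This is what PointNet uses. The symmetric function is constructed as:

$$f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n))$$

Here g(h(x_1), ..., h(x_n)) is meant to approximate f. In the implementation, h is an MLP shared by all points and g is max pooling. The three strategies compare as follows: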

Local and Global Information Aggregation:

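Point cloud segmentation needs global features as well as local ones. PointNet concatenates the per-point features taken before max pooling with the global feature obtained after it, and then continues extracting per-point features from the concatenated data. Each point's feature is thus aware of both local and global information. The concatenation is illustrated in the architecture figure of the Baseline section.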
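The aggregation step can be sketched with plain NumPy for a single cloud. The shapes (64 local channels, 1024 global channels, 1024 points) follow the segmentation listing later in this post; the random arrays are stand-ins for real network activations and are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
num_point = 1024

point_feat = rng.normal(size=(num_point, 64))        # per-point (local) features
pre_pool_feat = rng.normal(size=(num_point, 1024))   # stand-in for the features fed to max pooling

global_feat = pre_pool_feat.max(axis=0, keepdims=True)         # (1, 1024)
global_feat_expand = np.tile(global_feat, (num_point, 1))      # (1024, 1024): one copy per point
concat_feat = np.concatenate([point_feat, global_feat_expand], axis=1)

print(concat_feat.shape)   # each point now carries 64 local + 1024 global channels
```

Every row of `concat_feat` ends with the same 1024 global values, so the later per-point layers can relate each point to the shape as a whole.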

Alignment Network:

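A rigid transformation does not change the structure or shape of a point cloud. To keep the semantic labeling unchanged when the cloud undergoes such geometric transformations, the authors add an alignment network, which also aligns the feature space. In PointNet, a second alignment network (T-Net) is inserted on the point features; it predicts a feature transformation matrix that aligns the features of different input clouds. Because a transformation matrix in feature space has a much higher dimension than the spatial one, it is harder to optimize. A regularization term is therefore added to the training loss that constrains the feature transformation matrix to be close to orthogonal. With A denoting the feature alignment matrix predicted by the T-Net:

$$L_{reg} = \| I - A A^T \|_F^2$$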
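As a quick numerical check of the regularizer, which the paper writes as the squared Frobenius norm of I - A A^T, the sketch below evaluates it for an orthogonal matrix and for a perturbed copy. The function name `orthogonality_penalty` and the noise level 0.1 are assumptions made for this illustration.

```python
import numpy as np

def orthogonality_penalty(A):
    """Squared Frobenius norm of (I - A A^T) for one K x K matrix."""
    K = A.shape[0]
    diff = np.eye(K) - A @ A.T
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
K = 64

# The Q factor of a QR decomposition is orthogonal, so its penalty is ~0.
Q, _ = np.linalg.qr(rng.normal(size=(K, K)))
noisy = Q + 0.1 * rng.normal(size=(K, K))   # no longer orthogonal

penalty_orthogonal = orthogonality_penalty(Q)
penalty_noisy = orthogonality_penalty(noisy)
print(penalty_orthogonal, penalty_noisy)
```

The penalty vanishes (up to rounding) exactly when A is orthogonal, so minimizing it pushes the predicted 64*64 matrix toward a transformation that does not distort the feature space.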

The T-Net structure is as follows:

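Both T-Nets perform the same two steps. First, the input (the raw point cloud, [32, 1024, 3], for the 3*3 transform; the per-point features, [32, 1024, 1, 64], for the 64*64 transform) is reduced to [32, 256] by mlp(64,128,1024) + max pooling + fully connected layers (512, 256). Second, a final linear layer with learnable weights and biases maps [32, 256] to the flattened matrix; the weights are initialized to zero and the identity matrix is added to the bias. The 3*3 transform reshapes the result from [32, 9] to [32, 3, 3], and the 64*64 transform from [32, 4096] to [32, 64, 64]. See the code for details.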
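One consequence of how the last T-Net layer is set up in the listing (zero weights, identity added to the bias) is that the predicted transform starts out as the identity, whatever the input features are. The NumPy sketch below reproduces just that last layer; `regress_transform` is a hypothetical helper name and the random features are stand-ins for the real fc2 output.

```python
import numpy as np

def regress_transform(features, weights, biases, K):
    """Final T-Net layer: (B, 256) features -> (B, K, K) matrices."""
    flat = features @ weights + biases           # (B, K*K)
    return flat.reshape(-1, K, K)

rng = np.random.default_rng(0)
B, K = 4, 3
features = rng.normal(size=(B, 256))             # stand-in for the fc2 output

weights = np.zeros((256, K * K))                 # zero-initialised weights
biases = np.zeros(K * K) + np.eye(K).flatten()   # zero bias plus flattened identity

transform = regress_transform(features, weights, biases, K)
points = rng.normal(size=(B, 1024, 3))
transformed = points @ transform                 # batched (B, N, 3) x (B, 3, 3)

print(transform[0])
```

At initialization the network therefore behaves as if no T-Net were present, and training moves the matrix away from the identity only as far as it helps.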

Performance with the T-Net added:

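Loss Computation

Multi-class cross-entropy loss + the orthogonality regularization term from the Alignment Network, weighted by reg_weight (0.001 in the code).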
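The combined loss can be written out in NumPy to show how the two terms are put together. This is a sketch mirroring the structure of `get_loss` in the listings below, not a replacement for it; the function names are assumptions, and the 0.5 factor reflects that `tf.nn.l2_loss` returns half the sum of squares.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy from raw logits (B, C) and integer labels (B,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def total_loss(logits, labels, transform, reg_weight=0.001):
    """Mean cross-entropy plus the weighted orthogonality term over the batch."""
    classify_loss = sparse_softmax_cross_entropy(logits, labels).mean()
    K = transform.shape[1]
    mat_diff = transform @ np.transpose(transform, (0, 2, 1)) - np.eye(K)
    mat_diff_loss = 0.5 * np.sum(mat_diff ** 2)   # half the sum of squares, over the whole batch
    return classify_loss + reg_weight * mat_diff_loss

logits = np.zeros((2, 40))                        # uniform prediction over 40 classes
labels = np.array([0, 5])
identity = np.tile(np.eye(3), (2, 1, 1))          # perfectly orthogonal transforms

loss_identity = total_loss(logits, labels, identity)
loss_scaled = total_loss(logits, labels, 2.0 * identity)
print(loss_identity, loss_scaled)
```

With uniform logits the cross-entropy is ln(40), and only the non-orthogonal transform adds a penalty on top of it.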

Experiment

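Classification Results

Segmentation Results

My Own Results

I ran the classification experiment myself on Colab for 250 epochs on a Tesla P100; the results are shown in the figure below: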

Coding

Training

import argparse
import math
import h5py
import numpy as np
import tensorflow as tf
import socket
import importlib
import os
import sys

import matplotlib.pyplot as plt
from IPython import display
BASE_DIR = os.path.dirname(os.path.abspath(__file__))      # directory containing this script
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(BASE_DIR, 'models'))
sys.path.append(os.path.join(BASE_DIR, 'utils'))
import provider
import tf_util





parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=int, default=0, help='GPU to use [default: GPU 0]')
parser.add_argument('--model', default='pointnet_cls', help='Model name: pointnet_cls or pointnet_cls_basic [default: pointnet_cls]')
parser.add_argument('--log_dir', default='log', help='Log dir [default: log]')
parser.add_argument('--num_point', type=int, default=1024, help='Point Number [256/512/1024/2048] [default: 1024]')
parser.add_argument('--max_epoch', type=int, default=250, help='Epoch to run [default: 250]')
parser.add_argument('--batch_size', type=int, default=32, help='Batch Size during training [default: 32]')
parser.add_argument('--learning_rate', type=float, default=0.001, help='Initial learning rate [default: 0.001]')
parser.add_argument('--momentum', type=float, default=0.9, help='Momentum for the momentum optimizer [default: 0.9]')
parser.add_argument('--optimizer', default='adam', help='adam or momentum [default: adam]')
parser.add_argument('--decay_step', type=int, default=200000, help='Decay step for lr decay [default: 200000]')
parser.add_argument('--decay_rate', type=float, default=0.7, help='Decay rate for lr decay [default: 0.7]')
FLAGS = parser.parse_args()


BATCH_SIZE = FLAGS.batch_size
NUM_POINT = FLAGS.num_point
MAX_EPOCH = FLAGS.max_epoch
BASE_LEARNING_RATE = FLAGS.learning_rate         #Basic Learning rate
GPU_INDEX = FLAGS.gpu
MOMENTUM = FLAGS.momentum
OPTIMIZER = FLAGS.optimizer
DECAY_STEP = FLAGS.decay_step
DECAY_RATE = FLAGS.decay_rate

MODEL = importlib.import_module(FLAGS.model)     # import network module
MODEL_FILE = os.path.join(BASE_DIR, 'models', FLAGS.model+'.py')
LOG_DIR = FLAGS.log_dir
if not os.path.exists(LOG_DIR): os.mkdir(LOG_DIR)
os.system('cp %s %s' % (MODEL_FILE, LOG_DIR))    # bkp of model def    copy
os.system('cp train.py %s' % (LOG_DIR))          # bkp of train procedure
LOG_FOUT = open(os.path.join(LOG_DIR, 'log_train.txt'), 'w')
LOG_FOUT.write(str(FLAGS)+'\n')

MAX_NUM_POINT = 2048
NUM_CLASSES = 40

BN_INIT_DECAY = 0.5
BN_DECAY_DECAY_RATE = 0.5
BN_DECAY_DECAY_STEP = float(DECAY_STEP)
BN_DECAY_CLIP = 0.99

HOSTNAME = socket.gethostname()

# ModelNet40 official train/test split
TRAIN_FILES = provider.getDataFiles( \
    os.path.join(BASE_DIR, 'data/modelnet40_ply_hdf5_2048/train_files.txt'))
TEST_FILES = provider.getDataFiles(\
    os.path.join(BASE_DIR, 'data/modelnet40_ply_hdf5_2048/test_files.txt'))


mean_loss = []
mean_accurcy= []
Eval_mean_loss = []
Eval_accuracy = []
Eval_avg_class_acc = []
Global_epoch = 0


# write log
def log_string(out_str):
    LOG_FOUT.write(out_str+'\n')
    LOG_FOUT.flush()   # flush the write buffer
    print(out_str)

#get learning rate
def get_learning_rate(batch):
    # exponentially decayed learning rate
    learning_rate = tf.train.exponential_decay(
                        BASE_LEARNING_RATE,  # Base learning rate.
                        batch * BATCH_SIZE,  # Current index into the dataset.
                        DECAY_STEP,          # Decay step.
                        DECAY_RATE,          # Decay rate.
                        staircase=True)
    learning_rate = tf.maximum(learning_rate, 0.00001) # CLIP THE LEARNING RATE!
    return learning_rate        

def get_bn_decay(batch):
    bn_momentum = tf.train.exponential_decay(
                      BN_INIT_DECAY,
                      batch*BATCH_SIZE,
                      BN_DECAY_DECAY_STEP,
                      BN_DECAY_DECAY_RATE,
                      staircase=True)
    bn_decay = tf.minimum(BN_DECAY_CLIP, 1 - bn_momentum)
    return bn_decay

def train():
    with tf.Graph().as_default():
        with tf.device('/gpu:'+str(GPU_INDEX)):
            pointclouds_pl, labels_pl = MODEL.placeholder_inputs(BATCH_SIZE, NUM_POINT)
            is_training_pl = tf.placeholder(tf.bool, shape=())
            print(is_training_pl)
            
            # Note the global_step=batch parameter to minimize. 
            # That tells the optimizer to helpfully increment the 'batch' parameter for you every time it trains.
            batch = tf.Variable(0)
            bn_decay = get_bn_decay(batch)
            tf.summary.scalar('bn_decay', bn_decay)

            # Get model and loss 
            pred, end_points = MODEL.get_model(pointclouds_pl, is_training_pl, bn_decay=bn_decay)
            loss = MODEL.get_loss(pred, labels_pl, end_points)
            tf.summary.scalar('loss', loss)

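            correct = tf.equal(tf.argmax(pred, 1), tf.to_int64(labels_pl)) # element-wise equality; argmax over axis 1 gives the predicted class of each row, to_int64 casts the labels to int64
            accuracy = tf.reduce_sum(tf.cast(correct, tf.float32)) / float(BATCH_SIZE) # number of correct predictions divided by the batch size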
            tf.summary.scalar('accuracy', accuracy)

            # Get training operator
            learning_rate = get_learning_rate(batch)
            tf.summary.scalar('learning_rate', learning_rate)
            if OPTIMIZER == 'momentum':
                optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=MOMENTUM)
            elif OPTIMIZER == 'adam':
                optimizer = tf.train.AdamOptimizer(learning_rate)
            train_op = optimizer.minimize(loss, global_step=batch)  # compute the gradients of loss w.r.t. var_list and apply them to the corresponding variables
            
            # Add ops to save and restore all the variables.
            saver = tf.train.Saver()
            
        # Create a session
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        config.allow_soft_placement = True
        config.log_device_placement = False
        sess = tf.Session(config=config)

        # Add summary writers
        #merged = tf.merge_all_summaries()
        merged = tf.summary.merge_all()
        train_writer = tf.summary.FileWriter(os.path.join(LOG_DIR, 'train'),
                                  sess.graph)
        test_writer = tf.summary.FileWriter(os.path.join(LOG_DIR, 'test'))

        # Init variables
        init = tf.global_variables_initializer()
        # To fix the bug introduced in TF 0.12.1 as in
        # http://stackoverflow.com/questions/41543774/invalidargumenterror-for-tensor-bool-tensorflow-0-12-1
        #sess.run(init)
        sess.run(init, {is_training_pl: True})

        ops = {'pointclouds_pl': pointclouds_pl,
               'labels_pl': labels_pl,
               'is_training_pl': is_training_pl,
               'pred': pred,
               'loss': loss,
               'train_op': train_op,
               'merged': merged,
               'step': batch}


        

        # start training
        for epoch in range(MAX_EPOCH):
            Global_epoch = epoch
            log_string('**** EPOCH %03d ****' % (epoch))
            sys.stdout.flush()
             
            train_one_epoch(sess, ops, train_writer)
            eval_one_epoch(sess, ops, test_writer)
            
            fig, ax = plt.subplots(nrows=2, ncols=3)
            fig.set_figwidth(20)
            plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.5)
            
            ax[0][0].plot(mean_loss)
            ax[0][0].set_title('Training_mean loss')
            #print("Mean_loss",mean_loss)
            ax[0][1].plot(mean_accurcy)
            ax[0][1].set_title('Training_mean accuracy')
            ax[1][0].plot(Eval_mean_loss)
            ax[1][0].set_title('Eval_mean loss')
            ax[1][1].plot(Eval_accuracy)
            ax[1][1].set_title('Eval_accuracy')
            ax[1][2].plot(Eval_avg_class_acc)
            ax[1][2].set_title('Eval_avg class acc')
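            os.makedirs('./Picture', exist_ok=True)   # make sure the output folder exists
            plt.savefig('./Picture/'+'Epoch_'+str(Global_epoch)+'.png')
            plt.show()
            plt.close(fig)                            # release the figure; a new one is created every epoch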
            print("Epoch======>",Global_epoch)
            

            # Save the variables to disk.
            if epoch % 10 == 0:
                save_path = saver.save(sess, os.path.join(LOG_DIR, "model.ckpt"))
                log_string("Model saved in file: %s" % save_path)
       


def train_one_epoch(sess, ops, train_writer):
    """ ops: dict mapping from string to tf ops """
    is_training = True
    
    # Shuffle train files
    train_file_idxs = np.arange(0, len(TRAIN_FILES))
    np.random.shuffle(train_file_idxs)
    for fn in range(len(TRAIN_FILES)):
        log_string('----' + str(fn) + '-----')
        current_data, current_label = provider.loadDataFile(TRAIN_FILES[train_file_idxs[fn]])
        current_data = current_data[:,0:NUM_POINT,:]
        current_data, current_label, _ = provider.shuffle_data(current_data, np.squeeze(current_label))            
        current_label = np.squeeze(current_label)
        
        file_size = current_data.shape[0]
        num_batches = file_size // BATCH_SIZE
        
        total_correct = 0
        total_seen = 0
        loss_sum = 0
       
       
        for batch_idx in range(num_batches):
            start_idx = batch_idx * BATCH_SIZE
            end_idx = (batch_idx+1) * BATCH_SIZE
            
            # Augment batched point clouds by rotation and jittering
            rotated_data = provider.rotate_point_cloud(current_data[start_idx:end_idx, :, :])
            jittered_data = provider.jitter_point_cloud(rotated_data)
            feed_dict = {ops['pointclouds_pl']: jittered_data,
                         ops['labels_pl']: current_label[start_idx:end_idx],
                         ops['is_training_pl']: is_training,}
            summary, step, _, loss_val, pred_val = sess.run([ops['merged'], ops['step'],
                ops['train_op'], ops['loss'], ops['pred']], feed_dict=feed_dict)
            train_writer.add_summary(summary, step)
            pred_val = np.argmax(pred_val, 1)
            correct = np.sum(pred_val == current_label[start_idx:end_idx])
            total_correct += correct
            total_seen += BATCH_SIZE
            loss_sum += loss_val
        
        log_string('mean loss: %f' % (loss_sum / float(num_batches)))
        log_string('accuracy: %f' % (total_correct / float(total_seen)))



        mean_loss.append(loss_sum / float(num_batches))
        #print(mean_loss)
        mean_accurcy.append(total_correct / float(total_seen))
        #print(mean_accurcy)

        

def eval_one_epoch(sess, ops, test_writer):
    """ ops: dict mapping from string to tf ops """
    is_training = False
    total_correct = 0
    total_seen = 0
    loss_sum = 0
    total_seen_class = [0 for _ in range(NUM_CLASSES)]
    total_correct_class = [0 for _ in range(NUM_CLASSES)]
    
    for fn in range(len(TEST_FILES)):
        log_string('----' + str(fn) + '-----')
        current_data, current_label = provider.loadDataFile(TEST_FILES[fn])
        current_data = current_data[:,0:NUM_POINT,:]
        current_label = np.squeeze(current_label)
        
        file_size = current_data.shape[0]
        num_batches = file_size // BATCH_SIZE
        
        for batch_idx in range(num_batches):
            start_idx = batch_idx * BATCH_SIZE
            end_idx = (batch_idx+1) * BATCH_SIZE

            feed_dict = {ops['pointclouds_pl']: current_data[start_idx:end_idx, :, :],
                         ops['labels_pl']: current_label[start_idx:end_idx],
                         ops['is_training_pl']: is_training}
            summary, step, loss_val, pred_val = sess.run([ops['merged'], ops['step'],
                ops['loss'], ops['pred']], feed_dict=feed_dict)
            pred_val = np.argmax(pred_val, 1)
            correct = np.sum(pred_val == current_label[start_idx:end_idx])
            total_correct += correct
            total_seen += BATCH_SIZE
            loss_sum += (loss_val*BATCH_SIZE)
            for i in range(start_idx, end_idx):
                l = current_label[i]
                total_seen_class[l] += 1
                total_correct_class[l] += (pred_val[i-start_idx] == l)
    
    log_string('eval mean loss: %f' % (loss_sum / float(total_seen)))
    log_string('eval accuracy: %f'% (total_correct / float(total_seen)))
    log_string('eval avg class acc: %f' % (np.mean(np.array(total_correct_class)/np.array(total_seen_class,dtype=float))))

    Eval_mean_loss.append(loss_sum / float(total_seen))
    Eval_accuracy.append(total_correct / float(total_seen))
    Eval_avg_class_acc.append(np.mean(np.array(total_correct_class)/np.array(total_seen_class,dtype=float)))
  

if __name__ == "__main__":
    train()
    LOG_FOUT.close()
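The two schedules in the training script, `get_learning_rate` and `get_bn_decay`, can be reproduced in plain Python, which makes it easy to see which values they produce at a given step. This sketch assumes the staircase form of exponential decay, `initial * rate ** floor(step / decay_step)`, and the default flag values from the listing; the function names ending in `_at` are made up for this example.

```python
BASE_LEARNING_RATE = 0.001
DECAY_STEP = 200000
DECAY_RATE = 0.7
BATCH_SIZE = 32

BN_INIT_DECAY = 0.5
BN_DECAY_DECAY_RATE = 0.5
BN_DECAY_CLIP = 0.99

def staircase_decay(initial, samples_seen, decay_step, decay_rate):
    """Exponential decay applied in discrete steps (staircase=True)."""
    return initial * decay_rate ** (samples_seen // decay_step)

def learning_rate_at(batch):
    """Learning rate after `batch` optimizer steps, clipped from below at 1e-5."""
    lr = staircase_decay(BASE_LEARNING_RATE, batch * BATCH_SIZE, DECAY_STEP, DECAY_RATE)
    return max(lr, 0.00001)

def bn_decay_at(batch):
    """Batch-norm decay after `batch` optimizer steps, clipped from above at 0.99."""
    bn_momentum = staircase_decay(BN_INIT_DECAY, batch * BATCH_SIZE, DECAY_STEP, BN_DECAY_DECAY_RATE)
    return min(BN_DECAY_CLIP, 1 - bn_momentum)

steps_per_decay = DECAY_STEP // BATCH_SIZE        # 6250 optimizer steps per decay
for k in (0, 1, 7, 13):
    batch = k * steps_per_decay
    print(k, learning_rate_at(batch), bn_decay_at(batch))
```

Both quantities change once every 200000 training samples: the learning rate shrinks by a factor of 0.7 until it reaches the floor, while the batch-norm decay rises from 0.5 toward the 0.99 cap.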

T-Net

import tensorflow as tf
import numpy as np
import sys
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(BASE_DIR, '../utils'))
import tf_util
#Transform_net
#Takes the raw point cloud as input and regresses a 3*3 matrix
def input_transform_net(point_cloud, is_training, bn_decay=None, K=3):
    """ Input (XYZ) Transform Net, input is BxNx3 gray image
        Return:
            Transformation matrix of size 3xK """
    batch_size = point_cloud.get_shape()[0].value      # B: batch size, 32
    num_point = point_cloud.get_shape()[1].value       # N: number of points per cloud, 1024
    #mlp(64,128,1024)
    input_image = tf.expand_dims(point_cloud, -1)      # [32,1024,3] --> [32,1024,3,1]
    net = tf_util.conv2d(input_image, 64, [1,3],       # [32,1024,3,1] --> [32,1024,1,64]   kernel: 1x3
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],              # [32,1024,1,64] --> [32,1024,1,128]   kernel: 1x1
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],             # [32,1024,1,128] --> [32,1024,1,1024]   kernel: 1x1
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    net = tf_util.max_pool2d(net, [num_point,1],       # [32,1024,1,1024] --> [32,1,1,1024]
                             padding='VALID', scope='tmaxpool')

    net = tf.reshape(net, [batch_size, -1])    #[32,1,1,1024]  -->   [32,1024]
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,    #mlp:512    [32,1024] --> [32,512]
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,    #mlp:256    [32,512] --> [32,256]
                                  scope='tfc2', bn_decay=bn_decay)
    #regress the point cloud transformation matrix
    with tf.variable_scope('transform_XYZ') as sc:
        assert(K==3)
        weights = tf.get_variable('weights', [256, 3*K],      #variable of shape [256,3*3]; the 256 rows match the feature width for the matmul below
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [3*K],             #variable of shape [3*3]
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant([1,0,0,0,1,0,0,0,1], dtype=tf.float32)    #add the flattened 3x3 identity, so the initial transform is the identity
        transform = tf.matmul(net, weights)              #[32,256] * [256,3*3] = [32,3*3]    linear part of the regression
        transform = tf.nn.bias_add(transform, biases)    #[32,3*3] + [3*3] = [32,3*3]     bias added to every row

    transform = tf.reshape(transform, [batch_size, 3, K]) #[32,3*3] --> [32,3,3]
    return transform


def feature_transform_net(inputs, is_training, bn_decay=None, K=64):
    """ Feature Transform Net, input is BxNx1xK
        Return:
            Transformation matrix of size KxK """
    batch_size = inputs.get_shape()[0].value
    num_point = inputs.get_shape()[1].value

    net = tf_util.conv2d(inputs, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    with tf.variable_scope('transform_feat') as sc:
        weights = tf.get_variable('weights', [256, K*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [K*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant(np.eye(K).flatten(), dtype=tf.float32)
        transform = tf.matmul(net, weights)
        transform = tf.nn.bias_add(transform, biases)

    transform = tf.reshape(transform, [batch_size, K, K])
    return transform
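The first layer in these networks is a 2D convolution with a [1,3] kernel over the [N,3,1] "image" of a point cloud. Because the kernel spans the full x, y, z width, it acts as one dense layer 3 -> 64 shared by all points. The NumPy sketch below checks that equivalence with a deliberately naive convolution; it leaves out bias, batch norm and ReLU, and `conv_1x3_valid` is a hypothetical helper, not part of `tf_util`.

```python
import numpy as np

def conv_1x3_valid(image, kernel):
    """Naive VALID convolution of an (N, 3, 1) image with a (1, 3, 1, C) kernel.

    The kernel covers the whole width (x, y, z), so every row produces a single
    C-dimensional output vector: result shape (N, 1, C).
    """
    N = image.shape[0]
    C = kernel.shape[-1]
    out = np.zeros((N, 1, C))
    for n in range(N):
        for c in range(C):
            out[n, 0, c] = np.sum(image[n, :, 0] * kernel[0, :, 0, c])
    return out

rng = np.random.default_rng(0)
points = rng.normal(size=(16, 3))                 # one small cloud
kernel = rng.normal(size=(1, 3, 1, 64))           # [height, width, in_channels, out_channels]

conv_out = conv_1x3_valid(points[:, :, None], kernel)[:, 0, :]   # (16, 64)
dense_out = points @ kernel[0, :, 0, :]                          # (16, 3) x (3, 64)

print(np.allclose(conv_out, dense_out))
```

The later [1,1] convolutions follow the same idea on the channel axis, which is why the listings label groups of them as mlp(...).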

Classification

import tensorflow as tf
import numpy as np
import math
import sys
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(BASE_DIR, '../utils'))
import tf_util
from transform_nets import input_transform_net, feature_transform_net

def placeholder_inputs(batch_size, num_point):
    pointclouds_pl = tf.placeholder(tf.float32, shape=(batch_size, num_point, 3))     #[B,N,3]--->[32,1024,3]
    labels_pl = tf.placeholder(tf.int32, shape=(batch_size))            #[B]--->[32]
    return pointclouds_pl, labels_pl


def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value          #Batch size == 32
    num_point = point_cloud.get_shape()[1].value           #Num of points == 1024
    end_points = {}
    #Input_transform
    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)      #transform: [32,3,3]
    point_cloud_transformed = tf.matmul(point_cloud, transform)                       #point_cloud_transformed: transformed point cloud [32,1024,3] (mapped into the aligned space)
    input_image = tf.expand_dims(point_cloud_transformed, -1)                         #input_image: [32,1024,3,1]
    #mlp(64,64)
    net = tf_util.conv2d(input_image, 64, [1,3],                                      #[32,1024,3,1]-->[32,1024,1,64]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1],                                              #[32,1024,1,64]-->[32,1024,1,64]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)
    #feature_transform
    with tf.variable_scope('transform_net2') as sc:                                   #[32,1024,1,64]-->[32,1024,1,64]
        transform = feature_transform_net(net, is_training, bn_decay, K=64)           #transform: [32,64,64]
    end_points['transform'] = transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=[2]), transform)                 #net_transformed: [32,1024,64]
    net_transformed = tf.expand_dims(net_transformed, [2])                            #net_transformed: [32,1024,1,64]
    #mlp(64,128,1024)
    net = tf_util.conv2d(net_transformed, 64, [1,1],                                  #[32,1024,1,64]-->[32,1024,1,64]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],                                             #[32,1024,1,64]-->[32,1024,1,128]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],                                            #[32,1024,1,128]-->[32,1024,1,1024]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay)

    # Symmetric function: max pooling
    net = tf_util.max_pool2d(net, [num_point,1],                                      #[32,1024,1,1024]-->[32,1,1,1024]
                             padding='VALID', scope='maxpool')
    #mlp(512,256,k)
    net = tf.reshape(net, [batch_size, -1])                                           #[32,1,1,1024]-->[32,1024]
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,         #[32,1024]-->[32,512]
                                  scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,                #[32,512]-->[32,512]     dropout, keep_prob: 0.7
                          scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,         #[32,512]-->[32,256]
                                  scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,                #[32,256]-->[32,256]     dropout, keep_prob: 0.7
                          scope='dp2')
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')           #[32,256]-->[32,40]      

    return net, end_points

#loss
def get_loss(pred, label, end_points, reg_weight=0.001):                              #pred: [32,40] (logits), label: [32] (labels_pl)
    """ pred: B*NUM_CLASSES,
        label: B, """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=label)  # softmax + cross_entropy: applies softmax to the logits, then computes the cross-entropy; for mutually exclusive classes
    classify_loss = tf.reduce_mean(loss)
    tf.summary.scalar('classify loss', classify_loss)

    # Enforce the transformation as orthogonal matrix
    transform = end_points['transform'] # BxKxK
    K = transform.get_shape()[1].value
    mat_diff = tf.matmul(transform, tf.transpose(transform, perm=[0,2,1]))            #[32,K,K]-->[32,K,K]     A*A^T
    mat_diff -= tf.constant(np.eye(K), dtype=tf.float32)                              #subtract the K*K identity matrix
    mat_diff_loss = tf.nn.l2_loss(mat_diff)                                           #sum of squared entries / 2
    tf.summary.scalar('mat loss', mat_diff_loss)

    return classify_loss + mat_diff_loss * reg_weight


if __name__=='__main__':
    with tf.Graph().as_default():
        inputs = tf.zeros((32,1024,3))
        outputs = get_model(inputs, tf.constant(True))
        print(outputs)

Segmentation

import tensorflow as tf
import numpy as np
import math
import sys
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(BASE_DIR, '../utils'))
import tf_util
from transform_nets import input_transform_net, feature_transform_net

def placeholder_inputs(batch_size, num_point):
    pointclouds_pl = tf.placeholder(tf.float32,
                                     shape=(batch_size, num_point, 3))
    labels_pl = tf.placeholder(tf.int32,
                                shape=(batch_size, num_point))
    return pointclouds_pl, labels_pl


def get_model(point_cloud, is_training, bn_decay=None):
    """ Segmentation PointNet, input is BxNx3, output BxNx50 """
    batch_size = point_cloud.get_shape()[0].value                                   #Batch_size ---> 32
    num_point = point_cloud.get_shape()[1].value                                    #num_point ---> 1024
    end_points = {}
    #Input_transform
    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)    #[BxNx3] ---> [32,3,3]
    point_cloud_transformed = tf.matmul(point_cloud, transform)                     #point_cloud_transformed: transformed point cloud [32,1024,3] (mapped into the aligned space)
    input_image = tf.expand_dims(point_cloud_transformed, -1)                       #input_image: [32,1024,3,1]
    #mlp(64,64)
    net = tf_util.conv2d(input_image, 64, [1,3],                                    #[32,1024,3,1]-->[32,1024,1,64]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1],                                            #[32,1024,1,64]-->[32,1024,1,64]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)

    with tf.variable_scope('transform_net2') as sc:                                 #[32,1024,1,64]-->[32,1024,1,64]
        transform = feature_transform_net(net, is_training, bn_decay, K=64)         #transform: [32,64,64]
    end_points['transform'] = transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=[2]), transform)               #net_transformed: [32,1024,64]
    point_feat = tf.expand_dims(net_transformed, [2])                               #point_feat: [32,1024,1,64]
    print(point_feat)
    #mlp(64,128,1024)
    net = tf_util.conv2d(point_feat, 64, [1,1],                                     #[32,1024,1,64]-->[32,1024,1,64]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)                          
    net = tf_util.conv2d(net, 128, [1,1],                                           #[32,1024,1,64]-->[32,1024,1,128]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],                                          #[32,1024,1,128]-->[32,1024,1,1024]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay)
    #Max-pooling
    global_feat = tf_util.max_pool2d(net, [num_point,1],                            #[32,1024,1,1024]-->[32,1,1,1024]  
                                     padding='VALID', scope='maxpool')
    print(global_feat)
    #Aggregation(point_feat + global_feat)
    global_feat_expand = tf.tile(global_feat, [1, num_point, 1, 1])                 #[32,1,1,1024]-->[32,1024,1,1024]      
    concat_feat = tf.concat([point_feat, global_feat_expand], axis=3)               #[32,1024,1,64+1024]   TF >= 1.0 argument order: values first, then axis
    print(concat_feat)
    #mlp(512,256,128)
    net = tf_util.conv2d(concat_feat, 512, [1,1],                                   #[32,1024,1,64+1024]-->[32,1024,1,512]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv6', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 256, [1,1],                                           #[32,1024,1,512]-->[32,1024,1,256]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv7', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],                                           #[32,1024,1,256]-->[32,1024,1,128]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv8', bn_decay=bn_decay)
    #mlp(128,m) m=50                    
    net = tf_util.conv2d(net, 128, [1,1],                                           #[32,1024,1,128]-->[32,1024,1,128]
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv9', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 50, [1,1],                                            #[32,1024,1,128]-->[32,1024,1,50]
                         padding='VALID', stride=[1,1], activation_fn=None,
                         scope='conv10')
    net = tf.squeeze(net, [2]) # BxNxC                                              #[32,1024,1,50]-->[32,1024,50]

    return net, end_points


def get_loss(pred, label, end_points, reg_weight=0.001):                            #the T-Net regularizer gets a small weight
    """ pred: BxNxC,
        label: BxN, """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=label)
    classify_loss = tf.reduce_mean(loss)
    tf.summary.scalar('classify loss', classify_loss)

    # Enforce the transformation as orthogonal matrix
    transform = end_points['transform'] # BxKxK
    K = transform.get_shape()[1].value
    mat_diff = tf.matmul(transform, tf.transpose(transform, perm=[0,2,1]))        #[32,K,K]-->[32,K,K]     A*A^T
    mat_diff -= tf.constant(np.eye(K), dtype=tf.float32)                          #subtract the K*K identity matrix
    mat_diff_loss = tf.nn.l2_loss(mat_diff)                                       #sum of squared entries / 2
    tf.summary.scalar('mat_loss', mat_diff_loss)

    return classify_loss + mat_diff_loss * reg_weight


if __name__=='__main__':
    with tf.Graph().as_default():
        inputs = tf.zeros((32,1024,3))
        outputs = get_model(inputs, tf.constant(True))
        print(outputs)

Reference

Qi, C., Su, H., Mo, K., & Guibas, L. (2017, April 10). PointNet: Deep Learning on point sets for 3D classification and segmentation. Retrieved May 24, 2022, from https://arxiv.org/abs/1612.00593

GitHub - charlesq34/pointnet: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
