VoxelNet TensorFlow implementation: code walkthrough and study notes

Unofficial code: https://github.com/jeasinema/VoxelNet-tensorflow

[Figure: overall VoxelNet architecture]
The network can be divided into three parts: the feature learning network, the convolutional middle layers, and the RPN that produces the final predicted scores and boxes (this part borrows from Faster R-CNN).

一、Feature Learning Network

This stage partitions the point cloud into small voxels, applies fully connected layers to every non-empty voxel, and stacks VFE layers, finally producing a 4-D tensor, i.e. all of the steps shown in the figure below.
[Figure: feature learning network pipeline]

The VFE layer

Each non-empty voxel V (containing t <= T points) is processed; every point p_i carries its XYZ coordinates plus reflectance, [x_i, y_i, z_i, r_i].
The steps below are for a single voxel V; all non-empty voxels are processed the same way (a small sketch of steps 1 and 2 follows the figure below):
1、Compute the mean of the points in V, giving the centroid (v_x, v_y, v_z).
2、Augment each point with its offset from the centroid, giving a 7-dimensional input [x_i, y_i, z_i, r_i, x_i - v_x, y_i - v_y, z_i - v_z].
3、Map every point in V into a feature space through a fully connected network (FCN); the FCN consists of a linear layer, a BN layer, and a ReLU.
4、Max-pool the point-wise features produced by the FCN.
5、Finally, concatenate the max-pooled (locally aggregated) feature back onto each point-wise FCN feature.
[Figure: VFE layer architecture]
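A minimal NumPy sketch of steps 1 and 2 for one voxel (the point values are made up for illustration):

import numpy as np

# t = 3 points in one voxel, each row is [x, y, z, r]
points = np.array([[1.0, 2.0, 0.5, 0.3],
                   [1.2, 2.1, 0.4, 0.2],
                   [0.9, 1.8, 0.6, 0.1]])

centroid = points[:, :3].mean(axis=0)     # (v_x, v_y, v_z)
offsets = points[:, :3] - centroid        # per-point offsets from the centroid
augmented = np.hstack([points, offsets])  # [t, 7] input to the first VFE layer
print(augmented.shape)                    # (3, 7)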

Code walkthrough 1: training strategy

Setting the learning rate

        self.cls = cls
        self.single_batch_size = single_batch_size
        self.learning_rate = tf.Variable(
            float(learning_rate), trainable=False, dtype=tf.float32)
        self.global_step = tf.Variable(1, trainable=False)
        self.epoch = tf.Variable(0, trainable=False)
        self.epoch_add_op = self.epoch.assign(self.epoch + 1)
        self.alpha = alpha
        self.beta = beta
        self.avail_gpus = avail_gpus

        boundaries = [80, 120]
        values = [ self.learning_rate, self.learning_rate * 0.1, self.learning_rate * 0.01 ]
        lr = tf.train.piecewise_constant(self.epoch, boundaries, values)

tf.train.piecewise_constant() changes the learning rate during training. There are two common schedules: exponential decay, and a piecewise-constant rate tied to iteration ranges; tf.train.piecewise_constant() implements the latter.
boundaries = [80, 120] defines the interval boundaries, and values = [self.learning_rate, self.learning_rate * 0.1, self.learning_rate * 0.01] gives the rate for each interval: self.learning_rate up to epoch 80, self.learning_rate * 0.1 from epoch 80 to 120, and self.learning_rate * 0.01 after epoch 120. self.epoch plays the role of the step counter (the global_step argument), so the schedule here is keyed to epochs rather than batches.
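A minimal self-contained sketch of the same schedule (the base rate 0.01 is just an example value):

import tensorflow as tf

epoch = tf.placeholder(tf.int32, [])
lr = tf.train.piecewise_constant(epoch, [80, 120], [0.01, 0.001, 0.0001])

with tf.Session() as sess:
    for e in [10, 80, 100, 150]:
        print(e, sess.run(lr, feed_dict={epoch: e}))
# 10 0.01, 80 0.01, 100 0.001, 150 0.0001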

Code walkthrough 2: implementing the VFE layer

model.py imports from model.group_pointcloud import FeatureNet to implement this stage.
model.py mainly passes in the following arguments:

feature = FeatureNet(training=self.is_train,batch_size=self.single_batch_size)

group_pointcloud.py contains two classes: FeatureNet implements the entire feature learning network, while VFELayer defines how each individual layer inside it is built.

class VFELayer(object):

    def __init__(self, out_channels, name):
        super(VFELayer, self).__init__()
        self.units = int(out_channels / 2)
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE) as scope:
            self.dense = tf.layers.Dense(
                self.units, tf.nn.relu, name='dense', _reuse=tf.AUTO_REUSE, _scope=scope)  # fully connected layer (see the tf.layers.dense notes below); output is [m, k] with k = out_channels / 2
            self.batch_norm = tf.layers.BatchNormalization(
                name='batch_norm', fused=True, _reuse=tf.AUTO_REUSE, _scope=scope)  # batch-normalization layer

    def apply(self, inputs, mask, training):
        # [K, T, 7] tensordot [7, units] = [K, T, units]
        pointwise = self.batch_norm.apply(self.dense.apply(inputs), training)

        # [K, 1, units]
        aggregated = tf.reduce_max(pointwise, axis=1, keep_dims=True)  # axis=1: max over the T points, i.e. max-pool the point-wise FCN features
        # [K, T, units]
        repeated = tf.tile(aggregated, [1, cfg.VOXEL_POINT_COUNT, 1])  # repeat the voxel-wise max T times so it can be concatenated with each point feature

        # [K, T, 2 * units]
        concatenated = tf.concat([pointwise, repeated], axis=2)  # concatenate each point-wise feature with the aggregated (max-pooled) feature

        mask = tf.tile(mask, [1, 1, 2 * self.units])

        concatenated = tf.multiply(concatenated, tf.cast(mask, tf.float32))  # zero out the features of padded (empty) point slots using the mask

        return concatenated
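As a toy check of the max-then-concat pattern in apply (the sizes K = 2, T = 3, units = 4 are made up for illustration):

import numpy as np
import tensorflow as tf

pointwise = tf.constant(np.random.rand(2, 3, 4), tf.float32)   # [K, T, units]
aggregated = tf.reduce_max(pointwise, axis=1, keep_dims=True)  # [K, 1, units]
repeated = tf.tile(aggregated, [1, 3, 1])                      # [K, T, units]
concatenated = tf.concat([pointwise, repeated], axis=2)        # [K, T, 2 * units]

with tf.Session() as sess:
    print(sess.run(tf.shape(concatenated)))  # [2 3 8]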


class FeatureNet(object):

    def __init__(self, training, batch_size, name=''):
        super(FeatureNet, self).__init__()
        self.training = training

        # scalar
        self.batch_size = batch_size
        # [ΣK, 35/45, 7]
        self.feature = tf.placeholder(
            tf.float32, [None, cfg.VOXEL_POINT_COUNT, 7], name='feature')
        # [ΣK]
        self.number = tf.placeholder(tf.int64, [None], name='number')
        # [ΣK, 4], each row stores (batch, d, h, w)
        self.coordinate = tf.placeholder(
            tf.int64, [None, 4], name='coordinate')

        with tf.variable_scope(name, reuse=tf.AUTO_REUSE) as scope:
            self.vfe1 = VFELayer(32, 'VFE-1')
            self.vfe2 = VFELayer(128, 'VFE-2')

        # boolean mask [ΣK, T, 1] marking real (non-padding) points; tiled to [K, T, 2 * units] inside apply
        mask = tf.not_equal(tf.reduce_max(
            self.feature, axis=2, keep_dims=True), 0)  # tf.not_equal(x, y) returns the elementwise truth value of x != y
        x = self.vfe1.apply(self.feature, mask, self.training)
        x = self.vfe2.apply(x, mask, self.training)

        # [ΣK, 128]
        voxelwise = tf.reduce_max(x, axis=1)

        # car: [N * 10 * 400 * 352 * 128]
        # pedestrian/cyclist: [N * 10 * 200 * 240 * 128]
        self.outputs = tf.scatter_nd(
            self.coordinate, voxelwise, [self.batch_size, 10, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128])  # scatter updates into a new zero-initialized tensor according to indices, yielding the paper's sparse 4-D tensor
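The grid sizes in the shape comments follow from the voxelization settings (assuming the repo's cfg matches the paper: voxel size 0.4 x 0.2 x 0.2 m over Z in [-3, 1], Y in [-40, 40], X in [0, 70.4] for car):

D' = (1 - (-3)) / 0.4 = 10, H' = (40 - (-40)) / 0.2 = 400, W' = (70.4 - 0) / 0.2 = 352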


1、tf.layers.dense(input, units=k)

Author: MrFenrir
Link: https://www.jianshu.com/p/3855908b4c29
Source: Jianshu
The function internally creates a weight matrix kernel and a bias term bias. For a 2-D input tensor of shape [m, n], tf.layers.dense() creates a kernel of shape [n, k] and a bias of shape [k] (broadcast across the m rows). Internally it computes y = input * kernel + bias, so the output y has shape [m, k].
Example:

import tensorflow as tf

# 1. Compute with tf.layers.dense
input = tf.reshape(tf.constant([[1., 2.], [2., 3.]]), shape=[4, 1])
b1 = tf.layers.dense(input,
                     units=2,
                     kernel_initializer=tf.constant_initializer(value=2),   # kernel shape: [1, 2]
                     bias_initializer=tf.constant_initializer(value=1))     # bias shape: [2], broadcast over the 4 rows

# 2. The same computation as an explicit matmul (bias written out in broadcast form)
kernel = tf.reshape(tf.constant([2., 2.]), shape=[1, 2])
bias = tf.reshape(tf.constant([1., 1., 1., 1., 1., 1., 1., 1.]), shape=[4, 2])
b2 = tf.add(tf.matmul(input, kernel), bias)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b1))
    print(sess.run(b2)) 
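Both computations print the same result:

[[3. 3.]
 [5. 5.]
 [5. 5.]
 [7. 7.]]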
2、tf.reduce_max()

https://blog.csdn.net/lllxxq141592654/article/details/85345864

import tensorflow as tf
import numpy as np

a=np.array([[1, 2],
            [5, 3],
            [2, 6]])

b = tf.Variable(a)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))
    print('************')
    # For a 2-D matrix, axis=0 runs down the rows and axis=1 runs across the columns
    print(sess.run(tf.reduce_max(b, axis=1, keepdims=False)))  # keepdims=False: axis 1 is squeezed out
    print('************')
    print(sess.run(tf.reduce_max(b, axis=1, keepdims=True)))
    print('************')
    print(sess.run(tf.reduce_max(b, axis=0, keepdims=True)))
Output:

[[1 2]
 [5 3]
 [2 6]]
************
[2 5 6]
************
[[2]
 [5]
 [6]]
************
[[5 6]]

3、tf.tile()

https://www.cnblogs.com/yibeimingyue/p/11869882.html
Usage of tf.tile()
tile() means tiling; it replicates a tensor along its existing dimensions.

tile(
    input,      # the input tensor
    multiples,  # number of copies along each dimension
    name=None
)

Example:

with tf.Graph().as_default():
    a = tf.constant([1,2],name='a')
    b = tf.tile(a,[3])
    sess = tf.Session()
    print(sess.run(b))

This tiles [1, 2] three times along its single dimension; the multiples argument must have the same rank as input. The result:

[1 2 1 2 1 2]

with tf.Graph().as_default():
    a = tf.constant([[1,2],[3,4]],name='a')   
    b = tf.tile(a,[2,3])
    sess = tf.Session()
    print(sess.run(b))

Output:

[[1 2 1 2 1 2]
 [3 4 3 4 3 4]
 [1 2 1 2 1 2]
 [3 4 3 4 3 4]] 
4、tf.scatter_nd()

Scatters updates into a new (zero-initialized) tensor according to indices.

indices = tf.constant([[4], [3], [1], [7]])
updates = tf.constant([9, 10, 11, 12])
shape = tf.constant([8])
scatter = tf.scatter_nd(indices, updates, shape)
with tf.Session() as sess:
    print(sess.run(scatter))
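Output:

[ 0 11  0 10  9  0  0 12]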

二、Convolutional Middle Layers

三、Region Proposal Network

The implementation puts the latter two networks together in a single module.
model.py defines:

rpn = MiddleAndRPN(input=feature.outputs, alpha=self.alpha, beta=self.beta, training=self.is_train)

This class lives in rpn.py:

class MiddleAndRPN:
    def __init__(self, input, alpha=1.5, beta=1, sigma=3, training=True, name=''):
        # scale = [batchsize, 10, 400/200, 352/240, 128] should be the output of feature learning network
        self.input = input
        self.training = training
        # ground truth (targets): each anchor box represented as Δx, Δy, Δz, Δl, Δw, Δh, rotation
        self.targets = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 14])
        # positive anchors equal one, all others equal zero (2 anchors per position)
        self.pos_equal_one = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 2])
        self.pos_equal_one_sum = tf.placeholder(tf.float32, [None, 1, 1, 1])
        self.pos_equal_one_for_reg = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 14])
        # negative anchors equal one, all others equal zero
        self.neg_equal_one = tf.placeholder(
            tf.float32, [None, cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH, 2])
        self.neg_equal_one_sum = tf.placeholder(tf.float32, [None, 1, 1, 1])

        with tf.variable_scope('MiddleAndRPN_' + name):
            # convolutional middle layers
            temp_conv = ConvMD(3, 128, 64, 3, (2, 1, 1),
                               (1, 1, 1), self.input, name='conv1')
            temp_conv = ConvMD(3, 64, 64, 3, (1, 1, 1),
                               (0, 1, 1), temp_conv, name='conv2')
            temp_conv = ConvMD(3, 64, 64, 3, (2, 1, 1),
                               (1, 1, 1), temp_conv, name='conv3')
            temp_conv = tf.transpose(temp_conv, perm=[0, 2, 3, 4, 1])
            temp_conv = tf.reshape(
                temp_conv, [-1, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128])

            # rpn
            # block1:
            temp_conv = ConvMD(2, 128, 128, 3, (2, 2), (1, 1),
                               temp_conv, training=self.training, name='conv4')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv5')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv6')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv7')
            deconv1 = Deconv2D(128, 256, 3, (1, 1), (0, 0),
                               temp_conv, training=self.training, name='deconv1')

            # block2:
            temp_conv = ConvMD(2, 128, 128, 3, (2, 2), (1, 1),
                               temp_conv, training=self.training, name='conv8')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv9')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv10')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv11')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv12')
            temp_conv = ConvMD(2, 128, 128, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv13')
            deconv2 = Deconv2D(128, 256, 2, (2, 2), (0, 0),
                               temp_conv, training=self.training, name='deconv2')

            # block3:
            temp_conv = ConvMD(2, 128, 256, 3, (2, 2), (1, 1),
                               temp_conv, training=self.training, name='conv14')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv15')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv16')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv17')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv18')
            temp_conv = ConvMD(2, 256, 256, 3, (1, 1), (1, 1),
                               temp_conv, training=self.training, name='conv19')
            deconv3 = Deconv2D(256, 256, 4, (4, 4), (0, 0),
                               temp_conv, training=self.training, name='deconv3')

            # final:
            temp_conv = tf.concat([deconv3, deconv2, deconv1], -1)
            # Probability score map, scale = [None, 200/100, 176/120, 2]
            p_map = ConvMD(2, 768, 2, 1, (1, 1), (0, 0), temp_conv,
                           training=self.training, activation=False, bn=False, name='conv20')
            # Regression(residual) map, scale = [None, 200/100, 176/120, 14]
            r_map = ConvMD(2, 768, 14, 1, (1, 1), (0, 0),
                           temp_conv, training=self.training, activation=False, bn=False, name='conv21')
            # probability output for each anchor (sigmoid over p_map; the softmax alternative is commented out below), scale = [None, 200/100, 176/120, 2]
            self.p_pos = tf.sigmoid(p_map)
            #self.p_pos = tf.nn.softmax(p_map, dim=3)
            self.output_shape = [cfg.FEATURE_HEIGHT, cfg.FEATURE_WIDTH]

            self.cls_pos_loss = (-self.pos_equal_one * tf.log(self.p_pos + small_addon_for_BCE)) / self.pos_equal_one_sum
            self.cls_neg_loss = (-self.neg_equal_one * tf.log(1 - self.p_pos + small_addon_for_BCE)) / self.neg_equal_one_sum
            
            self.cls_loss = tf.reduce_sum( alpha * self.cls_pos_loss + beta * self.cls_neg_loss )
            self.cls_pos_loss_rec = tf.reduce_sum( self.cls_pos_loss )
            self.cls_neg_loss_rec = tf.reduce_sum( self.cls_neg_loss )


            self.reg_loss = smooth_l1(r_map * self.pos_equal_one_for_reg, self.targets *
                                      self.pos_equal_one_for_reg, sigma) / self.pos_equal_one_sum
            self.reg_loss = tf.reduce_sum(self.reg_loss)

            self.loss = tf.reduce_sum(self.cls_loss + self.reg_loss)

            self.delta_output = r_map
            self.prob_output = self.p_pos


def smooth_l1(deltas, targets, sigma=3.0):
    sigma2 = sigma * sigma
    diffs = tf.subtract(deltas, targets)
    smooth_l1_signs = tf.cast(tf.less(tf.abs(diffs), 1.0 / sigma2), tf.float32)

    smooth_l1_option1 = tf.multiply(diffs, diffs) * 0.5 * sigma2
    smooth_l1_option2 = tf.abs(diffs) - 0.5 / sigma2
    smooth_l1_add = tf.multiply(smooth_l1_option1, smooth_l1_signs) + \
        tf.multiply(smooth_l1_option2, 1 - smooth_l1_signs)
    smooth_l1 = smooth_l1_add

    return smooth_l1
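For reference, the classification and regression terms assembled above follow the paper's loss (small_addon_for_BCE is a small numerical-stability constant defined at the top of rpn.py, 1e-6 in this repo). Writing p_i for the sigmoid score of positive anchor i, p_j for negative anchor j, and u_i, u_i* for the predicted and target regression residuals:

$$L = \alpha \frac{1}{N_{pos}} \sum_i -\log p_i + \beta \frac{1}{N_{neg}} \sum_j -\log\left(1 - p_j\right) + \frac{1}{N_{pos}} \sum_i \mathrm{smooth}_{L1}\left(u_i - u_i^*\right)$$

smooth_l1 above implements the standard piecewise form, with sigma = 3 by default:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,\sigma^2 x^2 & \text{if } |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2 & \text{otherwise} \end{cases}$$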


def ConvMD(M, Cin, Cout, k, s, p, input, training=True, activation=True, bn=True, name='conv'):
    temp_p = np.array(p)
    temp_p = np.lib.pad(temp_p, (1, 1), 'constant', constant_values=(0, 0))
    with tf.variable_scope(name) as scope:
        if(M == 2):
            paddings = (np.array(temp_p)).repeat(2).reshape(4, 2)
            pad = tf.pad(input, paddings, "CONSTANT")
            temp_conv = tf.layers.conv2d(
                pad, Cout, k, strides=s, padding="valid", reuse=tf.AUTO_REUSE, name=scope)
        if(M == 3):
            paddings = (np.array(temp_p)).repeat(2).reshape(5, 2)
            pad = tf.pad(input, paddings, "CONSTANT")
            temp_conv = tf.layers.conv3d(
                pad, Cout, k, strides=s, padding="valid", reuse=tf.AUTO_REUSE, name=scope)
        if bn:
            temp_conv = tf.layers.batch_normalization(
                temp_conv, axis=-1, fused=True, training=training, reuse=tf.AUTO_REUSE, name=scope)
        if activation:
            return tf.nn.relu(temp_conv)
        else:
            return temp_conv
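A quick check of how ConvMD builds the argument for tf.pad in the 2-D case (M == 2): the spatial padding p is framed with zeros for the batch and channel dimensions, then expanded into (before, after) pairs:

import numpy as np

p = (1, 1)  # spatial padding for (H, W)
temp_p = np.lib.pad(np.array(p), (1, 1), 'constant', constant_values=(0, 0))
# temp_p = [0 1 1 0]: no padding on the batch and channel dimensions
paddings = temp_p.repeat(2).reshape(4, 2)
print(paddings)
# [[0 0]
#  [1 1]
#  [1 1]
#  [0 0]]  -> (before, after) padding for an NHWC tensor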

def Deconv2D(Cin, Cout, k, s, p, input, training=True, bn=True, name='deconv'):
    temp_p = np.array(p)
    temp_p = np.lib.pad(temp_p, (1, 1), 'constant', constant_values=(0, 0))
    paddings = (np.array(temp_p)).repeat(2).reshape(4, 2)
    pad = tf.pad(input, paddings, "CONSTANT")
    with tf.variable_scope(name) as scope:
        temp_conv = tf.layers.conv2d_transpose(
            pad, Cout, k, strides=s, padding="SAME", reuse=tf.AUTO_REUSE, name=scope)
        if bn:
            temp_conv = tf.layers.batch_normalization(
                temp_conv, axis=-1, fused=True, training=training, reuse=tf.AUTO_REUSE, name=scope)
        return tf.nn.relu(temp_conv)


if(__name__ == "__main__"):
    m = MiddleAndRPN(tf.placeholder(
        tf.float32, [None, 10, cfg.INPUT_HEIGHT, cfg.INPUT_WIDTH, 128]))