JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds (Personal Notes)

bingo-yy

Categories: PointNet, Deep Learning. Tags: experience sharing, python, deep learning, computer vision

Article link: https://blog.csdn.net/m0_46254797/article/details/121603780


JSNet study outline

Abstract
- Network architecture
- - Code analysis

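Abstract

In this paper, we introduce JSNet, a neural network for joint instance and semantic segmentation of 3D point clouds, which addresses two fundamental problems: semantic segmentation and instance segmentation. JSNet has four parts: a shared feature encoder, two parallel branch decoders, a feature fusion module for each decoder, and a joint segmentation module. The feature encoder and decoders are built on PointNet++ (Qi et al. 2017b) and PointConv (Wu, Qi, and Fuxin 2019) to learn more effective high-level semantic features. To obtain more discriminative features, we propose a point cloud feature fusion module that fuses high-level and low-level information and refines the output features. So that the two tasks benefit each other, we propose a joint instance and semantic segmentation module that handles both at once. Specifically, the module transforms the semantic features into the instance embedding space with a 1D convolution, then fuses the transformed features with the instance features to help instance segmentation. At the same time, the module aggregates the instance features into the semantic feature space through implicit learning to help semantic segmentation. Our method can therefore learn instance-aware semantic fusion features and semantics-aware instance embedding features, which makes the per-point predictions more accurate.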

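Network architecture

[Image: Figure 2, JSNet network architecture]
The whole network, shown in Figure 2(a), consists of four main components: a shared encoder, two parallel decoders, a point cloud feature fusion (PCFF) module for each decoder, and a final joint instance and semantic segmentation (JISS) module. Of the two parallel branches, one extracts semantic features for every point and the other serves the instance segmentation task.

For the feature encoder and the two decoders, we could use PointNet++ or PointConv directly as the backbone by duplicating the decoder, because the two decoders have the same structure. However, as the paper notes, for instance and semantic segmentation PointNet++ may lose detailed information because of its max-pooling operation, and PointConv consumes a large amount of GPU memory during training. In this work we combine PointNet++ and PointConv to build a more effective backbone at an acceptable memory cost. The encoder of the backbone is built by concatenating one set abstraction module of PointNet++ with three feature encoding layers of PointConv. Similarly, the decoder consists of three depthwise feature decoding layers of PointConv and one feature propagation module of PointNet++.

In the full pipeline, the network takes a point cloud of size Na as input, and the shared feature encoder encodes it into a matrix of shape Ne×512. The encoder output is then fed to the two parallel decoders and processed separately by the components that follow each of them. The semantic branch decodes the shared features and fuses the features of different layers into a semantic feature matrix FSS of shape Na×128. Similarly, the instance branch outputs the instance feature matrix FIS after its PCFF module. Finally, the JISS module takes the semantic and instance features and outputs two matrices. One is PSSI, of shape Na×C, which is used to predict the semantic class, where C is the number of semantic classes. The other is EISS, of shape Na×K, an instance embedding matrix used to predict the instance label of each point, where K is the dimension of the embedding vector. In the embedding space, the embeddings encode the instance relationship between points: points that belong to the same instance lie close together, while points of different instances lie far apart.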
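The shapes in the pipeline above can be tracked with a few lines of Python. This is only bookkeeping, not a model: the per-layer point counts and channel widths are copied from the `get_model` code in the code analysis section, Na = 4096 is taken from the `48x4096x128` shape comments there, K = 5 is the `num_embed` default, and C = 13 is an assumed class count chosen for illustration. The function name `jsnet_shapes` is hypothetical.

```python
# Shape bookkeeping only: no learning and no real point data.
# num_classes=13 is an assumption; the other sizes come from the get_model code.

def jsnet_shapes(num_points=4096, num_classes=13, num_embed=5):
    """Return the (points, channels) shape after each stage of the pipeline."""
    shapes = {}

    # Shared encoder: one PointNet++ set abstraction layer followed by three
    # PointConv encoding layers, listed as (npoint, output channels).
    encoder = [(1024, 64), (256, 128), (64, 256), (32, 512)]
    for level, (npoint, channels) in enumerate(encoder, start=1):
        shapes[f"l{level}"] = (npoint, channels)   # l4 is the Ne x 512 matrix

    # Each decoder branch plus its PCFF module returns to Na x 128.
    shapes["FSS"] = (num_points, 128)
    shapes["FIS"] = (num_points, 128)

    # JISS outputs: semantic scores (Na x C) and instance embeddings (Na x K).
    shapes["PSSI"] = (num_points, num_classes)
    shapes["EISS"] = (num_points, num_embed)
    return shapes


for name, shape in jsnet_shapes().items():
    print(f"{name:6s} {shape}")
```

With these settings Ne is 32, so the encoder output is 32×512, and both branches return to the full 4096 points before the JISS module.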

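U: upsampling with interpolation
C: concatenation of tensors along one dimension // tf.concat
+: element-wise addition
x: element-wise product
F: 1D convolution with a non-linearity
R: mean of the elements across a tensor dimension // tf.reduce_mean computes the mean along the given axis
S: sigmoid // tf.sigmoid squashes the output into the range (0, 1)
T: builds a tensor by tiling a given tensor // tf.tile repeats the tensor
M: minimum of the elements across a tensor dimension // tf.reduce_min
-: element-wise division
1: connection point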
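To make the legend concrete, here is a small NumPy sketch of what the main operators do to tensor shapes. It uses NumPy equivalents instead of the TensorFlow calls named above, drops the batch dimension, and uses made-up sizes (6 points, 4 channels), so treat it as an illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 4))   # 6 points, 4 channels (illustrative sizes)
b = rng.normal(size=(6, 4))

concat = np.concatenate([a, b], axis=-1)    # C: channels are stacked, shape (6, 8)
added = a + b                               # +: shape (6, 4)
product = a * b                             # x: shape (6, 4)
mean_c = a.mean(axis=-1, keepdims=True)     # R over channels: shape (6, 1)
gate = 1.0 / (1.0 + np.exp(-mean_c))        # S: every value lies in (0, 1)
mean_p = a.mean(axis=0, keepdims=True)      # R over points: shape (1, 4)
tiled = np.tile(mean_p, (a.shape[0], 1))    # T: the row is repeated, shape (6, 4)
minimum = a.min(axis=-1, keepdims=True)     # M: shape (6, 1)

for name in ("concat", "added", "product", "mean_c", "gate", "tiled", "minimum"):
    print(name, globals()[name].shape)
```

R appears twice because the code in this post reduces over channels when it builds an attention weight and over points when it builds a global feature to tile.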

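JISS module:
Instance segmentation
Semantic feature matrix (FSS) –> F (1D convolution) –> FSST, in the instance feature space –> FSST is added element-wise to FIS, giving FISS –> FIS and FISS are concatenated into FISSC –> R (mean) and S (sigmoid) give the weight matrix FISR –> FISSC × FISR = FISSR –> F twice (two 1D convolutions) –> EISS (Na×K)
Formulas: FISSC = Concat(FIS, FIS + Conv1D(FSS))
FISSR = FISSC · Sigmoid(Mean(FISSC))
EISS = Conv1D(Conv1D(FISSR))
Semantic segmentation
FISSR –> F (1D convolution), R (mean), T (tile) –> FISST –> FSS + FISST is concatenated (C) with FSS, giving FSSI –> R (mean) and S (sigmoid) applied to FSSI give a weight that is multiplied (x) with FSSI, giving FSSIR –> F twice (two 1D convolutions) –> PSSI
Formulas: FISST = Tile(Mean(Conv1D(FISSR)))
FSSI = Concat(FSS, FSS + FISST)
FSSIR = FSSI · Sigmoid(Mean(FSSI))
PSSI = Conv1D(Conv1D(FSSIR))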
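The JISS formulas above can be followed step by step in plain NumPy. The sketch below is a simplified stand-in, not the author's implementation: a kernel-size-1 Conv1D is written as a matrix product applied to every point, the weights are random instead of trained, and batch normalization, dropout and the batch dimension are left out. The helper names (`conv1d`, `make_layer`, `jiss`) and the values C = 13 and Na = 1024 are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d(x, weight, bias, relu=True):
    """Kernel-size-1 Conv1D: the same linear map applied to every point."""
    y = x @ weight + bias
    return np.maximum(y, 0.0) if relu else y

def make_layer(rng, c_in, c_out):
    """Random weights standing in for trained parameters (assumption)."""
    return rng.normal(scale=0.1, size=(c_in, c_out)), np.zeros(c_out)

def jiss(f_ss, f_is, num_classes, num_embed, seed=0):
    """f_ss, f_is: (Na, ch) semantic and instance features."""
    rng = np.random.default_rng(seed)
    na, ch = f_ss.shape

    # Instance path: FISSC = Concat(FIS, FIS + Conv1D(FSS))
    f_sst = conv1d(f_ss, *make_layer(rng, ch, ch))
    f_issc = np.concatenate([f_is, f_is + f_sst], axis=-1)          # (Na, 2*ch)
    # FISSR = FISSC * Sigmoid(Mean(FISSC)), mean taken over channels
    f_issr = f_issc * sigmoid(f_issc.mean(axis=-1, keepdims=True))
    # EISS = Conv1D(Conv1D(FISSR)); the last layer has no activation
    hidden = conv1d(f_issr, *make_layer(rng, 2 * ch, ch))
    e_iss = conv1d(hidden, *make_layer(rng, ch, num_embed), relu=False)

    # Semantic path: FISST = Tile(Mean(Conv1D(FISSR))), mean taken over points
    pooled = conv1d(f_issr, *make_layer(rng, 2 * ch, ch)).mean(axis=0, keepdims=True)
    f_isst = np.tile(pooled, (na, 1))                               # (Na, ch)
    # FSSI = Concat(FSS, FSS + FISST)
    f_ssi = np.concatenate([f_ss, f_ss + f_isst], axis=-1)          # (Na, 2*ch)
    # FSSIR = FSSI * Sigmoid(Mean(FSSI))
    f_ssir = f_ssi * sigmoid(f_ssi.mean(axis=-1, keepdims=True))
    # PSSI = Conv1D(Conv1D(FSSIR))
    hidden = conv1d(f_ssir, *make_layer(rng, 2 * ch, ch))
    p_ssi = conv1d(hidden, *make_layer(rng, ch, num_classes), relu=False)
    return p_ssi, e_iss


rng = np.random.default_rng(1)
f_ss = rng.normal(size=(1024, 128))
f_is = rng.normal(size=(1024, 128))
p_ssi, e_iss = jiss(f_ss, f_is, num_classes=13, num_embed=5)
print(p_ssi.shape, e_iss.shape)  # (1024, 13) (1024, 5)
```

Note the two different means: the attention weight averages over channels and keeps one value per point, while the aggregation step averages over points and tiles one global vector back to all Na points.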

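Code analysis

import tensorflow as tf  # TensorFlow 1.x API (get_shape()[i].value, keep_dims)
# tf_util, pointnet_sa_module, pointconv_encoding, pointconv_decoding_depthwise,
# pointnet_fp_module and pointnet_upsample come from the project's helper modules;
# their import lines are not shown in this excerpt.

def get_model(point_cloud, is_training, num_class, num_embed=5, sigma=0.05, bn_decay=None, is_dist=False):
    """ Joint segmentation model: input is BxNxC (xyz in the first three channels), outputs are BxNxnum_class semantic logits and BxNxnum_embed instance embeddings """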
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}
    l0_xyz = point_cloud[:, :, :3]
    l0_points = point_cloud[:, :, 3:]
    end_points['l0_xyz'] = l0_xyz

    # shared encoder
    # pointnet_sa_module: PointNet++ set abstraction (feature extraction)
    # pointconv_encoding: PointConv feature encoding
    l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=1024, radius=0.1, nsample=32, mlp=[32, 32, 64], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, scope='layer1')
    l2_xyz, l2_points = pointconv_encoding(l1_xyz, l1_points, npoint=256, radius=0.2, sigma=2 * sigma, K=32, mlp=[ 64,  64, 128], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='layer2')
    l3_xyz, l3_points = pointconv_encoding(l2_xyz, l2_points, npoint=64,  radius=0.4, sigma=4 * sigma, K=32, mlp=[128, 128, 256], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='layer3')
    l4_xyz, l4_points = pointconv_encoding(l3_xyz, l3_points, npoint=32,  radius=0.8, sigma=8 * sigma, K=32, mlp=[256, 256, 512], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='layer4')

    # semantic decoder
    # pointconv_decoding_depthwise: PointConv depthwise feature decoding
    # pointnet_fp_module: PointNet++ feature propagation (feature interpolation)
    l3_points_sem = pointconv_decoding_depthwise(l3_xyz, l4_xyz, l3_points, l4_points,     radius=0.8, sigma=8*sigma, K=16, mlp=[512, 512], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='sem_fa_layer1')
    l2_points_sem = pointconv_decoding_depthwise(l2_xyz, l3_xyz, l2_points, l3_points_sem, radius=0.4, sigma=4*sigma, K=16, mlp=[256, 256], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='sem_fa_layer2')  # 48x256x256
    l1_points_sem = pointconv_decoding_depthwise(l1_xyz, l2_xyz, l1_points, l2_points_sem, radius=0.2, sigma=2*sigma, K=16, mlp=[256, 128], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='sem_fa_layer3')  # 48x1024x128
    l0_points_sem = pointnet_fp_module(l0_xyz, l1_xyz, l0_points, l1_points_sem, [128, 128, 128], is_training, bn_decay, is_dist=is_dist, scope='sem_fa_layer4')  # 48x4096x128

    # instance decoder
    # pointconv_decoding_depthwise: PointConv depthwise feature decoding
    # pointnet_fp_module: PointNet++ feature propagation (feature interpolation)
    l3_points_ins = pointconv_decoding_depthwise(l3_xyz, l4_xyz, l3_points, l4_points,     radius=0.8, sigma=8*sigma, K=16, mlp=[512, 512], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='ins_fa_layer1')
    l2_points_ins = pointconv_decoding_depthwise(l2_xyz, l3_xyz, l2_points, l3_points_ins, radius=0.4, sigma=4*sigma, K=16, mlp=[256, 256], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='ins_fa_layer2')  # 48x256x256
    l1_points_ins = pointconv_decoding_depthwise(l1_xyz, l2_xyz, l1_points, l2_points_ins, radius=0.2, sigma=2*sigma, K=16, mlp=[256, 128], is_training=is_training, bn_decay=bn_decay, is_dist=is_dist, weight_decay=None, scope='ins_fa_layer3')  # 48x1024x128
    l0_points_ins = pointnet_fp_module(l0_xyz, l1_xyz, l0_points, l1_points_ins, [128, 128, 128], is_training, bn_decay, is_dist=is_dist, scope='ins_fa_layer4')   # 48x4096x128

    # FC layers F_sem: semantic branch; this corresponds to the PCFF feature fusion module
    # pointnet_upsample: upsampling, the U in the figure
    l2_points_sem_up = pointnet_upsample(l0_xyz, l2_xyz, l2_points_sem, scope='sem_up1')
    l1_points_sem_up = pointnet_upsample(l0_xyz, l1_xyz, l1_points_sem, scope='sem_up2')
    net_sem_0 = tf.add(tf.concat([l0_points_sem, l1_points_sem_up], axis=-1, name='sem_up_concat'), l2_points_sem_up, name='sem_up_add')
    net_sem_0 = tf_util.conv1d(net_sem_0, 128, 1, padding='VALID', bn=True, is_training=is_training, is_dist=is_dist, scope='sem_fc1', bn_decay=bn_decay)

    # FC layers F_ins: instance branch
    # pointnet_upsample: upsampling, the U in the figure
    # concat: concatenates tensors (C)
    # add: element-wise addition (+)
    # conv1d: 1D convolution (F)
    l2_points_ins_up = pointnet_upsample(l0_xyz, l2_xyz, l2_points_ins, scope='ins_up1')
    l1_points_ins_up = pointnet_upsample(l0_xyz, l1_xyz, l1_points_ins, scope='ins_up2')
    net_ins_0 = tf.add(tf.concat([l0_points_ins, l1_points_ins_up], axis=-1, name='ins_up_concat'), l2_points_ins_up, name='ins_up_add')
    net_ins_0 = tf_util.conv1d(net_ins_0, 128, 1, padding='VALID', bn=True, is_training=is_training, is_dist=is_dist, scope='ins_fc1', bn_decay=bn_decay)

    # Adaptation
    # reduce_mean: mean of a tensor along the given axis
    # sigmoid: squashes values into the range (0, 1)
    net_sem_cache_0 = tf_util.conv1d(net_sem_0, 128, 1, padding='VALID', bn=True, is_training=is_training, is_dist=is_dist, scope='sem_cache_1', bn_decay=bn_decay)
    net_ins_1 = net_ins_0 + net_sem_cache_0

    net_ins_2 = tf.concat([net_ins_0, net_ins_1], axis=-1, name='net_ins_2_concat')
    net_ins_atten = tf.sigmoid(tf.reduce_mean(net_ins_2, axis=-1, keep_dims=True, name='ins_reduce'), name='ins_atten')
    net_ins_3 = net_ins_2 * net_ins_atten

    # Aggregation
    # tile: repeats a tensor along the given dimensions
    net_ins_cache_0 = tf_util.conv1d(net_ins_3, 128, 1, padding='VALID', bn=True, is_training=is_training, is_dist=is_dist, scope='ins_cache_1', bn_decay=bn_decay)
    net_ins_cache_1 = tf.reduce_mean(net_ins_cache_0, axis=1, keep_dims=True, name='ins_cache_2')
    net_ins_cache_1 = tf.tile(net_ins_cache_1, [1, num_point, 1], name='ins_cache_tile')
    net_sem_1 = net_sem_0 + net_ins_cache_1

    net_sem_2 = tf.concat([net_sem_0, net_sem_1], axis=-1, name='net_sem_2_concat')
    net_sem_atten = tf.sigmoid(tf.reduce_mean(net_sem_2, axis=-1, keep_dims=True, name='sem_reduce'), name='sem_atten')
    net_sem_3 = net_sem_2 * net_sem_atten

    # Output
    # dropout: during training each element is kept with probability keep_prob=0.5 and scaled by 1/keep_prob (here 1/0.5 = 2); the remaining elements are set to 0
    net_ins_3 = tf_util.conv1d(net_ins_3, 128, 1, padding='VALID', bn=True, is_training=is_training, is_dist=is_dist, scope='ins_fc2', bn_decay=bn_decay)
    net_ins_4 = tf_util.dropout(net_ins_3, keep_prob=0.5, is_training=is_training, scope='ins_dp_4')
    net_ins_4 = tf_util.conv1d(net_ins_4, num_embed, 1, padding='VALID', activation_fn=None, is_dist=is_dist, scope='ins_fc5')

    net_sem_3 = tf_util.conv1d(net_sem_3, 128, 1, padding='VALID', bn=True, is_training=is_training, is_dist=is_dist, scope='sem_fc2', bn_decay=bn_decay)
    net_sem_4 = tf_util.dropout(net_sem_3, keep_prob=0.5, is_training=is_training, scope='sem_dp_4')
    net_sem_4 = tf_util.conv1d(net_sem_4, num_class, 1, padding='VALID', activation_fn=None, is_dist=is_dist, scope='sem_fc5')

    return net_sem_4, net_ins_4
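
The PCFF step in the code above (the `sem_up*` and `ins_up*` lines) upsamples two coarser decoder levels to the full point set, concatenates, adds, and applies a 1D convolution. Below is a NumPy sketch of that data flow. It assumes that `pointnet_upsample` behaves like the inverse-distance-weighted three-nearest-neighbour interpolation used in PointNet++ feature propagation; the helper names `upsample_3nn` and `pcff` are hypothetical, the weights are random, batch normalization is omitted, and the point counts are scaled down so the pairwise distance matrix stays small.

```python
import numpy as np

def upsample_3nn(dense_xyz, sparse_xyz, sparse_feat, eps=1e-10):
    """Interpolate sparse_feat (M, C) onto dense_xyz (N, 3) from the three
    nearest sparse points, weighted by inverse distance (assumed behaviour)."""
    # Pairwise squared distances between dense and sparse points, shape (N, M).
    d2 = ((dense_xyz[:, None, :] - sparse_xyz[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :3]                      # (N, 3) neighbour indices
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))      # (N, 3) distances
    w = 1.0 / np.maximum(dist, eps)
    w = w / w.sum(axis=1, keepdims=True)                     # weights sum to 1 per point
    return (sparse_feat[idx] * w[:, :, None]).sum(axis=1)    # (N, C)

def pcff(l0_xyz, l0_feat, l1_xyz, l1_feat, l2_xyz, l2_feat, weight, bias):
    """Fuse three decoder levels: Conv1D(Concat(l0, U(l1)) + U(l2))."""
    l1_up = upsample_3nn(l0_xyz, l1_xyz, l1_feat)                # (N0, 128)
    l2_up = upsample_3nn(l0_xyz, l2_xyz, l2_feat)                # (N0, 256)
    fused = np.concatenate([l0_feat, l1_up], axis=-1) + l2_up    # (N0, 256)
    return np.maximum(fused @ weight + bias, 0.0)                # (N0, 128)


rng = np.random.default_rng(0)
n0, n1, n2 = 256, 64, 16     # small stand-ins for 4096, 1024 and 256 points
l0_xyz, l1_xyz, l2_xyz = (rng.uniform(size=(n, 3)) for n in (n0, n1, n2))
l0_feat = rng.normal(size=(n0, 128))
l1_feat = rng.normal(size=(n1, 128))
l2_feat = rng.normal(size=(n2, 256))
weight = rng.normal(scale=0.1, size=(256, 128))   # random stand-in for trained weights
bias = np.zeros(128)
out = pcff(l0_xyz, l0_feat, l1_xyz, l1_feat, l2_xyz, l2_feat, weight, bias)
print(out.shape)  # (256, 128)
```

The channel counts explain why the add works: concatenating the two 128-channel tensors gives 256 channels, which matches the 256 channels of the upsampled `l2` features.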
