Understanding the FlowNet3D paper code (PointNet++)


亦徵


Article link: https://blog.csdn.net/weixin_45736077/article/details/121567181


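This post walks through a piece of point cloud processing code, covering preprocessing, down-sampling, feature embedding, and pooling in local regions. First, farthest point sampling and spherical (ball) neighborhood search are used for local feature learning on the point cloud. Next, the PointNet module embeds point features, extracting them with 2D convolutions and max pooling. Finally, flow_embedding_module computes the relation between the point clouds of two frames to obtain the scene flow. The whole process shows how deep learning is applied to 3D point cloud data.

(Summary generated automatically by CSDN.)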

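I am writing this as a record of my own learning, to push myself to keep making progress on the task, study carefully, and keep it updated.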

train.py

Mostly worked out already, so it is omitted here.

model_concat_upsa.py

get_model

This is where the concrete operations happen. The annoying part is the large number of variable names, each with its own dimensions.

    l0_xyz_f1 = point_cloud[:, :num_point, 0:3]  # dim: b n 3
    l0_points_f1 = point_cloud[:, :num_point, 3:]  # dim: b n channel
    l0_xyz_f2 = point_cloud[:, num_point:, 0:3]  # dim: b n 3
    l0_points_f2 = point_cloud[:, num_point:, 3:]  # dim: b n channel

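The block above is built from the most basic input data (what this so-called raw data actually is still needs investigating). l0_xyz_f1 means the coordinates of frame 1 at layer 0, and its shape is b n 3.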
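A minimal NumPy sketch of this split follows. It assumes hypothetical sizes (b = 2, n = 5) and a 6-channel layout of xyz plus RGB, and it mimics the slicing in get_model on random data rather than running the TF graph.

```python
import numpy as np

# Hypothetical sizes: batch b, points per frame n, 3 xyz + 3 rgb channels
b, n = 2, 5
point_cloud = np.random.rand(b, 2 * n, 6)  # (b, 2n, 6): frame 1 first, then frame 2

num_point = n
l0_xyz_f1 = point_cloud[:, :num_point, 0:3]    # (b, n, 3) frame 1 coordinates
l0_points_f1 = point_cloud[:, :num_point, 3:]  # (b, n, 3) frame 1 colors
l0_xyz_f2 = point_cloud[:, num_point:, 0:3]    # (b, n, 3) frame 2 coordinates
l0_points_f2 = point_cloud[:, num_point:, 3:]  # (b, n, 3) frame 2 colors

print(l0_xyz_f1.shape, l0_points_f1.shape, l0_xyz_f2.shape, l0_points_f2.shape)
```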

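Investigating what I called the "raw data":
point_cloud is the first argument passed in. In train.py it comes from batch_data below. In short, get_batch reads pc1, pc2, color1, color2, flow, mask1 from TRAIN_DATASET. The positions and colors are packed into batch_data, while flow and mask1 go into batch_label and batch_mask.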

batch_data, batch_label, batch_mask = get_batch(TRAIN_DATASET, train_idxs, start_idx, end_idx)

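A key step in building batch_data: along the second dimension, the first half holds pos1 and color1, and the second half holds pos2 and color2. In each half, coordinates and colors are separated along the third dimension.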

        batch_data[i, :NUM_POINT, :3] = pc1[shuffle_idx]
        batch_data[i, :NUM_POINT, 3:] = color1[shuffle_idx]
        batch_data[i, NUM_POINT:, :3] = pc2[shuffle_idx]
        batch_data[i, NUM_POINT:, 3:] = color2[shuffle_idx]
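The packing can be reproduced for a single sample in NumPy. pack_frames is a hypothetical helper, and NUM_POINT = 4 and the random inputs are made up. Only the four assignments above are mirrored here, so anything else get_batch does is not part of this sketch.

```python
import numpy as np

NUM_POINT = 4  # hypothetical number of points per frame


def pack_frames(pc1, color1, pc2, color2, shuffle_idx):
    """Hypothetical helper: fill one (2 * NUM_POINT, 6) sample like batch_data[i]."""
    sample = np.zeros((2 * NUM_POINT, 6))
    sample[:NUM_POINT, :3] = pc1[shuffle_idx]     # first half: frame 1 xyz
    sample[:NUM_POINT, 3:] = color1[shuffle_idx]  # first half: frame 1 rgb
    sample[NUM_POINT:, :3] = pc2[shuffle_idx]     # second half: frame 2 xyz
    sample[NUM_POINT:, 3:] = color2[shuffle_idx]  # second half: frame 2 rgb
    return sample


rng = np.random.default_rng(0)
pc1, pc2 = rng.random((NUM_POINT, 3)), rng.random((NUM_POINT, 3))
color1, color2 = rng.random((NUM_POINT, 3)), rng.random((NUM_POINT, 3))
shuffle_idx = rng.permutation(NUM_POINT)  # the same permutation is applied to both frames

sample = pack_frames(pc1, color1, pc2, color2, shuffle_idx)
print(sample.shape)  # (8, 6)
```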

get_model splits batch_data again. It separates pos1 from pos2, and within each frame it separates the xyz coordinates from the other channels.


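Several radii are defined. They describe how large a neighborhood is taken around a point, as mentioned in the original paper. Why each layer uses a particular neighborhood size is not clear to me.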

    RADIUS1 = 0.5
    RADIUS2 = 1.0
    RADIUS3 = 2.0
    RADIUS4 = 4.0

From here on, the data is processed layer by layer.

Layer 1

    # Frame 1, Layer 1; npoint is the number of points kept by farthest point sampling
    l1_xyz_f1, l1_points_f1, l1_indices_f1 = pointnet_sa_module(l0_xyz_f1, l0_points_f1, npoint=1024,
                                                                radius=RADIUS1, nsample=16, mlp=[32, 32, 64],
                                                                mlp2=None, group_all=False, is_training=is_training,
                                                                bn_decay=bn_decay, scope='layer1')
    end_points['l1_indices_f1'] = l1_indices_f1

This produces layer 1 of frame 1:
l1_xyz_f1: b * npoint (1024) * 3;
l1_points_f1: b * npoint (1024) * mlp[-1] (64);
l1_indices_f1: b * npoint (1024) * nsample (16).

Layer 2

    # Frame 1, Layer 2
    l2_xyz_f1, l2_points_f1, l2_indices_f1 = pointnet_sa_module(l1_xyz_f1, l1_points_f1, npoint=256, radius=RADIUS2,
                                                                nsample=16, mlp=[64, 64, 128], mlp2=None,
                                                                group_all=False, is_training=is_training,
                                                                bn_decay=bn_decay, scope='layer2')
    end_points['l2_indices_f1'] = l2_indices_f1

This produces layer 2 of frame 1:
l2_xyz_f1: b * npoint (256) * 3;
l2_points_f1: b * npoint (256) * mlp[-1] (128);
l2_indices_f1: b * npoint (256) * nsample (16).

The same operations are applied to frame 2, with these results:

l1_xyz_f2: b * npoint (1024) * 3;
l1_points_f2: b * npoint (1024) * mlp[-1] (64);
l1_indices_f2: b * npoint (1024) * nsample (16).

l2_xyz_f2: b * npoint (256) * 3;
l2_points_f2: b * npoint (256) * mlp[-1] (128);
l2_indices_f2: b * npoint (256) * nsample (16).

Next comes the flow embedding.

    # This embedding takes both frames as input. It outputs frame 1's coordinates (discarded here as _)
    # and the flow embedding, whose shape is batch_size, npoint, mlp[-1]
    _, l2_points_f1_new = flow_embedding_module(l2_xyz_f1, l2_xyz_f2, l2_points_f1, l2_points_f2, radius=10.0,
                                                nsample=64, mlp=[128, 128, 128], is_training=is_training,
                                                bn_decay=bn_decay, scope='flow_embedding', bn=True, pooling='max',
                                                knn=True, corr_func='concat')

pointnet_util.py

pointnet_sa_module

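pointnet_sa_module performs down-sampling. The paper describes three parts: down-sampling, flow embedding, and up-sampling. I think this module handles feature learning on the original point set.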

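Even before reading the code closely, it is clear that this method changes the number of points in the input set, because the algorithm uses farthest point sampling. npoint is the number of points this sampling outputs.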
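The sampling step can be illustrated with a plain NumPy version of greedy farthest point sampling. This sketch is not the repo's farthest_point_sample op (a compiled TF op). It works on one cloud instead of a batch, and starting from index 0 is an assumption.

```python
import numpy as np


def farthest_point_sample(xyz, npoint):
    """Greedy farthest point sampling on a single cloud xyz of shape (N, 3).

    Returns npoint indices. Starting from index 0 is an assumption of this sketch.
    """
    n = xyz.shape[0]
    idx = np.zeros(npoint, dtype=np.int64)
    min_dist = np.full(n, np.inf)  # squared distance from each point to the selected set
    current = 0
    for i in range(npoint):
        idx[i] = current
        d = np.sum((xyz - xyz[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        current = int(np.argmax(min_dist))  # the point farthest from everything selected so far
    return idx


line = np.stack([np.arange(11.0), np.zeros(11), np.zeros(11)], axis=1)  # 11 points on the x-axis
idx = farthest_point_sample(line, 3)
new_xyz = line[idx]  # the gather_point step: coordinates of the sampled points
print(idx)  # [ 0 10  5]
```

On evenly spaced points, the first pick is the start, the second is the far end, and the third is the midpoint. This greedy spread is why FPS covers the cloud more evenly than uniform random sampling.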

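This layer goes through the following main steps:
(1) Sampling and grouping: splits the whole point cloud into local groups, so that PointNet can extract a feature describing each local group separately
(2) Point Feature Embedding: 2D convolution
(3) Pooling in Local Regions: pooling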

Each part is explained in detail after the code.

def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
    ''' PointNet Set Abstraction (SA) Module
        Input:
            xyz: (batch_size, ndataset, 3) TF tensor
            points: (batch_size, ndataset, channel) TF tensor
            npoint: int32 -- #points sampled in farthest point sampling
            radius: float32 -- search radius in local region
            nsample: int32 -- how many points in each local region
            mlp: list of int32 -- output size for MLP on each point
            mlp2: list of int32 -- output size for MLP on each region
            group_all: bool -- group all points into one PC if set true, OVERRIDE
                npoint, radius and nsample settings
            use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
            use_nchw: bool, if True, use NCHW data format for conv2d, which is usually faster than NHWC format
        Return:
            new_xyz: (batch_size, npoint, 3) TF tensor
            new_points: (batch_size, npoint, mlp[-1] or mlp2[-1]) TF tensor
            idx: (batch_size, npoint, nsample) int32 -- indices for local regions
    '''

(Thanks to the author for such thorough comments.)

Part 1: Sample and Grouping

        # Sample and Grouping
        if group_all:
            nsample = xyz.get_shape()[1].value
            new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
        else:
            new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)

Everywhere the author calls this method, group_all is False. In other words, the points are never treated as one single group.

The sample_and_group function is used here.
In the paper's code, knn and use_xyz both keep their default values: knn = False, use_xyz = True.
This is the key function for sampling and grouping.

def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
    '''
    Input:
        npoint: int32
        radius: float32
        nsample: int32
        xyz: (batch_size, ndataset, 3) TF tensor
        points: (batch_size, ndataset, channel) TF tensor, if None will just use xyz as points
        knn: bool, if True use kNN instead of radius search
        use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
    Output:
        new_xyz: (batch_size, npoint, 3) TF tensor
        new_points: (batch_size, npoint, nsample, 3+channel) TF tensor
        idx: (batch_size, npoint, nsample) TF tensor, indices of local points as in ndataset points
        grouped_xyz: (batch_size, npoint, nsample, 3) TF tensor, normalized point XYZs
            (subtracted by seed point XYZ) in local regions
    '''

    new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3)
    if knn:  # not used here, can be skipped
        _,idx = knn_point(nsample, xyz, new_xyz)
    else:
        # idx: [B, npoint, nsample]
        idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
    grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
    if points is not None:
        grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
        if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nsample, 3+channel)
        else:
            new_points = grouped_points
    else:
        new_points = grouped_xyz

    return new_xyz, new_points, idx, grouped_xyz

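First, farthest_point_sample picks npoint points, and gather_point collects their coordinates into new_xyz.
Next, query_ball_point returns, for each sample, the indices idx of nsample points in each spherical neighborhood, using the points of new_xyz as the ball centers. This is described in detail at the end of this part.
idx: [B, npoint, nsample] holds, for each of the npoint balls, the indices of its nsample sampled points.
Then group_point(xyz, idx) gives grouped_xyz, the coordinates of every sampled neighbor of every center in every batch: (batch_size, npoint, nsample, 3).
After that comes what the comment calls translation normalization, the line below. It subtracts the center point (the new_xyz obtained by farthest point sampling) from grouped_xyz.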

grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
#new_xyz:(b, npoint, 3) --> (b, npoint, 1, 3) --> (b, npoint, nsample, 3)

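Finally, if the points carry extra features (points is not None), those features are grouped with the same idx. Because use_xyz is True, the normalized XYZ coordinates are concatenated with the channel local point features, giving 3+channel channels. If points is None, grouped_xyz alone is returned as new_points.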
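The grouping steps can be sketched in NumPy with fancy indexing. The function name group_and_normalize is hypothetical. It reproduces what group_point, the translation normalization, and the use_xyz concatenation do to the shapes, using broadcasting in place of tf.tile. It is not the repo's implementation.

```python
import numpy as np


def group_and_normalize(xyz, points, new_xyz, idx, use_xyz=True):
    """Hypothetical NumPy analogue of the tail of sample_and_group.

    xyz: (B, N, 3), points: (B, N, C) or None,
    new_xyz: (B, S, 3) centers, idx: (B, S, K) neighbor indices into N.
    """
    batch = np.arange(xyz.shape[0])[:, None, None]       # (B, 1, 1), broadcasts against idx
    grouped_xyz = xyz[batch, idx]                         # like group_point: (B, S, K, 3)
    grouped_xyz = grouped_xyz - new_xyz[:, :, None, :]    # translation normalization (broadcast instead of tf.tile)
    if points is None:
        return grouped_xyz
    grouped_points = points[batch, idx]                   # (B, S, K, C)
    if use_xyz:
        return np.concatenate([grouped_xyz, grouped_points], axis=-1)  # (B, S, K, 3 + C)
    return grouped_points


rng = np.random.default_rng(0)
xyz, points = rng.random((2, 10, 3)), rng.random((2, 10, 4))
idx = rng.integers(0, 10, size=(2, 3, 5))  # made-up neighbor indices
new_xyz = xyz[:, :3, :]                    # pretend the first 3 points are the sampled centers
out = group_and_normalize(xyz, points, new_xyz, idx)
print(out.shape)  # (2, 3, 5, 7)
```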

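References

On tf.tile and tf.expand_dims
tf.expand_dims adds a dimension to a tensor. The first argument is the tensor, and the second is the position at which the new axis is inserted.
tf.tile replicates the tensor's data according to the given multiples. The rank of the output stays the same, but the size of each dimension is multiplied by the corresponding multiple.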
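NumPy's np.expand_dims and np.tile behave like the TF ops in the translation-normalization line. The following shape check uses made-up sizes (b = 2, npoint = 4, nsample = 5):

```python
import numpy as np

new_xyz = np.arange(24.0).reshape(2, 4, 3)   # (b, npoint, 3)
expanded = np.expand_dims(new_xyz, 2)        # (b, npoint, 1, 3): new axis inserted at position 2
tiled = np.tile(expanded, [1, 1, 5, 1])      # (b, npoint, 5, 3): axis 2 repeated 5 times
print(expanded.shape, tiled.shape)           # (2, 4, 1, 3) (2, 4, 5, 3)
```

Each of the 5 copies along axis 2 equals new_xyz, so subtracting tiled from grouped_xyz subtracts the same center from every neighbor.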
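On query_ball_point
For query_ball_point, see the following:

Thanks to xd for the introduction
This one is very detailed

In short:
This layer uses the ball query method to generate N' local regions. According to the paper, there are two variables: the number of points per region, K, and the ball radius. The radius seems to be the dominant one. Points are searched inside a ball of that radius, with K as the upper limit. Both the radius and K are chosen by hand.
query_ball_point finds the points inside each spherical neighborhood. It takes four inputs:
- radius: the radius of each ball
- nsample: the number of points to sample per neighborhood
- new_xyz: the S ball centers, obtained earlier by farthest point sampling
- xyz: the whole point cloud

Its output is, for each sample, the indices of the nsample points in each ball, with shape [B, S, nsample]. The referenced posts annotate this in detail.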
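The ball query can be sketched in NumPy for a single cloud. It follows the way the PointNet++ grouping kernel is commonly described: scan points in index order, keep the first nsample inside the radius, and pad the remaining slots with the first hit. Treat the padding rule and the all-zero result for an empty ball as assumptions of this sketch, not a guaranteed match of the CUDA op.

```python
import numpy as np


def query_ball_point(radius, nsample, xyz, new_xyz):
    """Single-cloud sketch: xyz (N, 3), new_xyz (S, 3) -> idx (S, nsample), pts_cnt (S,)."""
    S = new_xyz.shape[0]
    idx = np.zeros((S, nsample), dtype=np.int64)
    pts_cnt = np.zeros(S, dtype=np.int64)
    for j in range(S):
        d2 = np.sum((xyz - new_xyz[j]) ** 2, axis=1)
        inside = np.nonzero(d2 < radius ** 2)[0][:nsample]  # at most nsample hits, in index order
        cnt = len(inside)
        if cnt > 0:
            idx[j, :] = inside[0]    # pad every slot with the first hit
            idx[j, :cnt] = inside    # then overwrite the first cnt slots with the real hits
        pts_cnt[j] = cnt             # an empty ball keeps index 0 everywhere (assumption)
    return idx, pts_cnt


xyz = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)  # points on the x-axis
centers = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
idx, cnt = query_ball_point(2.5, 4, xyz, centers)
print(idx.tolist(), cnt.tolist())  # [[0, 1, 2, 0], [0, 0, 0, 0]] [3, 0]
```

The first center finds only 3 points within the radius, so its last slot repeats index 0. The returned count is what flow_embedding_module later compares with nsample.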

Part 2: Point Feature Embedding

        # Point Feature Embedding
        # (data_format is set near the top of pointnet_sa_module: data_format = 'NCHW' if use_nchw else 'NHWC')
        if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
        for i, num_out_channel in enumerate(mlp):
            new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
                                        padding='VALID', stride=[1,1],
                                        bn=bn, is_training=is_training,
                                        scope='conv%d'%(i), bn_decay=bn_decay,
                                        data_format=data_format)
        if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

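This part mainly uses tf_util.conv2d.
use_nchw defaults to False. The author does not pass it, so the default is used.
mlp has three entries and is passed in on each call. For Frame 1, Layer 1, for example, mlp=[32, 32, 64].
The input new_points has shape (batch_size, npoint, nsample, 3+channel).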
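The kernel is 1×1 with stride 1 and VALID padding. Each conv2d layer here therefore applies the same linear map at every (npoint, nsample) position, which makes it a shared per-point MLP layer. The following NumPy check of that equivalence uses hypothetical sizes and random weights, and leaves out the bias, batch norm, and ReLU:

```python
import numpy as np

rng = np.random.default_rng(0)
B, S, K, C_in, C_out = 2, 4, 5, 6, 32       # hypothetical sizes; C_in = 3 + channel
new_points = rng.random((B, S, K, C_in))    # NHWC layout: H = npoint, W = nsample
kernel = rng.random((1, 1, C_in, C_out))    # same layout as kernel_shape in tf_util.conv2d

# A 1x1 VALID convolution with stride 1 maps C_in -> C_out independently at every position
out = np.einsum('bskc,cd->bskd', new_points, kernel[0, 0])
print(out.shape)  # (2, 4, 5, 32)
```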

def conv2d(inputs,
           num_output_channels,
           kernel_size,
           scope,
           stride=[1, 1],
           padding='SAME',
           data_format='NHWC',
           use_xavier=True,
           stddev=1e-3,
           weight_decay=None,
           activation_fn=tf.nn.relu,
           bn=False,
           bn_decay=None,
           is_training=None):
    """ 2D convolution with non-linear operation.

  Args:
    inputs: 4-D tensor variable BxHxWxC
    num_output_channels: int
    kernel_size: a list of 2 ints
    scope: string
    stride: a list of 2 ints
    padding: 'SAME' or 'VALID'
    data_format: 'NHWC' or 'NCHW'
    use_xavier: bool, use xavier_initializer if true
    stddev: float, stddev for truncated_normal init
    weight_decay: float
    activation_fn: function
    bn: bool, whether to use batch norm
    bn_decay: float or float tensor variable in [0,1]
    is_training: bool Tensor variable

  Returns:
    Variable tensor
  """
    with tf.variable_scope(scope) as sc:
        kernel_h, kernel_w = kernel_size
        assert (data_format == 'NHWC' or data_format == 'NCHW')

        # read the number of input channels according to the data format;
        # the number of output channels is given by mlp[i]
        if data_format == 'NHWC':
            num_in_channels = inputs.get_shape()[-1].value
        elif data_format == 'NCHW':
            num_in_channels = inputs.get_shape()[1].value
            
        kernel_shape = [kernel_h, kernel_w,
                        num_in_channels, num_output_channels]
        kernel = _variable_with_weight_decay('weights',
                                             shape=kernel_shape,
                                             use_xavier=use_xavier,
                                             stddev=stddev,
                                             wd=weight_decay)
        stride_h, stride_w = stride

        # inputs: batch * npoint * nsample * (3+channel)
        # kernel: 1 * 1 * (3+channel) * mlp[i]
        # outputs: batch * npoint * nsample * mlp[i] (a 1x1 kernel with VALID padding and stride 1 keeps npoint and nsample)
        outputs = tf.nn.conv2d(inputs, kernel,
                               [1, stride_h, stride_w, 1],
                               padding=padding,
                               data_format=data_format)

        # add a bias to each of the mlp[i] output channels of every point in outputs
        biases = _variable_on_cpu('biases', [num_output_channels],
                                  tf.constant_initializer(0.0))
        outputs = tf.nn.bias_add(outputs, biases, data_format=data_format)

        # batch normalization over the batch (and spatial) dimensions
        if bn:
            outputs = batch_norm_for_conv2d(outputs, is_training,
                                            bn_decay=bn_decay, scope='bn',
                                            data_format=data_format)

        if activation_fn is not None:
            outputs = activation_fn(outputs)
        return outputs

tf_util.conv2d updates new_points, which combine the coordinates and channels of the points sampled in each neighborhood.

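First, the kernel height and width are taken from the 2D kernel_size passed in, and the number of input channels is read according to the data format. NHWC is used here, so num_in_channels = inputs.get_shape()[-1].value. Then the 4D kernel shape (h, w, in_channels, out_channels) is built.
_variable_with_weight_decay creates the weight variable with this kernel shape. It uses Xavier initialization when use_xavier is True, and otherwise a truncated normal with the given stddev. If wd (weight_decay) is not None, it also adds an L2 penalty wd * l2_loss(weights) to the 'losses' collection. Here weight_decay is left at None, so no penalty is added.
The stride is set to (1, 1).
The core operation is the 2D convolution tf.nn.conv2d. The stride is 4D, but its first entry (batch) and last entry (channels) are both 1. The kernel is 1×1, the stride is 1×1, and the padding is VALID, so the outputs have shape (batch, npoint, nsample, mlp[i]).
A bias is then added to each of the mlp[i] output channels. This is a learnable offset per channel, shared by all points.
When bn is True, batch normalization normalizes each output channel with its mean and variance over the batch and the spatial positions, then applies a learnable scale and shift.
Finally, the activation function is applied (ReLU by default).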
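The last three steps can be sketched in NumPy. bias_bn_relu is a hypothetical helper, and the sketch makes four simplifying assumptions:
- It uses training-mode batch statistics over axes (0, 1, 2).
- It fixes the learnable scale and shift of batch norm to 1 and 0.
- It omits the moving averages used at test time.
- It assumes eps = 1e-3.

```python
import numpy as np


def bias_bn_relu(x, bias, eps=1e-3):
    """Hypothetical helper: bias add, training-mode batch norm, ReLU on x of shape (B, H, W, C)."""
    x = x + bias                                   # one offset per output channel, broadcast over B, H, W
    mean = x.mean(axis=(0, 1, 2), keepdims=True)   # per-channel statistics over batch and spatial axes
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)            # scale fixed to 1, shift fixed to 0 in this sketch
    return np.maximum(x, 0.0)                      # ReLU, the default activation_fn


rng = np.random.default_rng(0)
conv_out = rng.normal(size=(2, 4, 5, 8))           # hypothetical conv output: (batch, npoint, nsample, mlp[i])
y0 = bias_bn_relu(conv_out, np.zeros(8))
y1 = bias_bn_relu(conv_out, np.full(8, 5.0))
print(y0.shape, np.allclose(y0, y1))  # (2, 4, 5, 8) True
```

With batch norm enabled, the mean subtraction cancels a per-channel bias added just before it in training mode, so the two calls give the same result.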

References

_variable_with_weight_decay
A very thorough write-up: tf.nn.conv2d
batch_norm_for_conv2d
A comprehensive, detailed explanation by another author

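Part 3: Pooling in Local Regions

Max pooling is used. Because axis = [2], the max is taken over the third dimension of new_points (the nsample axis). Without keep_dims, this would change the shape from (batch, npoint, nsample, mlp) to (batch, npoint, mlp). Since keep_dims = True, the third dimension is kept with size 1, so the final shape of new_points is (batch, npoint, 1, mlp).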

        if pooling=='max':
            new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')

Reference: a tf.reduce_max example

Part 4: [Optional] Further Processing

If a second MLP (mlp2) is passed in, a second round of convolutions is applied.

Part 5

Finally, the shape is fixed up by removing the third dimension, which pooling reduced to size 1.
new_points: (batch, npoint, 1, mlp) -> (batch, npoint, mlp)

        # tf.squeeze removes the specified dimension, which must have size 1
        new_points = tf.squeeze(new_points, [2]) # (batch_size, npoints, mlp[-1] or mlp2[-1])
        return new_xyz, new_points, idx
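The pooling and squeeze steps can be checked with NumPy. Here np.max(..., keepdims=True) and np.squeeze(..., axis=2) play the roles of tf.reduce_max(..., keep_dims=True) and tf.squeeze(..., [2]). The sizes are made up:

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)  # (batch, npoint, nsample, mlp)
pooled = np.max(x, axis=2, keepdims=True)          # max over the nsample axis, axis kept with size 1
print(pooled.shape)                                # (2, 3, 1, 5)
squeezed = np.squeeze(pooled, axis=2)              # drop the size-1 axis
print(squeezed.shape)                              # (2, 3, 5)
```

Because x is built with np.arange, the maximum over axis 2 is simply the last nsample slice. Like tf.squeeze, np.squeeze raises an error if the chosen axis does not have size 1.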

flow_embedding_module

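The inputs are the frame1 and frame2 point sets after two down-sampling layers. The output is a new frame1 point set with the same number of points; only the features change.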

 """
    Input:
        xyz1: (batch_size, npoint, 3)
        xyz2: (batch_size, npoint, 3)
        feat1: (batch_size, npoint, channel)
        feat2: (batch_size, npoint, channel)
    Output:
        xyz1: (batch_size, npoint, 3)
        feat1_new: (batch_size, npoint, mlp[-1])
    """

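This layer relates the point clouds of the two frames, which leads to the scene flow.
The paper describes what this layer does roughly as follows:
For a given point p_i in frame1, find all points q_j in frame2 within a given radius of p_i (the blue points in the figure). If we knew which point of frame2 corresponds to p_i, the scene flow would follow easily. We don't know this, so a neural network is used instead. It effectively lets every point inside the radius vote, to determine which point is the most likely match.
In the original paper, the two point features are fed into h, in the hope that it learns "weights" that capture the correspondence between the two point sets.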

we input two point features to h, expecting it to learn to compute the “weights” to aggregate all potential flow vectors d_ij=y_j−x_i.

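First, the code gathers the coordinates of the frame2 points that fall in each frame1 point's neighborhood, i.e. xyz2_grouped. xyz_diff holds all the displacement vectors from the center points to these neighbors.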

    if knn:
        # Working backward from the paper: this finds the xyz2 points that may be related to each xyz1 point
        # (get_model passes knn=True, so this is the branch actually taken)
        _, idx = knn_point(nsample, xyz2, xyz1)  # npoint = 256, nsample = 64
    else:
        # Without knn: find up to nsample xyz2 points within radius of each xyz1 point
        idx, cnt = query_ball_point(radius, nsample, xyz2, xyz1)
        _, idx_knn = knn_point(nsample, xyz2, xyz1)
        cnt = tf.tile(tf.expand_dims(cnt, -1), [1, 1, nsample])
        # Where cnt > nsample - 1 (the ball holds a full nsample points), keep the ball-query idx; otherwise use the kNN idx
        idx = tf.where(cnt > (nsample - 1), idx, idx_knn)

    # xyz2_grouped are the blue points
    xyz2_grouped = group_point(xyz2, idx)  # batch_size, npoint, nsample, 3
    xyz1_expanded = tf.expand_dims(xyz1, 2)  # batch_size, npoint, 1, 3
    # Offsets from the blue xyz2 points to the xyz1 point (the center)
    xyz_diff = xyz2_grouped - xyz1_expanded  # batch_size, npoint, nsample, 3
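The neighbor selection in the else branch can be sketched in NumPy for a single frame pair. ball_idx and knn_idx are hypothetical stand-ins for query_ball_point and knn_point:
- ball_idx uses the scan-and-pad rule assumed for the ball query.
- knn_idx returns only indices, sorted by distance.

select_neighbors reproduces the tf.where fallback.

```python
import numpy as np


def ball_idx(radius, nsample, xyz2, xyz1):
    """Hypothetical ball query: up to nsample xyz2 indices within radius of each xyz1 point."""
    idx = np.zeros((xyz1.shape[0], nsample), dtype=np.int64)
    cnt = np.zeros(xyz1.shape[0], dtype=np.int64)
    for j, center in enumerate(xyz1):
        hits = np.nonzero(np.sum((xyz2 - center) ** 2, axis=1) < radius ** 2)[0][:nsample]
        if len(hits) > 0:
            idx[j, :] = hits[0]              # pad with the first hit
            idx[j, :len(hits)] = hits
        cnt[j] = len(hits)
    return idx, cnt


def knn_idx(k, xyz2, xyz1):
    """Hypothetical kNN: indices of the k nearest xyz2 points for each xyz1 point."""
    d2 = np.sum((xyz1[:, None, :] - xyz2[None, :, :]) ** 2, axis=-1)  # (S, N2)
    return np.argsort(d2, axis=1, kind='stable')[:, :k]


def select_neighbors(radius, nsample, xyz2, xyz1):
    idx, cnt = ball_idx(radius, nsample, xyz2, xyz1)
    idx_knn = knn_idx(nsample, xyz2, xyz1)
    cnt = np.tile(cnt[:, None], [1, nsample])          # like tf.tile(tf.expand_dims(cnt, -1), ...)
    return np.where(cnt > nsample - 1, idx, idx_knn)   # full ball: ball query; otherwise: kNN


xyz2 = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)  # frame-2 points on the x-axis
xyz1 = np.array([[2.0, 0.0, 0.0], [20.0, 0.0, 0.0]])                   # frame-1 centers
out = select_neighbors(1.5, 3, xyz2, xyz1)
print(out.tolist())  # [[1, 2, 3], [9, 8, 7]]
```

The first center has a full ball, so the ball-query result (in index order) is kept. The second center has no frame-2 point within the radius, so its neighbors come from kNN instead.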

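Next, the corresponding features of each frame are gathered. As the paper says, feeding the two features into the network directly works best, so the code concatenates the features of the two frames with tf.concat.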

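    # Gather the features of these points
    feat2_grouped = group_point(feat2, idx)  # batch_size, npoint, nsample, channel
    feat1_expanded = tf.expand_dims(feat1, 2)  # batch_size, npoint, 1, channel

    # TODO: change distance function

    # (the preceding corr_func branches are omitted in this excerpt)
    # This is the branch selected (corr_func='concat'). What confused me: the paper says that feeding the
    # difference of the two features works worse than feeding both features directly, yet a "diff" seemed to be fed here.
    # I think I get it now: the variable is called feat_diff, but it is just a concatenation.
    elif corr_func == 'concat':
        feat_diff = tf.concat(axis=-1, values=[feat2_grouped, tf.tile(feat1_expanded, [1, 1, nsample,
                                                                                       1])])  # batch_size, npoint, nsample, channel*2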

Next comes the MLP part, which applies nonlinear layers, followed by the final max pooling. Because keep_dims = False, pooling turns the 4D tensor directly into a 3D one.

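    # Restored: feat1_new must be defined before the MLP loop. Assumed, per the paper's h(f_i, g_j, y_j - x_i),
    # to append the displacements xyz_diff to the correlation features
    feat1_new = tf.concat([feat_diff, xyz_diff], axis=3)  # batch_size, npoint, nsample, channel*2 + 3

    # TODO: move scope to outer indent
    with tf.variable_scope(scope) as sc:
        for i, num_out_channel in enumerate(mlp):
            feat1_new = tf_util.conv2d(feat1_new, num_out_channel, [1, 1],
                                       padding='VALID', stride=[1, 1],
                                       bn=True, is_training=is_training,
                                       scope='conv_diff_%d' % (i), bn_decay=bn_decay)
    if pooling == 'max':
        feat1_new = tf.reduce_max(feat1_new, axis=[2], keep_dims=False, name='maxpool_diff')
    elif pooling == 'avg':
        feat1_new = tf.reduce_mean(feat1_new, axis=[2], keep_dims=False, name='avgpool_diff')
    return xyz1, feat1_new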
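The following shape-level NumPy sketch covers the concat correlation, the shared MLP, and the max pooling. The weights are random, batch norm is left out, and the sizes are hypothetical. Appending xyz_diff to the features follows the paper's h(f_i, g_j, y_j − x_i) and is an assumption here. The sketch illustrates the data flow, not the trained module.

```python
import numpy as np

rng = np.random.default_rng(0)
B, S, K, C = 2, 4, 8, 16          # hypothetical: batch, npoint, nsample, feature channels
mlp = [128, 128, 128]             # as in the flow_embedding_module call

feat2_grouped = rng.normal(size=(B, S, K, C))    # frame-2 neighbor features
feat1_expanded = rng.normal(size=(B, S, 1, C))   # frame-1 center features
xyz_diff = rng.normal(size=(B, S, K, 3))         # displacements y_j - x_i

# corr_func == 'concat': tile frame-1 features over the K neighbors, then concatenate
feat_diff = np.concatenate([feat2_grouped, np.tile(feat1_expanded, [1, 1, K, 1])], axis=-1)  # (B, S, K, 2C)
feat1_new = np.concatenate([feat_diff, xyz_diff], axis=-1)                                    # (B, S, K, 2C + 3)

# shared per-point MLP: each 1x1 conv is a matmul on the last axis, followed by ReLU
for c_out in mlp:
    w = rng.normal(size=(feat1_new.shape[-1], c_out)) * 0.1
    feat1_new = np.maximum(feat1_new @ w, 0.0)

# max pooling over the K neighbors with keep_dims=False removes that axis
feat1_new = feat1_new.max(axis=2)
print(feat1_new.shape)  # (2, 4, 128)
```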

set_upconv_module

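Its main role is to upsample: it enlarges the scene flow learned on a subset of the points, so that the scene flow of all points can be inferred.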

flying_things_dataset.py

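This is a class for the dataset. It involves various paths, so I found it a bit difficult at first.
The code in this part is actually short and easy to follow. What I lacked was background knowledge, so I cover some background first; the main text comes after it.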

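Background

glob

glob is a file-handling module in the Python standard library. It finds files whose paths match a given pattern.
Its main function is glob.glob, which returns a list of all matching file paths. It takes one argument, the path pattern to match, which may be absolute or relative. The returned names only include files in the directory the pattern names, not files inside its subfolders.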

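For example:
glob.glob(r'c:\*.txt')
gets all .txt files in the root of drive C
glob.glob(r'E:\pic\*\*.jpg')
gets all .jpg files in the subfolders of the specified directory
Reference: detailed examples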
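The following small, self-contained glob demo runs in a temporary directory. The file names are made up to resemble the dataset layout. The demo also shows that a pattern without ** does not reach into subfolders.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # made-up file names that mimic the dataset folder
    for name in ['TRAIN_A.npz', 'TRAIN_B.npz', 'TEST_A.npz', 'notes.txt']:
        open(os.path.join(root, name), 'w').close()
    os.makedirs(os.path.join(root, 'sub'))
    open(os.path.join(root, 'sub', 'TRAIN_C.npz'), 'w').close()

    # the same kind of pattern SceneflowDataset uses: only the top level is matched
    train = sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, 'TRAIN*.npz')))
    # one '*' level for the subfolder reaches sub/TRAIN_C.npz
    nested = sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, '*', '*.npz')))
    print(train)   # ['TRAIN_A.npz', 'TRAIN_B.npz']
    print(nested)  # ['TRAIN_C.npz']
```

glob returns paths in an order that depends on the file system, which is why the demo sorts them. SceneflowDataset uses the list as returned, so the mapping from index to file may differ between machines.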

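Reading and writing .npz files

First, .npy files. This is NumPy's own binary format: an array is saved as raw, uncompressed binary data in a file with the .npy extension.
An .npz file is a zip archive that can hold several arrays in one file. np.savez stores them uncompressed, and np.savez_compressed compresses them.
The main functions for .npz are:

np.savez() - save several arrays into one file
np.load() - read the file; for .npz it returns a dictionary-like object

Reference: numpy: .npy and .npz files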

import numpy as np

# 将多个数组保存到磁盘
a = np.arange(5)
b = np.arange(6)
c = np.arange(7)
np.savez('test', a, b, c_array=c)  # c_array is the name given to array c; '.npz' is appended to the filename
# read the arrays back
data = np.load('test.npz')  # dictionary-like: {'arr_0': a, 'arr_1': b, 'c_array': c}
print('arr_0 : ', data['arr_0'])
print('arr_1 : ', data['arr_1'])
print('c_array : ', data['c_array'])

--------------------------------------------------------------------------------
arr_0 :  [0 1 2 3 4]
arr_1 :  [0 1 2 3 4 5]
c_array :  [0 1 2 3 4 5 6]

The shape attribute in numpy

In numpy, shape gives the size of an array along each dimension.

a.shape     # a tuple with the size of every dimension (an attribute, not a method)
a.shape[i]  # the size of dimension i of a

import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(a)
print(a.shape)
print(a.shape[0])
------------------------------------------------------
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)
4

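np.random.choice

numpy.random.choice(a, size=None, replace=True, p=None)
Randomly draws values from a and returns an array of the given size. a can be any 1-D array; if a is an int, values are drawn from np.arange(a).
replace: True allows the same value to be drawn more than once; False does not.
p: an array matching a, giving the probability of drawing each element of a. By default every element is equally likely.
Source: np.random.choice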
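The following short demo shows the call pattern used in SceneflowDataset, np.random.choice(n, npoints, replace=False), with made-up sizes. The fixed seed only makes the demo reproducible.

```python
import numpy as np

np.random.seed(0)                     # reproducible demo only
n1, npoints = 10, 4                   # hypothetical: 10 points available, keep 4
sample_idx = np.random.choice(n1, npoints, replace=False)  # int a: draw from np.arange(a)
print(len(sample_idx), len(set(sample_idx.tolist())))    # 4 4 -> no repeated index

try:
    np.random.choice(3, 5, replace=False)   # 5 distinct values cannot be drawn from 3
    too_small_raised = False
except ValueError:
    too_small_raised = True
print(too_small_raised)  # True
```

The second call shows a practical consequence. In training mode, every .npz file must contain at least npoints points, otherwise __getitem__ raises a ValueError.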

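Why normalize the RGB color values

When RGB images are fed into a neural network, the pixel values are usually divided by 255 to map them into the range [0, 1].
References: Why are images normalized in deep learning?
Grayscale data representation (why divide by 255)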
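The division by 255 applied to color1 and color2 in the dataset class works as follows on made-up 8-bit values:

```python
import numpy as np

color = np.array([[0, 128, 255]], dtype=np.uint8)  # made-up raw RGB values
normalized = color / 255                            # true division produces floats in [0, 1]
print(normalized.dtype, normalized.min(), normalized.max())  # float64 0.0 1.0
```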

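Main text

With this background, the meaning of this part of the code is fairly clear:
- npoints is the number of points kept from each raw sample.
- root is the directory holding the data.
- datapath lists all TRAIN*.npz (or TEST*.npz) files found in that folder, and datapath[index] selects the index-th .npz file.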

import os
import os.path
import json
import numpy as np
import sys
import pickle
import glob


class SceneflowDataset():
    def __init__(self, root='data_preprocessing/data_processed_maxcut_35_both_mask_20k_2k', npoints=2048, train=True):
        self.npoints = npoints
        self.train = train
        self.root = root
        if self.train:
            self.datapath = glob.glob(os.path.join(self.root, 'TRAIN*.npz'))  # collect the paths of all matching files
        else:
            self.datapath = glob.glob(os.path.join(self.root, 'TEST*.npz'))
        self.cache = {}
        self.cache_size = 30000

        ###### deal with one bad datapoint with nan value
        # drop the one known bad sample (it contains NaN values) by filtering on its file name
        self.datapath = [d for d in self.datapath if 'TRAIN_C_0140_left_0006-0' not in d]
        ######

    def __getitem__(self, index):
        if index in self.cache:
            pos1, pos2, color1, color2, flow, mask1 = self.cache[index]
        else:
            fn = self.datapath[index]
            # 'rb': open the file read-only in binary mode; the file pointer starts at the beginning
            with open(fn, 'rb') as fp:
                data = np.load(fp)
                pos1 = data['points1']
                pos2 = data['points2']
                # normalize the RGB values to [0, 1]
                color1 = data['color1'] / 255
                color2 = data['color2'] / 255
                flow = data['flow']
                mask1 = data['valid_mask1']

            if len(self.cache) < self.cache_size:
                self.cache[index] = (pos1, pos2, color1, color2, flow, mask1)
                # so cache[index] holds the tuple above:
                # cache = {index: (pos1, pos2, color1, color2, flow, mask1)}

        # training data
        if self.train:
            # n1 is the size of pos1's first dimension (the number of points)
            n1 = pos1.shape[0]
            # draw npoints distinct indices from range(n1), without replacement
            sample_idx1 = np.random.choice(n1, self.npoints, replace=False)
            n2 = pos2.shape[0]
            sample_idx2 = np.random.choice(n2, self.npoints, replace=False)
            # the new, subsampled data
            pos1_ = np.copy(pos1[sample_idx1, :])
            pos2_ = np.copy(pos2[sample_idx2, :])
            color1_ = np.copy(color1[sample_idx1, :])
            color2_ = np.copy(color2[sample_idx2, :])
            flow_ = np.copy(flow[sample_idx1, :])
            mask1_ = np.copy(mask1[sample_idx1])
        # not training data: simply take the first npoints points
        else:
            pos1_ = np.copy(pos1[:self.npoints, :])
            pos2_ = np.copy(pos2[:self.npoints, :])
            color1_ = np.copy(color1[:self.npoints, :])
            color2_ = np.copy(color2[:self.npoints, :])
            flow_ = np.copy(flow[:self.npoints, :])
            mask1_ = np.copy(mask1[:self.npoints])

        return pos1_, pos2_, color1_, color2_, flow_, mask1_

    def __len__(self):
        return len(self.datapath)

I wrote a simple script to get a concrete feel for this class.
The data folder holding the raw data contains 6 TRAIN*.npz files and 3 TEST*.npz files.

As shown below:

if __name__ == '__main__':
    # import mayavi.mlab as mlab
    d = SceneflowDataset(npoints=2048)

    print("length of d: ", len(d))  # how many TRAIN*.npz files are in datapath
    # output: 6

    print("d.cache: ", d.cache)  # the cache is still empty at this point
    # output: {}

    print("length of d[1]: ", len(d[1]))  # d[1] returns a 6-tuple (pos1, pos2, color1, color2, flow, mask1)
    # output: 6

    pos1_, pos2_, color1_, color2_, flow_, mask1_ = d[1]
    print("pos1: ", pos1_.shape)
    # output: pos1:  (2048, 3)

    print("pos2: ", pos2_.shape)
    # output: pos2:  (2048, 3)

    print("color1_: ", color1_.shape)
    # output: color1_:  (2048, 3)

    print("color2_: ", color2_.shape)
    # output: color2_:  (2048, 3)

    print("flow: ", flow_.shape)
    # output: flow:  (2048, 3)

    print("mask: ", mask1_.shape)
    # output: mask:  (2048,)
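To try the loading logic without the real dataset, the following sketch writes a synthetic .npz with the same keys __getitem__ reads, then repeats its training-mode loading and sampling. write_fake_sample and load_and_sample are hypothetical helpers, the random contents are made up, and the cache is left out.

```python
import glob
import os
import tempfile

import numpy as np


def write_fake_sample(path, n=3000, seed=0):
    """Hypothetical: write a random .npz with the keys SceneflowDataset reads."""
    rng = np.random.default_rng(seed)
    np.savez(path,
             points1=rng.random((n, 3)), points2=rng.random((n, 3)),
             color1=rng.integers(0, 256, (n, 3)), color2=rng.integers(0, 256, (n, 3)),
             flow=rng.random((n, 3)), valid_mask1=rng.random(n) > 0.5)


def load_and_sample(fn, npoints):
    """Hypothetical: the loading and training-mode sampling steps of __getitem__, without the cache."""
    with open(fn, 'rb') as fp:
        data = np.load(fp)
        pos1, pos2 = data['points1'], data['points2']
        color1, color2 = data['color1'] / 255, data['color2'] / 255
        flow, mask1 = data['flow'], data['valid_mask1']
    idx1 = np.random.choice(pos1.shape[0], npoints, replace=False)
    idx2 = np.random.choice(pos2.shape[0], npoints, replace=False)
    # flow and mask belong to frame 1, so they follow frame 1's indices
    return pos1[idx1], pos2[idx2], color1[idx1], color2[idx2], flow[idx1], mask1[idx1]


with tempfile.TemporaryDirectory() as root:
    write_fake_sample(os.path.join(root, 'TRAIN_fake_0.npz'))
    datapath = glob.glob(os.path.join(root, 'TRAIN*.npz'))
    sample = load_and_sample(datapath[0], npoints=2048)
    print([a.shape for a in sample])
```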
