【MMDetection3D】VoxelNet Related Code In MMDetection3D

倔强一撮毛

Last modified: 2022-10-14 20:29:35

Category: MMDetection3D. Tags: object detection, artificial intelligence, deep learning

First published: 2022-10-14 20:22:32

Article link: https://blog.csdn.net/weixin_41658139/article/details/127327093

VoxelNet Related Code In MMDetection3D

`mmdetection3d/mmdet3d/models/task_modules/voxel`

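The first piece of VoxelNet-related code lives in the package mmdetection3d/mmdet3d/models/task_modules/voxel. Because MMDetection3D makes the 1.1 branch its default main branch starting in 2023, this walkthrough follows version 1.1. For other versions, press T on the home page of the MMDetection3D GitHub repository, type VoxelNet to search for related files, and look for a .py file that resembles the code below.

The main file in this package is voxel_generator.py, which defines the following class and functions: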

class VoxelGenerator()
def points_to_voxel()
def _points_to_voxel_reverse_kernel()
def _points_to_voxel_kernel()

The test code is located at: mmdetection3d/tests/test_models/test_task_modules/test_voxel/test_voxel_generator.py

`class VoxelGenerator()`

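This class implements voxel generation. Apart from a few helpers (the @property methods that expose attributes and the __repr__ method that prints the class), it consists mainly of the initializer and the generate method, shown below: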

import numpy as np


class VoxelGenerator(object):
    """Voxel generator implemented with numpy.

    Args:
        voxel_size (list[float]): Size of a single voxel.
        point_cloud_range (list[float]): Range of the point cloud.
        max_num_points (int): Maximum number of points per voxel.
        max_voxels (int, optional): Maximum number of voxels.
            Defaults to 20000.
    """

    def __init__(self,
                 voxel_size,
                 point_cloud_range,
                 max_num_points,
                 max_voxels=20000):

        # e.g. [0, -40, -3, 70.4, 40, 1], xyzxyz, minmax (see the VoxelNet paper)
        point_cloud_range = np.array(point_cloud_range, dtype=np.float32)
        # e.g. [0.2, 0.2, 0.4], xyz (see the VoxelNet paper)
        voxel_size = np.array(voxel_size, dtype=np.float32)
        # e.g. [352, 400, 10], xyz (see the VoxelNet paper)
        grid_size = (point_cloud_range[3:] -
                     point_cloud_range[:3]) / voxel_size
        grid_size = np.round(grid_size).astype(np.int64)

        self._voxel_size = voxel_size
        self._point_cloud_range = point_cloud_range
        self._max_num_points = max_num_points
        self._max_voxels = max_voxels
        self._grid_size = grid_size

    def generate(self, points):
        """Generate voxels from the given point cloud."""
        return points_to_voxel(points, self._voxel_size,
                               self._point_cloud_range, self._max_num_points,
                               True, self._max_voxels)
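
The grid-size arithmetic in the initializer can be checked by hand. The following is a minimal pure-Python sketch, not the library's code: `compute_grid_size` is a hypothetical helper name, and the range and voxel size are the example values from the comments above.

```python
def compute_grid_size(voxel_size, point_cloud_range):
    """Number of voxels along x, y and z for a given range and voxel size."""
    mins = point_cloud_range[:3]
    maxs = point_cloud_range[3:]
    # Round instead of truncating so that floating-point error
    # (e.g. 399.99999) does not silently drop a cell.
    return [round((hi - lo) / size)
            for lo, hi, size in zip(mins, maxs, voxel_size)]


# Example values taken from the comments in the listing above.
grid_size = compute_grid_size([0.2, 0.2, 0.4], [0, -40, -3, 70.4, 40, 1])
print(grid_size)  # [352, 400, 10]
```

Rounding mirrors the `np.round` call in the initializer; a plain integer cast would be sensitive to tiny errors in the division.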

`def points_to_voxel()`

The listing below omits some dtype conversions and keeps only the core variables and logic.

def points_to_voxel(points,
                    voxel_size,
                    coors_range,
                    max_points=35,
                    reverse_index=True,
                    max_voxels=20000):
    """Convert KITTI points (N, >=3) to voxels.

    Args:
        points (np.ndarray): [N, ndim]. points[:, :3] holds xyz and
            points[:, 3:] holds other information such as reflectivity.
        voxel_size (list, tuple, np.ndarray): [3] xyz, the voxel size.
        coors_range (list[float] | tuple[float] | np.ndarray): Range of the
            voxels (also the range of the point cloud).
            Format: xyzxyz, minmax.
        max_points (int): Maximum number of points kept in one voxel.
        reverse_index (bool): Whether to return reversed coordinates.
            If the points are in xyz format and this is True, the output
            coordinates are in zyx format, while the points inside the
            features usually stay in xyz format.
        max_voxels (int): Maximum number of voxels this function creates.
            For the SECOND paper's method, 20000 is a good choice. Points
            should be shuffled randomly before calling this function,
            because this cap causes some points to be dropped.

    Returns:
        tuple[np.ndarray]:
            voxels: [M, max_points, ndim] float array, contains only points.
            coordinates: [M, 3] int32 array.
            num_points_per_voxel: [M] int32 array.
    """
    # The dtype conversions are omitted, so voxel_size and coors_range are
    # assumed to be np.ndarray already.
    # Grid shape in xyz order, e.g. (352, 400, 10)
    voxelmap_shape = (coors_range[3:] - coors_range[:3]) / voxel_size
    voxelmap_shape = tuple(np.round(voxelmap_shape).astype(np.int32).tolist())
    if reverse_index:
        # Reversed to zyx order, e.g. (10, 400, 352)
        voxelmap_shape = voxelmap_shape[::-1]
    # don't create large array in jit(nopython=True) code.
    # (max_voxels,) array counting the points in each voxel, initialized to 0
    num_points_per_voxel = np.zeros(shape=(max_voxels, ), dtype=np.int32)
    # Dense grid, e.g. (10, 400, 352), mapping each cell to the index of its
    # non-empty voxel; -1 means the cell has no voxel yet
    coor_to_voxelidx = -np.ones(shape=voxelmap_shape, dtype=np.int32)
    # (max_voxels, max_points, ndim) array, e.g. (20000, 35, 4) for KITTI,
    # storing the coordinates, reflectivity, etc. of every point in every
    # non-empty voxel, initialized to 0
    voxels = np.zeros(
        shape=(max_voxels, max_points, points.shape[-1]), dtype=points.dtype)
    # (max_voxels, 3) array storing the 3D grid index of each non-empty voxel
    coors = np.zeros(shape=(max_voxels, 3), dtype=np.int32)
    if reverse_index:
        voxel_num = _points_to_voxel_reverse_kernel(
            points, voxel_size, coors_range, num_points_per_voxel,
            coor_to_voxelidx, voxels, coors, max_points, max_voxels)

    else:
        voxel_num = _points_to_voxel_kernel(points, voxel_size, coors_range,
                                            num_points_per_voxel,
                                            coor_to_voxelidx, voxels, coors,
                                            max_points, max_voxels)
    # 3D grid indices of the non-empty voxels, [M, 3]
    coors = coors[:voxel_num]
    # Coordinates, reflectivity, etc. of the points in each non-empty voxel
    voxels = voxels[:voxel_num]
    # Number of points in each non-empty voxel
    num_points_per_voxel = num_points_per_voxel[:voxel_num]

    return voxels, coors, num_points_per_voxel
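
The docstring's advice to shuffle the points before voxelization can be illustrated with a one-dimensional toy. The following is only a sketch under stated assumptions: `kept_voxel_ids` is a hypothetical helper, the points are made-up values along a single axis, and the only thing modeled is the `max_voxels` cap.

```python
import random


def kept_voxel_ids(xs, voxel_size, max_voxels):
    """Return the 1D voxel indices that survive the cap, in creation order."""
    kept = []
    for x in xs:
        idx = int(x // voxel_size)
        if idx in kept:
            continue  # the voxel already exists, so the point would join it
        if len(kept) >= max_voxels:
            continue  # cap reached: points that need a new voxel are dropped
        kept.append(idx)
    return kept


# Hypothetical toy data: ten points along x, sorted from near to far.
xs = [0.5 + i for i in range(10)]

# Without shuffling, only the two nearest voxels are ever created.
unshuffled_result = kept_voxel_ids(xs, voxel_size=1.0, max_voxels=2)
print(unshuffled_result)  # [0, 1]

# After shuffling, the surviving voxels depend on the random order
# instead of always being the first ones in the input.
shuffled = xs[:]
random.Random(0).shuffle(shuffled)
shuffled_result = kept_voxel_ids(shuffled, voxel_size=1.0, max_voxels=2)
print(shuffled_result)
```

With sorted input the cap always discards the far end of the scene; shuffling spreads the loss over the whole point cloud.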

`def _points_to_voxel_reverse_kernel()`

This is the core of the whole conversion.

The listing below omits some unrelated comments and shows only the core code.

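def _points_to_voxel_reverse_kernel(points,
                                    voxel_size,
                                    coors_range,
                                    num_points_per_voxel,
                                    coor_to_voxelidx,
                                    voxels,
                                    coors,
                                    max_points=35,
                                    max_voxels=20000):
    """Convert KITTI points (N, >=3) to voxels.

    Args:
        points (np.ndarray): [N, ndim]. points[:, :3] holds xyz and
            points[:, 3:] holds other information such as reflectivity.
        voxel_size (list, tuple, np.ndarray): [3] xyz, the voxel size.
        coors_range (list[float] | tuple[float] | np.ndarray): Range of the
            voxels. Format: xyzxyz, minmax.
        num_points_per_voxel (np.ndarray): Number of points in each voxel
            (filled in place).
        coor_to_voxelidx (np.ndarray): Voxel grid of shape (D, H, W) holding
            the index of the voxel at each cell.
        voxels (np.ndarray): Pre-allocated empty voxels (filled in place).
        coors (np.ndarray): Pre-allocated coordinates of each voxel
            (filled in place).
        max_points (int): Maximum number of points kept in one voxel.
        max_voxels (int): Maximum number of voxels this function creates.

    Returns:
        int: Number of non-empty voxels M. The first M entries of voxels,
            coors and num_points_per_voxel are valid.
    """
    # The point cloud has N points
    N = points.shape[0]
    # xyz has 3 dimensions
    ndim = 3
    ndim_minus_1 = ndim - 1
    # e.g. (352, 400, 10)
    grid_size = (coors_range[3:] - coors_range[:3]) / voxel_size
    grid_size = np.round(grid_size, 0, grid_size).astype(np.int32)
    # Grid index of a single point, shape (3,), initialized to 0
    coor = np.zeros(shape=(3, ), dtype=np.int32)
    # Running count of voxels; also the index of the next new voxel
    voxel_num = 0
    # Flags whether the current point falls outside the range
    failed = False
    # Iterate over every point in the point cloud
    for i in range(N):
        # Reset failed
        failed = False
        # Iterate over the x, y, z coordinates of this point
        for j in range(ndim):
            # Voxel index of the point along the x, y or z axis
            c = np.floor((points[i, j] - coors_range[j]) / voxel_size[j])
            # Use the index along this axis to check whether the point is valid
            if c < 0 or c >= grid_size[j]:
                # Out of range: set failed
                failed = True
                # Leave the loop without checking the remaining axes
                break
            # Valid so far: store the index along this axis in coor, in zyx order
            coor[ndim_minus_1 - j] = c
        # Skip to the next point if this one is outside the valid range
        if failed:
            continue
        # The point is inside the valid range.
        # Look up the voxel index of its cell: -1 if the cell has had no
        # points so far, otherwise the index k of the existing voxel
        voxelidx = coor_to_voxelidx[coor[0], coor[1], coor[2]]
        # The cell has had no points so far
        if voxelidx == -1:
            # Assign the next free voxel index
            voxelidx = voxel_num
            # If the number of voxels has already reached the cap, drop the point
            if voxel_num >= max_voxels:
                continue
            # Otherwise increase the voxel count
            voxel_num += 1
            # Record the voxel index for this cell
            coor_to_voxelidx[coor[0], coor[1], coor[2]] = voxelidx
            # Store the grid index of this voxel
            coors[voxelidx] = coor
        # Number of points currently in this voxel
        num = num_points_per_voxel[voxelidx]
        # If the per-voxel point cap has not been reached
        if num < max_points:
            # Store the features of the current point
            voxels[voxelidx, num] = points[i]
            # Increase the point count of this voxel
            num_points_per_voxel[voxelidx] += 1
    # Return the number of non-empty voxels
    return voxel_num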
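
To trace the kernel on a handful of points, the following miniature may help. It is a pure-Python sketch rather than the library's implementation: `voxelize` is a hypothetical name, a dict stands in for the dense `coor_to_voxelidx` grid, voxels are plain lists without zero padding, and the sample points are made up.

```python
import math


def voxelize(points, voxel_size, coors_range, max_points, max_voxels):
    """Miniature of the reverse kernel; returns (voxels, coors, counts)."""
    grid_size = [round((coors_range[j + 3] - coors_range[j]) / voxel_size[j])
                 for j in range(3)]
    coor_to_voxelidx = {}  # sparse stand-in for the dense grid of -1s
    voxels, coors = [], []
    for point in points:
        xyz_idx = [math.floor((point[j] - coors_range[j]) / voxel_size[j])
                   for j in range(3)]
        if any(c < 0 or c >= grid_size[j] for j, c in enumerate(xyz_idx)):
            continue  # the point lies outside the range
        coor = tuple(reversed(xyz_idx))  # stored in zyx order
        voxelidx = coor_to_voxelidx.get(coor, -1)
        if voxelidx == -1:
            if len(voxels) >= max_voxels:
                continue  # voxel cap reached, drop the point
            voxelidx = len(voxels)
            coor_to_voxelidx[coor] = voxelidx
            coors.append(coor)
            voxels.append([])
        if len(voxels[voxelidx]) < max_points:
            voxels[voxelidx].append(point)
    return voxels, coors, [len(v) for v in voxels]


# Hypothetical (x, y, z, reflectivity) points.
points = [
    (10.05, -3.30, -0.90, 0.5),  # creates voxel 0
    (10.10, -3.25, -0.85, 0.1),  # same cell as the first point
    (10.15, -3.35, -0.95, 0.2),  # same cell, dropped by max_points=2
    (80.00, 0.00, 0.00, 0.3),    # x is outside [0, 70.4), skipped
    (30.10, 10.10, 0.50, 0.9),   # creates voxel 1
]
voxels, coors, counts = voxelize(
    points, [0.2, 0.2, 0.4], [0, -40, -3, 70.4, 40, 1],
    max_points=2, max_voxels=20000)
print(coors)   # [(5, 183, 50), (8, 250, 150)]
print(counts)  # [2, 1]
```

The dict keeps the sketch short; the dense array in the listing above trades memory for constant-time lookups that work inside compiled loops.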

`def _points_to_voxel_kernel()`

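This is the other core routine of the conversion. It differs from the previous function in essentially one line: where the reverse kernel writes coor[ndim_minus_1 - j] = c and therefore stores the index in zyx order, this kernel writes coor[j] = c and keeps xyz order. The rest is not repeated here.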
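
The effect of that single line can be seen on one point. The following is a minimal sketch, assuming a hypothetical helper `voxel_coor` with a `reverse_index` flag that switches between the two assignments; it is not part of MMDetection3D.

```python
import math


def voxel_coor(point, voxel_size, coors_range, grid_size, reverse_index=True):
    """Integer voxel index of one point, or None if it is out of range."""
    coor = [0, 0, 0]
    for j in range(3):
        c = math.floor((point[j] - coors_range[j]) / voxel_size[j])
        if c < 0 or c >= grid_size[j]:
            return None
        if reverse_index:
            coor[2 - j] = c  # reverse kernel: zyx order
        else:
            coor[j] = c      # plain kernel: xyz order
    return tuple(coor)


# Example settings from the listings above; the point itself is made up.
voxel_size = [0.2, 0.2, 0.4]
coors_range = [0, -40, -3, 70.4, 40, 1]
grid_size = [352, 400, 10]
point = (10.05, -3.3, -0.9)

zyx = voxel_coor(point, voxel_size, coors_range, grid_size, reverse_index=True)
xyz = voxel_coor(point, voxel_size, coors_range, grid_size, reverse_index=False)
print(zyx)  # (5, 183, 50)
print(xyz)  # (50, 183, 5)
```

The order has to match the shape of the index grid: zyx indices address a (D, H, W) grid such as (10, 400, 352), while xyz indices address a (352, 400, 10) grid.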

`mmdetection3d/mmdet3d/models/voxel_encoders`

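The second piece of VoxelNet-related code lives in the package mmdetection3d/mmdet3d/models/voxel_encoders. As before, this walkthrough follows version 1.1, since MMDetection3D makes the 1.1 branch its default main branch starting in 2023. For other versions, press T on the home page of the MMDetection3D GitHub repository, type VoxelNet to search for related files, and look for a .py file that resembles the code below.

The main file in this package is utils.py, which defines the following class: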

class VFELayer()

`class VFELayer()`

This class appears to implement the Voxel Feature Encoding (VFE) layer described in the VoxelNet paper.

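import torch
from mmcv.cnn import build_norm_layer
from torch import nn
from torch.nn import functional as F


class VFELayer(nn.Module):
    """Voxel Feature Encoder layer.

    The voxel encoder is composed of a series of these layers.
    This module does not support average pooling; it only uses max pooling
    to gather the features inside a VFE.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        norm_cfg (dict): Config dict of the normalization layer.
            Defaults to dict(type='BN1d', eps=1e-3, momentum=0.01).
        max_out (bool): Whether to aggregate the point features inside each
            voxel and return only voxel features. Defaults to True.
        cat_max (bool): Whether to concatenate the aggregated features with
            the point-wise features. Defaults to True.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01),
                 max_out=True,
                 cat_max=True):
        super(VFELayer, self).__init__()
        self.fp16_enabled = False  # whether fp16 (half precision) is enabled
        self.cat_max = cat_max
        self.max_out = max_out
        # self.units = int(out_channels / 2)

        # build_norm_layer comes from mmcv.cnn and returns (name, layer)
        self.norm = build_norm_layer(norm_cfg, out_channels)[1]
        self.linear = nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, inputs):
        """Forward function.

        Args:
            inputs (torch.Tensor): Voxel features of shape (N, M, C).
                N is the number of voxels, M is the number of points per
                voxel and C is the number of point-feature channels.

        Returns:
            torch.Tensor: Voxel features. There are three modes, and the
                features mean something different in each
                (C_out = out_channels).
                - `max_out=False`: point-wise features of shape
                  (N, M, C_out).
                - `max_out=True` and `cat_max=False`: aggregated voxel
                  features of shape (N, C_out).
                - `max_out=True` and `cat_max=True`: concatenated
                  point-wise features of shape (N, M, 2 * C_out).
        """
        # [K, T, 7] tensordot [7, units] = [K, T, units]
        voxel_count = inputs.shape[1]

        x = self.linear(inputs)
        # BN1d normalizes over dim 1, so move channels there and back
        x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2,
                                                               1).contiguous()
        pointwise = F.relu(x)
        # [K, T, units]
        if self.max_out:
            # Element-wise max over the points of each voxel: [K, 1, units]
            aggregated = torch.max(pointwise, dim=1, keepdim=True)[0]
        else:
            # this is for fusion layer
            return pointwise

        if not self.cat_max:
            return aggregated.squeeze(1)
        else:
            # [K, 1, units] repeated along the point axis: [K, T, units]
            repeated = aggregated.repeat(1, voxel_count, 1)
            concatenated = torch.cat([pointwise, repeated], dim=2)
            # [K, T, 2 * units]
            return concatenated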
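
The max-pool-then-concatenate step at the end of forward can be followed on tiny nested lists. The following is a pure-Python sketch of that step only: it leaves out the linear layer, the normalization and the ReLU, `max_and_concat` is a hypothetical helper name, and the feature values are made up.

```python
def max_and_concat(pointwise):
    """pointwise has shape [N][M][C]; the result has shape [N][M][2 * C]."""
    out = []
    for voxel in pointwise:
        # Element-wise max over the M points of this voxel: one C-dim vector.
        aggregated = [max(channel) for channel in zip(*voxel)]
        # Append the same aggregated vector to every point's own features.
        out.append([point + aggregated for point in voxel])
    return out


# One voxel (N=1) with M=3 points and C=2 channels; hypothetical
# post-ReLU values, the last row standing in for a zero-padded slot.
features = [[[0.1, 0.9], [0.4, 0.2], [0.0, 0.0]]]
result = max_and_concat(features)
print(result[0][0])  # [0.1, 0.9, 0.4, 0.9]
print(result[0][2])  # [0.0, 0.0, 0.4, 0.9]
```

Every point ends up carrying its own features plus the voxel-level maximum, which is why the channel count doubles when `cat_max=True`.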