OpenPCDet Series: A Line-by-Line Walkthrough of the PointPillars Code, the Voxel Feature Encoding (VFE) Module

Jack_Man_N

Last modified 2023-11-30 11:58:40

Category: OpenPCDet column. Tags: object tracking, artificial intelligence, computer vision

First published 2023-11-25 12:48:52

Article link: https://blog.csdn.net/Jack_Man_N/article/details/134613118

The VFE modules in OpenPCDet

The VFE package contains the following Voxel Feature Encoding modules, which later posts will cover one by one. PointPillars uses the PillarVFE module.

__all__ = {
    'VFETemplate': VFETemplate,
    'MeanVFE': MeanVFE,
    'PillarVFE': PillarVFE,
    'ImageVFE': ImageVFE,
    'DynMeanVFE': DynamicMeanVFE,
    'DynPillarVFE': DynamicPillarVFE,
    'DynamicPillarVFESimple2D': DynamicPillarVFESimple2D,
    'DynamicVoxelVFE': DynamicVoxelVFE,
}
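As a rough illustration of how such a name-to-class dictionary is typically used, the sketch below looks a class up by the name given in a config and instantiates it. The stand-in classes, VFE_REGISTRY, and build_vfe are hypothetical names for this example only, and the exact lookup code in OpenPCDet may differ.

```python
class VFETemplate:
    """Stand-in base class for this sketch only (not the OpenPCDet class)."""

    def __init__(self, model_cfg):
        self.model_cfg = model_cfg


class MeanVFE(VFETemplate):
    pass


class PillarVFE(VFETemplate):
    pass


# Hypothetical registry mirroring the name -> class mapping shown above.
VFE_REGISTRY = {
    'MeanVFE': MeanVFE,
    'PillarVFE': PillarVFE,
}


def build_vfe(model_cfg):
    """Pick the encoder class named in the config and instantiate it."""
    name = model_cfg['NAME']
    if name not in VFE_REGISTRY:
        raise KeyError(f'Unknown VFE module: {name}')
    return VFE_REGISTRY[name](model_cfg=model_cfg)


vfe = build_vfe({'NAME': 'PillarVFE'})
print(type(vfe).__name__)  # PillarVFE
```

Keeping the mapping in one dictionary means a different encoder can be selected by changing only the name in the config.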

PillarVFE code walkthrough

The pillar operation

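PillarVFE is a feature-encoding module in OpenPCDet (Open Point Cloud Detection) that converts point cloud data into a structured feature representation. It is a key component of the PointPillars algorithm.
PointPillars is a point cloud object detector that works on a single bird's-eye view (BEV): it divides the x-y plane into a grid and uses PillarVFE to encode the points that fall into each grid cell, called a pillar.
The pillar stage involves three steps:

Pillar construction: the point cloud is projected onto the BEV plane and the plane is divided into small square cells. Each cell, extended over the full height range, is a pillar and represents one local region. In the code below this grouping is already done before forward is called, which reads the grouped points from batch_dict['voxels'].
Feature encoding: the points inside each pillar are encoded by a stack of PFNLayer modules. A PFNLayer is a small PointNet-style layer (a linear layer, optional batch normalization, and ReLU applied to every point), not a 3D convolution. The resulting features capture the local structure of the points in the pillar.
Feature aggregation: the per-point features are max-pooled within each pillar, so every pillar ends up with a single feature vector. The output is one vector per pillar, not one global vector for the whole scene.
With this encoding and aggregation, PointPillars can process large point clouds efficiently and extract the features that the later detection stages need.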
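To make the pillar-construction step concrete, here is a minimal pure-Python sketch that groups points by the BEV grid cell they fall into. The function group_into_pillars and the toy grid settings are assumptions for illustration; OpenPCDet performs this grouping in its data pipeline with a dedicated voxel generator, not with code like this.

```python
import math
from collections import defaultdict


def group_into_pillars(points, voxel_size_xy, range_min_xy):
    """Group (x, y, z) points by their BEV cell index (ix, iy). z is ignored."""
    pillars = defaultdict(list)
    for p in points:
        ix = math.floor((p[0] - range_min_xy[0]) / voxel_size_xy[0])
        iy = math.floor((p[1] - range_min_xy[1]) / voxel_size_xy[1])
        pillars[(ix, iy)].append(p)
    return dict(pillars)


# Toy input: 1 m x 1 m cells with the grid origin at (0, 0).
points = [(0.25, 0.25, 0.0), (0.75, 0.25, 1.0), (2.5, 1.5, -0.5)]
pillars = group_into_pillars(points, (1.0, 1.0), (0.0, 0.0))

print(sorted(pillars))       # [(0, 0), (2, 1)]
print(len(pillars[(0, 0)]))  # 2
```

The first two points share a cell even though their heights differ, which is what makes a pillar a column rather than a cube.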

Code implementation

The full PillarVFE code is in

OpenPCDet/pcdet/models/backbones_3d/vfe/pillar_vfe.py

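This code implements the constructor and the forward pass of the PillarVFE module. I'll go through it piece by piece and explain what each part does: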

class PillarVFE(VFETemplate):
    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range, **kwargs):
        super().__init__(model_cfg=model_cfg)

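This is the definition of the PillarVFE class, which inherits from VFETemplate. Its constructor takes the model config (model_cfg), the number of features per point (num_point_features), the voxel size (voxel_size), the point cloud range (point_cloud_range), and any extra keyword arguments.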

        self.use_norm = self.model_cfg.USE_NORM
        self.with_distance = self.model_cfg.WITH_DISTANCE
        self.use_absolute_xyz = self.model_cfg.USE_ABSLOTE_XYZ

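These variables store flags from the model config that switch options of the feature encoding on or off. Note that the config key is spelled USE_ABSLOTE_XYZ in the code, so the config file has to use the same spelling.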

        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1

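Adjust the number of input features per point according to the flags. Two 3-dimensional offsets are always appended to each point: the offset from the mean of the points in its pillar (f_cluster) and the offset from the pillar center (f_center). If use_absolute_xyz is set, the raw x, y, z coordinates are kept as well, so the count grows by 6. Otherwise the raw x, y, z are dropped in forward, so the net increase is only 3.

If with_distance is set, one more feature, the distance, is added.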
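The arithmetic above can be checked with a few lines of plain Python. The helper name pfn_input_channels is hypothetical, and the starting value of 4 raw features (x, y, z, intensity) is an assumed example, not something fixed by the code.

```python
def pfn_input_channels(num_point_features, use_absolute_xyz, with_distance):
    """Replicates the channel bookkeeping from the constructor."""
    num_point_features += 6 if use_absolute_xyz else 3
    if with_distance:
        num_point_features += 1
    return num_point_features


# Assumed example: 4 raw features per point (x, y, z, intensity).
print(pfn_input_channels(4, True, False))   # 10
print(pfn_input_channels(4, False, False))  # 7
print(pfn_input_channels(4, True, True))    # 11
```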

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

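Read the output channel counts of the PFN layers from the model config. The number of input point features (num_point_features) becomes the first entry of the list, followed by the configured channel counts.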

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)

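Build the list of PFNLayer modules, the basic unit of this encoder. One PFNLayer is created for each consecutive pair of channel counts, and only the final one gets last_layer=True.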
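The loop's pairing of channel counts, and which layer ends up flagged as the last one, can be traced without PyTorch. The helper plan_layers below is a hypothetical name, and the channel lists are assumed example values.

```python
def plan_layers(num_point_features, configured_filters):
    """Return (in_filters, out_filters, last_layer) for each PFN layer."""
    num_filters = [num_point_features] + list(configured_filters)
    plan = []
    for i in range(len(num_filters) - 1):
        # Same condition as in the constructor: true only for the final pair.
        last_layer = i >= len(num_filters) - 2
        plan.append((num_filters[i], num_filters[i + 1], last_layer))
    return plan


print(plan_layers(10, [64]))      # [(10, 64, True)]
print(plan_layers(10, [32, 64]))  # [(10, 32, False), (32, 64, True)]
```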

        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

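Store the voxel size and compute the offsets from the point cloud range. Each offset is the range minimum plus half a voxel, i.e. the coordinate of the center of the first voxel along that axis. The offsets are used later to compute f_center.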
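A quick numeric check of the offsets, as a sketch. The voxel size and point cloud range below are assumed example values chosen for illustration, and voxel_center_offsets is a hypothetical helper name.

```python
def voxel_center_offsets(voxel_size, point_cloud_range):
    """Center of the first voxel along x, y, z: range minimum + half a voxel."""
    return tuple(voxel_size[i] / 2 + point_cloud_range[i] for i in range(3))


# Assumed example values: [x_min, y_min, z_min, x_max, y_max, z_max].
voxel_size = [0.16, 0.16, 4.0]
point_cloud_range = [0.0, -39.68, -3.0, 69.12, 39.68, 1.0]

x_offset, y_offset, z_offset = voxel_center_offsets(voxel_size, point_cloud_range)
print(round(x_offset, 2), round(y_offset, 2), round(z_offset, 2))  # 0.08 -39.6 -1.0

# Center of the voxel with x index 3: index * voxel size + offset.
x_center = 3 * voxel_size[0] + x_offset
print(round(x_center, 2))  # 0.56
```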

    def get_output_feature_dim(self):
        return self.num_filters[-1]

Return the feature dimension after encoding, i.e. the channel count of the last PFN layer.

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        max_num_shape = [1] * len(actual_num.shape)
        max_num_shape[axis + 1] = -1
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator

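A helper that builds a boolean mask marking which point slots hold real points. Given the actual number of points in each voxel (actual_num, shape (N,)) and the maximum number of slots (max_num), it returns a tensor of shape (N, max_num) in which real points are True and padded slots are False.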
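Running the helper on a tiny input shows the mask layout. The function body is copied from the method above as a free function, and the point counts are made-up example values.

```python
import torch


def get_paddings_indicator(actual_num, max_num, axis=0):
    actual_num = torch.unsqueeze(actual_num, axis + 1)          # (N, 1)
    max_num_shape = [1] * len(actual_num.shape)
    max_num_shape[axis + 1] = -1
    max_num = torch.arange(
        max_num, dtype=torch.int, device=actual_num.device
    ).view(max_num_shape)                                       # (1, max_num)
    # Broadcast compare: slot index j is real when j < actual_num.
    return actual_num.int() > max_num


# Three voxels with 2, 1 and 3 real points, each padded to 3 slots.
mask = get_paddings_indicator(torch.tensor([2, 1, 3]), 3)
print(mask.tolist())
# [[True, True, False], [True, False, False], [True, True, True]]
```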

    def forward(self, batch_dict, **kwargs):
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict['voxel_coords']

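The forward function. It takes one batch of input data (batch_dict) and reads the voxel features (voxel_features), the number of points in each voxel (voxel_num_points), and the voxel coordinates (coords).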

        points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
        f_cluster = voxel_features[:, :, :3] - points_mean

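Compute the mean position of the points in each voxel (points_mean). The sum is divided by the actual point count rather than the number of slots, so zero-padded slots do not skew the mean. Subtracting the mean from each point's coordinates gives the cluster offset (f_cluster).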
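A small worked example of the two lines above, using one voxel with four slots of which two hold real points. The numbers are made up; the two computation lines are the same as in forward.

```python
import torch

# One voxel, 4 slots, 2 real points; columns are (x, y, z, intensity).
voxel_features = torch.tensor([[[1.0, 2.0, 0.0, 0.5],
                                [3.0, 4.0, 2.0, 0.7],
                                [0.0, 0.0, 0.0, 0.0],    # padding
                                [0.0, 0.0, 0.0, 0.0]]])  # padding
voxel_num_points = torch.tensor([2])

points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) \
    / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
f_cluster = voxel_features[:, :, :3] - points_mean

print(points_mean.tolist())  # [[[2.0, 3.0, 1.0]]]
print(f_cluster[0].tolist())
# [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0], [-2.0, -3.0, -1.0], [-2.0, -3.0, -1.0]]
```

The last two rows belong to padded slots and are no longer zero after the subtraction, which is why a mask is applied later in forward.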

        f_center = torch.zeros_like(voxel_features[:, :, :3])
        f_center[:, :, 0] = voxel_features[:, :, 0] - (coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, 1] - (coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        f_center[:, :, 2] = voxel_features[:, :, 2] - (coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

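Compute the center offset (f_center): each point's x, y, z minus the center of the voxel it belongs to. The voxel center along an axis is the voxel's grid index times the voxel size plus the offset computed in the constructor. Columns 3, 2, 1 of coords are used as the x, y, z grid indices; column 0, which is not used here, holds the batch index in OpenPCDet.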
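The same computation on a single point, as a sketch. The grid settings and the point are assumed toy values, picked so that every result is exactly representable in floating point.

```python
import torch

# Assumed toy grid: 1 m x 1 m x 4 m voxels, range minimum (0, -2, -3).
voxel_x, voxel_y, voxel_z = 1.0, 1.0, 4.0
x_offset = voxel_x / 2 + 0.0     # 0.5
y_offset = voxel_y / 2 + (-2.0)  # -1.5
z_offset = voxel_z / 2 + (-3.0)  # -1.0

# coords columns as read by the code: (batch_idx, z_idx, y_idx, x_idx).
coords = torch.tensor([[0, 0, 1, 2]])
# One voxel with one point: (x, y, z, intensity).
voxel_features = torch.tensor([[[2.25, -0.75, 0.5, 0.9]]])

f_center = torch.zeros_like(voxel_features[:, :, :3])
f_center[:, :, 0] = voxel_features[:, :, 0] - (
    coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * voxel_x + x_offset)
f_center[:, :, 1] = voxel_features[:, :, 1] - (
    coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * voxel_y + y_offset)
f_center[:, :, 2] = voxel_features[:, :, 2] - (
    coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * voxel_z + z_offset)

# Voxel center is (2.5, -0.5, -1.0), so the offsets are:
print(f_center.tolist())  # [[[-0.25, -0.25, 1.5]]]
```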

        if self.use_absolute_xyz:
            features = [voxel_features, f_cluster, f_center]
        else:
            features = [voxel_features[..., 3:], f_cluster, f_center]

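Choose the feature combination according to use_absolute_xyz. If it is True, the full raw point features, the cluster offset, and the center offset are used as input. Otherwise the first three columns (the raw x, y, z coordinates) are dropped and only the remaining columns, from index 3 onward (intensity, for example), are combined with the cluster and center offsets.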

        if self.with_distance:
            points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
            features.append(points_dist)
        features = torch.cat(features, dim=-1)

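If with_distance is set, compute each point's Euclidean distance from the origin (the L2 norm of its x, y, z) and append it as an extra feature.

Finally, concatenate all the features along the last dimension to form the final input features.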
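The sketch below wraps the feature selection and concatenation in a free function so the resulting channel counts can be checked directly. The function name assemble and the tensor sizes (5 voxels, 32 slots, 4 raw features) are assumptions for illustration.

```python
import torch


def assemble(voxel_features, f_cluster, f_center, use_absolute_xyz, with_distance):
    """Same selection and concatenation logic as in forward."""
    if use_absolute_xyz:
        features = [voxel_features, f_cluster, f_center]
    else:
        features = [voxel_features[..., 3:], f_cluster, f_center]
    if with_distance:
        # L2 norm of (x, y, z) for every point, kept as a trailing channel.
        points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
        features.append(points_dist)
    return torch.cat(features, dim=-1)


voxel_features = torch.randn(5, 32, 4)  # assumed: x, y, z, intensity
offsets = torch.zeros(5, 32, 3)         # placeholder for f_cluster / f_center

print(assemble(voxel_features, offsets, offsets, True, False).shape)   # torch.Size([5, 32, 10])
print(assemble(voxel_features, offsets, offsets, False, False).shape)  # torch.Size([5, 32, 7])
print(assemble(voxel_features, offsets, offsets, True, True).shape)    # torch.Size([5, 32, 11])
```

These widths match the num_point_features bookkeeping in the constructor, which is what lets the first PFN layer accept the concatenated tensor.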

        voxel_count = features.shape[1]
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        features *= mask

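Build a mask from the number of points in each voxel and apply it to the input features. Despite its name, voxel_count is features.shape[1], the maximum number of point slots per voxel. The mask sets the features of padded slots to 0, which is needed because f_cluster and f_center are nonzero in those slots even though the raw padding is zero.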
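A toy example of the masking step. For brevity the mask is built inline with a broadcast comparison that is equivalent in effect to get_paddings_indicator; the feature values are made up, with the padded slots of the first voxel deliberately nonzero.

```python
import torch

# Two voxels, 3 slots each, 2 toy feature channels.
# Voxel 0 has 1 real point; its padded slots carry leftover nonzero values.
features = torch.tensor([[[1.0, 2.0], [0.5, 0.5], [0.5, 0.5]],
                         [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])
voxel_num_points = torch.tensor([1, 3])

max_points = features.shape[1]
slot_index = torch.arange(max_points).view(1, -1)   # (1, 3)
mask = voxel_num_points.view(-1, 1) > slot_index    # (2, 3) bool
mask = torch.unsqueeze(mask, -1).type_as(features)  # (2, 3, 1) float
features = features * mask                          # broadcast over channels

print(features[0].tolist())  # [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
print(features[1].tolist())  # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```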

        for pfn in self.pfn_layers:
            features = pfn(features)
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict
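The final step runs the features through each PFN layer, squeezes the result, and stores it under batch_dict['pillar_features']. PFNLayer itself is not shown in this post, so the sketch below uses a hypothetical SimplePFNLayer (linear layer, batch norm, ReLU, max over the points axis) as a stand-in for a single last layer; it is an assumption for illustration, not the OpenPCDet implementation. It also shows one thing worth knowing about a bare squeeze(): when the batch happens to contain a single pillar, the pillar axis is removed as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplePFNLayer(nn.Module):
    """Hypothetical stand-in: Linear -> BatchNorm -> ReLU -> max over points."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels, bias=False)
        self.norm = nn.BatchNorm1d(out_channels)

    def forward(self, inputs):
        # inputs: (num_pillars, max_points, in_channels)
        x = self.linear(inputs)
        # BatchNorm1d expects channels in dim 1, so swap axes and swap back.
        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1)
        x = F.relu(x)
        # One feature vector per pillar: (num_pillars, 1, out_channels).
        return torch.max(x, dim=1, keepdim=True)[0]


pfn_layers = nn.ModuleList([SimplePFNLayer(10, 64)]).eval()


def encode(features, squeeze_dim=None):
    for pfn in pfn_layers:
        features = pfn(features)
    if squeeze_dim is None:
        return features.squeeze()        # removes every size-1 axis
    return features.squeeze(squeeze_dim)  # removes only the given axis


with torch.no_grad():
    many = encode(torch.randn(5, 32, 10))
    one_bare = encode(torch.randn(1, 32, 10))
    one_dim = encode(torch.randn(1, 32, 10), squeeze_dim=1)

print(many.shape)      # torch.Size([5, 64])
print(one_bare.shape)  # torch.Size([64])
print(one_dim.shape)   # torch.Size([1, 64])
```

Under these assumptions, squeezing only dimension 1 keeps the (num_pillars, channels) layout for any number of pillars.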