MVX-Net: The PointFusion Implementation in mmdetection3d

符号看象限_Wangerer

Category: 3D Object Detection

Article link: https://blog.csdn.net/weixin_38362784/article/details/112220364

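Preface
MVX-Net proposes two ways to fuse LiDAR point clouds with image data: PointFusion and VoxelFusion.
Scope: this post covers the PointFusion method.
Note: mmdetection3d implements PointFusion but does not yet implement VoxelFusion.
Goal: this post analyzes the PointFusion implementation; a later post will explain how to implement VoxelFusion.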

I. Key data-fusion code: `DynamicVFE.forward`

Step0: Call chain

Call chain in DynamicMVXFasterRCNN(MVXTwoStageDetector):
-> self.forward_train
-> img_feats, pts_feats = self.extract_feat
-> img_feats = self.extract_img_feat
… pts_feats = self.extract_pts_feat [takes img_feats as an argument]

# mmdet3d/models/detectors/mvx_faster_rcnn.py
    def extract_pts_feat(self, points, img_feats, img_metas):
        ...
        voxel_features, feature_coors = self.pts_voxel_encoder(
            voxels, coors, points, img_feats, img_metas)
        ...

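Summary

Data fusion happens in the feature-extraction stage.

First, the multi-scale image features img_feats are extracted.
Next, the image features img_feats and the point cloud points are fused into the fused features pts_feats.
This post focuses on that second step.

Where the call happens

The key call is self.pts_voxel_encoder inside DynamicMVXFasterRCNN.extract_pts_feat(), which fuses the image features img_feats with the point cloud points.
Here self.pts_voxel_encoder is an instance of the DynamicVFE class, so the rest of this post concentrates on DynamicVFE.forward.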

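Step1: Arguments of `DynamicVFE.forward`

Note: dynamic voxelization is used here, i.e. max_num_points is set to -1 in pts_voxel_layer in the config file.
Dynamic voxelization only records the mapping between points and voxels. Among the arguments below, features therefore still holds the raw per-point features, and coors records that mapping.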

# mmdet3d/models/voxel_encoders/voxel_encoder.py
def forward(self, features, coors, points=None, img_feats=None, img_metas=None):

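| Argument | features | coors | points | img_feats | img_metas |
| --- | --- | --- | --- | --- | --- |
| Description | voxel features | voxel coordinates | raw point cloud | image features | image meta information |
| Size | [34096, 4] | [34096, 4] | points[0]: [16921, 4]; points[1]: [17175, 4] | size = 5 | img_metas[0]: dict; img_metas[1]: dict |
| Notes | 34096 is the total number of points over the 2 samples in the batch | column 1 is batch_id; columns 2-4 are the voxel coordinates z, y, x of the point | one tensor per sample | [2, 256, 48, 160]; [2, 256, 24, 80]; [2, 256, 12, 40]; [2, 256, 6, 20]; [2, 256, 3, 10] | one dict per sample |
| Special notes | with dynamic voxels, features keeps the raw point features | Understanding voxel coordinates (voxel_coors) in mmdetection3d | | | |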

Step2: Initial feature extraction

1. Key code example

# mmdet3d/models/voxel_encoders/voxel_encoder.py
# function DynamicVFE.forward()
        # NOTE: the shapes in these comments (18220 points, 15212 voxels) appear to come from a different run than the 34096-point example in the text
        if self._with_cluster_center:                                                       # voxel_mean [15212, 4]: 15212 voxels, 4-dim mean feature each
            voxel_mean, mean_coors = self.cluster_scatter(features, coors)                  # mean_coors [15212, 4]: batch_id and voxel coordinates
            points_mean = self.map_voxel_center_to_point(
                coors, voxel_mean, mean_coors)                                              # points_mean [18220, 4]
            # TODO: maybe also do cluster for reflectivity
            f_cluster = features[:, :3] - points_mean[:, :3]                                # f_cluster [18220, 3]
            features_ls.append(f_cluster)

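(1) Convert the dynamic voxels to voxels and compute the mean feature of each voxel: `self.cluster_scatter`

This yields:

voxel_mean: voxel features [28381, 4] (the mean feature of all points inside each voxel)
mean_coors: voxel coordinates [28381, 4]

The 34096 points of the 2 samples in the table above produce 28381 non-empty voxels.

(2) Map the voxel features back onto every point, i.e. back to the dynamic-voxel representation: `self.map_voxel_center_to_point`

This yields:

points_mean: per-point features [34096, 4]

Every point inside a voxel receives that voxel's feature, so all points in the same voxel share the same feature.

(3) Compute the per-point offset feature `f_cluster`

f_cluster is the x, y, z offset of each point from the mean of its voxel, features[:, :3] - points_mean[:, :3], with size [34096, 3].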
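The three sub-steps can be sketched in NumPy as follows. This is an illustration under assumptions, not the mmdetection3d code: `scatter_mean` is a hypothetical stand-in for `self.cluster_scatter`, and the mapping back to points simply reuses the inverse index instead of a lookup like `self.map_voxel_center_to_point`.

```python
import numpy as np

def scatter_mean(features, coors):
    """Hypothetical stand-in for cluster_scatter: mean feature per voxel."""
    voxel_coors, inverse = np.unique(coors, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                 # point -> voxel row index
    sums = np.zeros((len(voxel_coors), features.shape[1]))
    np.add.at(sums, inverse, features)            # accumulate features per voxel
    counts = np.bincount(inverse, minlength=len(voxel_coors))
    return sums / counts[:, None], voxel_coors, inverse

# Three points (x, y, z, reflectance); the first two share a voxel.
features = np.array([[0.0, 0.0, 0.0, 0.1],
                     [2.0, 2.0, 0.0, 0.3],
                     [5.0, 5.0, 1.0, 0.9]])
coors = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 1, 1]])

voxel_mean, mean_coors, inverse = scatter_mean(features, coors)   # step (1)
points_mean = voxel_mean[inverse]                                 # step (2)
f_cluster = features[:, :3] - points_mean[:, :3]                  # step (3)
print(f_cluster)
```

Points that are alone in their voxel get a zero offset, as the third point does here.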

2. Summary of the initial features

# mmdet3d/models/voxel_encoders/voxel_encoder.py
# function DynamicVFE.forward()
features = torch.cat(features_ls, dim=-1)

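Further features are extracted in the same way as in the example above, which finally gives the initial features:

features: [34096, 10]

That is, 10 feature dimensions are extracted for each of the 34096 points of the 2 samples in the table above.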
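One decomposition consistent with the 10 dimensions is 4 raw features + 3 cluster offsets + 3 offsets from the voxel's geometric centre. The text does not spell this out, so treat the voxel-centre feature and the helper `voxel_center_offset` below as assumptions; the voxel size and range are made-up values.

```python
import numpy as np

def voxel_center_offset(features, coors, voxel_size, pc_range):
    """Assumed feature: offset of each point from its voxel's geometric centre."""
    vx, vy, vz = voxel_size
    centre = np.stack([
        coors[:, 3] * vx + vx / 2 + pc_range[0],   # x index is column 3 of (b, z, y, x)
        coors[:, 2] * vy + vy / 2 + pc_range[1],   # y index is column 2
        coors[:, 1] * vz + vz / 2 + pc_range[2],   # z index is column 1
    ], axis=1)
    return features[:, :3] - centre

features = np.array([[0.1, 0.1, 0.1, 0.5],
                     [0.2, 0.3, 0.1, 0.7]])
coors = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 0]])                   # both points share one voxel

f_cluster = features[:, :3] - features[:, :3].mean(axis=0)
f_center = voxel_center_offset(features, coors,
                               voxel_size=(0.5, 0.5, 0.5),
                               pc_range=(0.0, 0.0, 0.0))

features_ls = [features, f_cluster, f_center]      # 4 + 3 + 3 columns
initial_features = np.concatenate(features_ls, axis=-1)
print(initial_features.shape)
```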

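Step3: Feature fusion

1. Overview

Starting from the dynamic-voxel features features (in per-point form) obtained above, the loop below produces the final fused voxel features. The code does three things:

Continue extracting dynamic-voxel features (per-point form)
Fuse the dynamic-voxel features (per-point form) with the image features
Convert the fused dynamic-voxel features (per-point form) into voxel features

The flowchart below describes what this loop does. fusion_layer performs the feature fusion and is described in detail later.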

# mmdet3d/models/voxel_encoders/voxel_encoder.py
# function DynamicVFE.forward()
        for i, vfe in enumerate(self.vfe_layers):
            point_feats = vfe(features)
            if (i == len(self.vfe_layers) - 1 and self.fusion_layer is not None
                    and img_feats is not None):
                point_feats = self.fusion_layer(img_feats, points, point_feats,
                                                img_metas)
            voxel_feats, voxel_coors = self.vfe_scatter(point_feats, coors)
            if i != len(self.vfe_layers) - 1:
                # need to concat voxel feats if it is not the last vfe
                feat_per_point = self.map_voxel_center_to_point(
                    coors, voxel_feats, voxel_coors)
                features = torch.cat([point_feats, feat_per_point], dim=1)

[Figure: flowchart of the loop in DynamicVFE.forward]
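The data flow of the loop can be mimicked with toy NumPy layers. Everything here is an assumption made for illustration: the layer widths (10 -> 64, 128 -> 64), the linear + ReLU stand-in for a VFE layer, max pooling inside each voxel, and the identity `toy_fusion` placeholder. Only the control flow mirrors the excerpt above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_vfe(in_ch, out_ch):
    """Toy stand-in for a VFE layer: linear map followed by ReLU."""
    w = rng.standard_normal((in_ch, out_ch)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

def scatter_max(point_feats, inverse, num_voxels):
    """Element-wise max of the point features inside each voxel (assumed pooling)."""
    out = np.full((num_voxels, point_feats.shape[1]), -np.inf)
    np.maximum.at(out, inverse, point_feats)
    return out

def toy_fusion(point_feats):
    """Placeholder for fusion_layer; a real one would mix in image features."""
    return point_feats

num_points, num_voxels = 6, 3
inverse = np.array([0, 0, 1, 1, 2, 2])            # point -> voxel index
features = rng.standard_normal((num_points, 10))
vfe_layers = [make_vfe(10, 64), make_vfe(128, 64)]

for i, vfe in enumerate(vfe_layers):
    point_feats = vfe(features)                                   # [N, 64]
    if i == len(vfe_layers) - 1:
        point_feats = toy_fusion(point_feats)                     # fuse on the last layer only
    voxel_feats = scatter_max(point_feats, inverse, num_voxels)   # [V, 64]
    if i != len(vfe_layers) - 1:
        feat_per_point = voxel_feats[inverse]                     # voxel feature back to its points
        features = np.concatenate([point_feats, feat_per_point], axis=1)  # [N, 128]

print(voxel_feats.shape)
```

Note that fusion runs on per-point features before the final scatter, so the image information is pooled into voxels together with the point features.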

2. `fusion_layer`

(1) Input arguments

In the config file, fusion_layer is PointFusion.

# mmdet3d/models/fusion_layers/point_fusion.py
# Class PointFusion
def forward(self, img_feats, pts, pts_feats, img_metas):

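| Argument | img_feats | pts | pts_feats | img_metas |
| --- | --- | --- | --- | --- |
| Description | image features | raw point cloud | point features | image meta information |
| Size | size = 5 | points[0]: [16921, 4]; points[1]: [17175, 4] | [34096, 64] | img_metas[0]: dict; img_metas[1]: dict |
| Notes | [2, 256, 48, 160]; [2, 256, 24, 80]; [2, 256, 12, 40]; [2, 256, 6, 20]; [2, 256, 3, 10] | one tensor per sample | 34096 is the total number of points over the 2 samples in the batch | one dict per sample |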

(2) Converting the image features: `obtain_mlvl_feats`

# mmdet3d/models/fusion_layers/point_fusion.py
# function PointFusion.forward()
        img_pts = self.obtain_mlvl_feats(img_feats, pts, img_metas)

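Main idea

pts keeps all raw points. Every raw point is projected into the image, and the image feature at the projected location is used as the feature of that point. If the projected location falls outside the valid image area, the feature of that point is set to zero.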
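A simplified NumPy sketch of this idea follows. It assumes a toy 3 x 4 pinhole projection matrix and a nearest-pixel lookup, and `project_and_sample` is a hypothetical helper; the actual code also has to account for image preprocessing and uses interpolation.

```python
import numpy as np

def project_and_sample(points_xyz, proj, feat_map):
    """Nearest-pixel image feature per point; zeros when the point misses the image."""
    c, h, w = feat_map.shape
    homo = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    uvw = homo @ proj.T                        # [N, 3] homogeneous pixel coordinates
    depth = uvw[:, 2]
    safe = np.where(depth > 0, depth, 1.0)     # avoid dividing by non-positive depth
    u = np.round(uvw[:, 0] / safe).astype(int)
    v = np.round(uvw[:, 1] / safe).astype(int)
    valid = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros((len(points_xyz), c))       # invalid points keep zero features
    out[valid] = feat_map[:, v[valid], u[valid]].T
    return out, valid

proj = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])                   # toy camera, assumed
feat_map = np.arange(24, dtype=float).reshape(2, 3, 4)    # C=2, H=3, W=4
points_xyz = np.array([[2.0, 1.0, 1.0],     # lands on pixel (u=2, v=1)
                       [10.0, 0.0, 1.0],    # projects outside the image
                       [0.0, 0.0, -1.0]])   # behind the camera
img_pts, valid = project_and_sample(points_xyz, proj, feat_map)
print(img_pts)
```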

Step1: Take one sample from the batch and one image-feature level

# mmdet3d/models/fusion_layers/point_fusion.py
# function obtain_mlvl_feats()
self.sample_single(img_ins[level][i:i + 1], pts[i][:, :3], img_metas[i])

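level is the index of the image-feature level and i is the batch_id.

img_ins[level][i:i + 1]: image features of 1 sample at 1 level, e.g. size [1, 128, 48, 160]
pts[i][:, :3]: x, y, z coordinates of the raw points of 1 sample, e.g. size [16921, 3]
img_metas[i]: image meta information of 1 sample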
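The surrounding double loop over samples and levels might look like the sketch below. The function names, the stub sampler, and the way results are combined (concatenating levels along the channel axis, then samples along the point axis) are assumptions for illustration, not a copy of `obtain_mlvl_feats`.

```python
import numpy as np

def sample_single_stub(img_feat, pts_xyz, img_meta):
    """Stand-in sampler: one C-dim vector per point (here just the map's mean)."""
    c = img_feat.shape[1]
    return np.broadcast_to(img_feat.mean(axis=(0, 2, 3)), (len(pts_xyz), c)).copy()

def obtain_mlvl_feats_sketch(img_ins, pts, img_metas):
    per_sample = []
    for i in range(len(img_metas)):                  # loop over samples in the batch
        per_level = [
            sample_single_stub(img_ins[level][i:i + 1], pts[i][:, :3], img_metas[i])
            for level in range(len(img_ins))         # loop over feature levels
        ]
        per_sample.append(np.concatenate(per_level, axis=-1))   # [N_i, C * levels]
    return np.concatenate(per_sample, axis=0)                   # [sum(N_i), C * levels]

# Five levels with 128 channels each, two samples with 7 and 5 points.
img_ins = [np.ones((2, 128, 48 // 2**l, 160 // 2**l), dtype=np.float32)
           for l in range(5)]
pts = [np.zeros((7, 4)), np.zeros((5, 4))]
img_metas = [{}, {}]
img_pts = obtain_mlvl_feats_sketch(img_ins, pts, img_metas)
print(img_pts.shape)
```

With this layout each point ends up with one feature vector that stacks all levels, in the same point order as the concatenated point features.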

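Step2: Project the points and fetch the image features: `point_sample()`

This function handles one sample and one image-feature level.
img_features has size 1 x C x H x W; points has size N x 3.

First, the point coordinates are transformed into the image coordinate system. All image preprocessing operations are taken into account as well, which finally gives the point coordinates in the image, coor_y and coor_x.
The features on the feature map are then obtained using the Crop Pooling formula, F.affine_grid, F.grid_sample, etc. [This part still needs a detailed explanation; TODO, to be written later.]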
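As a rough picture of the sampling step, the NumPy sketch below bilinearly interpolates a feature map at float pixel coordinates and returns zeros outside the map. It is an assumption-laden simplification: `bilinear_sample` is a hypothetical helper that works directly in pixel units, whereas a grid-sampling call such as F.grid_sample expects coordinates normalized to [-1, 1].

```python
import numpy as np

def bilinear_sample(feat_map, coor_x, coor_y):
    """Bilinearly sample feat_map [C, H, W] at float pixel coords; zeros outside."""
    c, h, w = feat_map.shape
    x0 = np.floor(coor_x).astype(int)
    y0 = np.floor(coor_y).astype(int)
    out = np.zeros((len(coor_x), c))
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            # corner weight: (1 - distance) along each axis
            wgt = (1 - np.abs(coor_x - xi)) * (1 - np.abs(coor_y - yi))
            inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            out[inside] += wgt[inside, None] * feat_map[:, yi[inside], xi[inside]].T
    return out

feat_map = np.arange(12, dtype=float).reshape(1, 3, 4)   # value = 4 * y + x
coor_x = np.array([1.5, 0.0, 9.0])     # last point lies outside the map
coor_y = np.array([0.5, 2.0, 1.0])
sampled = bilinear_sample(feat_map, coor_x, coor_y)
print(sampled[:, 0])
```

Corners that fall outside the map contribute nothing, which is how out-of-range points end up with zero features in this sketch.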