PointPillars code reading: network structure

Little_sky_jty

Article link: https://blog.csdn.net/weixin_40805392/article/details/101426087

Brief

Detailed reading

1. voxel_generator

Code

    voxel_generator = voxel_builder.build(model_cfg.voxel_generator)

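Purpose

The generator that turns the point cloud into voxels. Its contents, shown below, are the parameters used to partition the point cloud (pc) into voxels.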

[Figure: contents of voxel_generator]

Methods

  def generate(self, points, max_voxels=None):

Converts the input point-cloud coordinates into voxels.
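To make the voxelization step concrete, here is a toy NumPy voxelizer. It is a simplified sketch, not the library's generate(): the helper name voxelize, the first-come slot assignment and the example parameters (the voxel size and point-cloud range discussed later in this post) are assumptions. It returns zero-padded per-voxel point lists, integer voxel coordinates in (z, y, x) order and the number of valid points in each voxel.

```python
import numpy as np

def voxelize(points, voxel_size, pc_range, max_points=5, max_voxels=20000):
    """Hypothetical toy voxelizer.

    points: (N, 4) array of x, y, z, r.
    Returns voxels (V, max_points, 4) zero-padded, coors (V, 3) in z, y, x order,
    and num_points (V,) = number of valid points per voxel.
    """
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    pc_range = np.asarray(pc_range, dtype=np.float64)
    grid = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(np.int64)
    voxels = np.zeros((max_voxels, max_points, points.shape[1]), dtype=points.dtype)
    coors = np.zeros((max_voxels, 3), dtype=np.int64)
    num_points = np.zeros(max_voxels, dtype=np.int64)
    index_of = {}  # (z, y, x) -> voxel slot
    for p in points:
        c = np.floor((p[:3] - pc_range[:3]) / voxel_size).astype(np.int64)
        if np.any(c < 0) or np.any(c >= grid):
            continue  # point lies outside pc_range
        key = (int(c[2]), int(c[1]), int(c[0]))  # store as z, y, x
        slot = index_of.get(key)
        if slot is None:
            if len(index_of) >= max_voxels:
                continue  # voxel budget exhausted, drop the point
            slot = len(index_of)
            index_of[key] = slot
            coors[slot] = key
        if num_points[slot] < max_points:  # extra points in a full voxel are dropped
            voxels[slot, num_points[slot]] = p
            num_points[slot] += 1
    v = len(index_of)
    return voxels[:v], coors[:v], num_points[:v]
```

With max_points=5 the output matches the [voxels, 5, 4] input shape described in the net section below.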

2. box_coder

Code

    box_coder = box_coder_builder.build(model_cfg.box_coder)

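Purpose

Settings for box encoding: it encodes ground-truth boxes as residuals relative to anchors, and decodes predicted residuals back into boxes.

Methods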

    def _encode(self, boxes, anchors):
        return box_np_ops.second_box_encode(boxes, anchors, self.vec_encode, self.linear_dim)

    def _decode(self, encodings, anchors):
        return box_np_ops.second_box_decode(encodings, anchors, self.vec_encode, self.linear_dim)

_encode:
Parameters of second_box_encode(boxes, anchors, ...):
boxes ([N, 7 + ?] Tensor): normal boxes: $x, y, z, w, l, h, r$ followed by custom values; these are the ground truth
anchors ([N, 7] Tensor): anchors
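A minimal NumPy sketch of the kind of residual encoding second_box_encode performs, ignoring the optional custom values and the vec_encode / linear_dim options. It follows the common SECOND-style scheme and is an assumption, not a copy of box_np_ops: the x, y offsets are normalized by the anchor's BEV diagonal, z by the anchor height, sizes use log ratios and the angle is a plain difference. box_encode and box_decode are hypothetical names.

```python
import numpy as np

def box_encode(boxes, anchors):
    """Encode (N, 7) boxes x, y, z, w, l, h, r as residuals w.r.t. (N, 7) anchors."""
    xa, ya, za, wa, la, ha, ra = np.split(anchors, 7, axis=1)
    xg, yg, zg, wg, lg, hg, rg = np.split(boxes, 7, axis=1)
    diagonal = np.sqrt(la ** 2 + wa ** 2)  # anchor diagonal in the BEV plane
    return np.concatenate([
        (xg - xa) / diagonal,
        (yg - ya) / diagonal,
        (zg - za) / ha,
        np.log(wg / wa),
        np.log(lg / la),
        np.log(hg / ha),
        rg - ra,
    ], axis=1)

def box_decode(encodings, anchors):
    """Inverse of box_encode: residuals back to boxes."""
    xa, ya, za, wa, la, ha, ra = np.split(anchors, 7, axis=1)
    xt, yt, zt, wt, lt, ht, rt = np.split(encodings, 7, axis=1)
    diagonal = np.sqrt(la ** 2 + wa ** 2)
    return np.concatenate([
        xt * diagonal + xa,
        yt * diagonal + ya,
        zt * ha + za,
        np.exp(wt) * wa,
        np.exp(lt) * la,
        np.exp(ht) * ha,
        rt + ra,
    ], axis=1)
```

Normalizing by the anchor size keeps the regression targets on a similar scale for small (Pedestrian) and large (Van) anchors.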

3. target_assigner_builder

Code 1

    target_assigner = target_assigner_builder.build(target_assigner_cfg, bv_range, box_coder)

This step makes box_coder a component of target_assigner.

Code 2

    for class_setting in classes_cfg:
        anchor_generator = anchor_generator_builder.build(class_setting)
        if anchor_generator is not None:
            anchor_generators.append(anchor_generator)
        else:
            assert target_assigner_config.assign_per_class is False
        classes.append(class_setting.class_name)
        feature_map_sizes.append(class_setting.feature_map_size)

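This part corresponds to the per-class anchor size settings in the multi-class config file, shown below. There are only four classes, i.e., only four object categories are detected, and this part generates anchors of a different size for each class.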

    class_settings: {
      anchor_generator_range: {
        sizes: [0.6, 0.8, 1.73] # wlh
        # anchor_ranges: [0, -40.0, -1.00, 70.4, 40.0, -1.00] # carefully set z center
        anchor_ranges: [0, -32.0, -0.6, 52.8, 32.0, -0.6] # carefully set z center
        rotations: [0, 1.57] # DON'T modify this unless you are very familiar with my code.
      }
      class_name: "Pedestrian"
      ...
    }
    class_settings: {
      anchor_generator_range: {
        sizes: [1.87103749, 5.02808195, 2.20964255] # wlh
        # anchor_ranges: [0, -40.0, -1.00, 70.4, 40.0, -1.00] # carefully set z center
        anchor_ranges: [0, -32.0, -1.41, 52.8, 32.0, -1.41] # carefully set z center
        rotations: [0, 1.57] # DON'T modify this unless you are very familiar with my code.
      }
      class_name: "Van"
      ...
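A sketch of how one anchor_generator_range entry can be turned into anchors. make_range_anchors is a hypothetical helper; it assumes anchor centers are evenly spaced over anchor_ranges including both ends, on a 160 × 132 (H × W) feature map (the size that appears later in this post). One size and two rotations per class give 2 anchors per location per class, so four classes give the 8 anchors per location seen in the RPN section.

```python
import numpy as np

def make_range_anchors(feature_size, anchor_range, size, rotations):
    """Hypothetical helper: anchors of one size and several rotations on a BEV grid.

    feature_size: (H, W) of the feature map the anchors sit on (H along y, W along x).
    anchor_range: [x_min, y_min, z, x_max, y_max, z] as in the config above.
    size: (w, l, h) of the class.
    Returns an array of shape (H, W, len(rotations), 7) holding x, y, z, w, l, h, r.
    """
    H, W = feature_size
    x_min, y_min, z, x_max, y_max, _ = anchor_range
    xs = np.linspace(x_min, x_max, W, dtype=np.float32)
    ys = np.linspace(y_min, y_max, H, dtype=np.float32)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # both (H, W)
    anchors = np.zeros((H, W, len(rotations), 7), dtype=np.float32)
    anchors[..., 0] = xx[..., None]  # same center for every rotation
    anchors[..., 1] = yy[..., None]
    anchors[..., 2] = z
    anchors[..., 3:6] = size
    anchors[..., 6] = np.asarray(rotations, dtype=np.float32)
    return anchors
```

Concatenating the per-class results along axis 2 stacks the classes' anchors at every location.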

Code 3

    for class_setting in classes_cfg:
        similarity_calcs.append(similarity_calculator_builder.build(
            class_setting.region_similarity_calculator))

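This corresponds to the following part of the config, which at first looks as if it contains nothing. In the KITTI multi-class config it is indeed empty: nearest_iou_similarity takes no parameters, so the empty message only selects which similarity calculator to use.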

    region_similarity_calculator: {
      nearest_iou_similarity: {
      }
    }
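For reference, a sketch of what a "nearest" IoU can compute, under the assumption that each rotated BEV box (x, y, w, l, r) is snapped to the closest axis-aligned box (w and l swapped when the yaw is closer to 90° than to 0°) and then compared with ordinary 2D IoU. The helper names are hypothetical and this is not the library's code.

```python
import numpy as np

def nearest_bev_boxes(rboxes):
    """(N, 5) boxes x, y, w, l, r -> (N, 4) axis-aligned x1, y1, x2, y2."""
    r = rboxes[:, 4]
    # fold the yaw into [-pi/2, pi/2), then take |r| in [0, pi/2]
    r_fold = np.abs(r - np.floor(r / np.pi + 0.5) * np.pi)
    swap = r_fold > np.pi / 4  # closer to 90 deg: swap w and l
    w = np.where(swap, rboxes[:, 3], rboxes[:, 2])
    l = np.where(swap, rboxes[:, 2], rboxes[:, 3])
    x, y = rboxes[:, 0], rboxes[:, 1]
    return np.stack([x - w / 2, y - l / 2, x + w / 2, y + l / 2], axis=1)

def iou_2d(a, b):
    """Pairwise IoU of (N, 4) and (M, 4) axis-aligned boxes -> (N, M)."""
    ix1 = np.maximum(a[:, None, 0], b[None, :, 0])
    iy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    ix2 = np.minimum(a[:, None, 2], b[None, :, 2])
    iy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def nearest_iou(anchors_bev, boxes_bev):
    return iou_2d(nearest_bev_boxes(anchors_bev), nearest_bev_boxes(boxes_bev))
```

Snapping to axis-aligned boxes trades accuracy for speed, which matters when every anchor is matched against every ground-truth box.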

4. net

Code 1

    net = second_builder.build(
        model_cfg, voxel_generator, target_assigner, measure_time=measure_time)

Code 2: feature-extraction feedforward layer

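    # per-voxel mean: padded point slots are zero, and num_voxels holds the number of valid points in each voxel
    points_mean = features[:, :, :self.num_input_features].sum(dim=1, keepdim=False) / num_voxels.type_as(features).view(-1, 1)
    # BEV radial distance sqrt(x_mean^2 + y_mean^2), shape [voxels, 1]
    feature = torch.norm(points_mean[:, :2], p=2, dim=1, keepdim=True)
    # [radius, z_mean, r_mean]: the separate x, y means are dropped
    res = torch.cat([feature, points_mean[:, 2:self.num_input_features]], dim=1)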

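The input features have shape [voxels, 5, 4], where voxels is the number of non-empty voxels (at most the maximum voxel count), 5 is the maximum number of points per voxel, and 4 is the per-point dimension (x, y, z, r).
(1) The first feature-extraction layer averages all points in a voxel, giving points_mean of shape [voxels, 4]. It then takes the L2 norm of the x and y components of points_mean, feature of shape [voxels, 1], and concatenates feature with the z and r columns of points_mean to obtain the first-layer feature [voxels, 3], whose last dimension is $[\sqrt{\overline{x}^2+\overline{y}^2},\overline{z},\overline{r}]$. The author's comment in the code says "z is important for z position regression, but x, y is not", so a non-empty voxel carries no separate x, y feature; its position is still known from its voxel coordinates.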
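A NumPy re-statement of Code 2 (the original is PyTorch), to show that dividing the zero-padded sum by the valid point count gives the per-voxel mean. The function name is hypothetical, and num_points plays the role of num_voxels.

```python
import numpy as np

def simple_voxel_radius(features, num_points, num_input_features=4):
    """features: (V, T, C) zero-padded points; num_points: (V,) valid count per voxel.

    Returns (V, 3): [sqrt(x_mean^2 + y_mean^2), z_mean, r_mean].
    """
    counts = num_points.reshape(-1, 1).astype(features.dtype)
    points_mean = features[:, :, :num_input_features].sum(axis=1) / counts
    radius = np.linalg.norm(points_mean[:, :2], ord=2, axis=1, keepdims=True)
    return np.concatenate([radius, points_mean[:, 2:num_input_features]], axis=1)
```

For a voxel holding (3, 4, 1, 0.5) and (3, 4, 3, 0.5) the mean is (3, 4, 2, 0.5), so the feature is [5, 2, 0.5].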


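Code 3: middle feature-extraction layer

This version of VoxelNet makes much heavier use of sparse convolution: it drops the VFE structure from the first feature-extraction layer altogether and uses sparse-convolution blocks in the middle layers instead. See the first article in this series for the detailed structure.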

        spatial_features = self.middle_feature_extractor(voxel_features, coors, batch_size)

Before the middle conv layers:

ret = spconv.SparseConvTensor(voxel_features, coors, self.sparse_shape, batch_size)

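(1) The input features $[\text{voxels}, 3]$ are placed into the grid through the index tensor indices of shape [voxels, 4], whose last dimension is $[b, z, y, x]$, i.e., each voxel's coordinate coor; the coordinates were voxelized at the very beginning.
(2) spatial_shape is $[41, 1280, 1056]$, the total size of the grid, covering every voxel whether or not it contains points: $41\times 1280 \times 1056 = 55418880$.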
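The sparse tensor is just (features, indices) plus the grid shape. This plain NumPy sketch (not spconv's implementation; the function name is hypothetical) scatters such a pair into a dense (B, C, D, H, W) array, the layout that ret.dense() produces. Doing this on the full 41 × 1280 × 1056 input grid would mean about 55.4 million cells per channel per sample, which is why the middle layers stay sparse until the grid has been downsampled.

```python
import numpy as np

def to_dense(features, indices, spatial_shape, batch_size):
    """Scatter sparse voxel features into a dense (B, C, D, H, W) array.

    features: (V, C); indices: (V, 4) int rows of (b, z, y, x).
    """
    D, H, W = spatial_shape
    C = features.shape[1]
    dense = np.zeros((batch_size, D, H, W, C), dtype=features.dtype)
    b, z, y, x = indices.T
    dense[b, z, y, x] = features  # channel-last scatter, one row per active voxel
    return dense.transpose(0, 4, 1, 2, 3)  # -> (B, C, D, H, W)
```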

Middle layer

        ret = self.middle_conv(ret)

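Because this layer relies on the author's own external library, stepping through it in a debugger does not show how it runs. The final output ret is: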

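[Figure: output ret of middle_conv in the debugger]
Looking further at indice_dict in the debugger shows the layers below, and the shape changes at each of them: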

[Figure: layers listed in indice_dict]

Expanding each subm* entry gives the output of the corresponding layer; this one is subm0.

In summary:

In "subm0": $[54365,4]\rightarrow[54365,4]\rightarrow[27,2,54365]\rightarrow[27]$
In "None": $[54365,4]\rightarrow[20702,4]\rightarrow[24726,4]\rightarrow[3,2,24726]\rightarrow[3,1]$
In "subm1": $[87753,4]\rightarrow[87753,4]\rightarrow[27,2,87753]\rightarrow[27]$
In "subm2": $[55332,4]\rightarrow[55332,4]\rightarrow[27,2,55332]\rightarrow[27]$
In "subm3": $[24726,4]\rightarrow[24726,4]\rightarrow[27,2,24726]\rightarrow[27]$
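The [27, 2, N] and [27] entries listed above match the shape of a convolution "rulebook". The toy, pure-Python builder below illustrates that reading for a 3 × 3 × 3 submanifold convolution: for each of the 27 kernel offsets it records (input row, output row) pairs between active voxels, padded with -1, plus the number of valid pairs per offset. The function name, the -1 padding and the pair order are assumptions, not spconv internals.

```python
import itertools
import numpy as np

def subm_rulebook(indices, kernel=3):
    """Toy rulebook for a submanifold 3D convolution (illustration only).

    indices: (N, 4) int array of active sites, rows (b, z, y, x).
    Returns pairs (K, 2, N), padded with -1, and counts (K,), with K = kernel ** 3.
    pairs[k, 0, j] is an input row and pairs[k, 1, j] the output row it contributes to.
    """
    rows = [tuple(r) for r in indices.tolist()]
    lookup = {row: i for i, row in enumerate(rows)}
    half = kernel // 2
    offsets = list(itertools.product(range(-half, half + 1), repeat=3))
    n = len(rows)
    pairs = np.full((len(offsets), 2, n), -1, dtype=np.int64)
    counts = np.zeros(len(offsets), dtype=np.int64)
    for k, (dz, dy, dx) in enumerate(offsets):
        for out_i, (b, z, y, x) in enumerate(rows):
            # submanifold: outputs exist only at already-active sites
            in_i = lookup.get((b, z + dz, y + dy, x + dx))
            if in_i is not None:
                pairs[k, 0, counts[k]] = in_i
                pairs[k, 1, counts[k]] = out_i
                counts[k] += 1
    return pairs, counts
```

Each output receives at most one input per offset, so N columns are always enough; the center offset (index 13) always pairs every voxel with itself.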

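At first glance this is puzzling; my guess is that it is how sparse convolution organizes its own bookkeeping. The [27, 2, N] and [27] entries look like per-layer index pairs for each of the 3 × 3 × 3 = 27 kernel offsets together with the number of valid pairs. In addition, each subm* entry records the global spatial shape, which changes as follows: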

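$[41,1280,1056]\rightarrow[21,640,528]\rightarrow[11,320,264]\rightarrow[5,160,132]\rightarrow[2,160,132]$, which matches the data flow exactly. These shapes exclude the batch and feature dimensions; with them added, the final output in (b, z, y, x, C) order should be $[3, 2, 160, 132, 64]$. What puzzled me is why the input size is $[41, 1280, 1056]$. It comes from the following code: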

        sparse_shape = np.array(output_shape[1:4]) + [1, 0, 0]

Here output_shape is actually the input of the middle layer (I don't know why the author uses the output rather than the input shape). It is computed as follows:

  dense_shape = [1] + grid_size[::-1].tolist() + [vfe_num_filters[-1]]

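where grid_size is $[1056, 1280, 40]$, i.e., the input size of the middle layer. grid_size is computed as:

(1) Compute the point-cloud extent: [0, -32, -3, 52.8, 32, 1.0] → [52.8, 64, 4].
(2) Divide by the voxel size and round to get the number of voxels per axis: [52.8, 64, 4] / [0.05, 0.05, 0.1] = [1056, 1280, 40].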
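The two steps above, plus the sparse_shape line, as a short NumPy check. Rounding (rather than truncating) is an assumption here, used to guard against floating-point error in the division.

```python
import numpy as np

pc_range = np.array([0, -32, -3, 52.8, 32, 1.0])  # x_min, y_min, z_min, x_max, y_max, z_max
voxel_size = np.array([0.05, 0.05, 0.1])           # x, y, z

extent = pc_range[3:] - pc_range[:3]                               # [52.8, 64, 4]
grid_size = np.round(extent / voxel_size).astype(np.int64)         # (x, y, z)
sparse_shape = grid_size[::-1] + np.array([1, 0, 0])               # (z, y, x), one extra z cell
```

grid_size comes out as [1056, 1280, 40] and sparse_shape as [41, 1280, 1056], with 55,418,880 cells in total.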

So this matches the input size exactly. The middle-layer output size [2, 160, 132] was presumably chosen because it worked well in testing.
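The shape chain can be reproduced with the standard convolution output-size formula. The per-axis (kernel, stride, padding) values below are assumptions, chosen because they reproduce the shapes seen in the debugger; the actual layer definitions are not shown in this post. Running the same layers on a z size of 40 ends at z = 1 instead of 2, which is one plausible reason for the + [1, 0, 0]: with z = 2 the BEV map gets C × D = 64 × 2 = 128 channels.

```python
def conv_out(size, kernel, stride, pad):
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed downsampling layers: (kernel, stride, padding), each given per axis (z, y, x)
LAYERS = [
    ((3, 3, 3), (2, 2, 2), (1, 1, 1)),
    ((3, 3, 3), (2, 2, 2), (1, 1, 1)),
    ((3, 3, 3), (2, 2, 2), (0, 1, 1)),
    ((3, 1, 1), (2, 1, 1), (0, 0, 0)),
]

def trace(shape):
    """Spatial shape after each assumed downsampling layer (submanifold layers keep it)."""
    shapes = [tuple(shape)]
    for kernel, stride, pad in LAYERS:
        shape = tuple(conv_out(n, k, s, p) for n, k, s, p in zip(shape, kernel, stride, pad))
        shapes.append(shape)
    return shapes

print(trace((41, 1280, 1056)))
print(trace((40, 1280, 1056))[-1])
```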

Post-processing of the middle layer

        ret = ret.dense()
        N, C, D, H, W = ret.shape
        ret = ret.view(N, C * D, H, W)

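This step converts the output into the usual dense form $[B, C, D, H, W] = [3, 64, 2, 160, 132]$, then merges the depth and feature dimensions into a [3, 128, 160, 132] feature map.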
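A NumPy analogue of the view above (the helper name is hypothetical). On a contiguous (N, C, D, H, W) array the merged channel index is c * D + d, so the two z slices of each of the 64 feature channels end up as adjacent channels of the BEV map.

```python
import numpy as np

def collapse_depth(dense):
    """(N, C, D, H, W) -> (N, C * D, H, W), like ret.view(N, C * D, H, W) on a contiguous tensor."""
    n, c, d, h, w = dense.shape
    return dense.reshape(n, c * d, h, w)

x = np.arange(3 * 4 * 2 * 5 * 6).reshape(3, 4, 2, 5, 6)
y = collapse_depth(x)
print(y.shape)  # (3, 8, 5, 6)
```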

RPN layer

        preds_dict = self.rpn(spatial_features)

(1) The forward pass of the top-level base class RPNNoHeadBase is shown below; the input passes through several blocks (here 2):

        for i in range(len(self.blocks)):
            x = self.blocks[i](x)
            stage_outputs.append(x)
            if i - self._upsample_start_idx >= 0:
                ups.append(self.deblocks[i - self._upsample_start_idx](x))

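For i = 1 the structure is shown below; it is just ordinary 2D convolution. The shape changes are:

After the first block: $[3,128,160,132]\rightarrow[3,64,160,132]$
After the second block: $[3,64,160,132]\rightarrow[3,128,80,66]$ (the channel count doubles and the spatial dimensions are halved)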

[Figure: structure of blocks[1]]

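Upsampling is applied to the output of each block:

The output x1 of the first block is upsampled as $[3,64,160,132]\rightarrow[3,128,160,132]$, i.e., it stays at the original spatial size and only the channels change.
The output x2 of the second block is upsampled as $[3,128,80,66]\rightarrow[3,128,160,132]$.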

The final output concatenates the two upsampled features along the channel dimension: $[3, 256, 160, 132]$.
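The spatial sizes above follow from the Conv2d and ConvTranspose2d output-size formulas. The layer hyperparameters here are assumptions inferred from the shapes, not read from the config: 3 × 3 convs with padding 1, stride 1 in the first block and 2 in the second, and deblocks whose kernel equals their stride (1 and 2).

```python
def conv2d_out(size, kernel, stride, pad):
    """Conv2d output length along one axis (dilation 1)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv2d_out(size, kernel, stride, pad=0):
    """ConvTranspose2d output length (dilation 1, no output_padding)."""
    return (size - 1) * stride - 2 * pad + kernel

H, W = 160, 132
h1, w1 = conv2d_out(H, 3, 1, 1), conv2d_out(W, 3, 1, 1)    # block 1: stride 1
h2, w2 = conv2d_out(h1, 3, 2, 1), conv2d_out(w1, 3, 2, 1)  # block 2: stride 2
u1 = (deconv2d_out(h1, 1, 1), deconv2d_out(w1, 1, 1))      # deblock 1: kernel = stride = 1
u2 = (deconv2d_out(h2, 2, 2), deconv2d_out(w2, 2, 2))      # deblock 2: kernel = stride = 2
channels = 128 + 128  # concatenation of the two 128-channel upsampled maps
print((h1, w1), (h2, w2), u1, u2, channels)
```

Using kernel = stride in the transposed convolution makes the upsampling exactly undo the stride-2 downsampling, so both branches line up at 160 × 132 before concatenation.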

(2) Only after this does RPN2 proper take over. Here x is the concatenation of the two upsampled outputs described above, $[3, 256, 160, 132]$:

        box_preds = self.conv_box(x)
        cls_preds = self.conv_cls(x)

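self.conv_cls(x) is just a 2D convolution, defined as follows (I replaced the parameters with numbers for readability). Here num_cls = 32 = num_anchor_per_loc (8) × num_class (4). num_anchor_per_loc is 8 because each class has one anchor size with two rotations (0 and 1.57 rad), giving 2 anchors per class and 8 for the four classes. num_class is the number of class_settings entries in the config, i.e., the number of categories to detect.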

    self.conv_cls = nn.Conv2d(256, 32, 1)

Likewise, self.conv_box has num_anchor_per_loc (8) × box_code_size (7) output channels, where num_anchor_per_loc means the same as above and box_code_size = 7 is the length of the box encoding (x, y, z, w, l, h, r):

    self.conv_box = nn.Conv2d(256, 8 * 7, 1)

These two layers predict box_preds $[3, 56, 160, 132]$ and cls_preds $[3, 32, 160, 132]$; the channel dimension now holds the predictions. Next, the 56 and 32 channels are split per anchor:

    box_preds = box_preds.view(-1, self._num_anchor_per_loc, self._box_code_size, H, W).permute(0, 1, 3, 4, 2).contiguous()
    cls_preds = cls_preds.view(-1, self._num_anchor_per_loc, self._num_class, H, W).permute(0, 1, 3, 4, 2).contiguous()

This gives box_preds $[3, 8, 160, 132, 7]$ and cls_preds $[3, 8, 160, 132, 4]$. The last dimension is the prediction for a single anchor, and the 8 comes from the 4 classes × 2 rotations per class.
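A NumPy analogue of the view + permute above (split_head is a hypothetical name), useful for checking which channel ends up where: channel a * K + k of the head output becomes element k of anchor a.

```python
import numpy as np

def split_head(preds, num_anchor_per_loc, code_size):
    """(B, A * K, H, W) -> (B, A, H, W, K), mirroring view(...).permute(0, 1, 3, 4, 2)."""
    b, _, h, w = preds.shape
    return preds.reshape(b, num_anchor_per_loc, code_size, h, w).transpose(0, 1, 3, 4, 2)

box = np.zeros((3, 56, 160, 132), dtype=np.float32)
print(split_head(box, 8, 7).shape)                     # (3, 8, 160, 132, 7)
print(split_head(box, 8, 7).reshape(3, -1, 7).shape)   # (3, 168960, 7)
```

Flattened, each sample carries 8 × 160 × 132 = 168,960 box predictions, one per anchor.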

(3) In addition, if direction prediction is enabled, there is a direction branch whose output dir_cls_preds has shape $[3, 8, 160, 132, 2]$:

    dir_cls_preds = self.conv_dir_cls(x)
    dir_cls_preds = dir_cls_preds.view(-1, self._num_anchor_per_loc, self._num_direction_bins, H, W).permute(0, 1, 3, 4, 2).contiguous()
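In SECOND-style training the angle regression loss is based on sin(r_pred − r_gt), which is identical for r and r + π, so a separate classifier decides the heading. Below is a sketch of how a 2-bin direction target can be derived from a ground-truth yaw, assuming the bins split [dir_offset, dir_offset + 2π) evenly; the function name and the dir_offset default are assumptions.

```python
import numpy as np

def direction_target(rot_gt, num_bins=2, dir_offset=0.0):
    """Direction bin index for each ground-truth yaw (radians)."""
    offset_rot = np.mod(rot_gt - dir_offset, 2 * np.pi)  # wrap into [0, 2*pi)
    bins = np.floor(offset_rot / (2 * np.pi / num_bins)).astype(np.int64)
    return np.clip(bins, 0, num_bins - 1)  # guard the 2*pi edge case

print(direction_target(np.array([0.3, np.pi + 0.3, -0.3])))  # [0 1 1]
```

With two bins, yaws that differ by π always fall into different bins, which is exactly the ambiguity the sine-based loss cannot resolve.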

This concludes the forward pass of the network; the loss will be covered in a later post.