3D-Detection Paper Series 1: PointPillars -- Data Flow

Data Flow

This post covers how the data flows through the network once the example has been built.
First, the example looks like this:


| Name | Shape | Meaning |
| --- | --- | --- |
| voxels | [9918, 100, 4] | 9918: number of pillars; 100: max points per pillar; 4: x, y, z, intensity |
| num_points | [9918] | 9918 ints; the real number of points in each pillar |
| coordinates | [9918, 4] | per-pillar coordinates (to be filled in) |
| rect | [2, 4, 4] | 2: batch_size; a 4×4 rectification matrix per sample |
| Trv2c | [2, 4, 4] | same layout as above (velodyne-to-camera transform) |
| P2 | [2, 4, 4] | the P2 camera intrinsics |
| anchors | [2, 107136, 7] | 2: batch_size; 107136: number of anchors; 7: x, y, z, length, width, height, yaw |
| anchor_mask | [2, 107136] | 2 is batch_size, the other is the number of anchors |
| image_idx | (2, 1) | image index |
| image_shape | (2, 2) | 375 × 1242 |
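A quick sanity check on the anchor count, using the RPN shape table from section II of this post: the final feature map is 248 × 216, and with 2 anchor orientations per cell, 248 × 216 × 2 = 107136, exactly the number of anchors above.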

For details on how the example is generated, see the linked post on the example generation process.

I. The PointPillars network structure:

The network structure is as follows:

VoxelNet(
  (voxel_feature_extractor): PillarFeatureNet(
    (pfn_layers): ModuleList(
      (0): PFNLayer(
        (linear): DefaultArgLayer(in_features=9, out_features=64, bias=False)
        (norm): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
  )
  (middle_feature_extractor): PointPillarsScatter()
  (rpn): RPN(
    (block1): Sequential(
      (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
      (1): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(2, 2), bias=False)
      (2): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (3): ReLU()
      (4): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (5): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (8): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (9): ReLU()
      (10): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (11): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (12): ReLU()
    )
    (deconv1): Sequential(
      (0): DefaultArgLayer(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (block2): Sequential(
      (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
      (1): DefaultArgLayer(64, 128, kernel_size=(3, 3), stride=(2, 2), bias=False)
      (2): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (3): ReLU()
      (4): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (5): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (8): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (9): ReLU()
      (10): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (11): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (12): ReLU()
      (13): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (14): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (15): ReLU()
      (16): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (17): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (18): ReLU()
    )
    (deconv2): Sequential(
      (0): DefaultArgLayer(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
      (1): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (block3): Sequential(
      (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
      (1): DefaultArgLayer(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
      (2): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (3): ReLU()
      (4): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (5): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (8): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (9): ReLU()
      (10): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (11): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (12): ReLU()
      (13): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (14): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (15): ReLU()
      (16): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (17): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (18): ReLU()
    )
    (deconv3): Sequential(
      (0): DefaultArgLayer(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
      (1): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (conv_cls): Conv2d(384, 2, kernel_size=(1, 1), stride=(1, 1))
    (conv_box): Conv2d(384, 14, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(384, 4, kernel_size=(1, 1), stride=(1, 1))
  )
  (rpn_acc): Accuracy()
  (rpn_precision): Precision()
  (rpn_recall): Recall()
  (rpn_metrics): PrecisionRecall()
  (rpn_cls_loss): Scalar()
  (rpn_loc_loss): Scalar()
  (rpn_total_loss): Scalar()
)

The structure is actually quite simple; it consists of three main parts:

  1. The pillar feature extraction layer, i.e. the PFN.
  2. The middle feature extractor: PointPillarsScatter()
  3. The RPN (relatively more complex)

This post won't restate the structure itself; the focus is on how the data flows!

II. Network breakdown: data flow

First, we convert the example we obtained into tensors, then enter the predict_kitti_to_anno(..) function.
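The conversion step is roughly the following (a minimal sketch; the SECOND codebase has its own helper for this, and the exact dtype handling there may differ):

import numpy as np
import torch

def example_to_torch(example, device="cuda", float_dtype=torch.float32):
    """Minimal sketch: move every numpy array in the example dict to the GPU.

    Float arrays (voxels, rect, Trv2c, P2, anchors, ...) become float tensors;
    integer arrays (coordinates, num_points, ...) become int tensors.
    """
    example_torch = {}
    for k, v in example.items():
        if not isinstance(v, np.ndarray):
            example_torch[k] = v  # non-array metadata stays as-is
        elif np.issubdtype(v.dtype, np.floating):
            example_torch[k] = torch.tensor(v, dtype=float_dtype, device=device)
        else:
            example_torch[k] = torch.tensor(v, dtype=torch.int32, device=device)
    return example_torch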

1. The PFN layer:
 
Entering the network's forward function, the body looks roughly like this:

        # 1. PFN: encode each pillar's points into a 64-dim pillar feature
        voxel_features = self.voxel_feature_extractor(voxels, num_points, coors)
        if self._use_sparse_rpn:
            preds_dict = self.sparse_rpn(voxel_features, coors, batch_size_dev)
        else:
            # 2. Scatter: place pillar features back onto the BEV grid (pseudo-image)
            spatial_features = self.middle_feature_extractor(
                voxel_features, coors, batch_size_dev)
            # 3. RPN: predict class, box, and direction maps
            if self._use_bev:
                preds_dict = self.rpn(spatial_features, example["bev_map"])
            else:
                preds_dict = self.rpn(spatial_features)

Below is the definition of the PFN layers:

        num_filters = [num_input_features] + list(num_filters)
        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            if i < len(num_filters) - 2:
                last_layer = False
            else:
                last_layer = True
            pfn_layers.append(PFNLayer(in_filters, out_filters, use_norm, last_layer=last_layer))
        self.pfn_layers = nn.ModuleList(pfn_layers)
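For the configuration here, num_input_features = 9 and num_filters = (64,), so the list becomes [9, 64] and the loop creates exactly one PFNLayer(9, 64, last_layer=True); this matches the printout above (in_features=9, out_features=64, a single (0): PFNLayer entry).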

The PFN layer is the first call above, voxel_features = self.voxel_feature_extractor(voxels, num_points, coors); from there we step into the PFN network's forward function.

Step 1: turn each point's 4-dim features into 9 dims:

| name | shape | info |
| --- | --- | --- |
| points_mean | [9918, 1, 3] | mean x, y, z of the points in each pillar |
| f_cluster | [9918, 100, 3] | offset of each point from its pillar's mean: points[xyz] - points_mean |
| f_center | [9918, 100, 2] | offset of each point's x, y from the pillar's geometric center |
| features | [9918, 100, 9] | the final 9-dim point features fed into the network |
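A sketch of how these pieces are likely assembled from the raw [9918, 100, 4] voxels (shapes follow the table above; vx, vy and the offsets are assumed config constants for 0.16 m pillars, not values stated in this post):

import torch

def build_9dim_features(features, num_points, coors,
                        vx=0.16, vy=0.16, x_offset=0.08, y_offset=-39.6):
    """Sketch of the 4-dim -> 9-dim point augmentation.

    features:   [P, N, 4] raw points per pillar (x, y, z, intensity)
    num_points: [P]       real point count per pillar
    coors:      [P, 4]    per-pillar coords; columns 2, 3 assumed to be grid y, x
    """
    # points_mean [P, 1, 3]: mean xyz over each pillar's real points
    points_mean = (features[:, :, :3].sum(dim=1, keepdim=True)
                   / num_points.type_as(features).view(-1, 1, 1))
    # f_cluster [P, N, 3]: offset of every point from its pillar's mean
    f_cluster = features[:, :, :3] - points_mean
    # f_center [P, N, 2]: offset of every point's x, y from the pillar center
    f_center = torch.zeros_like(features[:, :, :2])
    f_center[:, :, 0] = features[:, :, 0] - (
        coors[:, 3].type_as(features).unsqueeze(1) * vx + x_offset)
    f_center[:, :, 1] = features[:, :, 1] - (
        coors[:, 2].type_as(features).unsqueeze(1) * vy + y_offset)
    # concat: 4 raw + 3 cluster + 2 center = 9 dims -> [P, N, 9]
    return torch.cat([features, f_cluster, f_center], dim=-1)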

Next, two mask operations:

        mask = get_paddings_indicator(num_voxels, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(features)

What these two lines do: each pillar is padded out to 100 point slots, but only num_voxels (the per-pillar real point count) of them hold actual points, so the mask marks the valid slots; after unsqueezing to [9918, 100, 1] it is multiplied into features to zero out the padded slots.
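A minimal sketch of what get_paddings_indicator computes, judging from its usage here (the actual helper lives in the SECOND codebase; this version only handles the axis=0 case used above):

import torch

def get_paddings_indicator(actual_num, max_num, axis=0):
    # actual_num: [P] real point count per pillar; max_num: padded slot count (100).
    # Returns a bool mask [P, max_num]: True where the slot index is below the
    # pillar's real point count, False for padding. (Simplified: axis=0 only.)
    actual_num = actual_num.unsqueeze(axis + 1)                    # [P, 1]
    slot_idx = torch.arange(max_num, dtype=torch.int,
                            device=actual_num.device).view(1, -1)  # [1, max_num]
    return actual_num.int() > slot_idx                             # [P, max_num]

With the padding zeroed out, the features then go through the PFN layers: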

        for pfn in self.pfn_layers:
            features = pfn(features)

Each PFNLayer applies linear, norm, and ReLU. The norm here is BatchNorm1d; after the linear layer the feature dimension becomes 64, then self.norm and ReLU are applied. The resulting shape is:

features[9918, 100, 64]

Then a max over the point axis is taken, x_max = torch.max(x, dim=1, keepdim=True)[0], and the result is returned:

features[9918, 1, 64]

Finally, the forward function ends with return features.squeeze(), so the final shape is:

features[9918, 64]

In other words, one pass through the PFN takes the 9-dim input features [9918, 100, 9] down to a single 64-dim feature per pillar, features [9918, 64]. That completes the PFN layer.
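Putting the pieces together, one PFN layer's forward pass is roughly the following (a sketch reconstructed from the shapes traced above; the permute around the norm is needed because BatchNorm1d normalizes over dim 1):

import torch
import torch.nn as nn

class PFNLayerSketch(nn.Module):
    # Minimal sketch of one PFN layer: Linear -> BatchNorm1d -> ReLU -> max over points.
    def __init__(self, in_channels=9, out_channels=64):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels, bias=False)
        self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)

    def forward(self, x):                             # x: [9918, 100, 9]
        x = self.linear(x)                            # [9918, 100, 64]
        x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2, 1)
        x = torch.relu(x)                             # [9918, 100, 64]
        x_max = torch.max(x, dim=1, keepdim=True)[0]  # [9918, 1, 64]
        return x_max                                  # caller squeezes to [9918, 64]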

2. The PointPillarsScatter layer

Next we enter the forward function of spatial_features = self.middle_feature_extractor(…).
Its input is the PFN layer's output, features [9918, 64].
First, the definition of PointPillarsScatter:

class PointPillarsScatter(nn.Module):
    def __init__(self,
                 output_shape,
                 num_input_features=64):
        """
        Point Pillar's Scatter.
        Converts learned features from dense tensor to sparse pseudo image. This replaces SECOND's
        second.pytorch.voxelnet.SparseMiddleExtractor.
        :param output_shape: ([int]: 4). Required output shape of features.
        :param num_input_features: <int>. Number of input features.
        """
        super().__init__()
        self.name = 'PointPillarsScatter'
        self.output_shape = output_shape
        self.ny = output_shape[2]
        self.nx = output_shape[3]
        self.nchannels = num_input_features

As the docstring says, PointPillarsScatter takes the features learned by the PFN and converts them from a dense tensor into a sparse pseudo-image.
The relevant parameters:

nchannels = 64
nx = 432
ny = 496
output_shape = [1, 1, 496, 432, 64]
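These grid dimensions follow from the voxelization config: with the common KITTI PointPillars setup of 0.16 m pillars over a 69.12 m (forward) × 79.36 m (lateral) point-cloud range, nx = 69.12 / 0.16 = 432 and ny = 79.36 / 0.16 = 496. (That config is the standard one; the post itself does not state it.)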

Note that this part involves no CNN layers at all, only indexing and scatter operations, which explains why ONNX export errors out on this part.
Its final output has shape [2, 64, 496, 432], where 2 is the batch_size; this is now the familiar image-like layout.
Many details are skipped here and will be filled in later.
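A minimal sketch of the scatter step (following the class definition above; the column layout of coords is an assumption based on the usual SECOND convention of (batch, z, y, x)):

import torch

def scatter_to_pseudo_image(voxel_features, coords, batch_size,
                            nchannels=64, ny=496, nx=432):
    # voxel_features: [P, 64] pillar features from the PFN
    # coords:         [P, 4]  assumed (batch_idx, z, y, x) per pillar
    batch_canvases = []
    for b in range(batch_size):
        # Empty canvas: one column per grid cell; empty pillars stay all-zero
        canvas = torch.zeros(nchannels, nx * ny,
                             dtype=voxel_features.dtype,
                             device=voxel_features.device)
        mask = coords[:, 0] == b                  # pillars belonging to this sample
        this_coords = coords[mask]
        indices = (this_coords[:, 2] * nx + this_coords[:, 3]).long()  # y * nx + x
        canvas[:, indices] = voxel_features[mask].t()  # scatter [64, P_b] into canvas
        batch_canvases.append(canvas)
    # Stack and reshape into the pseudo-image: [batch, 64, 496, 432]
    return torch.stack(batch_canvases, 0).view(batch_size, nchannels, ny, nx)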

3. The RPN layer

This layer is the most involved: three downsampling blocks paired with three upsampling (deconv) branches.
First, the condensed definition:

  (rpn): RPN(
    (block1): Sequential(..)
    (deconv1): Sequential(..)
    (block2): Sequential(..)
    (deconv2): Sequential(..)
    (block3): Sequential(..)
    (deconv3): Sequential(..)
    (conv_cls): Conv2d(384, 2, kernel_size=(1, 1), stride=(1, 1))
    (conv_box): Conv2d(384, 14, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(384, 4, kernel_size=(1, 1), stride=(1, 1))
  )

The input is the scatter network's output, [2, 64, 496, 432]. Each block and deconv contains quite a few ops; here is the tensor shape after each stage:

| stage | op | shape |
| --- | --- | --- |
| original | input | [2, 64, 496, 432] |
| block1 | down1 | [2, 64, 248, 216] |
| deconv1 | up1 | [2, 128, 248, 216] |
| block2 | down2 | [2, 128, 124, 108] |
| deconv2 | up2 | [2, 128, 248, 216] |
| block3 | down3 | [2, 256, 62, 54] |
| deconv3 | up3 | [2, 128, 248, 216] |
| x | torch.cat([up1, up2, up3], dim=1) | [2, 384, 248, 216] |

Results:

| head | shape |
| --- | --- |
| conv_cls | [2, 2, 248, 216] |
| conv_box | [2, 14, 248, 216] |
| conv_dir_cls | [2, 4, 248, 216] |

That essentially completes the RPN. The final result is three heads: box regression (conv_box, 14 = 2 anchors × 7 box parameters per cell), classification (conv_cls), and direction classification (conv_dir_cls).
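To make the shape table concrete, here is a sketch of the RPN forward pass in the same fragment style as the snippets above (inferred from the structure printout; the dict keys are an assumption, and the real implementation also permutes the prediction maps before returning, which is omitted here):

def forward(self, x):                      # x: [2, 64, 496, 432]
    x = self.block1(x)                     # down1: [2, 64, 248, 216]
    up1 = self.deconv1(x)                  # up1:   [2, 128, 248, 216]
    x = self.block2(x)                     # down2: [2, 128, 124, 108]
    up2 = self.deconv2(x)                  # up2:   [2, 128, 248, 216]
    x = self.block3(x)                     # down3: [2, 256, 62, 54]
    up3 = self.deconv3(x)                  # up3:   [2, 128, 248, 216]
    x = torch.cat([up1, up2, up3], dim=1)  # concat: [2, 384, 248, 216]
    box_preds = self.conv_box(x)           # [2, 14, 248, 216]
    cls_preds = self.conv_cls(x)           # [2, 2, 248, 216]
    dir_preds = self.conv_dir_cls(x)       # [2, 4, 248, 216]
    return {"box_preds": box_preds, "cls_preds": cls_preds,
            "dir_cls_preds": dir_preds}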

There are many more functions after this (post-processing and the like); I'll cover them in a separate post. At this point the network portion is essentially complete, and as you can see, it is fairly clear.
