Data Flow
This article covers how the data flows through the network once the example has been assembled.
First, the example looks like this:
Name | Shape | Meaning |
---|---|---|
voxels | [9918, 100, 4] | 9918: number of pillars; 100: max points per pillar; 4: x, y, z, intensity |
num_points | [9918] | 9918 ints, the number of real points in each pillar |
coordinates | [9918, 4] | to be filled in later |
rect | [2, 4, 4] | 2: batch_size; a 4x4 rectification matrix |
Trv2c | [2, 4, 4] | same layout as above |
P2 | [2, 4, 4] | P2 camera intrinsics |
anchors | [2, 107136, 7] | 2: batch_size; 107136: number of anchors; 7: x, y, z, length, width, height, yaw |
anchor_mask | [2, 107136] | 2 is batch_size, the other is the number of anchors |
image_idx | (2, 1) | image index |
image_shape | (2, 2) | 375 x 1242 |
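As a sanity check, here is a minimal numpy sketch (dummy data, not the project's actual loader) that builds an `example` dict with the shapes listed above:

```python
import numpy as np

num_pillars, batch_size = 9918, 2
example = {
    "voxels": np.zeros((num_pillars, 100, 4), dtype=np.float32),   # x, y, z, intensity
    "num_points": np.ones(num_pillars, dtype=np.int64),            # real points per pillar
    "coordinates": np.zeros((num_pillars, 4), dtype=np.int64),
    "rect": np.tile(np.eye(4, dtype=np.float32), (batch_size, 1, 1)),
    "Trv2c": np.tile(np.eye(4, dtype=np.float32), (batch_size, 1, 1)),
    "P2": np.tile(np.eye(4, dtype=np.float32), (batch_size, 1, 1)),
    "anchors": np.zeros((batch_size, 107136, 7), dtype=np.float32),
    "anchor_mask": np.ones((batch_size, 107136), dtype=bool),
    "image_idx": np.arange(batch_size).reshape(batch_size, 1),
    "image_shape": np.tile([375, 1242], (batch_size, 1)),
}
for name, arr in example.items():
    print(name, arr.shape)
```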
How the example itself is generated is covered in a separate post: example generation process.
I. The PointPillars network structure
The structure is as follows:
VoxelNet(
(voxel_feature_extractor): PillarFeatureNet(
(pfn_layers): ModuleList(
(0): PFNLayer(
(linear): DefaultArgLayer(in_features=9, out_features=64, bias=False)
(norm): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
)
)
(middle_feature_extractor): PointPillarsScatter()
(rpn): RPN(
(block1): Sequential(
(0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
(1): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): DefaultArgLayer(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): DefaultArgLayer(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
)
(deconv1): Sequential(
(0): DefaultArgLayer(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(block2): Sequential(
(0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
(1): DefaultArgLayer(64, 128, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
(13): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(14): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(15): ReLU()
(16): DefaultArgLayer(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(17): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(18): ReLU()
)
(deconv2): Sequential(
(0): DefaultArgLayer(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
(1): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(block3): Sequential(
(0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
(1): DefaultArgLayer(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
(13): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(14): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(15): ReLU()
(16): DefaultArgLayer(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(17): DefaultArgLayer(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(18): ReLU()
)
(deconv3): Sequential(
(0): DefaultArgLayer(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
(1): DefaultArgLayer(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv_cls): Conv2d(384, 2, kernel_size=(1, 1), stride=(1, 1))
(conv_box): Conv2d(384, 14, kernel_size=(1, 1), stride=(1, 1))
(conv_dir_cls): Conv2d(384, 4, kernel_size=(1, 1), stride=(1, 1))
)
(rpn_acc): Accuracy()
(rpn_precision): Precision()
(rpn_recall): Recall()
(rpn_metrics): PrecisionRecall()
(rpn_cls_loss): Scalar()
(rpn_loc_loss): Scalar()
(rpn_total_loss): Scalar()
)
The structure is actually quite simple and consists of three main parts:
- the pillar feature extraction layer, i.e. the PFN layer;
- the middle feature extractor: PointPillarsScatter();
- the RPN (comparatively complex).
This post will not dwell on the architecture itself; the focus is on how the data flows!
II. Decomposing the network: data flow
First we convert the example into tensors and enter the predict_kitti_to_anno(..) function.
1. The PFN layer
Inside the forward function, the relevant code looks roughly like this:
voxel_features = self.voxel_feature_extractor(voxels, num_points, coors)
if self._use_sparse_rpn:
    preds_dict = self.sparse_rpn(voxel_features, coors, batch_size_dev)
else:
    spatial_features = self.middle_feature_extractor(
        voxel_features, coors, batch_size_dev)
    if self._use_bev:
        preds_dict = self.rpn(spatial_features, example["bev_map"])
    else:
        preds_dict = self.rpn(spatial_features)
Below is the definition of the PFN network:
num_filters = [num_input_features] + list(num_filters)
pfn_layers = []
for i in range(len(num_filters) - 1):
    in_filters = num_filters[i]
    out_filters = num_filters[i + 1]
    if i < len(num_filters) - 2:
        last_layer = False
    else:
        last_layer = True
    pfn_layers.append(PFNLayer(in_filters, out_filters, use_norm, last_layer=last_layer))
self.pfn_layers = nn.ModuleList(pfn_layers)
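The loop above can be traced with plain Python. With `num_input_features=9` and `num_filters=(64,)` (the configuration implied by the printed structure, where the single linear layer maps 9 to 64), exactly one PFNLayer(9, 64) is built and flagged as the last layer:

```python
# Mirror of the layer-construction loop, recording (in, out, last_layer)
# instead of building real modules.
num_input_features = 9
num_filters = [num_input_features] + [64]

layers = []
for i in range(len(num_filters) - 1):
    in_filters = num_filters[i]
    out_filters = num_filters[i + 1]
    last_layer = i >= len(num_filters) - 2   # only the final layer is "last"
    layers.append((in_filters, out_filters, last_layer))

print(layers)  # [(9, 64, True)]
```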
The PFN layer is entered via voxel_features = self.voxel_feature_extractor(voxels, num_points, coors), which takes us into the PFN network's forward function:
1. Expanding each point's 4 features to 9:
Name | Shape | Info |
---|---|---|
points_mean | [9918, 1, 3] | mean x, y, z of each pillar |
f_cluster | [9918, 100, 3] | offset of every point from its pillar's mean: points[xyz] - points_mean |
f_center | [9918, 100, 2] | distance of each point's x and y from the pillar center |
features | [9918, 100, 9] | the final 9-dim point features fed into the network |
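The decoration of the 4-dim points into 9 dims can be sketched in numpy. The voxel size `vx`/`vy`, the offsets, and the dummy `coors` values below are illustrative assumptions, not the project's config:

```python
import numpy as np

P, N = 9918, 100
voxels = np.random.rand(P, N, 4).astype(np.float32)      # x, y, z, intensity
num_points = np.random.randint(1, N + 1, size=P)

# points_mean: mean xyz over the real points in each pillar
points_mean = voxels[:, :, :3].sum(axis=1, keepdims=True) / num_points.reshape(-1, 1, 1)
f_cluster = voxels[:, :, :3] - points_mean               # offset from pillar mean

# f_center: offset of x, y from each pillar's geometric center,
# reconstructed from the pillar's grid coordinates (assumed layout: batch, z, y, x)
vx, vy, x_offset, y_offset = 0.16, 0.16, 0.08, -39.68    # illustrative values
coors = np.random.randint(0, 400, size=(P, 4))
f_center = np.stack([
    voxels[:, :, 0] - (coors[:, 3:4] * vx + x_offset),
    voxels[:, :, 1] - (coors[:, 2:3] * vy + y_offset),
], axis=-1)

features = np.concatenate([voxels, f_cluster, f_center], axis=-1)
print(features.shape)  # (9918, 100, 9)
```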
Next come two mask operations:
mask = get_paddings_indicator(num_voxels, voxel_count, axis=0)
mask = torch.unsqueeze(mask, -1).type_as(features)
get_paddings_indicator marks which of the 100 slots in each pillar hold real points, so the zero-padded slots are suppressed and contribute nothing. The features then pass through the PFN layers:
for pfn in self.pfn_layers:
    features = pfn(features)
Each layer applies linear, norm, and ReLU; here norm is BatchNorm1d. After the linear layer the feature dimension becomes 64, followed by self.norm and ReLU. The resulting shape:
features | [9918, 100, 64] |
---|---|
Then x_max = torch.max(x, dim=1, keepdim=True)[0] takes the maximum over the points dimension, and the result is:
features | [9918, 1, 64] |
---|---|
The forward function finally returns features.squeeze(), so the output shape is:
features | [9918, 64] |
---|---|
In other words, one pass through the PFN layer takes the features from [9918, 100, 64] down to [9918, 64]. That concludes the PFN layer.
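The masking and max-pooling steps above can be sketched together in numpy (a sketch of the logic only; the project's code may apply the mask at a different point in the pipeline):

```python
import numpy as np

# Dummy per-point features after linear/norm/relu: [pillars, points, channels]
x = np.random.rand(9918, 100, 64).astype(np.float32)
num_points = np.random.randint(1, 101, size=9918)

# get_paddings_indicator (sketched): True where a slot holds a real point
mask = np.arange(100)[None, :] < num_points[:, None]   # (9918, 100)
x = x * mask[:, :, None]                               # zero out padded slots

# torch.max(x, dim=1, keepdim=True)[0], then squeeze
x_max = x.max(axis=1, keepdims=True)                   # (9918, 1, 64)
features = x_max.squeeze()                             # (9918, 64)
print(features.shape)
```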
2. The PointPillarsScatter layer
Next we enter the forward function of spatial_features = self.middle_feature_extractor(…).
The input is the PFN layer's output, features [9918, 64].
First, the definition of PointPillarsScatter:
class PointPillarsScatter(nn.Module):
    def __init__(self,
                 output_shape,
                 num_input_features=64):
        """
        Point Pillar's Scatter.
        Converts learned features from dense tensor to sparse pseudo image. This replaces SECOND's
        second.pytorch.voxelnet.SparseMiddleExtractor.
        :param output_shape: ([int]: 4). Required output shape of features.
        :param num_input_features: <int>. Number of input features.
        """
        super().__init__()
        self.name = 'PointPillarsScatter'
        self.output_shape = output_shape
        self.ny = output_shape[2]
        self.nx = output_shape[3]
        self.nchannels = num_input_features
As the docstring says, PointPillarsScatter takes the features learned in the PFN network and converts them from a dense tensor into a pseudo-image.
Relevant parameters:
nchannels = 64
nx = 432
ny = 496
output_shape: [1, 1, 496, 432, 64]
This part involves no CNN layers at all, which explains why it causes errors when exporting to ONNX.
The final output has shape [2, 64, 496, 432], where 2 is the batch_size; this is now the familiar image-like layout.
Many details are skipped here and will be filled in later.
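The scatter itself can be sketched in numpy: for each batch element, every pillar's 64-dim feature vector is written into a flattened (496 x 432) canvas at the cell given by its (y, x) grid coordinate, then the canvas is reshaped into an image. The (batch, z, y, x) column layout of `coords` is an assumption here:

```python
import numpy as np

batch_size, nchannels, ny, nx = 2, 64, 496, 432
P = 9918
voxel_features = np.random.rand(P, nchannels).astype(np.float32)

coords = np.zeros((P, 4), dtype=np.int64)            # assumed (batch, z, y, x)
coords[:, 0] = np.random.randint(0, batch_size, P)
coords[:, 2] = np.random.randint(0, ny, P)
coords[:, 3] = np.random.randint(0, nx, P)

canvas = np.zeros((batch_size, nchannels, ny * nx), dtype=np.float32)
for b in range(batch_size):
    sel = coords[:, 0] == b
    flat_idx = coords[sel, 2] * nx + coords[sel, 3]  # y * nx + x
    canvas[b, :, flat_idx] = voxel_features[sel]     # scatter pillar features
canvas = canvas.reshape(batch_size, nchannels, ny, nx)
print(canvas.shape)  # (2, 64, 496, 432)
```

Everything that is not a pillar stays zero, which is why the result is called a sparse pseudo-image.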
3. The RPN layer
This layer is more complex, consisting of three downsampling blocks and three deconv (upsampling) modules.
First the definition:
(rpn): RPN(
(block1): Sequential(..)
(deconv1): Sequential(..)
(block2): Sequential(..)
(deconv2): Sequential(..)
(block3): Sequential(..)
(deconv3): Sequential(..)
(conv_cls): Conv2d(384, 2, kernel_size=(1, 1), stride=(1, 1))
(conv_box): Conv2d(384, 14, kernel_size=(1, 1), stride=(1, 1))
(conv_dir_cls): Conv2d(384, 4, kernel_size=(1, 1), stride=(1, 1))
)
Its input is the scatter network's output [2, 64, 496, 432]. Each block and deconv contains quite a few operations; the table below shows how the tensor shape evolves after each stage:
name | shape |
---|---|
Original | [2, 64, 496, 432] |
block1 | Down1 -> [2, 64, 248, 216] |
deconv1 | Up1 -> [2, 128, 248, 216] |
block2 | Down2 ->[2, 128, 124, 108] |
deconv2 | Up2 -> [2, 128, 248, 216] |
block3 | Down3 -> [2, 256, 62, 54] |
deconv3 | Up3 -> [2, 128, 248, 216] |
X | torch.cat([up1, up2, up3], dim=1) -> [2, 384, 248, 216] |
-- | RESULT |
conv_cls | [2, 2, 248, 216] |
conv_box | [2, 14, 248, 216] |
conv_dir_cls | [2, 4, 248, 216] |
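The spatial sizes in the table can be verified with a little arithmetic: each block halves H and W via its stride-2 first conv, and each deconv's stride (1, 2, 4) brings the result back to the block1 resolution before the channel-wise concat:

```python
# Pure-Python check of the H x W sizes and concat channel count above.
h, w = 496, 432
down1 = (h // 2, w // 2)                 # block1, stride 2
down2 = (down1[0] // 2, down1[1] // 2)   # block2, stride 2
down3 = (down2[0] // 2, down2[1] // 2)   # block3, stride 2

up1 = down1                              # deconv1: 1x1 conv, stride 1
up2 = (down2[0] * 2, down2[1] * 2)       # deconv2: stride 2
up3 = (down3[0] * 4, down3[1] * 4)       # deconv3: stride 4
assert up1 == up2 == up3 == (248, 216)   # all three align for the concat

concat_channels = 128 + 128 + 128        # each upsampled map has 128 channels
print(concat_channels, up1)  # 384 (248, 216)
```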
That completes the RPN. Its three heads are conv_cls (classification), conv_box (box regression), and conv_dir_cls (direction classification).
There are still many downstream functions, which deserve a separate post. The network part is now basically complete, and the overall picture is fairly clear.