Let's walk through the code step by step, in the order it executes in the network.
First we run: python ./pytorch/train.py train --config_path=./configs/pointpillars/car/xyres_16.proto --model_dir=/path/to/model_dir
Here is the entry point:
def train(config_path,
          model_dir,
          result_path=None,
          create_folder=False,
          display_step=50,
          summary_step=5,
          pickle_result=True):
The first thing train() does is read the config file, i.e. xyres_16.proto. Once the config is parsed, the next step is to build the voxel generator:
voxel_generator = voxel_builder.build(model_cfg.voxel_generator)
Let's first look at what model_cfg.voxel_generator contains:
voxel_generator {
  point_cloud_range : [0, -39.68, -3, 69.12, 39.68, 1] # point-cloud crop range: x_min, y_min, z_min, x_max, y_max, z_max
  voxel_size : [0.16, 0.16, 4] # pillar dimensions (m)
  max_number_of_points_per_voxel : 100 # max number of points per pillar
}
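From point_cloud_range and voxel_size the pillar-grid size follows directly; a quick sanity check in plain Python, mirroring the config values above:

```python
# Derive the pillar-grid size from the config values above.
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]  # x_min, y_min, z_min, x_max, y_max, z_max
voxel_size = [0.16, 0.16, 4]                          # dx, dy, dz

grid_size = [
    round((point_cloud_range[i + 3] - point_cloud_range[i]) / voxel_size[i])
    for i in range(3)
]
print(grid_size)  # [432, 496, 1]: a 432 x 496 pillar grid with a single z slice
```

These 432 and 496 numbers reappear later as part of the dense_shape [1, 1, 496, 432, 64].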
Next, look at the build() function inside the voxel_builder module:
def build(voxel_config):
    """Builds a VoxelGenerator based on the VoxelGenerator config.
    Args:
        voxel_config: A voxel_generator_pb2.VoxelGenerator object.
    Returns:
        A VoxelGenerator.
    Raises:
        ValueError: On invalid voxel generator proto.
    """
    # (the docstring in the repo is a stale copy-paste from the input-reader builder)
    if not isinstance(voxel_config, (voxel_generator_pb2.VoxelGenerator)):
        raise ValueError('voxel_config not of type '
                         'voxel_generator_pb2.VoxelGenerator.')
    voxel_generator = VoxelGenerator(
        voxel_size=list(voxel_config.voxel_size),
        point_cloud_range=list(voxel_config.point_cloud_range),
        max_num_points=voxel_config.max_number_of_points_per_voxel,
        max_voxels=20000)
    return voxel_generator
What comes back is a VoxelGenerator object (not a plain dict) exposing the voxel_size, point_cloud_range, max_num_points, and max_voxels attributes.
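Conceptually, the generator bins points into pillars by quantizing their x/y coordinates. A minimal sketch of that idea (illustrative names and pure-Python loop; the repo uses an optimized numba implementation, and z cropping is omitted here for brevity):

```python
import numpy as np

def points_to_pillars(points, voxel_size, pc_range, max_points=100, max_pillars=20000):
    """Bin points (N, 4: x, y, z, intensity) into pillars by quantized (x, y).
    Illustrative only; not the repo's implementation."""
    pillars = {}
    for p in points:
        if not (pc_range[0] <= p[0] < pc_range[3] and pc_range[1] <= p[1] < pc_range[4]):
            continue  # drop points outside the BEV crop range
        ix = int((p[0] - pc_range[0]) / voxel_size[0])
        iy = int((p[1] - pc_range[1]) / voxel_size[1])
        bucket = pillars.setdefault((ix, iy), [])
        if len(bucket) < max_points:          # enforce max_number_of_points_per_voxel
            bucket.append(p)
    # keep at most max_pillars non-empty pillars
    return dict(list(pillars.items())[:max_pillars])

pts = np.array([[1.0, 0.01, -1.0, 0.3],   # two nearby points -> same pillar
                [1.05, 0.05, -1.2, 0.5],
                [100.0, 0.0, 0.0, 0.1]])  # outside the x range -> dropped
pillars = points_to_pillars(pts, [0.16, 0.16, 4], [0, -39.68, -3, 69.12, 39.68, 1])
print(len(pillars))  # 1
```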
Next comes:
bv_range = voxel_generator.point_cloud_range[[0, 1, 3, 4]]  # x_min, y_min, x_max, y_max (the BEV range)
box_coder = box_coder_builder.build(model_cfg.box_coder)  # builds the encoder/decoder applied to each box
The box_coder config is:
box_coder: {
  ground_box3d_coder: {
    linear_dim: false
    encode_angle_vector: false
  }
}
This calls the box_coder_builder.build() function:
def build(box_coder_config):
    """Create a box coder based on config.
    Args:
        box_coder_config: A BoxCoder proto message.
    Returns:
        A box coder.
    Raises:
        ValueError: when using an unsupported box coder type.
    """
    # (the docstring in the repo is a stale copy-paste from the optimizer builder)
    box_coder_type = box_coder_config.WhichOneof('box_coder')
    if box_coder_type == 'ground_box3d_coder':
        cfg = box_coder_config.ground_box3d_coder
        return GroundBox3dCoderTorch(cfg.linear_dim, cfg.encode_angle_vector)
    elif box_coder_type == 'bev_box_coder':
        cfg = box_coder_config.bev_box_coder
        return BevBoxCoderTorch(cfg.linear_dim, cfg.encode_angle_vector, cfg.z_fixed, cfg.h_fixed)
    else:
        raise ValueError("unknown box_coder type")
The underlying encode/decode functions are:
def bev_box_encode(boxes, anchors, encode_angle_to_vector=False, smooth_dim=False):
    """box encode for VoxelNet
    Args:
        boxes ([N, 7] Tensor): normal boxes: x, y, z, l, w, h, r
        anchors ([N, 7] Tensor): anchors
    """

def bev_box_decode(box_encodings, anchors, encode_angle_to_vector=False, smooth_dim=False):
    """box decode for VoxelNet in lidar
    Args:
        boxes ([N, 7] Tensor): normal boxes: x, y, z, w, l, h, r
        anchors ([N, 7] Tensor): anchors
    """
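For reference, the residual encoding used by the SECOND/VoxelNet family (what second_box_encode computes, with linear_dim and angle-vector encoding disabled as in the config above) can be sketched in numpy. This is an illustrative single-pair version with the [x, y, z, w, l, h, r] layout from the decode docstring, not the repo's batched code:

```python
import numpy as np

def second_box_encode_np(box, anchor):
    """Residual encoding of one ground-truth box w.r.t. one anchor.
    Layout: [x, y, z, w, l, h, r]; illustrative, single box/anchor pair."""
    xg, yg, zg, wg, lg, hg, rg = box
    xa, ya, za, wa, la, ha, ra = anchor
    diag = np.sqrt(wa ** 2 + la ** 2)   # BEV diagonal normalizes the x/y residuals
    return np.array([
        (xg - xa) / diag,
        (yg - ya) / diag,
        (zg - za) / ha,                 # z residual normalized by anchor height
        np.log(wg / wa),                # log-ratios for dimensions (linear_dim: false)
        np.log(lg / la),
        np.log(hg / ha),
        rg - ra,                        # plain angle residual (encode_angle_vector: false)
    ])

enc = second_box_encode_np([1.0, 2.0, -1.0, 1.6, 3.9, 1.56, 0.0],
                           [0.0, 0.0, -1.78, 1.6, 3.9, 1.56, 0.0])
```

Decoding simply inverts each term, so encode followed by decode recovers the original box.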
Then GroundBox3dCoderTorch() comes in: it simply subclasses GroundBox3dCoder (ordinary inheritance, not "circular inheritance"), adding torch variants of encode/decode on top of the numpy base class; the base class is worth reading too.
class GroundBox3dCoderTorch(GroundBox3dCoder):
    def encode_torch(self, boxes, anchors):
        return box_torch_ops.second_box_encode(boxes, anchors, self.vec_encode, self.linear_dim)

    def decode_torch(self, boxes, anchors):
        return box_torch_ops.second_box_decode(boxes, anchors, self.vec_encode, self.linear_dim)
Next up:
target_assigner_cfg = model_cfg.target_assigner
target_assigner = target_assigner_builder.build(target_assigner_cfg, bv_range, box_coder)
The target_assigner config is:
target_assigner: {
  anchor_generators: {
    anchor_generator_stride: {
      sizes: [1.6, 3.9, 1.56] # wlh
      strides: [0.32, 0.32, 0.0] # if generate only 1 z_center, z_stride will be ignored
      offsets: [0.16, -39.52, -1.78] # origin_offset + strides / 2
      rotations: [0, 1.57] # 0, pi/2
      matched_threshold : 0.6
      unmatched_threshold : 0.45
    }
  }
  sample_positive_fraction : -1
  sample_size : 512
  region_similarity_calculator: {
    nearest_iou_similarity: {
    }
  }
}
target_assigner_builder.build is declared as build(target_assigner_config, bv_range, box_coder).
It returns a TargetAssigner object, which is used later to match anchors against ground-truth boxes.
Inside it, each anchor-generator config is built and collected:
anchor_generator = anchor_generator_builder.build(a_cfg)
anchor_generators.append(anchor_generator)
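For the car config above, the stride/offset/rotation fields define a regular BEV anchor grid. A quick illustrative calculation of how many anchors it produces (plain arithmetic, not the repo's anchor_generator code):

```python
# Anchor count for the car config: one (w, l, h) size, two rotations,
# centers laid out with a 0.32 m stride over the 69.12 m x 79.36 m BEV area.
stride = 0.32
nx = round(69.12 / stride)        # 216 anchor centers along x
ny = round(79.36 / stride)        # 248 anchor centers along y
rotations = 2                     # rotations: [0, 1.57]
num_anchors = nx * ny * rotations
print(num_anchors)                # 107136
```

That 248 x 216 layout matches the RPN's output map size below, so each output cell predicts two anchors.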
(2) Now for the main event: the network
net = second_builder.build(model_cfg, voxel_generator, target_assigner)
It starts by reading some parameters; note that dense_shape is [1, 1, 496, 432, 64].
The first network is the PFN (PillarFeatureNet).
Then comes the PFNLayer definition: each point enters with 9 features and comes out with 64:
This layer performs a similar role as second.pytorch.voxelnet.VFELayer.
This is the structure of a single PFN layer: it takes the 9 per-point features as input and outputs 64-dim features, passing through BatchNorm and Linear layers in between; slightly unlike the VFE, no ReLU is added here:
# Need pillar (voxel) size and x/y offset in order to calculate pillar offset
self.vx = voxel_size[0]
self.vy = voxel_size[1]
self.x_offset = self.vx / 2 + pc_range[0]
self.y_offset = self.vy / 2 + pc_range[1]
Because each input point carries only 4 features (x, y, z and the intensity i), we must expand it to the 9-dim features described in the paper.
The step above is in fact the VFE (voxel_feature_extractor), the very first stage of the pipeline; it is just that this VFE is built out of PFN layers, whose structure was covered above. We will come back to the forward() function later; here we only look at the initialization.
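The 4-to-9 feature decoration from the paper works as follows: each point keeps (x, y, z, r) and is augmented with its offset from the pillar's point mean (x_c, y_c, z_c) and its x/y offset from the pillar's geometric center (x_p, y_p). A numpy sketch for a single pillar (function name illustrative, not the repo's vectorized code):

```python
import numpy as np

def decorate_pillar(points, ix, iy, vx=0.16, vy=0.16, x_min=0.0, y_min=-39.68):
    """points: (N, 4) array of x, y, z, reflectance for one pillar with grid index (ix, iy)."""
    mean_xyz = points[:, :3].mean(axis=0)        # arithmetic mean of the pillar's points
    # pillar center in metric coordinates (the vx / 2, vy / 2 offsets shown above)
    cx = x_min + vx / 2 + ix * vx
    cy = y_min + vy / 2 + iy * vy
    xc_yc_zc = points[:, :3] - mean_xyz          # x_c, y_c, z_c: offsets from the point mean
    xp = points[:, 0:1] - cx                     # x_p: offset from the pillar center
    yp = points[:, 1:2] - cy                     # y_p
    return np.hstack([points, xc_yc_zc, xp, yp]) # (N, 9)

pts = np.array([[1.0, 0.0, -1.0, 0.3],
                [1.05, 0.05, -1.2, 0.5]])
feats = decorate_pillar(pts, ix=6, iy=248)
print(feats.shape)  # (2, 9)
```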
(3) The middle feature extractor (PointPillarsScatter)
# entry point
if middle_class_name == "PointPillarsScatter":
    self.middle_feature_extractor = PointPillarsScatter(
        output_shape=output_shape,
        num_input_features=vfe_num_filters[-1])
Here output_shape is <class 'list'>: [1, 1, 496, 432, 64] (the dense_shape mentioned above), and num_input_features is the previous layer's output size: 64.
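PointPillarsScatter writes each non-empty pillar's 64-dim feature vector into its (y, x) cell of a dense canvas, producing a (64, 496, 432) BEV pseudo-image with zeros in empty cells. A minimal numpy sketch (names and shapes illustrative, not the repo's torch implementation):

```python
import numpy as np

def scatter_pillars(pillar_features, coords, ny=496, nx=432):
    """pillar_features: (P, 64); coords: (P, 2) int cell indices (y, x).
    Returns the dense (64, ny, nx) BEV pseudo-image; empty cells stay zero."""
    num_features = pillar_features.shape[1]
    canvas = np.zeros((num_features, ny * nx), dtype=pillar_features.dtype)
    flat_idx = coords[:, 0] * nx + coords[:, 1]   # flatten (y, x) into a linear index
    canvas[:, flat_idx] = pillar_features.T       # scatter pillar features into place
    return canvas.reshape(num_features, ny, nx)

feats = np.ones((3, 64), dtype=np.float32)        # 3 non-empty pillars
coords = np.array([[0, 0], [10, 20], [495, 431]])
bev = scatter_pillars(feats, coords)
print(bev.shape)  # (64, 496, 432)
```

This pseudo-image is what lets PointPillars feed a point cloud into an ordinary 2D-convolution RPN.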
(4) The RPN
rpn: {
  module_class_name: "RPN"
  layer_nums: [3, 5, 5]
  layer_strides: [2, 2, 2]
  num_filters: [64, 128, 256]
  upsample_strides: [1, 2, 4]
  num_upsample_filters: [128, 128, 128]
  use_groupnorm: false
  num_groups: 32
}
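With these settings, each of the three conv blocks downsamples by its stride and each deconv head upsamples its block's output back to a common 248 x 216 map; the three maps are then concatenated channel-wise before the detection heads. The size arithmetic (illustrative, not the repo's code):

```python
# Feature-map arithmetic for the RPN config above (input: the 496 x 432 pseudo-image).
h, w = 496, 432
layer_strides = [2, 2, 2]
upsample_strides = [1, 2, 4]
num_upsample_filters = [128, 128, 128]

sizes = []
for down, up in zip(layer_strides, upsample_strides):
    h, w = h // down, w // down      # each block downsamples by its stride...
    sizes.append((h * up, w * up))   # ...then its deconv head upsamples the result

print(sizes)                         # [(248, 216), (248, 216), (248, 216)]
print(sum(num_upsample_filters))     # 384 channels after concatenation
```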
More to follow.