BEVDet
BEVDet inherits from CenterPoint, which in turn inherits from MVXTwoStageDetector.
The model is implemented on the OpenMMLab MMDetection3D framework.
The algorithm builds on CenterPoint-style point-cloud detection: it estimates depth from multi-view images, lifts the image features into a frustum-shaped pseudo point cloud, pools that into BEV pillars, and then runs detection on the resulting BEV features.
- The structure below is drawn from the code
Model
bevdet-r50
Module | type | Channels | Sub-module | type
---|---|---|---|---
img_backbone | ResNet | | |
img_neck | CustomFPN | [1024, 2048] → 512 | |
img_view_transformer | LSSViewTransformer | 512 → 80 | |
img_bev_encoder_backbone | CustomResNet | 80 → [80×2, 80×4, 80×8] | |
img_bev_encoder_neck | FPN_LSS | 80×8 + 80×2 → 256 | |
pts_bbox_head | CenterHead | 256 → | bbox_coder | CenterPointBBoxCoder
 | | | separate_head | SeparateHead
 | | | loss_cls | GaussianFocalLoss
 | | | loss_bbox | L1Loss
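The lift step described above (per-pixel depth hypotheses producing a frustum point cloud) can be sketched as a toy back-projection. Everything here is hypothetical: the pinhole intrinsics (fx, fy, cx, cy), the tiny feature-map size, and the three depth bins. The real LSSViewTransformer vectorizes this across all cameras, applies the full camera-to-ego transform, and weights each point by a predicted depth distribution.

```python
def make_frustum_points(feat_w, feat_h, depth_bins, fx, fy, cx, cy):
    """Back-project every feature-map cell at every candidate depth
    into camera coordinates, yielding a frustum-shaped point cloud."""
    points = []
    for d in depth_bins:                  # discrete depth hypotheses
        for v in range(feat_h):
            for u in range(feat_w):
                x = (u - cx) * d / fx     # pinhole back-projection
                y = (v - cy) * d / fy
                points.append((x, y, d))
    return points

# Toy numbers: a 4x2 feature map with 3 depth bins -> 24 frustum points.
pts = make_frustum_points(4, 2, [1.0, 2.0, 3.0], fx=2.0, fy=2.0, cx=2.0, cy=1.0)
print(len(pts))  # 24
```

Each feature-map cell contributes one point per depth bin, so an H×W map with D bins yields H·W·D frustum points, which are later pooled into BEV pillars.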
```python
model = dict(
    type='BEVDet',
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,  # the backbone has 4 residual stages
        out_indices=(2, 3),  # output the stage-2 and stage-3 feature maps (1024/2048 channels)
        frozen_stages=-1,  # -1: no stage is frozen, all backbone weights are trainable
        norm_cfg=dict(type='BN', requires_grad=True),
        # norm_eval=False: BN layers stay in training mode and normalize with the
        # current batch statistics; norm_eval=True would put them in evaluation
        # mode, using the stored running mean/variance instead.
        norm_eval=False,
        with_cp=True,  # gradient checkpointing: recompute activations in backward to save GPU memory
        style='pytorch'),
    img_neck=dict(
        type='CustomFPN',
        in_channels=[1024, 2048],
        out_channels=512,
        num_outs=1,
        start_level=0,  # start feature fusion from input level 0
        out_ids=[0]),  # return the 0-th FPN feature map
    img_view_transformer=dict(
        type='LSSViewTransformer',
        grid_config=grid_config,
        input_size=data_config['input_size'],
        in_channels=512,
        out_channels=numC_Trans,
        downsample=16),
    img_bev_encoder_backbone=dict(
        type='CustomResNet',
        numC_input=numC_Trans,
        num_channels=[numC_Trans * 2, numC_Trans * 4, numC_Trans * 8]),
    img_bev_encoder_neck=dict(
        type='FPN_LSS',
        in_channels=numC_Trans * 8 + numC_Trans * 2,
        out_channels=256),
    pts_bbox_head=dict(
        type='CenterHead',  # reused from CenterPoint, which BEVDet inherits
        in_channels=256,
        tasks=[
            dict(num_class=1, class_names=['car']),
            dict(num_class=2, class_names=['truck', 'construction_vehicle']),
            dict(num_class=2, class_names=['bus', 'trailer']),
            dict(num_class=1, class_names=['barrier']),
            dict(num_class=2, class_names=['motorcycle', 'bicycle']),
            dict(num_class=2, class_names=['pedestrian', 'traffic_cone']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            pc_range=point_cloud_range[:2],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_num=500,
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            code_size=9),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
        norm_bbox=True),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            point_cloud_range=point_cloud_range,
            grid_size=[1024, 1024, 40],
            voxel_size=voxel_size,
            out_size_factor=8,
            dense_reg=1,
            gaussian_overlap=0.1,
            max_objs=500,
            min_radius=2,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2])),
    test_cfg=dict(
        pts=dict(
            pc_range=point_cloud_range[:2],
            post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_per_img=500,
            max_pool_nms=False,
            min_radius=[4, 12, 10, 1, 0.85, 0.175],
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            pre_max_size=1000,
            post_max_size=83,
            # Scale-NMS
            nms_type=[
                'rotate', 'rotate', 'rotate', 'circle', 'rotate', 'rotate'
            ],
            nms_thr=[0.2, 0.2, 0.2, 0.2, 0.2, 0.5],
            nms_rescale_factor=[
                1.0, [0.7, 0.7], [0.4, 0.55], 1.1, [1.0, 1.0], [4.5, 9.0]
            ])))
```
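As a sanity check on the feature-map shapes recorded later in the inference trace, the BEV grid size follows from point_cloud_range and the BEV cell size. The 0.8 m cell size is an assumption for illustration (the actual grid_config values are not shown in this snippet):

```python
# Hypothetical values: point_cloud_range comes from this config; the 0.8 m
# BEV cell size is assumed, since grid_config is not shown above.
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
bev_resolution = 0.8

grid_w = int((point_cloud_range[3] - point_cloud_range[0]) / bev_resolution)
grid_h = int((point_cloud_range[4] - point_cloud_range[1]) / bev_resolution)
print(grid_w, grid_h)  # 128 128
```

Under this assumption the BEV plane is 128×128, consistent with the [1, 256, 128, 128] BEV feature shape.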
Training configuration

```python
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
```
train_pipeline | test_pipeline
---|---
PrepareImageInputs | PrepareImageInputs
LoadAnnotationsBEVDepth | LoadAnnotationsBEVDepth
ObjectRangeFilter | LoadPointsFromFile
ObjectNameFilter | MultiScaleFlipAug3D
DefaultFormatBundle3D | DefaultFormatBundle3D (nested in MultiScaleFlipAug3D)
Collect3D | Collect3D (nested in MultiScaleFlipAug3D)
Scale-NMS

Scale-NMS (from the BEVDet paper) rescales each class's predicted boxes by a class-specific factor before suppression, then restores them afterwards; each task also gets its own NMS type and threshold:

```python
# Scale-NMS: per-task NMS type, threshold, and per-class box rescale factor
nms_type=[
    'rotate', 'rotate', 'rotate', 'circle', 'rotate', 'rotate'
],
nms_thr=[0.2, 0.2, 0.2, 0.2, 0.2, 0.5],
nms_rescale_factor=[
    1.0, [0.7, 0.7], [0.4, 0.55], 1.1, [1.0, 1.0], [4.5, 9.0]
]
```
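Small classes such as pedestrian and traffic_cone get large rescale factors (4.5 and 9.0 above) so that near-duplicate detections actually overlap enough to be suppressed. A 1-D toy illustration of why enlarging boxes raises the IoU between nearby small detections (the interval sizes here are made up):

```python
def iou_1d(a, b):
    """IoU of two intervals given as (center, length)."""
    a_lo, a_hi = a[0] - a[1] / 2, a[0] + a[1] / 2
    b_lo, b_hi = b[0] - b[1] / 2, b[0] + b[1] / 2
    inter = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    union = a[1] + b[1] - inter
    return inter / union

# Two nearby small boxes (think duplicate traffic-cone detections)
# barely overlap at their original size...
a, b = (0.0, 0.4), (0.3, 0.4)
before = iou_1d(a, b)
# ...but after scaling their lengths by a class-specific factor
# (4.5, one of the nms_rescale_factor entries) the overlap is large
# enough for NMS to remove the duplicate.
k = 4.5
after = iou_1d((a[0], a[1] * k), (b[0], b[1] * k))
print(round(before, 3), round(after, 3))
```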
Optimizer configuration
optimizer | lr | lr_config |
---|---|---|
AdamW | 2e-4 | policy=step |
Inference trace

Module | Submodule | Submodule | Tensor size | Meaning
---|---|---|---|---
extract_img_feat | image_encoder | img_backbone `ResNet` | [1, 1024, 16, 44], [1, 2048, 8, 22] | stage-2/3 feature maps
 | | img_neck `CustomFPN` | [1, 512, 16, 44] | fused feature
 | img_view_transformer | | [1, 59, 16, 44] | depth distribution
 | bev_encoder | `CustomResNet` + `FPN_LSS` | [1, 256, 128, 128] | BEV feature
pts_bbox_head | CenterHead | `SeparateHead` | losses | multi-task detection heads
Registration

The registry mechanism uses the `type` key in a cfg dict to look up the corresponding registered class and instantiate it.
```python
# core of mmcv's build_from_cfg: resolve `type` to a class, then instantiate it
obj_type = args.pop('type')
if isinstance(obj_type, str):
    obj_cls = registry.get(obj_type)
    if obj_cls is None:
        raise KeyError(
            f'{obj_type} is not in the {registry.name} registry')
elif inspect.isclass(obj_type) or inspect.isfunction(obj_type):
    obj_cls = obj_type
else:
    raise TypeError(
        f'type must be a str or valid type, but got {type(obj_type)}')
try:
    return obj_cls(**args)
except Exception as e:
    # re-raise with the class name prepended for easier debugging
    raise type(e)(f'{obj_cls.__name__}: {e}')
```
Note: the cfg is deep-copied before keys are popped, so parameters are passed in while the caller's config dict stays isolated from mutation.
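A minimal stand-in for this mechanism: a toy Registry that resolves `type`, deep-copies the cfg for isolation, and instantiates the class. The Registry class and the registered BEVDet stub are illustrative only, not mmcv's actual implementation:

```python
import copy

class Registry:
    """Minimal stand-in for mmcv's Registry."""
    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        self._module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._module_dict.get(key)

    def build(self, cfg):
        args = copy.deepcopy(cfg)      # isolate the caller's cfg dict
        obj_type = args.pop('type')
        obj_cls = self.get(obj_type)
        if obj_cls is None:
            raise KeyError(f'{obj_type} is not in the {self.name} registry')
        return obj_cls(**args)         # remaining keys become __init__ kwargs

MODELS = Registry('models')

@MODELS.register_module
class BEVDet:                          # hypothetical registered class
    def __init__(self, depth):
        self.depth = depth

cfg = dict(type='BEVDet', depth=50)
model = MODELS.build(cfg)
print(type(model).__name__, model.depth)  # BEVDet 50
print(cfg)  # unchanged, thanks to deepcopy
```

Because `build` pops `type` from a deep copy, the original cfg dict can be reused to build further instances.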
Random seeds

Given the same seed, a generator produces the same numbers, so these function-generated values are pseudo-random: like a single-variable function, the same input always yields the same output. Each draw also advances the generator's internal state (in effect producing a new "seed"), which is why repeated calls to the random function return different values: the second call starts from a different state.
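This behavior is easy to check with Python's random module:

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)          # same seed -> identical sequence
second = [random.random() for _ in range(3)]
assert first == second

# Without reseeding, the internal state has advanced, so the
# next draws differ from the first ones.
third = [random.random() for _ in range(3)]
assert third != first
print("same seed reproduces the sequence; repeated draws differ")
```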
Summary

The MMLab framework already wraps the basic building blocks and decouples the functional modules. In everyday use there is no need to dig into every implementation detail. ==Do not reinvent the wheel!==