Table of Contents
Code repository: mind3d, a MindSpore 3D toolkit developed by the 3D Lab of Southeast University (gitee.com)
1 Background
3D multi-object tracking (MOT) is a key technology in the perception module of autonomous driving systems, and has therefore drawn sustained attention from both academia and industry in recent years. In 2D MOT, tracking-by-detection is the dominant paradigm: detection boxes are first obtained on every frame, and tracking is then completed by matching boxes across frames. Building on this paradigm, much research has studied how to better exploit motion cues and appearance features to define the matching affinity between detections in adjacent frames.
In 3D MOT, tracking-by-detection is even more dominant. Several recent papers improve tracker performance on top of existing 3D object detectors by optimizing the data association step, e.g. AB3DMOT, CenterPoint, and PnPNet. The biggest drawback of the tracking-by-detection approach is that the heuristic matching step usually requires hand-designed matching rules and manually tuned parameters, which causes a number of difficulties in real engineering practice:
- Hand-designed rules are limited by the engineer's domain expertise and prior knowledge, and usually perform worse than data-driven methods, which can automatically discover patterns from large amounts of data through machine learning;
- Tuning the matching parameters is time-consuming and laborious. In autonomous driving, for example, many object categories (cars, pedestrians, cyclists, etc.) must be detected and tracked, and manual tuning has to be done separately for each category;
- Traditional methods generalize poorly and lead to repeated work: parameters tuned for one data scenario may perform badly in another and must be re-tuned from scratch.
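To make the brittleness concrete, here is a minimal, purely illustrative sketch (not code from the repo) of the kind of heuristic matcher the text criticizes: a greedy nearest-center association gated by a hand-picked distance threshold. The function name `greedy_match` and the `max_dist` parameter are hypothetical; in practice such a threshold would need separate tuning per class and per dataset.

```python
import math

def greedy_match(tracks, detections, max_dist=2.0):
    """Greedily match track centers to detection centers (illustrative only).

    tracks, detections: lists of (x, y) centers in meters.
    max_dist: hand-tuned gating threshold -- exactly the kind of
    per-class, per-dataset parameter that makes heuristic matching brittle.
    Returns a list of (track_idx, det_idx) pairs.
    """
    pairs = []
    used = set()
    for ti, (tx, ty) in enumerate(tracks):
        best, best_d = None, max_dist
        for di, (dx, dy) in enumerate(detections):
            if di in used:
                continue
            d = math.hypot(dx - tx, dy - ty)
            if d < best_d:  # keep the closest unused detection within the gate
                best, best_d = di, d
        if best is not None:
            used.add(best)
            pairs.append((ti, best))
    return pairs
```

A pedestrian and a truck move very differently, so a single `max_dist` cannot fit both; data-driven association learns such gating implicitly.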
2 Model Overview
SimTrack replaces the traditional tracking-by-detection pipeline with unified detection and tracking of 3D objects from point clouds. The method can be conveniently combined with pillar- or voxel-based 3D object detectors. SimTrack integrates object association, dead-object removal, and new-born object detection into a single model, reducing the complexity of the tracking system. The network structure is as follows:
Given the raw point cloud, the input is first voxelized with a pillar or voxel scheme; PointNet then extracts a feature for each pillar or voxel, and 2D or 3D convolutions in the backbone produce a bird's-eye-view (BEV) feature map. The detection head uses locations on a centerness map to represent object positions; besides the centerness map, it also outputs attributes such as object size and orientation.
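The pillar step above amounts to binning points into a BEV grid before feature extraction. A minimal pure-Python sketch follows; the function name `pillarize` and the grid parameters are hypothetical, and the real pipeline would encode each pillar's points with PointNet rather than keep the raw lists.

```python
from collections import defaultdict

def pillarize(points, x_range=(0.0, 40.0), y_range=(0.0, 40.0), pillar_size=0.4):
    """Group 3D points into BEV pillars (illustrative sketch, not repo code).

    points: iterable of (x, y, z). Returns {(ix, iy): [points...]}, where
    (ix, iy) indexes the pillar grid; a PointNet would then encode each
    pillar's points into one feature vector of the BEV feature map.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the BEV range
        ix = int((x - x_range[0]) // pillar_size)
        iy = int((y - y_range[0]) // pillar_size)
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars)
```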
The full implementation is available at: https://gitee.com/gai-shaoyan/ms3d
The functions of the network modules in the implementation are described as follows:
The network has three output branches. The first is the hybrid-time centerness map branch, which detects the position where each object first appears within the input point cloud sequence, so that its tracking identity can be read out. The second is the motion updating branch, which predicts each object's motion offset across the input point clouds and is used to update the object from its first-appearing position to its current position. The third is the regression branch, which predicts the remaining attributes such as size and orientation. At inference time, the updated centerness map from the previous step is first transformed into the current coordinate frame using the ego-motion, then fused with the current hybrid-time centerness map, and a confidence threshold removes dead objects. Next, tracking identities are read from the previous updated centerness map into the current hybrid-time centerness map. Finally, the motion output by the motion updating branch updates each object's current position; combining this with the attributes from the regression branch yields the final result. The Head implementation is shown below:
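The inference procedure just described can be sketched on simplified, dictionary-based maps. This is a conceptual illustration, not the repo's tensor code: `track_step`, the cell-keyed dictionaries, and `score_thr` are all hypothetical stand-ins for the real centerness-map operations.

```python
import itertools

_new_ids = itertools.count(100)  # id generator for new-born objects (illustrative)

def track_step(prev_tracks, curr_scores, motions, score_thr=0.3):
    """Simplified sketch of SimTrack's inference-time update.

    prev_tracks: {cell: track_id} from the previous updated centerness map,
                 assumed already transformed into the current ego frame.
    curr_scores: {cell: confidence} from the hybrid-time centerness map.
    motions:     {cell: (dx, dy)} offsets from the motion-updating branch.
    Returns {new_cell: track_id}, the updated centerness map.
    """
    updated = {}
    for cell, score in curr_scores.items():
        if score < score_thr:
            continue  # dead object: suppressed by the confidence threshold
        # Read the tracking identity from the previous map, or mint a
        # fresh one for a new-born object.
        tid = prev_tracks.get(cell)
        if tid is None:
            tid = next(_new_ids)
        # Apply the predicted motion offset to move the object from its
        # first-appearing position to its current position.
        dx, dy = motions.get(cell, (0, 0))
        updated[(cell[0] + dx, cell[1] + dy)] = tid
    return updated
```

Because identity read-out, dead-object removal, and position update all happen in this one pass, no separate heuristic matching stage is needed.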
class SepHead(nn.Cell):
    """Separate prediction heads: one small conv stack per output (hm, reg, ...)."""

    def __init__(
            self,
            in_channels,
            heads,
            head_conv=64,
            final_kernel=1,
            bn=False,
            init_bias=-2.19,
            **kwargs,
    ):
        super(SepHead, self).__init__(**kwargs)
        self.heads = heads
        for head in self.heads:
            classes, num_conv = self.heads[head]
            fc = nn.SequentialCell()
            if 'hm' in head:
                for i in range(num_conv - 1):
                    fc.append(nn.Conv2d(in_channels, head_conv,
                                        kernel_size=final_kernel, padding=final_kernel // 2,
                                        pad_mode='pad', has_bias=True, weight_init="he_normal"))
                    if bn:
                        fc.append(nn.BatchNorm2d(head_conv, momentum=0.90))
                    fc.append(nn.ReLU())
                fc.append(nn.Conv2d(head_conv, classes,
                                    kernel_size=final_kernel, padding=final_kernel // 2,
                                    pad_mode='pad', has_bias=True, weight_init="he_normal"))
            else:
                for i in range(num_conv - 1):
                    fc.append(nn.Conv2d(in_channels, head_conv,
                                        kernel_size=final_kernel, padding=final_kernel // 2,
                                        weight_init='HeUniform', pad_mode='pad', has_bias=True))
                    if bn:
                        fc.append(nn.BatchNorm2d(head_conv, momentum=0.90))
                    fc.append(nn.ReLU())
                fc.append(nn.Conv2d(head_conv, classes,
                                    kernel_size=final_kernel, padding=final_kernel // 2,
                                    weight_init='HeUniform', pad_mode='pad', has_bias=True))
            self.__setattr__(head, fc)

    def construct(self, x):
        ret_dict = dict()
        for head in self.heads:
            ret_dict[head] = self.__getattr__(head)(x)
        return ret_dict
# Dependencies of CenterHeadV2 (FastFocalLoss and RegLoss are defined in the repo):
import copy
import logging
from collections import defaultdict
import mindspore
import mindspore.nn as nn
from mindspore import Tensor, ops

class CenterHeadV2(nn.Cell):
    def __init__(
            self,
            in_channels,
            tasks,
            weight,
            code_weights,
            common_heads,
            logger=None,
            init_bias=-2.19,
            share_conv_channel=64,
            num_hm_conv=2,
    ):
        super(CenterHeadV2, self).__init__()
        num_classes = []
        for t in tasks:
            num_classes.append(len(t["class_names"]))
        self.class_names = [t["class_names"] for t in tasks]
        self.code_weights = code_weights
        self.weight = weight  # weight between hm loss and loc loss
        self.in_channels = in_channels
        self.num_classes = num_classes
        self.crit = FastFocalLoss()
        self.crit_reg = RegLoss()
        if not logger:
            logger = logging.getLogger("CenterHead")
        self.logger = logger
        logger.info(f"num_classes: {num_classes}")
        # a shared convolution
        self.shared_conv = nn.SequentialCell(
            nn.Conv2d(in_channels, share_conv_channel,
                      kernel_size=3, padding=1, pad_mode='pad', has_bias=True,
                      weight_init="he_normal"),
            nn.BatchNorm2d(share_conv_channel),
            nn.ReLU()
        )
        self.tasks = nn.CellList([])
        for num_cls in num_classes:
            heads = copy.deepcopy(common_heads)
            heads.update(dict(hm=(num_cls, num_hm_conv)))
            self.tasks.append(
                SepHead(share_conv_channel, heads, bn=True, init_bias=init_bias, final_kernel=3)
            )
        logger.info("Finish CenterHead Initialization")

    def construct(self, x):
        ret_dicts = []
        x = self.shared_conv(x)
        for task in self.tasks:
            ret_dicts.append(task(x))
        return ret_dicts

    def _sigmoid(self, x):
        # clamp the sigmoid output away from 0/1 for numerical stability
        min_value = Tensor(1e-4, mindspore.float32)
        max_value = Tensor(1 - 1e-4, mindspore.float32)
        sigmoid = nn.Sigmoid()
        y = ops.clip_by_value(sigmoid(x), min_value, max_value)
        return y

    def loss(self, example, preds_dicts, **kwargs):
        rets = []
        for task_id, preds_dict in enumerate(preds_dicts):
            # heatmap focal loss
            preds_dict['hm'] = self._sigmoid(preds_dict['hm'])
            hm_idx = 'hm' + str(task_id)
            hm_loss = self.crit(preds_dict['hm'], example[hm_idx],
                                example['ind'][task_id, :, :],
                                example['mask'][task_id, :, :],
                                example['cat'][task_id, :, :])
            target_box = example['anno_box'][task_id, :, :, :]
            # reconstruct the anno_box from multiple reg heads
            cat = ops.Concat(axis=1)
            preds_dict['anno_box'] = cat((preds_dict['reg'], preds_dict['height'],
                                          preds_dict['dim'], preds_dict['vel'],
                                          preds_dict['rot']))
            ret = {}
            # Regression loss for dimension, offset, height, rotation
            box_loss = self.crit_reg(preds_dict['anno_box'],
                                     example['mask'][task_id, :, :],
                                     example['ind'][task_id, :, :],
                                     target_box)
            loc_loss = (box_loss * self.code_weights).sum()
            loss = hm_loss + self.weight * loc_loss
            ret.update({'loss': loss, 'hm_loss': hm_loss, 'loc_loss': loc_loss,
                        'loc_loss_elem': box_loss,
                        'num_positive': example['mask'][task_id, :, :].sum()})
            rets.append(ret)
        # convert batch-key to key-batch
        rets_merged = defaultdict(list)
        for ret in rets:
            for k, v in ret.items():
                rets_merged[k].append(v)
        return rets_merged
Hybrid-time centerness map: this map associates the detections of the previous and current frames, while also filtering out disappeared objects and detecting newly appearing ones. The point clouds at time t−1 and time t are used as network input, and the hybrid-time centerness map is required to indicate the position where each object first appears in the input sequence, with every object represented by its center point. Let the ground-truth object positions at frame t−1 and frame t be given. For the different cases, the ground truth of the hybrid-time centerness map is constructed as follows:
- For an object present at both time t−1 and time t, the centerness-map ground truth is constructed at its position at time t−1.
- For an object present at time t−1 but gone at time t, it is treated as a negative sample.
- For an object absent at time t−1 but appearing at time t, the centerness-map ground truth is constructed at its position at time t.
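The three rules above can be sketched as a small target-building function. This is an illustrative simplification with hypothetical names (`centerness_gt`, id-keyed dictionaries); the real implementation renders Gaussian peaks onto a dense map rather than returning per-object positions.

```python
def centerness_gt(objects_prev, objects_curr):
    """Build hybrid-time centerness targets from the three rules above
    (illustrative sketch). objects_prev / objects_curr map object id ->
    center position at frames t-1 and t. Returns {id: target_position};
    disappeared objects are negatives and get no positive target.
    """
    targets = {}
    for oid, pos_prev in objects_prev.items():
        if oid in objects_curr:
            # present at both t-1 and t: supervise at the t-1 position
            targets[oid] = pos_prev
        # present at t-1 but gone at t: negative sample, no entry
    for oid, pos_curr in objects_curr.items():
        if oid not in objects_prev:
            # new-born at t: supervise at the t position
            targets[oid] = pos_curr
    return targets
```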
Motion updating branch: to perform tracking, a motion updating branch is introduced to estimate how the objects in the input frames move. Representing each object by its center-point coordinates, the motion updating branch regresses the positional offset of each object between frame t−1 and frame t. Applying this branch's output to the hybrid-time centerness map updates the object positions.
Other regression branches: these output the object's height, size, and orientation. The orientation is regressed as its sine and cosine, i.e. (sin θ, cos θ).
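Regressing (sin θ, cos θ) instead of θ itself avoids the 2π wrap-around discontinuity at ±π; the angle is recovered with atan2. A minimal sketch (function names are hypothetical, matching the two channels of the `rot` head above):

```python
import math

def encode_yaw(theta):
    """Encode heading as (sin, cos): two continuous targets, no 2*pi jump."""
    return math.sin(theta), math.cos(theta)

def decode_yaw(s, c):
    """Recover the heading angle in (-pi, pi] from the two regressed channels."""
    return math.atan2(s, c)
```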
3 Preparation
3.1 Runtime environment (install MindSpore 1.8)
Ubuntu 20.04
Python 3.8
MindSpore 1.8
CUDA 11.1
3.2 Dataset preparation
NuScenes dataset:
Dataset size: about 300 GB
Training set: 700 scenes
Validation set: 150 scenes
Test set: 150 scenes
Annotation format: .json files
Website: https://www.nuscenes.org/nuscenes#overview
3.3 Import the Python packages
import os, gc, objgraph, time, datetime
import random
import argparse
import sys
import logging
import psutil
import mindspore.nn as nn
import mindspore as ms
from mindspore.communication import get_rank, get_group_size, init
from mindspore import context, ops, load_checkpoint, load_param_into_net
from mindspore import save_checkpoint
from datasets.utils.builder import build_dataset
from datasets.utils.batch_utils import train_collate, eval_collate
from ms_model.build_sim_model import build_model, get_config
from mindspore import dataset as de
from pathlib import Path
from math import pow
from mindspore.common import set_seed
from mindspore.parallel._utils import (_get_device_num, _get_gradients_mean,
_get_parallel_mode)
4 Model Training
4.1 Load the YAML config file
import yaml  # needed by get_config, in addition to the imports above

def get_config(cfg_path):
    """Load the YAML config file into a dict."""
    with open(cfg_path, 'r') as f:
        cfg = yaml.load(f, yaml.Loader)
    return cfg

parser = argparse.ArgumentParser()
parser.add_argument("--config",
                    default='/home/zhangcan/zad/ms_sim/simtrack.yaml')
args = parser.parse_args()
cfg_path = Path(args.config)
cfg = get_config(cfg_path)
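For orientation, a minimal sketch of the fields that `simtrack.yaml` must provide for the snippets in this article. All values are illustrative, not the repo's defaults, and the `{ ... }` bodies stand for the nested specs consumed by `build_dataset` and `build_model`.

```yaml
# Hypothetical shape of simtrack.yaml (illustrative values only)
data:
  samples_per_gpu: 4
  train: { ... }   # dataset spec consumed by build_dataset
  val: { ... }
train_column_names: [points, hm, ind, mask, cat, anno_box]
eval_column_names: [points, token]
model: { ... }     # model spec consumed by build_model
total_epochs: 20
```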
4.2 Initialize the dataset
train_dataset = build_dataset(cfg['data']['train'])
batch_size = cfg['data']['samples_per_gpu']
if args.device_num == 1:
    train_ds = de.GeneratorDataset(train_dataset, column_names=cfg['train_column_names'],
                                   shuffle=False, num_parallel_workers=1)
else:
    rank_id = get_rank()
    rank_size = get_group_size()
    print(f"using {rank_size} GPUs, id is {rank_id}")
    train_ds = de.GeneratorDataset(train_dataset, column_names=cfg['train_column_names'],
                                   shuffle=True, num_shards=rank_size, shard_id=rank_id,
                                   num_parallel_workers=2, python_multiprocessing=False)
train_ds = train_ds.batch(batch_size=batch_size, input_columns=cfg['train_column_names'],
                          drop_remainder=True, per_batch_map=train_collate)
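The `per_batch_map` callable receives each column as a list of per-sample values and returns the batched columns. Since point clouds have varying numbers of points, the repo's `train_collate` must pad or pack them; the pure-Python stand-in below (`pad_collate` is a hypothetical name) illustrates the idea on one column.

```python
def pad_collate(points_col, batch_info=None):
    """Pure-Python stand-in for a per_batch_map collate function
    (illustrative; the repo's train_collate works on real tensors).

    points_col: list of per-sample point lists of varying length.
    Pads every sample to the longest length with a zero point so the
    column can be stacked into one dense batch.
    """
    max_len = max(len(p) for p in points_col)
    pad = (0.0, 0.0, 0.0)
    padded = [list(p) + [pad] * (max_len - len(p)) for p in points_col]
    return (padded,)  # per_batch_map returns a tuple of output columns
```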
4.3 Initialize the model
model = build_model(model_cfg=cfg['model'])
model.CLASSES = train_dataset.CLASSES
ckpt = args.checkpoint
if ckpt and (args.start_epoch != -1) and (args.start_epoch != 1):
    # resume from the checkpoint of the epoch before start_epoch
    ms_checkpoint = load_checkpoint(ckpt + "/epoch_{}.ckpt".format(args.start_epoch - 1))
    load_param_into_net(model, ms_checkpoint)
milestone = []
if args.start_epoch == -1:
    cfg['total_epochs'] = 20
else:
    cfg['total_epochs'] = 20 - args.start_epoch + 1
# steps_per_epoch is the number of batches per epoch,
# i.e. steps_per_epoch = train_ds.get_dataset_size()
for i in range(cfg['total_epochs']):
    milestone.append(steps_per_epoch * (i + 1))
learning_rates = [x * 0.4 for x in [0.00040, 0.00054, 0.00094, 0.00152, 0.00221, 0.00290,
                                    0.00348, 0.00387, 0.00400, 0.00393, 0.00373, 0.00299,
                                    0.00251, 0.00199, 0.00147, 0.00099, 0.00058, 0.00026,
                                    0.00007, 0.00001]]
lr = nn.piecewise_constant_lr(milestone, learning_rates[20 - cfg['total_epochs']:])
opt = nn.AdamWeightDecay(model.trainable_params(), learning_rate=lr, beta2=0.99, weight_decay=0.01)
loss_net = MyWithLossCell(model)  # MyWithLossCell wraps the model with its loss (defined in the repo)
train_net = nn.TrainOneStepCell(loss_net, opt)
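`nn.piecewise_constant_lr` expands the milestone/rate lists into one learning rate per training step. To make the mapping from milestones to per-step rates concrete, here is a pure-Python function that mirrors its documented semantics (a sketch for illustration, not MindSpore's implementation):

```python
def piecewise_constant_lr(milestones, rates):
    """Expand (milestones, rates) into a per-step lr list: step i gets
    rates[k], where milestones[k] is the first milestone greater than i.
    Mirrors the semantics of mindspore.nn.piecewise_constant_lr."""
    lrs, start = [], 0
    for m, r in zip(milestones, rates):
        lrs.extend([r] * (m - start))  # rate r holds until milestone m
        start = m
    return lrs
```

With milestones placed at every epoch boundary, as in the training script above, each epoch gets one constant rate from the 20-entry schedule.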
4.4 Save the network checkpoint
filename_tmpl = "epoch_{}"
filename = filename_tmpl.format(epoch+21-cfg['total_epochs'])
print(filename)
savepath = os.path.join(output_dir, filename)
save_checkpoint(model, savepath)
5 Model Evaluation
5.1 Initialize the dataset
dataset = build_dataset(cfg['data']['val'])
ds = de.GeneratorDataset(dataset, column_names=cfg['eval_column_names'], shuffle=False)
ds = ds.batch(batch_size=1, input_columns=cfg['eval_column_names'],
              drop_remainder=True, per_batch_map=eval_collate)
5.2 Initialize the model
ms_model = build_model(model_cfg=cfg['model'])
5.3 Load the trained checkpoint
ckpt = args.checkpoint
print(ckpt)
ms_checkpoint = load_checkpoint(ckpt)
load_param_into_net(ms_model, ms_checkpoint)
5.4 Run the evaluation
# NuScenes comes from the nuscenes devkit: from nuscenes.nuscenes import NuScenes
prev_detections = {}
nusc = NuScenes(version='v1.0-trainval',
                dataroot='/data0/HR_dataset/2023AAAI/2_liu/mit_bevfusion/data/nuscenes/',
                verbose=True)
grids = meshgrid(size_w, size_h)  # BEV grid coordinates (meshgrid is a repo helper)
all_predictions = [detections]
predictions = {}
for p in all_predictions:
    predictions.update(p)
if not os.path.exists(args.work_dir):
    os.makedirs(args.work_dir)
if args.eval_det:
    result_dict, _ = dataset.evaluation(copy.deepcopy(predictions),
                                        output_dir=args.work_dir, testset=False)
    if result_dict is not None:
        for k, v in result_dict["results"].items():
            print(f"Evaluation {k}: {v}")
# eval tracking
dataset.evaluation_tracking(copy.deepcopy(predictions), output_dir=args.work_dir, testset=False)
6 References
% SimTrack
@InProceedings{Luo_2021_ICCV,
  author    = {Luo, Chenxu and Yang, Xiaodong and Yuille, Alan},
  title     = {Exploring Simple 3D Multi-Object Tracking for Autonomous Driving},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2021}
}