Implementing SimTrack with MindSpore

SimTrack is a model for joint 3D object detection and tracking in point clouds. It integrates object association, dead-object removal, and new-born object detection, which reduces the complexity of the tracking system. The model is implemented with the MindSpore framework and can be combined with pillar-based or voxel-based 3D detectors. SimTrack has three branches (detection, motion updating, and regression) and performs object association and position updating through a hybrid-time centerness map. This article describes the model structure and the training and evaluation procedure, and provides a link to the code.

Contents

1 Background

2 Model Overview

3 Preparation

3.1 Runtime environment: install MindSpore 1.8

3.2 Dataset preparation

3.3 Importing Python packages

4 Model Training

4.1 Loading the YAML config

4.2 Initializing the dataset

4.3 Initializing the model

4.4 Saving network checkpoints

5 Model Evaluation

5.1 Initializing the dataset

5.2 Initializing the model

5.3 Loading the trained checkpoint

5.4 Running evaluation

6 References


Code repository: mind3d: mindspore 3D toolkit developed by 3D Lab of Southeast University (gitee.com)

1 Background

        3D multi-object tracking (MOT) is a key component of the perception module in autonomous driving, and has therefore drawn sustained attention from academia and industry in recent years. In 2D MOT, tracking-by-detection is the prevailing paradigm: detection boxes are first obtained on each frame, and tracking is completed by matching boxes across frames. Building on this paradigm, many researchers have studied how to better exploit motion information and appearance features to define the matching affinity between detections in adjacent frames.

        In 3D MOT, tracking-by-detection is even more dominant. Several recent papers, such as AB3DMOT, CenterPoint, and PnPNet, study how to improve tracker performance by optimizing the data association step on top of existing 3D detectors. The biggest drawback of tracking-by-detection is that the heuristic matching step usually requires hand-designed matching rules and parameter tuning, which causes several difficulties in practical engineering:

  1. Hand-crafted rules are limited by the engineer's domain and prior knowledge, and usually perform worse than data-driven methods, which can learn patterns automatically from large amounts of data;
  2. Tuning the matching parameters is time-consuming and laborious. For example, autonomous driving requires detecting and tracking many object classes (cars, pedestrians, cyclists, and so on), and manual tuning has to be done separately for each class;
  3. Traditional methods generalize poorly and invite repeated work: parameters tuned for one data scenario may perform badly in another and have to be re-tuned.

2 Model Overview

        SimTrack replaces the traditional tracking-by-detection paradigm with unified detection and tracking of 3D objects in point clouds. The method can easily be combined with pillar-based or voxel-based 3D detectors. SimTrack integrates object association, dead-object removal, and new-born object detection, which reduces the complexity of the tracking system. The network structure is as follows:

       Given the raw point cloud, it is first voxelized with the pillar or voxel method; a PointNet then extracts a feature for each pillar or voxel, and 2D or 3D convolutions in the backbone produce a bird's-eye-view (BEV) feature map. The detection head represents object locations as positions on a centerness map; besides the centerness map, it also outputs object size, orientation, and other attributes.
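The voxelization step above can be sketched in a few lines of NumPy. This is only an illustration: the grid ranges, pillar size, and the `pillarize` helper are made-up values and names, not the repo's actual configuration; a PointNet-style network would then turn each per-pillar point group into a feature vector scattered back onto the BEV grid.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 40.0), y_range=(0.0, 40.0), pillar_size=0.5):
    """Assign each point (x, y, z) to a BEV pillar by quantizing x and y.

    Returns a dict mapping (ix, iy) pillar indices to the stacked points
    that fall into that pillar; points outside the range are dropped.
    """
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    nx = int((x_range[1] - x_range[0]) / pillar_size)
    ny = int((y_range[1] - y_range[0]) / pillar_size)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    pillars = {}
    for i in np.flatnonzero(valid):
        pillars.setdefault((ix[i], iy[i]), []).append(points[i])
    return {k: np.stack(v) for k, v in pillars.items()}

pts = np.array([[0.1, 0.2, 0.0], [0.3, 0.1, 0.5], [10.0, 10.0, 1.0]])
pillars = pillarize(pts)
# the first two points share pillar (0, 0); the third lands in pillar (20, 20)
```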

The full implementation is available at:

链接:https://gitee.com/gai-shaoyan/ms3d

The functional components of the network model in the implementation are described below:

        The network has three output branches. The first is the hybrid-time centerness map branch, which detects the position where each object first appears in the input point-cloud sequence, making it easy to read out the object's tracking identity. The second is the motion-updating branch, which predicts the object's motion offset across the input frames and is used to move the object from its first-appearance position to its current position. The third is the regression branch, which predicts the remaining attributes such as size and orientation. At inference time, the updated centerness map from the previous step is first transformed into the current coordinate frame using the ego-motion, then fused with the current hybrid-time centerness map and thresholded to remove dead objects; next, tracking identities are read from the previous updated centerness map into the current hybrid-time centerness map; finally, the motion predicted by the motion-updating branch updates each object's position, yielding the updated centerness map for the current step. Combined with the attributes from the regression branch, this gives the final result. The Head is shown below as an implementation example:

# Required imports for the head code below; FastFocalLoss and RegLoss
# are defined in the repo's loss module.
import copy
import logging
from collections import defaultdict

import mindspore
import mindspore.nn as nn
from mindspore import Tensor, ops


class SepHead(nn.Cell):
    def __init__(
            self,
            in_channels,
            heads,
            head_conv=64,
            final_kernel=1,
            bn=False,
            init_bias=-2.19,
            **kwargs,
    ):
        super(SepHead, self).__init__(**kwargs)

        self.heads = heads
        for head in self.heads:
            classes, num_conv = self.heads[head]
            fc = nn.SequentialCell()
            if 'hm' in head:
                for i in range(num_conv - 1):
                    fc.append(nn.Conv2d(in_channels, head_conv,
                                        kernel_size=final_kernel, padding=final_kernel // 2,
                                        pad_mode='pad', has_bias=True, weight_init="he_normal"))
                    if bn:
                        fc.append(nn.BatchNorm2d(head_conv, momentum=0.90))
                    fc.append(nn.ReLU())
                fc.append(nn.Conv2d(head_conv, classes,
                                    kernel_size=final_kernel, padding=final_kernel // 2, pad_mode='pad', has_bias=True, weight_init="he_normal"))
            else:
                for i in range(num_conv - 1):
                    fc.append(nn.Conv2d(in_channels, head_conv,
                                        kernel_size=final_kernel, padding=final_kernel // 2, weight_init='HeUniform',
                                        pad_mode='pad', has_bias=True))
                    if bn:
                        fc.append(nn.BatchNorm2d(head_conv, momentum=0.90))
                    fc.append(nn.ReLU())
                fc.append(nn.Conv2d(head_conv, classes,
                                    kernel_size=final_kernel, padding=final_kernel // 2, weight_init='HeUniform',
                                    pad_mode='pad', has_bias=True))

            self.__setattr__(head, fc)

    def construct(self, x):
        ret_dict = dict()
        for head in self.heads:
            ret_dict[head] = self.__getattr__(head)(x)

        return ret_dict


class CenterHeadV2(nn.Cell):      
    def __init__(
            self,
            in_channels,
            tasks,
            weight,
            code_weights,
            common_heads,
            logger=None,
            init_bias=-2.19,
            share_conv_channel=64,
            num_hm_conv=2,
    ):
        super(CenterHeadV2, self).__init__()

        num_classes = []
        for t in tasks:
            num_classes.append(len(t["class_names"]))
        self.class_names = [t["class_names"] for t in tasks]
        self.code_weights = code_weights
        self.weight = weight  # weight between hm loss and loc loss

        self.in_channels = in_channels
        self.num_classes = num_classes

        self.crit = FastFocalLoss()
        self.crit_reg = RegLoss()

        if not logger:
            logger = logging.getLogger("CenterHead")
        self.logger = logger

        logger.info(
            f"num_classes: {num_classes}"
        )

        # a shared convolution
        self.shared_conv = nn.SequentialCell(
            nn.Conv2d(in_channels, share_conv_channel,
                      kernel_size=3, padding=1, pad_mode='pad', has_bias=True, weight_init="he_normal"),
            nn.BatchNorm2d(share_conv_channel),            
            nn.ReLU()
        )

        self.tasks = nn.CellList([])

        for num_cls in num_classes:
            heads = copy.deepcopy(common_heads)
            heads.update(dict(hm=(num_cls, num_hm_conv)))
            self.tasks.append(
                SepHead(share_conv_channel, heads, bn=True, init_bias=init_bias, final_kernel=3)
            )

        logger.info("Finish CenterHead Initialization")

    def construct(self, x):
        ret_dicts = []
        x = self.shared_conv(x)
        
        for task in self.tasks:
            ret_dicts.append(task(x))
            
        return ret_dicts

    def _sigmoid(self, x):
        min_value = Tensor(1e-4, mindspore.float32)
        max_value = Tensor(1 - 1e-4, mindspore.float32)
        sigmoid = nn.Sigmoid()
        y = ops.clip_by_value(sigmoid(x), min_value, max_value)
        return y

    def loss(self, example, preds_dicts, **kwargs):
        rets = []
        
        for task_id, preds_dict in enumerate(preds_dicts):
            # heatmap focal loss
            preds_dict['hm'] = self._sigmoid(preds_dict['hm'])
            hm_idx = 'hm' + str(task_id)
            hm_loss = self.crit(preds_dict['hm'], example[hm_idx], example['ind'][task_id,:,:],
                                example['mask'][task_id,:,:], example['cat'][task_id,:,:])

            target_box = example['anno_box'][task_id,:,:,:]
            # reconstruct the anno_box from multiple reg heads
            cat = ops.Concat(axis=1)
            preds_dict['anno_box'] = cat((preds_dict['reg'], preds_dict['height'], preds_dict['dim'],
                                          preds_dict['vel'], preds_dict['rot']))

            ret = {}

            # Regression loss for dimension, offset, height, rotation
            box_loss = self.crit_reg(preds_dict['anno_box'], example['mask'][task_id,:,:], example['ind'][task_id,:,:],
                                     target_box)

            loc_loss = (box_loss * self.code_weights).sum()

            loss = hm_loss + self.weight * loc_loss

            ret.update({'loss': loss, 'hm_loss': hm_loss, 'loc_loss': loc_loss,
                        'loc_loss_elem': box_loss, 'num_positive': example['mask'][task_id,:,:].sum()})

            rets.append(ret)

        """convert batch-key to key-batch
        """
        rets_merged = defaultdict(list)
        for ret in rets:
            for k, v in ret.items():
                rets_merged[k].append(v)
        return rets_merged
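The inference-time tracking update described earlier (fuse the previous updated map with the current hybrid-time map, threshold out dead objects, propagate identities, apply motion offsets) can be sketched as follows. This is a deliberately simplified NumPy illustration: `track_update`, its arguments, and the score threshold are hypothetical names and values chosen for clarity, not the repo's actual decoding code.

```python
import numpy as np

def track_update(prev_updated_hm, prev_ids, cur_hybrid_hm, motion, score_thr=0.1):
    """One simplified tracking-update step on an H x W centerness map.

    prev_updated_hm: previous step's updated map, assumed already warped
        into the current frame by ego-motion.
    prev_ids: integer tracking-identity map aligned with it (0 = no track).
    cur_hybrid_hm: current hybrid-time centerness map.
    motion: (2, H, W) per-cell offset predicted by the motion branch.
    """
    # Fuse previous and current confidences, then threshold out dead objects.
    fused = np.maximum(prev_updated_hm, cur_hybrid_hm)
    alive = fused > score_thr

    # Propagate identities from the previous map; surviving cells without a
    # previous identity are new-born objects and receive fresh ids.
    ids = np.where(alive & (prev_ids > 0), prev_ids, 0)
    next_id = prev_ids.max() + 1
    new_born = alive & (ids == 0)
    ids[new_born] = np.arange(next_id, next_id + new_born.sum())

    # Apply the motion offsets to move each center to its current position.
    ys, xs = np.nonzero(alive)
    centers = np.stack([xs + motion[0, ys, xs], ys + motion[1, ys, xs]], axis=1)
    return fused * alive, ids, centers

H = W = 4
prev_hm = np.zeros((H, W)); prev_hm[1, 1] = 0.9
prev_ids = np.zeros((H, W), dtype=int); prev_ids[1, 1] = 7
cur_hm = np.zeros((H, W)); cur_hm[1, 1] = 0.8; cur_hm[3, 2] = 0.6
motion = np.zeros((2, H, W))
updated_hm, ids, centers = track_update(prev_hm, prev_ids, cur_hm, motion)
# track 7 survives at cell (1, 1); the object at (3, 2) is new-born
```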

The hybrid-time centerness map associates detections between the previous step and the current step, filters out objects that have disappeared, and also detects newly appeared objects. The point clouds at time t-1 and time t are used as the network input, and the hybrid-time centerness map is required to represent the position where each object first appears in the input sequence, with each object represented by its center point. Given the ground-truth object positions at frames t-1 and t, the ground truth of the hybrid-time centerness map is constructed per object type as follows:

  • For an object that appears at both time t-1 and time t, its position at time t-1 is used to construct the centerness-map ground truth.
  • For an object that appears at time t-1 but has disappeared at time t, it is treated as a negative sample.
  • For an object absent at time t-1 but appearing at time t, its position at time t is used to construct the centerness-map ground truth.
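The three rules above can be expressed as a small plain-Python sketch. The `hybrid_time_gt` helper is hypothetical and works on object-id dictionaries instead of actual heatmaps, purely to make the case analysis explicit.

```python
def hybrid_time_gt(objs_prev, objs_cur):
    """Sketch of the hybrid-time centerness ground-truth rule.

    objs_prev / objs_cur: dicts mapping object id -> (x, y) center at
    time t-1 / time t. Returns the positive center per object id;
    objects that disappear at time t contribute no positive
    (they are treated as negatives).
    """
    positives = {}
    for oid, pos in objs_prev.items():
        if oid in objs_cur:
            # appears in both frames: use the first appearance (time t-1)
            positives[oid] = pos
    for oid, pos in objs_cur.items():
        if oid not in objs_prev:
            # new-born object: use its position at time t
            positives[oid] = pos
    return positives

prev_objs = {1: (2.0, 3.0), 2: (5.0, 5.0)}   # object 2 disappears at time t
cur_objs = {1: (2.5, 3.5), 3: (8.0, 1.0)}    # object 3 is new-born
gt = hybrid_time_gt(prev_objs, cur_objs)
# gt == {1: (2.0, 3.0), 3: (8.0, 1.0)}
```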

Motion Updating Branch: to achieve tracking, a motion-updating branch is introduced to estimate how objects move across the input frames. Given an object's center-point coordinates, the motion-updating branch regresses the position offset of the object between frame t-1 and frame t. The result of this branch is applied to the hybrid-time centerness map to update object positions.

Other regression branches: the remaining regression branches output object height, size, and orientation. The orientation is regressed as its sine and cosine.
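Regressing (sin, cos) instead of the raw angle, as CenterPoint-style heads do, avoids the wrap-around discontinuity at plus or minus pi; a minimal plain-Python sketch (the helper names are illustrative, not the repo's API):

```python
import math

def encode_yaw(yaw):
    """Encode an orientation angle as its (sin, cos) pair, which is
    continuous across the +/- pi boundary."""
    return math.sin(yaw), math.cos(yaw)

def decode_yaw(sin_y, cos_y):
    """Recover the angle from the two regressed channels."""
    return math.atan2(sin_y, cos_y)

yaw = 2.5
assert abs(decode_yaw(*encode_yaw(yaw)) - yaw) < 1e-9
```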

3 Preparation

3.1 Runtime environment: install MindSpore 1.8

Ubuntu 20.04

python=3.8

MindSpore=1.8

cuda=11.1

3.2 Dataset preparation

NuScenes dataset:

Dataset size: about 300 GB

    Training set: 700 scenes

    Validation set: 150 scenes

    Test set: 150 scenes

    Data format: .json files

Website: https://www.nuscenes.org/nuscenes#overview

3.3 Importing Python packages

import os, gc, objgraph, time, datetime
import random
import argparse
import sys
import logging
import psutil
import mindspore.nn as nn
import mindspore as ms
from mindspore.communication import get_rank, get_group_size, init
from mindspore import context, ops, load_checkpoint, load_param_into_net
from mindspore import save_checkpoint
from datasets.utils.builder import build_dataset
from datasets.utils.batch_utils import train_collate, eval_collate
from ms_model.build_sim_model import build_model, get_config
from mindspore import dataset as de
from pathlib import Path
from math import pow
from mindspore.common import set_seed
from mindspore.parallel._utils import (_get_device_num, _get_gradients_mean,
                                       _get_parallel_mode)

4 Model Training

4.1 Loading the YAML config

parser = argparse.ArgumentParser()
parser.add_argument("--config",
                    default='/home/zhangcan/zad/ms_sim/simtrack.yaml')
args = parser.parse_args()
cfg_path = Path(args.config)
cfg = get_config(cfg_path)

# get_config (imported above from ms_model.build_sim_model) parses the YAML file:
import yaml

def get_config(cfg_path):
    """Load the model/training configuration from a YAML file."""
    with open(cfg_path, 'r') as f:
        cfg = yaml.load(f, yaml.Loader)
    return cfg

4.2 Initializing the dataset

train_dataset = build_dataset(cfg['data']['train'])
batch_size = cfg['data']['samples_per_gpu']
if args.device_num == 1:
    train_ds = de.GeneratorDataset(train_dataset, column_names=cfg['train_column_names'],
                                   shuffle=False, num_parallel_workers=1)
else:
    rank_id = get_rank()
    rank_size = get_group_size()
    print(f"using {rank_size} GPUs, id is {rank_id}")
    train_ds = de.GeneratorDataset(train_dataset, column_names=cfg['train_column_names'],
                                   shuffle=True, num_shards=rank_size, shard_id=rank_id,
                                   num_parallel_workers=2, python_multiprocessing=False)
train_ds = train_ds.batch(batch_size=batch_size, input_columns=cfg['train_column_names'],
                          drop_remainder=True, per_batch_map=train_collate)

4.3 Initializing the model

model = build_model(model_cfg=cfg['model'])
model.CLASSES = train_dataset.CLASSES

# Resume from an earlier checkpoint when a start epoch is given
ckpt = args.checkpoint
if ckpt and (args.start_epoch != -1) and (args.start_epoch != 1):
    ms_checkpoint = load_checkpoint(ckpt + "/epoch_{}.ckpt".format(args.start_epoch - 1))
    load_param_into_net(model, ms_checkpoint)

# Build the piecewise-constant learning-rate schedule (20 epochs in total)
if args.start_epoch == -1:
    cfg['total_epochs'] = 20
else:
    cfg['total_epochs'] = 20 - args.start_epoch + 1
milestone = []
for i in range(cfg['total_epochs']):
    milestone.append(steps_per_epoch * (i + 1))  # steps_per_epoch = train_ds.get_dataset_size()
learning_rates = [x * 0.4 for x in [0.00040, 0.00054, 0.00094, 0.00152, 0.00221, 0.00290,
                                    0.00348, 0.00387, 0.00400, 0.00393, 0.00373, 0.00299,
                                    0.00251, 0.00199, 0.00147, 0.00099, 0.00058, 0.00026,
                                    0.00007, 0.00001]]

lr = nn.piecewise_constant_lr(milestone, learning_rates[20 - cfg['total_epochs']:])
opt = nn.AdamWeightDecay(model.trainable_params(), learning_rate=lr, beta2=0.99, weight_decay=0.01)
loss_net = MyWithLossCell(model)  # wraps the model with its loss computation
train_net = nn.TrainOneStepCell(loss_net, opt)
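For reference, the semantics of `nn.piecewise_constant_lr` can be reproduced in plain Python (this is a sketch of the schedule logic, not MindSpore's implementation): the i-th rate applies to every step before `milestone[i]`.

```python
def piecewise_constant_lr(milestone, learning_rates):
    """Expand (milestone, lr) pairs into a per-step learning-rate list:
    steps [0, milestone[0]) use learning_rates[0], steps
    [milestone[0], milestone[1]) use learning_rates[1], and so on."""
    lrs, last = [], 0
    for m, lr in zip(milestone, learning_rates):
        lrs.extend([lr] * (m - last))
        last = m
    return lrs

# e.g. 3 epochs of 4 steps each, one rate per epoch
lrs = piecewise_constant_lr([4, 8, 12], [0.004, 0.002, 0.001])
# lrs == [0.004]*4 + [0.002]*4 + [0.001]*4
```

This is why the code above slices `learning_rates[20 - cfg['total_epochs']:]` when resuming: the already-consumed epochs are dropped from the front of the rate list.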

4.4 Saving network checkpoints

filename_tmpl = "epoch_{}.ckpt"  # matches the name expected when resuming
filename = filename_tmpl.format(epoch + 21 - cfg['total_epochs'])
print(filename)
savepath = os.path.join(output_dir, filename)
save_checkpoint(model, savepath)

5 Model Evaluation

5.1 Initializing the dataset

dataset = build_dataset(cfg['data']['val'])
ds = de.GeneratorDataset(dataset, column_names=cfg['eval_column_names'], shuffle=False)
ds = ds.batch(batch_size=1, input_columns=cfg['eval_column_names'],
              drop_remainder=True, per_batch_map=eval_collate)

5.2 Initializing the model

ms_model = build_model(model_cfg=cfg['model'])

5.3 Loading the trained checkpoint

ckpt = args.checkpoint
print(ckpt)
ms_checkpoint = load_checkpoint(ckpt)
load_param_into_net(ms_model, ms_checkpoint)

5.4 Running evaluation

from nuscenes import NuScenes  # nuScenes devkit

prev_detections = {}
nusc = NuScenes(version='v1.0-trainval', dataroot='/data0/HR_dataset/2023AAAI/2_liu/mit_bevfusion/data/nuscenes/', verbose=True)
grids = meshgrid(size_w, size_h)  # BEV grid coordinates used when decoding centers

# collect per-sample detections produced by the inference loop
all_predictions = [detections]
predictions = {}
for p in all_predictions:
    predictions.update(p)

if not os.path.exists(args.work_dir):
    os.makedirs(args.work_dir)

# detection metrics
if args.eval_det:
    result_dict, _ = dataset.evaluation(copy.deepcopy(predictions), output_dir=args.work_dir, testset=False)
    if result_dict is not None:
        for k, v in result_dict["results"].items():
            print(f"Evaluation {k}: {v}")

# tracking metrics
dataset.evaluation_tracking(copy.deepcopy(predictions), output_dir=args.work_dir, testset=False)

6 References

# SimTrack
@InProceedings{Luo_2021_ICCV,
    author    = {Luo, Chenxu and Yang, Xiaodong and Yuille, Alan},
    title     = {Exploring Simple 3D Multi-Object Tracking for Autonomous Driving},
    booktitle = {International Conference on Computer Vision (ICCV)},
    year      = {2021}
}
