网上很多文章只提了NuScenes坐标系有哪些,但是没有找到一篇文章把NuScenes坐标系的转换结合代码说全说明白了,结合工作需要,针对OpenPCDet里读取NuScenes数据集对数据做坐标转换的代码说一下。
OpenPCDet是3D点云目标检测框架,顾名思义,模型的输入数据都是点云数据!只有在近来OpenPCDet把多模态融合模型BEVFusion加入进来后才考虑了图像和点云两种输入数据!所以模型在加载NuScenes数据集或者其他数据集时都得把其他坐标系下的标注数据转换到激光雷达坐标系下!
另外,由于NuScenes数据集采集数据时是使用的32线扫描式激光雷达,这种雷达的点数实在是有限,和现在市场上自驾使用的128线扫描式激光雷达或固态激光雷达的密集点云比,成像效果真是太差了(有点云的范围内稍远一点的目标身上就那么几个点,让人去分辨是目标是人还是车还是其他物体都困难,就别说让模型学习特征差异了)!所以纯单帧点云的检测效果肯定很差!所以一般模型使用NuScenes数据集做实验时都对点云做了多帧叠加(也就是以ref frame为基准把多个sweep frame里的点云和ref frame的点云合并,因为标注是标在ref frame上的)!这时也需要用到坐标转换!因为采集数据的自车是运动的,扫描帧采集时的激光雷达坐标原点和朝向和关键帧采集时的激光雷达坐标的原点和朝向是不一样的,所以做点云叠加时必须要考虑运动补偿做坐标转换!
NuScenes数据集经常参与数据的坐标系转换的主要有四个坐标系,下面列一下要点:
1. 四个坐标系:全局坐标系(地图坐标系)、自车坐标系、相机坐标系、雷达坐标系(含激光雷达和毫米波雷达,激光雷达只有一个安装在车顶的TOP雷达,以TOP为reference),像素坐标系和图像坐标系和相机坐标系之间是相机内参决定的,通常换算到相机坐标系下,不单独提。
2. 标注数据的bbox是在全局坐标系下标注的
3. 重要数据的概念:
-
scene:采集场景
[token, log_token,nbr_samples,first_sample_token,last_sample_token,
name,description]
-
sensor:传感器。
[token,channel,modality], modelity指{camera,lidar,radar}
-
calibrated_sensor:传感器相对自车坐标系的标定参数[token,sensor_token,translation,roation,camera_intrinsic]
-
ego_pose:自车在特定时间戳的pose,基于激光雷达地图的定位算法给出 [token,rimestamp,rotation,translation]
-
sample_data:某时刻(timestamp)的传感器数据,例如图片,点云或者Radar等 [token,sample_token,ego_pose_token,calibrated_sensor_token,timestamp,fileformat,is_key_frame,height,width,filename,prev,next]
-
sample_annotation:标注数据[token,sample_token,instance_token,visibility_token,attribution_tokens, translation,size, rotation, prev,next, num_lidar_pts, num_radar_pts]
4.OpenPCDet对输入的NuScenes数据的处理:
执行下面的命令生成用于训练的pkl文件:
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos
--cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml
--version v1.0-trainval
此处调用的 pcdet/datasets/nuscenes/nuscenes_dataset.py里的create_nuscenes_infos()函数。
def create_nuscenes_infos(version, data_path, save_path, max_sweeps=10, with_cam=False)
--> train_nusc_infos, val_nusc_infos = nuscenes_utils.fill_trainval_infos(
data_path=data_path, nusc=nusc, train_scenes=train_scenes, val_scenes=val_scenes,
test='test' in version, max_sweeps=max_sweeps, with_cam=with_cam)
pcdet/datasets/nuscenes/nuscenes_utils.py:
def fill_trainval_infos(data_path, nusc, train_scenes, val_scenes, test=False, max_sweeps=10, with_cam=False)
--> ref_lidar_path, ref_boxes, _ = get_sample_data(nusc, ref_sd_token)
…
tm = reduce(np.dot, [ref_from_car, car_from_global, global_from_car, car_from_current])
lidar_path = nusc.get_sample_data_path(curr_sd_rec['token'])
time_lag = ref_time - 1e-6 * curr_sd_rec['timestamp']
sweep = {
'lidar_path': Path(lidar_path).relative_to(data_path).__str__(),
'sample_data_token': curr_sd_rec['token'],
'transform_matrix': tm,
'global_from_car': global_from_car,
'car_from_current': car_from_current,
'time_lag': time_lag,
}
sweeps.append(sweep)
…
if not test:
annotations = [nusc.get('sample_annotation', token) for token in sample['anns']]
# the filtering gives 0.5~1 map improvement
num_lidar_pts = np.array([anno['num_lidar_pts'] for anno in annotations])
num_radar_pts = np.array([anno['num_radar_pts'] for anno in annotations])
mask = (num_lidar_pts + num_radar_pts > 0)
locs = np.array([b.center for b in ref_boxes]).reshape(-1, 3)
dims = np.array([b.wlh for b in ref_boxes]).reshape(-1, 3)[:, [1, 0, 2]] # wlh == > dxdydz (lwh)
velocity = np.array([b.velocity for b in ref_boxes]).reshape(-1, 3)
rots = np.array([quaternion_yaw(b.orientation) for b in ref_boxes]).reshape(-1, 1)
names = np.array([b.name for b in ref_boxes])
tokens = np.array([b.token for b in ref_boxes])
gt_boxes = np.concatenate([locs, dims, rots, velocity[:, :2]], axis=1)
assert len(annotations) == len(gt_boxes) == len(velocity)
info['gt_boxes'] = gt_boxes[mask, :]
info['gt_boxes_velocity'] = velocity[mask, :]
info['gt_names'] = np.array([map_name_from_general_to_detection[name] for name in names])[mask]
info['gt_boxes_token'] = tokens[mask]
info['num_lidar_pts'] = num_lidar_pts[mask]
info['num_radar_pts'] = num_radar_pts[mask]
if sample['scene_token'] in train_scenes:
train_nusc_infos.append(info)
else:
val_nusc_infos.append(info)
progress_bar.close()
return train_nusc_infos, val_nusc_infos
def get_sample_data(nusc, sample_data_token, selected_anntokens=None):
--> …
if selected_anntokens is not None:
boxes = list(map(nusc.get_box, selected_anntokens))
else:
boxes = nusc.get_boxes(sample_data_token)
# Make list of Box objects including coord system transforms.
box_list = []
for box in boxes:
box.velocity = nusc.box_velocity(box.token)
#下面将bbox从全局坐标系转到传感器(例如lidar)的坐标系下:global --> ego --> sensor
# Move box to ego vehicle coord system
box.translate(-np.array(pose_record['translation']))
box.rotate(Quaternion(pose_record['rotation']).inverse)
# Move box to sensor coord system
box.translate(-np.array(cs_record['translation']))
box.rotate(Quaternion(cs_record['rotation']).inverse)
box_list.append(box)
return data_path, box_list, cam_intrinsic
nuscenes/nuscenes.py:
//读取标注数据
def get_box(self, sample_annotation_token: str) -> Box:
"""
Instantiates a Box class from a sample annotation record.
:param sample_annotation_token: Unique sample_annotation identifier.
"""
record = self.get('sample_annotation', sample_annotation_token)
return Box(record['translation'], record['size'], Quaternion(record['rotation']),
name=record['category_name'], token=record['token'])
装载数据
装载数据时,self.infos里的bbox已经在前面生成pkl文件时从全局坐标系转换到了对应的关键帧所在的激光雷达坐标系下,所以剩下的重点就是get_lidar_with_sweeps ()里把sweep帧数据从各自帧采集当时所在的激光雷达坐标系下统一到关键帧采集时所在的激光雷达坐标系下,统一坐标的运算在 get_sweep()里,使用transform_matrix矩阵,也就是
tm = reduce(np.dot, [ref_from_car, car_from_global, global_from_car, car_from_current])
通过current->ego->global->ego->ref四次转换将每个sweep帧的数据转换到关键帧采集时所在的激光雷达坐标系下。
激光雷达采集点云的频率一般是10HZ,关键帧的抽取频率是2HZ,也就是每秒采集10帧点云,每5帧抽取一帧作为关键帧,剩下的都是sweep帧,sweep帧转换到关键帧的坐标系下时需要先转到自车坐标系下再转到全局坐标系下再转换回自车坐标系下,这么做是需要使用采集每个sweep帧时的ego_data自车在全局坐标系下的位姿数据做运动补偿,然后再从自车坐标系转到关键帧所在的激光雷达坐标系下。
采取关键帧和sweep帧这种做法,目的我想有两个:一个是为了减少标注工作量,另一个是为了加密点云,在5帧也就是0.5秒内,可以认为自车周边的物体移动幅度很小,每5帧可以共用关键帧的标注数据,5帧的点云合一起加密点云增强物体的特征让模型学习和检测都更容易。后者应该是主要目的,因为NuScenes点云数据采用的扫描式32线激光雷达采集的,比较稀疏,和现在市场上的128线雷达的点云比起来惨不忍睹。
pcdet/datasets/nuscenes/nuscenes_dataset.py
class NuScenesDataset(DatasetTemplate):
…
def __getitem__(self, index):
if self._merge_all_iters_to_one_epoch:
index = index % len(self.infos)
info = copy.deepcopy(self.infos[index])
points = self.get_lidar_with_sweeps(index, max_sweeps=self.dataset_cfg.MAX_SWEEPS)
input_dict = {
'points': points,
'frame_id': Path(info['lidar_path']).stem,
'metadata': {'token': info['token']}
}
if 'gt_boxes' in info:
if self.dataset_cfg.get('FILTER_MIN_POINTS_IN_GT', False):
mask = (info['num_lidar_pts'] > self.dataset_cfg.FILTER_MIN_POINTS_IN_GT - 1)
else:
mask = None
input_dict.update({
'gt_names': info['gt_names'] if mask is None else info['gt_names'][mask],
'gt_boxes': info['gt_boxes'] if mask is None else info['gt_boxes'][mask]
})
…
data_dict = self.prepare_data(data_dict=input_dict)
…
return data_dict
def get_lidar_with_sweeps(self, index, max_sweeps=1):
info = self.infos[index]
lidar_path = self.root_path / info['lidar_path']
points = np.fromfile(str(lidar_path), dtype=np.float32, count=-1).reshape([-1, 5])[:, :4]
sweep_points_list = [points]
sweep_times_list = [np.zeros((points.shape[0], 1))]
for k in np.random.choice(len(info['sweeps']), max_sweeps - 1, replace=False):
points_sweep, times_sweep = self.get_sweep(info['sweeps'][k])
sweep_points_list.append(points_sweep)
sweep_times_list.append(times_sweep)
points = np.concatenate(sweep_points_list, axis=0)
times = np.concatenate(sweep_times_list, axis=0).astype(points.dtype)
points = np.concatenate((points, times), axis=1)
return points
def get_sweep(self, sweep_info):
def remove_ego_points(points, center_radius=1.0):
mask = ~((np.abs(points[:, 0]) < center_radius) & (np.abs(points[:, 1]) < center_radius))
return points[mask]
lidar_path = self.root_path / sweep_info['lidar_path']
points_sweep = np.fromfile(str(lidar_path), dtype=np.float32, count=-1).reshape([-1, 5])[:, :4]
points_sweep = remove_ego_points(points_sweep).T
if sweep_info['transform_matrix'] is not None:
num_points = points_sweep.shape[1]
//需要将所有sweep帧的坐标系统一到关键帧的坐标系下(因为自车在移动)
points_sweep[:3, :] = sweep_info['transform_matrix'].dot(
np.vstack((points_sweep[:3, :], np.ones(num_points))))[:3, :]
cur_times = sweep_info['time_lag'] * np.ones((1, points_sweep.shape[1]))
return points_sweep.T, cur_times.T
5.对模型输出的检测结果做evaluation时的处理:
将检测结果数据从sensor坐标系下转换回全局坐标系下
tools/eval_utils/eval_utils.py里eval_one_epoch()函数调用
def eval_one_epoch(cfg, args, model, dataloader, epoch_id, logger, dist_test=False, result_dir=None):
...
result_str, result_dict = dataset.evaluation(
det_annos, class_names,
eval_metric=cfg.MODEL.POST_PROCESSING.EVAL_METRIC,
output_path=final_output_dir
)
pcdet/datasets/nuscenes/nuscenes_dataset.py
def evaluation(self, det_annos, class_names, **kwargs):
import json
from nuscenes.nuscenes import NuScenes
from . import nuscenes_utils
nusc = NuScenes(version=self.dataset_cfg.VERSION, dataroot=str(self.root_path), verbose=True)
nusc_annos = nuscenes_utils.transform_det_annos_to_nusc_annos(det_annos, nusc)
nusc_annos['meta'] = {
'use_camera': False,
'use_lidar': True,
'use_radar': False,
'use_map': False,
'use_external': False,
}
output_path = Path(kwargs['output_path'])
output_path.mkdir(exist_ok=True, parents=True)
res_path = str(output_path / 'results_nusc.json')
with open(res_path, 'w') as f:
json.dump(nusc_annos, f)
…
nusc_eval = NuScenesEval(
nusc,
config=eval_config,
result_path=res_path,
eval_set=eval_set_map[self.dataset_cfg.VERSION],
output_dir=str(output_path),
verbose=True,
)
metrics_summary = nusc_eval.main(plot_examples=0, render_curves=False)
with open(output_path / 'metrics_summary.json', 'r') as f:
metrics = json.load(f)
result_str, result_dict = nuscenes_utils.format_nuscene_results(metrics, self.class_names, version=eval_version)
return result_str, result_dict
def transform_det_annos_to_nusc_annos(det_annos, nusc):
nusc_annos = {
'results': {},
'meta': None,
}
for det in det_annos:
annos = []
box_list = boxes_lidar_to_nusenes(det)
box_list = lidar_nusc_box_to_global(
nusc=nusc, boxes=box_list, sample_token=det['metadata']['token']
)
…
//转换成nuscenes格式,但不涉及坐标系变换
def boxes_lidar_to_nusenes(det_info):
boxes3d = det_info['boxes_lidar']
scores = det_info['score']
labels = det_info['pred_labels']
box_list = []
for k in range(boxes3d.shape[0]):
quat = Quaternion(axis=[0, 0, 1], radians=boxes3d[k, 6])
velocity = (*boxes3d[k, 7:9], 0.0) if boxes3d.shape[1] == 9 else (0.0, 0.0, 0.0)
box = Box(
boxes3d[k, :3],
boxes3d[k, [4, 3, 5]], # wlh
quat, label=labels[k], score=scores[k], velocity=velocity,
)
box_list.append(box)
return box_list
//将bbox从sensor坐标系下转到全局坐标系下
def lidar_nusc_box_to_global(nusc, boxes, sample_token):
s_record = nusc.get('sample', sample_token)
sample_data_token = s_record['data']['LIDAR_TOP']
sd_record = nusc.get('sample_data', sample_data_token)
cs_record = nusc.get('calibrated_sensor', sd_record['calibrated_sensor_token'])
sensor_record = nusc.get('sensor', cs_record['sensor_token'])
pose_record = nusc.get('ego_pose', sd_record['ego_pose_token'])
data_path = nusc.get_sample_data_path(sample_data_token)
box_list = []
for box in boxes:
# Move box to ego vehicle coord system
box.rotate(Quaternion(cs_record['rotation']))
box.translate(np.array(cs_record['translation']))
# Move box to global coord system
box.rotate(Quaternion(pose_record['rotation']))
box.translate(np.array(pose_record['translation']))
box_list.append(box)
return box_list