处理KITTI数据集时的一些坑

Glingli

已于 2023-12-01 10:52:28 修改

阅读量251

点赞数

文章标签： python 目标检测 3d

于 2023-11-30 16:51:57 首次发布

本文链接：https://blog.csdn.net/qq_44852799/article/details/134714691

版权

原始数据

原始数据文件包括 training 和 testing 。对于 testing 没有什么好说的，因为就是一些点云文件，但是对于training需要我们自己划分val，同时训练时的真值都需要从其中的标注文件中读取。通常来说对于仅使用点云数据的同学们，我们的training文件夹下有
label_2：这是一个标注文件，标注每一个帧下的所有ground truth。
velodyne：这是蕴含点云数据的bin文件。
calib：由于KITTI使用了两个坐标系，即图像坐标系和激光雷达坐标系，对于其中的一些畸变和转化，可以使用这里的标定矩阵进行处理。
下面来逐个讲讲里面的内容，对于论文和其他人说的不明白的可能有坑的地方，我都用这种方法高亮标出了。

坐标系

在这里插入图片描述
左边是汽车行进的方向，也就是说可见光坐标系下车沿着z轴走，激光雷达坐标系下车沿着x轴走。两坐标系都是右手系，即右手四指指向x方向，握紧的方向就是x到y的方向，大拇指指向z的方向。

由于激光雷达和camera不共轴，所以必须使用calib进行转换。

label

label中的文件都是以txt的格式存储的，其中的内容如下：

Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01

第1列
目标类比别（type），共有8种类别，分别是Car、Pedestrian、Cyclist、或’DontCare。DontCare表示某些区域是有目标的，但是由于一些原因没有做标注，比如距离激光雷达过远。但实际算法可能会检测到该目标，但没有标注，这样会被当作false positive （FP）。这是不合理的。用DontCare标注后，评估时将会自动忽略这个区域的预测结果，相当于没有检测到目标，这样就不会增加FP的数量了。此外，在 2D 与 3D Detection Benchmark 中只针对 Car、Pedestrain、Cyclist 这三类。

第2列
截断程度（truncated），表示处于边缘目标的截断程度，取值范围为0~1，0表示没有截断，取值越大表示截断程度越大。处于边缘的目标可能只有部分出现在视野当中，这种情况被称为截断。

第3列
遮挡程度（occlude），取值为（0，1，2，3）。0表示完全可见，1表示小部分遮挡，2表示大部分遮挡，3表示未知（遮挡过大）。

第4列
观测角度（alpha）,取值范围为（− π , π -\pi, \pi−π,π）。是在相机坐标系下，以相机原点为中心，相机原点到物体中心的连线为半径，将物体绕相机y轴旋转至相机z轴，此时物体方向与相机x轴的夹角。这相当于将物体中心旋转到正前方后，计算其与车身方向的夹角。

第5-8列
二维检测框（bbox），目标二维矩形框坐标，分别对应left、top、right、bottom，即左上（xy）和右下的坐标（xy）。

第9-11列
三维物体的尺寸（dimensions），分别对应高度( $z_{lidar}$ )、宽度( $y_{lidar}$ )、长度( $x_{lidar}$ )，以米为单位。

第12-14列
底部中心坐标（location），三维物体中心在相机坐标系下的位置坐标（x，y，z），单位为米。注意这里是底部中心坐标！不是几何中心坐标

第15列
目标偏航角( $x_{cam}=0,z_{cam}=-\frac{\pi}{2}$ )

那么如何进行坐标系的转化呢？这里不说原理，直接上代码吧，因为好多其他的博客已经讲过原理了，也很好搜到。代码中的函数作用我都写注释了，因为是自己手搓的要是有错误欢迎指正：

# -*- coding: utf-8 -*-
"""
FUNCTION:
1. read velodyne/*.bin file and reduce points to just in fov then save as velodyne_fov./*.bin
2. coordinate trans cam->velo
@author: https://blog.csdn.net/qq_44852799
"""
import os
from pathlib import Path
import numpy as np
from tqdm import tqdm
import math as m


def velodyne_reduced():
    ## 由于KITTI数据集只对前方90度内的目标进行了标注，所以需要剔除没有用的点云数据
    read_path = r'D:\Craft\DATASET\FOR_MODEL\KITTI\training\velodyne'
    save_path = r'D:\Craft\DATASET\FOR_MODEL\KITTI\training\velodyne_fov'
    folder = Path(save_path)
    if not folder.exists():
        folder.mkdir()
    lidar_name_list = file_name_get(read_path,get_type='bin')
    for name in tqdm(lidar_name_list):
        file_read = os.path.join(read_path, name)
        file_save = os.path.join(save_path, name)
        lidar_points = (np.fromfile(file_read, dtype=np.float32)).reshape(-1, 4)
        ## 进行fov外未标注数据点的剔除
        fov_points = np.empty(shape=(0,), dtype=np.float32)
        for i,point in enumerate(lidar_points):
            if point[0]<0:
                continue
            if point[0]>point[1] and point[0]>-point[1]:
                fov_points = np.append(fov_points,point)
            else:
                continue
        fov_points.tofile(file_save)
    return 'done'
    
def file_name_get(path,get_type=None):
    assert (get_type!=None),"file name type in [bin,txt] you dumbass!"
    if get_type == 'bin':
        file_list = []
        for file_name in os.listdir(path):
            if file_name.endswith(".bin"):
                file_list.append(file_name)
        return file_list
    elif get_type == 'txt':
        file_list = []
        for file_name in os.listdir(path):
            if file_name.endswith(".txt"):
                file_list.append(file_name)
        return file_list

def cam2velo():
    ## change the [xyz,heading] from cam to velo, from bottom to center.
    ## 主要是将label_2中的坐标信息进行转换，从cam转到velo中，从底部中心到几何中心
    ## 将目标的偏航角进行坐标系的转换
    label_path = r'D:\Craft\DATASET\FOR_MODEL\KITTI\training\label_2'
    calib_path = r'D:\Craft\DATASET\FOR_MODEL\KITTI\training\calib'
    save_path = r'D:\Craft\DATASET\FOR_MODEL\KITTI\training\label_2_velo'
    folder = Path(save_path)
    if not folder.exists():
        folder.mkdir()
    name_list = file_name_get(label_path, get_type='txt')
    for name in tqdm(name_list):
        file_label = os.path.join(label_path, name)
        file_calib = os.path.join(calib_path, name)
        file_save = os.path.join(save_path, name)
        with open(file_label, 'r') as file:
            labels_cam = []
            for id, line in enumerate(file):
                line = line.strip().split()
                if line[0] == 'DontCare':
                    continue
                else:
                    target_label = {}
                    class_name = line[0]
                    info = [float(x) for x in line[1:]]
                    target_label['class_name'] = class_name
                    target_label['info'] = info
                labels_cam.append(target_label)

        with open(file_calib, 'r') as file:
            lines = file.readlines()[:6]
            calibs = {}
            for id, lines in enumerate(lines):
                line = lines.strip().split()
                if line[0] == 'R0_rect:' or line[0] == 'Tr_velo_to_cam:':
                    calib_name = line[0]
                    info = [float(x) for x in line[1:]]
                    calibs[calib_name] = info
                else:
                    continue

        R = np.array(calibs['R0_rect:']).reshape(3, 3)
        T = np.array(calibs['Tr_velo_to_cam:'])
        T = np.append(T, [0, 0, 0, 1], axis=0).reshape(4, 4)
        R_inverse = np.linalg.inv(R)
        T_inverse = np.linalg.inv(T)
        for i, label in enumerate(labels_cam):
            heigh = label['info'][7]
            P_cam = label['info'][10:13]
            heading = label['info'][-1]
            uncalib_cam = R_inverse @ P_cam
            uncalib_cam = np.append(uncalib_cam, [1]).reshape(4, 1)
            P_velo = T_inverse @ uncalib_cam
            P_velo = P_velo[:3, :].reshape(1, 3)
            ## first rot then filp
            '''
                   0               1.57               -1.57           
             -1.57 + 1.57  ====> 0  +  -3.14  ====>  0  +  -3.14
                 -3.14            -1.57                1.57
            '''
            assert (heading >= -m.pi and heading < m.pi), "heading is out of axis! But Not your fault"
            if heading >= -m.pi and heading < m.pi / 2:
                heading_rot = heading + m.pi / 2
            else:
                heading_rot = heading - (3 * m.pi / 2)
            heading_flip = -heading_rot
            heading_velo = heading_flip
            ## cam2velo
            label['info'][10] = P_velo[0][0]
            label['info'][11] = P_velo[0][1]
            label['info'][12] = P_velo[0][2] + heigh / 2
            label['info'][-1] = heading_velo
        with open(file_save, 'w') as file:
            for label in labels_cam:
                class_name = label['class_name']
                info_values = " ".join(str(round(value, 2)) for value in label['info'])
                line = f"{class_name} {info_values}\n"
                file.write(line)

if __name__ == '__main__':
    velodyne_reduced()
    cam2velo()

解释一下坐标转换的步骤：由于点云坐标系 velo 和图像坐标系 cam 的采集是分开的，所以需要进行坐标系的转换。3D 到 2D 图像上点的转换公式如下：
$y=P^{(i)}_{rect}R^{(0)}_{rect}T^{cam}_{velo}x$
其中 $x\in \mathbb R^{4\times1}$ 是 velo 坐标下的点云，形式为 $x=[x_{velo};y_{velo};z_{velo};1]$ 。 $(T^{cam}_{velo}x)\in\mathbb{R}^{3\times1}$ 使得坐标从 velo 坐标转移到了 cam 坐标系下。由于相机自身具有畸变，所以需要再进行畸变的复原 $R^{(0)}_{rect}(T^{cam}_{velo}x)\in\mathbb{R}^{3\times1}$ ，得到真正的在 cam 坐标系下的点云。

复原时首先进行畸变修复的逆再进行增广和坐标系转换，对于 heading 进行坐标系的旋转和镜像。

困难等级的定义（easy moderate hard）

KITTI数据集中easy、moderate、hard根据标注框是否被遮挡、遮挡程度和框的高度进行定义的，具体数据如下：
简单：
最小边界框高度：40像素，最大遮挡级别：完全可见，最大截断：15%
中等：
最小边界框高度：25像素，最大遮挡水平：部分遮挡，最大截断：30%
困难：
最小边界框高度：25像素，最大遮挡级别：难以看到，最大截断：50%

这些指标都能在标注文件中找到

评价指标

分别计算不同类别的下的不同难度等级的AP。AP的计算方法可以参考别人的文章，主要是R40的理解和TP，FP，FN，Precession，Recall的理解。这里就不说了。

但是有一个问题那就是：我的模型只负责预测目标类型和bbox，不负责预测难度，那么如果我的一个预测没有命中任何一个真值，按道理说应该是算进FP的，但是由于AP的计算按难度进行计算，那么我算进哪个难度的FP中？例如我的模型吧一棵树预测成了一个人，实际上真值里是没有这个目标的，那么理论上我要把这个预测结果放进“人”类别下的FP中，但是放进哪个难度下呢（见上一节“困难等级的定义）”）？答案是：忽略掉！没错，哪个都不放进去！预测错就错了，不算入AP的计算。也就是说理论上你把全图预测满了，把真值都预测到了，那你的AP就是100%。官方的评测中有一个compute_fp=False的选项，也就是说可以不计算FP。选上了就计算的是这个类别的AP，而不是各个难度下的AP，根据自己的需求来吧。

自己写AP代码时重点是IoU的计算，牵扯到2DIoU和3D IoU还都是旋转框，自己写耗时耗力，我补充一下已经有的轮子在下面，至于AP的计算部分都是很好写的，而且和你自己模型的输出数据存在接口匹配的问题，我就不提供了，自己想想就能写出来。

from shapely.geometry import LineString, box
from shapely import affinity

def IoU_2D(box1, box2):
    """
    两个box均为5元素list，5个元素分别是中心点xy坐标、箱子长(x)宽(y)和偏航角(弧度制，逆时针为+，顺时针为-)
    """
    result_xy = []
    for b in [box1, box2]:
        # 先解包获取两框中心坐标、长宽、偏航角
        x, y, l, w, yaw = b
        # 构造矩形
        poly = box(x - l / 2, y - w / 2, x + l / 2, y + w / 2)
        ## 使用弧度制
        poly_rot = affinity.rotate(poly, yaw, use_radians=True)
        result_xy.append(poly_rot)
    # 计算xy平面面积重叠、z轴重叠
    poly1, poly2 = result_xy
    # 计算IOU
    return poly1.intersection(poly2).area / poly1.union(poly2).area


def IoU_3D(box1, box2):
    """
    两个box均为7元素list，7个元素分别是中心点xyz坐标、箱子长宽高和偏航角（弧度制）
    """
    result_xy, result_z, result_v = [], [], []
    for b in [box1, box2]:
        # 先解包获取两框中心坐标、长宽高、偏航角
        x, y, z, l, w, h, yaw = b

        # 计算体积
        result_v.append(l * w * h)

        # 构造z轴
        ls = LineString([[0, z - h / 2], [0, z + h / 2]])
        result_z.append(ls)

        # 构造xy平面部分的矩形
        poly = box(x - l / 2, y - w / 2, x + l / 2, y + w / 2)
        poly_rot = affinity.rotate(poly, yaw, use_radians=True)
        result_xy.append(poly_rot)

    # 计算xy平面面积重叠、z轴重叠
    overlap_xy = result_xy[0].intersection(result_xy[1]).area
    overlap_z = result_z[0].intersection(result_z[1]).length

    # 计算IOU
    overlap_xyz = overlap_z * overlap_xy
    return overlap_xyz / (np.sum(result_v) - overlap_xyz)

返回值就是IoU，偏航角就是我上面坐标转换过的值，这里已经统一过坐标系了，不用再为坐标系发愁。

可视化

顺便写一下可视化吧，毕竟感性的认识是必要的。简单易懂奥，不懂的问问gpt也都差不多了，没什么复杂的语法，至于数据的路径，我代码里的路径是自己电脑上的，类比一下就是你的。注意这里的数据都是经过上面坐标系转换后的数据。

# -*- coding: utf-8 -*-
"""
FUNCTION: read *.bin file and visualize
@author: https://blog.csdn.net/qq_44852799
"""

from mayavi import mlab
import numpy as np
import math as m
import os

def kitti_see(points,labels, vals="distance"):
    x = points[:, 0]  # x position of point
    y = points[:, 1]  # y position of point
    z = points[:, 2]  # z position of point
    fig = mlab.figure(bgcolor=(0, 0, 0), size=(640, 360))
    # draw screen
    mlab.points3d(x, y, z,
                  y,  # Values used for Color
                  mode="point",
                  colormap='spectral',  # 'bone', 'copper', 'gnuplot'
                  # color=(0, 1, 0),   # Used a fixed (r,g,b) instead
                  figure=fig,
                  )
    # draw origin
    mlab.points3d(0, 0, 0, color=(1, 1, 1), mode="sphere", scale_factor=0.2)
    # draw axis
    axes = np.array(
        [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
        dtype=np.float64,
    )
    mlab.plot3d(
        [0, axes[0, 0]],
        [0, axes[0, 1]],
        [0, axes[0, 2]],
        color=(1, 0, 0),
        tube_radius=None,
        figure=fig,
    )
    mlab.plot3d(
        [0, axes[1, 0]],
        [0, axes[1, 1]],
        [0, axes[1, 2]],
        color=(0, 1, 0),
        tube_radius=None,
        figure=fig,
    )
    mlab.plot3d(
        [0, axes[2, 0]],
        [0, axes[2, 1]],
        [0, axes[2, 2]],
        color=(0, 0, 1),
        tube_radius=None,
        figure=fig,
    )

    with open(labels,'r') as objects:
        for object in objects:
            object = object.strip().split()
            x_target = float(object[-4])
            y_target = float(object[-3])
            z_target = float(object[-2])
            length =   float(object[-5])
            width =    float(object[-6])
            heigh =    float(object[-7])
            theta =    float(object[-1])
            # draw point
            mlab.points3d(x_target, y_target, z_target, color=(0, 1, 0), mode="sphere", scale_factor=0.2)
            # draw length
            mlab.plot3d(
                [x_target + (length / 2) * m.cos(theta), x_target - (length / 2) * m.cos(theta)],
                [y_target + (length / 2) * m.sin(theta), y_target - (length / 2) * m.sin(theta)],
                [z_target, z_target],
                color=(1, 0, 0),
                tube_radius=None,
                figure=fig,
            )
            # draw width
            mlab.plot3d(
                [x_target + (width / 2) * m.sin(-theta), x_target - (width / 2) * m.sin(-theta)],
                [y_target + (width / 2) * m.cos(-theta), y_target - (width / 2) * m.cos(-theta)],
                [z_target, z_target],
                color=(0, 1, 0),
                tube_radius=None,
                figure=fig,
            )
            # draw heigh
            mlab.plot3d(
                [x_target, x_target],
                [y_target, y_target],
                [z_target + heigh / 2, z_target - heigh / 2],
                color=(0, 0, 1),
                tube_radius=None,
                figure=fig,
            )
    ## draw 40m line
    mlab.plot3d(
        [40,40],
        [-40, 40],
        [0,0],
        color=(1, 1, 1),
        tube_radius=None,
        figure=fig,
    )
    mlab.show()


if __name__ == '__main__':
    def kitti_opt():
        for True_number in range(20):
            if True_number < 10:
                No = '00000'+str(True_number)
            elif True_number < 100:
                No = '0000' + str(True_number)
            elif True_number <1000:
                No = '000' + str(True_number)
            elif True_number <10000:
                No = '00' + str(True_number)

            PATH = r'D:\Craft\DATASET\FOR_MODEL\KITTI\val\label_2_velo'
            velo_file = os.path.join('D:/Craft/DATASET/FOR_MODEL/KITTI/val/velodyne_fov/',No+'.bin')
            label = No+'.txt'
            points = (np.fromfile(velo_file, dtype=np.float32)).reshape(-1, 4)
            kitti_see(points,os.path.join(PATH,label))
    kitti_opt()

在这里插入图片描述

我有点懒，没有画框，但是这种表示方法你也是可以理解的吧。我把中心点和长宽高都表示出来了，你用大脑想象一个框。

Glingli

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
处理KITTI数据集时的一些坑

KITTI数据集中easy、moderate、hard根据标注框是否被遮挡、遮挡程度和框的高度进行定义的，具体数据如下：简单：最小边界框高度：40像素，最大遮挡级别：完全可见，最大截断：15%中等：最小边界框高度：25像素，最大遮挡水平：部分遮挡，最大截断：30%困难：最小边界框高度：25像素，最大遮挡级别：难以看到，最大截断：50%这些指标都能在标注文件中找到。
复制链接

扫一扫