Understanding the KITTI Dataset in One Article with Many Figures: Download and Parsing

等待破茧

Last modified: 2022-07-22 14:12:44


Category: Point Cloud. Tags: deep learning, computer vision, artificial intelligence

First published: 2022-07-22 09:40:10

Original link: https://developer.aliyun.com/article/855136


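This article describes the components of the KITTI dataset in detail, including download links for the images, point clouds, labels, and calibration files, along with the data formats and what they are used for. The dataset is mainly used to benchmark stereo, optical flow, visual odometry, 3D object detection, and tracking. The article covers the data collection platform, the coordinate systems, the image and point cloud file formats, how to read the calibration and label files, and what the data looks like when visualized.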


Reposted from "一文多图搞懂KITTI数据集下载及解析" on the Alibaba Cloud Developer Community (阿里云开发者社区).

KITTI Dataset

1. Images: https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip
2. Point clouds: https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip
3. Labels: https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip
4. Calibration files: https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip
Note: if a link does not respond when clicked, copy it into a download manager such as Xunlei (迅雷), which is usually faster.

Tip: for detailed download instructions, see:

https://blog.csdn.net/lovely_yoshino/article/details/104996550

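1 Introduction

The KITTI dataset was created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago. It is one of the most widely used benchmark datasets for computer vision algorithms in autonomous driving scenarios, and was among the largest when it was released. It is used to evaluate stereo, optical flow, visual odometry, 3D object detection, and 3D tracking in an on-vehicle setting.

KITTI contains real image data collected in urban, rural, and highway scenes, with up to 15 cars and 30 pedestrians per image and varying degrees of occlusion and truncation. The 3D object detection dataset consists of 7481 training images and 7518 test images with the corresponding point clouds, for a total of 80256 labeled objects.

The data we need is marked with red boxes in the figure below: color images (12 GB), point clouds (29 GB), camera calibration data (16 MB), and labels (5 MB). The color images, point clouds, and calibration data each come in a training part (7481 frames) and a testing part (7518 frames); labels exist only for the training part.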

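1.1 Data collection platform

The figure above shows the coordinate system of each device and the distances between them. For the principles behind the coordinate transformations, see the reference linked as "click" in the original post. In practice, the data KITTI provides already includes the calibration files relating the three sensors, so no manual conversion is needed.

1.2 Coordinate systems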

camera: x = right, y = down, z = forward
velodyne: x = forward, y = left, z = up
GPS/IMU: x = forward, y = left, z = up
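
If the small mounting rotation and the translation between the sensors are ignored, the axis conventions above imply a fixed axis permutation between the velodyne and camera frames. A minimal sketch of that idealized mapping follows; the function name `velo_axes_to_cam_axes` is made up for illustration, and for real data the calibration matrices should be used instead.

```python
def velo_axes_to_cam_axes(x, y, z):
    """Re-express a velodyne-frame vector using the camera axis convention.

    Idealized sketch: ignores sensor offsets and the small mounting rotation.
    velodyne: x = forward, y = left, z = up
    camera:   x = right,   y = down, z = forward
    """
    cam_x = -y  # right is the opposite of left
    cam_y = -z  # down is the opposite of up
    cam_z = x   # forward stays forward
    return cam_x, cam_y, cam_z


# A point 10 m ahead, 2 m to the left, and 1 m above the lidar origin
print(velo_axes_to_cam_axes(10.0, 2.0, 1.0))
```

In the camera convention the same point sits 2 m to the left (negative x), 1 m up (negative y), and 10 m ahead (positive z).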

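1.3 Image files

Images are stored as 8-bit PNG files. Sample images:

1.4 Velodyne files

A velodyne file holds the lidar measurements (the scanner rotates continuously, counter-clockwise, around its vertical axis). Taking "000001.bin" as an example, a hex dump of its contents looks like this: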

7b14 4642 1058 b541 9643 0340 0000 0000
46b6 4542 1283 b641 3333 0340 0000 0000
4e62 4042 9643 b541 b072 0040 cdcc 4c3d
8340 3f42 08ac b541 3bdf ff3f 0000 0000
e550 4042 022b b841 9cc4 0040 0000 0000
10d8 4042 022b ba41 4c37 0140 0000 0000
3fb5 3a42 14ae b541 5a64 fb3f 0000 0000
7dbf 3942 2731 b641 be9f fa3f 8fc2 f53d
cd4c 3842 3f35 b641 4c37 f93f ec51 383e
dbf9 3742 a69b b641 c3f5 f83f ec51 383e
2586 3742 9a99 b741 fed4 f83f 1f85 6b3e
                   .
                   .
                   .

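Point clouds are stored as binary files of 32-bit floats. Each row of the hex dump above shows 16 bytes as eight space-separated groups of four hex digits. One float occupies 4 bytes, that is, two adjacent groups, so each row holds four floats and corresponds to exactly one point: x, y, z, and r (intensity, or reflectance). The file is simply these records stored back to back: x0 y0 z0 r0, x1 y1 z1 r1, and so on.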
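
To make the byte layout concrete, here is a small sketch that decodes the first two rows of the dump with Python's `struct` module. It assumes the dump lists bytes in file order (as `xxd` does) and that the floats are little-endian; `decode_row` is an illustrative helper name, not part of any KITTI tooling.

```python
import struct

# First two rows of the hex dump above (16 bytes = one point per row)
rows = [
    "7b14 4642 1058 b541 9643 0340 0000 0000",
    "46b6 4542 1283 b641 3333 0340 0000 0000",
]


def decode_row(hex_row):
    """Decode 16 dumped bytes into (x, y, z, r) little-endian float32 values."""
    raw = bytes.fromhex(hex_row.replace(" ", ""))
    return struct.unpack("<4f", raw)


points = [decode_row(row) for row in rows]
for x, y, z, r in points:
    print(f"x={x:.3f} y={y:.3f} z={z:.3f} r={r:.2f}")
```

Under those assumptions the first row decodes to roughly x = 49.52 m, y = 22.67 m, z = 2.05 m with zero reflectance, which is a plausible lidar return.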

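1.5 Calib files

A calib file holds the calibration data for the cameras, lidar, and inertial navigation unit. Taking "000001.txt" as an example, its contents are: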

P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03
P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03
R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01
Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01
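
As commonly documented for the KITTI object development kit, P0 to P3 are the 3x4 projection matrices of the four cameras after rectification (P2 is the left color camera), R0_rect is the 3x3 rectifying rotation of camera 0, and Tr_velo_to_cam is the 3x4 rigid transform from velodyne to camera coordinates. A lidar point is projected into the left color image as P2 · R0_rect · Tr_velo_to_cam · point. The sketch below parses three entries copied from the file above (trailing zeros trimmed) and projects one point; the helper names are illustrative and only the standard library is used.

```python
# Three entries copied from the calib file above (P0, P1, P3 and Tr_imu_to_velo omitted)
CALIB_TEXT = """\
P2: 7.215377e+02 0.0 6.095593e+02 4.485728e+01 0.0 7.215377e+02 1.728540e+02 2.163791e-01 0.0 0.0 1.0 2.745884e-03
R0_rect: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
Tr_velo_to_cam: 7.533745e-03 -9.999714e-01 -6.166020e-04 -4.069766e-03 1.480249e-02 7.280733e-04 -9.998902e-01 -7.631618e-02 9.998621e-01 7.523790e-03 1.480755e-02 -2.717806e-01
"""


def parse_calib(text):
    """Map each key to a row-major matrix stored as a list of rows."""
    calib = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, values = line.split(":", 1)
        nums = [float(v) for v in values.split()]
        cols = 3 if len(nums) == 9 else 4  # R0_rect is 3x3, the others are 3x4
        calib[key.strip()] = [nums[i:i + cols] for i in range(0, len(nums), cols)]
    return calib


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def velo_to_image(calib, x, y, z):
    """Project one velodyne point to (u, v, depth) in the left color image."""
    cam = matvec(calib["Tr_velo_to_cam"], [x, y, z, 1.0])  # velodyne -> camera 0
    rect = matvec(calib["R0_rect"], cam)                   # -> rectified camera 0
    u_h, v_h, w = matvec(calib["P2"], rect + [1.0])        # -> homogeneous pixel
    return u_h / w, v_h / w, rect[2]


calib = parse_calib(CALIB_TEXT)
u, v, depth = velo_to_image(calib, 10.0, 0.0, 0.0)
print(f"u={u:.1f} v={v:.1f} depth={depth:.2f} m")
```

A point 10 m straight ahead of the lidar should land near the horizontal center of the image with a depth slightly under 10 m, since the camera sits a little in front of the lidar. Points with non-positive depth lie behind the camera and should be discarded before projecting.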

1.6 Label files

A label file holds the annotation and evaluation data for the objects in a KITTI frame. Taking "000001.txt" as an example, it looks like this:

Truck 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56
Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57
Cyclist 0.00 3 -1.65 676.60 163.95 688.98 193.93 1.86 0.60 2.02 4.59 1.32 45.84 -1.55
DontCare -1 -1 -10 503.89 169.71 590.61 190.13 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 511.35 174.96 527.81 187.45 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 532.37 176.35 542.68 185.27 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 559.62 175.83 575.40 183.15 -1 -1 -1 -1000 -1000 -1000 -10

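Each line describes one object. A line has up to 16 columns: ground-truth label files such as the one above contain the first 15, and the 16th (score) appears only in detection results. The columns mean the following:

Column 1 (string): object class (type)

There are 9 classes: Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc, and DontCare. DontCare marks a region that was not annotated, for example because the object is too far from the lidar. To avoid counting regions that contain real but unlabeled objects as false positives during evaluation (mainly when computing precision), the evaluation script automatically ignores predictions that fall in DontCare regions.

Column 2 (float): how much the object is truncated (truncated)

A value between 0 (not truncated) and 1 (truncated) that indicates how far the object extends beyond the image boundary.

Column 3 (integer): how much the object is occluded (occluded)

The integers 0, 1, 2, and 3 encode the occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown.

Column 4 (radians): observation angle of the object (alpha)

Range: -pi ~ pi (rad). In camera coordinates, take the camera origin as the center and the line from the camera origin to the object center as the radius, then rotate the object about the camera y-axis until it lies on the camera z-axis; alpha is the angle between the object's heading and the camera x-axis at that point, as shown in Figure 1.

Columns 5-8 (float): 2D bounding box of the object (bbox)

The four numbers are xmin, ymin, xmax, ymax (in pixels), the top-left and bottom-right corners of the 2D bounding box.

Columns 9-11 (float): 3D size of the object (dimensions)

Height, width, and length (in meters).

Columns 12-14 (float): 3D position of the object (location)

x, y, z (in meters). Note that these coordinates are expressed in the camera coordinate system. In the KITTI devkit the reference point is the center of the bottom face of the 3D box rather than its geometric center.

Column 15 (radians): 3D orientation of the object (rotation_y)

Range: -pi ~ pi (rad). It is the object's global heading in camera coordinates (the angle between the object's forward direction and the camera x-axis, measured about the camera y-axis), as shown in Figure 1.

Column 16 (float): detection confidence (score)

This column appears only in detection result files, that is, the output a detector produces for evaluation. Ground-truth label files do not contain it, which is why the example rows above have 15 columns.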
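
The column layout above can be turned into a small parser. The sketch below is illustrative only: the `KittiObject` class and `parse_label_line` function are hypothetical names, not part of the KITTI devkit. It also checks the commonly used relation alpha ≈ rotation_y - atan2(x, z), which is not stated in the text above but is consistent with the example rows.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KittiObject:
    type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, ...]        # (xmin, ymin, xmax, ymax) in pixels
    dimensions: Tuple[float, ...]  # (height, width, length) in meters
    location: Tuple[float, ...]    # (x, y, z) in camera coordinates, meters
    rotation_y: float
    score: Optional[float] = None  # only present in detection result files


def parse_label_line(line):
    """Parse one 15- or 16-column label line into a KittiObject."""
    fields = line.split()
    v = [float(x) for x in fields[1:]]
    return KittiObject(
        type=fields[0], truncated=v[0], occluded=int(v[1]), alpha=v[2],
        bbox=tuple(v[3:7]), dimensions=tuple(v[7:10]), location=tuple(v[10:13]),
        rotation_y=v[13], score=v[14] if len(v) > 14 else None,
    )


# Rows copied from the example label file above
lines = [
    "Truck 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56",
    "Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57",
    "Cyclist 0.00 3 -1.65 676.60 163.95 688.98 193.93 1.86 0.60 2.02 4.59 1.32 45.84 -1.55",
    "DontCare -1 -1 -10 503.89 169.71 590.61 190.13 -1 -1 -1 -1000 -1000 -1000 -10",
]
objects = [parse_label_line(line) for line in lines]

for obj in objects:
    if obj.type == "DontCare":
        continue  # placeholder values, nothing to check
    x, _, z = obj.location
    derived_alpha = obj.rotation_y - math.atan2(x, z)
    print(f"{obj.type}: alpha={obj.alpha:.2f}, derived={derived_alpha:.2f}")
```

The derived and stored alpha values agree to within the two-decimal rounding of the label file. DontCare rows carry placeholder values (-1, -10, -1000) in every field except the 2D box, so they are skipped here.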

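1.7 KITTI visualization

Visualization of the point cloud, ground-truth (gt) boxes, labels, and detection (dt) boxes from PointRCNN is already working; voxelization will be added later. Here is a first look at the result:

2 Lidar data

First, download the raw data development kit from the official KITTI website. Its readme file documents everything you might want to know: the data collection setup, the data format of each device, the labels, and so on.

What form does the lidar data take? The laser hitting object surfaces produces a large number of points. Each point in KITTI has four dimensions: x, y, z, and reflectance. The Velodyne 3D laser scanner produces the point cloud, which is saved as a .bin (binary) file.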

Velodyne 3D laser scan data
===========================

The velodyne point clouds are stored in the folder 'velodyne_points'. To
save space, all scans have been stored as Nx4 float matrix into a binary
file using the following code:

  stream = fopen (dst_file.c_str(),"wb");
  fwrite(data,sizeof(float),4*num,stream);
  fclose(stream);

Here, data contains 4*num values, where the first 3 values correspond to
x,y and z, and the last value is the reflectance information. All scans
are stored row-aligned, meaning that the first 4 values correspond to the
first measurement. Since each scan might potentially have a different
number of points, this must be determined from the file size when reading
the file, where 1e6 is a good enough upper bound on the number of values:

  // allocate 4 MB buffer (only ~130*4*4 KB are needed)
  int32_t num = 1000000;
  float *data = (float*)malloc(num*sizeof(float));

  // pointers
  float *px = data+0;
  float *py = data+1;
  float *pz = data+2;
  float *pr = data+3;

  // load point cloud
  FILE *stream;
  stream = fopen (currFilenameBinary.c_str(),"rb");
  num = fread(data,sizeof(float),num,stream)/4;
  for (int32_t i=0; i<num; i++) {
    point_cloud.points.push_back(tPoint(*px,*py,*pz,*pr));
    px+=4; py+=4; pz+=4; pr+=4;
  }
  fclose(stream);

x, y and z are stored in metric (m) Velodyne coordinates.

IMPORTANT NOTE: Note that the velodyne scanner takes depth measurements
continuously while rotating around its vertical axis (in contrast to the cameras,
which are triggered at a certain point in time). This means that when computing
point clouds you have to 'untwist' the points linearly with respect to the
velodyne scanner location at the beginning and the end of the 360° sweep. The
timestamps for the beginning and the end of the sweeps can be found in the
timestamps file. The velodyne rotates in counter-clockwise direction.

Of course this 'untwisting' only works for non-dynamic environments.

The relationship between the camera triggers and the velodyne is the following:
We trigger the cameras when the velodyne is looking exactly forward (into the
direction of the cameras).
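
The reading procedure described in the readme can also be sketched in Python with the standard library alone. The function name `read_velodyne_bin` is made up for illustration, and the code assumes the floats are little-endian, as they would be when written with `fwrite` on an x86 machine. The number of points is derived from the file size, as the readme suggests.

```python
import os
import struct
import tempfile


def read_velodyne_bin(path):
    """Return a list of (x, y, z, reflectance) tuples from a KITTI .bin scan."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % 16  # 4 floats x 4 bytes per point
    return list(struct.iter_unpack("<4f", data[:usable]))


# Demo on a tiny synthetic scan, since no real KITTI file is bundled here
demo_points = [(1.0, 2.0, 3.0, 0.25), (-4.0, 5.5, 0.5, 0.5)]
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    for point in demo_points:
        tmp.write(struct.pack("<4f", *point))
    demo_path = tmp.name

points = read_velodyne_bin(demo_path)
os.remove(demo_path)
print(len(points), "points, first:", points[0])
```

For full-size scans a NumPy array is usually more convenient than a list of tuples, but the file layout is the same either way.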

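The official lidar data is an N×4 float matrix. The matlab folder in the raw data development kit contains the official MATLAB interface, whose main purpose is to combine the lidar data with the camera data by projecting the points onto the image (see the guide "matlab接口详解及使用"). The point cloud can then be saved in PCD format and processed with PCL.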
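
As a rough illustration of the PCD step, the sketch below writes (x, y, z, reflectance) tuples as an ASCII PCD v0.7 file, which PCL can load as a cloud with an intensity field. The function name `write_pcd_ascii` and the sample values are made up for illustration; in practice the points would come from a KITTI .bin scan.

```python
import io


def write_pcd_ascii(points, out):
    """Write (x, y, z, intensity) tuples to a text stream as ASCII PCD v0.7."""
    n = len(points)
    header = [
        "# .PCD v0.7 - Point Cloud Data file format",
        "VERSION 0.7",
        "FIELDS x y z intensity",
        "SIZE 4 4 4 4",
        "TYPE F F F F",
        "COUNT 1 1 1 1",
        f"WIDTH {n}",
        "HEIGHT 1",                  # unorganized cloud: one row of n points
        "VIEWPOINT 0 0 0 1 0 0 0",
        f"POINTS {n}",
        "DATA ascii",
    ]
    out.write("\n".join(header) + "\n")
    for x, y, z, r in points:
        out.write(f"{x:.6f} {y:.6f} {z:.6f} {r:.6f}\n")


# Illustrative values only
sample = [(1.0, 2.0, 3.0, 0.0), (4.0, 5.0, 6.0, 0.5)]
buf = io.StringIO()
write_pcd_ascii(sample, buf)
pcd_text = buf.getvalue()
print(pcd_text)
```

To write to disk, pass an open text file instead of the `StringIO` buffer. ASCII PCD is easy to inspect but large; the binary PCD variant is more compact for full scans.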