Understanding the KITTI Coordinate Systems

KITTI is one of the best-known benchmarks for 3D object detection. Working with this dataset requires some understanding of what the different files and their contents are. The goal here is to do some basic manipulation and sanity checks to get a general understanding of the data. This article uses four types of files from the KITTI 3D Object Detection dataset:

camera_2 image (.png)
camera_2 label (.txt)
calibration (.txt)
velodyne point cloud (.bin)



Diagram of the KITTI coordinate systems

For each frame, there is one file of each type, sharing the same name but with a different extension. The image files are regular PNG files and can be displayed by any PNG-aware software. The label files contain the 2D and 3D bounding boxes of objects as text. Each row of the file describes one object and contains 15 values, including the tag (e.g. Car, Pedestrian, Cyclist). The 2D bounding boxes are in pixels of the camera image. The 3D bounding boxes are expressed in two coordinate systems: the size (height, width, and length) is in the object coordinate system, and the center of the bounding box is in the camera coordinate system.
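As a rough illustration of the 15-value row described above, the sketch below parses one label line into named fields. The field order (tag, truncated, occluded, alpha, 2D box, dimensions, location, rotation_y) follows the KITTI object devkit readme; the names KittiObject and parse_label_line are hypothetical helpers introduced here, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class KittiObject:
    """One row of a camera_2 label file (hypothetical container)."""
    tag: str            # e.g. Car, Pedestrian, Cyclist
    truncated: float    # 0 (fully visible) to 1 (fully truncated)
    occluded: int       # occlusion state
    alpha: float        # observation angle in radians
    bbox: tuple         # 2D box in pixels: (left, top, right, bottom)
    dimensions: tuple   # 3D size in meters: (height, width, length)
    location: tuple     # 3D position in camera coordinates: (x, y, z)
    rotation_y: float   # rotation around the camera Y axis in radians


def parse_label_line(line: str) -> KittiObject:
    """Parse a single label row; extra trailing values are ignored."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 values, got {len(fields)}")
    v = [float(x) for x in fields[1:15]]
    return KittiObject(
        tag=fields[0],
        truncated=v[0],
        occluded=int(v[1]),
        alpha=v[2],
        bbox=tuple(v[3:7]),
        dimensions=tuple(v[7:10]),
        location=tuple(v[10:13]),
        rotation_y=v[13],
    )
```

A whole label file can then be read with something like `[parse_label_line(l) for l in open(path) if l.strip()]`, where `path` is whatever label file is being inspected.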

The point cloud file contains the location of each point and its reflectance in the lidar coordinate system. The calibration file contains the values of seven matrices: P0 to P3, R0_rect, Tr_velo_to_cam, and Tr_imu_to_velo.
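The two files just described can be loaded with a few lines of NumPy. This is a minimal sketch, assuming the usual layout of the dataset: the .bin file is a flat array of float32 values in groups of four (x, y, z, reflectance), and each calibration line has the form `name: v1 v2 ...` with 12 values for the 3x4 matrices and 9 values for R0_rect. The function names read_velodyne and read_calib are hypothetical.

```python
import numpy as np


def read_velodyne(path):
    """Load a point cloud as an Nx4 array: x, y, z, reflectance."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)


def read_calib(path):
    """Load a calibration file into a dict of name -> matrix."""
    calib = {}
    with open(path) as fh:
        for line in fh:
            if ":" not in line:
                continue  # skip blank or malformed lines
            key, values = line.split(":", 1)
            arr = np.array([float(v) for v in values.split()])
            # 9 values form a 3x3 rotation, 12 values a 3x4 matrix;
            # anything else is left as a flat array.
            shape = {9: (3, 3), 12: (3, 4)}.get(arr.size)
            calib[key.strip()] = arr.reshape(shape) if shape else arr
    return calib
```

After loading, `calib["P2"]`, `calib["R0_rect"]`, and `calib["Tr_velo_to_cam"]` are the three matrices needed for projection onto the camera_2 image.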

The Px matrices project a point in the rectified reference camera coordinate system to the camera_x image. camera_0 is the reference camera. R0_rect is the rectifying rotation for the reference coordinate system (rectification makes the images of multiple cameras lie on the same plane). Tr_velo_to_cam maps a point in point cloud coordinates to the reference coordinate system.
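The chain of transforms described above (lidar to reference camera, then rectification, then projection) can be sketched with homogeneous coordinates. Since P2 is 3x4, R0_rect is 3x3, and Tr_velo_to_cam is 3x4, the sketch pads the latter two to 4x4 so the product is well defined. The helper names to_4x4 and velo_to_image are hypothetical, and the code is an illustration rather than the original author's implementation.

```python
import numpy as np


def to_4x4(m):
    """Pad a 3x3 rotation or 3x4 rigid transform to a 4x4 homogeneous matrix."""
    m = np.asarray(m, dtype=np.float64)
    out = np.eye(4)
    out[:m.shape[0], :m.shape[1]] = m
    return out


def velo_to_image(points_velo, P2, R0_rect, Tr_velo_to_cam):
    """Project Nx3 lidar points onto the camera_2 image.

    Returns (pixels, depth): an Nx2 array of (u, v) pixel coordinates and
    the homogeneous scale of each point, which is approximately its depth
    along the camera axis. Points with depth <= 0 lie behind the camera
    and should be filtered out by the caller.
    """
    pts = np.asarray(points_velo, dtype=np.float64)
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])   # Nx4
    proj = np.asarray(P2, dtype=np.float64) @ to_4x4(R0_rect) @ to_4x4(Tr_velo_to_cam)
    uvw = pts_h @ proj.T                                   # Nx3
    depth = uvw[:, 2]
    return uvw[:, :2] / depth[:, None], depth
```

In practice one would also discard pixels that fall outside the image bounds before drawing them over the PNG.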

We will run two tests here. The first projects the 3D bounding boxes from the label file onto the image. The second projects a point in point cloud coordinates onto the image. The algebra is simple. The first equation below projects the 3D bounding boxes in reference camera coordinates onto the camera_2 image. The second equation projects a point in velodyne coordinates onto the camera_2 image.

y_image = P2 * R0_rect * R0_rot * x_ref_coord

y_image = P2 * R0_rect * Tr_velo_to_cam * x_velo_coord

In the above, R0_rot is the rotation matrix that maps from object coordinates to reference coordinates.
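The first equation can be sketched as follows: build the eight box corners in object coordinates, rotate them with R0_rot, shift them to the box location, and project with P2 and R0_rect. The corner layout is an assumption taken from the KITTI devkit convention (location at the center of the bottom face, camera Y axis pointing down, rotation_y measured around the camera Y axis); the function names box_corners_cam and project_rect_to_image are hypothetical.

```python
import numpy as np


def box_corners_cam(dimensions, location, rotation_y):
    """Return the 8 corners of a 3D box in camera coordinates (8x3)."""
    h, w, l = dimensions
    # Corners in object coordinates; y = 0 is the bottom face (assumed).
    x = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    y = [0, 0, 0, 0, -h, -h, -h, -h]
    z = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
    corners = np.array([x, y, z], dtype=np.float64)         # 3x8
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    R0_rot = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # rotation about Y
    return (R0_rot @ corners).T + np.asarray(location, dtype=np.float64)


def project_rect_to_image(points_cam, P2, R0_rect):
    """Apply P2 * R0_rect to Nx3 camera-coordinate points; returns Nx2 pixels."""
    pts = np.asarray(points_cam, dtype=np.float64)
    R0 = np.eye(4)
    R0[:3, :3] = R0_rect                                    # pad 3x3 to 4x4
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])    # Nx4
    uvw = pts_h @ (np.asarray(P2, dtype=np.float64) @ R0).T
    return uvw[:, :2] / uvw[:, 2:3]
```

Drawing the box then amounts to connecting the projected corners: indices 0 to 3 form the bottom face, 4 to 7 the top face, and corner i connects to corner i + 4.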

The code is relatively simple and available on GitHub.


Reposted from https://medium.com/test-ttile/kitti-3d-object-detection-dataset-d78a762b5a4
