Datasets and Metrics for 3D Object Detection (Datasets for Autonomous Driving)

A comparison of the mainstream multi-modal datasets is summarized in a table in the original survey (table not reproduced here).

KITTI

One of the earliest datasets in autonomous driving. It provides stereo color images, LiDAR point clouds, and GPS coordinates, and supports a variety of tasks including stereo matching, visual odometry, 3D tracking, and 3D object detection.

The data were collected with a vehicle equipped with a 64-beam LiDAR, 4 cameras, and a combined GPS/IMU system, covering more than 20 scenes across urban, residential, and road environments.

KITTI annotates eight categories: "Car", "Van", "Truck", "Pedestrian", "Person (sitting)", "Cyclist", "Tram", and "Misc".
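For reference, each annotated object in KITTI's training set is a single line in a plain-text label file. The sketch below parses one such line; the field order follows the documented label format of the KITTI object development kit, and the file path is only illustrative.

```python
# Each line of a KITTI label_2/*.txt file describes one object:
# type truncated occluded alpha  x1 y1 x2 y2  h w l  x y z  rotation_y [score]
def parse_kitti_label_line(line: str) -> dict:
    f = line.strip().split(" ")
    return {
        "type": f[0],                               # e.g. "Car", "Pedestrian", "Cyclist", "DontCare"
        "truncated": float(f[1]),
        "occluded": int(f[2]),
        "alpha": float(f[3]),                       # observation angle
        "bbox_2d": [float(v) for v in f[4:8]],      # left, top, right, bottom in the image
        "dimensions": [float(v) for v in f[8:11]],  # height, width, length (m)
        "location": [float(v) for v in f[11:14]],   # x, y, z in camera coordinates (m)
        "rotation_y": float(f[14]),
    }

with open("training/label_2/000000.txt") as fh:      # illustrative path
    objects = [parse_kitti_label_line(l) for l in fh if l.strip()]
```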

In addition, to better support multi-modal detection, the KITTI team developed KITTI-360, which provides richer sensor data and 360-degree annotations.

mAP (mean Average Precision) is the commonly used object detection metric; it is not expanded on in detail here.
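Purely for reference, here is a minimal sketch of computing AP as the area under a precision-recall curve (all-point interpolation); this is a generic illustration, not KITTI's official evaluation code, which additionally uses class-specific IoU thresholds and difficulty levels. mAP is then the mean of the per-class APs.

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP as the area under the precision-recall curve (all-point interpolation).
    `recalls` must be sorted ascending, with matching `precisions`."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# mAP is the mean of AP over all object classes, e.g.:
# mAP = np.mean([average_precision(r_c, p_c) for (r_c, p_c) in per_class_pr_curves])
```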

NuScenes

One of the largest autonomous driving datasets with ground-truth labels. It consists of 700 training scenes, 150 validation scenes, and 150 test scenes.

The dataset was collected with a 32-beam LiDAR and 6 cameras, and provides annotations for 23 object classes over a 360-degree field of view. It was also the first autonomous driving dataset to provide radar data, collected with 5 radar sensors.
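For orientation, below is a minimal sketch of browsing these annotations with the official nuscenes-devkit (`pip install nuscenes-devkit`); the data root path and the use of the v1.0-mini split are assumptions for illustration.

```python
from nuscenes.nuscenes import NuScenes

# Assumes the v1.0-mini split has been downloaded to /data/sets/nuscenes.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

scene = nusc.scene[0]                                      # one of the recorded scenes
sample = nusc.get('sample', scene['first_sample_token'])   # first keyframe of the scene

# Sensor data for this keyframe (cameras, LiDAR, radars) is indexed by channel name.
lidar = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
radar = nusc.get('sample_data', sample['data']['RADAR_FRONT'])

# 3D box annotations attached to this keyframe.
boxes = [nusc.get('sample_annotation', token) for token in sample['anns']]
print(lidar['filename'], radar['filename'], len(boxes))
```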

Waymo Open Dataset

The data were collected with 5 LiDAR sensors and 5 pinhole cameras; the dataset also supports a domain adaptation task.
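Waymo distributes its data as TFRecord "segments". A minimal sketch of reading one frame with the official waymo-open-dataset package is shown below; the segment filename is a placeholder.

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

# Placeholder filename for a downloaded segment.
dataset = tf.data.TFRecordDataset('segment-XXXX_with_camera_labels.tfrecord',
                                  compression_type='')
for data in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))   # one frame of the driving sequence
    print(frame.context.name,
          len(frame.images),         # camera images (5 cameras)
          len(frame.lasers),         # LiDAR returns (5 LiDARs)
          len(frame.laser_labels))   # 3D box labels in this frame
    break
```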

Other Datasets

ApolloScape

It contains data from 4 different regions of China under varying weather conditions. The data were collected by an SUV equipped with two LiDAR sensors, 6 cameras, and a combined IMU/GNSS system. It supports various autonomous driving tasks such as scene parsing, lane segmentation, trajectory prediction, object detection, and tracking.

Its evaluation metrics are the same as those of KITTI.

H3D

This dataset focuses on crowded urban traffic scenes. The data were collected with 3 cameras covering a 260-degree field of view and a 64-beam LiDAR.

Argoverse

The data were collected with two 32-channel LiDARs, seven surround-view cameras, and two stereo cameras.

AIODrive

It synthesizes data from multiple commonly used sensors, including 3 high-density LiDARs, one Velodyne-64 LiDAR, 5 high-resolution RGB cameras, 5 high-resolution depth cameras, 4 radars, and an IMU/GPS system. It also synthesizes adverse scenarios such as bad weather conditions and car accidents.

All of the datasets above require data annotation, and high-quality annotation is extremely expensive. As a result, researchers have also turned to transfer learning techniques to reduce the dependence on ground-truth labels, as illustrated by the sketch below.
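As a rough illustration of that idea (not a specific method from the survey), the sketch below fine-tunes only a small detection head on top of a frozen, pretrained backbone; the backbone, checkpoint path, and feature sizes are placeholders.

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Toy detector: a pretrained feature extractor plus a new classification head."""
    def __init__(self, backbone: nn.Module, num_classes: int):
        super().__init__()
        self.backbone = backbone                 # pretrained on a large source dataset
        self.head = nn.Linear(256, num_classes)  # trained on a few target-domain labels

    def forward(self, x):
        return self.head(self.backbone(x))

# Stand-in backbone; in practice this would be a real 3D backbone with loaded weights:
backbone = nn.Sequential(nn.Linear(1024, 256), nn.ReLU())
# backbone.load_state_dict(torch.load('pretrained_backbone.pth'))  # hypothetical checkpoint

model = TinyDetector(backbone, num_classes=8)
for p in model.backbone.parameters():            # freeze the pretrained weights
    p.requires_grad = False
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)  # only the head is updated
```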

If you found this write-up useful, please give it a (free) like; if you spot any mistakes, feedback is very welcome.
References:
Wang Y, Mao Q, Zhu H, et al. Multi-modal 3d object detection in autonomous driving: a survey[J]. arXiv preprint arXiv:2106.12735, 2021.
