Scale AI Dataset Guide

是咕咕咕呀

Last modified: 2022-07-11 16:04:42


First published: 2022-07-06 17:41:37

Article link: https://blog.csdn.net/qq_39276262/article/details/125643986


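This article covers the download process for the Scale AI data platform, which requires registering by email and waiting for a reply. It also covers the basic operations, including data access: LiDAR point clouds are stored as pandas.DataFrames and can be manipulated as such, camera recordings are handled as Pillow Image objects, the loaded dataset carries meta information such as GPS position and timestamps, and the cuboid and semantic segmentation annotations have their own accessors.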


1. Downloading the Scale AI dataset:

Scale AI: The Data Platform for AI

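You need to register with your email address, after which Scale AI sends you a reply email. I used a QQ Mail address, and this worked as of 2022-07-05.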

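The reply email contains the download links.

I was only testing, so I downloaded PART2 only; clicking it in the email starts the download.

After downloading and extracting the archive, the files follow the official layout description below: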

.
├── LICENSE.txt
├── annotations
│   ├── cuboids
│   │   ├── 00.pkl.gz
│   │   .
│   │   .
│   │   .
│   │   └── 79.pkl.gz
│   └── semseg  // Semantic Segmentation is available for specific scenes
│       ├── 00.pkl.gz
│       .
│       .
│       .
│       ├── 79.pkl.gz
│       └── classes.json
├── camera
│   ├── back_camera
│   │   ├── 00.jpg
│   │   .
│   │   .
│   │   .
│   │   ├── 79.jpg
│   │   ├── intrinsics.json
│   │   ├── poses.json
│   │   └── timestamps.json
│   ├── front_camera
│   │   └── ...
│   ├── front_left_camera
│   │   └── ...
│   ├── front_right_camera
│   │   └── ...
│   ├── left_camera
│   │   └── ...
│   └── right_camera
│       └── ...
├── lidar
│   ├── 00.pkl.gz
│   .
│   .
│   .
│   ├── 79.pkl.gz
│   ├── poses.json
│   └── timestamps.json
└── meta
    ├── gps.json
    └── timestamps.json
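
Before loading anything, it can help to confirm that an extracted folder matches this layout. The snippet below is a minimal sketch, assuming the tree above describes a single sequence folder (such as 052); the helper name `missing_parts` and the list of required sub-folders are my own choices and not part of the devkit. `semseg` is left out on purpose because it only exists for some scenes.

```python
from pathlib import Path
import tempfile

# Sub-folders taken from the layout above; semseg is optional, so it is not listed.
EXPECTED = ("annotations/cuboids", "camera", "lidar", "meta")

def missing_parts(sequence_dir):
    """Return the expected sub-folders that are absent from one sequence folder."""
    root = Path(sequence_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]

# Demo on a throwaway folder that deliberately lacks the meta directory.
with tempfile.TemporaryDirectory() as tmp:
    seq = Path(tmp) / "052"
    for rel in ("annotations/cuboids", "camera", "lidar"):
        (seq / rel).mkdir(parents=True)
    demo_missing = missing_parts(seq)

print(demo_missing)  # ['meta']
```

An empty list means the folder has every required part, which is a quick way to spot an interrupted download or extraction.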

Next come the basic operations. What follows is translated directly from the official documentation:

# Import the packages first
# pandaset is the devkit for the Scale AI dataset
import numpy as np
from pandaset import DataSet
import pandas as pd

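This is the path to your top-level dataset folder:

# Load the dataset (raw string, so the backslashes in the Windows path are not treated as escapes)
dataset = DataSet(r'F:\Modelnet40\pandaset_1')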

# List all sequence IDs found in the data folder.
print(dataset.sequences())

Output:

['048', '050', '051', '052', '053', '054', '055', '056', '057', '058', '059', '062', '063', '064', '065', '066', '067', '068', '069', '070', '071', '072', '073', '074', '077', '078', '079', '080', '084', '085', '086', '088', '089', '090', '091', '092', '093', '094', '095', '097']

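Semantic segmentation annotations are not available for every scene, so we can filter for the scenes that have both semantic segmentation and cuboid annotations.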

>>>print(dataset.sequences(with_semseg=True))
['052', '053', '054', '056', '057', '058', '064', '065', '066', '067', '069', '070', '071', '072', '073', '077', '078', '080', '084', '088', '089', '090', '094', '095', '097']

Now we access a specific sequence by picking its key from the list returned earlier, in this case sequence ID '052'.

>>>seq052 = dataset['052']
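# The devkit automatically searches the sequence directory for available sensor data, metadata and annotations, and prepares them for explicit loading. At this point no point clouds or images have been loaded into memory. To load the sensor data and metadata into memory, call the load() method of the sequence object. This loads all available sensor data and metadata.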
>>>seq052.load()
<pandaset.sequence.Sequence at 0x1f965be7580>

If you only need part of the data for your analysis, more specific methods are available, and they can be chained.

>>>seq052.load_lidar().load_cuboids()
# This loads only the LiDAR data and the cuboid annotations (the 3D bounding boxes of labeled objects) for sequence 052

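Data access

LiDAR

LiDAR point clouds are stored as pandas.DataFrames, so you can use the extensive pandas API for data manipulation. This includes returning the data as a plain numpy.ndarray.

My reading of the columns: x, y, z are the coordinates, i is the reflection intensity, and t is the timestamp. d is most likely the sensor ID (0 for the mechanical 360° LiDAR, 1 for the front-facing LiDAR), which matches the set_sensor filtering results shown later.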

>>>pc0 = seq052.lidar[0]
>>>print(pc0)
                x           y         z     i             t  d
index                                                         
0      -44.875744  132.243187  6.597356  14.0  1.557543e+09  0
1      -44.913245  132.241051  4.126787  13.0  1.557543e+09  0
2      -19.848992   58.456260  1.775779   2.0  1.557543e+09  0
3       -7.948135   23.455820  1.398348   0.0  1.557543e+09  0
4       -7.608594   22.439950  1.007010   0.0  1.557543e+09  0
...           ...         ...       ...   ...           ... ..
171634  10.017254 -115.447229  7.422902   0.0  1.557543e+09  1
171635  26.406157  -27.148679  4.657497   5.0  1.557543e+09  1
171636   9.802706 -115.248983  7.427200   1.0  1.557543e+09  1
171637  33.059280  -33.940527  5.448124   0.0  1.557543e+09  1
171638   9.582304 -115.326288  7.448025   1.0  1.557543e+09  1

[171639 rows x 6 columns]
# Return the frame as a numpy array
>>>pc0_np = seq052.lidar[0].values
# Returns the first LiDAR frame in the sequence as a numpy ndarray
>>>print(pc0_np)
[[-4.48757437e+01  1.32243187e+02  6.59735623e+00  1.40000000e+01
   1.55754268e+09  0.00000000e+00]
 [-4.49132448e+01  1.32241051e+02  4.12678670e+00  1.30000000e+01
   1.55754268e+09  0.00000000e+00]
 [-1.98489922e+01  5.84562601e+01  1.77577928e+00  2.00000000e+00
   1.55754268e+09  0.00000000e+00]
 ...
 [ 9.80270575e+00 -1.15248983e+02  7.42720048e+00  1.00000000e+00
   1.55754269e+09  1.00000000e+00]
 [ 3.30592800e+01 -3.39405272e+01  5.44812388e+00  0.00000000e+00
   1.55754269e+09  1.00000000e+00]
 [ 9.58230432e+00 -1.15326288e+02  7.44802499e+00  1.00000000e+00
   1.55754269e+09  1.00000000e+00]]

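LiDAR points are stored in the world coordinate system, so there is no need to transform them with the vehicle's pose graph. This lets you query all LiDAR frames in a sequence, or frames at a specific sampling rate, and visualize them directly with your preferred library.

Instead of always using all available point clouds, you can slice the lidar attribute just as you would a Python list.

>>>pc_all = seq052.lidar[:]
# Returns all LiDAR frames from the sequence
>>>pc_sampled = seq052.lidar[::2]
# Returns every second (that is, every other) LiDAR frame from the sequence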
>>>pc_sampled
[                x           y          z    i             t  d
 index                                                         
 105258  12.981324  -93.613201   8.244391  1.0  1.557543e+09  1
 105259 -12.186775  -54.953533   6.556023  1.0  1.557543e+09  1
 105260  19.275077 -136.907318  11.297377  1.0  1.557543e+09  1
 105261  20.132953 -140.893985  11.563712  2.0  1.557543e+09  1
 105262  20.370610 -140.760093  11.540880  3.0  1.557543e+09  1
 ...           ...         ...        ...  ...           ... ..
 171634  10.017254 -115.447229   7.422902  0.0  1.557543e+09  1
 171635  26.406157  -27.148679   4.657497  5.0  1.557543e+09  1
 171636   9.802706 -115.248983   7.427200  1.0  1.557543e+09  1
 171637  33.059280  -33.940527   5.448124  0.0  1.557543e+09  1
 171638   9.582304 -115.326288   7.448025  1.0  1.557543e+09  1
 
 [66381 rows x 6 columns],
                 x           y          z     i             t  d
 index                                                          
 105467 -18.362342 -123.348338  11.907765   3.0  1.557543e+09  1
 105468  -7.372698  -54.018367   5.923536  20.0  1.557543e+09  1
 105469  -7.273205  -53.959887   5.896123  23.0  1.557543e+09  1
 105470  -7.175100  -53.930795   5.876806  22.0  1.557543e+09  1
 105471  -7.063666  -53.883517   5.854130  23.0  1.557543e+09  1
 ...           ...         ...        ...   ...           ... ..
 173490   8.495586 -115.877950   6.768995   0.0  1.557543e+09  1
 173491  28.535524  -31.908077   4.417924   3.0  1.557543e+09  1
...
 176798  70.261721 -100.156543  4.701638   9.0  1.557543e+09  1
 176799  74.460034  -66.528833  3.191371  22.0  1.557543e+09  1
 176800  70.335841 -100.085006  4.686526   8.0  1.557543e+09  1
 
 [71740 rows x 6 columns]]

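In addition to the LiDAR points, the lidar attribute also holds the sensor poses in the world coordinate system (lidar.poses) and the timestamp of each recorded LiDAR frame (lidar.timestamps). Both objects can be sliced the same way as the lidar attribute that holds the point clouds.

>>>sl = slice(None, None, 5)  # Equivalent to [::5]: take every fifth frame, including its sensor pose and timestamp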
>>>lidar_obj = seq052.lidar
>>>pcs = lidar_obj[sl]
>>>poses = lidar_obj.poses[sl]
>>>timestamps = lidar_obj.timestamps[sl]
>>>print(poses)
>>>print(timestamps)
>>>print( len(pcs) == len(poses) == len(timestamps) )
[{'position': {'x': 0.0, 'y': 0.0, 'z': 0.0}, 'heading': {'w': -0.14587721867103703, 'x': 0.013384872125099812, 'y': -0.018176805964034162, 'z': 0.9890451385027598}}, {'position': {'x': 1.8221959748603356, 'y': -5.924891299105715, 'z': -0.22147311176192197}, 'heading': {'w': -0.14379171815552852, 'x': -0.00029704733062639794, 'y': -0.008386168136489291, 'z': 0.9895723953995107}}, {'position': {'x': 3.5964443225583094, 'y': -11.823288287091783, 'z': -0.23053624245025467}, 'heading': {'w': -0.1404443416637616, 'x': 0.005814065923684946, 'y': -0.017967234634453832, 'z': 0.989908461430466}}, {'position': {'x': 5.311575972297521, 'y': -17.665135813608245, 'z': -0.37536348337376674}, 'heading': {'w': -0.13954093002580367, 'x': -0.0009405587025941987, 'y': -0.008447938040321496, 'z': 0.9901798203052322}}, {'position': {'x': 6.993590049406704, 'y': -23.368951531597073, 'z': -0.4571886017362309}, 'heading': {'w': -0.1386054836930684, 'x': 0.0057014907843905204, 'y': -0.012511156407032238, 'z': 0.990252232443017}}, {'position': {'x': 8.619660664278799, 'y': -28.962986066335322, 'z': -0.5734488875718939}, 'heading': {'w': -0.13483321553588037, 'x': 0.009719741474183951, 'y': -0.013428044524542174, 'z': 0.9907296393235395}}, {'position': {'x': 10.138924117510804, 'y': -34.4042061520591, 'z': -0.6897895812208878}, 'heading': {'w': -0.12865821141725495, 'x': 0.011722421475492541, 'y': -0.01256040223991684, 'z': 0.9915401584228495}}, {'position': {'x': 11.528008159248953, 'y': -39.6201482942002, 'z': -0.7857018896635652}, 'heading': {'w': -0.12406058661038288, 'x': 0.011093324258542151, 'y': -0.011455826187088878, 'z': 0.9921464977779881}}, {'position': {'x': 12.793457129689976, 'y': -44.512740727638615, 'z': -0.8578254621843637}, 'heading': {'w': -0.12164914077077851, 'x': 0.01118354202483445, 'y': -0.012488751338110346, 'z': 0.9924315825423634}}, {'position': {'x': 13.933730877359315, 'y': -49.01039585747021, 'z': -0.937444988310696}, 'heading': {'w': -0.12053223697454005, 'x': 
0.0099445271582908, 'y': -0.01174100614110005, 'z': 0.992590164672369}}, {'position': {'x': 14.986562169949876, 'y': -53.04798968307639, 'z': -0.9926588355089635}, 'heading': {'w': -0.13616611802587944, 'x': 0.0034841193992122006, 'y': -0.009602878664429033, 'z': 0.9906333499005294}}, {'position': {'x': 16.17765813042735, 'y': -56.64227616446552, 'z': -1.032918712267322}, 'heading': {'w': -0.19056113562129598, 'x': 0.0002385751763054337, 'y': -0.009432834747799325, 'z': 0.9816299803394568}}, {'position': {'x': 17.82225065989301, 'y': -59.88183292231424, 'z': -1.0692569174733118}, 'heading': {'w': -0.2877154031412421, 'x': -0.007798086174207079, 'y': -0.009263315736736905, 'z': 0.9576394037574205}}, {'position': {'x': 20.120862711242665, 'y': -62.60492892662567, 'z': -1.0912708061205836}, 'heading': {'w': -0.40879303833078406, 'x': -0.012567868657239703, 'y': -0.014884589492482616, 'z': 0.9124191742205745}}, {'position': {'x': 23.04036321808146, 'y': -64.63966276877291, 'z': -1.1148916440882066}, 'heading': {'w': -0.5166521489672898, 'x': -0.010126424867380595, 'y': -0.011805107421260294, 'z': 0.8560541174047779}}, {'position': {'x': 26.5185542988109, 'y': -66.01806631695949, 'z': -1.1423950957042723}, 'heading': {'w': -0.5969095945843376, 'x': 0.005854257934816275, 'y': -0.013859519729302315, 'z': 0.802167424712623}}]
[1557542685.100485, 1557542685.600459, 1557542686.100413, 1557542686.600308, 1557542687.100268, 1557542687.600117, 1557542688.099898, 1557542688.599731, 1557542689.099869, 1557542689.600013, 1557542690.099949, 1557542690.600074, 1557542691.100226, 1557542691.600699, 1557542692.101192, 1557542692.601412]
True
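
Since the points are stored in world coordinates, the poses printed above are what you would use to move a point back into the sensor's own frame. The following is a minimal sketch under stated assumptions: I assume `position` is the sensor origin in world coordinates and `heading` is a unit quaternion (w, x, y, z) rotating sensor axes into world axes. Neither convention is confirmed by the text above, and the function names are hypothetical, so treat this as an illustration of the math rather than the devkit's own implementation.

```python
import math

def quat_to_matrix(w, x, y, z):
    """Rotation matrix (nested lists, row-major) of a unit quaternion."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def world_to_sensor(point, pose):
    """Map a world-frame (x, y, z) point into the sensor frame: R^T (p - t)."""
    t = pose["position"]
    h = pose["heading"]
    rot = quat_to_matrix(h["w"], h["x"], h["y"], h["z"])
    d = (point[0] - t["x"], point[1] - t["y"], point[2] - t["z"])
    # Multiplying by the transpose: dot column j of rot with the offset d.
    return tuple(sum(rot[i][j] * d[i] for i in range(3)) for j in range(3))

# Hypothetical pose (not from the dataset): sensor at (1, 2, 0), yawed 90 degrees.
half_angle = math.pi / 4
pose = {
    "position": {"x": 1.0, "y": 2.0, "z": 0.0},
    "heading": {"w": math.cos(half_angle), "x": 0.0, "y": 0.0, "z": math.sin(half_angle)},
}
local = world_to_sensor((1.0, 3.0, 0.0), pose)
print([round(v, 6) for v in local])
```

In this toy case the world point sits one metre along the world y axis from the sensor, which is the sensor's own x axis after a 90 degree yaw.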

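By default, the LiDAR point clouds include points from both the mechanical 360° LiDAR and the front-facing LiDAR. To select only one of the sensors, use the set_sensor method.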

>>>pc0 = seq052.lidar[0]
>>>print(pc0.shape)

>>>seq052.lidar.set_sensor(0)  # Include only the mechanical 360° LiDAR
>>>pc0_sensor0 = seq052.lidar[0]
>>>print(pc0_sensor0.shape)

>>>seq052.lidar.set_sensor(1)  # Include only the front-facing LiDAR
>>>pc0_sensor1 = seq052.lidar[0]
>>>print(pc0_sensor1.shape)

(171639, 6)
(105258, 6)
(66381, 6)

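Because the filter keeps each point's original row index intact (which matters when joining with SemanticSegmentation), it is easy to check that no points were lost in the filtering: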

>>>import pandas as pd
>>>pc0_concat = pd.concat([pc0_sensor0, pc0_sensor1])
>>>print(pc0_concat.shape)
>>>print(pc0.shape)
>>>print(pc0)
>>>print(pc0_concat)

(171639, 6)
(171639, 6)
                x           y         z     i             t  d
index                                                         
0      -44.875744  132.243187  6.597356  14.0  1.557543e+09  0
1      -44.913245  132.241051  4.126787  13.0  1.557543e+09  0
2      -19.848992   58.456260  1.775779   2.0  1.557543e+09  0
3       -7.948135   23.455820  1.398348   0.0  1.557543e+09  0
4       -7.608594   22.439950  1.007010   0.0  1.557543e+09  0
...           ...         ...       ...   ...           ... ..
171634  10.017254 -115.447229  7.422902   0.0  1.557543e+09  1
171635  26.406157  -27.148679  4.657497   5.0  1.557543e+09  1
171636   9.802706 -115.248983  7.427200   1.0  1.557543e+09  1
171637  33.059280  -33.940527  5.448124   0.0  1.557543e+09  1
171638   9.582304 -115.326288  7.448025   1.0  1.557543e+09  1

[171639 rows x 6 columns]
                x           y         z     i             t  d
index                                                         
0      -44.875744  132.243187  6.597356  14.0  1.557543e+09  0
1      -44.913245  132.241051  4.126787  13.0  1.557543e+09  0
2      -19.848992   58.456260  1.775779   2.0  1.557543e+09  0
3       -7.948135   23.455820  1.398348   0.0  1.557543e+09  0
4       -7.608594   22.439950  1.007010   0.0  1.557543e+09  0
...           ...         ...       ...   ...           ... ..
...
171637  33.059280  -33.940527  5.448124   0.0  1.557543e+09  1
171638   9.582304 -115.326288  7.448025   1.0  1.557543e+09  1

[171639 rows x 6 columns]

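When I ran the official code at this point, the comparison between pc0 and pc0_concat raised an error saying that only identically-labeled DataFrame objects can be compared.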

The example from the official GitHub repository is as follows:
>>> import pandas as pd
>>> pc0_concat = pd.concat([pc0_sensor0, pc0_sensor1])
>>> print(pc0_concat.shape)
(166768, 6)
>>> print(pc0 == pc0_concat)
           x     y     z     i     t     d
index                                     
0       True  True  True  True  True  True
1       True  True  True  True  True  True
2       True  True  True  True  True  True
3       True  True  True  True  True  True
4       True  True  True  True  True  True
      ...   ...   ...   ...   ...   ...
166763  True  True  True  True  True  True
166764  True  True  True  True  True  True
166765  True  True  True  True  True  True
166766  True  True  True  True  True  True
166767  True  True  True  True  True  True
[166768 rows x 6 columns]
>>> print((~(pc0 == pc0_concat)).sum())  # Count the cells with a `False` value, i.e. the cells where the original point cloud and the concatenated filtered point cloud differ
x    0
y    0
z    0
i    0
t    0
d    0
dtype: int64
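
One possible cause of the identically-labeled error is that `==` between two DataFrames requires the same index labels in the same order. I have not confirmed that this is what happened in my run, so the following is only a hedged sketch on synthetic data: it aligns the concatenated frame with `sort_index()` before comparing. The tiny frames and the interleaved `d` values are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for one LiDAR frame; d marks the sensor (made-up values).
pc0 = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "d": [0, 1, 0, 1]},
    index=pd.Index([0, 1, 2, 3], name="index"),
)

# Boolean filtering keeps the original row labels, as described above.
pc0_sensor0 = pc0[pc0["d"] == 0]
pc0_sensor1 = pc0[pc0["d"] == 1]
pc0_concat = pd.concat([pc0_sensor0, pc0_sensor1])  # row order is now 0, 2, 1, 3

# Restore the original row order so both frames carry identical labels.
aligned = pc0_concat.sort_index()

same = pc0.equals(aligned)                           # True when nothing was lost
mismatches = int((pc0 != aligned).to_numpy().sum())  # number of differing cells
print(same, mismatches)
```

`DataFrame.equals` is also a convenient single-value check when you do not need the per-cell comparison table.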

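Camera

Because the recording vehicle is equipped with several cameras, we first need to list which cameras were used to record the sequence.

The number and names of the cameras should be the same for all sequences.

Each camera name holds its recordings as Pillow Image objects, which can be accessed with normal list slicing. In the following example, we select the first image from the front camera and display it with the Pillow library.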

print(seq052.camera.keys())
front_camera = seq052.camera['front_camera']
img0 = front_camera[0]
img0.show()

dict_keys(['back_camera', 'front_camera', 'front_left_camera', 'front_right_camera', 'left_camera', 'right_camera'])

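After that, the extensive Pillow Image API is available for image processing, conversion or export.

As with the Lidar object, each Camera object has attributes that hold the camera pose (camera.poses) and timestamp (camera.timestamps) of every recorded frame, as well as the camera intrinsics (camera.intrinsics). Again, these objects can be sliced the same way as the Camera object itself: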

sl = slice(None, None, 5)  # Equivalent to [::5]
camera_obj = seq052.camera['front_camera']
pcs = camera_obj[sl]
poses = camera_obj.poses[sl]
timestamps = camera_obj.timestamps[sl]
intrinsics = camera_obj.intrinsics

poses holds the camera poses. Printing it gives:

[{'position': {'x': 0.11759312086125054,
   'y': -0.29136374481281996,
   'z': 1.8005776975743413},
  'heading': {'w': -0.09109663944831273,
   'x': 0.12045724935648461,
   'y': -0.710348048710834,
   'z': 0.687456982691575}},
 {'position': {'x': 1.9043715418553206,
   'y': -6.1947948859074105,
   'z': 1.5912989575276542},
  'heading': {'w': -0.09648439939706947,
   'x': 0.11255100375702809,
   'y': -0.7062235107430294,
   'z': 0.6922942908186063}},
 {'position': {'x': 3.681747223016379,
   'y': -12.115399749707231,
   'z': 1.5673422661629572},
  'heading': {'w': -0.09343164655924122,
   'x': 0.11129280260374516,
   'y': -0.7090833338565794,
   'z': 0.6899893224946642}},
 {'position': {'x': 5.380015191468716,
   'y': -17.940283020789227,
   'z': 1.4320371884514842},
  'heading': {'w': -0.09585658524411389,
...
   'z': 0.6539618523713864},
  'heading': {'w': -0.4127396810389155,
   'x': 0.4282236990580735,
   'y': -0.573504039631688,
   'z': 0.5633502780571461}}]
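
The intrinsics are what turn a 3D point in the camera frame into a pixel position. The snippet below is a rough sketch of the standard pinhole model, assuming the intrinsics provide focal lengths and a principal point (called `fx`, `fy`, `cx`, `cy` here). The numeric values are hypothetical and not read from the dataset, the function name is my own, and lens distortion is ignored.

```python
def project_to_pixel(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    x, y, z = point_cam
    if z <= 0:
        return None  # the point is behind the camera and cannot be seen
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical intrinsics, NOT values from the dataset.
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0

pixel = project_to_pixel((1.0, -0.5, 10.0), fx, fy, cx, cy)
behind = project_to_pixel((1.0, -0.5, -2.0), fx, fy, cx, cy)
print(pixel)   # (1060.0, 490.0)
print(behind)  # None
```

To project LiDAR points this way, they would first have to be moved from world coordinates into the camera frame using the camera pose.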

And these are the timestamps:
[1557542685.050653,
 1557542685.550628,
 1557542686.05058,
 1557542686.55047,
 1557542687.050435,
 1557542687.550293,
 1557542688.05008,
 1557542688.549885,
 1557542689.050015,
 1557542689.550165,
 1557542690.050113,
 1557542690.550216,
 1557542691.050368,
 1557542691.550787,
 1557542692.051331,
 1557542692.551519]
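
The timestamps look like Unix epoch seconds, which is an assumption on my part. Under that assumption, the short sketch below takes the first three values from the LiDAR and camera outputs above (both sampled every fifth frame) and estimates the frame rate and the time offset between the two sensors.

```python
from datetime import datetime, timezone

# First three values copied from the outputs above (every 5th frame of each sensor).
lidar_ts = [1557542685.100485, 1557542685.600459, 1557542686.100413]
camera_ts = [1557542685.050653, 1557542685.550628, 1557542686.05058]

# Interpret the first LiDAR timestamp as seconds since the Unix epoch (assumption).
start = datetime.fromtimestamp(lidar_ts[0], tz=timezone.utc)

# Average spacing between sampled entries, which are 5 frames apart.
step = (lidar_ts[-1] - lidar_ts[0]) / (len(lidar_ts) - 1)
frame_rate = 5 / step

# How far each LiDAR frame lags the matching front camera frame, in seconds.
offsets = [lt - ct for lt, ct in zip(lidar_ts, camera_ts)]

print(start.isoformat())
print(round(frame_rate, 2), [round(o, 3) for o in offsets])
```

With these values the spacing works out to roughly 0.1 s per frame, and each camera frame precedes its LiDAR frame by about 0.05 s.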

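In addition to sensor data, the loaded dataset also contains the following meta information:

1. GPS position

2. Timestamps

These can be accessed directly with the usual list slicing and are returned as dicts. The following example shows how to get the vehicle's GPS coordinates for the first frame.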

>>>pose0 = seq052.gps[0]

>>>#pose0['lat']
>>>#pose0['long']
>>>pose0
{'lat': 37.7728433385318,
 'long': -122.41862333850148,
 'height': 11.25988652081863,
 'xvel': -0.014911437436833385,
 'yvel': 12.341168303373445}
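
The `xvel` and `yvel` fields can be combined into a single speed value. This is a small sketch using the dict printed above; the unit (metres per second) is my assumption, since it is not stated here.

```python
import math

# GPS entry for the first frame, copied from the output above.
pose0 = {
    "lat": 37.7728433385318,
    "long": -122.41862333850148,
    "height": 11.25988652081863,
    "xvel": -0.014911437436833385,
    "yvel": 12.341168303373445,
}

# Planar speed from the two velocity components (assumed to be in m/s).
speed_mps = math.hypot(pose0["xvel"], pose0["yvel"])
speed_kmh = speed_mps * 3.6
print(round(speed_mps, 3), round(speed_kmh, 1))
```

If the unit assumption holds, the vehicle is moving at a little over 12 m/s in this frame, almost entirely along the y direction.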

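API reference: GPS class

API reference: Timestamps class

Annotations

Cuboids (3D bounding-box labels that carry the attributes of each annotated object):

LiDAR cuboid annotations are also stored in the sequence object as pandas.DataFrames, one per timestamp. The position coordinates (position.x, position.y, position.z) give the center of the cuboid. dimensions.x is the width of the cuboid from left to right, dimensions.y is its length from front to back, and dimensions.z is its height from top to bottom.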

>>>cuboids0 = seq052.cuboids[0]  # Returns the cuboid annotations for the first LiDAR frame in the sequence
>>>print(cuboids0.columns)
Index(['uuid', 'label', 'yaw', 'stationary', 'camera_used', 'position.x',
       'position.y', 'position.z', 'dimensions.x', 'dimensions.y',
       'dimensions.z', 'attributes.object_motion', 'cuboids.sibling_id',
       'cuboids.sensor_id', 'attributes.pedestrian_behavior',
       'attributes.pedestrian_age', 'attributes.rider_status'],
      dtype='object')
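
From the center, the three dimensions and the yaw, the eight corner points of a cuboid can be reconstructed. The sketch below assumes that yaw is a counter-clockwise rotation about the z axis in radians and that dimensions.x and dimensions.y lie along the box's local x and y axes before rotation. These conventions are assumptions of mine, and `cuboid_corners` is a hypothetical helper, not a devkit function.

```python
import math

def cuboid_corners(center, dims, yaw):
    """Return the 8 corners of a box given its center, (dx, dy, dz) and yaw."""
    cx, cy, cz = center
    dx, dy, dz = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner offset in the box's own frame.
                lx, ly, lz = sx * dx, sy * dy, sz * dz
                # Rotate about z by yaw, then shift to the box center.
                corners.append((cx + c * lx - s * ly, cy + s * lx + c * ly, cz + lz))
    return corners

# A 2 m wide, 4 m long, 1 m high box at the origin, unrotated and yawed by 90 degrees.
corners = cuboid_corners((0.0, 0.0, 0.0), (2.0, 4.0, 1.0), 0.0)
rotated = cuboid_corners((0.0, 0.0, 0.0), (2.0, 4.0, 1.0), math.pi / 2)
print(len(corners), max(p[0] for p in corners), round(max(p[0] for p in rotated), 6))
```

With real annotations you would pass position.x/y/z, dimensions.x/y/z and yaw from one row of the cuboid DataFrame, after checking the yaw convention against a visualization.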

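Semantic segmentation

Like the cuboid annotations, semantic segmentation can be accessed through the semseg attribute of the sequence object. The index of each semantic segmentation DataFrame corresponds to the index of the matching LiDAR point cloud DataFrame, so the two can be joined on the index.

semseg0 = seq052.semseg[0]  # Returns the semantic segmentation for the first LiDAR frame in the sequence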

print(semseg0.columns)

print(seq052.semseg.classes)

Index(['class'], dtype='object')
{'1': 'Smoke', '2': 'Exhaust', '3': 'Spray or rain', '4': 'Reflection', '5': 'Vegetation', '6': 'Ground', '7': 'Road', '8': 'Lane Line Marking', '9': 'Stop Line Marking', '10': 'Other Road Marking', '11': 'Sidewalk', '12': 'Driveway', '13': 'Car', '14': 'Pickup Truck', '15': 'Medium-sized Truck', '16': 'Semi-truck', '17': 'Towed Object', '18': 'Motorcycle', '19': 'Other Vehicle - Construction Vehicle', '20': 'Other Vehicle - Uncommon', '21': 'Other Vehicle - Pedicab', '22': 'Emergency Vehicle', '23': 'Bus', '24': 'Personal Mobility Device', '25': 'Motorized Scooter', '26': 'Bicycle', '27': 'Train', '28': 'Trolley', '29': 'Tram / Subway', '30': 'Pedestrian', '31': 'Pedestrian with Object', '32': 'Animals - Bird', '33': 'Animals - Other', '34': 'Pylons', '35': 'Road Barriers', '36': 'Signs', '37': 'Cones', '38': 'Construction Signs', '39': 'Temporary Construction Barriers', '40': 'Rolling Containers', '41': 'Building', '42': 'Other Static Object'}
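
Joining on the index can be sketched as follows. The data here is synthetic (three made-up points and class IDs), and only two entries of the class mapping above are reused. Note that the mapping uses string keys, so the integer class IDs are converted to strings before the lookup; that the real class column holds integers is an assumption.

```python
import pandas as pd

# Synthetic stand-ins for one LiDAR frame and its semantic segmentation frame.
points = pd.DataFrame(
    {"x": [0.0, 1.0, 2.0], "y": [0.0, 0.5, 1.0], "z": [0.1, 0.2, 0.3]},
    index=pd.Index([0, 1, 2], name="index"),
)
semseg = pd.DataFrame({"class": [7, 13, 7]}, index=pd.Index([0, 1, 2], name="index"))

# Two entries from the classes mapping printed above (keys are strings).
classes = {"7": "Road", "13": "Car"}

# Join on the shared row index, then translate class IDs into names.
labeled = points.join(semseg)
labeled["class_name"] = labeled["class"].astype(str).map(classes)

counts = labeled["class_name"].value_counts().to_dict()
print(labeled)
print(counts)
```

The same pattern should apply to `seq052.lidar[0]`, `seq052.semseg[0]` and `seq052.semseg.classes`, as long as the sensor filter has not removed rows that the segmentation frame still contains.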

More updates to come.