Deep Learning (1) RGB-D Datasets: ScanNet

Latest recommended article published on 2025-03-27 21:22:14

冰火舞动

Tags: deep learning, artificial intelligence, visualization

Article link: https://blog.csdn.net/sinat_41354290/article/details/118103531

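This post gives a detailed introduction to several RGB-D datasets, including ScanNet, SUN RGB-D, and NYU-Depth V2, covering how to obtain, parse, and use the data. It focuses on the structure of the ScanNet dataset: the 2D and 3D data, frame parsing, and the different annotation types provided. It also gives download steps and example parsing code to help readers understand and work with these large 3D vision datasets.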

This post introduces the main RGB-D datasets and walks through downloading and unpacking them.

1. ScanNet dataset
[2. SUN RGB-D dataset](http://rgbd.cs.princeton.edu/)
- 2.1 Getting the dataset
- 2.2 Parsing the dataset
[3. NYU-Depth V2 dataset](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)
- 3.1 Getting the dataset
- 3.2 Parsing the dataset
4. TUM dataset
[5. SceneNet RGB-D dataset](https://robotvault.bitbucket.io/scenenet-rgbd.html)
References

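1. ScanNet dataset

It contains 1,513 captured scans with objects from 21 classes; 1,201 scans are used for training and 312 for testing (the official v2 split files list these 312 as the validation set).
The dataset has four benchmark tasks: 3D semantic segmentation, 3D instance segmentation, 2D semantic segmentation, and 2D instance segmentation.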

ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. More information can be found in our paper.

Official website

Official GitHub

1.1 Getting the dataset

Request access: fill in the ScanNet Terms of Use and send it to scannet@googlegroups.com
Download the dataset:
```
# -o: directory to save the files to
python download-scannet.py -o data
```
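Because the 2D RGB-D frames are very large, the authors provide an option to download a smaller subset, scannet_frames_25k (about 25,000 frames, subsampled roughly every 100 frames from the full dataset), through the ScanNet download script. It is about 5.6 GB. The benchmark evaluation frames, scannet_frames_test, are also available. TODO: more details to be added.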
```
PREPROCESSED_FRAMES_FILE = ['scannet_frames_25k.zip', '5.6GB']
TEST_FRAMES_FILE = ['scannet_frames_test.zip', '610MB']
```
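The "roughly every 100 frames" subsampling mentioned above can be sketched as follows. This is only an illustration, assuming frames are indexed from 0 and every step-th frame is kept; the helper name subsample_frame_indices and the frame count are made up here and are not part of the ScanNet tooling.

```python
def subsample_frame_indices(num_frames, step=100):
    """Return the indices kept when taking every `step`-th frame.

    Hypothetical helper for illustration; not part of the ScanNet scripts.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, num_frames, step))


# Example: a scan with 5578 frames (number chosen for illustration only).
kept = subsample_frame_indices(5578, step=100)
print(len(kept), kept[:3], kept[-1])
```

The same helper with step=50 gives the sparser per-scene sampling that is often used to limit overlap between frames.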
Download scannet_frames_25k:
```
python download-scannet.py -o data --preprocessed_frames 
```
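This usually fails with urllib.error.HTTPError: HTTP Error 404: Not Found. My workaround was to copy the URL hidden under the mosaic in the screenshot below into a browser and download it directly with the browser or with Xunlei (Thunder). In my tests Xunlei could not download the file; the browser needed a proxy to reach the server, but the speed was still good, around 8 MB/s.

1.2 Parsing the dataset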

<scanId>
|-- <scanId>.sens
    RGB-D sensor stream (*.sens): compressed binary format
    containing the color, depth, camera pose, and other data for each frame.
    RGB images are 1296×968; depth images are 640×480.
|-- <scanId>_vh_clean.ply
    High-quality reconstructed surface mesh file (.ply)
    (Updated if the scan had "remove" annotations)
|-- <scanId>_vh_clean_2.ply
    (Updated if the scan had "remove" annotations)
|-- <scanId>.aggregation.json, <scanId>_vh_clean.aggregation.json
    Aggregated annotation files: record the details of how objects in the scene are segmented, by grouping segments from the surface mesh segmentation files (.segs.json)
    Updated aggregated instance-level semantic annotations on lo-res, hi-res meshes, respectively
|-- <scanId>_vh_clean_2.labels.ply
    Updated visualization of aggregated semantic segmentation; colored by nyu40 labels (see legend referenced above; ply property 'label' denotes the ScanNet label id)
|-- <scanId>_2d-label.zip
    Updated raw 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
    Raw label annotations as 16-bit pngs, 1296×968, with ScanNet label ids
|-- <scanId>_2d-instance.zip
    Updated raw 2d projections of aggregated annotation instances as 8-bit pngs
    Raw instance annotations as 8-bit pngs, 1296×968
|-- <scanId>_2d-label-filt.zip
    Updated filtered 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
    Filtered label annotations as 16-bit pngs, 1296×968, with ScanNet label ids
|-- <scanId>_2d-instance-filt.zip
    Updated filtered 2d projections of aggregated annotation instances as 8-bit pngs
    Filtered instance annotations as 8-bit pngs, 1296×968

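I am not yet fully clear on the difference between label and instance. Judging from the file descriptions above, the label images store a ScanNet class id per pixel, while the instance images store an id for the individual object each pixel belongs to.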
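To check that a downloaded scan directory is complete, the layout listed above can be turned into a list of expected file names. This is a minimal sketch that covers only the files named in this post; the function names are hypothetical and are not part of the ScanNet tooling.

```python
import os

# File-name suffixes taken from the directory listing above.
SCAN_FILE_SUFFIXES = [
    ".sens",
    "_vh_clean.ply",
    "_vh_clean_2.ply",
    ".aggregation.json",
    "_vh_clean.aggregation.json",
    "_vh_clean_2.labels.ply",
    "_2d-label.zip",
    "_2d-instance.zip",
    "_2d-label-filt.zip",
    "_2d-instance-filt.zip",
]


def expected_scan_files(scan_id):
    """Build the file names expected inside the <scanId> directory."""
    return [scan_id + suffix for suffix in SCAN_FILE_SUFFIXES]


def missing_scan_files(scan_dir, scan_id):
    """Return the expected files that are not present in scan_dir."""
    return [name for name in expected_scan_files(scan_id)
            if not os.path.isfile(os.path.join(scan_dir, name))]


print(expected_scan_files("scene0000_00")[0])
```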

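1.2.1 2D data

Each scene contains N frames (to avoid overlapping content between frames, one frame is typically taken every 50). The 2D label and instance data are provided as .png image files. Color images are provided as 8-bit RGB .jpg files, and depth images as 16-bit .png files (divide by 1000 to obtain depth in meters). See reference 1 for details.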
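The depth scaling described above can be sketched as follows. It assumes the depth png has already been read into a nested list of integers; treating a raw value of 0 as "no measurement" is an assumption on my part and is not stated in this post.

```python
def depth_to_meters(raw_depth, depth_shift=1000.0):
    """Convert raw 16-bit depth values to meters.

    raw_depth is a 2D list of integers as read from a 16-bit depth png.
    A raw value of 0 is treated as "no measurement" and mapped to None;
    that convention is an assumption, not something stated in this post.
    """
    return [[(v / depth_shift) if v > 0 else None for v in row]
            for row in raw_depth]


# Tiny made-up 2x2 depth image.
raw = [[0, 1500], [2500, 65535]]
print(depth_to_meters(raw))
```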

Parsing the 2D image data

Link to the parsing code

Install the imageio dependency:
```
pip install imageio==1.1
```
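If imageio prints "Imageio: 'freeimage-3.15.1-win64.dll' was not found on your computer; downloading it now.", see reference 2 for details.

My Python 2.7 environment:

Python 2.7 is recommended for parsing the image data; under Python 3, struct.unpack runs into str-to-bytes conversion problems.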
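The post does not say which call fails, but one place this kind of error typically shows up is when characters unpacked with struct are joined as text. Below is a minimal sketch of that behavior under Python 3; the sample name and the length-then-characters layout are made up for illustration, and the snippet does not read a real .sens file.

```python
import struct

# Pack a short name the way a binary header might store it:
# an unsigned 64-bit length followed by the raw characters.
name = b"StructureSensor"  # made-up sample value
blob = struct.pack("Q", len(name)) + name

strlen = struct.unpack("Q", blob[:8])[0]
chars = struct.unpack("c" * strlen, blob[8:8 + strlen])

# Under Python 3 every element of `chars` is a bytes object of length 1,
# so joining with a str separator raises TypeError.
try:
    "".join(chars)
    joined_as_str = True
except TypeError:
    joined_as_str = False

# Joining with a bytes separator and decoding works.
sensor_name = b"".join(chars).decode("ascii")
print(joined_as_str, sensor_name)
```

If the parser fails at a similar join, switching to a bytes separator and decoding the result is usually enough to run it under Python 3.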

python reader.py --filename scene0000_00.sens --output_path image 
#python reader.py --filename [.sens file to export data from] --output_path [output directory to export data to]
#Options:
#--export_depth_images: export all depth frames as 16-bit pngs (depth shift 1000)
#--export_color_images: export all color frames as 8-bit rgb jpgs
#--export_poses: export all camera poses (4x4 matrix, camera to world)
#--export_intrinsics: export camera intrinsics (4x4 matrix)
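The exported intrinsics and depth images are typically combined through the pinhole camera model to recover 3D points. This is a minimal sketch, assuming the 4x4 intrinsic matrix stores fx and fy on the diagonal and cx and cy in the third column; the numeric values are placeholders, not real ScanNet calibration.

```python
def backproject_pixel(u, v, depth_m, intrinsic):
    """Back-project pixel (u, v) with depth in meters to camera coordinates.

    Uses the standard pinhole model; `intrinsic` is a 4x4 nested list.
    """
    fx, fy = intrinsic[0][0], intrinsic[1][1]
    cx, cy = intrinsic[0][2], intrinsic[1][2]
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Placeholder intrinsics for a 640x480 depth image (illustrative values).
K = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(backproject_pixel(420, 240, 2.0, K))
```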

To make the parsing progress visible, it helps to modify SensorData.py and add a progress bar:

from tqdm import tqdm
# Replace line 71, "for i in range(num_frames):", with:
for i in tqdm(range(num_frames),ncols=80):
# Lines 81 and 93 can be changed in the same way to:
for f in tqdm(range(0, len(self.frames), frame_skip),ncols=80):
for f in tqdm(range(0, len(self.frames), frame_skip),ncols=80):

Parsing results:

raw:

filtered:

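1.2.2 scannet_frames_25k data

[Figure: contents of the scannet_frames_25k data]

The color images are the frames obtained by subsampling every 100 frames; depth, instance, label, and pose hold the corresponding depth maps, instance maps, label maps, and camera poses. TODO: intrinsics_color.txt and intrinsics_depth.txt are the camera intrinsic matrices.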
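A pose can be used to move a point from camera coordinates into world coordinates. This is a minimal sketch, assuming each pose file holds a 4x4 camera-to-world matrix written as four whitespace-separated rows; the matrix convention follows the exporter option described earlier, while the text layout and the sample values are assumptions.

```python
def parse_pose(text):
    """Parse a 4x4 matrix from four lines of whitespace-separated numbers."""
    rows = [[float(x) for x in line.split()]
            for line in text.strip().splitlines()]
    if len(rows) != 4 or any(len(r) != 4 for r in rows):
        raise ValueError("expected a 4x4 matrix")
    return rows


def camera_to_world(point, pose):
    """Apply a camera-to-world pose to a 3D point (homogeneous multiply)."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in pose[:3])


# Made-up pose: no rotation, camera located at (1, 2, 3) in the world.
pose_text = """1 0 0 1
0 1 0 2
0 0 1 3
0 0 0 1"""
pose = parse_pose(pose_text)
print(camera_to_world((0.5, 0.0, 2.0), pose))
```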

1.3 Dataset splits

I have not started on this part yet; see reference 3 for details.

Official split files
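Split files of this kind are usually plain text with one scan id per line. Below is a minimal sketch under that assumption; the sample ids follow the scene0000_00 pattern used earlier, and reading the two numeric parts as a scene number and a scan index is my own interpretation.

```python
def parse_split(text):
    """Return the scan ids listed in a split file, one id per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_scan_id(scan_id):
    """Split 'sceneXXXX_YY' into (scene number, scan index)."""
    prefix, scan_idx = scan_id.split("_")
    return int(prefix[len("scene"):]), int(scan_idx)


# Made-up file contents for illustration.
sample = "scene0000_00\nscene0000_01\n\nscene0001_00\n"
ids = parse_split(sample)
print(ids, split_scan_id(ids[1]))
```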

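2. SUN RGB-D dataset

The dataset has four benchmark tasks: scene classification, semantic segmentation, room layout estimation, and 3D object detection.

It contains 10,335 RGB-D images, a scale similar to PASCAL VOC;
it incorporates images from three existing datasets: NYU Depth v2, Berkeley B3DO, and SUN3D;
the whole dataset is densely annotated, with 146,617 2D polygons (2D object annotations) and 64,595 3D bounding boxes with accurate object orientations (3D object annotations);
it also provides accurate object orientations, 3D room layouts, and scene categories.

2.1 Getting the dataset

Download: http://rgbd.cs.princeton.edu/challenge.html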

# see: http://rgbd.cs.princeton.edu/ in section Data and Annotation
DATASET_URL = 'http://rgbd.cs.princeton.edu/data/SUNRGBD.zip'
DATASET_TOOLBOX_URL = 'http://rgbd.cs.princeton.edu/data/SUNRGBDtoolbox.zip'

2.2 Parsing the dataset

README.txt:

********************************************************************************
Data: Image, depth, and label data are in SUNRGBD.zip
image: rgb image
depth: depth image; to read the depth, see the code in SUNRGBDtoolbox/read3dPoints/.
extrinsics: the rotation matrix to align the point cloud with gravity
fullres: full resolution depth and rgb image
intrinsics.txt  : sensor intrinsic
scene.txt  : scene type
annotation2Dfinal  : 2D segmentation
annotation3Dfinal  : 3D bounding box
annotation3Dlayout : 3D room layout bounding box

*********************************************************************************
Label:
In SUNRGBDtoolbox/Metadata
SUNRGBDMeta.mat:
	2D, 3D bounding box ground truth and image information for each frame.
SUNRGBD2Dseg.mat:
	2D segmentation ground truth.
The indices in "SUNRGBD2Dseg(imageId).seglabelall"
	map to the names in "seglistall".
The indices in "SUNRGBD2Dseg(imageId).seglabel"
	map to the object names in "seg37list".
 
********************************************************************************

There are 37 classes in total:

wall,floor,cabinet,bed,chair,sofa,
table,door,window,bookshelf,picture,
counter,blinds,desk,shelves,curtain,
dresser,pillow,mirror,floor_mat,clothes,
ceiling,books,fridge,tv,paper,towel,
shower_curtain,box,whiteboard,person,
night_stand,toilet,sink,lamp,bathtub,bag
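The mapping from seglabel indices to the seg37list names can be sketched as a simple lookup. This assumes the indices are 1-based, as is usual for MATLAB data, and that 0 marks an unlabeled pixel; neither detail is stated in the README excerpt above.

```python
# Class names copied from the list above, in the same order.
SEG37_NAMES = [
    "wall", "floor", "cabinet", "bed", "chair", "sofa",
    "table", "door", "window", "bookshelf", "picture",
    "counter", "blinds", "desk", "shelves", "curtain",
    "dresser", "pillow", "mirror", "floor_mat", "clothes",
    "ceiling", "books", "fridge", "tv", "paper", "towel",
    "shower_curtain", "box", "whiteboard", "person",
    "night_stand", "toilet", "sink", "lamp", "bathtub", "bag",
]


def seg37_name(label_id):
    """Map a label id to its class name (assumed 1-based; 0 = unlabeled)."""
    if label_id == 0:
        return "unlabeled"
    if not 1 <= label_id <= len(SEG37_NAMES):
        raise ValueError("label id out of range: %d" % label_id)
    return SEG37_NAMES[label_id - 1]


print(len(SEG37_NAMES), seg37_name(1), seg37_name(37))
```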

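Partial parsing code:

Parse the data paths directly from SUNRGBDtoolbox/Metadata:

import os

import numpy as np
from tqdm import tqdm

# SUNRGBDMeta, SUNRGBD2Dseg, seglabel, split_train, output_path and the
# *_dir_train / *_dir_test lists are assumed to be defined earlier (not shown).
for i, meta in tqdm(enumerate(SUNRGBDMeta)):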
    meta_dir = '/'.join(meta.rgbpath.split('/')[:-2])
    real_dir = meta_dir.split('/n/fs/sun3d/data/SUNRGBD/')[1]
    depth_bfx_path = os.path.join(real_dir, 'depth_bfx/' + meta.depthname)
    rgb_path = os.path.join(real_dir, 'image/' + meta.rgbname)

    label_path = os.path.join(real_dir, 'label/label.npy')
    label_path_full = os.path.join(output_path, 'SUNRGBD', label_path)

    # save segmentation (label_path) as numpy array
    if not os.path.exists(label_path_full):
        os.makedirs(os.path.dirname(label_path_full), exist_ok=True)
        label = np.array(
            SUNRGBD2Dseg[seglabel[i][0]][:].transpose(1, 0)).\
            astype(np.uint8)
        np.save(label_path_full, label)

    if meta_dir in split_train:
        img_dir_train.append(os.path.join('SUNRGBD', rgb_path))
        depth_dir_train.append(os.path.join('SUNRGBD', depth_bfx_path))
        label_dir_train.append(os.path.join('SUNRGBD', label_path))
    else:
        img_dir_test.append(os.path.join('SUNRGBD', rgb_path))
        depth_dir_test.append(os.path.join('SUNRGBD', depth_bfx_path))
        label_dir_test.append(os.path.join('SUNRGBD', label_path))
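The path handling in the loop above can be traced on a single example. The path below is a made-up stand-in for meta.rgbpath; only the /n/fs/sun3d/data/SUNRGBD/ prefix comes from the code above.

```python
import os

# Hypothetical value standing in for meta.rgbpath.
rgbpath = "/n/fs/sun3d/data/SUNRGBD/kv1/NYUdata/NYU0001/image/NYU0001.jpg"

# Drop the last two components ("image" and the file name).
meta_dir = "/".join(rgbpath.split("/")[:-2])
# Strip the machine-specific prefix to get a dataset-relative directory.
real_dir = meta_dir.split("/n/fs/sun3d/data/SUNRGBD/")[1]

# Rebuild the relative paths the same way the loop does.
rgb_path = os.path.join(real_dir, "image/" + "NYU0001.jpg")
label_path = os.path.join(real_dir, "label/label.npy")
print(meta_dir)
print(real_dir, rgb_path, label_path)
```

Note that the loop compares meta_dir, the absolute path, against split_train, while the relative real_dir is what ends up in the saved file lists.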

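Data visualization

Translation of the SUN RGB-D dataset paper

See reference 5 for more information.

3. NYU-Depth V2 dataset

464 scenes taken from 3 cities;
1,449 RGB and depth images with semantic annotations, plus 407,024 images without semantic annotations.

[Figure: differences between V2 and V1]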

3.1 Getting the dataset

Download

3.2 Parsing the dataset

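accelData - Nx4 matrix of accelerometer values recorded when each frame was taken. The columns contain the roll, yaw, pitch, and tilt angle of the device.
depths - HxWxN matrix of depth maps, where H and W are the height and width and N is the number of images. Depth values are in meters.
images - HxWx3xN matrix of RGB images, where H and W are the height and width and N is the number of images.
labels - HxWxN matrix of labels, where H and W are the height and width and N is the number of images. Labels range from 1..C, where C is the total number of classes. A pixel with label value 0 is unlabeled.
names - Cx1 cell array of the English names of each class.
namesToIds - map from English label names to IDs (with C key-value pairs).
rawDepths - HxWxN matrix of depth maps, where H and W are the height and width and N is the number of images. These depth maps are the raw output of the Kinect.
scenes - Nx1 cell array of the name of the scene in which each image was taken.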
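The relationship between labels and names can be illustrated by counting pixels per class in one label map. This is a minimal sketch on made-up data; it uses plain nested lists instead of the real .mat arrays, and the class names are placeholders.

```python
from collections import Counter


def count_labels(label_map, names):
    """Count pixels per class name in one HxW label map.

    label_map is a nested list of integer labels; names is the list of
    class names, where label k (1-based) corresponds to names[k - 1].
    A label of 0 is reported as 'unlabeled'.
    """
    counts = Counter()
    for row in label_map:
        for label in row:
            counts["unlabeled" if label == 0 else names[label - 1]] += 1
    return dict(counts)


# Tiny made-up example with C = 3 classes and a 2x3 label map.
names = ["bed", "chair", "wall"]
label_map = [[0, 1, 1], [3, 3, 2]]
print(count_labels(label_map, names))
```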

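See reference 6 for details.

Parsing code

4. TUM dataset

It contains indoor image sequences captured with an RGB-D sensor.
TUM also provides many data subsets, each containing an image sequence, the corresponding silhouettes, and full calibration parameters.
The dataset can be used to evaluate object reconstruction and SLAM/visual odometry under different textures, lighting, and structural conditions.

5. SceneNet RGB-D dataset

See the references for details.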

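References

1. About the ScanNet dataset
2. OSError: Unable to download 'freeimage-3.15.1-win64.dll'. Perhaps there is a no internet connection?
3. ScanNetV2 dataset walkthrough and selective download
4. Overview of mainstream RGB-D datasets
5. [Translation] SUN RGB-D dataset
6. Reading the NYU Depth Dataset V2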