ScanNet Dataset Explained

Original

Last modified 2025-08-04 17:36:43 · 3k reads

CC 4.0 BY-SA license

Tags: #dataset

First published 2025-07-14 11:35:47

ScanNet

http://www.scan-net.org/ScanNet/

https://github.com/ScanNet/ScanNet

https://zhuanlan.zhihu.com/p/4107946359

https://blog.csdn.net/weixin_42888638/article/details/125263163

https://blog.csdn.net/shan_5233/article/details/128300415

https://github.com/daveredrum/ScanRefer/blob/master/data/scannet/README.md

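ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. To collect this data, the authors designed an easy-to-use, scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. Their experiments show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.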

ScanNet Data

If you would like to download the ScanNet data, please fill out an agreement to the ScanNet Terms of Use, using your institutional email address, and send it to scannet@googlegroups.com.

If you have not received a response within a week, it is likely that your email is bouncing. Please check this before sending repeat requests. Please do not reply to the noreply email, because your message won't be seen.

Please check the changelog for updates to the data release.

Data Organization

The data in ScanNet is organized by RGB-D sequence. Each sequence is stored under a directory named scene<spaceId>_<scanId>, or scene%04d_%02d, where each space corresponds to a unique location (0-indexed). The raw data captured during scanning, camera poses and surface mesh reconstructions, and annotation metadata are all stored together for the given sequence. The directory has the following structure:
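The scene%04d_%02d naming pattern can be built and parsed with a few lines of Python. This is a minimal sketch; the helper names `format_scan_id` and `parse_scan_id` are hypothetical and not part of any ScanNet tool.

```python
import re
from typing import Tuple

# Four-digit space id, underscore, two-digit scan id (scene%04d_%02d).
SCAN_ID_PATTERN = re.compile(r"^scene(\d{4})_(\d{2})$")


def format_scan_id(space_id: int, scan_id: int) -> str:
    """Build a sequence directory name following scene%04d_%02d."""
    return "scene%04d_%02d" % (space_id, scan_id)


def parse_scan_id(name: str) -> Tuple[int, int]:
    """Split a directory name into (space_id, scan_id)."""
    match = SCAN_ID_PATTERN.match(name)
    if match is None:
        raise ValueError("not a ScanNet sequence name: %r" % name)
    return int(match.group(1)), int(match.group(2))


print(format_scan_id(0, 0))            # scene0000_00
print(parse_scan_id("scene0707_00"))   # (707, 0)
```

Grouping parsed names by their first component collects all scans of the same physical space.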

<scanId>
|-- <scanId>.sens
    RGB-D sensor stream containing color frames, depth frames, camera poses and other data
|-- <scanId>_vh_clean.ply
    High quality reconstructed mesh
|-- <scanId>_vh_clean_2.ply
    Cleaned and decimated mesh for semantic annotations
|-- <scanId>_vh_clean_2.0.010000.segs.json
    Over-segmentation of annotation mesh
|-- <scanId>.aggregation.json, <scanId>_vh_clean.aggregation.json
    Aggregated instance-level semantic annotations on lo-res, hi-res meshes, respectively
|-- <scanId>_vh_clean_2.0.010000.segs.json, <scanId>_vh_clean.segs.json
    Over-segmentation of lo-res, hi-res meshes, respectively (referenced by aggregated semantic annotations)
|-- <scanId>_vh_clean_2.labels.ply
    Visualization of aggregated semantic segmentation; colored by nyu40 labels (see img/legend; ply property 'label' denotes the nyu40 label id)
|-- <scanId>_2d-label.zip
    Raw 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance.zip
    Raw 2d projections of aggregated annotation instances as 8-bit pngs
|-- <scanId>_2d-label-filt.zip
    Filtered 2d projections of aggregated annotation labels as 16-bit pngs with ScanNet label ids
|-- <scanId>_2d-instance-filt.zip
    Filtered 2d projections of aggregated annotation instances as 8-bit pngs
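To check whether a downloaded sequence is complete, the file names in the listing above can be generated from the directory name. The sketch below assumes the directory is named after the sequence itself (for example a hypothetical `scans/scene0000_00`); the function names are illustrative only.

```python
from pathlib import Path

# File name suffixes copied from the directory listing above.
SCAN_FILE_SUFFIXES = [
    ".sens",
    "_vh_clean.ply",
    "_vh_clean_2.ply",
    "_vh_clean_2.0.010000.segs.json",
    "_vh_clean.segs.json",
    ".aggregation.json",
    "_vh_clean.aggregation.json",
    "_vh_clean_2.labels.ply",
    "_2d-label.zip",
    "_2d-instance.zip",
    "_2d-label-filt.zip",
    "_2d-instance-filt.zip",
]


def expected_files(scan_dir):
    """Return the paths the listing says a sequence directory should hold."""
    scan_dir = Path(scan_dir)
    scan_id = scan_dir.name  # assumption: directory name equals the scan id
    return [scan_dir / (scan_id + suffix) for suffix in SCAN_FILE_SUFFIXES]


def missing_files(scan_dir):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_files(scan_dir) if not path.is_file()]


for path in missing_files("scans/scene0000_00"):
    print("missing:", path.name)
```

A partial download (for instance only the meshes and annotation files) will report the remaining entries as missing, which is expected rather than an error.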

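This file listing describes all the data associated with one specific 3D scan (<scanId>).

Broadly, the files fall into five categories:

📸 Raw sensor data

<scanId>.sens
- The rawest input: a single "tape" that holds everything. It records the color images (RGB), depth images, and camera trajectory captured during scanning. All the other files are derived from it.

🧊 3D scene models

<scanId>_vh_clean.ply
- The high-quality 3D model: the most detailed and complete reconstruction of the scene, built from the .sens file.
<scanId>_vh_clean_2.ply
- The simplified 3D model used for annotation. It keeps the main structure but has fewer faces, so the file is smaller and faster to process.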

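🏷️ Semantic annotation files (the core)

These .json files are the "manual": they describe, as data, what the objects in the 3D model are.

<scanId> ... segs.json
- The over-segmentation file. It splits the 3D model into many small segments in preparation for object annotation.
<scanId> ... aggregation.json
- The final object annotation file, and the core of the annotations. It "aggregates" the segments above into meaningful objects (for example, which segments together form a chair and which form a table) and gives each object a unique ID.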
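The relationship between the two files can be sketched in Python: the segs file assigns a segment id to every vertex, and the aggregation file groups segment ids into objects. The field names `segments`, `label`, and `objectId` used for each entry of `segGroups` are assumptions made for illustration, and the data below is a toy example rather than real ScanNet content.

```python
def per_vertex_labels(seg_indices, seg_groups, unlabeled="unannotated"):
    """Give each vertex the label of the object that owns its segment."""
    segment_to_label = {}
    for group in seg_groups:
        for segment_id in group["segments"]:  # assumed field name
            segment_to_label[segment_id] = group["label"]  # assumed field name
    # Vertices whose segment belongs to no object keep the placeholder label.
    return [segment_to_label.get(seg, unlabeled) for seg in seg_indices]


# Toy data: 10 vertices in 3 segments; segment 3 is not part of any object.
seg_indices = [1, 1, 1, 1, 3, 3, 15, 15, 15, 15]
seg_groups = [
    {"objectId": 0, "segments": [1], "label": "chair"},
    {"objectId": 1, "segments": [15], "label": "table"},
]

labels = per_vertex_labels(seg_indices, seg_groups)
print(labels.count("chair"), labels.count("unannotated"), labels.count("table"))
# 4 2 4
```

Swapping `group["label"]` for `group["objectId"]` in the same loop would yield per-vertex instance ids instead of category names.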

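🎨 Annotation visualization model

<scanId>_vh_clean_2.labels.ply
- A colored 3D model. The annotation result is painted directly onto the simplified mesh (for example, chairs in one color and tables in another), so the annotations can be inspected at a glance.

🖼️ 2D projection data

These .zip archives contain the 3D annotations "flattened" into 2D images, one per frame of the original video.

<scanId>_2d-label.zip
- 2D label maps: each pixel value is the category the pixel belongs to (e.g. chair, table).
<scanId>_2d-instance.zip
- 2D instance maps: each pixel value identifies the specific object the pixel belongs to (e.g. chair 1, chair 2).
...-filt.zip
- Filtered versions of the 2D maps above, with noise removed and cleaner labels.

Taken together, this folder holds the full chain: ① the raw scan data, ② the reconstructed 3D models, ③ the per-object annotations on those models, ④ a colored model for preview, and ⑤ annotation images for 2D image analysis.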

Data Formats

The following are overviews of the data formats used in ScanNet:

Reconstructed surface mesh file (*.ply): Binary PLY format mesh with +Z axis in upright orientation.

RGB-D sensor stream (*.sens): Compressed binary format with per-frame color, depth, camera pose and other data. See ScanNet C++ Toolkit for more information and parsing code. See SensReader/python for a very basic python data exporter.
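Even in a binary PLY file the header is plain ASCII text, so the element counts and property names can be inspected without a mesh library. The sketch below follows the general PLY format; the byte order and the exact properties stored in ScanNet meshes are not stated here, so the sample header is a made-up stand-in.

```python
import io


def read_ply_header(stream):
    """Parse a PLY header from a binary stream.

    Returns (format_name, {element: count}, {element: [property names]}).
    """
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt, counts, properties, current = None, {}, {}, None
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("end of file reached inside the PLY header")
        tokens = line.decode("ascii").split()
        if not tokens:
            continue
        if tokens[0] == "end_header":
            break
        if tokens[0] == "format":
            fmt = tokens[1]
        elif tokens[0] == "element":
            current = tokens[1]
            counts[current] = int(tokens[2])
            properties[current] = []
        elif tokens[0] == "property" and current is not None:
            # The property name is always the last token on the line.
            properties[current].append(tokens[-1])
    return fmt, counts, properties


# Hypothetical header used only to exercise the parser.
sample = io.BytesIO(
    b"ply\nformat binary_little_endian 1.0\n"
    b"element vertex 3\nproperty float x\nproperty float y\nproperty float z\n"
    b"element face 1\nproperty list uchar int vertex_indices\nend_header\n"
)
fmt, counts, properties = read_ply_header(sample)
print(fmt, counts["vertex"], properties["vertex"])
# binary_little_endian 3 ['x', 'y', 'z']
```

For a real file, open it with `open(path, "rb")` and pass the file object; on `<scanId>_vh_clean_2.labels.ply` the vertex properties should include the `label` property mentioned in the directory listing.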

Surface mesh segmentation file (*.segs.json):

{
  "params": {  // segmentation parameters
   "kThresh": "0.0001",
   "segMinVerts": "20",
   "minPoints": "750",
   "maxPoints": "30000",
   "thinThresh": "0.05",
   "flatThresh": "0.001",
   "minLength": "0.02",
   "maxLength": "1"
  },
  "sceneId": "...",  // id of segmented scene
  "segIndices": [1,1,1,1,3,3,15,15,15,15],  // per-vertex index of mesh segment
}
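A short Python sketch of reading this structure follows. The `//` comments in the listing are explanatory and would not parse as JSON, so the sample string omits them; the `sceneId` value is a placeholder. Note that the parameters are stored as strings and need converting before numeric use.

```python
import json
from collections import Counter

# Trimmed stand-in for the content of a *.segs.json file.
segs_text = """
{
  "params": {"kThresh": "0.0001", "segMinVerts": "20"},
  "sceneId": "scene0000_00",
  "segIndices": [1, 1, 1, 1, 3, 3, 15, 15, 15, 15]
}
"""

segs = json.loads(segs_text)

# segIndices has one entry per mesh vertex: the id of the segment it lies in.
vertices_per_segment = Counter(segs["segIndices"])
print(len(segs["segIndices"]))       # 10
print(dict(vertices_per_segment))    # {1: 4, 3: 2, 15: 4}

# Parameters are strings in the file, so convert them explicitly.
k_thresh = float(segs["params"]["kThresh"])
seg_min_verts = int(segs["params"]["segMinVerts"])
```

For a file on disk, replace `json.loads(segs_text)` with `json.load(f)` on an opened file.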

Aggregated semantic annotation file (*.aggregation.json):

{
  "sceneId": "...",  // id of annotated scene
  "appId": "...", // id + version of the tool used to create the annotation
  "segGroups":
