Digging into Detectron 2, Part 3: Data Loader and Ground Truth Instances

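This article walks through Detectron 2's data loading process and how ground truth data is handled, including the key steps of dataset registration and data mapping.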


Figure 1. Inference result of Faster (Base) R-CNN with Feature Pyramid Network.

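Hi, I'm Hiroto Honda¹, a computer vision researcher. In this article, I would like to share what I have learned about Detectron 2: the repo structure, building and training a network, handling datasets, and so on. In 2019, I placed 6th in the Open Images competition (ICCV 2019) using maskrcnn-benchmark, which Detectron 2 is based on. Understanding the whole framework was not easy for me, so I hope this article helps researchers and engineers who are eager to know the details of the system and develop their own models.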

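In Digging into Detectron 2: Feature Pyramid Network, I showed the details of the Feature Pyramid Network (FPN).
Before moving on to the Region Proposal Network (RPN), we should understand the data structure of the ground truth. In this part, I share how ground truth is loaded from a dataset and how the loaded data is processed before being fed to the network.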

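Where is ground truth data used in the network?

To train a detection model, you need to prepare images and annotations.
For Base-RCNN-FPN (Faster R-CNN), ground truth data is used in the Region Proposal Network (RPN) and the Box Head (see Figure 2).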

Figure 2. Ground truth box annotations are used in the Region Proposal Network and Box Head to calculate losses.

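Annotation data for object detection consists of the following:

Box label: the location and size of an object (e.g. [x, y, w, h])
Category label: the object category ID (e.g. 12: "parking meter")

Note that the RPN does not learn to classify object categories, so category labels are used only in the ROI Heads.

Ground truth data is loaded from the annotation file of the specified dataset. Let's look at the data loading process.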

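Data loader

Detectron 2's data loader is nested in multiple levels. It is built by the builder before training starts³.

dataset_dicts (list) is the list of annotation records registered from the dataset.
DatasetFromList (data.Dataset) takes dataset_dicts and wraps it as a torch dataset.
MapDataset (data.Dataset) calls the DatasetMapper class to map each element of DatasetFromList. It loads images, transforms images and annotations, and converts the annotations to an "Instances" object.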
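
The nesting can be pictured with a small, dependency-free sketch. The classes below are simplified stand-ins written for illustration: they borrow the names DatasetFromList and MapDataset but are not Detectron 2's implementations, and `toy_mapper` is a hypothetical mapper that only tags each record instead of loading an image.

```python
import copy


class DatasetFromList:
    """Simplified stand-in: wraps a list of dicts as an indexable dataset."""

    def __init__(self, dataset_dicts):
        self._dicts = dataset_dicts

    def __len__(self):
        return len(self._dicts)

    def __getitem__(self, idx):
        # Return a copy so that a mapper cannot mutate the registered record.
        return copy.deepcopy(self._dicts[idx])


class MapDataset:
    """Simplified stand-in: applies a mapper function to each element on access."""

    def __init__(self, dataset, map_func):
        self._dataset = dataset
        self._map_func = map_func

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        return self._map_func(self._dataset[idx])


def toy_mapper(record):
    # Hypothetical mapper: a real one would load the image and build instances.
    record["mapped"] = True
    return record


dataset_dicts = [{"file_name": "imagedata_1.jpg", "annotations": []}]
dataset = MapDataset(DatasetFromList(dataset_dicts), toy_mapper)
sample = dataset[0]
```

Because each level only wraps the one below it, the mapping work happens lazily when an element is requested, and the registered records stay untouched.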

Figure 3. Data loader of Detectron 2.

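Loading annotation data

Let's assume we have a dataset named "mydataset" that contains the following image and annotations⁴.

Figure 4. Example of an image and annotations

To load data from a dataset, it must be registered to DatasetCatalog. For example, to register mydataset,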

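from detectron2.data import DatasetCatalog
from mydataset import load_mydataset_json  # user-provided loader module


def register_mydataset_instances(name, json_file):
    # Register a function, not the data: the JSON is parsed when the dataset is requested.
    DatasetCatalog.register(name, lambda: load_mydataset_json(json_file, name))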
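
Why register a lambda rather than the loaded list? A rough mental model, assuming the catalog simply stores a name-to-function mapping and calls the function on request, is sketched below. `ToyCatalog` and `fake_loader` are hypothetical names used for illustration only; this is not Detectron 2's DatasetCatalog code.

```python
class ToyCatalog:
    """Hypothetical stand-in for a dataset catalog: maps names to loader functions."""

    def __init__(self):
        self._registered = {}

    def register(self, name, func):
        if name in self._registered:
            raise ValueError(f"Dataset '{name}' is already registered")
        self._registered[name] = func  # store the function, do not call it yet

    def get(self, name):
        return self._registered[name]()  # the loader runs only here


calls = []


def fake_loader(json_file, name):
    # Hypothetical loader that records the call instead of reading a file.
    calls.append((json_file, name))
    return [{"file_name": "imagedata_1.jpg"}]


catalog = ToyCatalog()
catalog.register("mydataset", lambda: fake_loader("mydataset.json", "mydataset"))
calls_after_register = len(calls)   # nothing has been loaded at this point
records = catalog.get("mydataset")  # loading happens on demand
```

Under this model, registering many datasets is cheap, since no annotation file is read until a dataset is actually requested.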

The load_mydataset_json function must include a JSON loader so that it returns a list of dict records like the following:

[
    {
        'file_name': 'imagedata_1.jpg',   # image file name
        'height': 640,                    # image height
        'width': 640,                     # image width
        'image_id': 12,                   # image id
        'annotations': [                  # list of annotations
            {
                'iscrowd': 0,                              # crowd flag
                'bbox': [180.58, 162.66, 24.20, 18.29],    # bounding box label
                'category_id': 9,                          # category label
                'bbox_mode': <BoxMode.XYWH_ABS: 1>,        # box coordinate mode
            },
            ...
        ],
    },
    ...
]

For the COCO dataset (the default in Detectron 2), the load_coco_json function plays this role.
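
For a custom format, the loader might look like the sketch below. The JSON layout (a top-level `images` list whose entries carry their own `annotations`) is an assumption made up for this example, as is the `XYWH_ABS = 1` constant, which stands in for `BoxMode.XYWH_ABS` so that the snippet runs without Detectron 2 installed.

```python
import json
import os
import tempfile

XYWH_ABS = 1  # stand-in for BoxMode.XYWH_ABS; use the real enum with Detectron 2


def load_mydataset_json(json_file, name):
    """Read a hypothetical annotation file and return a list of record dicts.

    `name` is unused in this sketch; a fuller loader could use it for metadata.
    """
    with open(json_file) as f:
        data = json.load(f)

    records = []
    for img in data["images"]:  # assumed top-level key
        annotations = [
            {
                "iscrowd": 0,
                "bbox": ann["bbox"],  # [x, y, w, h] in absolute pixels
                "category_id": ann["category_id"],
                "bbox_mode": XYWH_ABS,
            }
            for ann in img["annotations"]
        ]
        records.append(
            {
                "file_name": img["file_name"],
                "height": img["height"],
                "width": img["width"],
                "image_id": img["id"],
                "annotations": annotations,
            }
        )
    return records


# Usage with a throwaway file that follows the assumed layout.
example = {
    "images": [
        {
            "id": 12,
            "file_name": "imagedata_1.jpg",
            "height": 640,
            "width": 640,
            "annotations": [
                {"bbox": [180.58, 162.66, 24.20, 18.29], "category_id": 9}
            ],
        }
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mydataset.json")
    with open(path, "w") as f:
        json.dump(example, f)
    records = load_mydataset_json(path, "mydataset")
```

The only real requirement is the shape of the returned records; how the source file is organized is up to the dataset.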

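Mapping data

During training, the registered annotation records are picked one by one. We need the actual image data (not the path) and the corresponding annotations. The dataset mapper (DatasetMapper) processes each record to add an "image" and "instances" to the dataset_dict. "Instances" is Detectron 2's ground truth structure object.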

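Loading and transforming images
    The image specified by 'file_name' is loaded by the read_image function. The loaded image is transformed by predefined augmentations (e.g. a left-right flip), and finally the image tensor of shape (channels, height, width) is registered.
Transforming annotations
    The 'annotations' of the dataset_dict are transformed by the same transforms that were applied to the image. For example, if the image has been flipped, the box coordinates are changed to the flipped positions.
Converting annotations to Instances
    Annotations are converted to Instances by a function called in the dataset mapper (annotations_to_instances). The 'bbox' annotations are stored in a Boxes structure object, which holds a list of bounding boxes. The 'category_id' annotations are simply converted to a torch tensor.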
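
The box handling in these steps can be sketched in plain Python. This is a minimal illustration, assuming boxes are first converted from [x, y, w, h] to [x1, y1, x2, y2] and then flipped together with the image; the helper names are hypothetical, and Detectron 2 uses its own transform and structure classes for this.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, w, h] to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def hflip_xyxy(box, image_width):
    """Mirror an [x1, y1, x2, y2] box about the vertical center line of the image."""
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa.
    return [image_width - x2, y1, image_width - x1, y2]


annotation_box = [180.58, 162.66, 24.20, 18.29]  # XYWH box from the record above
xyxy = xywh_to_xyxy(annotation_box)
flipped = hflip_xyxy(xyxy, image_width=640)
```

Without a flip, the converted box is approximately [180.58, 162.66, 204.78, 180.95]. A horizontal flip changes only the x coordinates and preserves the box width and height.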

After mapping, the dataset_dict looks like this:

{
    'file_name': 'imagedata_1.jpg',
    'height': 640, 'width': 640, 'image_id': 12,
    'image': tensor([[[255., 255., 255.,  ...,  29.,  34.,  36.],
                      ...
                      [169., 163., 162.,  ...,  44.,  44.,  45.]]]),
    'instances': {
        'gt_boxes': Boxes(tensor([[100.55, 180.24, 114.63, 103.01],
                                  [180.58, 162.66, 204.78, 180.95]])),
        'gt_classes': tensor([9, 9]),
    },
}

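Now we have the image and the ground truth annotations that a Detectron 2 model can learn from.

To be continued…

In the next part, we will see how the Region Proposal Network learns object locations. Thank you for reading, and please wait for the next part!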

[1] This is a personal article and the opinions expressed here are my own and not those of my employer.
[2] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo and Ross Girshick, Detectron2, https://github.com/facebookresearch/detectron2, 2019. The file, directory, and class names are cited from the repository (Copyright 2019, Facebook, Inc.)
[3] In some cases, AspectRatioGroupedDataset is used additionally to group the data into landscape and portrait image groups judging from image sizes.
[4] It’s one of my vacation photos and not from a specific dataset :)

Digging into Detectron 2 — part 3 | by Hiroto Honda | Medium