[mmdetection3d] Customizing datasets with dataset wrappers

Kadima°

Category: mmdetection3d. Tags: 3d, deep learning, machine learning

Article link: https://blog.csdn.net/m0_45388819/article/details/121284827

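Customizing datasets with dataset wrappers

MMDetection also supports a number of dataset wrappers for mixing datasets or changing the dataset distribution during training.
Currently MMDetection supports the following three dataset wrappers:

RepeatDataset: simply repeats the whole dataset.
ClassBalancedDataset: repeats the dataset in a class-balanced manner.
ConcatDataset: concatenates datasets.

Repeat dataset

Use the RepeatDataset wrapper to repeat a dataset. For example, if the original dataset is Dataset_A, the config for repeating it looks like this: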

dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(  # original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)

The source code in mmdet:

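# Imports this excerpt relies on (module path as in mmdet 2.x; it may differ in other versions)
import numpy as np
from mmdet.datasets.builder import DATASETS


@DATASETS.register_module()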
class RepeatDataset:
    """A wrapper of repeated dataset.

    The length of repeated dataset will be `times` larger than the original
    dataset. This is useful when the data loading time is long but the dataset
    is small. Using RepeatDataset can reduce the data loading time between
    epochs.

    Args:
        dataset (:obj:`Dataset`): The dataset to be repeated.
        times (int): Repeat times.
    """

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self.CLASSES = dataset.CLASSES
        if hasattr(self.dataset, 'flag'):
            self.flag = np.tile(self.dataset.flag, times)

        self._ori_len = len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx % self._ori_len]

    def get_cat_ids(self, idx):
        """Get category ids of repeat dataset by index.

        Args:
            idx (int): Index of data.

        Returns:
            list[int]: All categories in the image of specified index.
        """

        return self.dataset.get_cat_ids(idx % self._ori_len)

    def __len__(self):
        """Length after repetition."""
        return self.times * self._ori_len

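The repeated dataset is `times` times as long as the original dataset. This is useful when data loading is slow but the dataset is small: with RepeatDataset, the data loading overhead between epochs is paid less often.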
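The index wrapping in `__getitem__` can be tried without installing mmdet. The following is a minimal sketch: `ToyRepeat` and the three-item list are hypothetical stand-ins, not part of the library, and only mirror the `idx % self._ori_len` logic quoted above.

```python
class ToyRepeat:
    """Toy stand-in for RepeatDataset; works on any sequence."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap the index back into the range of the original dataset.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        # Length after repetition.
        return self.times * self._ori_len


samples = ['a', 'b', 'c']  # hypothetical 3-sample dataset
repeated = ToyRepeat(samples, times=2)
order = [repeated[i] for i in range(len(repeated))]
print(len(repeated), order)  # 6 ['a', 'b', 'c', 'a', 'b', 'c']
```

No data is copied: the wrapper only reports a longer length and maps every index back onto the original samples.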

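Class balanced dataset

Use ClassBalancedDataset as a wrapper to repeat a dataset based on category frequency. The wrapped dataset must implement the function self.get_cat_ids(idx) to support ClassBalancedDataset.
For example, to repeat Dataset_A with oversample_thr=1e-3, the config looks like this: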

dataset_A_train = dict(
    type='ClassBalancedDataset',
    oversample_thr=1e-3,
    dataset=dict(  # original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)

See the source code for more details.
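To give a feel for what oversample_thr does, here is a rough sketch of the repeat-factor rule as described in the ClassBalancedDataset docstring, to the best of my understanding: a category that appears in a fraction f(c) of the images gets the factor max(1, sqrt(oversample_thr / f(c))), and each image takes the largest factor among its categories, rounded up. The function `repeat_factors` and the toy labels are hypothetical; check the result against the version of mmdet you have installed.

```python
import math
from collections import defaultdict


def repeat_factors(cat_ids_per_image, oversample_thr):
    """Return how many times each image is repeated (sketch, not mmdet code)."""
    num_images = len(cat_ids_per_image)

    # 1. f(c): fraction of images that contain category c.
    counts = defaultdict(int)
    for cat_ids in cat_ids_per_image:
        for c in set(cat_ids):
            counts[c] += 1
    freq = {c: n / num_images for c, n in counts.items()}

    # 2. Category-level factor r(c) = max(1, sqrt(thr / f(c))).
    cat_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / f)) for c, f in freq.items()
    }

    # 3. Image-level factor: largest r(c) among the image's categories,
    #    rounded up; images without labels are kept once.
    factors = []
    for cat_ids in cat_ids_per_image:
        r = max((cat_repeat[c] for c in set(cat_ids)), default=1.0)
        factors.append(math.ceil(r))
    return factors


# Toy labels: category 0 is in every image, category 1 only in the last one.
toy_cat_ids = [[0], [0], [0], [0, 1]]
factors = repeat_factors(toy_cat_ids, oversample_thr=0.5)
print(factors)  # [1, 1, 1, 2]
```

In this toy case category 1 appears in 25% of the images, which is below the threshold of 0.5, so the only image containing it is repeated twice while the others are kept once.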

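Concatenate dataset

There are three ways to concatenate datasets:

If the datasets to concatenate are of the same type but have multiple annotation files, they can be concatenated with a config like the following.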
```
dataset_A_train = dict(
    type='Dataset_A',
    ann_file = ['anno_file_1', 'anno_file_2'],
    pipeline=train_pipeline
)
```
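If the concatenated dataset is used for testing or evaluation, this approach evaluates each dataset separately. To evaluate the concatenated datasets as a whole, set separate_eval=False as follows.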
```
dataset_A_train = dict(
    type='Dataset_A',
    ann_file = ['anno_file_1', 'anno_file_2'],
    separate_eval=False,
    pipeline=train_pipeline
)
```

If the datasets to concatenate are different, you can use a config like the following.

dataset_A_val = dict()
dataset_B_val = dict()

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dataset_A_train,
    val=dict(
        type='ConcatDataset',
        datasets=[dataset_A_val, dataset_B_val],
        separate_eval=False))

By setting separate_eval=False, users can evaluate all the datasets as a single whole.
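For intuition about how a concatenated dataset serves a sample, here is a small sketch of the usual cumulative-length lookup. As far as I know, mmdet's ConcatDataset builds on PyTorch's ConcatDataset, which finds the owning dataset with a bisect over cumulative sizes; the helper `locate` and the lengths below are hypothetical and not taken from either library.

```python
import bisect
from itertools import accumulate


def locate(idx, dataset_lengths):
    """Map a global index to (dataset_idx, sample_idx) in a concatenation."""
    cumulative = list(accumulate(dataset_lengths))  # e.g. [3, 5]
    if not 0 <= idx < cumulative[-1]:
        raise IndexError(idx)
    # First dataset whose cumulative length exceeds idx.
    dataset_idx = bisect.bisect_right(cumulative, idx)
    offset = cumulative[dataset_idx - 1] if dataset_idx > 0 else 0
    return dataset_idx, idx - offset


lengths = [3, 2]  # hypothetical: dataset A has 3 samples, dataset B has 2
mapping = [locate(i, lengths) for i in range(sum(lengths))]
print(mapping)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

Global indices 0 to 2 land in the first dataset and 3 to 4 in the second, which is why the samples of a concatenated dataset keep their per-dataset order.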

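Note

When evaluating, the separate_eval=False option assumes that the datasets use self.data_infos. COCO datasets therefore do not support it, because they do not rely entirely on self.data_infos for evaluation. Combining different types of datasets and evaluating them as a whole has not been tested and is not recommended.
Because evaluating ClassBalancedDataset and RepeatDataset is not supported, evaluating combinations of them is not supported either.

A more complex example repeats Dataset_A and Dataset_B N and M times respectively and then concatenates them as follows.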

dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
dataset_A_val = dict(
    ...
    pipeline=test_pipeline
)
dataset_A_test = dict(
    ...
    pipeline=test_pipeline
)
dataset_B_train = dict(
    type='RepeatDataset',
    times=M,
    dataset=dict(
        type='Dataset_B',
        ...
        pipeline=train_pipeline
    )
)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train = [
        dataset_A_train,
        dataset_B_train
    ],
    val = dataset_A_val,
    test = dataset_A_test
)
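When choosing N and M it helps to work out the resulting mix. The sketch below assumes that a list passed as train is concatenated by the dataset builder (my understanding of mmdet 2.x) and that each RepeatDataset contributes times × original length samples; `mix_summary` and the dataset sizes are made up for illustration.

```python
def mix_summary(len_a, n, len_b, m):
    """Return (samples per epoch, share of samples coming from dataset A)."""
    total = n * len_a + m * len_b
    return total, n * len_a / total


# Hypothetical sizes: A has 1000 samples repeated 3 times,
# B has 500 samples repeated 2 times.
total, share_a = mix_summary(len_a=1000, n=3, len_b=500, m=2)
print(total, share_a)  # 4000 0.75
```

With these numbers one epoch covers 4000 samples, three quarters of which come from Dataset_A.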
