MMDetection Series, Part 4: Custom Data Pipelines and Augmentation with the Albumentations Library

1. Data Pipeline Design

Following the typical convention, we use Dataset and DataLoader for data loading with multiple workers. The dataset returns a dict of data items corresponding to the arguments of the model's forward method. Since data in object detection may not all be the same size (image size, gt bbox size, etc.), MMCV introduces a new type, DataContainer, to help collect and distribute data of different sizes. See the MMCV documentation for more details.
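As a quick illustration, a DataContainer wraps one sample together with flags that tell the collate function how to batch it. A minimal sketch, assuming the mmcv 1.x API where DataContainer lives in mmcv.parallel:

import torch
from mmcv.parallel import DataContainer

# Images of different sizes are padded to a common shape and stacked
# into one batch tensor by the collate function.
img = DataContainer(torch.randn(3, 800, 1333), stack=True, padding_value=0)
# Metadata stays on the CPU as a plain per-sample list and is never stacked.
img_meta = DataContainer(dict(ori_shape=(800, 1333, 3)), cpu_only=True)
print(img.stack, img_meta.cpu_only)  # True True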

The data preparation pipeline and the dataset are decoupled. The dataset usually defines how to process the annotations, while the data pipeline defines all the steps that prepare the data dict. A pipeline consists of a sequence of operations; each operation takes a dict as input and outputs a dict for the next transform.
As the pipeline runs, each operator can add new keys to the result dict or update existing ones.

A pipeline example for Faster R-CNN:

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
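To see the dict-in/dict-out contract concretely, such a pipeline can be composed and run by hand. A rough sketch for MMDetection 2.x; the image path and the hand-written ann_info are illustrative:

import numpy as np
from mmdet.datasets.pipelines import Compose

pipeline = Compose(train_pipeline)
# LoadImageFromFile reads 'img_prefix' + 'img_info.filename', and
# LoadAnnotations reads 'ann_info'; the *_fields lists are normally
# initialized by the dataset's pre_pipeline().
results = dict(
    img_prefix='demo/',
    img_info=dict(filename='demo.jpg'),
    ann_info=dict(
        bboxes=np.array([[50., 60., 300., 400.]], dtype=np.float32),
        labels=np.array([0], dtype=np.int64)),
    bbox_fields=[],
    mask_fields=[],
    seg_fields=[])
results = pipeline(results)  # each op consumes and returns the same dict
print(results.keys())  # e.g. img, gt_bboxes, gt_labels, img_metas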

For each operation, we list the relevant dict fields that are added/updated/removed.

Data loading

LoadImageFromFile
add: img, img_shape, ori_shape

LoadAnnotations
add: gt_bboxes, gt_bboxes_ignore, gt_labels, gt_masks, gt_semantic_seg, bbox_fields, mask_fields

LoadProposals
add: proposals
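After the loading steps, the results dict therefore has roughly the following shape (a sketch with made-up values; the exact keys depend on the dataset and flags):

import numpy as np

results = {
    'img': np.zeros((427, 640, 3), dtype=np.uint8),  # decoded image, HxWxC
    'img_shape': (427, 640, 3),
    'ori_shape': (427, 640, 3),
    'gt_bboxes': np.array([[10., 10., 100., 120.]], dtype=np.float32),
    'gt_labels': np.array([0], dtype=np.int64),
    'bbox_fields': ['gt_bboxes_ignore', 'gt_bboxes'],  # keys later ops iterate over
    'mask_fields': [],
}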

Preprocessing

Resize
add: scale, scale_idx, pad_shape, scale_factor, keep_ratio
update: img, img_shape, *bbox_fields, *mask_fields, *seg_fields

RandomFlip
add: flip
update: img, *bbox_fields, *mask_fields, *seg_fields

Pad
add: pad_fixed_size, pad_size_divisor
update: img, pad_shape, *mask_fields, *seg_fields

RandomCrop
update: img, pad_shape, gt_bboxes, gt_labels, gt_masks, *bbox_fields

Normalize(归一化)
add: img_norm_cfg
update: img

SegRescale
update: gt_semantic_seg

PhotoMetricDistortion
update: img

Expand
update: img, gt_bboxes

MinIoURandomCrop
update: img, gt_bboxes, gt_labels

Corrupt
update: img

RandomShift

shift_ratio (float): probability of shifting. Default: 0.5.
max_shift_px (int): maximum shift in pixels. Default: 32.
filter_thr_px (int): filtering threshold; boxes whose width or height drops below this value after the shift are filtered out. Default: 1.

How it works:
Random shift: shift_ratio decides whether the shift is applied, and max_shift_px bounds the shift in pixels.
Box filtering: after the shift, boxes whose width and height fall below filter_thr_px are removed.
A config sketch follows.
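In a config, RandomShift is used like any other pipeline op. A sketch with the defaults spelled out (parameter names as documented above):

dict(type='RandomShift', shift_ratio=0.5, max_shift_px=32, filter_thr_px=1)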

Pad


RandomCrop

This transform, implemented as the RandomCrop class, randomly crops the image, bounding boxes, and masks.

Parameters
crop_size (tuple): relative ratio or absolute pixel size (height, width) of the crop.
crop_type (str, optional): one of "relative_range", "relative", "absolute", "absolute_range". Default: "absolute".
"relative": randomly crop a part of the image, with height and width given by crop_size as fractions of the image size.
"relative_range": sample the height and width fractions from [crop_size[0], 1] and [crop_size[1], 1] respectively.
"absolute": crop a region of absolute size from the input image.
"absolute_range": sample the height from [crop_size[0], min(h, crop_size[1])] and the width from [crop_size[0], min(w, crop_size[1])].
allow_negative_crop (bool, optional): whether a crop that contains no bbox area is allowed. Default: False.
recompute_bbox (bool, optional): whether to recompute the bounding boxes from the cropped instance masks. Default: False.
bbox_clip_border (bool, optional): whether to clip objects that extend beyond the image border. Default: True.
Notes
If the image is smaller than the absolute crop size, the original image is returned.
The keys for bboxes, labels, and masks must be aligned: gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.
If the cropped image contains no gt-bbox area and allow_negative_crop is False, the image is skipped.
How it works:
Crop size determination: the concrete crop size is derived from crop_type and crop_size; for relative crops the size is a fraction of the image size, for absolute crops it is an exact pixel value.
Random cropping: a region of the image is selected at random; if allow_negative_crop is False, the crop must contain at least one bounding box.
Box and mask adjustment: the positions and sizes of the boxes and masks are adjusted to the cropped region; if recompute_bbox is True, the boxes are recomputed from the new instance masks.
Border handling: if bbox_clip_border is True, objects extending beyond the crop region are clipped.
In this way RandomCrop randomly crops images, boxes, and masks, increasing the diversity of the training data and improving the model's generalization. A config sketch follows.
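A minimal sketch using the parameters above (the values are illustrative):

dict(
    type='RandomCrop',
    crop_type='relative_range',
    crop_size=(0.75, 0.75),     # sample h/w fractions from [0.75, 1]
    allow_negative_crop=False,  # skip crops that lose every gt box
    bbox_clip_border=True)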

MinIoURandomCrop

Corrupt

Albu

RandomCenterCropPad

CutOut
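As one example from this list, CutOut masks out random rectangles of the image. A hedged config sketch (parameter names as in MMDetection 2.x's CutOut):

dict(type='CutOut', n_holes=3, cutout_shape=(8, 8), fill_in=(0, 0, 0))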

Mosaic (using mosaic augmentation in Faster R-CNN)
The Mosaic transform works as follows:

Pick a mosaic center as the intersection point of the 4 images.
Get the top-left image by index and randomly sample another 3 images from the custom dataset.
Sub-images are cropped if they are larger than their mosaic patch.

img_scale (Sequence[int]): image size of a single image before mosaicking, in (width, height) order. Default: (640, 640).
center_ratio_range (Sequence[float]): center ratio range of the mosaic output. Default: (0.5, 1.5).
bbox_clip_border (bool, optional): whether to clip objects outside the image border. In some datasets, such as MOT17, gt boxes are allowed to cross image borders, so clipping is not wanted there. Default: True.
pad_val (int): padding value. Default: 114.
prob (float): probability of applying the transform. Default: 1.0.

The constructor of Mosaic reflects these parameters:

    def __init__(self,
                 img_scale: Tuple[int, int] = (640, 640),
                 center_ratio_range: Tuple[float, float] = (0.5, 1.5),
                 bbox_clip_border: bool = True,
                 pad_val: float = 114.0,
                 prob: float = 1.0) -> None:
        assert isinstance(img_scale, tuple)
        assert 0 <= prob <= 1.0, 'The probability should be in range [0,1]. ' \
                                 f'got {prob}.'

        log_img_scale(img_scale, skip_square=True, shape_order='wh')
        self.img_scale = img_scale
        self.center_ratio_range = center_ratio_range
        self.bbox_clip_border = bbox_clip_border
        self.pad_val = pad_val
        self.prob = prob

To use Mosaic in training, make sure it is used together with MultiImageMixDataset. Taking Faster R-CNN as an example, modify the train_pipeline and train_dataset values in the config as follows:

# Open configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py directly and add the following fields
data_root = 'data/coco/'
dataset_type = 'CocoDataset'
img_scale=(1333, 800)

train_pipeline = [
    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),
    dict(
        type='RandomAffine',
        scaling_ratio_range=(0.1, 2),
        border=(-img_scale[0] // 2, -img_scale[1] // 2)),  # the image is enlarged 4x by Mosaic, so an affine transform restores the original size
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs')
]

train_dataset = dict(
    _delete_=True,  # remove unnecessary settings
    type='MultiImageMixDataset',
    dataset=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True)
        ],
        filter_empty_gt=False,
    ),
    pipeline=train_pipeline
    )

data = dict(
    train=train_dataset
    )


MixUp

RandomAffine

YOLOXHSVRandomAug

CopyPaste

AutoAugment

Shear

Rotate

Translate

ColorTransform

EqualizeTransform

BrightnessTransform

ContrastTransform

Using mixed-image data augmentations

MMDetection supports the Simple Copy Paste augmentation end to end
Using mosaic augmentation in mmdetection
MMDetection already supports Mosaic and MixUp as part of its YOLOX support.
(1) MixUp
MixUp overlays two images pixel-wise with a random mixing ratio and simply merges the sub-images' labels as the labels of the mixed image.

The main logic of MixUp in MMDetection is concentrated in the _mixup_transform function, which internally integrates Resize, Flip, Crop, and Filter steps. The image sampled by the dataset has already gone through Resize and the other augmentations, while the randomly selected image has not been processed yet, so _mixup_transform first augments the randomly selected image and then performs the MixUp. The relevant config and augmentation example:

dict( 
    type='MixUp', 
    img_scale=img_scale, 
    ratio_range=(0.8, 1.6), 
    pad_val=114.0) 

(2) Mosaic
Mosaic randomly selects 4 images, stitches them together, and simply merges the sub-images' labels as the labels of the mixed image.

The main logic of Mosaic in MMDetection is concentrated in the _mosaic_transform function. First, an empty canvas of twice the img_scale size is created and the stitching center point is chosen; then the images are scaled and stitched into the top-left, top-right, bottom-left, and bottom-right positions; finally the labels of the four images are concatenated. The relevant config and augmentation example:

dict(type='Mosaic', img_scale=img_scale, pad_val=114.0), 
dict( 
    type='RandomAffine', 
    scaling_ratio_range=(0.1, 2), 
    border=(-img_scale[0] // 2, -img_scale[1] // 2)), 
# Note: Mosaic must be used together with RandomAffine, otherwise the image size is doubled. RandomAffine can be used on its own.

(3) Mosaic + MixUp
Mosaic and MixUp can be used together; the relevant config and augmentation example:

dict(type='Mosaic', img_scale=img_scale, pad_val=114.0), 
dict( 
    type='RandomAffine', 
    scaling_ratio_range=(0.1, 2), 
    border=(-img_scale[0] // 2, -img_scale[1] // 2)), 
dict( 
    type='MixUp', 
    img_scale=img_scale, 
    ratio_range=(0.8, 1.6),
    pad_val=114.0),

(4) YOLOX
YOLOX uses both Mosaic and MixUp and additionally adds HSV color jitter. A typical config and augmentation example:

# yolox_s_8x8_300e_coco.py 
train_pipeline = [ 
    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0), 
    dict( 
        type='RandomAffine', 
        scaling_ratio_range=(0.1, 2), 
        border=(-img_scale[0] // 2, -img_scale[1] // 2)), 
    dict( 
        type='MixUp', 
        img_scale=img_scale, 
        ratio_range=(0.8, 1.6), 
        pad_val=114.0), 
    dict(type='YOLOXHSVRandomAug'), 
    ...
] 
 
train_dataset = dict( 
    type='MultiImageMixDataset', 
    dataset=dict( 
        type=dataset_type, 
        ... 
    ), 
    pipeline=train_pipeline) 

Note: mixed-image augmentations must be used together with MMDetection's MultiImageMixDataset dataset wrapper.

The MultiImageMixDataset dataset wrapper

MultiImageMixDataset requires two pipelines: the inner pipeline is the load_pipeline, and the outer pipeline is the train_pipeline.
Mixed-image augmentations need to randomly sample several extra images, which usually require the same processing; that shared processing goes into load_pipeline, and the identically processed images are stored under mix_results. Since the merged results dict only gains a mix_results key, any transform can still be applied without errors.
train_pipeline must contain the mixing operations, e.g. Mosaic, MixUp, and CopyPaste. These operations require mix_results to be present in the input results and must delete mix_results before returning. The flow is as follows:

def __getitem__(self, idx): 
    results = copy.deepcopy(self.dataset[idx]) 
    for (transform, transform_type) in zip(self.pipeline, 
                                           self.pipeline_types): 
        if self._skip_type_keys is not None and \ 
                transform_type in self._skip_type_keys: 
            continue 
 
        if hasattr(transform, 'get_indexes'): 
            for i in range(self.max_refetch): 
                # Make sure the results passed the loading pipeline 
                # of the original dataset is not None. 
                indexes = transform.get_indexes(self.dataset) 
                if not isinstance(indexes, collections.abc.Sequence): 
                    indexes = [indexes] 
                mix_results = [ 
                    copy.deepcopy(self.dataset[index]) for index in indexes 
                ] 
                if None not in mix_results: 
                    results['mix_results'] = mix_results 
                    break 
            else: 
                raise RuntimeError( 
                    'The loading pipeline of the original dataset' 
                    ' always return None. Please check the correctness ' 
                    'of the dataset and its pipeline.') 
 
        for i in range(self.max_refetch): 
            # To confirm the results passed the training pipeline 
            # of the wrapper is not None. 
            updated_results = transform(copy.deepcopy(results)) 
            if updated_results is not None: 
                results = updated_results 
                break 
        else: 
            raise RuntimeError( 
                'The training pipeline of the dataset wrapper' 
                ' always return None.Please check the correctness ' 
                'of the dataset and its pipeline.') 
 
        if 'mix_results' in results: 
            results.pop('mix_results') 

    return results

Note: because some transforms may return None, max_refetch is used both to ensure that the sample returned by the dataset is not None and to avoid an endless retry loop. It is exposed as a parameter on the wrapper; see the sketch below.
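A hedged sketch of the wrapper config (the dataset paths are illustrative; max_refetch defaults to 15 in MMDetection 2.x):

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True)
        ]),
    pipeline=train_pipeline,
    max_refetch=15)  # retry budget when a transform keeps returning None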
SimpleCopyPaste

SimpleCopyPaste differs from Mosaic and MixUp in that the two images must each be scaled, cropped, flipped, and padded, after which some instances are randomly selected from one image and pasted onto the other. Scaling, cropping, flipping, and padding are very common augmentations that MMDetection already implements in full, so they can be reused directly. The way to reuse them is through MultiImageMixDataset's load_pipeline and train_pipeline.

Mosaic and MixUp do not need any special processing of the randomly selected images, so their load_pipeline only contains LoadImageFromFile and LoadAnnotations for loading images and annotations; all subsequent augmentations (Mosaic, RandomAffine, MixUp, YOLOXHSVRandomAug, and so on) go into train_pipeline.
SimpleCopyPaste, however, needs the randomly selected images to receive the same augmentations, so Resize, RandomCrop, FilterAnnotations, RandomFlip, and Pad can be added to load_pipeline.

FilterAnnotations filters out annotations of very small objects, and Pad guarantees that the output images have the same size, so instances can be copy-pasted without pixel out-of-bounds issues. The config is as follows:

load_pipeline = [ 
    dict(type='LoadImageFromFile', file_client_args=file_client_args), 
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 
    dict( 
        type='Resize', 
        img_scale=image_size, 
        ratio_range=(0.8, 1.25), 
        multiscale_mode='range', 
        keep_ratio=True), 
    dict( 
        type='RandomCrop', 
        crop_type='absolute_range', 
        crop_size=image_size, 
        recompute_bbox=True, 
        allow_negative_crop=True), 
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), 
    dict(type='RandomFlip', flip_ratio=0.5), 
    dict(type='Pad', size=image_size), 
] 
train_pipeline = [ 
    dict(type='CopyPaste', max_num_pasted=100), 
    dict(type='Normalize', **img_norm_cfg), 
    dict(type='DefaultFormatBundle'), 
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 
] 
train=dict( 
    type='MultiImageMixDataset', 
    dataset=dict( 
        type=dataset_type, 
        ann_file=data_root + 'annotations/instances_train2017.json', 
        img_prefix=data_root + 'train2017/', 
        pipeline=load_pipeline), 
    pipeline=train_pipeline) 

2. Extending and Using Custom Pipelines

1. Write a new pipeline in a file, e.g. my_pipeline.py. It takes a dict as input and returns a dict.

import random
from mmdet.datasets import PIPELINES


@PIPELINES.register_module()
class MyTransform:
    """Add your transform

    Args:
        p (float): Probability of shifts. Default 0.5.
    """

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, results):
        if random.random() < self.p:  # apply the transform with probability p
            results['dummy'] = True
        return results

2. Import and use the pipeline in your config file. Make sure the import path is relative to where the training script is located.

custom_imports = dict(imports=['path.to.my_pipeline'], allow_failed_imports=False)

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='MyTransform', p=0.2),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

3. Visualize the output of your augmentation pipeline

To visualize the output of your augmentation pipeline, tools/misc/browse_dataset.py helps users browse a detection dataset visually (both images and bounding-box annotations) or save the images to a specified directory. See useful_tools for more details.
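A typical invocation (a sketch; the flags below are from MMDetection 2.x's browse_dataset.py):

python tools/misc/browse_dataset.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
    --output-dir work_dirs/vis --not-show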

3. Integrating Custom Augmentations into mmdetection/mmdet/datasets/pipelines/transforms.py

mmdet/datasets/pipelines/transforms.py, modified source:

transforms.py

mmdet/datasets/pipelines/__init__.py, modified source:

__init__.py
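Both modifications follow the usual registry pattern. A minimal sketch (MyCustomAug and its body are illustrative placeholders, not the post's actual code; the import matches the mmdet 1.x style used below):

# mmdet/datasets/pipelines/transforms.py
from ..registry import PIPELINES


@PIPELINES.register_module
class MyCustomAug(object):
    """Illustrative custom transform: takes a results dict, returns it."""

    def __call__(self, results):
        # modify results['img'] / results['gt_bboxes'] here
        return results

# mmdet/datasets/pipelines/__init__.py: export the class so that
# dict(type='MyCustomAug') in a config can be resolved
# from .transforms import MyCustomAug
# __all__ = [..., 'MyCustomAug']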

4. MMDetection Data Augmentation

Newer versions of MMDetection ship example code that uses the Albumentations library to augment images, located at mmdetection/configs/albu_example/mask_rcnn_r50_fpn_1x.py. That example is written for instance segmentation; an updated version for object detection follows.

1. Augmenting with the albumentations library

Detection tricks: augmentation with the Albumentations library

Key part of the code:
albu_train_transforms = [
    # dict(
    #     type='HorizontalFlip',
    #     p=0.5),
    # dict(
    #     type='VerticalFlip',
    #     p=0.5),

    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=180,
        interpolation=1,
        p=0.5),
    # dict(
    #     type='RandomBrightnessContrast',
    #     brightness_limit=[0.1, 0.3],
    #     contrast_limit=[0.1, 0.3],
    #     p=0.2),
    # dict(
    #     type='OneOf',
    #     transforms=[
    #         dict(
    #             type='RGBShift',
    #             r_shift_limit=10,
    #             g_shift_limit=10,
    #             b_shift_limit=10,
    #             p=1.0),
    #         dict(
    #             type='HueSaturationValue',
    #             hue_shift_limit=20,
    #             sat_shift_limit=30,
    #             val_shift_limit=20,
    #             p=1.0)
    #     ],
    #     p=0.1),
    # # dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
    #
    # dict(type='ChannelShuffle', p=0.1),
    # dict(
    #     type='OneOf',
    #     transforms=[
    #         dict(type='Blur', blur_limit=3, p=1.0),
    #         dict(type='MedianBlur', blur_limit=3, p=1.0)
    #     ],
    #     p=0.1),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=[(4096, 800), (4096, 1200)], keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Pad', size_divisor=32),
    # Albu args:
    #     transforms (list[dict]): a list of albu transformations
    #     bbox_params (dict): bbox_params for the albumentations `Compose`
    #     keymap (dict): contains {'input key': 'albumentation-style key'}
    #     skip_img_without_anno (bool): whether to skip the image if no
    #         annotations are left after augmentation
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_labels'],
            min_visibility=0.0,
            filter_lost_elements=True),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        },
        update_pad_shape=False,
        skip_img_without_anno=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels'],
        meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
                   'pad_shape', 'scale_factor')
    )
]

The full config:
# model settings
model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), # CrossEntropyLoss/FocalLoss
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=[
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.05, 0.05, 0.1, 0.1],
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.033, 0.033, 0.067, 0.067],
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
    ])
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.7,
                min_pos_iou=0.7,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.01,
        nms=dict(type='nms', iou_thr=0.5), max_per_img=100),
    keep_all_stages=False)
# dataset
dataset_type = 'CocoDataset'
data_root = './data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

albu_train_transforms = [
    # dict(
    #     type='HorizontalFlip',
    #     p=0.5),
    # dict(
    #     type='VerticalFlip',
    #     p=0.5),

    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=180,
        interpolation=1,
        p=0.5),
    # dict(
    #     type='RandomBrightnessContrast',
    #     brightness_limit=[0.1, 0.3],
    #     contrast_limit=[0.1, 0.3],
    #     p=0.2),
    # dict(
    #     type='OneOf',
    #     transforms=[
    #         dict(
    #             type='RGBShift',
    #             r_shift_limit=10,
    #             g_shift_limit=10,
    #             b_shift_limit=10,
    #             p=1.0),
    #         dict(
    #             type='HueSaturationValue',
    #             hue_shift_limit=20,
    #             sat_shift_limit=30,
    #             val_shift_limit=20,
    #             p=1.0)
    #     ],
    #     p=0.1),
    # # dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
    #
    # dict(type='ChannelShuffle', p=0.1),
    # dict(
    #     type='OneOf',
    #     transforms=[
    #         dict(type='Blur', blur_limit=3, p=1.0),
    #         dict(type='MedianBlur', blur_limit=3, p=1.0)
    #     ],
    #     p=0.1),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=[(4096, 800), (4096, 1200)], keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_labels'],
            min_visibility=0.0,
            filter_lost_elements=True),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        },
        update_pad_shape=False,
        skip_img_without_anno=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels'],
        meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
                   'pad_shape', 'scale_factor')
    )
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(4096, 1000),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=3,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'first_pinggai/bottle/annotations/instances_train2017.json',
        img_prefix=data_root + 'first_pinggai/bottle/images/train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'first_pinggai/bottle/annotations/instances_val2017.json',
        img_prefix=data_root + 'first_pinggai/bottle/images/val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'first_pinggai/bottle/annotations/instances_val2017.json',
        img_prefix=data_root + 'first_pinggai/bottle/images/val2017/',
        pipeline=test_pipeline)
            )
# optimizer
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 50
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/HRo_cascade50_pinggai_1400'
load_from = './checkpoints/cascade_rcnn_r50_fpn_1x_20190501-3b6211ab.pth'
resume_from = None
workflow = [('train', 1)]

This part of the code should be read together with the "Note" at the beginning of the post, to avoid runtime errors after modifying the code.
Two parts of the code need to be modified:

import inspect
import albumentations
import mmcv
import numpy as np
from albumentations import Compose
from imagecorruptions import corrupt
from numpy import random

from mmdet.core.evaluation.bbox_overlaps import bbox_overlaps
from ..registry import PIPELINES


@PIPELINES.register_module
class Resize(object):
    """Resize images & bbox & mask.

    This transform resizes the input image to some scale. Bboxes and masks are
    then resized with the same scale factor. If the input dict contains the key
    "scale", then the scale in the input dict is used, otherwise the specified
    scale in the init method is used.

    `img_scale` can either be a tuple (single-scale) or a list of tuple
    (multi-scale). There are 3 multiscale modes:
    - `ratio_range` is not None: randomly sample a ratio from the ratio range
        and multiply it with the image scale.
    - `ratio_range` is None and `multiscale_mode` == "range": randomly sample a
        scale from a range.
    - `ratio_range` is None and `multiscale_mode` == "value": randomly sample a
        scale from multiple scales.

    Args:
        img_scale (tuple or list[tuple]): Images scales for resizing.
        multiscale_mode (str): Either "range" or "value".
        ratio_range (tuple[float]): (min_ratio, max_ratio)
        keep_ratio (bool): Whether to keep the aspect ratio when resizing the
            image.
    """

    def __init__(self,
                 img_scale=None,
                 multiscale_mode='range',
                 ratio_range=None,
                 keep_ratio=True):
        if img_scale is None:
            self.img_scale = None
        else:
            if isinstance(img_scale, list):
                self.img_scale = img_scale
            else:
                self.img_scale = [img_scale]
            assert mmcv.is_list_of(self.img_scale, tuple)

        if ratio_range is not None:
            # mode 1: given a scale and a range of image ratio
            assert len(self.img_scale) == 1
        else:
            # mode 2: given multiple scales or a range of scales
            assert multiscale_mode in ['value', 'range']

        self.multiscale_mode = multiscale_mode
        self.ratio_range = ratio_range
        self.keep_ratio = keep_ratio

    @staticmethod
    def random_select(img_scales):
        assert mmcv.is_list_of(img_scales, tuple)
        scale_idx = np.random.randint(len(img_scales))
        img_scale = img_scales[scale_idx]
        return img_scale, scale_idx

    @staticmethod
    def random_sample(img_scales):
        assert mmcv.is_list_of(img_scales, tuple) and len(img_scales) == 2
        img_scale_long = [max(s) for s in img_scales]
        img_scale_short = [min(s) for s in img_scales]
        long_edge = np.random.randint(
            min(img_scale_long),
            max(img_scale_long) + 1)
        short_edge = np.random.randint(
            min(img_scale_short),
            max(img_scale_short) + 1)
        img_scale = (long_edge, short_edge)
        return img_scale, None

    @staticmethod
    def random_sample_ratio(img_scale, ratio_range):
        assert isinstance(img_scale, tuple) and len(img_scale) == 2
        min_ratio, max_ratio = ratio_range
        assert min_ratio <= max_ratio
        ratio = np.random.random_sample() * (max_ratio - min_ratio) + min_ratio
        scale = int(img_scale[0] * ratio), int(img_scale[1] * ratio)
        return scale, None

    def _random_scale(self, results):
        if self.ratio_range is not None:
            scale, scale_idx = self.random_sample_ratio(
                self.img_scale[0], self.ratio_range)
        elif len(self.img_scale) == 1:
            scale, scale_idx = self.img_scale[0], 0
        elif self.multiscale_mode == 'range':
            scale, scale_idx = self.random_sample(self.img_scale)
        elif self.multiscale_mode == 'value':
            scale, scale_idx = self.random_select(self.img_scale)
        else:
            raise NotImplementedError

        results['scale'] = scale
        results['scale_idx'] = scale_idx

    def _resize_img(self, results):
        # 'concat' marks an image with a template concatenated along the channel
        # axis (a custom modification); .get() keeps plain datasets working
        if results.get('concat', False):
            img_raw, img_temp = results['img'][:, :, :3], results['img'][:, :, 3:]
            if self.keep_ratio:
                img_raw, scale_factor = mmcv.imrescale(
                    img_raw, results['scale'], return_scale=True)
                img_temp, scale_factor = mmcv.imrescale(
                    img_temp, results['scale'], return_scale=True)
            else:
                img_raw, w_scale, h_scale = mmcv.imresize(
                    img_raw, results['scale'], return_scale=True)
                img_temp, w_scale, h_scale = mmcv.imresize(
                    img_temp, results['scale'], return_scale=True)
                scale_factor = np.array([w_scale, h_scale, w_scale, h_scale],
                                        dtype=np.float32)
            results['img'] = np.concatenate([img_raw, img_temp], axis=2)
            results['img_shape'] = img_raw.shape
            results['pad_shape'] = img_raw.shape  # in case that there is no padding
            results['scale_factor'] = scale_factor
            results['keep_ratio'] = self.keep_ratio
        else:
            if self.keep_ratio:
                img, scale_factor = mmcv.imrescale(
                    results['img'], results['scale'], return_scale=True)
            else:
                img, w_scale, h_scale = mmcv.imresize(
                    results['img'], results['scale'], return_scale=True)
                scale_factor = np.array([w_scale, h_scale, w_scale, h_scale],
                                        dtype=np.float32)
            results['img'] = img
            results['img_shape'] = img.shape
            results['pad_shape'] = img.shape  # in case that there is no padding
            results['scale_factor'] = scale_factor
            results['keep_ratio'] = self.keep_ratio

    def _resize_bboxes(self, results):
        img_shape = results['img_shape']
        for key in results.get('bbox_fields', []):
            bboxes = results[key] * results['scale_factor']
            bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1] - 1)
            bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0] - 1)
            results[key] = bboxes

    def _resize_masks(self, results):
        for key in results.get('mask_fields', []):
            if results[key] is None:
                continue
            if self.keep_ratio:
                masks = [
                    mmcv.imrescale(
                        mask, results['scale_factor'], interpolation='nearest')
                    for mask in results[key]
                ]
            else:
                mask_size = (results['img_shape'][1], results['img_shape'][0])
                masks = [
                    mmcv.imresize(mask, mask_size, interpolation='nearest')
                    for mask in results[key]
                ]
            results[key] = masks

    def _resize_seg(self, results):
        for key in results.get('seg_fields', []):
            if self.keep_ratio:
                gt_seg = mmcv.imrescale(
                    results[key], results['scale'], interpolation='nearest')
            else:
                gt_seg = mmcv.imresize(
                    results[key], results['scale'], interpolation='nearest')
            results[key] = gt_seg

    def __call__(self, results):
        if 'scale' not in results:
            self._random_scale(results)
        self._resize_img(results)
        self._resize_bboxes(results)
        self._resize_masks(results)
        self._resize_seg(results)
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += ('(img_scale={}, multiscale_mode={}, ratio_range={}, '
                     'keep_ratio={})').format(self.img_scale,
                                              self.multiscale_mode,
                                              self.ratio_range,
                                              self.keep_ratio)
        return repr_str


@PIPELINES.register_module
class RandomFlip(object):
    """Flip the image & bbox & mask.

    If the input dict contains the key "flip", then the flag will be used,
    otherwise it will be randomly decided by a ratio specified in the init
    method.

    Args:
        flip_ratio (float, optional): The flipping probability.
    """

    def __init__(self, flip_ratio=None, direction='horizontal'):
        self.flip_ratio = flip_ratio
        self.direction = direction
        if flip_ratio is not None:
            assert flip_ratio >= 0 and flip_ratio <= 1
        assert direction in ['horizontal', 'vertical']

    def bbox_flip(self, bboxes, img_shape, direction):
        """Flip bboxes horizontally.

        Args:
            bboxes(ndarray): shape (..., 4*k)
            img_shape(tuple): (height, width)
        """
        assert bboxes.shape[-1] % 4 == 0
        flipped = bboxes.copy()
        if direction == 'horizontal':
            w = img_shape[1]
            flipped[..., 0::4] = w - bboxes[..., 2::4] - 1
            flipped[..., 2::4] = w - bboxes[..., 0::4] - 1
        elif direction == 'vertical':
            h = img_shape[0]
            flipped[..., 1::4] = h - bboxes[..., 3::4] - 1
            flipped[..., 3::4] = h - bboxes[..., 1::4] - 1
        else:
            raise ValueError(
                'Invalid flipping direction "{}"'.format(direction))
        return flipped

    def __call__(self, results):
        if 'flip' not in results:
            flip = True if np.random.rand() < self.flip_ratio else False
            results['flip'] = flip
        if 'flip_direction' not in results:
            results['flip_direction'] = self.direction
        if results['flip']:
            # flip image
            results['img'] = mmcv.imflip(
                results['img'], direction=results['flip_direction'])
            # flip bboxes
            for key in results.get('bbox_fields', []):
                results[key] = self.bbox_flip(results[key],
                                              results['img_shape'],
                                              results['flip_direction'])
            # flip masks
            for key in results.get('mask_fields', []):
                results[key] = [
                    mmcv.imflip(mask, direction=results['flip_direction'])
                    for mask in results[key]
                ]

            # flip segs
            for key in results.get('seg_fields', []):
                results[key] = mmcv.imflip(
                    results[key], direction=results['flip_direction'])
        return results

    def __repr__(self):
        return self.__class__.__name__ + '(flip_ratio={})'.format(
            self.flip_ratio)


@PIPELINES.register_module
class Pad(object):
    """Pad the image & mask.

    There are two padding modes: (1) pad to a fixed size and (2) pad to the
    minimum size that is divisible by some number.

    Args:
        size (tuple, optional): Fixed padding size.
        size_divisor (int, optional): The divisor of padded size.
        pad_val (float, optional): Padding value, 0 by default.
    """

    def __init__(self, size=None, size_divisor=None, pad_val=0):
        self.size = size
        self.size_divisor = size_divisor
        self.pad_val = pad_val
        # only one of size and size_divisor should be valid
        assert size is not None or size_divisor is not None
        assert size is None or size_divisor is None

    def _pad_img(self, results):
        if results.get('concat', False):  # see Resize._resize_img
            img_raw, img_temp = results['img'][:, :, :3], results['img'][:, :, 3:]
            if self.size is not None:
                padded_img_raw = mmcv.impad(img_raw, self.size)
                padded_img_temp = mmcv.impad(img_temp, self.size)
            elif self.size_divisor is not None:
                padded_img_raw = mmcv.impad_to_multiple(
                    img_raw, self.size_divisor, pad_val=self.pad_val)
                padded_img_temp = mmcv.impad_to_multiple(
                    img_temp, self.size_divisor, pad_val=self.pad_val)
            results['img'] = np.concatenate([padded_img_raw, padded_img_temp], axis=2)
            results['pad_shape'] = padded_img_raw.shape
            results['pad_fixed_size'] = self.size
            results['pad_size_divisor'] = self.size_divisor
        else:
            if self.size is not None:
                padded_img = mmcv.impad(results['img'], self.size)
            elif self.size_divisor is not None:
                padded_img = mmcv.impad_to_multiple(
                    results['img'], self.size_divisor, pad_val=self.pad_val)
            results['img'] = padded_img
            results['pad_shape'] = padded_img.shape
            results['pad_fixed_size'] = self.size
            results['pad_size_divisor'] = self.size_divisor

    def _pad_masks(self, results):
        pad_shape = results['pad_shape'][:2]
        for key in results.get('mask_fields', []):
            padded_masks = [
                mmcv.impad(mask, pad_shape, pad_val=self.pad_val)
                for mask in results[key]
            ]
            if padded_masks:
                results[key] = np.stack(padded_masks, axis=0)
            else:
                results[key] = np.empty((0, ) + pad_shape, dtype=np.uint8)

    def _pad_seg(self, results):
        for key in results.get('seg_fields', []):
            results[key] = mmcv.impad(results[key], results['pad_shape'][:2])

    def __call__(self, results):
        self._pad_img(results)
        self._pad_masks(results)
        self._pad_seg(results)
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += '(size={}, size_divisor={}, pad_val={})'.format(
            self.size, self.size_divisor, self.pad_val)
        return repr_str


@PIPELINES.register_module
class Normalize(object):
    """Normalize the image.

    Args:
        mean (sequence): Mean values of 3 channels.
        std (sequence): Std values of 3 channels.
        to_rgb (bool): Whether to convert the image from BGR to RGB,
            default is true.
    """

    def __init__(self, mean, std, to_rgb=True):
        self.mean = np.array(mean, dtype=np.float32)
        self.std = np.array(std, dtype=np.float32)
        self.to_rgb = to_rgb

    def __call__(self, results):
        if results.get('concat', False):  # see Resize._resize_img
            img_raw, img_temp = results['img'][:, :, :3], results['img'][:, :, 3:]
            img_raw = mmcv.imnormalize(img_raw, self.mean, self.std,
                                       self.to_rgb)
            img_temp = mmcv.imnormalize(img_temp, self.mean, self.std,
                                        self.to_rgb)
            results['img'] = np.concatenate([img_raw, img_temp], axis=2)
            results['img_norm_cfg'] = dict(
                mean=self.mean, std=self.std, to_rgb=self.to_rgb)
        else:
            results['img'] = mmcv.imnormalize(results['img'], self.mean, self.std,
                                              self.to_rgb)
            results['img_norm_cfg'] = dict(
                mean=self.mean, std=self.std, to_rgb=self.to_rgb)
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += '(mean={}, std={}, to_rgb={})'.format(
            self.mean, self.std, self.to_rgb)
        return repr_str


@PIPELINES.register_module
class RandomCrop(object):
    """Random crop the image & bboxes & masks.

    Args:
        crop_size (tuple): Expected size after cropping, (h, w).
    """

    def __init__(self, crop_size):
        self.crop_size = crop_size

    def __call__(self, results):
        img = results['img']
        margin_h = max(img.shape[0] - self.crop_size[0], 0)
        margin_w = max(img.shape[1] - self.crop_size[1], 0)
        offset_h = np.random.randint(0, margin_h + 1)
        offset_w = np.random.randint(0, margin_w + 1)
        crop_y1, crop_y2 = offset_h, offset_h + self.crop_size[0]
        crop_x1, crop_x2 = offset_w, offset_w + self.crop_size[1]

        # crop the image
        img = img[crop_y1:crop_y2, crop_x1:crop_x2, :]
        img_shape = img.shape
        results['img'] = img
        results['img_shape'] = img_shape

        # crop bboxes accordingly and clip to the image boundary
        for key in results.get('bbox_fields', []):
            bbox_offset = np.array([offset_w, offset_h, offset_w, offset_h],
                                   dtype=np.float32)
            bboxes = results[key] - bbox_offset
            bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, img_shape[1] - 1)
            bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, img_shape[0] - 1)
            results[key] = bboxes

        # crop semantic seg
        for key in results.get('seg_fields', []):
            results[key] = results[key][crop_y1:crop_y2, crop_x1:crop_x2]

        # filter out the gt bboxes that are completely cropped
        if 'gt_bboxes' in results:
            gt_bboxes = results['gt_bboxes']
            valid_inds = (gt_bboxes[:, 2] > gt_bboxes[:, 0]) & (
                gt_bboxes[:, 3] > gt_bboxes[:, 1])
            # if no gt bbox remains after cropping, just skip this image
            if not np.any(valid_inds):
                return None
            results['gt_bboxes'] = gt_bboxes[valid_inds, :]
            if 'gt_labels' in results:
                results['gt_labels'] = results['gt_labels'][valid_inds]

            # filter and crop the masks
            if 'gt_masks' in results:
                valid_gt_masks = []
                for i in np.where(valid_inds)[0]:
                    gt_mask = results['gt_masks'][i][crop_y1:crop_y2,
                                                     crop_x1:crop_x2]
                    valid_gt_masks.append(gt_mask)
                results['gt_masks'] = valid_gt_masks

        return results

    def __repr__(self):
        return self.__class__.__name__ + '(crop_size={})'.format(
            self.crop_size)


@PIPELINES.register_module
class SegRescale(object):
    """Rescale semantic segmentation maps.

    Args:
        scale_factor (float): The scale factor of the final output.
    """

    def __init__(self, scale_factor=1):
        self.scale_factor = scale_factor

    def __call__(self, results):
        for key in results.get('seg_fields', []):
            if self.scale_factor != 1:
                results[key] = mmcv.imrescale(
                    results[key], self.scale_factor, interpolation='nearest')
        return results

    def __repr__(self):
        return self.__class__.__name__ + '(scale_factor={})'.format(
            self.scale_factor)


@PIPELINES.register_module
class PhotoMetricDistortion(object):
    """Apply photometric distortion to image sequentially, every transformation
    is applied with a probability of 0.5. The position of random contrast is in
    second or second to last.

    1. random brightness
    2. random contrast (mode 0)
    3. convert color from BGR to HSV
    4. random saturation
    5. random hue
    6. convert color from HSV to BGR
    7. random contrast (mode 1)
    8. randomly swap channels

    Args:
        brightness_delta (int): delta of brightness.
        contrast_range (tuple): range of contrast.
        saturation_range (tuple): range of saturation.
        hue_delta (int): delta of hue.
    """

    def __init__(self,
                 brightness_delta=32,
                 contrast_range=(0.5, 1.5),
                 saturation_range=(0.5, 1.5),
                 hue_delta=18):
        self.brightness_delta = brightness_delta
        self.contrast_lower, self.contrast_upper = contrast_range
        self.saturation_lower, self.saturation_upper = saturation_range
        self.hue_delta = hue_delta

    def __call__(self, results):
        img = results['img']
        # random brightness
        if random.randint(2):
            delta = random.uniform(-self.brightness_delta,
                                   self.brightness_delta)
            img += delta

        # mode == 0 --> do random contrast first
        # mode == 1 --> do random contrast last
        mode = random.randint(2)
        if mode == 1:
            if random.randint(2):
                alpha = random.uniform(self.contrast_lower,
                                       self.contrast_upper)
                img *= alpha

        # convert color from BGR to HSV
        img = mmcv.bgr2hsv(img)

        # random saturation
        if random.randint(2):
            img[..., 1] *= random.uniform(self.saturation_lower,
                                          self.saturation_upper)

        # random hue
        if random.randint(2):
            img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta)
            img[..., 0][img[..., 0] > 360] -= 360
            img[..., 0][img[..., 0] < 0] += 360

        # convert color from HSV to BGR
        img = mmcv.hsv2bgr(img)

        # random contrast
        if mode == 0:
            if random.randint(2):
                alpha = random.uniform(self.contrast_lower,
                                       self.contrast_upper)
                img *= alpha

        # randomly swap channels
        if random.randint(2):
            img = img[..., random.permutation(3)]

        results['img'] = img
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += ('(brightness_delta={}, contrast_range={}, '
                     'saturation_range={}, hue_delta={})').format(
                         self.brightness_delta,
                         (self.contrast_lower, self.contrast_upper),
                         (self.saturation_lower, self.saturation_upper),
                         self.hue_delta)
        return repr_str


@PIPELINES.register_module
class Expand(object):
    """Random expand the image & bboxes.

    Randomly place the original image on a canvas of 'ratio' x original image
    size filled with mean values. The ratio is in the range of ratio_range.

    Args:
        mean (tuple): mean value of dataset.
        to_rgb (bool): if need to convert the order of mean to align with RGB.
        ratio_range (tuple): range of expand ratio.
        prob (float): probability of applying this transformation
    """

    def __init__(self,
                 mean=(0, 0, 0),
                 to_rgb=True,
                 ratio_range=(1, 4),
                 seg_ignore_label=None,
                 prob=0.5):
        self.to_rgb = to_rgb
        self.ratio_range = ratio_range
        if to_rgb:
            self.mean = mean[::-1]
        else:
            self.mean = mean
        self.min_ratio, self.max_ratio = ratio_range
        self.seg_ignore_label = seg_ignore_label
        self.prob = prob

    def __call__(self, results):
        if random.uniform(0, 1) > self.prob:
            return results

        img, boxes = [results[k] for k in ('img', 'gt_bboxes')]

        h, w, c = img.shape
        ratio = random.uniform(self.min_ratio, self.max_ratio)
        expand_img = np.full((int(h * ratio), int(w * ratio), c),
                             self.mean).astype(img.dtype)
        left = int(random.uniform(0, w * ratio - w))
        top = int(random.uniform(0, h * ratio - h))
        expand_img[top:top + h, left:left + w] = img
        boxes = boxes + np.tile((left, top), 2).astype(boxes.dtype)

        results['img'] = expand_img
        results['gt_bboxes'] = boxes

        if 'gt_masks' in results:
            expand_gt_masks = []
            for mask in results['gt_masks']:
                expand_mask = np.full((int(h * ratio), int(w * ratio)),
                                      0).astype(mask.dtype)
                expand_mask[top:top + h, left:left + w] = mask
                expand_gt_masks.append(expand_mask)
            results['gt_masks'] = expand_gt_masks

        # not tested
        if 'gt_semantic_seg' in results:
            assert self.seg_ignore_label is not None
            gt_seg = results['gt_semantic_seg']
            expand_gt_seg = np.full((int(h * ratio), int(w * ratio)),
                                    self.seg_ignore_label).astype(gt_seg.dtype)
            expand_gt_seg[top:top + h, left:left + w] = gt_seg
            results['gt_semantic_seg'] = expand_gt_seg
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += '(mean={}, to_rgb={}, ratio_range={}, ' \
                    'seg_ignore_label={})'.format(
                        self.mean, self.to_rgb, self.ratio_range,
                        self.seg_ignore_label)
        return repr_str


@PIPELINES.register_module
class MinIoURandomCrop(object):
    """Random crop the image & bboxes, the cropped patches have minimum IoU
    requirement with original image & bboxes, the IoU threshold is randomly
    selected from min_ious.

    Args:
        min_ious (tuple): minimum IoU threshold for all intersections with
        bounding boxes
        min_crop_size (float): minimum crop's size (i.e. h,w := a*h, a*w,
        where a >= min_crop_size).
    """

    def __init__(self, min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3):
        # 1: return ori img
        self.min_ious = min_ious  # kept for __repr__
        self.sample_mode = (1, *min_ious, 0)
        self.min_crop_size = min_crop_size

    def __call__(self, results):
        img, boxes, labels = [
            results[k] for k in ('img', 'gt_bboxes', 'gt_labels')
        ]
        h, w, c = img.shape
        while True:
            mode = random.choice(self.sample_mode)
            if mode == 1:
                return results

            min_iou = mode
            for i in range(50):
                new_w = random.uniform(self.min_crop_size * w, w)
                new_h = random.uniform(self.min_crop_size * h, h)

                # h / w in [0.5, 2]
                if new_h / new_w < 0.5 or new_h / new_w > 2:
                    continue

                left = random.uniform(w - new_w)
                top = random.uniform(h - new_h)

                patch = np.array(
                    (int(left), int(top), int(left + new_w), int(top + new_h)))
                overlaps = bbox_overlaps(
                    patch.reshape(-1, 4), boxes.reshape(-1, 4)).reshape(-1)
                if overlaps.min() < min_iou:
                    continue

                # center of boxes should inside the crop img
                center = (boxes[:, :2] + boxes[:, 2:]) / 2
                mask = ((center[:, 0] > patch[0]) * (center[:, 1] > patch[1]) *
                        (center[:, 0] < patch[2]) * (center[:, 1] < patch[3]))
                if not mask.any():
                    continue
                boxes = boxes[mask]
                labels = labels[mask]

                # adjust boxes
                img = img[patch[1]:patch[3], patch[0]:patch[2]]
                boxes[:, 2:] = boxes[:, 2:].clip(max=patch[2:])
                boxes[:, :2] = boxes[:, :2].clip(min=patch[:2])
                boxes -= np.tile(patch[:2], 2)

                results['img'] = img
                results['gt_bboxes'] = boxes
                results['gt_labels'] = labels

                if 'gt_masks' in results:
                    valid_masks = [
                        results['gt_masks'][i] for i in range(len(mask))
                        if mask[i]
                    ]
                    results['gt_masks'] = [
                        gt_mask[patch[1]:patch[3], patch[0]:patch[2]]
                        for gt_mask in valid_masks
                    ]

                # not tested
                if 'gt_semantic_seg' in results:
                    results['gt_semantic_seg'] = results['gt_semantic_seg'][
                        patch[1]:patch[3], patch[0]:patch[2]]
                return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += '(min_ious={}, min_crop_size={})'.format(
            self.min_ious, self.min_crop_size)
        return repr_str


@PIPELINES.register_module
class Corrupt(object):

    def __init__(self, corruption, severity=1):
        self.corruption = corruption
        self.severity = severity

    def __call__(self, results):
        results['img'] = corrupt(
            results['img'].astype(np.uint8),
            corruption_name=self.corruption,
            severity=self.severity)
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += '(corruption={}, severity={})'.format(
            self.corruption, self.severity)
        return repr_str


@PIPELINES.register_module
class Albu(object):

    def __init__(self,
                 transforms,
                 bbox_params=None,
                 keymap=None,
                 update_pad_shape=False,
                 skip_img_without_anno=False):
        """
        Adds custom transformations from Albumentations lib.
        Please, visit `https://albumentations.readthedocs.io`
        to get more information.

        transforms (list): list of albu transformations
        bbox_params (dict): bbox_params for albumentation `Compose`
        keymap (dict): contains {'input key':'albumentation-style key'}
        skip_img_without_anno (bool): whether to skip the image
                                      if no ann left after aug
        """

        self.transforms = transforms
        self.filter_lost_elements = False
        self.update_pad_shape = update_pad_shape
        self.skip_img_without_anno = skip_img_without_anno

        # A simple workaround to remove masks without boxes
        if (isinstance(bbox_params, dict) and 'label_fields' in bbox_params
                and 'filter_lost_elements' in bbox_params):
            self.filter_lost_elements = True
            self.origin_label_fields = bbox_params['label_fields']
            bbox_params['label_fields'] = ['idx_mapper']
            del bbox_params['filter_lost_elements']

        self.bbox_params = (
            self.albu_builder(bbox_params) if bbox_params else None)
        self.aug = Compose([self.albu_builder(t) for t in self.transforms],
                           bbox_params=self.bbox_params)

        if not keymap:
            self.keymap_to_albu = {
                'img': 'image',
                'gt_masks': 'masks',
                'gt_bboxes': 'bboxes'
            }
        else:
            self.keymap_to_albu = keymap
        self.keymap_back = {v: k for k, v in self.keymap_to_albu.items()}

    def albu_builder(self, cfg):
        """Import a module from albumentations.
        Inherits some of `build_from_cfg` logic.

        Args:
            cfg (dict): Config dict. It should at least contain the key "type".
        Returns:
            obj: The constructed object.
        """
        assert isinstance(cfg, dict) and "type" in cfg
        args = cfg.copy()

        obj_type = args.pop("type")
        if mmcv.is_str(obj_type):
            obj_cls = getattr(albumentations, obj_type)
        elif inspect.isclass(obj_type):
            obj_cls = obj_type
        else:
            raise TypeError(
                'type must be a str or valid type, but got {}'.format(
                    type(obj_type)))

        if 'transforms' in args:
            args['transforms'] = [
                self.albu_builder(transform)
                for transform in args['transforms']
            ]

        return obj_cls(**args)

    @staticmethod
    def mapper(d, keymap):
        """
        Dictionary mapper.
        Renames keys according to keymap provided.

        Args:
            d (dict): old dict
            keymap (dict): {'old_key':'new_key'}
        Returns:
            dict: new dict.
        """
        updated_dict = {}
        for k, v in d.items():
            new_k = keymap.get(k, k)
            updated_dict[new_k] = v
        return updated_dict

    def __call__(self, results):
        # dict to albumentations format
        results = self.mapper(results, self.keymap_to_albu)

        if 'bboxes' in results:
            # to list of boxes
            if isinstance(results['bboxes'], np.ndarray):
                results['bboxes'] = [x for x in results['bboxes']]
            # add pseudo-field for filtration
            if self.filter_lost_elements:
                results['idx_mapper'] = np.arange(len(results['bboxes']))

        results = self.aug(**results)

        if 'bboxes' in results:
            if isinstance(results['bboxes'], list):
                results['bboxes'] = np.array(
                    results['bboxes'], dtype=np.float32)

            # filter label_fields by the indices of the boxes that survived
            if self.filter_lost_elements:
                for label in self.origin_label_fields:
                    results[label] = np.array(
                        [results[label][i] for i in results['idx_mapper']])
                if 'masks' in results:
                    results['masks'] = [
                        results['masks'][i] for i in results['idx_mapper']
                    ]

                if (not len(results['idx_mapper'])
                        and self.skip_img_without_anno):
                    return None

        if 'gt_labels' in results:
            if isinstance(results['gt_labels'], list):
                results['gt_labels'] = np.array(results['gt_labels'])

        # back to the original format
        results = self.mapper(results, self.keymap_back)

        # update final shape
        if self.update_pad_shape:
            results['pad_shape'] = results['img'].shape

        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += '(transforms={})'.format(self.transforms)
        return repr_str
The code above integrates the Albumentations augmentation library into the MMDetection pipeline.

To use it, add an entry such as dict(type='Albu', transforms=[dict(type='RandomRotate90')]) to the pipeline; other Albumentations transforms are added the same way.
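
When the pipeline carries gt_bboxes, Albu should also be given bbox_params so that albumentations transforms the boxes together with the image. A minimal sketch (mirroring the bbox_params used in the full config later in this post):

dict(
    type='Albu',
    transforms=[dict(type='RandomRotate90')],
    bbox_params=dict(
        type='BboxParams',
        format='pascal_voc',        # MMDetection boxes are [x1, y1, x2, y2] in pixels
        label_fields=['gt_labels'],
        min_visibility=0.0,
        filter_lost_elements=True))

Also make sure Albu is exported from mmdet/datasets/pipelines/__init__.py: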

from .compose import Compose
from .formating import (Collect, ImageToTensor, ToDataContainer, ToTensor,
                        Transpose, to_tensor)
from .loading import LoadAnnotations, LoadImageFromFile, LoadProposals
from .test_aug import MultiScaleFlipAug
from .transforms import (Albu, Expand, MinIoURandomCrop, Normalize, Pad,
                         PhotoMetricDistortion, RandomCrop, RandomFlip, Resize,
                         SegRescale)


__all__ = [
    'Compose', 'to_tensor', 'ToTensor', 'ImageToTensor', 'ToDataContainer',
    'Transpose', 'Collect', 'LoadAnnotations', 'LoadImageFromFile',
    'LoadProposals', 'MultiScaleFlipAug', 'Resize', 'RandomFlip', 'Pad',
    'RandomCrop', 'Normalize', 'SegRescale', 'MinIoURandomCrop', 'Expand',
    'PhotoMetricDistortion', 'Albu'
]

Then reference it in the config's train_pipeline:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Albu', transforms=[dict(type='RandomRotate90')]),  # Albumentations augmentation
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

2、Built-in data augmentation in MMDetection

The source lives in mmdet/datasets/extra_aug.py and includes image augmentation methods such as RandomCrop, brightness, contrast, saturation, and ExtraAugmentation.
Augmentations are configured in train_pipeline or test_pipeline (normally the train pipeline is augmented while the test pipeline is not). For example, for RandomFlip, flip_ratio is the probability of a random flip:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

3、Bbox augmentation

Albumentations handles bbox augmentation as above. On the dataset side, the relevant source is in mmdet/datasets/custom.py:

def pre_pipeline(self, results):
    results['img_prefix'] = self.img_prefix
    results['seg_prefix'] = self.seg_prefix
    results['proposal_file'] = self.proposal_file
    results['bbox_fields'] = []
    results['mask_fields'] = []
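
pre_pipeline seeds the results dict before the transform pipeline runs. For context, a simplified sketch of how CustomDataset uses it (attribute names vary slightly across MMDetection versions):

def prepare_train_img(self, idx):
    img_info = self.img_infos[idx]
    ann_info = self.get_ann_info(idx)
    results = dict(img_info=img_info, ann_info=ann_info)
    self.pre_pipeline(results)      # adds path prefixes and empty *_fields lists
    return self.pipeline(results)   # runs the configured transform pipeline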

4、Data augmentation added via the config file

In configs/_base_/datasets/coco_detection.py, modify the data augmentation in train_pipeline (training only):

dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# add the Albumentations augmentations here
albu_train_transforms = [
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(
        type='OneOf',
        transforms=[
            dict(
                type='RGBShift',
                r_shift_limit=10,
                g_shift_limit=10,
                b_shift_limit=10,
                p=1.0),
            dict(
                type='HueSaturationValue',
                hue_shift_limit=20,
                sat_shift_limit=30,
                val_shift_limit=20,
                p=1.0)
        ],
        p=0.1),
    dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    # Reportedly, changing img_scale here enables multi-scale training,
    # but in practice it raised an error.
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_labels'],
            min_visibility=0.0,
            filter_lost_elements=True),
        keymap={
            'img': 'image',
            'gt_masks': 'masks',
            'gt_bboxes': 'bboxes'
        },
        update_pad_shape=False,
        skip_img_without_anno=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'],
        meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
                   'pad_shape', 'scale_factor'))
]
# test pipeline
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # Multi-scale test-time augmentation (TTA) is configured here. Note that
        # some models (e.g. cascade_mask_rcnn) do not support multi-scale TTA
        # and will raise an Unimplemented Error.
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
# Batch size, number of workers, and dataset paths.
# If the paths follow the layout above, nothing needs to change.
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')

5、Why do images and bboxes need data augmentation?

Answer: with more data, the model can learn as many of the invariances present in the images as possible, and the more invariances it learns, the stronger its generalization.

But why is a CNN's input not translation-invariant in the first place, and how can that be addressed? The original post links to a dedicated analysis of this question.

6、Albumentations library methods

Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to improve the quality of trained models; the goal is to create new training samples from existing data.

Below is an example of how some pixel-level augmentations from Albumentations can be applied to create new images from an original (the post's example images are omitted):
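
A sketch of what such usage can look like (the transform choices and parameter values are illustrative, not taken from the original post):

import albumentations as A
import cv2

# a small pixel-level augmentation pipeline
transform = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.GaussNoise(p=0.3),
    A.Blur(blur_limit=3, p=0.3),
])

image = cv2.imread('example.jpg')               # hypothetical input file
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Albumentations expects RGB
augmented = transform(image=image)['image']     # a new training sample

The shorter sketches below reuse A and image from this snippet.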

1、Pixel-level transforms

Blur transforms
__all__ = ["Blur", "MotionBlur", "GaussianBlur", "GlassBlur", "AdvancedBlur", "MedianBlur", "Defocus", "ZoomBlur"]

AdvancedBlur: advanced blur (randomized generalized-Gaussian kernel)
Blur: simple box blur
GaussianBlur: Gaussian blur
GlassBlur: glass blur
Defocus: defocus blur
MedianBlur: median blur
MotionBlur: motion blur
ZoomBlur: zoom blur

Noise transforms

GaussNoise: Gaussian noise
ISONoise: ISO (camera sensor) noise
MultiplicativeNoise: multiplicative noise

Image conversions

ToFloat: convert to float
ToGray: convert to grayscale
ToRGB: convert to RGB
ToSepia: convert to sepia

Random transforms

RandomFog: random fog

RandomFog: principle and applications

Principle
RandomFog augments image data by simulating a fog effect. Its main parameters:

fog_coef_range (tuple): range of the fog intensity coefficient, i.e. the interval of fog density; default (0.3, 1). A value is sampled from this range to generate the fog effect.
alpha_coef (float): transparency of the fog circles, in [0, 1]; default 0.08.
p (float): probability of applying the transform; default 0.5, i.e. a 50% chance that the image is fogged.

Applications
Autonomous driving: simulate driving scenes under different weather conditions to diversify the training set and improve robustness in complex real-world environments.
Computer vision: in recognition and detection tasks, simulated foggy images help models learn to recognize objects under varying visibility.
Image processing: usable as a photo effect in image editors.

Reference
The implementation can be traced to the following open-source library:

Automold--Road-Augmentation-Library

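A usage sketch (the keyword follows the fog_coef_range API described above; older Albumentations releases exposed fog_coef_lower/fog_coef_upper instead):

fog = A.RandomFog(fog_coef_range=(0.3, 1.0), alpha_coef=0.08, p=1.0)
foggy = fog(image=image)['image']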

RandomGamma: random gamma (adjusts image brightness)

Principle
RandomGamma adjusts image brightness by randomly choosing a gamma value within a specified range and applying gamma correction. Gamma correction can simulate various lighting conditions, which improves the model's generalization. Its main parameters:

gamma_limit (Union[int, tuple[int, int]]): range of the gamma adjustment. If gamma_limit is a single int, the range is interpreted as (-gamma_limit, gamma_limit). Default: (80, 120).
always_apply: deprecated; use p=1 instead.
p (float): probability of applying the transform; default 0.5, i.e. a 50% chance that gamma correction is applied.

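A usage sketch:

gamma = A.RandomGamma(gamma_limit=(80, 120), p=1.0)
corrected = gamma(image=image)['image']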

RandomGravel: random gravel

RandomGravel augments data by adding randomly generated gravel patches to the image. Its core parameters:

gravel_roi (tuple): coordinate range of the gravel region (top-left x, top-left y, bottom-right x, bottom-right y); coordinates must lie in [0, 1].
number_of_patches (int): number of gravel patches to generate.

The concrete steps are:

Define the gravel region: gravel_roi specifies where gravel is added, given by the top-left and bottom-right corners relative to the image.
Set the patch count: number_of_patches specifies how many gravel patches to generate.
Generate the gravel effect: gravel patches are generated at random positions inside the region and composited onto the original image.

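A usage sketch (argument names as documented above; the transform is only available in recent Albumentations releases):

gravel = A.RandomGravel(gravel_roi=(0.1, 0.4, 0.9, 0.9), number_of_patches=2, p=1.0)
graveled = gravel(image=image)['image']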

RandomRain: random rain
RandomShadow: random shadow
RandomSnow: random snow
RandomSunFlare: random sun flare
RandomToneCurve: random tone curve

CLAHE (Contrast Limited Adaptive Histogram Equalization)

Applies histogram equalization to small tiles of the image with a clip limit on contrast amplification, which boosts local contrast without over-amplifying noise.

Equalize

Equalizes the image histogram.

RandomBrightnessContrast: random brightness and contrast
Principle
The transform randomly adjusts the brightness and contrast of the image.

brightness_limit (tuple or float): range of the brightness adjustment. A single float means the range (-brightness_limit, brightness_limit).
contrast_limit (tuple or float): range of the contrast adjustment. A single float means the range (-contrast_limit, contrast_limit).
brightness_by_max (bool): whether to adjust brightness relative to the maximum pixel value.
p (float): probability of applying the transform.

The concrete steps:

Brightness: add a randomly chosen brightness offset to each pixel value.
Contrast: multiply each pixel value by a randomly chosen contrast factor.
Probability: decide according to p whether the transform is applied at all.

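A usage sketch:

bc = A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=1.0)
adjusted = bc(image=image)['image']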

Emboss: emboss

A morphology-based augmentation: it embosses the input image and overlays the result on the original, i.e. it applies an emboss operation of some strength and blends the result back into the image through an alpha channel. The idea is the same as sharpen-and-blend, and it is a decent augmentation method.

Sharpen: sharpen
ChannelDropout: randomly drop channels
ChannelShuffle: shuffle channels
InvertImg: invert the image
Solarize: solarize (invert pixels above a threshold)
Superpixels: superpixels
HistogramMatching: histogram matching
ImageCompression: image compression

2、Spatial-level transforms

Spatial-level transforms change the input image together with additional targets such as masks, bounding boxes, and keypoints. The original post includes a table of which additional targets each transform supports (table omitted).
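
Because boxes, masks, and keypoints must follow the image, spatial transforms are composed with target parameters such as BboxParams. A sketch (box coordinates and labels are illustrative):

flip_and_shift = A.Compose(
    [A.HorizontalFlip(p=0.5),
     A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=15, p=0.5)],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

out = flip_and_shift(image=image, bboxes=[[30, 40, 200, 220]], labels=[1])
# out['bboxes'] holds the transformed boxes, aligned with out['image']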

Flip

VerticalFlip, HorizontalFlip, Flip

Resize
"RandomScale", "LongestMaxSize", "SmallestMaxSize", "Resize"
Rotate
"Rotate", "RandomRotate90", "SafeRotate"
ShiftScaleRotate
Affine - affine transform
PiecewiseAffine - piecewise affine transform
ElasticTransform - elastic transform
Perspective - perspective transform
Transpose - transpose
OpticalDistortion - optical distortion
GridDistortion - grid distortion
PadIfNeeded - pad if needed
CoarseDropout
GridDropout
ChannelDropout
Crop
RandomCrop - random crop
CenterCrop - center crop
Crop - crop
CropNonEmptyMaskIfExists - crop around a non-empty mask region if one exists
RandomSizedCrop - random-sized crop
RandomResizedCrop - random resized crop
RandomCropNearBBox - random crop near a bounding box
RandomSizedBBoxSafeCrop - random-sized crop that keeps bboxes intact (see the sketch after this list)
CropAndPad - crop and pad
RandomCropFromBorders - random crop from the borders
BBoxSafeRandomCrop - bbox-safe random crop
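
For example, a bbox-safe crop can be sketched as follows (sizes are illustrative):

safe_crop = A.Compose(
    [A.RandomSizedBBoxSafeCrop(height=512, width=512, p=1.0)],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

out = safe_crop(image=image, bboxes=[[30, 40, 200, 220]], labels=[1])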

3、Mixing-level transforms

Mixing-level transforms build a new training sample by combining several images (and their targets), e.g. MixUp-style blending; see the Albumentations documentation for the current list of mixing transforms.
