(With Code) A Complete Guide to Data Augmentation for Semantic Segmentation: Detailed Implementations from Basic to Advanced

深度学习浪人

Last modified: 2024-12-29 11:36:42

Category: Semantic Segmentation. Tags: Artificial Intelligence

First published: 2024-12-28 10:54:59

Article link: https://blog.csdn.net/2201_76033400/article/details/144784286

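Preface

Data augmentation is an effective way to enlarge a dataset and improve a model's ability to generalize. This article walks through augmentation methods for semantic segmentation, from basic to advanced, giving the code for each method together with its implementation details and purpose.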

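1. Why Augment Data, and How the Methods Are Grouped

1.1 Why is data augmentation needed?

Semantic segmentation classifies every pixel, so it needs highly diverse training data. Collecting large segmentation datasets is expensive, however. Data augmentation helps in three ways:

It widens the data distribution by simulating more real-world conditions (changes in lighting, rotation, noise, and so on).
It makes the model more robust and less dependent on one particular distribution.
It guards against overfitting, especially when data is limited.

1.2 Types of data augmentation

Augmentation methods fall into two groups:

Basic methods: simple and cheap, such as random flipping, translation, rotation and brightness adjustment.
Advanced methods: more complex and more varied, such as Mosaic, MixUp and Copy-Paste.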
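Another way to read this split is by what each method touches: geometric operations have to be applied identically to the image and its mask, while photometric operations change only the image. The following is a minimal sketch of chaining such steps, assuming every step takes and returns an (img, mask) pair. The names `hflip`, `brightness` and `apply_pipeline` are hypothetical helpers for illustration, not part of the article's code.

```python
import random
from PIL import Image, ImageEnhance

def hflip(img, mask):
    # Geometric step: the identical flip is applied to image and mask.
    return (img.transpose(Image.FLIP_LEFT_RIGHT),
            mask.transpose(Image.FLIP_LEFT_RIGHT))

def brightness(img, mask, factor=1.2):
    # Photometric step: only the image changes, the labels stay as they are.
    return ImageEnhance.Brightness(img).enhance(factor), mask

def apply_pipeline(img, mask, steps, p=0.5, rng=random):
    """Apply each step with probability p (hypothetical helper)."""
    for step in steps:
        if rng.random() < p:
            img, mask = step(img, mask)
    return img, mask

# Tiny synthetic pair: a grey image and a mask with one labelled pixel.
img = Image.new('RGB', (4, 2), (100, 100, 100))
mask = Image.new('L', (4, 2), 0)
mask.putpixel((0, 0), 1)
out_img, out_mask = apply_pipeline(img, mask, [hflip, brightness], p=1.0)
```

With `p=1.0` both steps always run, so the labelled pixel moves from the left edge to the right edge while the image gets brighter.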

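2. Basic Augmentation Methods

2.1 Random Flip

Purpose: randomly flip the image and mask horizontally or vertically so that the model adapts to objects in different orientations.

# Imports shared by the snippets in this article. The functions are
# methods of a dataset class, which is why they take `self`.
import random
import numpy as np
from PIL import Image, ImageChops, ImageDraw, ImageEnhance

def random_flip(self, img, mask):
    """
    Randomly flip the image and mask horizontally or vertically.
    """
    if random.random() < 0.8:  # flip with 80% probability
        flip_type = random.choice(['horizontal', 'vertical'])
        if flip_type == 'horizontal':
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
            mask = mask.transpose(Image.FLIP_LEFT_RIGHT)
        elif flip_type == 'vertical':
            img = img.transpose(Image.FLIP_TOP_BOTTOM)
            mask = mask.transpose(Image.FLIP_TOP_BOTTOM)
    return img, mask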

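2.2 Random Rotation

Purpose: rotate the image by a randomly chosen angle (here 90° or 180°) to make the model more robust to changes in orientation.

def random_rotation(self, img, mask):
    """
    Randomly rotate the image and mask by 90° or 180°.
    """
    if random.random() < 0.8:  # rotate with 80% probability
        angle = random.choice([90, 180])
        # expand=True enlarges the canvas, so a 90° turn swaps width and height
        img = img.rotate(angle, expand=True)
        mask = mask.rotate(angle, expand=True)
    return img, mask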
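Rotations by multiples of 90° map pixels exactly onto pixels. For arbitrary angles two extra decisions appear: how to resample, and what to put into the corners that the rotation exposes. The sketch below is one possible way to handle both, assuming label 255 is treated as an "ignore" class by the loss. `IGNORE_LABEL` and `rotate_pair` are names introduced here for illustration.

```python
from PIL import Image

IGNORE_LABEL = 255  # assumed "ignore" class id; adjust to your loss function

def rotate_pair(img, mask, angle):
    """Rotate image and mask by the same angle (degrees, counter-clockwise)."""
    # Bilinear interpolation is fine for the image ...
    img_r = img.rotate(angle, resample=Image.BILINEAR, fillcolor=(0, 0, 0))
    # ... but the mask must use NEAREST so that no new class ids are invented.
    mask_r = mask.rotate(angle, resample=Image.NEAREST, fillcolor=IGNORE_LABEL)
    return img_r, mask_r

img = Image.new('RGB', (64, 64), (200, 50, 50))
mask = Image.new('L', (64, 64), 0)
mask.paste(3, (16, 16, 48, 48))   # a square of class 3 on background 0
img_r, mask_r = rotate_pair(img, mask, 30)
labels = set(mask_r.tobytes())    # the class ids present after rotation
```

After the rotation the mask contains only the original class ids plus the fill value, which is the property that interpolating resamplers would break.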

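2.3 Random Translation

Purpose: simulate changes in object position so that the model is more robust to where objects appear.

def random_translation(self, img, mask):
    """
    Randomly translate the image and mask.
    """
    if random.random() < 0.25:  # translate with 25% probability
        max_shift = 10  # maximum shift in pixels
        x_shift = random.randint(-max_shift, max_shift)
        y_shift = random.randint(-max_shift, max_shift)
        # Note: ImageChops.offset wraps around, so pixels pushed off one
        # edge reappear on the opposite edge.
        img = ImageChops.offset(img, x_shift, y_shift)
        mask = ImageChops.offset(mask, x_shift, y_shift)
    return img, mask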
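`ImageChops.offset` wraps pixels around the image border. If wrapped content is not wanted, a translation can also be written as an affine transform with an explicit fill value. This is a minimal sketch of that alternative, not the article's method; `translate_pair` and `IGNORE_LABEL` are hypothetical names, and filling the mask with 255 assumes that value is ignored by the loss.

```python
from PIL import Image

IGNORE_LABEL = 255  # assumed "ignore" class id

def translate_pair(img, mask, dx, dy):
    """Shift content by (dx, dy) pixels without wrap-around."""
    # Image.transform maps each OUTPUT pixel (x, y) to the INPUT location
    # (x - dx, y - dy), hence the negated offsets in the affine coefficients.
    coeffs = (1, 0, -dx, 0, 1, -dy)
    img_t = img.transform(img.size, Image.AFFINE, coeffs,
                          resample=Image.NEAREST, fillcolor=(0, 0, 0))
    mask_t = mask.transform(mask.size, Image.AFFINE, coeffs,
                            resample=Image.NEAREST, fillcolor=IGNORE_LABEL)
    return img_t, mask_t

img = Image.new('RGB', (4, 4), (10, 20, 30))
mask = Image.new('L', (4, 4), 0)
mask.putpixel((0, 0), 7)                 # one labelled pixel, top-left
img_t, mask_t = translate_pair(img, mask, dx=1, dy=0)
```

The labelled pixel moves one column to the right, and the column that was uncovered on the left is filled with the ignore label instead of wrapped content.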

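2.4 Random Brightness Adjustment

Purpose: vary image brightness to make the model more robust to different lighting conditions.

def random_brightness(self, img, mask):
    """
    Randomly adjust brightness. The mask is returned unchanged.
    """
    enhancer = ImageEnhance.Brightness(img)
    # factor 1.0 keeps the image as is; < 1 darkens, > 1 brightens
    img_enhanced = enhancer.enhance(random.uniform(0.5, 1.5))
    return img_enhanced, mask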

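2.5 Random Contrast Adjustment

Purpose: vary image contrast so that the model adapts to contrast differences between scenes.

def random_contrast(self, img, mask):
    """
    Randomly adjust contrast. The mask is returned unchanged.
    """
    enhancer = ImageEnhance.Contrast(img)
    # factor 1.0 keeps the image as is; < 1 lowers contrast, > 1 raises it
    img_enhanced = enhancer.enhance(random.uniform(0.5, 1.5))
    return img_enhanced, mask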

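2.6 Random Salt-and-Pepper Noise

Purpose: simulate sensor noise or low-quality data to make the model more robust to noise.

def salt_pepper_noise(self, img):
    """
    Add salt-and-pepper noise to an image (the mask is not involved).
    """
    np_img = np.array(img)
    row, col, ch = np_img.shape  # expects an H x W x C array
    s_vs_p = 0.5  # share of salt among the noisy pixels
    amount = random.uniform(0.04, 0.08)  # fraction of pixels to corrupt
    # Count pixels (row * col), not array elements, so that `amount`
    # really is the fraction of corrupted pixels.
    num_salt = int(amount * row * col * s_vs_p)
    num_pepper = int(amount * row * col * (1 - s_vs_p))

    # Salt noise: set random pixels to white. The upper bound of
    # np.random.randint is exclusive, so use i rather than i - 1.
    coords = [np.random.randint(0, i, num_salt) for i in (row, col)]
    np_img[coords[0], coords[1], :] = 255

    # Pepper noise: set random pixels to black
    coords = [np.random.randint(0, i, num_pepper) for i in (row, col)]
    np_img[coords[0], coords[1], :] = 0

    return Image.fromarray(np.uint8(np_img))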

3. Advanced Augmentation Methods

3.1 Mosaic

Principle: crop a random region from each of four images and tile the crops into one new image and mask.

Purpose: simulate scenes with complex object layouts and widen the variety of the data.

def mosaic_augmentation(self, idx, img, mask):
    """
    Mosaic augmentation.
    """
    # self.ids, self.images_dir, self.mask_dir, self.mask_suffix and
    # load_image belong to the surrounding dataset code (not shown here).
    # Pick three other samples at random
    indices = list(range(len(self.ids)))
    indices.remove(idx)
    mosaic_indices = random.sample(indices, 3)

    img_files = [img]
    mask_files = [mask]
    for i in mosaic_indices:
        name = self.ids[i]
        img_file = list(self.images_dir.glob(name + '.*'))
        mask_file = list(self.mask_dir.glob(name + self.mask_suffix + '.*'))
        img_files.append(load_image(img_file[0]))
        mask_files.append(load_image(mask_file[0]))

    # Crop a random half-width, half-height window from every pair,
    # using the same box for the image and its mask
    imgs_cropped = []
    masks_cropped = []
    for im, ma in zip(img_files, mask_files):
        w, h = im.size
        left = random.randint(0, w // 2)
        upper = random.randint(0, h // 2)
        right = left + w // 2
        lower = upper + h // 2
        imgs_cropped.append(im.crop((left, upper, right, lower)))
        masks_cropped.append(ma.crop((left, upper, right, lower)))

    # Tile the four crops in a 2 x 2 grid. The layout assumes all four
    # source images have the same size.
    new_w = imgs_cropped[0].width + imgs_cropped[1].width
    new_h = imgs_cropped[0].height + imgs_cropped[2].height
    img_mosaic = Image.new('RGB', (new_w, new_h))
    mask_mosaic = Image.new('L', (new_w, new_h))

    img_mosaic.paste(imgs_cropped[0], (0, 0))
    img_mosaic.paste(imgs_cropped[1], (imgs_cropped[0].width, 0))
    img_mosaic.paste(imgs_cropped[2], (0, imgs_cropped[0].height))
    img_mosaic.paste(imgs_cropped[3], (imgs_cropped[0].width, imgs_cropped[0].height))

    mask_mosaic.paste(masks_cropped[0], (0, 0))
    mask_mosaic.paste(masks_cropped[1], (masks_cropped[0].width, 0))
    mask_mosaic.paste(masks_cropped[2], (0, masks_cropped[0].height))
    mask_mosaic.paste(masks_cropped[3], (masks_cropped[0].width, masks_cropped[0].height))

    return img_mosaic, mask_mosaic
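The method above depends on the dataset class for loading files. To try the tiling logic on its own, here is a minimal in-memory sketch that takes four already loaded pairs. It assumes all four images have the same size; `mosaic_from_pairs` is a hypothetical helper name.

```python
import random
from PIL import Image

def mosaic_from_pairs(imgs, masks, rng=random):
    """Build a 2x2 mosaic from four equally sized (image, mask) pairs."""
    w, h = imgs[0].size
    cw, ch = w // 2, h // 2                      # size of each tile
    img_out = Image.new('RGB', (2 * cw, 2 * ch))
    mask_out = Image.new('L', (2 * cw, 2 * ch))
    corners = [(0, 0), (cw, 0), (0, ch), (cw, ch)]
    for im, ma, (ox, oy) in zip(imgs, masks, corners):
        left = rng.randint(0, w - cw)
        upper = rng.randint(0, h - ch)
        box = (left, upper, left + cw, upper + ch)
        img_out.paste(im.crop(box), (ox, oy))    # same box for image and mask
        mask_out.paste(ma.crop(box), (ox, oy))
    return img_out, mask_out

# Four flat-coloured images whose masks carry the class ids 1..4.
imgs = [Image.new('RGB', (8, 8), (40 * k, 0, 0)) for k in range(1, 5)]
masks = [Image.new('L', (8, 8), k) for k in range(1, 5)]
img_m, mask_m = mosaic_from_pairs(imgs, masks)
```

Because each synthetic mask holds a single class id, it is easy to see that every quadrant of the result comes from a different source pair.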

3.2 Vertical Concatenation

Purpose: take the top half of one image and the bottom half of another and stack them, adding variety along the vertical direction.

def vertical_concat_augmentation(self, idx, img, mask):
    """
    Vertical concatenation.
    """
    # Pick another sample at random
    indices = list(range(len(self.ids)))
    indices.remove(idx)
    i = random.choice(indices)

    name = self.ids[i]
    img_file = list(self.images_dir.glob(name + '.*'))
    mask_file = list(self.mask_dir.glob(name + self.mask_suffix + '.*'))
    img2 = load_image(img_file[0])
    mask2 = load_image(mask_file[0])

    # Top half of the first pair, bottom half of the second pair.
    # Assumes both images have the same size.
    w, h = img.size
    img_cropped1 = img.crop((0, 0, w, h // 2))
    mask_cropped1 = mask.crop((0, 0, w, h // 2))
    img_cropped2 = img2.crop((0, h // 2, w, h))
    mask_cropped2 = mask2.crop((0, h // 2, w, h))

    new_h = img_cropped1.height + img_cropped2.height
    img_concat = Image.new('RGB', (w, new_h))
    mask_concat = Image.new('L', (w, new_h))

    img_concat.paste(img_cropped1, (0, 0))
    img_concat.paste(img_cropped2, (0, img_cropped1.height))
    mask_concat.paste(mask_cropped1, (0, 0))
    mask_concat.paste(mask_cropped2, (0, mask_cropped1.height))

    return img_concat, mask_concat

3.3 Horizontal Concatenation

Purpose: take the left half of one image and the right half of another and join them side by side, adding variety along the horizontal direction.

def horizontal_concat_augmentation(self, idx, img, mask):
    """
    Horizontal concatenation.
    """
    # Pick another sample at random
    indices = list(range(len(self.ids)))
    indices.remove(idx)
    i = random.choice(indices)

    name = self.ids[i]
    img_file = list(self.images_dir.glob(name + '.*'))
    mask_file = list(self.mask_dir.glob(name + self.mask_suffix + '.*'))
    img2 = load_image(img_file[0])
    mask2 = load_image(mask_file[0])

    # Left half of the first pair, right half of the second pair.
    # Assumes both images have the same size.
    w, h = img.size
    img_cropped1 = img.crop((0, 0, w // 2, h))
    mask_cropped1 = mask.crop((0, 0, w // 2, h))
    img_cropped2 = img2.crop((w // 2, 0, w, h))
    mask_cropped2 = mask2.crop((w // 2, 0, w, h))

    new_w = img_cropped1.width + img_cropped2.width
    img_concat = Image.new('RGB', (new_w, h))
    mask_concat = Image.new('L', (new_w, h))

    img_concat.paste(img_cropped1, (0, 0))
    img_concat.paste(img_cropped2, (img_cropped1.width, 0))
    mask_concat.paste(mask_cropped1, (0, 0))
    mask_concat.paste(mask_cropped2, (mask_cropped1.width, 0))

    return img_concat, mask_concat

3.4 Irregular Stitching

Purpose: combine complementary, irregularly shaped regions from two images, adding variety in object position and shape.

def irregular_stitch_augmentation(self, idx, img, mask):
    """
    Irregular stitching.
    """
    # Pick another sample at random
    indices = list(range(len(self.ids)))
    indices.remove(idx)
    i = random.choice(indices)

    name = self.ids[i]
    img_file = list(self.images_dir.glob(name + '.*'))
    mask_file = list(self.mask_dir.glob(name + self.mask_suffix + '.*'))
    img2 = load_image(img_file[0])
    mask2 = load_image(mask_file[0])

    w, h = img.size
    initial_right = w // 2
    initial_lower = h // 2

    # Selection mask: 255 where the first image is kept, 0 where the
    # second image is used. Start with the top-left quadrant.
    mask_region = Image.new('L', (w, h), 0)
    mask_draw = ImageDraw.Draw(mask_region)
    mask_draw.rectangle([0, 0, initial_right, initial_lower], fill=255)

    # Pick a start point on the bottom or right edge of that quadrant
    if random.random() < 0.5:
        x = random.randint(0, initial_right - 1)
        y = initial_lower - 1
    else:
        x = initial_right - 1
        y = random.randint(0, initial_lower - 1)

    # Grow a second rectangle from the start point, clipped to the image
    dx = random.randint(0, w // 2)
    dy = random.randint(0, h // 2)
    extension_left = x
    extension_upper = y
    extension_right = min(x + dx, w)
    extension_lower = min(y + dy, h)

    mask_draw.rectangle([extension_left, extension_upper, extension_right, extension_lower], fill=255)

    # Image.composite needs both inputs to share size and mode
    img_combined = Image.composite(img, img2, mask_region)
    mask_combined = Image.composite(mask, mask2, mask_region)

    return img_combined, mask_combined

3.5 MixUp

Principle

MixUp blends two images pixel by pixel using a weight λ. Segmentation masks store class ids, which cannot be averaged, so the implementation below takes each pixel's label from the first mask with probability λ and from the second mask otherwise. The method adds sample diversity and can improve model robustness.

Purpose

Improves generalization, particularly in smooth transition regions around boundaries.
Reduces reliance on hard boundary labels in the data.

Implementation:

def mixup_augmentation(self, idx, img, mask):
    """
    MixUp augmentation.
    """
    # Pick another sample from the dataset at random
    indices = list(range(len(self.ids)))
    indices.remove(idx)
    mixup_idx = random.choice(indices)

    # Load the second image and mask
    name = self.ids[mixup_idx]
    img_file = list(self.images_dir.glob(name + '.*'))
    mask_file = list(self.mask_dir.glob(name + self.mask_suffix + '.*'))
    img2 = load_image(img_file[0])
    mask2 = load_image(mask_file[0])

    # Make sure both images and masks have the same size
    if img.size != img2.size:
        img2 = img2.resize(img.size, resample=Image.BICUBIC)
        mask2 = mask2.resize(mask.size, resample=Image.NEAREST)

    # Mixing weight λ, drawn from a Beta(5, 5) distribution
    lam = np.random.beta(5.0, 5.0)

    # Weighted blend of the two images
    img1_np = np.array(img).astype(np.float32)
    img2_np = np.array(img2).astype(np.float32)
    mixed_img_np = lam * img1_np + (1 - lam) * img2_np
    mixed_img = Image.fromarray(mixed_img_np.astype(np.uint8))

    # Per-pixel label choice: mask 1 with probability λ, otherwise mask 2
    mask1_np = np.array(mask).astype(np.int32)
    mask2_np = np.array(mask2).astype(np.int32)
    mixed_mask_np = np.where(np.random.rand(*mask1_np.shape) < lam, mask1_np, mask2_np)
    mixed_mask = Image.fromarray(mixed_mask_np.astype(np.uint8))

    return mixed_img, mixed_mask
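The code above keeps a hard label per pixel by sampling. A common alternative is to mix the labels in exactly the same proportion as the images, which requires one-hot (soft) targets instead of a class-id mask. The sketch below illustrates that variant on NumPy arrays. It is not the article's implementation; `mixup_soft_labels` is a hypothetical name, and a loss that accepts per-pixel class probabilities is assumed.

```python
import numpy as np

def mixup_soft_labels(img1, img2, mask1, mask2, num_classes, lam):
    """Blend two images and their one-hot masks with the same weight lam."""
    mixed_img = lam * img1.astype(np.float32) + (1.0 - lam) * img2.astype(np.float32)
    eye = np.eye(num_classes, dtype=np.float32)
    onehot1 = eye[mask1]          # (H, W) class ids -> (H, W, C) one-hot
    onehot2 = eye[mask2]
    mixed_target = lam * onehot1 + (1.0 - lam) * onehot2
    return mixed_img, mixed_target

rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, size=(4, 4, 3)).astype(np.uint8)
img2 = rng.integers(0, 256, size=(4, 4, 3)).astype(np.uint8)
mask1 = np.zeros((4, 4), dtype=np.int64)       # all background (class 0)
mask2 = np.ones((4, 4), dtype=np.int64)        # all class 1
lam = 0.7
mixed_img, target = mixup_soft_labels(img1, img2, mask1, mask2,
                                      num_classes=2, lam=lam)
```

Each pixel of `target` is a probability vector that sums to 1, here 0.7 for class 0 and 0.3 for class 1, mirroring the weights used for the image blend.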

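3.6 Random HSV Adjustment

Principle

Adjust hue, saturation and value in HSV color space to generate augmented data with different colors.

Purpose

Makes the model more robust to color changes.
Simulates different lighting and color conditions, improving generalization.

Implementation:

def random_hsv_augmentation(self, img, mask):
    """
    Random HSV adjustment. The mask is returned unchanged.
    """
    # Random adjustment factors for hue, saturation and value
    hue_factor = random.uniform(-0.1, 0.1)  # hue shift range
    saturation_factor = random.uniform(0.8, 1.2)  # saturation scale range
    brightness_factor = random.uniform(0.8, 1.2)  # value scale range

    # Convert to HSV mode (Pillow stores each channel as 0..255)
    img = img.convert('HSV')
    np_img = np.array(img).astype(np.float32)

    # Shift hue. Hue is cyclic over 256 levels, so wrap with % 256.
    np_img[..., 0] = (np_img[..., 0] + hue_factor * 255) % 256
    # Scale saturation
    np_img[..., 1] = np.clip(np_img[..., 1] * saturation_factor, 0, 255)
    # Scale value
    np_img[..., 2] = np.clip(np_img[..., 2] * brightness_factor, 0, 255)

    # Convert back to RGB mode
    np_img = np_img.astype(np.uint8)
    img = Image.fromarray(np_img, mode='HSV').convert('RGB')

    return img, mask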
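To see what a hue shift does, it helps to apply a large one to a single known color. The following sketch rotates hue by a fraction of the color wheel using Pillow's 8-bit HSV representation, in which hue wraps over 256 levels. `shift_hue` is a hypothetical helper, and the one-third shift is chosen only to make the effect obvious; the augmentation itself uses far smaller shifts.

```python
import numpy as np
from PIL import Image

def shift_hue(img, fraction):
    """Rotate the hue of an RGB image by `fraction` of the colour wheel."""
    h, s, v = img.convert('HSV').split()
    shift = int(round(fraction * 256))
    # Hue is cyclic, so wrap around instead of clipping.
    h_np = (np.asarray(h).astype(np.int32) + shift) % 256
    h = Image.fromarray(h_np.astype(np.uint8))   # 2-D uint8 array -> mode 'L'
    return Image.merge('HSV', (h, s, v)).convert('RGB')

red = Image.new('RGB', (2, 2), (255, 0, 0))
shifted = shift_hue(red, 1 / 3)                  # a third of the wheel
r, g, b = shifted.getpixel((0, 0))
```

Rotating pure red by a third of the wheel lands in the green region, so the green channel dominates in the result.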

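3.7 Random Perspective Transform

Principle

A perspective transform is a projective transform, which generalizes the affine transform. Here the four corner points of the image are shifted by random offsets to imitate pictures taken from different viewpoints.

Purpose

Makes the model more robust to viewpoint changes.
Simulates perspective distortion in images.

Implementation:

def random_perspective_augmentation(self, img, mask):
    """
    Random perspective transform.
    """
    # Image size
    width, height = img.size
    max_shift = 0.2  # maximum corner offset, as a fraction of the width

    # The four original corners
    original_corners = [(0, 0), (width, 0), (width, height), (0, height)]

    # The four corners after random displacement
    # (x and y offsets are both scaled by the image width)
    displacement = lambda: random.uniform(-max_shift, max_shift) * width
    distorted_corners = [
        (displacement(), displacement()),
        (width + displacement(), displacement()),
        (width + displacement(), height + displacement()),
        (displacement(), height + displacement())
    ]

    # Compute the perspective coefficients
    coeffs = self.find_perspective_transform(original_corners, distorted_corners)

    # Apply the same transform to both; NEAREST keeps mask labels intact
    img = img.transform(img.size, Image.PERSPECTIVE, coeffs, Image.BICUBIC)
    mask = mask.transform(mask.size, Image.PERSPECTIVE, coeffs, Image.NEAREST)

    return img, mask

@staticmethod
def find_perspective_transform(pa, pb):
    """
    Compute the 8 perspective coefficients for Image.transform.

    pa: original corners, pb: displaced corners. The coefficients map
    output (displaced) coordinates back to input (original) coordinates,
    which is the direction Image.transform expects.
    """
    matrix = []
    for p1, p2 in zip(pb, pa):
        matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0] * p1[0], -p2[0] * p1[1]])
        matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1] * p1[0], -p2[1] * p1[1]])
    A = np.array(matrix, dtype=np.float64)  # float64 for a stable solve
    B = np.array(pa, dtype=np.float64).reshape(8)
    res = np.linalg.lstsq(A, B, rcond=None)[0]
    return res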
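Pillow's `Image.PERSPECTIVE` transform takes eight coefficients (a, b, c, d, e, f, g, h) and, for every output pixel (x, y), samples the input at ((ax + by + c) / (gx + hy + 1), (dx + ey + f) / (gx + hy + 1)). The coefficients therefore have to map the displaced corners back onto the original ones. The sketch below solves the same linear system as the helper above and checks that direction numerically. `perspective_coeffs` and `apply_coeffs` are names introduced for illustration, and the corner values are arbitrary.

```python
import numpy as np

def perspective_coeffs(src_pts, dst_pts):
    """Return 8 coefficients mapping each point in dst_pts onto src_pts."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(dst_pts, src_pts):
        # u = (a*x + b*y + c) / (g*x + h*y + 1), rearranged to be linear
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        # v = (d*x + e*y + f) / (g*x + h*y + 1), rearranged to be linear
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    return np.linalg.solve(np.array(rows, dtype=np.float64),
                           np.array(rhs, dtype=np.float64))

def apply_coeffs(coeffs, x, y):
    """Evaluate the projective mapping at one point."""
    a, b, c, d, e, f, g, h = coeffs
    denom = g * x + h * y + 1.0
    return (a * x + b * y + c) / denom, (d * x + e * y + f) / denom

src = [(0, 0), (100, 0), (100, 80), (0, 80)]      # original corners
dst = [(5, -3), (92, 6), (108, 85), (-4, 74)]     # displaced corners
coeffs = perspective_coeffs(src, dst)
mapped = [apply_coeffs(coeffs, x, y) for x, y in dst]
```

With four point pairs the system is exactly determined, so a direct solve is enough; a least-squares solve gives the same answer and also copes with more than four pairs.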

3.8 Copy-Paste

Principle

Copy-Paste copies an object region from one image and pastes it into another image, merging the masks accordingly.

Purpose

Adds variety and complexity to object regions.
Simulates real scenes in which several objects appear together.

Implementation:

def copy_paste_augmentation(self, idx, img, mask):
    """
    Copy-Paste augmentation.
    """
    # Pick another sample from the dataset at random
    indices = list(range(len(self.ids)))
    indices.remove(idx)
    cp_idx = random.choice(indices)

    # Load the source image and mask
    name = self.ids[cp_idx]
    img_file = list(self.images_dir.glob(name + '.*'))
    mask_file = list(self.mask_dir.glob(name + self.mask_suffix + '.*'))
    img_cp = load_image(img_file[0])
    mask_cp = load_image(mask_file[0])

    # Make sure the sizes match
    if img.size != img_cp.size:
        img_cp = img_cp.resize(img.size, resample=Image.BICUBIC)
        mask_cp = mask_cp.resize(mask.size, resample=Image.NEAREST)

    # Pick one object class at random (non-zero values in the mask)
    mask_np = np.array(mask_cp)
    obj_ids = np.unique(mask_np[mask_np != 0])  # non-background classes
    if len(obj_ids) == 0:
        return img, mask  # nothing to copy; return the inputs unchanged
    obj_id = int(random.choice(obj_ids))
    obj_mask = (mask_np == obj_id).astype(np.uint8) * 255
    obj_mask_img = Image.fromarray(obj_mask)

    # Crop the object and its binary mask to the object's bounding box
    bbox = obj_mask_img.getbbox()
    obj_img = img_cp.crop(bbox)
    obj_alpha = obj_mask_img.crop(bbox)

    # Random paste position that keeps the object inside the image
    paste_x = random.randint(0, img.width - obj_img.width)
    paste_y = random.randint(0, img.height - obj_img.height)

    # Paste onto copies so the caller's image and mask are not modified.
    # The mask receives the class id (not 255) at the pasted pixels.
    img, mask = img.copy(), mask.copy()
    img.paste(obj_img, (paste_x, paste_y), obj_alpha)
    mask.paste(obj_id, (paste_x, paste_y), obj_alpha)

    return img, mask