[YOLO v4 Theory] Data augmentation: MixUp, Random Erasing, CutOut, CutMix, Mosaic

满船清梦压星河HK

Last modified 2022-04-09 10:15:49


Category: Theory. Tags: data augmentation, image classification, object detection, YOLO v4

First published 2021-05-12 15:37:58

Article link: https://blog.csdn.net/qq_38253797/article/details/116668074


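I. Image cropping methods

1.1 MixUp

Source:

https://arxiv.org/pdf/1710.09412.pdf

1.1.1 Background

Mixup is a data augmentation algorithm proposed by MIT and FAIR in a paper published at ICLR 2018. Before introducing mixup, two concepts are worth a brief look: empirical risk minimization (ERM) and vicinal risk minimization (VRM).

Empirical risk minimization (ERM): this is the principle behind most network optimization today. The learner is trained to minimize its error (risk) on the known empirical data, i.e. the training samples. That error is called the "empirical error" or "training error". By contrast, the error on new, unseen samples is the "generalization error", and we obviously want it to be as small as possible. Since we usually do not know in advance what new samples will look like, what we can do in practice is make the empirical error as small as possible. Pushing the empirical error too low, however, often gives poor results on unseen samples, which is what we call overfitting.

Generalization (here, how well the model performs on the validation set) can usually be improved by training on a larger dataset, but collecting labeled data at scale costs a great deal of human effort, and in some cases the data cannot be obtained at all. An effective way around this is vicinal risk minimization (VRM): use prior knowledge to construct a neighborhood (vicinity) around each training sample and draw extra training samples from it. The usual way to do this is conventional data augmentation such as adding noise, flipping, and scaling, but that depends heavily on the specific dataset and on human prior knowledge.

Mixup is a generic vicinal distribution that is not tied to any particular dataset.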
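As a rough sketch of the two principles in formulas: ERM averages the loss over the $n$ original training pairs, while VRM averages it over $m$ virtual pairs drawn from a neighborhood of the training pairs. The symbols $f$ (model), $\ell$ (loss), $n$, $m$ and the tilde notation are assumed here for illustration and follow the usual textbook form; they are not taken from the text above.

```latex
% ERM: average loss on the n original training pairs (x_i, y_i)
R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i),\, y_i\bigr)

% VRM: average loss on m virtual pairs (\tilde{x}_i, \tilde{y}_i)
% sampled from a vicinity distribution around the training pairs
R_{\mathrm{vic}}(f) = \frac{1}{m} \sum_{i=1}^{m} \ell\bigl(f(\tilde{x}_i),\, \tilde{y}_i\bigr)
```

Conventional augmentation and mixup differ only in how the virtual pairs are generated.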

Further reading on the theory: link.

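1.1.2 Algorithm

The algorithm is very simple:

Image: $\overline{x} = \lambda x_i + (1-\lambda) x_j$

Label: $\overline{y} = \lambda y_i + (1-\lambda) y_j$

Here $(x_i, y_i)$ and $(x_j, y_j)$ are two samples drawn at random from the original training data, and $\lambda \in [0, 1]$ is the mixing weight. In the paper $\lambda$ is drawn from a $\mathrm{Beta}(\alpha, \alpha)$ distribution, and $\alpha$ is the mixup hyperparameter that controls the strength of the interpolation between the two samples. When $\lambda \to 0$ or $\lambda \to 1$ (which is what happens as $\alpha \to 0$), the mixed sample is just one of the original samples and mixup degenerates to ERM.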
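To make the two formulas concrete, here is a minimal pure-Python sketch on toy data. The helper names `mixup_pair` and `sample_lambda` and the 4-pixel "images" are made up for illustration; drawing the weight from a Beta distribution is an assumption that mirrors the `np.random.beta(alpha, alpha)` call used in the code of this section.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Blend two samples and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def sample_lambda(alpha, rng=random):
    """Draw lambda from Beta(alpha, alpha); alpha <= 0 disables mixing."""
    return rng.betavariate(alpha, alpha) if alpha > 0 else 1.0


# Two flattened 4-pixel "images" with one-hot labels for classes 0 and 1.
x_cat, y_cat = [0.0, 0.2, 0.4, 0.6], [1.0, 0.0]
x_dog, y_dog = [1.0, 0.8, 0.6, 0.4], [0.0, 1.0]

x_mix, y_mix = mixup_pair(x_cat, y_cat, x_dog, y_dog, lam=0.75)
print(y_mix)  # [0.75, 0.25]
```

With `lam=1.0` the function returns the first sample unchanged, which is the ERM case.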

1.1.3 Code

The code below is adapted from the paper authors' source code: source.
MixUp for object detection: link.

import cv2
import numpy as np
import torch
from torchvision import datasets, transforms


def mixup(batch_x, batch_y, alpha=1.0, use_cuda=False):
    '''Returns mixed inputs, pairs of targets, and lambda'''
    if alpha > 0:
        # lambda ~ Beta(alpha, alpha)
        lam = np.random.beta(alpha, alpha)
    else:
        # alpha <= 0 disables mixing
        lam = 1

    batch_size = len(batch_x)
    # random permutation that pairs every sample with another sample of the batch
    if use_cuda:
        index = torch.randperm(batch_size).cuda()
    else:
        index = torch.randperm(batch_size)

    mixed_x = lam * batch_x + (1 - lam) * batch_x[index, :]
    y_a, y_b = batch_y, batch_y[index]
    return mixed_x, y_a, y_b, lam


def mixup_criterion(criterion, pred, y_a, y_b, lam):
    # weight the two losses the same way the two images were weighted
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)


if __name__ == "__main__":
    # 1. mixup demo
    transform = transforms.Compose([
        transforms.RandomResizedCrop(300),
        transforms.ToTensor(),
    ])
    train_root = "F:\\yolov4\\module\\data\\flowers"
    trainset = datasets.ImageFolder(root=train_root, transform=transform)

    trainloader = torch.utils.data.DataLoader(trainset,
                                              batch_size=2,
                                              shuffle=True)
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        mixed_x, _, _, _ = mixup(inputs, targets)
        # ToTensor yields RGB in CHW order; OpenCV displays BGR in HWC order
        mixed_img1 = mixed_x[0].numpy().transpose([1, 2, 0])
        mixed_img1 = cv2.cvtColor(mixed_img1, cv2.COLOR_RGB2BGR)
        mixed_img2 = mixed_x[1].numpy().transpose([1, 2, 0])
        mixed_img2 = cv2.cvtColor(mixed_img2, cv2.COLOR_RGB2BGR)
        cv2.imshow('img1', mixed_img1)
        cv2.imshow('img2', mixed_img2)
        cv2.waitKey(0)

train.py

...
inputs, targets_a, targets_b, lam = mixup(inputs, targets, args.alpha, use_cuda)
# Since PyTorch 0.4 tensors no longer need to be wrapped in Variable
outputs = net(inputs)
loss = mixup_criterion(criterion, outputs, targets_a, targets_b, lam)
train_loss += loss.item()
_, predicted = torch.max(outputs, 1)
total += targets.size(0)
# accuracy is weighted by lam in the same way as the loss
correct += (lam * predicted.eq(targets_a).cpu().sum().float()
            + (1 - lam) * predicted.eq(targets_b).cpu().sum().float())
...
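Why the criterion takes a weighted sum of two losses instead of building the mixed label explicitly: for cross-entropy the two give the same number, because the loss is linear in the target. The following is a small pure-Python check; the `cross_entropy` helper and the probabilities are made up purely for illustration and stand in for whatever `criterion` the training script actually uses.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy of predicted class probabilities against a (soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)


probs = [0.7, 0.2, 0.1]          # hypothetical softmax output for one sample
y_a = [1.0, 0.0, 0.0]            # one-hot label of the first image
y_b = [0.0, 1.0, 0.0]            # one-hot label of the second image
lam = 0.6

# Option 1: mix the labels first, then compute one loss.
y_soft = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
loss_soft = cross_entropy(probs, y_soft)

# Option 2: compute two losses, then mix them.
loss_pair = lam * cross_entropy(probs, y_a) + (1.0 - lam) * cross_entropy(probs, y_b)

print(abs(loss_soft - loss_pair) < 1e-12)  # True
```

The two-loss form is convenient in practice because it works with integer class indices and needs no one-hot encoding.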

1.1.4 Results

To be added later.
[Image: MixUp result]

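1.2 Random Erasing

1.2.1 Theory

Random Erasing (RE) augmentation mainly aims to simulate occlusion and thereby improve generalization; it is a meaningful way to mimic real-world scenes. If the model must still classify an object correctly after part of it has been occluded, the network is forced to recognize it from the remaining visible regions, which makes training harder and improves generalization to some degree. RE can also be seen as a form of noise injection, and it is complementary to random cropping and random horizontal flipping: combining them gives better model performance, in particular better robustness to noise and occlusion (robustness being the model's ability to cope with adverse conditions).

Concretely: during training, a rectangular region of the image is selected at random and its pixels are replaced with random values or with the mean pixel value, producing a local occlusion.

The randomness of RE has five aspects:

whether image I is erased at all is decided with probability p;
the ratio of the rectangle's area to the area of image I is random;
the position of the rectangle (e.g. its top-left or center coordinates) is random;
the aspect ratio of the rectangle is random;
the fill values are random, in the range [0, 255].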
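The geometry behind the second, third and fourth points can be sketched in isolation: given an area fraction and an aspect ratio, the rectangle's height and width follow from two square roots. This is only an illustrative sketch. The function name `sample_erase_box`, the `max_tries` limit, and the choice to sample the position after checking that the box fits are assumptions of this example, not part of the description above.

```python
import math
import random


def sample_erase_box(img_h, img_w, sl=0.02, sh=0.4, r1=0.3, max_tries=100, rng=random):
    """Return (top, left, box_h, box_w) of an erase rectangle, or None if none fits."""
    for _ in range(max_tries):
        area = rng.uniform(sl, sh) * img_h * img_w   # random share of the image area
        ratio = rng.uniform(r1, 1.0 / r1)            # random aspect ratio (height / width)
        box_h = int(round(math.sqrt(area * ratio)))  # box_h * box_w ~= area
        box_w = int(round(math.sqrt(area / ratio)))  # box_h / box_w ~= ratio
        if 0 < box_h <= img_h and 0 < box_w <= img_w:
            top = rng.randint(0, img_h - box_h)      # random position that keeps the box inside
            left = rng.randint(0, img_w - box_w)
            return top, left, box_h, box_w
    return None


# Example: one box for a 100 x 200 image, using a seeded generator.
print(sample_erase_box(100, 200, rng=random.Random(0)))
```

For instance, an area of 2000 pixels with ratio 0.5 gives a height of about 32 and a width of about 63.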

1.2.2 Code

The code below is adapted from the paper authors' source code: source.

import random
import math
import cv2
import numpy as np


class RandomErasing:
    """Randomly erase a rectangular region of an image.
    Implements Random Erasing from "Random Erasing Data Augmentation" by Zhong et al.
    Args:
        p: probability of applying random erasing
        sl: minimum ratio of erased area to image area
        sh: maximum ratio of erased area to image area
        r1: minimum aspect ratio of the erased region (the maximum is 1 / r1)
    """

    def __init__(self, p=0.5, sl=0.02, sh=0.4, r1=0.3):
        self.p = p
        self.s = (sl, sh)  # (0.02, 0.4)
        self.r = (r1, 1 / r1)

    def __call__(self, img):
        """
        perform random erasing (modifies img in place)
        Args:
            img: opencv numpy array of shape [h, w, c], values in [0, 255]
        Returns:
            erased img
        """
        assert len(img.shape) == 3, 'image should be a 3 dimension numpy array'
        if random.random() > self.p:
            # 1. erasing is applied only with probability p
            return img
        else:
            while True:
                # 2. the erased area is a random fraction of the image area,
                #    drawn uniformly from (sl, sh)
                Se = random.uniform(*self.s) * img.shape[0] * img.shape[1]

                # 3. the aspect ratio (height / width) is random
                re = random.uniform(*self.r)

                # height and width that give area Se and ratio re
                He = int(round(math.sqrt(Se * re)))
                We = int(round(math.sqrt(Se / re)))

                # 4. the top-left corner of the region is random
                xe = random.randint(0, img.shape[1])
                ye = random.randint(0, img.shape[0])

                # retry until the rectangle lies fully inside the image
                if xe + We <= img.shape[1] and ye + He <= img.shape[0]:
                    # 5. fill values are random integers in [0, 255]
                    #    (the upper bound of np.random.randint is exclusive)
                    img[ye: ye + He, xe: xe + We, :] = np.random.randint(low=0, high=256, size=(He, We, img.shape[2]))
                    return img

if __name__ == "__main__":
    img = cv2.imread("test.jpeg")
    RE = RandomErasing(p=0.7)
    for i in range(20):
        img1 = RE(img.copy())
        cv2.imshow("test", img1)
        cv2.waitKey(1000)

1.2.3 Results

[Image: Random Erasing result]

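II. Image mixing methods

Image mixing operates on the data after it has been batched: two images are fused to produce a single image.

2.1 CutOut

2.1.1 Theory

Cutout is a regularization method. Its starting point is the same as Random Erasing: it simulates occlusion in order to improve generalization and robustness.

Implementation: pick a fixed-size square region at random and fill it with zeros. To keep the zero fill from affecting training, the data should be mean-centered (normalized to zero mean), so that a zero corresponds to the dataset mean.

Notes:

The authors found that the size of the cutout region matters more than its shape, so a plain square is enough, which keeps the method very simple.
The erased region is not always fully inside the original image; with some probability (set to 50% in the paper) part of it falls outside.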
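Both points can be seen on a tiny example: after mean normalization a zero fill equals the dataset mean, and a square centered near the border is simply clipped to the part that lies inside the image. This is a minimal pure-Python sketch on nested lists; the helper names `normalize` and `cutout` and the statistics `MEAN` and `STD` are hypothetical values chosen for illustration.

```python
def normalize(img, mean, std):
    """Shift and scale pixel values so the dataset mean maps to 0."""
    return [[(v - mean) / std for v in row] for row in img]


def cutout(img, cy, cx, length, fill=0.0):
    """Return a copy of a 2-D image with a square patch centered on (cy, cx) set to fill.

    The patch is clipped to the image bounds, so it may end up smaller than
    length x length when the center is close to the border.
    """
    h, w = len(img), len(img[0])
    y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
    x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
    out = [row[:] for row in img]
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = fill
    return out


MEAN, STD = 0.5, 0.25                           # assumed dataset statistics
raw = [[0.2, 0.4, 0.6, 0.8] for _ in range(4)]  # 4 x 4 toy image
norm = normalize(raw, MEAN, STD)

inside = cutout(norm, cy=2, cx=2, length=2)     # 2 x 2 patch fully inside the image
corner = cutout(norm, cy=0, cx=0, length=2)     # patch clipped to 1 x 1 at the border

# Mapping an erased pixel back to the raw scale gives the dataset mean.
print(corner[0][0] * STD + MEAN)  # 0.5
```

Without the normalization step a zero fill would be a black patch, which is a much stronger signal than "no information".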

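Why Cutout works:
In the paper's own words: "This technique encourages the network to better utilize the full context of the image, rather than relying on the presence of a small set of specific visual features." Put simply, CutOut pushes the CNN to use global information from the whole image instead of local information made up of a few small features. The idea is similar to that of most papers on fine-grained recognition.

2.1.2 Code

Original paper code:

https://github.com/uoguelph-mlrg/Cutout

My own implementation, based on that source: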

import cv2
import numpy as np
import torch
from torchvision import transforms


class Cutout(object):
    """Randomly mask out one or more patches from an image.
    Args:
        n_holes (int): Number of patches to cut out of each image.
        length (int): The length (in pixels) of each square patch.
        fill_value (float): Value written into the patches.
    """
    def __init__(self, n_holes, length, fill_value):
        self.n_holes = n_holes
        self.length = length
        self.fill_value = fill_value

    def __call__(self, img):
        """
        Args:
            img (Tensor): Tensor image of size (C, H, W).
        Returns:
            Tensor: Image with n_holes of dimension length x length cut out of it.
        """
        h = img.size(1)
        w = img.size(2)

        # mask is 1 where pixels are kept and 0 inside the patches
        mask = np.ones((h, w), np.float32)

        for n in range(self.n_holes):
            # patch center; the patch may be clipped at the image border
            y = np.random.randint(h)
            x = np.random.randint(w)

            y1 = np.clip(y - self.length // 2, 0, h)
            y2 = np.clip(y + self.length // 2, 0, h)
            x1 = np.clip(x - self.length // 2, 0, w)
            x2 = np.clip(x + self.length // 2, 0, w)

            mask[y1: y2, x1: x2] = 0.

        mask = torch.from_numpy(mask)
        mask = mask.expand_as(img)
        # keep pixels outside the patches, write fill_value inside them
        img = img * mask + (1. - mask) * self.fill_value

        return img


if __name__ == "__main__":
    # 2. Cutout demo
    img = cv2.imread("test.jpeg")
    transform = transforms.Compose([
        transforms.ToTensor(),
        Cutout(n_holes=30, length=10, fill_value=0.)
    ])
    img2 = transform(img)
    img2 = img2.numpy().transpose([1, 2, 0])
    cv2.imshow("test", img2)
    cv2.waitKey(0)

2.1.3 Results

[Image: CutOut result]

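2.2 CutMix

CutMix is the method used in the YOLO v4 paper.

2.2.1 Theory

Paper:

https://arxiv.org/abs/1905.04899v2

CutMix is also fairly simple and likewise operates on a pair of images. In short, a crop box is generated at random, the corresponding region of image A is cut out, and the ROI at the same position in image B is pasted into that region to form a new sample. As with mixup, the loss is computed as a weighted sum of the losses for the two labels, with weights given by the share of the image area that each image contributes.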
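The link between the crop box and the loss weight can be worked through with plain arithmetic. In the sketch below the box side is scaled by sqrt(1 - lam) so that the pasted area is a fraction 1 - lam of the image, and lam is then recomputed from the box that is actually left after clipping at the border. The function `adjusted_lambda`, the `rng` parameter and the 400 x 400 image size are assumptions made for this example.

```python
import math
import random


def rand_bbox(img_h, img_w, lam, rng=random):
    """Sample a crop box (x1, y1, x2, y2) whose unclipped area is (1 - lam) of the image."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(img_h * cut_ratio), int(img_w * cut_ratio)
    cy, cx = rng.randrange(img_h), rng.randrange(img_w)   # random box center
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, img_w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, img_h)
    return x1, y1, x2, y2


def adjusted_lambda(box, img_h, img_w):
    """Weight of image A: the share of pixels that still come from A."""
    x1, y1, x2, y2 = box
    return 1.0 - (x2 - x1) * (y2 - y1) / (img_h * img_w)


# lam = 0.75 asks for a 200 x 200 box on a 400 x 400 image.
print(adjusted_lambda((100, 100, 300, 300), 400, 400))  # 0.75, box fully inside
print(adjusted_lambda((0, 0, 100, 100), 400, 400))      # 0.9375, box clipped at the corner
```

Recomputing lam matters because a clipped box pastes fewer pixels from image B than the sampled lam suggests, so the label weight of B has to shrink accordingly.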

2.2.2 Code

Paper source code: link.

My own implementation, written from that source:

import cv2
import numpy as np
import torch
from torchvision import datasets, transforms


def cutmix_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)


def rand_bbox(size, lam):
    """Find the region to cut (top-left and bottom-right corners)."""
    H = size[2]  # image height (tensor layout is [B, C, H, W])
    W = size[3]  # image width
    cut_rat = np.sqrt(1. - lam)  # side ratio, so the cut area ratio is 1 - lam
    cut_w = int(W * cut_rat)  # width of the region to cut (np.int was removed from NumPy)
    cut_h = int(H * cut_rat)  # height of the region to cut

    # uniform
    cx = np.random.randint(W)  # random center of the cut
    cy = np.random.randint(H)

    # clip the box so it stays inside the image
    bbx1 = np.clip(cx - cut_w // 2, 0, W)  # top-left x
    bby1 = np.clip(cy - cut_h // 2, 0, H)  # top-left y
    bbx2 = np.clip(cx + cut_w // 2, 0, W)  # bottom-right x
    bby2 = np.clip(cy + cut_h // 2, 0, H)  # bottom-right y

    return bbx1, bby1, bbx2, bby2  # region to cut (top-left and bottom-right)


def cutmix(batch_x, y, alpha=1.0, use_cuda=False):
    '''Returns mixed inputs, pairs of targets, and lambda
    args:
        batch_x: [batch_size, channel, h, w]; modified in place
    '''
    assert alpha > 0
    lam = np.random.beta(alpha, alpha)  # lambda ~ Beta(alpha, alpha)
    batch_size = batch_x.size()[0]

    if use_cuda:
        index = torch.randperm(batch_size).cuda()
    else:
        index = torch.randperm(batch_size)

    y_a, y_b = y, y[index]

    bbx1, bby1, bbx2, bby2 = rand_bbox(batch_x.size(), lam)
    # rows are indexed by y, columns by x
    batch_x[:, :, bby1:bby2, bbx1:bbx2] = batch_x[index, :, bby1:bby2, bbx1:bbx2]

    # adjust lambda to exactly match the pixel ratio after clipping
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (batch_x.size()[-1] * batch_x.size()[-2]))

    return batch_x, y_a, y_b, lam


if __name__ == "__main__":
    # 4. CutMix demo
    transform = transforms.Compose([
        transforms.RandomResizedCrop((400, 400)),
        transforms.ToTensor(),
    ])
    train_root = "F:\\yolov4\\module\\data\\flowers"
    trainset = datasets.ImageFolder(root=train_root, transform=transform)

    trainloader = torch.utils.data.DataLoader(trainset,
                                              batch_size=2,
                                              shuffle=True)
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        mixed_x, _, _, _ = cutmix(inputs, targets)
        for i in range(2):
            # ToTensor yields RGB in CHW order; OpenCV displays BGR in HWC order
            img = cv2.cvtColor(mixed_x[i].numpy().transpose([1, 2, 0]), cv2.COLOR_RGB2BGR)
            cv2.imshow('img', img)
            cv2.waitKey(0)

train.py

...
for (inputs, targets) in tqdm(trainloader):
    inputs, targets = inputs.to(device), targets.to(device)

    r = np.random.rand(1)
    if r < 0.5:  # apply cutmix with probability 0.5
        inputs, targets_a, targets_b, lam = cutmix(inputs, targets)
        outputs = net(inputs)
        loss = cutmix_criterion(criterion, outputs, targets_a.long(), targets_b.long(), lam)
    else:
        outputs = net(inputs)
        loss = criterion(outputs, targets.long())
...