Data Augmentation: Mixup, Cutout, and CutMix

Latest recommended article published on 2023-06-08 11:46:27

马鹤宁



Article link: https://blog.csdn.net/weixin_42111770/article/details/127740726


Mixup

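Mixup is simple to implement: it blends two images pixel by pixel, and training on the blended samples improves the model's generalization. Given two samples $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$, where $x$ is an image and $y$ is its one-hot label, Mixup is defined by the formulas below, where $\lambda \in [0,1]$ is drawn from a Beta distribution, $\lambda \sim Beta(\alpha, \alpha)$.
$\tilde{x} = \lambda x_{i} + (1 - \lambda) x_{j} \\ \tilde{y} = \lambda y_{i} + (1 - \lambda) y_{j}$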

# The code below is from mmclassification/mmcls/models/utils/augment/mixup.py
import numpy as np
import torch

def mixup(self, img, gt_label):
    # Convert gt_label to a one-hot encoding (one_hot_encoding is an mmcls helper)
    one_hot_gt_label = one_hot_encoding(gt_label, self.num_classes)
    # lambda is drawn from Beta(alpha, alpha)
    lam = np.random.beta(self.alpha, self.alpha)
    # Shuffle the images within the batch
    batch_size = img.size(0)
    index = torch.randperm(batch_size)
    # Blend the images
    mixed_img = lam * img + (1 - lam) * img[index, :]
    # Blend the labels with the same coefficient
    mixed_gt_label = lam * one_hot_gt_label + (
        1 - lam) * one_hot_gt_label[index, :]

    return mixed_img, mixed_gt_label
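
The method above depends on mmcls internals (`self`, `one_hot_encoding`) and on PyTorch. As a minimal, self-contained sketch of the same two formulas, the NumPy version below may be easier to experiment with. The function name `mixup_batch`, its `rng` argument, and the toy batch are assumptions made for illustration, not part of mmclassification.

```python
import numpy as np


def mixup_batch(images, labels, num_classes, alpha=1.0, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    # One-hot encode integer labels: shape (N,) -> (N, num_classes).
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    # A single mixing coefficient for the whole batch, lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Random pairing of samples within the batch.
    index = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[index]
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[index]
    return mixed_images, mixed_labels, lam


# Toy batch: 4 random "images" of shape (3, 8, 8) with 3 classes.
images = np.random.default_rng(0).random((4, 3, 8, 8)).astype(np.float32)
labels = np.array([0, 1, 2, 1])
mixed_images, mixed_labels, lam = mixup_batch(
    images, labels, num_classes=3, rng=np.random.default_rng(0))
print(mixed_images.shape)        # same shape as the input batch
print(mixed_labels.sum(axis=1))  # each soft label still sums to 1
```

Because the image and the label are mixed with the same $\lambda$, every mixed label remains a valid probability vector.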

Cutout

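Cutout is one of the simplest regularization techniques for convolutional neural networks. It removes a contiguous region of the input image, in effect augmenting the dataset with partially occluded versions of the existing samples. The implementation is simple: during training, a fixed-size zero mask is applied at a random location of every input image in each epoch. The Cutout paper found that the size of the cutout region matters more than its shape, so a square region is used in all of its experiments. To apply Cutout to an image, a pixel coordinate is picked at random as the center and the mask is placed around it. Parts of the mask are allowed to fall outside the image; the paper reports that allowing this is key to Cutout's good performance. The code for cutout is shown below: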

import numpy as np

def cutout(img, shape, pad_val=0):
    """Randomly cut out a rectangle from the original img.
    Args:
        img (ndarray): Image to be cutout.
        shape (int | tuple[int]): Expected cutout shape (h, w).
        pad_val (float | tuple[int | float]): Values to be filled in the
            cut area. Defaults to 0.
    Returns:
        ndarray: The cutout image.
    """
    # Accept a single int (square patch) or an (h, w) tuple
    cut_h, cut_w = (shape, shape) if isinstance(shape, int) else shape
    img_h, img_w = img.shape[:2]
    # Sample the patch center uniformly over the image
    y0 = np.random.uniform(0, img_h)
    x0 = np.random.uniform(0, img_w)

    y1 = int(max(0, y0 - cut_h / 2.))
    x1 = int(max(0, x0 - cut_w / 2.))
    y2 = min(img_h, y1 + cut_h)
    x2 = min(img_w, x1 + cut_w)

    # Keep the channel axis (if any) so the patch matches the sliced region
    patch_shape = (y2 - y1, x2 - x1) + img.shape[2:]

    img_cutout = img.copy()
    patch = np.array(
        pad_val, dtype=img.dtype) * np.ones(
            patch_shape, dtype=img.dtype)  # mask
    img_cutout[y1:y2, x1:x2, ...] = patch

    return img_cutout
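
Note that the listing above clamps the top-left corner first and then extends the patch from it, so near the top and left borders the patch keeps its full size. The description in the text (a mask centered on a random pixel, with any overhang simply discarded) can be sketched as follows. This is an illustrative variant under that reading of the paper; the name `cutout_centered` and the `rng` argument are hypothetical.

```python
import numpy as np


def cutout_centered(img, size, pad_val=0, rng=None):
    """Fill a size x size square centered on a random pixel, clipped to the image."""
    rng = np.random.default_rng() if rng is None else rng
    img_h, img_w = img.shape[:2]
    # Sample the center anywhere in the image, so the square may overhang the border.
    cy = int(rng.integers(0, img_h))
    cx = int(rng.integers(0, img_w))
    # Clip both ends; whatever falls outside the image is simply dropped.
    y1, y2 = max(0, cy - size // 2), min(img_h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(img_w, cx + size // 2)
    out = img.copy()
    out[y1:y2, x1:x2, ...] = pad_val
    return out


img = np.ones((32, 32, 3), dtype=np.uint8)
out = cutout_centered(img, size=16, rng=np.random.default_rng(0))
masked = int((out[..., 0] == 0).sum())
print(masked)  # at most 16 * 16 = 256, fewer when the square overhangs the border
```

With this variant the amount of occlusion varies from sample to sample, and images whose center lands near a border are only lightly masked.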

CutMix

Let $x \in \mathbb{R}^{W \times H \times C}$ and $y$ denote a training image and its label. CutMix combines two training samples $\left( x_{A}, y_{A}\right)$ and $\left( x_{B}, y_{B}\right)$ into a new training sample $\left( \tilde{x}, \tilde{y} \right)$, which is then used to train the model. Formally:

$\begin{matrix} \tilde{x} = M \odot x_{A} + (\mathbf{1} - M) \odot x_{B} \\ \tilde{y} = \lambda y_{A} + (1-\lambda) y_{B} \end{matrix}$

where $M \in \{0,1\}^{W \times H}$ is a binary mask that marks which pixels are removed and filled in from the other image, $\mathbf{1}$ is a binary mask filled with ones, and $\odot$ denotes element-wise multiplication. As in Mixup, the parameter $\lambda$ is sampled from the Beta distribution $Beta\left( \alpha, \alpha \right)$. CutMix sets $\alpha=1$, so that $\lambda$ follows the uniform distribution $\lambda \sim U\left( 0, 1 \right)$.

# The code below is from mmclassification/mmcls/models/utils/augment/cutmix.py
import numpy as np
import torch

def cutmix(self, img, gt_label):
    # Convert gt_label to a one-hot encoding
    one_hot_gt_label = one_hot_encoding(gt_label, self.num_classes)
    # Sample the parameter lambda
    lam = np.random.beta(self.alpha, self.alpha)
    # Random permutation of the indices 0..batch_size-1 (both inclusive)
    batch_size = img.size(0)
    index = torch.randperm(batch_size)

    # Sample the box that defines the mask M; lam is returned updated as well
    (bby1, bby2, bbx1,
     bbx2), lam = self.cutmix_bbox_and_lam(img.shape, lam)
    # Fill the box with the same region of the paired image (modifies img in place)
    img[:, :, bby1:bby2, bbx1:bbx2] = \
        img[index, :, bby1:bby2, bbx1:bbx2]
    # Mix the labels in proportion to lambda
    mixed_gt_label = lam * one_hot_gt_label + (
        1 - lam) * one_hot_gt_label[index, :]
    return img, mixed_gt_label

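To obtain the binary mask $M$, first sample the bounding-box coordinates of the crop region, $B=(r_{x}, r_{y}, r_{w}, r_{h})$. Region $B$ is removed from image $x_{A}$ and filled with the patch cropped from the same region $B$ of the other image $x_{B}$. The box is sampled according to the formulas below, so that the cropped area ratio is $\frac{r_{w}r_{h}}{WH} = 1- \lambda$.
$r_{x} \sim Unif(0,W) , \quad r_{w} = W \sqrt{1 - \lambda} \\ r_{y} \sim Unif(0,H) , \quad r_{h} = H \sqrt{1 - \lambda}$
Once the crop region is determined, the entries of the binary mask $M$ inside the box $B$ are set to 0 and all other entries to 1.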
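
To make the role of $M$ concrete, the small NumPy sketch below builds the mask for one box and checks that the masked formula gives the same result as copying the box by slice assignment, which is how the batch code fills the region. The box coordinates and the constant-valued stand-in images are arbitrary values chosen for illustration.

```python
import numpy as np

H, W = 8, 8
x_a = np.zeros((3, H, W), dtype=np.float32)   # stand-in for image A
x_b = np.ones((3, H, W), dtype=np.float32)    # stand-in for image B

# A hypothetical, already clipped box: rows 2..5 and columns 1..4.
yl, yh, xl, xh = 2, 6, 1, 5

# M is 1 where image A is kept and 0 inside the box B.
M = np.ones((H, W), dtype=np.float32)
M[yl:yh, xl:xh] = 0.0

# x_tilde = M * x_A + (1 - M) * x_B; M broadcasts over the channel axis.
x_tilde = M * x_a + (1.0 - M) * x_b

# The same result via slice assignment.
x_slice = x_a.copy()
x_slice[:, yl:yh, xl:xh] = x_b[:, yl:yh, xl:xh]

lam = M.mean()   # fraction of pixels that still come from image A
print(np.array_equal(x_tilde, x_slice), lam)
```

Here the box covers 16 of the 64 pixels, so the mean of $M$ is 0.75, which is the weight $\lambda$ that the label of image A would receive.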

# The code below is from mmclassification/mmcls/models/utils/augment/cutmix.py
import numpy as np

def rand_bbox(self, img_shape, lam, margin=0., count=None):
    """Standard CutMix bounding-box that generates a random square bbox
    based on lambda value. This implementation includes support for
    enforcing a border margin as percent of bbox dimensions.

    Args:
        img_shape (tuple): Image shape as tuple
        lam (float): Cutmix lambda value
        margin (float): Percentage of bbox dimension to enforce as margin
            (reduce amount of box outside image). Default to 0.
        count (int, optional): Number of bbox to generate. Default to None
    """
    ratio = np.sqrt(1 - lam)
    img_h, img_w = img_shape[-2:]  # H, W
    cut_h, cut_w = int(img_h * ratio), int(img_w * ratio)  # r_h, r_w
    margin_y, margin_x = int(margin * cut_h), int(margin * cut_w)
    cy = np.random.randint(0 + margin_y, img_h - margin_y, size=count)  # r_y
    cx = np.random.randint(0 + margin_x, img_w - margin_x, size=count)  # r_x
    # Clip the box to the image, so its actual area may be smaller than requested
    yl = np.clip(cy - cut_h // 2, 0, img_h)
    yh = np.clip(cy + cut_h // 2, 0, img_h)
    xl = np.clip(cx - cut_w // 2, 0, img_w)
    xh = np.clip(cx + cut_w // 2, 0, img_w)
    return yl, yh, xl, xh  # top-left corner (xl, yl), bottom-right corner (xh, yh)
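
Because `rand_bbox` clips the box to the image, the pasted area can be smaller than the requested $1 - \lambda$ share. The `cutmix` method receives an updated `lam` from `cutmix_bbox_and_lam`, whose body is not shown here; one plausible way to keep the label weights consistent with the pixels is to recompute $\lambda$ from the clipped box, as sketched below. The helper name `corrected_lam` is hypothetical and this is not presented as the mmclassification implementation.

```python
import numpy as np


def corrected_lam(img_shape, bbox):
    """Recompute lambda from the area of the box that actually lies inside the image."""
    yl, yh, xl, xh = bbox
    img_h, img_w = img_shape[-2:]
    bbox_area = (yh - yl) * (xh - xl)
    return 1.0 - bbox_area / float(img_h * img_w)


# lam = 0.5 asks for a box covering about half of a 32 x 32 image...
lam = 0.5
ratio = np.sqrt(1.0 - lam)
cut_h, cut_w = int(32 * ratio), int(32 * ratio)   # 22 x 22
# ...but a box centered on the corner pixel (0, 0) is mostly clipped away.
cy, cx = 0, 0
yl, yh = np.clip(cy - cut_h // 2, 0, 32), np.clip(cy + cut_h // 2, 0, 32)
xl, xh = np.clip(cx - cut_w // 2, 0, 32), np.clip(cx + cut_w // 2, 0, 32)
new_lam = corrected_lam((3, 32, 32), (yl, yh, xl, xh))
print(new_lam)  # larger than 0.5, because only an 11 x 11 patch was pasted
```

In this corner case only 121 of the 1024 pixels come from the second image, so weighting its label by 0.5 would overstate its contribution.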

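Differences between Mixup, Cutout, and CutMix

Mixup: two randomly chosen sample images are blended in a given ratio, and the labels are mixed in the same ratio. In the Mixup panel of the accompanying figure, for example, the dog and cat labels are no longer 1 but 0.5 and 0.5.
Cutout: a randomly chosen region of the sample image is cut out and filled with zero-valued pixels; the label is unchanged.
CutMix: a region of one sample image is cut out and filled with the pixels of the corresponding region of another, randomly chosen image. The labels are weighted by the share of the final image that each source image occupies; in the CutMix panel of the accompanying figure, for example, the cat has probability 0.4 and the dog 0.6.
Unlike Mixup, CutMix pastes a region of one image onto the other rather than blending the two images pixel by pixel. Unlike Cutout, the removed region is filled with a patch from another image rather than with zero-valued pixels.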