文章目录
Heuristics-driven
Pad and Crop
即:
import torchvision.transforms as T
T.Compose([
T.Pad(4, padding_mode='reflect'),
T.RandomCrop(32), # for CIFAR-10 and CIFAR-100
T.RandomHorizontalFlip(),
T.ToTensor()
])
注: T.RandomCrop(200)
Cutout
形象点说, 就是给图片挖几个洞(不过作者的解释挺有趣的, 输入层的连续dropout, 这个想法有趣).
# uoguelph-mlrg cutout
# https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
# 原文代码
import torch
import numpy as np
class Cutout(object):
"""Randomly mask out one or more patches from an image.
Args:
n_holes (int): Number of patches to cut out of each image.
length (int): The length (in pixels) of each square patch.
"""
def __init__(self, n_holes, length):
self.n_holes = n_holes
self.length = length
def __call__(self, img):
"""
Args:
img (Tensor): Tensor image of size (C, H, W).
Returns:
Tensor: Image with n_holes of dimension length x length cut out of it.
"""
h = img.size(1)
w = img.size(2)
mask = np.ones((h, w), np.float32)
for n in range(self.n_holes):
y = np.random.randint(h)
x = np.random.randint(w)
y1 = np.clip(y - self.length // 2, 0, h)
y2 = np.clip(y + self.length // 2, 0, h)
x1 = np.clip(x - self.length // 2, 0, w)
x2 = np.clip(x + self.length // 2, 0, w)
mask[y1: y2, x1: x2] = 0.
mask = torch.from_numpy(mask)
mask = mask.expand_as(img)
img = img * mask
return img
MixUp
MixUp 就是将两个图片按照一定比例相加
x
~
=
λ
x
A
+
(
1
−
λ
)
x
B
,
\tilde{x} = \lambda x_A + (1 - \lambda) x_B,
x~=λxA+(1−λ)xB,
λ
∼
B
e
t
a
(
α
,
α
)
,
α
∈
(
0
,
+
∞
)
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \alpha \in (0, +\infty)
λ∼Beta(α,α),α∈(0,+∞), 同时
y
~
=
λ
y
A
+
(
1
−
λ
)
y
B
.
\tilde{y} = \lambda y_A + (1 - \lambda) y_B.
y~=λyA+(1−λ)yB.
CutMix
CutMix 就是将一个图片裁剪掉一块, 然后用另一张图片的对应位置填补:
x
~
=
M
∘
x
A
+
(
1
−
M
)
∘
x
B
,
\tilde{x} = M \circ x_A + (1 - M) \circ x_B,
x~=M∘xA+(1−M)∘xB,
其中
M
∈
{
0
,
1
}
H
×
W
M \in \{0, 1\}^{H \times W}
M∈{0,1}H×W 是掩码矩阵, 按照如下方式生成:
r
x
∼
U
[
0
,
W
]
,
r
y
=
U
[
0
,
H
]
,
r
w
=
W
1
−
λ
,
r
h
=
H
1
−
λ
,
r_x \sim U[0, W], \: r_y = U[0, H], \\ r_w = W\sqrt{1 - \lambda} , \: r_h = H\sqrt{1 - \lambda},
rx∼U[0,W],ry=U[0,H],rw=W1−λ,rh=H1−λ,
得到一个bounding box, 在其中的
M
i
j
=
0
M_{ij}= 0
Mij=0, 否则为
1
1
1,
λ
\lambda
λ控制框的大小的:
r
w
r
h
H
W
=
1
−
λ
.
\frac{r_w r_h}{HW} = 1 - \lambda.
HWrwrh=1−λ.
此时, 标签需要发生相应变化:
y
~
=
λ
y
A
+
(
1
−
λ
)
y
B
,
\tilde{y} = \lambda y_{A} + (1 - \lambda)y_B,
y~=λyA+(1−λ)yB,
这里标签是one-hot形式(或者概率向量).
AugMix
Data-driven
AutoAugment
根据数据集选择合理的augmentations策略.
RandAugment
从
K
K
K中augmentations中随机选择
N
N
N(replace=True), 每个的magnitude都是
M
M
M.
其中
M
,
N
M, N
M,N是通过网格搜索找到的(太粗暴了吧)
DeepAugment
假设 f ( ⋅ ; θ ) f(\cdot;\theta) f(⋅;θ)是一个Image-to-Image的网络, 则DeepAugment是指对 θ \theta θ添加扰动, 比如扰动权重, 改变激活函数等等.
和这个思想类似的还有AdversarialAugment.