arXiv-2020
1 Background and Motivation
Recently, regional dropout strategies (overlaying a patch of either black pixels or random noise) have been proposed to improve the generalization of CNN models.
Drawback: they lead to information loss and inefficiency during training.
CNNs are data-hungry: how can we maximally utilize the deleted regions, while still taking advantage of the better generalization and localization that regional dropout brings?
The authors propose CutMix to mitigate this problem: the removed regions are filled with patches from other training images, improving the network's classification, localization, and generalization performance.
2 Related Work
- Regional dropout: CutMix = cut + fill
- Synthesizing training data
- Mixup: locally ambiguous and unnatural
- Tricks for training deep networks: e.g. BN / weight decay / dropout; CutMix is complementary to these methods because it operates on the data level
3 Advantages / Contributions
The paper proposes the CutMix data augmentation method:
- no computational overhead
- surprisingly effective on various tasks (classification, weakly supervised object localization)
- using a CutMix-ImageNet pretrained model as the initialized backbone for object detection and image captioning brings overall performance improvements
- improves model robustness against input corruptions and out-of-distribution detection performance
4 Method
Patches are cut and pasted among training images, and the ground-truth labels are mixed proportionally to the area of the patches:

$\tilde{x} = M \odot x_A + (1 - M) \odot x_B, \quad \tilde{y} = \lambda y_A + (1 - \lambda) y_B$

where $M \in \{0,1\}^{W \times H}$ is a binary mask indicating where to drop out of one image and fill in from the other, and $\odot$ is element-wise multiplication.
$\lambda$ follows a $\mathrm{Beta}(\alpha, \alpha)$ distribution; the authors set $\alpha = 1$, i.e. the uniform distribution.
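As a quick sanity check (a minimal NumPy sketch, not from the paper's code): with $\alpha = 1$, Beta(1, 1) reduces exactly to the uniform distribution on [0, 1].

```python
import numpy as np

# With alpha = 1, Beta(1, 1) is exactly Uniform(0, 1), so the mixing
# ratio lambda is drawn uniformly, as the paper notes.
rng = np.random.default_rng(0)
alpha = 1.0
lam = rng.beta(alpha, alpha)   # one mixing ratio per batch
```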
The center coordinates $r_x, r_y$ and the width/height $r_w, r_h$ of the randomly sampled remove-and-fill region, whose area fraction is $1 - \lambda$, are drawn as follows:

$r_x \sim \mathrm{Unif}(0, W), \quad r_w = W\sqrt{1 - \lambda}$
$r_y \sim \mathrm{Unif}(0, H), \quad r_h = H\sqrt{1 - \lambda}$
The concrete selection procedure follows these equations; note that in the paper's printed formulas, $r_w$ and $r_h$ appear to be missing the factors $W$ and $H$ respectively.
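The box sampling can be sketched as follows (a NumPy reconstruction, not the official code; the name `rand_bbox` mirrors the official implementation, and boundary clipping keeps boxes near the border valid):

```python
import numpy as np

def rand_bbox(W, H, lam, rng):
    """Sample the remove-and-fill box: center (r_x, r_y) uniform over the
    image, size r_w = W*sqrt(1-lam), r_h = H*sqrt(1-lam), so the unclipped
    box covers a fraction (1 - lam) of the image area."""
    cut = np.sqrt(1.0 - lam)
    r_w, r_h = int(W * cut), int(H * cut)
    r_x, r_y = int(rng.integers(W)), int(rng.integers(H))
    # clip to the image so boxes sampled near the border stay valid
    x1, x2 = max(r_x - r_w // 2, 0), min(r_x + r_w // 2, W)
    y1, y2 = max(r_y - r_h // 2, 0), min(r_y + r_h // 2, H)
    return x1, y1, x2, y2
```

Because of the clipping, the realized box area can be smaller than $(1-\lambda)WH$, which is why the official code recomputes $\lambda$ from the final box.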
The merging trick is really neat: that's the beauty of the code. After shuffling the batch, the cropped patch from the shuffled input_s overwrites the corresponding region of the original batch, implementing the remove-and-fill between images A and B in a single assignment.
The labels are mixed together in the same way.
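The shuffle-and-overwrite trick plus label mixing can be sketched end to end (a NumPy reconstruction under the name `cutmix_batch`, not the official PyTorch code; as in the official implementation, $\lambda$ is re-derived from the clipped box so the label weights match the true area ratio):

```python
import numpy as np

def cutmix_batch(x, y_onehot, alpha=1.0, rng=None):
    """CutMix a batch (N, C, H, W): paste a random patch from a shuffled
    copy of the batch into each image, and mix one-hot labels by area."""
    if rng is None:
        rng = np.random.default_rng()
    N, C, H, W = x.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(N)              # pairs each image with a partner
    cut = np.sqrt(1.0 - lam)
    rw, rh = int(W * cut), int(H * cut)
    rx, ry = int(rng.integers(W)), int(rng.integers(H))
    x1, x2 = max(rx - rw // 2, 0), min(rx + rw // 2, W)
    y1, y2 = max(ry - rh // 2, 0), min(ry + rh // 2, H)
    mixed = x.copy()
    # remove-and-fill: the shuffled patch overwrites the original region
    mixed[:, :, y1:y2, x1:x2] = x[perm, :, y1:y2, x1:x2]
    # re-derive lambda from the clipped box (true area ratio)
    lam_adj = 1.0 - (x2 - x1) * (y2 - y1) / (W * H)
    y_mixed = lam_adj * y_onehot + (1.0 - lam_adj) * y_onehot[perm]
    return mixed, y_mixed
```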
1) What does the model learn with CutMix?
CutMix can take advantage of the mixed region
Mixup introduces unnatural artifacts
2)Analysis on validation error
5 Experiments
5.1 Datasets
- ImageNet
- CIFAR10
- CIFAR100
- PASCAL VOC object detection
- MS-COCO image captioning
- CUB200-2011: weakly supervised, evaluated by top-1 localization accuracy (IoU > 0.5)
5.2 Image Classification
1)ImageNet Classification
Comparison against baseline augmentations:
CutMix comes at little extra memory or computational cost (an improvement with no inference-time overhead)
Comparison against architectural improvements:
2)CIFAR Classification
CutMix outperforms even the combination of Cutout and Mixup, showing that mixing in a cut-and-paste manner is better than interpolation.
CutMix for various models:
CutMix for CIFAR-10:
3)Ablation Studies
The ablations explore the shape of the Beta distribution as well as feature-level CutMix.
The table below explores variants of CutMix:
Center Gaussian CutMix: samples the region center $r_x$ and $r_y$ from a Gaussian distribution instead of a uniform one, so the remove-and-fill region is more likely to be centered in the image.
Fixed-size CutMix: fixes $r_w$ and $r_h$ to 16×16.
Scheduled CutMix: linearly increases the probability of applying CutMix from 0 to 1 as training proceeds.
One-hot CutMix: does not mix the labels; after mixing the images, the label of whichever image occupies the larger area is used.
Complete-label CutMix: mixes the labels 50/50 regardless of area, i.e. $0.5 y_A + 0.5 y_B$.
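The three label schemes compared in this ablation can be summarized in a small sketch (hypothetical helper, names mine):

```python
import numpy as np

def mix_labels(y_a, y_b, lam, mode="proportional"):
    """Label schemes from the ablation: 'proportional' is the CutMix
    default, 'onehot' is winner-takes-all by area, 'complete' is a
    fixed 50/50 mix regardless of area."""
    if mode == "proportional":
        return lam * y_a + (1 - lam) * y_b
    if mode == "onehot":
        return y_a if lam >= 0.5 else y_b
    if mode == "complete":
        return 0.5 * y_a + 0.5 * y_b
    raise ValueError(mode)
```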
5.3 Weakly Supervised Object Localization
Mixup tends to make a classifier focus on small regions
more ambiguity in Mixup samples makes a classifier focus on even more discriminative parts of objects
CutMix instead focuses on learning spatially dispersed representations
5.4 Transfer Learning of Pretrained Model
PASCAL VOC object detection
MS-COCO image captioning (details in the appendix)
5.5 Robustness and Uncertainty
adversarial samples, occluded samples, and in-between class samples
Adversarial samples: generated by gradient-ascent perturbation (FGSM), with noise scale $\epsilon = 8/255$
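The perturbation step is the standard FGSM update; a minimal sketch, assuming the loss gradient with respect to the input (`grad`) has already been computed by backprop through the model:

```python
import numpy as np

def fgsm_perturb(x, grad, eps=8 / 255):
    """One FGSM step: move each pixel by eps in the direction that
    increases the loss, then clip back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)
```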
Under occlusion, Cutout and CutMix perform about equally well.
CutMix even improves the robustness to the unseen Mixup in-between class samples
In-between class samples are generated Mixup-style as $x = \lambda x_A + (1 - \lambda) x_B$; for the CutMix-style variant, a center mask is employed instead of the random mask, with the hole size varying from 0 to 224.
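A minimal sketch of the center-mask construction (hypothetical helper; assumes a square image and a square hole):

```python
import numpy as np

def center_mask_mix(x_a, x_b, hole):
    """CutMix-style in-between sample: a centered square hole of side
    `hole` in x_a is filled with the same region from x_b."""
    H, W = x_a.shape[-2:]
    y1, x1 = (H - hole) // 2, (W - hole) // 2
    out = x_a.copy()
    out[..., y1:y1 + hole, x1:x1 + hole] = x_b[..., y1:y1 + hole, x1:x1 + hole]
    return out
```

With `hole = 0` the sample is pure `x_a`; as `hole` grows to the full image size it becomes pure `x_b`.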
From the experiments, we observe that the proposed CutMix enhances OOD detection performance, while Mixup and Cutout produce more overconfident predictions on OOD samples than the baseline.
6 Conclusion / Future work
1)《A baseline for detecting misclassified and out-of-distribution examples in neural networks》(ICLR-2017)
2)《Enhancing the reliability of out-of-distribution image detection in neural networks》(ICLR-2018)
3) A study of basic technical approaches to Out-of-Distribution (OOD) detection with deep models