Introduction
- Rather than explicitly locating discriminative regions and extracting their features, the authors exploit the semantic granularity carried by the feature maps of different CNN stages, and use a Jigsaw Puzzle Generator as data augmentation to help the model learn multi-granularity image features, improving fine-grained classification performance. Notably, the Jigsaw Puzzle Generator's augmentation process closely resembles how Swin Transformer merges image patches, and the paper further shows that fusing the predictions of multiple stages yields a large improvement on fine-grained classification.
Progressive Multi-Granularity (PMG) training framework
- Network Architecture: PMG can use any backbone $F$. Suppose the backbone has $L$ stages, and let the feature map output by the $l$-th stage be $F^l \in \mathbb{R}^{H_l \times W_l \times C_l}$. Since the authors also want to impose a classification loss on each of the last $S$ feature maps $F^l$, each such stage is paired with a convolution block $H_{conv}^l$ (2 conv + max pooling) that produces a feature vector $V^l = H_{conv}^l(F^l)$, followed by $H^l_{class}$ with BatchNorm and ELU (2 FC + Softmax) to obtain $y^l = H^l_{class}(V^l)$. In addition, the $V^l$ of the last $S$ stages are concatenated into $V^{concat}$, on which another classification loss is imposed: $y^{concat} = H^{concat}_{class}(V^{concat})$ (the authors choose $S = 3$).
- Progressive Training: the authors adopt progressive training, i.e. the low stages are trained first and the later stages are added step by step (at each iteration, a batch of data $d$ is used for $S + 1$ steps). Because a low stage has a limited receptive field and representational capacity, it is forced, in order to classify correctly, to focus on discriminative information from local details (i.e. object textures). This incremental nature allows the model to locate discriminative information from local details to global structures as the features are gradually sent into higher stages, instead of learning all the granularities simultaneously.
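The $S + 1$-step schedule above can be sketched in plain Python. This is a minimal sketch, not the authors' code: `train_step` is a hypothetical stub standing in for one forward/backward pass on the indicated classifier head, and $L = 5$, $S = 3$ follow the paper's ResNet50 setup.

```python
# Sketch of one PMG training iteration: S + 1 steps on the same batch,
# assuming L = 5 backbone stages and S = 3 supervised stages (ResNet50).
L, S = 5, 3

def patch_grid(l, L=L):
    # n = 2^(L - l + 1): finer jigsaw patches for lower stages.
    return 2 ** (L - l + 1)

def train_step(head, n):
    # Hypothetical stub: shuffle the batch with an n x n jigsaw grid
    # (n == 1 would mean the original image), forward through the
    # backbone up to `head`, and backpropagate that head's loss.
    return (head, n)

def pmg_iteration():
    steps = []
    for l in range(L - S + 1, L + 1):        # stages 3, 4, 5
        steps.append(train_step(f"y^{l}", patch_grid(l)))
    steps.append(train_step("y^concat", 1))  # final step: original image
    return steps
```

For the default setup this yields jigsaw grids of 8, 4, and 2 for stages 3, 4, and 5, followed by the combined step on the unshuffled image.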
- Jigsaw Puzzle Generator: jigsaw puzzle solving (Wei, Chen, et al. "Iterative reorganization with weak spatial constraints: Solving arbitrary jigsaw puzzles for unsupervised representation learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.) has been shown to be an effective image augmentation method. The authors use jigsaw puzzles to augment the input data of each training step, forcing the model to learn the granularity corresponding to the current stage (devise different granularity regions and force the model to learn information specific to the corresponding granularity level at each training step; only the last step (the combined step) is still trained with original images). Given an input image $d \in \mathbb{R}^{3 \times W \times H}$, it is split into $n \times n$ patches, which are randomly shuffled and reassembled into a new image; the larger $n$ is, the finer the granularity of the patches. The $n$ of each stage must satisfy two conditions: (i) the patch size should be smaller than the receptive field of the current stage; (ii) the patch size should increase with the receptive field of the stage. Since the receptive field roughly doubles between adjacent stages, the authors set $n$ for the $l$-th stage to $2^{L-l+1}$. Note that the jigsaw puzzle generator cannot always guarantee that a fine-grained discriminative region falls entirely within a single patch, but since random cropping is also used, this issue does not degrade model performance.
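The patch-shuffling operation itself is simple to implement. Below is a minimal NumPy sketch (an assumption of this write-up, not the authors' code): split a CHW image into an $n \times n$ grid, permute the patches, and reassemble.

```python
import numpy as np

def jigsaw_generator(image: np.ndarray, n: int, rng=None) -> np.ndarray:
    """Split a CHW image into an n x n grid of patches, shuffle the
    patches randomly, and reassemble them into an image of the same
    shape. `rng` may be a seed or a numpy Generator."""
    rng = np.random.default_rng(rng)
    c, h, w = image.shape
    assert h % n == 0 and w % n == 0, "image size must be divisible by n"
    ph, pw = h // n, w // n
    # Collect the n*n patches in row-major order.
    patches = [image[:, i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(n) for j in range(n)]
    order = rng.permutation(n * n)
    out = np.empty_like(image)
    for idx, k in enumerate(order):
        i, j = divmod(idx, n)
        out[:, i*ph:(i+1)*ph, j*pw:(j+1)*pw] = patches[k]
    return out
```

The output has the same shape and the same multiset of pixel values as the input; only the spatial arrangement of the patches changes.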
- Inference: classification can use only the concat feature, or fuse the classification results of multiple stages.
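Fusing the stage predictions can be as simple as combining their softmax outputs. The sketch below (an assumption; equal weighting is my choice, and summing instead of averaging would give the same argmax) averages the per-stage probabilities with the concat-feature prediction.

```python
import numpy as np

def fuse_predictions(stage_probs, concat_probs):
    """Average the softmax probability vectors of all stage heads and
    the concat head into one fused prediction (equal weights assumed)."""
    all_probs = np.vstack(list(stage_probs) + [concat_probs])
    return all_probs.mean(axis=0)

# Toy 3-class example: y^3, y^4, y^5 and y^concat softmax outputs.
y3 = np.array([0.6, 0.3, 0.1])
y4 = np.array([0.5, 0.4, 0.1])
y5 = np.array([0.2, 0.7, 0.1])
yc = np.array([0.3, 0.6, 0.1])
fused = fuse_predictions([y3, y4, y5], yc)  # predicted class: fused.argmax()
```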
Experiments
- Implementation Details: see Section 4.1 (the input images are resized to a fixed size of $550 \times 550$ and randomly cropped into $448 \times 448$)
- Comparisons with State-of-the-Art Methods
- Ablation Study
- Visualization (Grad-CAM)