[Paper Reading] [Meta-Learning / Few-Shot Learning] [ICLR 2020] CROSS-DOMAIN FEW-SHOT CLASSIFICATION VIA LEARNED FEATURE-WISE TRANSFORMATION

Article link: https://blog.csdn.net/Egg_Hu/article/details/113779121


Few-Shot Classification

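Few-shot classification is usually formulated as an N-way K-shot problem (the original figure illustrates a 3-way 3-shot problem). The core idea is to classify the unlabeled samples in the query set using the small labeled support set, which contains N classes with K samples per class.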
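The N-way K-shot setup can be illustrated with a minimal episode sampler. This is only a sketch: the function name `sample_episode`, the `n_query` parameter, and the toy dataset of placeholder strings are assumptions made for illustration and do not come from the paper.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode from a {label: [samples]} dict."""
    # Pick N classes, then K support and n_query query samples per class.
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Hypothetical toy dataset: 5 classes with 10 placeholder samples each.
toy = {f"class_{c}": [f"img_{c}_{i}" for i in range(10)] for c in range(5)}
support, query = sample_episode(toy, n_way=3, k_shot=3, n_query=2,
                                rng=random.Random(0))
print(len(support), len(query))  # 9 6
```

In a 3-way 3-shot episode the support set therefore holds 9 labeled samples; the query labels are used only to compute the loss or the accuracy.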

Metric-Based Approach

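A metric-based method consists of a feature encoder and a metric function. Features are first extracted from both the labeled samples in the support set and the unlabeled samples in the query set; the metric function then classifies the query samples.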


Different metric-based methods differ in how the metric function is designed.
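As one concrete instance of the encoder-plus-metric pipeline, the sketch below uses a nearest-prototype rule with Euclidean distance. This is just one possible choice of metric function; the identity `encode` function stands in for a real feature encoder, and all names here are hypothetical.

```python
import math
from collections import defaultdict

def encode(x):
    """Stand-in feature encoder E: identity on already-numeric vectors."""
    return list(x)

def prototypes(support):
    """Average the encoded support features of each class."""
    grouped = defaultdict(list)
    for x, label in support:
        grouped[label].append(encode(x))
    return {label: [sum(dim) / len(feats) for dim in zip(*feats)]
            for label, feats in grouped.items()}

def classify(query_x, protos):
    """Metric function M: pick the class whose prototype is nearest."""
    feat = encode(query_x)
    return min(protos, key=lambda label: math.dist(feat, protos[label]))

# Toy 2-way 2-shot support set in a 2-D feature space.
support = [((0.0, 0.0), "a"), ((0.2, 0.0), "a"),
           ((1.0, 1.0), "b"), ((1.0, 0.8), "b")]
protos = prototypes(support)
prediction = classify((0.9, 1.0), protos)
print(prediction)  # b
```

Swapping `classify` for a learned comparison module or a graph-based module changes the method while leaving the encoder untouched.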

Meta-Learning Setting

Meta learning

Problem formulation and motivation: Cross-Domain Few-Shot Classification

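In an ICLR 2019 paper, with mini-ImageNet as the meta-training domain: 1) if the meta-testing domain is also mini-ImageNet, 5-shot classification accuracy is above 70%; 2) if the meta-testing domain is CUB (a more fine-grained classification task), 5-shot classification accuracy is around 50%.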

This gap arises because the feature distributions of the two domains differ, and the metric function does not generalize to the new feature distribution. (As a result, during the training stage, the metric function may overfit to the feature distributions encoded only from the seen domains and thus fail to generalize to unseen domains.)

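My understanding is that mini-ImageNet is a coarse-grained classification dataset, while CUB is a fine-grained one. During meta-training the goal is to enlarge the distance between classes (for example, birds versus food), but the bird class itself contains many kinds of birds, and the fine-grained distances between them are never considered. During meta-testing on CUB, the extracted features all cluster around the bird class, and the distances between bird species have not been enlarged by training.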


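The feature distribution at meta-testing differs from the one at meta-training (usually because the meta-training and meta-testing data come from different datasets, although within each task the support set and the query set come from the same domain).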

FEATURE-WISE TRANSFORMATION LAYER

Because the feature distributions of tasks in the seen and unseen domains differ, the metric function $M$ may overfit to the seen domains and fail to generalize to the unseen domains.

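The core idea is to diversify the feature distribution. During training, the image features are augmented with affine transformations to simulate the feature distributions of different domains, which improves the generalization ability of the metric function $M$.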

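Concretely, feature-wise transformation layers are added to the feature encoder $E$. Each layer modulates the intermediate feature activation $z$ with a scaling term $\gamma$ and a bias term $\beta$, both sampled from Gaussian distributions parameterized by the hyperparameters $\theta_\gamma$ and $\theta_\beta$.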
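A minimal sketch of such a layer on a nested-list feature map of shape [C][H][W] follows. Several details are assumptions rather than statements from the text above: $\gamma$ is drawn around 1 and $\beta$ around 0, `softplus` of each hyperparameter is used as the standard deviation, one pair ($\gamma$, $\beta$) is drawn per channel, and the numeric values are arbitrary.

```python
import math
import random

def softplus(x):
    """Smooth positive mapping, used here so the std is always > 0."""
    return math.log1p(math.exp(x))

def feature_wise_transform(z, theta_gamma, theta_beta, rng, training=True):
    """Apply z_hat[c][h][w] = gamma[c] * z[c][h][w] + beta[c] per channel."""
    if not training:
        return z  # the layer only perturbs features during training
    out = []
    for channel, tg, tb in zip(z, theta_gamma, theta_beta):
        gamma = rng.gauss(1.0, softplus(tg))  # assumed: scale centered at 1
        beta = rng.gauss(0.0, softplus(tb))   # assumed: bias centered at 0
        out.append([[gamma * v + beta for v in row] for row in channel])
    return out

# Toy activation with 2 channels of size 2x2; hyperparameter values are arbitrary.
z = [[[0.0, 1.0], [2.0, 3.0]],
     [[1.0, 2.0], [3.0, 5.0]]]
out = feature_wise_transform(z, [0.3, 0.3], [0.5, 0.5], random.Random(0))
```

Each call draws a fresh $\gamma$ and $\beta$, so the same image yields a slightly different feature map at every training step, which is what simulates the variation across domains.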

LEARNING THE FEATURE-WISE TRANSFORMATION LAYERS

Choosing the hyperparameters $\theta_f=\{\theta_\gamma, \theta_\beta\}$ empirically is difficult, so the authors propose a learning-to-learn method to select them.
The performance of the current model on the pseudo-unseen domain reflects how well it generalizes to other domains. The core idea of learning to generalize is to optimize $\theta_f$ so that the current model performs better on unseen domains. (The core idea is that training the metric-based model integrated with the proposed layers on the seen domains should improve the performance of the model on the unseen domains.)

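Why not update $\theta_f$ on the pseudo-seen domain?
Because $\theta_f$ is not meant to reduce the classification error directly. Its role is to diversify the feature distribution and thereby improve classification accuracy on other domains, so it should be updated with the model's classification error on other domains.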

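This is somewhat similar to MAML, where the parameters are updated to reduce the classification error, on other classification tasks, of a model that uses them as its initialization.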
MAML
This involves second-order derivatives, which consume GPU memory.

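At each training iteration $t$, a pseudo-seen domain and a pseudo-unseen domain are sampled from the set of seen domains. Consider a metric-based model with feature encoder $E_{\theta_e^t}$ and metric function $M_{\theta_m^t}$. First, the transformation layers with hyperparameters $\theta_f^t$ are inserted into the feature encoder $E_{\theta_e^t}$, and the parameters of the metric-based model are updated on a pseudo-seen task, as in Eq. (5).
The updated model is then used to measure generalization performance. The feature-wise transformation layers are first removed from the model, and the classification loss is computed on a pseudo-unseen task. This loss is used to update the parameters of the transformation layers, as in Eqs. (6) and (7).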
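The equations referenced as (5) to (7) are not reproduced in the text, so the block below is a reconstruction from the description above and should be checked against the paper. The symbols $\alpha$ (learning rate), $L_{cls}$ (classification loss), and $(X_s, Y_s)$, $(X_q, Y_q)$ (support and query data, with superscripts $ps$ and $pu$ for the pseudo-seen and pseudo-unseen tasks) are notation assumed here.

$$(\theta_e^{t+1}, \theta_m^{t+1}) = (\theta_e^{t}, \theta_m^{t}) - \alpha \nabla_{\theta_e, \theta_m} L_{cls}\big(Y_q^{ps},\ M_{\theta_m^{t}}(Y_s^{ps},\ E_{\theta_e^{t}, \theta_f^{t}}(X_s^{ps}),\ E_{\theta_e^{t}, \theta_f^{t}}(X_q^{ps}))\big) \quad (5)$$

$$L^{pu} = L_{cls}\big(Y_q^{pu},\ M_{\theta_m^{t+1}}(Y_s^{pu},\ E_{\theta_e^{t+1}}(X_s^{pu}),\ E_{\theta_e^{t+1}}(X_q^{pu}))\big) \quad (6)$$

$$\theta_f^{t+1} = \theta_f^{t} - \alpha \nabla_{\theta_f} L^{pu} \quad (7)$$

Under this reading, $L^{pu}$ depends on $\theta_f^t$ only through the updated parameters $\theta_e^{t+1}$ and $\theta_m^{t+1}$, so the gradient in (7) has to pass through the update in (5). That is where the second-order term comes from.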


Experiments

Two experimental settings:

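1) The hyperparameters $\theta_f=\{\theta_\gamma, \theta_\beta\}$ are fixed empirically in advance, and the effect of the feature-wise transformation layers is analyzed. The model is meta-trained on the mini-ImageNet domain and meta-tested on the other four domains: CUB, Cars, Places and Plantae.
2) The effect of learning-to-learn is analyzed with a leave-one-out strategy: one of the four domains CUB, Cars, Places and Plantae is chosen as the unseen domain, and the remaining three, together with mini-ImageNet, serve as seen domains for training.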
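The leave-one-out protocol of the second setting can be enumerated in a few lines. The function name and the dictionary layout are hypothetical; only the domain names come from the text.

```python
FINE_GRAINED = ["CUB", "Cars", "Places", "Plantae"]

def leave_one_out_splits(domains, always_seen="mini-ImageNet"):
    """Hold out each domain in turn; train on the rest plus mini-ImageNet."""
    splits = []
    for unseen in domains:
        seen = [always_seen] + [d for d in domains if d != unseen]
        splits.append({"seen": seen, "unseen": unseen})
    return splits

splits = leave_one_out_splits(FINE_GRAINED)
for split in splits:
    print(split["unseen"], "<-", split["seen"])
```

This gives four train/test configurations, each with four seen domains and one unseen domain.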

Backbone: ResNet-10

Pre-trained feature encoder: the feature encoder $E$ is pre-trained by minimizing the standard cross-entropy classification loss on the 64-class mini-ImageNet classification problem.

Table 1: hand-tuned
Feature-wise transformation without learning-to-learn; the hyperparameters are chosen empirically.
Table 2: trained on multiple training sets and tested on one set; LFT denotes the variant that uses learning-to-learn.