Learning Embedding Adaptation for Few-Shot Learning: Paper Reading Notes

Latest recommended article published on 2023-04-10 18:58:28

wuxtwu


Abstract

Previous methods are task-agnostic: the embedding function is not learned optimally discriminative with respect to the unseen classes, where discerning among them is the target task.
Improvement: to adapt the embedding model to the target classification task, yielding embeddings that are task-specific and are discriminative.
Approach: use a Transformer to transform the embeddings from task-agnostic to task-specific by focusing on relating instances from the test instances to the training instances in both seen and unseen classes.
In both the training and the test phase, the focus is on how test instances relate to training instances.

Introduction

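The main idea is to discover transferable visual knowledge in the SEEN classes, which have ample labeled instances, and leverage it to construct the desired classifier. In other words, the core of earlier few-shot learning work is to learn transferable knowledge from the SEEN classes and carry it over to the classifier that is ultimately needed. This leads into the metric learning approaches.
The paper points out that earlier metric learning methods rest on a questionable assumption: every task uses one common embedding space. In the paper's words: Assuming a common embedding space implies that the discovered knowledge ... However, intuitively, each task uses a different set of discriminative features. Thus an existing method first needs to be able to extract discerning features for either task at the same time. Even if such shared features are extracted, tasks differ: features that help separate cats from tigers may be irrelevant or noisy when separating cats from dogs. What is missing is an adaptation strategy from the SEEN classes to the UNSEEN ones.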

Related Work

The key to metric learning: The key assumption is that the embeddings capture all necessarily discriminative representations of data such that simple classifiers are sufficed, hence avoiding the danger of overfitting on a small number of labeled instances.
Unlike the earlier task-agnostic approaches, this paper proposes task-specific embeddings, which align better with what is discriminative in each individual task.

Method

3.1 Learning Embedding for Task-agnostic Few-Shot Learning

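This section gives two formulas. Equation (1) states the problem setup for the standard FSL training stage.
Equation (2) specializes it to metric learning methods.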
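
To make the metric learning setting concrete, here is a minimal sketch of a metric-based few-shot classifier in the Prototype Net style. It is an illustration only, not the paper's Equation (2): the function names, the use of negative squared Euclidean distance as the similarity, and the `temperature` argument are all assumptions made for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prototype_predict(support_emb, support_labels, query_emb, n_way, temperature=1.0):
    """Classify query embeddings by similarity to class means.

    support_emb:    (N*K, d) embeddings of the labeled training instances
    support_labels: (N*K,)   integer labels in [0, n_way)
    query_emb:      (Q, d)   embeddings of the test instances
    Returns a (Q, n_way) array of class probabilities.
    """
    # One prototype per class: the mean embedding of its support instances.
    prototypes = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_way)]
    )
    # Similarity (assumed here): negative squared Euclidean distance.
    diff = query_emb[:, None, :] - prototypes[None, :, :]
    logits = -(diff ** 2).sum(axis=-1) / temperature
    return softmax(logits)

# Toy 2-way 2-shot task with hand-written 2-D "embeddings".
support = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.1, 0.1], [0.9, 1.0]])
probs = prototype_predict(support, labels, queries, n_way=2)
pred = probs.argmax(axis=1)
```

The embeddings here are task-agnostic in the sense of this section: the same vectors would be used no matter which classes make up the task.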

3.2 Adapting Embedding for Task-specific Few Shot Learning

3.2.1 Adapting Embedding for Task-specific Few Shot Learning

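Compared with 3.1: We argue that the embedding φx is not ideal. In particular, the embeddings do not necessarily highlight the most discriminative representation for a specific target task. That is, the embeddings of 3.1 do not emphasize the parts that are most discriminative for a given task.
Hence the paper proposes an adaptation step.

Compared with Equation (2), the training embeddings fed into the classifier are first passed through the Transformer.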

3.2.2 Detail (Transformer as a Set Function for Adaptation)

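Core idea: we employ self-attention mechanism [31, 51] to transform each instance embedding with consideration to its contextual instances. That is, each instance's embedding is rewritten so that it takes its contextual instances into account.
The "natural" justification given here is not fully clear to me yet ("permutation invariant"): Note that it naturally satisfies the desired properties of T because it outputs refined instance embeddings and is permutation invariant. (The term appears again where the experiments section describes the structure of T; I have not fully understood its exact meaning.)
- Permutation invariant should mean order invariance: changing the order of the inputs does not change the final result.
Transformer pipeline:
- There are three inputs: Q, K, V.
- Q: the set of query points (instances);
- K: also a set of instances, whose embeddings have already been computed and are stored in V;
- V: the embeddings of the set K.
- Output: a value for each query point in Q.
- Steps:
  - First, apply a linear projection to the instances.
  - Compute the weight of the query point with respect to the value of each key.
  - Take the weighted sum to obtain the value for the query point.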
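
The three steps above can be sketched as single-head dot-product attention. This is a minimal sketch under stated assumptions, not the paper's exact layer: the weight matrices `W_q`, `W_k`, `W_v`, the scaling by the square root of the key dimension (the usual scaled dot-product choice), and the toy data are all assumptions, and the sketch covers only the three listed steps.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_set_function(query_emb, key_emb, W_q, W_k, W_v):
    """Map each query embedding to a weighted sum of projected key embeddings."""
    # Step 1: linear projections of the instances.
    Q = query_emb @ W_q                              # (m, d_k)
    K = key_emb @ W_k                                # (n, d_k)
    V = key_emb @ W_v                                # (n, d_v)
    # Step 2: weight of every query point with respect to every key.
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (m, n), rows sum to 1
    # Step 3: weighted sum of the values.
    return alpha @ V                                 # (m, d_v)

# Toy set of 5 instance embeddings; Q, K and V all come from the same set.
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(5, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

adapted = attention_set_function(X, X, W_q, W_k, W_v)

# Feed the same set in a different order.
perm = rng.permutation(5)
adapted_perm = attention_set_function(X[perm], X[perm], W_q, W_k, W_v)
same_up_to_order = np.allclose(adapted_perm, adapted[perm])
```

In this sketch, shuffling the set only shuffles the rows of the output in the same way, and shuffling the keys alone leaves the output for each query unchanged. That is one way to read the order-independence property discussed above.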
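As for the choice of these sets, the paper roughly distinguishes two cases: 1) instances from the training set only; 2) instances from both the training set and the test set.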

3.2.3 Contrastive Learning of Set Functions

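I have not fully understood this contrastive learning part yet; it appears to be a form of metric learning. According to the paper, the purpose of this design is:
It is designed to make sure that instances embeddings after adaptation is similar to the same class neighbors and dissimilar to those from different classes.
After the Transformer, a contrastive objective is applied so that each training instance stays close to the center of its own class and far from the centers of the other classes.
Concretely, this contrastive objective is added to the objective of Equation (1).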
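
One way to picture such an objective is a softmax cross-entropy over similarities to class centers, added to the classification loss with a weight. The sketch below is an assumption-laden illustration, not the paper's formula: the function names, the negative squared distance similarity, and the weight `lam` are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def center_contrastive_loss(adapted_emb, labels, n_way):
    """Cross-entropy that pulls each adapted embedding toward its class center."""
    # Class centers computed from the adapted embeddings themselves.
    centers = np.stack(
        [adapted_emb[labels == c].mean(axis=0) for c in range(n_way)]
    )
    diff = adapted_emb[:, None, :] - centers[None, :, :]
    logits = -(diff ** 2).sum(axis=-1)       # closer center -> larger logit
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def total_objective(classification_loss, adapted_emb, labels, n_way, lam=0.1):
    """Classification loss plus the weighted contrastive term (lam is hypothetical)."""
    return classification_loss + lam * center_contrastive_loss(adapted_emb, labels, n_way)

# Same four points, once labeled so classes are compact, once so they are mixed.
emb = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
loss_clustered = center_contrastive_loss(emb, np.array([0, 0, 1, 1]), n_way=2)
loss_mixed = center_contrastive_loss(emb, np.array([0, 1, 0, 1]), n_way=2)
```

The term is small when same-class embeddings are compact and well separated, and grows when classes overlap, which matches the stated purpose of the design.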

3.2.4 Implementation details

Two backbone networks are used: 1) a four-layer convolutional network; 2) ResNet.
A pre-training strategy is used.
Somewhat surprisingly, a shallow Transformer works well: We empirically observed that the shallow transformer (with one set of projection and one stacked layer) gives the best overall performance.

Experimental Setups

4.1 Main Results

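Note that OfficeHome is also used, to evaluate performance across different domains.
The paper notes that the earlier evaluation protocol, testing on 600 target tasks (15 samples per class), gives results with high variance, and proposes reporting results over 1000 target tasks instead. 95% confidence intervals are still used.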
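
The effect of the task count on the reported interval can be sketched with the usual normal-approximation formula, mean ± 1.96 · std / sqrt(n). The helper name and the synthetic per-task accuracies below are assumptions for illustration; they are not results from the paper.

```python
import math
import statistics

def mean_and_ci95(accuracies):
    """Return (mean, half-width of the 95% confidence interval).

    Uses the normal approximation: 1.96 * sample_std / sqrt(number_of_tasks).
    """
    n = len(accuracies)
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)
    return mean, 1.96 * std / math.sqrt(n)

# Synthetic per-task accuracies with the same spread, differing only in count.
accs_600 = [0.6, 0.7] * 300
accs_1000 = [0.6, 0.7] * 500

mean_600, half_600 = mean_and_ci95(accs_600)
mean_1000, half_1000 = mean_and_ci95(accs_1000)
```

With the per-task spread held fixed, the half-width shrinks roughly with the square root of the number of tasks, which is why evaluating on more tasks gives a tighter interval.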
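The baselines are mainly Matching Net and Prototype Net (PN), with a scale parameter (temperature) applied to the logits. PN is used to extract the embeddings.
Several variants are compared for learning the embedding adaptation function T; they differ in the choice of set function: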
- BILSTM
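- BILSTM*: additionally includes the test instances, so that the embeddings are adapted jointly.
- DEEPSETS: uses DeepSets, a permutation-invariant architecture, as T. Note: DeepSets aggregates the training instances into a single set vector. An MLP then takes the current instance together with this set vector as input and outputs the instance's embedding.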
- DEEPSETS* :
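
The DeepSets description above can be sketched as follows. This is a simplified illustration under assumptions, not the paper's configuration: sum pooling over all instances, a two-layer ReLU MLP, and the parameter names `W1`, `b1`, `W2`, `b2` are all choices made for this example.

```python
import numpy as np

def deepsets_adapt(emb, W1, b1, W2, b2):
    """Adapt each embedding using a pooled summary of the whole set.

    emb: (n, d) instance embeddings. Returns (n, d_out) adapted embeddings.
    """
    n = emb.shape[0]
    # Aggregate the set into one vector; a sum does not depend on input order.
    set_vec = emb.sum(axis=0, keepdims=True)                    # (1, d)
    # Pair every instance with the set vector.
    x = np.concatenate([emb, np.repeat(set_vec, n, axis=0)], axis=1)  # (n, 2d)
    # Two-layer MLP with ReLU (assumed architecture).
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

rng = np.random.default_rng(0)
d, hidden = 4, 8
emb = rng.normal(size=(5, d))
W1, b1 = rng.normal(size=(2 * d, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, d)), np.zeros(d)

out = deepsets_adapt(emb, W1, b1, W2, b2)

# Shuffling the set shuffles the outputs in the same way.
perm = rng.permutation(5)
out_perm = deepsets_adapt(emb[perm], W1, b1, W2, b2)
order_independent = np.allclose(out_perm, out[perm])
```

Unlike the attention-based set function, every instance here sees the same pooled summary, so the sketch cannot weight some neighbors more than others.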
The pre-trained PN serves as the baseline. BILSTM and DEEPSETS do not help consistently, whereas the Transformer does. The paper again gives the reason:
Transformer naturally implements the permutation invariance set-to-set adaptation function.

4.2 Ablation Studies and Analysis

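Analysis 1: a simple check of whether adaptation helps.
A study of "Interpolation and Extrapolation of few-shot tasks": the embedding adaptation is trained with a fixed number of ways, either 20-way or 5-way, and then tested on tasks with N = {5, 10, 15, 20} classes.
BILSTM performs worst. The paper suggests this may be because it is not permutation invariant, so it may fit arbitrary dependencies among instances.
A visualization analyzes how adaptation helps (the HOW): it moves the support embeddings away from the clutter.
Adding more layers and heads (a study of Transformer complexity) has little effect.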

4.3 Extended Few-Shot Learning Tasks

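Cross-Domain FSL. The model must recognize the intrinsic properties of objects rather than their appearance (that is, when the visual appearance changes); it requires recognition by analogy. (This part is still unclear to me.)
Transductive FSL. Do all test samples arrive at the same time?
The unlabeled test samples are added to the keys and values of the Transformer, so the relationships among the test samples are taken into account as well.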
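
A minimal sketch of this idea: the attention context (keys and values) is extended with the unlabeled test embeddings. Which instances act as queries is an assumption here (the support embeddings), as are the function name, the weight matrices, and the toy data.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(query_emb, context_emb, W_q, W_k, W_v):
    """Single-head attention; returns adapted embeddings and attention weights."""
    Q = query_emb @ W_q
    K = context_emb @ W_k
    V = context_emb @ W_v
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return alpha @ V, alpha

rng = np.random.default_rng(0)
d = 4
support = rng.normal(size=(5, d))           # labeled training instances
unlabeled_test = rng.normal(size=(15, d))   # test instances, labels unknown
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

# Inductive setting: the context is the support set only.
inductive, _ = attend(support, support, W_q, W_k, W_v)

# Transductive setting: unlabeled test instances join the keys and values.
context = np.concatenate([support, unlabeled_test], axis=0)
transductive, alpha = attend(support, context, W_q, W_k, W_v)

# Attention mass each support instance places on the unlabeled test instances.
mass_on_test = alpha[:, len(support):].sum(axis=1)
```

Because part of the attention mass now falls on test instances, the adapted support embeddings depend on the test set, which is how their relationships enter the model in this sketch.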
Generalized FSL & Low-shot Learning. Test samples come from both SEEN and UNSEEN classes.

Discussion

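The paper argues that a single embedding space shared by the training tasks and the target task is not discriminative enough for the target task, especially when samples are scarce.
An adapted embedding space can exploit the relationships between training and test instances in the target task, and so yields more discriminative instance representations.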
