Abstract
- learn an instance embedding function from seen classes, and apply that function to instances from unseen classes with limited labels.
- usually learn a discriminative instance embedding model from the SEEN categories, and apply the embedding model to visual data in UNSEEN categories
- non-parametric classifiers avoid learning complicated recognition models from a small number of examples.
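A minimal sketch of such a non-parametric classifier (nearest class prototype in the embedding space, à la ProtoNet); function and variable names here are illustrative, not the paper's API:

```python
# Sketch: non-parametric nearest-prototype classification over fixed embeddings.
import math

def prototype(embeddings):
    """Class prototype = mean of the class's support embeddings."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, support):
    """support: dict mapping class label -> list of embedded support examples.
    Returns the label whose prototype is closest to the query embedding."""
    protos = {label: prototype(examples) for label, examples in support.items()}
    return min(protos, key=lambda label: euclidean(query, protos[label]))

# Toy few-shot task in a 2-D embedding space (made-up numbers)
support = {"cat": [[0.0, 0.0], [0.2, 0.1]], "dog": [[1.0, 1.0]]}
print(classify([0.1, 0.0], support))  # -> cat
```

No parameters are fit at test time — the classifier is entirely determined by the embeddings, which is why the embedding quality is the whole game.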
- The most useful features for discerning “cat” versus “tiger” could be irrelevant or noisy for the task of discerning “cat” versus “dog”.
Introduction
- What is lacking in the current approaches for few-shot learning is an adaptation strategy that tailors the visual knowledge extracted from the SEEN classes to the UNSEEN ones in a target task. In other words, we desire separate embedding spaces where each one of them is customized such that the visual features are most discriminative for a given task
- The key assumption is that the embeddings capture all the necessary discriminative representations of the data, so that simple classifiers suffice.
- We use the Transformer architecture to implement T. In particular, we employ a self-attention mechanism to improve each instance embedding with respect to its contextual embeddings (the other instances in the task).
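A minimal sketch of that self-attention step over a set of instance embeddings (pure Python, identity query/key/value projections for brevity — the real model learns these; illustrative only, not the paper's implementation):

```python
# Sketch: dot-product self-attention as a set-to-set transformation.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(embeddings):
    """Each output is a similarity-weighted sum of ALL inputs, so every
    instance embedding is adapted in the context of the whole task set."""
    out = []
    for q in embeddings:
        # attention scores = dot products of the query with every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in embeddings]
        weights = softmax(scores)
        dim = len(q)
        out.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                    for i in range(dim)])
    return out

emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adapted = self_attention(emb)
```

Because the weights depend only on pairwise similarities, permuting the input set just permutes the outputs — which is exactly the permutation-invariance property the notes contrast with sequence models below.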
Experiments
do they use pre-training??
- to demonstrate the effectiveness of using a permutation-invariant set function instead of a sequence model. Please see supplementary for details.
- the Transformer is a set-to-set transformation
- customizes a task-specific embedding space via a self-attention architecture