Feature Transformation Ensemble Model with Batch Spectral Regularization for Cross-Domain Few-Shot Classification
setup: source with data and labels; target with data and only a few labels (the few-shot instances), with an unlabeled query set
3 contributions:
Construct an ensemble prediction model by performing diverse feature transformations after a feature extraction network
1. a batch spectral regularization (BSR) mechanism: suppress all the singular values of the feature matrix in pre-training so that the pre-trained model can avoid overfitting to the source domain and generalize well to the target domain
2. feature transformation ensemble model: build multiple predictors in projected diverse feature spaces to facilitate cross-domain adaptation and increase prediction robustness
3. to mitigate the shortage of labeled data in the target domain:
- exploit the unlabeled query set in fine-tuning through entropy minimization
- label propagation (LP) step to refine the original classification results
$$Y^{*} = (I - \alpha L)^{-1} \times \hat{Y}^{0}$$
- data augmentation techniques to augment both the few-shot and test instances from different angles to improve prediction performance
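A minimal NumPy sketch of the two query-set tricks in item 3: the entropy that fine-tuning would minimize, and the closed-form label propagation step. The notes do not define L; here it is assumed to be the symmetrically normalized affinity matrix D^{-1/2} W D^{-1/2} of a kNN graph over the features. The function names, the Gaussian affinity, `k`, and `alpha=0.5` are all illustrative choices, not taken from the paper.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Mean Shannon entropy of row-wise class probabilities (the value to minimize)."""
    probs = np.asarray(probs, dtype=float)
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

def normalized_affinity(features, k=2):
    """Assumed choice for L: kNN Gaussian affinity W, normalized as D^{-1/2} W D^{-1/2}."""
    x = np.asarray(features, dtype=float)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-sq_dist)
    np.fill_diagonal(w, 0.0)
    # Zero out everything except each row's k largest affinities, then symmetrize.
    drop = np.argsort(-w, axis=1)[:, k:]
    np.put_along_axis(w, drop, 0.0, axis=1)
    w = np.maximum(w, w.T)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1) + 1e-12)
    return d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]

def label_propagation(y0, L, alpha=0.5):
    """Y* = (I - alpha L)^{-1} Y0, computed with a linear solve instead of an explicit inverse."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * L, np.asarray(y0, dtype=float))

# Toy usage: two tight clusters; the third point's initial prediction disagrees with its cluster.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y0 = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
               [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
refined = label_propagation(y0, normalized_affinity(feats, k=2))
refined_labels = refined.argmax(axis=1)
```

With alpha < 1 the matrix I - alpha L is invertible under this normalization, so the solve is well defined. The refined scores are not probabilities; only their argmax is used.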
others:
penalizing smaller singular values of a feature matrix can help mitigate negative transfer in fine-tuning
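The two spectral penalties mentioned in these notes differ only in which singular values they touch. The sketch below computes both quantities for a batch feature matrix (rows are samples); the function names and `k` are hypothetical. NumPy only evaluates the value here. In training the penalty would be added to the classification loss with some weight and differentiated by an autodiff framework.

```python
import numpy as np

def bsr_penalty(features):
    """BSR-style penalty: sum of squares of ALL singular values of the batch feature matrix."""
    s = np.linalg.svd(np.asarray(features, dtype=float), compute_uv=False)
    return float((s ** 2).sum())

def smallest_k_penalty(features, k=1):
    """Penalty on the k SMALLEST singular values only (the variant noted under 'others')."""
    s = np.linalg.svd(np.asarray(features, dtype=float), compute_uv=False)
    return float((np.sort(s)[:k] ** 2).sum())

# Toy batch of 4 samples with 3-dimensional features.
batch = np.arange(12.0).reshape(4, 3)
all_values = bsr_penalty(batch)
smallest_only = smallest_k_penalty(batch, k=1)
```

Since the sum of all squared singular values equals the squared Frobenius norm, the all-values penalty can also be computed as `(features ** 2).sum()` without an SVD.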
questions:
Why multiply the features by different orthogonal matrices to obtain multiple diverse feature representation spaces?
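One way to probe this question is a toy version of the ensemble: project the same features with several matrices that have orthonormal columns, fit one predictor per projected space, and average the predictions. The notes do not say how the paper builds its matrices, so QR of a Gaussian matrix is an assumption, and the nearest-centroid classifier is a stand-in for the fine-tuned classifiers. All names and sizes are hypothetical.

```python
import numpy as np

def random_orthonormal_projection(dim, out_dim, rng):
    """dim x out_dim matrix with orthonormal columns, from the reduced QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, out_dim)))
    return q

def centroid_probs(support_x, support_y, query_x, n_classes):
    """Stand-in classifier: softmax over negative squared distances to class centroids."""
    centroids = np.stack([support_x[support_y == c].mean(axis=0) for c in range(n_classes)])
    sq_dist = ((query_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(support_x, support_y, query_x, n_classes, n_heads=5, out_dim=4, seed=0):
    """Average class probabilities over predictors built in different projected spaces."""
    rng = np.random.default_rng(seed)
    dim = support_x.shape[1]
    probs = []
    for _ in range(n_heads):
        p = random_orthonormal_projection(dim, out_dim, rng)
        probs.append(centroid_probs(support_x @ p, support_y, query_x @ p, n_classes))
    return np.mean(probs, axis=0)
```

A square orthogonal matrix preserves distances, so a distance-based predictor would give identical outputs in every rotated space. In this sketch the diversity comes from projecting to fewer dimensions (`out_dim < dim`), so each head sees a different subspace. Whether that is the paper's source of diversity is not settled by these notes.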
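(1) The algorithm performs well, but it combines too many methods (which have little connection to one another) and includes no ablation study, so it is unclear which method is actually responsible for the gains.
(2) Like the first paper, this paper also applies a feature transformation to the input, but with a new kind of transformation. Finding a good transformation is therefore important. Good transformations include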
Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels
setup: source with data and few labels; target with data and no labels
related
Some semi-supervised learning techniques such as entropy minimization [9], pseudo-labeling [16], and Virtual Adversarial Training (VAT) [21] have often been used in domain adaptation (e.g., [17,31,44]).
domain adaptation methods such as [7,18,31] with few source labels.
Prior work accomplished this by using an adversarial domain classifier [7] or Maximum Mean Discrepancy [19] to align the feature distributions of the two domains. Optimal transport [38] is often used to match two distributions, but it scales poorly [4] and can only find a matching within a batch [1]
main idea
learns features that are not only domain-invariant but also class-discriminative
work
captures apparent visual similarity with in-domain self-supervision in a domain-adaptive manner and performs cross-domain feature matching with across-domain self-supervision.
Our CDS consists of two objectives: (1) learning visual similarity with in-domain supervision and (2) cross-domain matching with across-domain supervision.
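As a rough illustration of objective (1), in-domain self-supervision can be written as an instance discrimination loss: each image is its own class, and every other image in the same domain's memory bank is a negative. The sketch below is a generic version of that loss, not the paper's exact formulation. The memory bank layout, the cosine similarity, and `temperature=0.1` are assumptions.

```python
import numpy as np

def instance_discrimination_loss(features, memory_bank, indices, temperature=0.1):
    """Cross-entropy where sample i must pick its own memory-bank slot among all slots.

    features:    (batch, dim) current features from one domain
    memory_bank: (n_images, dim) stored features of all images in the same domain
    indices:     (batch,) position of each sample's own entry in the memory bank
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    bank = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    logits = f @ bank.T / temperature            # cosine similarity to every stored image
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(indices)), indices].mean())

# Toy usage: three stored images with orthogonal features; the batch holds images 0 and 1.
bank = np.eye(3)
batch = bank[[0, 1]]
loss_matched = instance_discrimination_loss(batch, bank, np.array([0, 1]))
loss_mismatched = instance_discrimination_loss(batch, bank, np.array([1, 0]))
```

The loss is low when each feature is closest to its own stored entry and high otherwise, which is what pushes instances within a domain apart.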
in-domain self-supervision encourages a model to learn discriminative features by separating every instance within a domain
- Instance Discrimination (ID) [39]: treats all the other images as negative pairs
- measure the similarity of features in-domain, and then perform in-domain instance discrimination to learn visual similarity i