Meta & Few-Shot Learning
1. Basic Concepts
- Meta Learning
- **Few-shot learning** is a kind of **meta learning**.
- Meta learning: **learn to learn**.
- Supervised Learning vs. Few-Shot Learning
- Traditional supervised learning:
- **Test** samples are **never seen before**.
- **Test** samples are from **known classes**.
- Few-shot learning:
- **Query** samples are **never seen before**.
- **Query** samples are from **unknown classes**.
- Terminology
- Training Set:
- Support Set:
- **k-way**: the support set has $k$ classes.
- **n-shot**: every class has $n$ samples.
- 3-way is easier than 6-way;
- 2-shot is easier than 1-shot.
- Query:
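The $k$-way $n$-shot terminology above can be made concrete by sampling an episode from a labeled dataset. A minimal sketch in plain Python (the dataset dict and class names are made-up toy data for illustration):

```python
import random

def sample_episode(dataset, k, n, q=1):
    """Sample a k-way n-shot episode: a support set with k classes and
    n samples per class, plus q held-out query samples per class."""
    classes = random.sample(sorted(dataset), k)   # pick k distinct classes
    support, query = [], []
    for c in classes:
        picks = random.sample(dataset[c], n + q)
        support += [(x, c) for x in picks[:n]]    # n labeled support samples
        query += [(x, c) for x in picks[n:]]      # query samples to classify
    return support, query

# Toy dataset: class name -> list of samples.
data = {"cat": [1, 2, 3], "dog": [4, 5, 6], "fox": [7, 8, 9], "owl": [10, 11, 12]}
support, query = sample_episode(data, k=3, n=2, q=1)
# 3-way 2-shot: 3 classes x 2 samples = 6 support pairs, plus 3 query pairs.
```

Fewer ways and more shots make the episode easier, which is exactly why 3-way beats 6-way and 2-shot beats 1-shot.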
- Idea: Learn a Similarity Function
- Basic Idea:
- Learn a similarity function: $\mathrm{sim}(x, x^*)$.
- Ideally, $\mathrm{sim}(x_1, x_2) = 1$, $\mathrm{sim}(x_1, x_3) = 0$, and $\mathrm{sim}(x_2, x_3) = 0$, where $x_1$ and $x_2$ belong to the same class and $x_3$ to a different class.
- Step:
- First, learn a similarity function from a large-scale training dataset.
- Then, apply the similarity function for prediction.
- Compare the **query** with every sample in the **support set**.
- Find the sample with the highest similarity score.
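The two prediction steps can be sketched as follows. Here a plain cosine similarity on feature vectors stands in for the learned similarity function (in practice it would be a trained network):

```python
import math

def cosine_sim(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def predict(query, support):
    """Compare the query with every (sample, label) pair in the support set
    and return the label of the sample with the highest similarity score."""
    best_label, best_score = None, -float("inf")
    for x, label in support:
        s = cosine_sim(query, x)
        if s > best_score:
            best_label, best_score = label, s
    return best_label

support = [((1.0, 0.0), "cat"), ((0.0, 1.0), "dog")]
print(predict((0.9, 0.1), support))  # most similar to the "cat" sample
```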
- Datasets
- Omniglot
- Official website: https://github.com/brendenlake/omniglot/
- TensorFlow: https://www.tensorflow.org/datasets/catalog/omniglot
2. Siamese Network
2.1 Learning Pairwise Similarity Scores
Ref:
- Bromley et al. Signature verification using a Siamese time delay neural network. In NIPS. 1994.
- Koch, Zemel, & Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML, 2015.
- Data for Training set
Each time, select two training samples and label the pair: 1 if they are from the same class, 0 otherwise.
- CNN for Feature Extraction
- Training Siamese Network
- Forward Pass
- Backward Pass
Update the parameters of the CNN.
- One-shot Prediction
The training data (for the Siamese network) contains neither the support-set classes nor the query.
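The forward pass on one pair can be sketched as follows. A toy `embed` function stands in for the shared CNN, and the dense layer that maps the feature difference to a scalar is collapsed to a simple sum with a fixed bias; in the real network all of these are learned parameters updated in the backward pass:

```python
import math

def embed(x):
    # Stand-in for the shared CNN feature extractor f(x).
    return [xi * 0.5 for xi in x]

def pair_score(x1, x2):
    """Siamese forward pass: embed both inputs with the SAME network,
    take the elementwise absolute difference of the features, and map
    it to a similarity score in (0, 1) with a sigmoid."""
    f1, f2 = embed(x1), embed(x2)
    z = sum(abs(a - b) for a, b in zip(f1, f2))  # toy dense layer: plain sum
    return 1.0 / (1.0 + math.exp(z - 1.0))       # toy bias of 1.0

def bce_loss(score, target):
    # target = 1 for a positive (same-class) pair, 0 for a negative pair.
    return -(target * math.log(score) + (1 - target) * math.log(1 - score))
```

A same-class pair gets a higher score (and, with target 1, a lower loss) than a pair of dissimilar inputs.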
2.2 Triplet Loss
Ref:
- Schroff, Kalenichenko, & Philbin. Facenet: A unified embedding for face recognition and clustering. In CVPR, 2015.
- Data for Training set
Each time, select three training samples: an anchor, a positive sample from the same class as the anchor, and a negative sample from a different class.
- CNN for Feature Extraction
All three inputs (anchor, positive, negative) share the same CNN for feature extraction.
- Triplet Loss
- One-Shot Prediction
2.3 Basic Idea of Few-Shot Learning
- Train a **Siamese network** on a large-scale training set.
- Given a **support set** of $k$-way $n$-shot.
- $k$-way means $k$ classes.
- $n$-shot means every class has $n$ samples.
- The training set does not contain the $k$ classes.
- Given a **query**, predict its class.
- Use the Siamese network to compute similarity or distance.
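The prediction step for a $k$-way $n$-shot support set can be sketched as follows. A toy `embed` function stands in for the trained Siamese CNN, and one common variant is assumed: average each class's $n$ embeddings and assign the query to the nearest class mean:

```python
def embed(x):
    # Stand-in for the trained Siamese feature extractor.
    return list(x)

def class_means(support):
    """Average the embeddings of the n samples of each of the k classes."""
    sums, counts = {}, {}
    for x, label in support:
        f = embed(x)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(query, support):
    # Assign the query to the class whose mean embedding is nearest.
    means = class_means(support)
    fq = embed(query)
    return min(means, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(fq, means[c])))

# 2-way 2-shot toy support set.
support = [((0, 0), "cat"), ((0, 2), "cat"), ((4, 4), "dog"), ((4, 6), "dog")]
print(predict((0.5, 1.0), support))  # nearest class mean is "cat" at (0, 1)
```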
3. Pretraining and Fine Tuning
- Cosine Similarity
- Softmax Function
- Softmax Classifier (fully connected layer + softmax function)
Here, $k$ is the number of classes, and $d$ is the number of features.
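These three building blocks fit together as follows. A minimal sketch, where the classifier's weight matrix $W$ has one $d$-dimensional row per class and each logit is the cosine similarity between a row and the feature vector:

```python
import math

def cosine_sim(u, v):
    # Cosine similarity: dot product divided by the two norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(z):
    m = max(z)                         # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def softmax_classifier(W, x):
    """Fully connected layer + softmax: one cosine score per class row of W,
    normalized into a probability distribution over the k classes."""
    return softmax([cosine_sim(w, x) for w in W])

W = [[1.0, 0.0], [0.0, 1.0]]           # k = 2 classes, d = 2 features
p = softmax_classifier(W, [0.9, 0.1])  # higher probability for class 0
```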
3.1 Few-Shot Prediction Using Pretrained CNN
Reference:
- Dhillon, Chaudhari, Ravichandran, & Soatto. A baseline for few-shot image classification. In ICLR, 2020.
- Chen, Wang, Liu, Xu, & Darrell. A New Meta-Baseline for Few-Shot Learning. arXiv, 2020.
- Pretraining
- Pretrain a CNN for **feature extraction** (aka embedding).
- The CNN can be pretrained using **standard supervised learning** or a **Siamese network**.
- Deal with the Support set
- Making Few-Shot Prediction
Here $q$ denotes the query.
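The prediction step can be sketched as follows, assuming the baseline in the cited papers: normalize and average each class's pretrained features to get a mean vector $\mu_k$, stack the normalized means into a matrix $M$, and classify the query feature $q$ with $\mathrm{softmax}(Mq)$:

```python
import math

def normalize(v):
    # Scale a feature vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def few_shot_predict(support_features, q):
    """support_features: {class: [feature vectors from the pretrained CNN]}.
    q: feature vector of the query. Returns (classes, probabilities)."""
    classes = sorted(support_features)
    M = []
    for c in classes:
        feats = [normalize(f) for f in support_features[c]]
        mean = [sum(col) / len(feats) for col in zip(*feats)]
        M.append(normalize(mean))      # mu_k: normalized class mean
    logits = [sum(a * b for a, b in zip(mu, normalize(q))) for mu in M]
    return classes, softmax(logits)

# Toy 2-way 2-shot support features and a query feature.
feats = {"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
classes, p = few_shot_predict(feats, [0.8, 0.2])  # "cat" gets the higher probability
```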
- Summary
3.2 Benefit of Fine Tuning
Reference:
- Chen, Liu, Kira, Wang, & Huang. A Closer Look at Few-shot Classification. In ICLR, 2019.
- Dhillon, Chaudhari, Ravichandran, & Soatto. A baseline for few-shot image classification. In ICLR, 2020.
- Chen, Wang, Liu, Xu, & Darrell. A New Meta-Baseline for Few-Shot Learning. arXiv, 2020.
- Fine-tuning is an improved algorithm for few-shot prediction using a pretrained CNN.
- The process of few-shot prediction using a pretrained CNN: the index $j$ of $x_j$ refers to the **query**'s feature vector, and the classifier parameters $W$ and $b$ are obtained from the support set.
- Trick 1: A Good Initialization
We can train $W$ and $b$ on the support set (fine-tuning).
- Trick 2: Entropy Regularization
- Trick 3: Cosine Similarity + Softmax Classifier
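The fine-tuning objective combining tricks 1 and 2 can be sketched as: cross-entropy on the labeled support set plus an entropy regularizer on the (unlabeled) query predictions. This is only the loss computation; the cited papers then minimize it over $W$ and $b$ by gradient descent, with trick 1 providing their initialization:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def predict(W, b, x):
    # Softmax classifier: p = softmax(W x + b).
    return softmax([sum(wi * xi for wi, xi in zip(w, x)) + bi
                    for w, bi in zip(W, b)])

def cross_entropy(p, label):
    return -math.log(p[label])

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def fine_tune_loss(W, b, support, queries, lam=0.1):
    """support: [(feature vector, label index)], queries: [feature vector].
    Trick 2: low entropy on query predictions means confident,
    well-separated class boundaries."""
    ce = sum(cross_entropy(predict(W, b, x), y) for x, y in support)
    ent = sum(entropy(predict(W, b, x)) for x in queries)
    return ce / len(support) + lam * ent / len(queries)
```

A classifier that separates the support classes more confidently gets a lower value of this objective, which is what the fine-tuning steps drive toward.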
- Summary