By: TA 陈建层
Besides machine learning, "ML" has taken on a second meaning: Meta Learning.
This section covers a lot of ground, so these notes only record the main points.
Overview
Outline
● What is meta learning?
● Why meta learning?
● How and what to do meta learning?
● Categories
● Datasets
● Models
What is meta learning?
This was covered in the previous lecture, so we won't belabor it here:
meta learning is simply learning to learn.
Usually considered to achieve Few-shot learning (but not limited to)
Why meta learning?
- Too many tasks to learn → learn more efficiently
  ○ Faster learning methods (adaptation)
  ○ Better hyper-parameters / learning algorithms
  ○ Related to:
    ■ transfer learning
    ■ domain adaptation
    ■ multi-task learning
    ■ life-long learning
- Too little data → fit more accurately (a better learner fits more quickly)
  ○ Traditional supervised learning may not work
How and what to do meta learning?
Categories
A compendium of model names:
● MAML (Model Agnostic Meta-Learning)
● Reptile (???) — presumably named because the training trajectory crawls along like a reptile
● SNAIL (Simple Neural AttentIve Learner)
● PLATIPUS (Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning) — platypus
● LLAMA (Lightweight Laplace Approximation for Meta-Adaptation) — llama
● ALPaCA (Adaptive Learning for Probabilistic Connectionist Architectures) — alpaca
● CAML (Conditional class-Aware Meta Learning) — camel
● LEO (Latent Embedding Optimization) — lion (in Latin)
● LEOPARD (Learning to generate softmax parameters for diverse classification) — leopard
● CAVIA (Context Adaptation via meta-learning) (not CAML) — guinea pig (cavy)
● R2-D2 (Ridge Regression Differentiable Discriminator) — the droid
Taxonomy
All of the models above can be grouped by what they learn:
- Model parameters (suitable for the few-shot framework)
  ○ Initializations
  ○ Embeddings / representations / metrics
  ○ Optimizers
  ○ Reinforcement learning (policies / other settings)
- Hyperparameters (e.g. AutoML, i.e. automatic hyperparameter tuning)
  (beyond the scope of today, but can be viewed as a kind of meta learning)
  ○ Hyperparameter search ((training) settings) — see Prof. Li's lecture on hyperparameter tuning
  ○ Network architectures → neural architecture search (NAS)
    (related to: evolutionary strategies, genetic algorithms…)
- Others
  ○ The algorithm itself (literally an algorithm, not a network)…… (more in DLHLP)
Datasets
A quick look at commonly used datasets:
- Omniglot (omni = all, glot = language)
  ○ Launched by linguist Simon Ager in 1998
  ○ Introduced as a dataset by Lake et al. in 2015 (Science)
  ○ Concept learning
  Besides real-world scripts, it also contains invented ones; with a bit of imagination, one could even add fictional scripts from anime.
- miniImageNet
  ○ Derived from ImageNet, but set up for few-shot learning
- CUB (Caltech-UCSD Birds)
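As a quick illustration, Omniglot can be loaded directly with torchvision (the `Omniglot` dataset class does exist there; the episode recipe in the comment is the standard N-way K-shot setup, not code from the lecture):

```python
# Minimal sketch: load Omniglot with torchvision (PyTorch assumed).
from torchvision import datasets, transforms

omniglot = datasets.Omniglot(root="./data",
                             background=True,   # background = the "train" split
                             download=True,
                             transform=transforms.ToTensor())
# Each item is (image, character_class). An N-way K-shot episode then
# samples N character classes and, per class, K support + a few query images.
```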
Models
Taxonomy
In the lecture's color-coded figure, Meta-LSTM can be seen as a fusion of the yellow and green categories, and there are other hybrid approaches as well.
Below we briefly walk through the four color-coded categories.
Black-box
The idea: each task has a corresponding $f_\theta$. Treat the tasks themselves as data, feed them into an RNN, and hope the RNN can predict the parameters for a new task.
There are also versions that use an LSTM, with an attention mechanism added on top.
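To make the black-box idea concrete, here is a minimal sketch (assuming PyTorch; the class name and tensor shapes are illustrative, not from the lecture): an LSTM reads a task's support set as a sequence and emits the weights of a small linear classifier $f_\theta$.

```python
import torch
import torch.nn as nn

class BlackBoxMetaLearner(nn.Module):
    def __init__(self, in_dim, n_classes, hidden=128):
        super().__init__()
        # The RNN consumes (x, y) pairs of the support set: the task is data.
        self.rnn = nn.LSTM(in_dim + n_classes, hidden, batch_first=True)
        # Map the final RNN state to a flat parameter vector theta for
        # f_theta(x) = W x + b.
        self.head = nn.Linear(hidden, n_classes * in_dim + n_classes)
        self.in_dim, self.n_classes = in_dim, n_classes

    def forward(self, support_x, support_y, query_x):
        # support_x: (B, K, in_dim); support_y: (B, K); query_x: (B, Q, in_dim)
        y = nn.functional.one_hot(support_y, self.n_classes).float()
        _, (h, _) = self.rnn(torch.cat([support_x, y], dim=-1))
        theta = self.head(h[-1])                                   # (B, n_params)
        W = theta[:, :self.n_classes * self.in_dim]
        W = W.view(-1, self.n_classes, self.in_dim)
        b = theta[:, self.n_classes * self.in_dim:]
        # Apply the generated classifier to the query examples.
        return torch.einsum("bqd,bcd->bqc", query_x, W) + b.unsqueeze(1)
```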
Optimization / Gradient based
Learn model initialization — models that learn the initial parameters:
● MAML (Model Agnostic Meta Learning)
● Reptile
● Meta-LSTM (can also be viewed as an RNN black-box)
Improvements of MAML — models that refine MAML:
● Meta-SGD
● MAML++
● AlphaMAML
● DEML
● CAVIA
Different meta-parameters — models that learn other kinds of parameters:
● iMAML
● R2-D2 / LR-D2
● ALPaCA
● MetaOptNet
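As a rough sketch of the gradient-based idea (assuming PyTorch ≥ 2.0 for `torch.func.functional_call`; the function below is illustrative, not the official MAML code): the inner loop adapts a copy of the meta-parameters on each task's support set, and the outer loss is measured on the query set with the adapted parameters.

```python
import torch
import torch.nn.functional as F

def maml_meta_loss(model, tasks, inner_lr=0.01, inner_steps=1):
    """tasks: list of (support_x, support_y, query_x, query_y)."""
    meta_loss = 0.0
    for sx, sy, qx, qy in tasks:
        # Inner loop: adapt a functional copy of the meta-parameters.
        fast = dict(model.named_parameters())
        for _ in range(inner_steps):
            loss = F.cross_entropy(
                torch.func.functional_call(model, fast, (sx,)), sy)
            # create_graph=True keeps second-order terms; False gives FOMAML.
            grads = torch.autograd.grad(loss, list(fast.values()),
                                        create_graph=True)
            fast = {n: p - inner_lr * g
                    for (n, p), g in zip(fast.items(), grads)}
        # Outer loss: adapted parameters, evaluated on the query set.
        meta_loss = meta_loss + F.cross_entropy(
            torch.func.functional_call(model, fast, (qx,)), qy)
    return meta_loss / len(tasks)  # backprop this through the meta-optimizer
```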
Problems of MAML
● Learning rate → Meta-SGD, MAML++: using the same learning rate for every task's parameters $\theta$ is not really appropriate.
Meta-SGD is an "adaptive learning rate" version of MAML: it adds a learned parameter $\alpha$ so that different tasks get different learning rates.
● Second-order derivatives (instability) → MAML++: the second-order partial derivatives that MAML's derivation ignores make the result inaccurate.
● Batch normalization → MAML++: BN is incorporated into the training process.
These issues give MAML the following problems:
- Training instability: the outer-loop gradients are unstable, prone to exploding or vanishing
  ○ Gradient issues
- Second-order derivative cost
  ○ Expensive to compute
  ○ First-order approximation → harmful to performance
- Batch normalization statistics
  ○ No accumulation
  ○ Shared bias
- Shared (across steps and across parameters) inner-loop learning rate
  ○ Not well scaled
- Fixed outer-loop learning rate
Solutions proposed
- Training instability ⇒ Multi-Step Loss Optimization (MSL): take the outer-loop loss after every inner-loop update, not just the last one (isn't this Reptile?)
  ○ Gradient issues
- Second-order derivative cost ⇒ Derivative-Order Annealing (DA): ignore the second-order terms for the first few inner updates, then stop ignoring them
  ○ Expensive to compute
  ○ First-order approximation → harmful to performance
- Batch normalization statistics
  ○ No accumulation ⇒ Per-Step Batch Normalization Running Statistics
  ○ Shared bias ⇒ Per-Step Batch Normalization Weights & Biases
- Shared (across steps and across parameters) inner-loop learning rate
  ⇒ Learning Per-Layer Per-Step Learning Rates & Gradient Directions (LSLR)
- Fixed outer-loop learning rate
  ⇒ Cosine Annealing of Meta-Optimizer Learning Rate (CA)
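A tiny sketch of the learned-learning-rate idea shared by Meta-SGD and LSLR (PyTorch assumed; the helper names are illustrative): instead of one scalar inner-loop learning rate, keep a learnable $\alpha$ per parameter tensor and update it in the outer loop alongside the initialization.

```python
import torch

def make_alphas(model, init=0.01):
    # One learnable learning-rate tensor per parameter, same shape as it.
    return {n: torch.full_like(p, init, requires_grad=True)
            for n, p in model.named_parameters()}

def inner_update(fast, grads, alphas):
    # theta' = theta - alpha * grad, elementwise; alpha is meta-learned,
    # so the update's scale (and even direction) is itself trained.
    return {n: p - alphas[n] * g for (n, p), g in zip(fast.items(), grads)}
```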
Other ways to improve MAML
There are also two models, both architecturally quite different from MAML, that improve how the initialization is learned:
● Implicit gradients → iMAML
(Figure: on the left is the original MAML, in the middle is MAML with the second-order derivatives ignored, and on the right is iMAML.)
● Closed-form on feature extraction → R2-D2: replace the CNN's final fully-connected classifier with ridge regression (L2-regularized linear regression), which has a closed-form solution.
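The closed-form idea fits in a few lines (PyTorch assumed; this is the generic ridge-regression solution, not the paper's exact formulation, which also uses the Woodbury identity for efficiency):

```python
import torch

def ridge_head(phi, y_onehot, lam=1.0):
    """phi: (K, D) support embeddings; y_onehot: (K, C) labels."""
    D = phi.shape[1]
    # W = (Phi^T Phi + lam I)^{-1} Phi^T Y — closed form, and differentiable
    # w.r.t. phi, so the feature extractor is still meta-trained end to end.
    A = phi.T @ phi + lam * torch.eye(D)
    return torch.linalg.solve(A, phi.T @ y_onehot)   # W: (D, C)

# Query logits are then simply: encoder(query) @ W
```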
Metric-based / non-parametric
Learn to compare!
The earlier meta-learning idea was to learn an $F$ that selects a suitable $f$; the parameters that solve a given task are $\hat\theta$.
For the classification model above, we can change perspective: rather than learning an $f$ to decide whether the testing data is a cat or a dog, directly measure the similarity between the testing data and the cats and dogs on the left; if it looks like a cat, classify it as a cat. The model architecture then becomes the one shown below.
The model's job becomes feature extraction: turn every example into a vector representation, then measure similarity with KNN, L2 distance, and so on.
常见模型:
• Siamese network
• Prototypical network: given a representation (prototype) for each known class, compare the extracted features of a query against the prototypes' feature vectors (see the sketch after the next list)
• Matching network: builds on the model above but also considers the relations between different classes, using a BiLSTM to store these relations
• Relation network
Two more approaches:
- IMP (Infinite Mixture Prototypes)
  • Modified from the prototypical network
  • The number of mixture components is determined from the data via Bayesian nonparametric methods
- GNN
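A minimal sketch of the metric-based recipe described above, in the style of a prototypical network (PyTorch assumed; `encoder` is a stand-in for any embedding network): embed everything, average each class's support embeddings into a prototype, and classify queries by negative squared L2 distance.

```python
import torch

def proto_logits(encoder, support_x, support_y, query_x, n_classes):
    z_s = encoder(support_x)                    # (N*K, D) support embeddings
    z_q = encoder(query_x)                      # (Q, D) query embeddings
    protos = torch.stack([z_s[support_y == c].mean(dim=0)  # class prototype
                          for c in range(n_classes)])      # (N, D)
    # Negative squared L2 distance to each prototype acts as the logit.
    return -torch.cdist(z_q, protos).pow(2)
```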
Problems of metric-based
• When the K in N-way K-shot is large → difficult to scale (classification becomes hard as the data grows)
• Limited to classification (it only learns to compare)
Hybrid
Optimization-based model + metric-based embedding (RelationNet z)
Here an encoder and a decoder are used: a training task is mapped by the encoder to a latent $z$, and the decoder then recovers the task's corresponding parameters $\theta$.
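A hedged sketch of this encoder/decoder pattern (PyTorch assumed; the module is illustrative, loosely in the spirit of LEO rather than an exact reproduction):

```python
import torch
import torch.nn as nn

class TaskEncoderDecoder(nn.Module):
    def __init__(self, emb_dim, latent_dim, n_params):
        super().__init__()
        self.encoder = nn.Linear(emb_dim, latent_dim)    # task -> latent z
        self.decoder = nn.Linear(latent_dim, n_params)   # z -> parameters theta

    def forward(self, support_emb):                # support_emb: (K, emb_dim)
        z = self.encoder(support_emb.mean(dim=0))  # pool the support set
        theta = self.decoder(z)                    # recover the task's parameters
        return z, theta  # adaptation can then proceed in z-space (as in LEO)
```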
Bayesian meta-learning
As an extra topic, a few Bayesian meta-learning models.
PS: the slides even feature Lucious Lyon from the TV show Empire.
On the left the data has three features; if, as in the figure below, the data has only two of them, how should it be classified?
Models that currently tackle this uncertainty problem include:
Black-box:
• VERSA
Optimization:
• PLATIPUS
• Bayesian MAML (BMAML)
• Probabilistic MAML (PMAML)