- Notes on Hung-yi Lee's 2021 Machine Learning course
Transfer Learning
- Transfer Learning: use existing, related knowledge to help learn a new task (data not directly related to the task considered)
- e.g. We have a Dog/Cat Classifier.
But we want to apply it to the following two tasks:
Transfer Learning - Overview
labelled → labelled
Task description
- Source data: $(x^s, y^s)$ (a large amount)
- Target data: $(x^t, y^t)$ (very little)
One-shot learning: only a few examples in target domain
- Challenge: only limited target data, so be careful about overfitting
Model Fine-tuning
- Model Fine-tuning: train a model on the source data, then fine-tune it on the target data
Conservative Training
- To avoid overfitting on the target data, add a regularization term, e.g. (1) encourage the model before and after transfer to produce similar outputs for the same input (output close), or (2) keep the L2 norm of the difference between the old and new parameters small (parameter close)
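The two regularizers can be sketched as follows. This is a minimal illustration, assuming a toy `nn.Linear` model in place of a real network; the loss weights `lam_out` and `lam_param` are made-up hyperparameters.

```python
import torch
import torch.nn as nn

# Toy stand-ins: `model` is fine-tuned, `pretrained` is a frozen copy of the
# source-trained weights (assumption: both share the same architecture).
model = nn.Linear(4, 2)
pretrained = nn.Linear(4, 2)
pretrained.load_state_dict(model.state_dict())  # start from the same weights
for p in pretrained.parameters():
    p.requires_grad_(False)

def conservative_loss(x, y, lam_out=0.1, lam_param=0.01):
    """Task loss + (1) output-close and (2) parameter-close regularizers."""
    task = nn.functional.cross_entropy(model(x), y)
    out_close = nn.functional.mse_loss(model(x), pretrained(x))
    param_close = sum(((p - q) ** 2).sum()
                      for p, q in zip(model.parameters(), pretrained.parameters()))
    return task + lam_out * out_close + lam_param * param_close

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss = conservative_loss(x, y)
loss.backward()  # only `model`'s parameters receive gradients
```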
Layer Transfer
- Layer Transfer: train only some layers' parameters on the target data while keeping the remaining layers fixed (if there is enough target data, the whole model can also be fine-tuned)
- e.g. keep the parameters of the earlier layers and modify only the last layer or last few layers (the more data you have, the more layers you can modify), building a new output layer for the target task. This amounts to training only a shallow network
- Which layer can be transferred (copied)?
- Speech: usually copy the last few layers (the first few layers typically strip away speaker-specific information, while the later layers handle the linguistic content)
- Image: usually copy the first few layers (the first few layers typically extract basic features such as textures)
Most deep learning frameworks let you freeze the parameters of selected layers so they are not trained
Multitask Learning
- Multitask Learning: require the model to perform well on both the target data and the source data
The multi-layer structure makes NN suitable for multitask learning
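The shared-trunk idea can be sketched as below: lower layers are shared across tasks while each task keeps its own output head. The architecture and sizes are illustrative, loosely following the multilingual speech recognition setting.

```python
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    """Shared lower layers + one head per task (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(40, 64), nn.ReLU())
        self.head_a = nn.Linear(64, 10)   # e.g. task A: one language's phoneme set
        self.head_b = nn.Linear(64, 12)   # e.g. task B: another language's phoneme set

    def forward(self, x, task):
        h = self.shared(x)                # features learned from BOTH tasks' data
        return self.head_a(h) if task == "a" else self.head_b(h)

net = MultitaskNet()
x = torch.randn(5, 40)
out_a, out_b = net(x, "a"), net(x, "b")
print(out_a.shape, out_b.shape)
```

Training alternates batches from both tasks, so the shared layers benefit from all the data even though the heads are task-specific.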
Multitask Learning - Multilingual Speech Recognition
Progressive Neural Networks
- paper: Progressive Neural Networks
labelled → unlabeled
Task description
- Source data: $(x^s, y^s)$
- Target data: $(x^t)$
Examples
The source data can be regarded as training data and the target data as testing data
Domain-adversarial training
Zero-shot learning
Representing each class by its attributes
- Instead of directly outputting the class of an image, the model outputs the class's attributes. At test time, the predicted attributes are matched against each class's attributes to find the corresponding class
Attribute embedding
- An image may have very many attributes, which would blow up the model's parameter count. Instead, map both images and class attributes into the same embedding space, so that the embeddings of an image and its class's attributes are close while those of different classes are far apart. At test time, simply map the image into the embedding space and pick the class whose attribute embedding is nearest
$$f^*,g^*=\arg\min_{f,g}\sum_n\max\bigg(0,\,k-f(x^n)\cdot g(y^n)+\max_{m\neq n}f(x^n)\cdot g(y^m)\bigg)$$
- $k$: margin you defined
- Zero loss: reached only when $f(x^n)\cdot g(y^n)-\max_{m\neq n}f(x^n)\cdot g(y^m)>k$, i.e. the matching pair's similarity exceeds that of the best mismatched pair by at least the margin $k$
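The loss above can be computed directly from the two embedding matrices. A minimal NumPy sketch, where `F[n]` stands for $f(x^n)$ and `G[n]` for $g(y^n)$:

```python
import numpy as np

def embedding_hinge_loss(F, G, k=1.0):
    """Pairwise hinge loss from the formula above.
    F[n] = f(x^n) (image embedding), G[n] = g(y^n) (attribute embedding)."""
    S = F @ G.T                       # S[n, m] = f(x^n) . g(y^m)
    pos = np.diag(S)                  # matching pairs f(x^n) . g(y^n)
    S_off = S.copy()
    np.fill_diagonal(S_off, -np.inf)  # exclude m == n from the inner max
    hardest = S_off.max(axis=1)       # max_{m != n} f(x^n) . g(y^m)
    return np.maximum(0.0, k - pos + hardest).sum()

# Orthonormal embeddings: each pair beats every mismatch by 1.
F = G = np.eye(3)
print(embedding_hinge_loss(F, G, k=0.5))  # 0.0 — margin satisfied
print(embedding_hinge_loss(F, G, k=2.0))  # 3.0 — margin violated by 1 per pair
```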
Attribute embedding + word embedding
- If there are too many attributes to maintain a one-to-one mapping between attributes and classes, replace the attribute embedding with a word embedding of the class name
Convex Combination of Semantic Embedding
- There is an even simpler zero-shot method: take an off-the-shelf ImageNet classifier and a word-embedding model, and compute a weighted sum of the class word vectors using the class probabilities the network outputs
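The whole procedure is a few lines. A minimal sketch with made-up 2-D word vectors and made-up classifier probabilities, just to show the convex combination and the nearest-neighbor lookup:

```python
import numpy as np

# Hypothetical word vectors for the classifier's seen classes.
word_vec = {"lion": np.array([1.0, 0.0]),
            "tiger": np.array([0.0, 1.0])}
# Hypothetical word vectors for candidate unseen classes.
unseen = {"liger": np.array([0.7, 0.7]),
          "zebra": np.array([-1.0, 0.2])}

# Softmax probabilities from an off-the-shelf classifier (made-up numbers).
probs = {"lion": 0.5, "tiger": 0.5}

# Convex combination of the seen classes' word vectors.
combined = sum(p * word_vec[c] for c, p in probs.items())

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Predict the unseen class whose word vector is closest to the combination.
best = max(unseen, key=lambda c: cos(combined, unseen[c]))
print(best)  # "liger" — halfway between lion and tiger
```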
More about Zero-shot learning
- Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell, “Zero-shot Learning with Semantic Output Codes”, NIPS 2009
- Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia Schmid, “Label-Embedding for Attribute-Based Classification”, CVPR 2013
- Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc’Aurelio Ranzato, Tomas Mikolov, “DeViSE: A Deep Visual-Semantic Embedding Model”, NIPS 2013
- Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean, “Zero-Shot Learning by Convex Combination of Semantic Embeddings”, arXiv preprint 2013
- Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, “Captioning Images with Diverse Objects”, arXiv preprint 2016
unlabeled → labelled
Self-taught learning
- Self-taught learning: Learning to extract better representation from the source data (unsupervised approach) (e.g. learn an auto-encoder as a feature extractor)
Self-taught learning differs from semi-supervised learning: both settings have a large amount of unlabeled data and a small amount of labelled data, but in self-taught learning the labelled data and the unlabeled data are not directly related
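The auto-encoder variant can be sketched as below: train the encoder/decoder pair on the unlabeled source data by reconstruction, then reuse the encoder as a frozen feature extractor for the small labelled target set. Sizes and the random stand-in data are illustrative.

```python
import torch
import torch.nn as nn

# Auto-encoder trained purely on unlabeled source data.
encoder = nn.Sequential(nn.Linear(20, 8), nn.ReLU())
decoder = nn.Linear(8, 20)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

unlabeled = torch.randn(256, 20)          # stands in for the source data
for _ in range(50):                       # unsupervised reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(unlabeled)), unlabeled)
    loss.backward()
    opt.step()

# Extract features for the labelled target data; a small classifier
# is then trained on these 8-dimensional representations instead of raw input.
target_x = torch.randn(10, 20)
features = encoder(target_x).detach()
print(features.shape)  # torch.Size([10, 8])
```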
unlabeled → unlabeled
Self-taught Clustering
- paper: Self-taught clustering