Concept
Transfer the knowledge learned by a pretrained model to another task/model, so that the previously learned knowledge is preserved and helps training on the new task/model.
Motivation
- Exploit a model trained on one task for a related task
- Popular in deep learning because DNNs are data-hungry and expensive to train
Approaches
- Feature extraction (Word2Vec, ResNet50 features, I3D features); see the sketch after this list
- Train a model on a related task and reuse it.
- Fine-Tuning from a pretrained model
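A minimal PyTorch sketch of the feature-extraction approach, assuming torchvision's ImageNet-pretrained ResNet50 and 224x224 RGB inputs (the batch below is dummy data standing in for a real dataset):

```python
import torch
import torchvision.models as models

# Load a ResNet50 pretrained on ImageNet and drop its classification head,
# keeping the convolutional backbone as a fixed feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # replace the 1000-way classifier with a pass-through
backbone.eval()                    # inference mode: no dropout / batch-norm updates

# Extract 2048-d features for a batch of images; no gradients are needed.
images = torch.randn(8, 3, 224, 224)  # dummy batch (assumption: 224x224 RGB input)
with torch.no_grad():
    features = backbone(images)       # shape: (8, 2048)
```

The extracted features can then be fed to any lightweight model trained on the target task, e.g. logistic regression or a small MLP.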
Fine-Tuning Techniques
- Initialize the model with the feature extractor parameters of a pretrained model
- Randomly initialize the output layer
- Train with a small learning rate for just a few epochs
- Constrain the search space of the model; keep it small.
Modify the parameters only a little, otherwise starting from the pretrained model becomes pointless.
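Putting these points together, a minimal PyTorch sketch: `num_classes`, the two learning rates, the epoch count, and the dummy `train_loader` are all illustrative assumptions for a hypothetical target task.

```python
import torch
import torchvision.models as models

# Initialize from the pretrained feature extractor; the new output layer
# is randomly initialized for the target task.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
num_classes = 10  # assumption: a 10-class target task
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Small learning rate for the pretrained body, a larger one for the fresh
# head, so the parameters stay close to the pretrained solution.
optimizer = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters() if not n.startswith("fc.")],
     "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
], momentum=0.9)

# Dummy stand-in for a real DataLoader over the target-task dataset.
train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, num_classes, (8,)))]

loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(3):                   # just a few epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```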
Freeze Bottom Layers
The bottom layers of a neural network generally learn low-level, fundamental features of the data; as depth increases, the layers learn more global and more task-specific knowledge, moving closer to the data's label space.
During fine-tuning, freeze the bottom layers of the pretrained model, i.e., set their learning rate to 0 (see the sketch at the end of this section).
Focus on learning task-specific features.
Keep low-level universal features intact.
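A minimal sketch of freezing bottom layers in PyTorch, again assuming torchvision's ResNet50; which stages count as "bottom" (here the stem plus layer1 and layer2) is an assumption, not a fixed rule.

```python
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the bottom stages (stem, layer1, layer2) so their pretrained
# low-level features stay intact; layer3, layer4 and fc remain trainable.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False

# Passing only the trainable parameters to the optimizer has the same
# effect as setting the frozen layers' learning rate to 0.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4, momentum=0.9)
```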