Knowledge distillation is a model compression technique: the representational capacity of a large, complex teacher model is transferred to a small, simple student model so that inference runs faster.
Two questions are central to knowledge distillation: how to define the knowledge being transferred, and which loss function to use to measure the similarity between the student network and the teacher network. A minimal sketch of the classic logit-based formulation is given below.
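As a concrete illustration of these two choices, the sketch below follows the classic KD recipe (row 2 in the table): the knowledge is the teacher's temperature-softened softmax, and the loss mixes a KL-divergence term against the teacher with the usual cross-entropy on the labels. This is only a minimal PyTorch sketch; the temperature `T=4.0` and weight `alpha=0.9` are illustrative assumptions, not values prescribed by any of the papers listed here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic KD loss sketch: soften both logit distributions with temperature T,
    match them with KL divergence, and mix in cross-entropy on the true labels.
    T and alpha are illustrative choices, not values from the original papers."""
    # Soft-target term: KL(teacher || student) on temperature-scaled softmax.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```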
Methods overview
No. | reference name | year, publication | title | Knowledge | Loss | URL |
---|---|---|---|---|---|---|
1 | DD | 2014, NIPS | Do Deep Nets Really Need to be Deep? | logits | L2 | https://arxiv.org/abs/1312.6184v7 |
2 | KD | 2014, NIPS | Distilling the Knowledge in a Neural Network | softmax with T | cross entropy | https://arxiv.org/abs/1503.02531v1 |
3 | FitNet | 2015, ICLR | FitNets: Hints for Thin Deep Nets | feature maps | L2 | https://arxiv.org/abs/1412.6550 |
4 | NST | 2017 | Like What You Like: Knowledge Distill via Neuron Selectivity Transfer | distribution of activations | MMD | https://arxiv.org/abs/1707.01219v2 |
5 | FSP | 2017, CVPR | A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning | FSP matrix (inner product between feature maps from two layers) | L2 | |
6 | DML | 2018, CVPR | Deep Mutual Learning | softmax | KL divergence | https://arxiv.org/abs/1706.00384v1 |
7 | ONE | 2018, NIPS | Knowledge Distillation by On-the-Fly Native Ensemble | softmax | KL divergence | https://arxiv.org/abs/1806.04606 |
8 | KDFM | 2018 | Knowledge Distillation with Feature Maps for Image Classification | feature maps & softmax | cross entropy & GAN | https://arxiv.org/abs/1812.00660 |
9 | FM | 2018 | Feature Matters: A Stage-by-Stage Approach for Knowledge Transfer | feature maps | L2 | |
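Several entries in the table (e.g. FitNet and FM) use intermediate feature maps as the knowledge and an L2 loss to match them. The sketch below shows one common way to realize this idea, in the style of a FitNet hint loss; the 1x1 convolutional regressor and the class name `HintLoss` are assumptions for illustration, not code from the papers above.

```python
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """Feature-map matching sketch (FitNet-style hint loss): project the
    student's feature map to the teacher's channel width with a 1x1 conv,
    then penalize the L2 distance between the two feature maps."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 "regressor" aligns channel dimensions; an illustrative choice.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # Assumes matching spatial sizes; interpolate beforehand otherwise.
        projected = self.regressor(student_feat)
        return F.mse_loss(projected, teacher_feat)
```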