Table of Contents
1. KD: Knowledge Distillation
2. FitNet: Hints for Thin Deep Nets
6. VID: Variational Information Distillation
7. RKD: Relational Knowledge Distillation
8. PKT: Probabilistic Knowledge Transfer
11. FSP: Flow of Solution Procedure
12. NST: Neuron Selectivity Transfer
13. CRD: Contrastive Representation Distillation
1. KD: Knowledge Distillation
Full title: Distilling the Knowledge in a Neural Network
Link: https://arxiv.org/pdf/1503.02531.pdf
Venue: NIPS 2014 Deep Learning Workshop
This is the most classic work and the one that explicitly proposed the concept of knowledge distillation: a temperature-scaled softmax softens the teacher network's logit outputs, which then serve as the supervision signal for the student network, and the KL divergence measures the discrepancy between the student and the teacher. The overall pipeline is shown in the figure below.
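Concretely, given logits $z_i$ and temperature $T$, the softened probabilities are $p_i = \exp(z_i/T) / \sum_j \exp(z_j/T)$, and the student minimizes a weighted sum of the KL term and the ordinary cross-entropy on the ground-truth labels; the KL term is rescaled by $T^2$ so its gradients stay on the same scale as the hard-label term, as the paper notes:

$$\mathcal{L} = \alpha \, T^2 \, \mathrm{KL}\!\left(p^{\mathrm{teacher}}_T \,\big\|\, p^{\mathrm{student}}_T\right) + (1-\alpha)\, \mathcal{L}_{\mathrm{CE}}\!\left(y,\ \mathrm{softmax}(z^{\mathrm{student}})\right)$$

Below is a minimal PyTorch-style sketch of this loss. The function name `kd_loss` and the default values `T=4.0`, `alpha=0.9` are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            T: float = 4.0,          # illustrative default, not from the paper
            alpha: float = 0.9) -> torch.Tensor:  # illustrative default
    """Hinton-style distillation loss: soft KL term + hard CE term.

    teacher_logits are assumed to be precomputed (e.g., under torch.no_grad()).
    """
    # Temperature-softened distributions; F.kl_div expects log-probs as input.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # Multiply by T^2 so the soft-target gradients keep a magnitude
    # comparable to the hard-label gradients (as discussed in the paper).
    soft_loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy against the ground-truth labels (T = 1).
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```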