Knowledge distillation is a model compression technique: the representational capacity of a large, complex teacher model is transferred to a small, simple student model so that inference runs faster.
Two questions are central to knowledge distillation: how to define the knowledge being transferred, and which loss function to use to measure the similarity between the student network and the teacher network. A minimal sketch of the classic logit-based formulation is given below.
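As a concrete illustration of these two choices, the sketch below follows the classic KD recipe (row 2 in the table): the knowledge is the teacher's temperature-softened softmax, and the loss mixes a KL-divergence term against the teacher with the usual cross-entropy on the labels. This is only a minimal PyTorch sketch; the temperature `T=4.0` and weight `alpha=0.9` are illustrative assumptions, not values prescribed by any of the papers listed here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic KD loss sketch: soften both logit distributions with temperature T,
    match them with KL divergence, and mix in cross-entropy on the true labels.
    T and alpha are illustrative choices, not values from the original papers."""
    # Soft-target term: KL(teacher || student) on temperature-scaled softmax.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```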
Methods overview
No. | reference name | year, publication | title | Knowledge | Loss | URL |
---|---|---|---|---|---|---|
1 | DD | 2014, NIPS | Do Deep Nets Really Need to be Deep? | logits | L2 | https://arxiv.org/abs/1312.6184v7 |
2 | KD | 2014, NIPS | Distilling the Knowledge in a Neural Network | softmax with T | cross entropy | https://arxiv.org/abs/1503.02531v1 |
3 | FitNet | 2015, ICLR | FitNets: Hints for Thin Deep Nets | feature maps | L2 | https://arxiv.org/abs/1412.6550 |
4 | NST | 2017 | Like What You Like: Knowledge Distill via Neuron Selectivity Transfer | distribution of activations | MMD | https://arxiv.org/abs/1707.01219v2 |
5 | FSP | 2017, CVPR | A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning | FSP matrix (inner product between feature maps from two layers) | L2 | |
6 | DML | 2018, CVPR | Deep Mutual Learning | softmax | KL divergence | https://arxiv.org/abs/1706.00384v1 |
7 | ONE | 2018, NIPS | Knowledge Distillation by On-the-Fly Native Ensemble | softmax | KL divergence | https://arxiv.org/abs/1806.04606 |
8 | KDFM | 2018 | Knowledge Distillation with Feature Maps for Image Classification | feature maps & softmax | cross entropy & GAN | https://arxiv.org/abs/1812.00660 |
9 | FM | 2018 | Feature Matters: A Stage-by-Stage Approach for Knowledge Transfer | feature maps | L2 | |
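Several entries in the table (e.g. FitNet and FM) use intermediate feature maps as the knowledge and an L2 loss to match them. The sketch below shows one common way to realize this idea, in the style of a FitNet hint loss; the 1x1 convolutional regressor and the class name `HintLoss` are assumptions for illustration, not code from the papers above.

```python
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """Feature-map matching sketch (FitNet-style hint loss): project the
    student's feature map to the teacher's channel width with a 1x1 conv,
    then penalize the L2 distance between the two feature maps."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 "regressor" aligns channel dimensions; an illustrative choice.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # Assumes matching spatial sizes; interpolate beforehand otherwise.
        projected = self.regressor(student_feat)
        return F.mse_loss(projected, teacher_feat)
```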