【GitHub】Knowledge Distillation: From Beginner to Expert

zenRRan

Reposted from | Zhuanzhi (专知)

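【Overview】Knowledge distillation was introduced by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean in a paper presented at the NIPS 2014 Deep Learning Workshop and posted to arXiv in 2015. As a model compression method, it uses an already trained, more complex model (the teacher) to guide the training of a lighter model (the student), reducing model size and compute cost while keeping accuracy as close as possible to that of the original large model. As more AI algorithms are deployed in industry, knowledge distillation has proven useful in many production settings. GitHub user dkozlov has compiled papers, tutorials, and code on knowledge distillation; the collection is a useful starting point.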
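
To make the teacher-guides-student idea concrete, here is a minimal sketch of a distillation loss in plain Python. It follows the general recipe from Hinton et al. (soft targets produced with a temperature, combined with the usual hard-label loss), but the function names, the logits, and the values of `temperature` and `alpha` are illustrative assumptions, not taken from this article or from any repository in the list.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are illustrative defaults (assumptions).
    """
    # Soft targets: teacher and student distributions at the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Cross-entropy of the student against the teacher's soft targets.
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth class at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


teacher = [8.0, 2.0, 0.5]       # hypothetical teacher logits for one example
good_student = [6.0, 1.5, 0.4]  # student that roughly agrees with the teacher
poor_student = [0.5, 2.0, 8.0]  # student that disagrees with the teacher
print(distillation_loss(good_student, teacher, label=0))
print(distillation_loss(poor_student, teacher, label=0))
```

A student whose logits agree with the teacher should receive the lower loss of the two. In a real training loop the same quantity would typically be computed with a deep learning framework and averaged over a batch.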

GitHub repository:

https://github.com/dkozlov/awesome-knowledge-distillation

Author:

dkozlov

【Papers】

Combining labeled and unlabeled data with co-training, A. Blum, T. Mitchell, 1998
Model Compression, Rich Caruana, 2006
Dark knowledge, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2014
Learning with Pseudo-Ensembles, Philip Bachman, Ouais Alsharif, Doina Precup, 2014
Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015
Cross Modal Distillation for Supervision Transfer, Saurabh Gupta, Judy Hoffman, Jitendra Malik, 2015
Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization, Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal, 2015
Distilling Model Knowledge, George Papamakarios, 2015
Unifying distillation and privileged information, David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, 2015
Learning Using Privileged Information: Similarity Control and Knowledge Transfer, Vladimir Vapnik, Rauf Izmailov, 2015
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks, Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami, 2016
Do deep convolutional nets really need to be deep and convolutional?, Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, 2016
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, Sergey Zagoruyko, Nikos Komodakis, 2016
FitNets: Hints for Thin Deep Nets, Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2015
Deep Model Compression: Distilling Knowledge from Noisy Teachers, Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016
Knowledge Distillation for Small-footprint Highway Networks, Liang Lu, Michelle Guo, Steve Renals, 2016
Sequence-Level Knowledge Distillation, deeplearning-papernotes, Yoon Kim, Alexander M. Rush, 2016
MobileID: Face Model Compression by Distilling Knowledge from Neurons, Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang and Xiaoou Tang, 2016
Recurrent Neural Network Training with Dark Knowledge Transfer, Zhiyuan Tang, Dong Wang, Zhiyong Zhang, 2016
Adapting Models to Signal Degradation using Distillation, Jong-Chyi Su, Subhransu Maji, 2016
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, Antti Tarvainen, Harri Valpola, 2017
Data-Free Knowledge Distillation For Deep Neural Networks, Raphael Gontijo Lopes, Stefano Fenu, 2017
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, Zehao Huang, Naiyan Wang, 2017
Learning Loss for Knowledge Distillation with Conditional Adversarial Networks, Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, 2017
Knowledge Projection for Deep Neural Networks, Zhi Zhang, Guanghan Ning, Zhihai He, 2017
Moonshine: Distilling with Cheap Convolutions, Elliot J. Crowley, Gavin Gray, Amos Storkey, 2017
Local Affine Approximators for Improving Knowledge Transfer, Suraj Srinivas and Francois Fleuret, 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model, Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra, 2017
Learning Efficient Object Detection Models with Knowledge Distillation, Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, Manmohan Chandraker, 2017
Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification, Chong Wang, Xipeng Lan and Yangang Zhang, 2017
Learning Transferable Architectures for Scalable Image Recognition, Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le, 2017
Revisiting knowledge transfer for training object class detectors, Jasper Uijlings, Stefan Popov, Vittorio Ferrari, 2017
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim, 2017
Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net, Guorui Zhou, Ying Fan, Runpeng Cui, Weijie Bian, Xiaoqiang Zhu, Kun Gai, 2017
Data Distillation: Towards Omni-Supervised Learning, Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He, 2017
Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge, Kai Xu, Dae Hoon Park, Chang Yi, Charles Sutton, 2018
Efficient Neural Architecture Search via Parameters Sharing, Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, 2018
Transparent Model Distillation, Sarah Tan, Rich Caruana, Giles Hooker, Albert Gordo, 2018
Defensive Collaborative Multi-task Training - Defending against Adversarial Attack towards Deep Neural Networks, Derek Wang, Chaoran Li, Sheng Wen, Yang Xiang, Wanlei Zhou, Surya Nepal, 2018
Deep Co-Training for Semi-Supervised Image Recognition, Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, Alan Yuille, 2018
Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples, Zihao Liu, Qi Liu, Tao Liu, Yanzhi Wang, Wujie Wen, 2018
Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling, Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang, 2018
Born Again Neural Networks, Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, 2018
YASENN: Explaining Neural Networks via Partitioning Activation Sequences, Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin, 2018
Knowledge Distillation with Adversarial Samples Supporting Decision Boundary, Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons, Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018
Self-supervised knowledge distillation using singular value decomposition, Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song, 2018
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection, Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan, 2018
Learning to Steer by Mimicking Features from Heterogeneous Auxiliary Networks, Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy, 2018
Deep Face Recognition Model Compression via Knowledge Transfer and Distillation, Jayashree Karlekar, Jiashi Feng, Zi Sian Wong, Sugiri Pranata, 2019
Relational Knowledge Distillation, Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho, 2019
Graph-based Knowledge Distillation by Multi-head Attention Network, Seunghyun Lee, Byung Cheol Song, 2019
Knowledge Adaptation for Efficient Semantic Segmentation, Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan, 2019
Structured Knowledge Distillation for Semantic Segmentation, Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang, 2019

【Video Tutorials】

Dark knowledge, Geoffrey Hinton, 2014
Model Compression, Rich Caruana, 2016

【Code Implementations】

MXNet

Bayesian Dark Knowledge

PyTorch

Attention Transfer
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Interpreting Deep Classifier by Visual Distillation of Dark Knowledge
A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility
Mean teachers are better role models
Neural Network Distiller by Intel AI Lab, distiller/knowledge_distillation.py
Relational Knowledge Distillation
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons

Lua

Example for teacher/student-based learning

Torch

Distilling knowledge to specialist ConvNets for clustered classification
Sequence-Level Knowledge Distillation, Neural Machine Translation on Android
cifar.torch distillation

Theano

FitNets: Hints for Thin Deep Nets
Transfer knowledge from a large DNN or an ensemble of DNNs into a small DNN

Lasagne + Theano

Experiments-with-Distilling-Knowledge

TensorFlow

Deep Model Compression: Distilling Knowledge from Noisy Teachers
Distillation
An example application of neural network distillation to MNIST
Data-free Knowledge Distillation for Deep Neural Networks
Inspired by net2net, network distillation
Deep Reinforcement Learning, knowledge transfer
Knowledge Distillation using TensorFlow
Knowledge Distillation Methods with TensorFlow
Zero-Shot Knowledge Distillation in Deep Networks (ICML 2019)

Caffe

Face Model Compression by Distilling Knowledge from Neurons
KnowledgeDistillation Layer (Caffe implementation)
Knowledge distillation, implemented in Caffe
Cross Modal Distillation for Supervision Transfer
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

Keras

Knowledge distillation with Keras
keras google-vision's distillation
Distilling the knowledge in a Neural Network

For more, visit the GitHub repository.


