Note: this post is a set of survey/study notes for the area and contains many links; if anything infringes a copyright, please contact the author for removal.
I. Knowledge Distillation
Overview:
1. What is knowledge distillation? An introductory essay
https://zhuanlan.zhihu.com/p/90049906
- Ends with a pointer to follow-up work: a walkthrough of Sequence-Level Knowledge Distillation
2. How does the soft-target trick work? (a minimal loss sketch follows this list)
https://www.zhihu.com/question/50519680/answer/136406661
- The second answer mentions generalized distillation: Unifying Distillation and Privileged Information (not yet fully understood)
3. Knowledge Distillation (DistilBERT, Theseus)
https://blog.csdn.net/qq_39388410/article/details/103857064
Further reading:
- Distilling the Knowledge in a Neural Network
- FitNets: Hints for Thin Deep Nets (paper walkthrough)
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
- Deep Mutual Learning
- KDGAN: Knowledge Distillation with Generative Adversarial Networks (not yet fully understood)
- DistilBERT
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
- Ensembled CTR Prediction via Knowledge Distillation (KD applied to recommender systems)
4. Distillation in plain language
https://zhuanlan.zhihu.com/p/115571514
5. A brief overview of knowledge distillation (part 1)
https://zhuanlan.zhihu.com/p/92166184
6. A brief overview of knowledge distillation (part 2)
https://zhuanlan.zhihu.com/p/92269636?from_voters_page=true
7. A one-stop overview of knowledge distillation, with detailed notes on the classic papers
https://www.sohu.com/a/364718676_99979179
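To make the soft-target idea recurring in the posts above concrete, here is a minimal sketch of the Hinton-style distillation loss. It assumes a PyTorch classification setup; the tensor names and the hyperparameter values (T, alpha) are illustrative, not taken from any of the linked posts.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation loss in the style of Hinton et al. (2015)."""
    # Temperature-softened distributions; T > 1 exposes the teacher's
    # "dark knowledge" in the small-probability classes.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps the soft-term gradients on a comparable
    # scale as T varies (as noted in the original paper).
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * T * T
    # Ordinary hard-label cross entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```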
II. Knowledge Distillation in MT
Autoregressive MT (a data-distillation sketch follows this sublist)
- Sequence-Level Knowledge Distillation (paper walkthrough)
- Autoregressive Knowledge Distillation through Imitation Learning (EMNLP 2020)
- Acquiring Knowledge from Pre-trained Model to Neural Machine Translation (Weng et al., AAAI 2020)
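Kim & Rush's sequence-level KD (walked through in the first item) replaces word-level soft targets with the teacher's own decoded output: translate the training corpus once with the teacher, then train the student with plain cross entropy on the decoded pairs. A minimal sketch, where `teacher.translate(src, beam_size=...)` is a hypothetical stand-in for whatever beam-search decoding API your MT framework exposes:

```python
def build_distilled_corpus(teacher, source_sentences, beam_size=5):
    """Sequence-level KD: use the teacher's 1-best beam-search output
    as the student's training target (Kim & Rush, 2016)."""
    distilled = []
    for src in source_sentences:
        # Hypothetical decoding call; any beam-search decoder works here.
        hyp = teacher.translate(src, beam_size=beam_size)
        distilled.append((src, hyp))
    # Train the student with ordinary cross entropy on these pairs.
    return distilled
```

The same distilled-data recipe is the standard starting point for the non-autoregressive papers in the next sublist, where it reduces the multimodality of the training targets.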
Non-autoregressive MT
- Non-Autoregressive Neural Machine Translation (paper walkthrough)
- Understanding Knowledge Distillation in Non-autoregressive Machine Translation (arXiv:1911.02727)
- Imitation Learning for Non-Autoregressive Neural Machine Translation
Multilingual MT (a selective-distillation sketch follows this sublist)
- Multilingual Neural Machine Translation with Knowledge Distillation (ICLR 2019)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation (ACL 2020)
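A rough sketch of the selective distillation idea from the multilingual KD paper above: the multilingual student is distilled from per-language-pair teachers, with the KD term switched off on examples where the teacher offers no advantage. The tensor names, shapes, and the per-example `distill_mask` criterion here are assumptions for illustration, not the paper's exact formulation.

```python
import torch.nn.functional as F

def selective_kd_loss(student_logits, teacher_logits, labels,
                      distill_mask, T=1.0, alpha=0.5):
    """Cross entropy plus a per-example-gated distillation term."""
    # Per-example hard-label loss.
    nll = F.cross_entropy(student_logits, labels, reduction="none")
    # Per-example KL to the matching language-specific teacher.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="none",
    ).sum(dim=-1) * T * T
    # distill_mask (0/1) disables KD where the teacher is no better
    # than the student, e.g. judged on held-out accuracy.
    return (nll + alpha * distill_mask.float() * kd).mean()
```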
Low-resource MT
- Collective Wisdom: Improving Low-resource Neural Machine Translation Using Adaptive Knowledge Distillation (COLING 2020)
- Improving Low-Resource Neural Machine Translation with Teacher-Free Knowledge Distillation (IEEE 2020)
Domain adaptation
- Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation (ACL 2020)
- Building a Multi-Domain Neural Machine Translation Model Using Knowledge Distillation (ECAI 2020)
Other
- Future-Aware Knowledge Distillation for Neural Machine Translation
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations
- A brief look at latent-variable non-autoregressive NMT methods
https://www.leiphone.com/category/academic/FQ1HdiHYBcr5EX7z.html
- Blog follow-up: Hint-based training for non-autoregressive translation
- Blog follow-up: Imitation Learning for Non-Autoregressive Neural Machine Translation
- Paper notes: Aligned Cross Entropy for Non-Autoregressive Machine Translation
https://blog.csdn.net/liuy9803/article/details/105505965/
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
- DE-RRD: A Knowledge Distillation Framework for Recommender System
- Unsupervised Neural Machine Translation
III. KD Paper Collections
https://www.ctolib.com/FLHonker-Awesome-Knowledge-Distillation.html
IV. Miscellaneous
Semantic similarity: SimCSE: Simple Contrastive Learning of Sentence Embeddings
Sentence reconstruction: Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors
Generative adversarial networks (a minimal training-loop sketch follows the links)
https://easyai.tech/ai-definition/gan/
https://zhuanlan.zhihu.com/p/106717106
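To accompany the two GAN explainers linked above, a self-contained toy training loop showing the alternating discriminator/generator updates. The architectures, the stand-in data distribution, and all hyperparameters are placeholders; this sketches the standard non-saturating GAN objective, not any specific tutorial's code.

```python
import torch
import torch.nn as nn

# Toy 1-D GAN; every size here is illustrative.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, 1) * 2 + 3  # stand-in "real" data: N(3, 2^2)
    z = torch.randn(32, 8)
    # 1) Discriminator update: push real -> 1, fake -> 0.
    d_loss = (bce(D(real), torch.ones(32, 1)) +
              bce(D(G(z).detach()), torch.zeros(32, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # 2) Generator update: fool D (non-saturating loss).
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```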
V. Fundamentals
Understanding cross entropy, relative entropy (KL divergence), and negative log-likelihood (a formula sketch follows this list)
How to understand beam search intuitively? (a decoding sketch follows the formulas)
Pretraining in NLP (ELMo, GPT, BERT, XLNet)
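For the first fundamentals item, the relationship between the three quantities can be written out directly; these are the standard textbook definitions the linked note builds on:

```latex
% Cross entropy decomposes into entropy plus KL divergence (relative entropy):
\[
H(p, q) = -\sum_{x} p(x)\log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
\]
% With p the one-hot empirical distribution over labeled samples (x_i, y_i),
% minimizing cross entropy is exactly minimizing negative log-likelihood:
\[
\mathcal{L}_{\mathrm{NLL}} = -\frac{1}{N}\sum_{i=1}^{N} \log q_\theta(y_i \mid x_i)
\]
```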
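And for the beam-search item, a minimal generic decoder. The `step_fn(prefix)` interface, returning candidate `(token, log_prob)` continuations, is an assumption for illustration rather than any framework's real API:

```python
import heapq

def beam_search(step_fn, bos, eos, beam_size=4, max_len=20):
    """Keep the beam_size highest-scoring partial hypotheses per step."""
    beams = [(0.0, [bos])]  # (cumulative log-prob, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:  # completed hypotheses stop expanding
                finished.append((score, seq))
                continue
            for tok, lp in step_fn(seq):
                candidates.append((score + lp, seq + [tok]))
        if not candidates:
            break
        # Prune to the top-k partial hypotheses by cumulative log-prob.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished.extend(beams)  # fall back to unfinished beams at max_len
    return max(finished, key=lambda c: c[0])[1]
```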