TL;DR. The authors take the self-distillation recipe from *Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning* (rewriting the SFT dataset with the model's own generations) and apply it to recovering layer-pruned models: a run of $n$ consecutive blocks is pruned away, with the angular cosine metric as the layer-pruning criterion, and the recovered model shows a sizable accuracy gain over plain SFT. Sketches of both steps follow the reference below.

References

Thangarasa, Vithursan, et al. “Self-Data Distillation for Recovering Quality in Pruned Large Language Models.” arXiv preprint arXiv:2410.09982 (2024).
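First, the pruning criterion. The angular cosine metric measures, for each candidate start layer $\ell$, the angular distance $d = \frac{1}{\pi}\arccos(\cos(h_\ell, h_{\ell+n}))$ between the hidden states entering block $\ell$ and those entering block $\ell+n$; the $n$ consecutive blocks whose removal perturbs the representation least are dropped. A minimal PyTorch sketch, assuming per-token hidden states have already been collected from a calibration run; the function names and mean-over-tokens aggregation are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def angular_distance(h_l: torch.Tensor, h_ln: torch.Tensor) -> torch.Tensor:
    # Angular distance d = arccos(cosine similarity) / pi between per-token
    # hidden states, each of shape (num_tokens, hidden_dim).
    cos = F.cosine_similarity(h_l, h_ln, dim=-1)
    return torch.acos(cos.clamp(-1.0, 1.0)) / torch.pi

def select_block_to_prune(hidden_states: list[torch.Tensor], n: int) -> int:
    # hidden_states[l] holds the activations entering block l (the last entry
    # is the final output), stacked over a small calibration set.
    # Returns l* = argmin_l mean d(h_l, h_{l+n}); blocks l*..l*+n-1 are dropped.
    num_blocks = len(hidden_states) - 1
    dists = [
        angular_distance(hidden_states[l], hidden_states[l + n]).mean().item()
        for l in range(num_blocks - n + 1)
    ]
    return min(range(len(dists)), key=dists.__getitem__)
```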
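Second, the data side. Instead of fine-tuning the pruned model on the raw SFT labels, the unpruned seed model rewrites each response in its own voice, so the training targets stay close to the model's output distribution. A hedged sketch with a hypothetical prompt template and helper name (the paper additionally validates rewrites against the original labels, which is omitted here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical rewrite prompt; the actual template in the papers differs.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Reference response:\n{response}\n\n"
    "### Rewrite the reference response in your own words:\n"
)

def self_distill(model_name: str, dataset, max_new_tokens: int = 512):
    # Regenerate each SFT target with the (unpruned) seed model itself; the
    # pruned model is then fine-tuned on the rewritten targets.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()
    distilled = []
    for ex in dataset:  # each ex: {"instruction": ..., "response": ...}
        prompt = TEMPLATE.format(**ex)
        ids = tok(prompt, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model.generate(**ids, max_new_tokens=max_new_tokens,
                                 do_sample=False)
        # Keep only the newly generated tokens as the distilled response.
        new_resp = tok.decode(out[0][ids["input_ids"].shape[1]:],
                              skip_special_tokens=True)
        distilled.append({"instruction": ex["instruction"],
                          "response": new_resp})
    return distilled
```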