https://arxiv.org/pdf/2408.11796
NVIDIA sets a new SoTA for 8B models with only 380B training tokens: how pruning and distillation should be used (CSDN blog)
Paper: LLM Pruning and Distillation in Practice