Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
2023.07, ACL 2023 Findings
Paper link
Code link
Paper notes (LLM + distillation): Distilling step-by-step + code analysis
Abstract
Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) achieves so by leveraging less training data needed by finetuning or distillation. Our method extracts LLM rationales as additional supervision for training small models within a multi-task framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with much fewer labeled/unlabeled training examples. Second, compared to few-shot prompted LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of available data on a benchmark, whereas standard finetuning the same T5 model struggles to match even by using 100% of the dataset. We release the code at: https://github.com/google-research/distilling-step-by-step
Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive in practical applications.
In response, researchers train smaller task-specific models, either by finetuning with human labels or by distilling with LLM-generated labels.
However, both finetuning and distillation require large amounts of training data to reach performance comparable to LLMs.
We introduce Distilling step-by-step, a new mechanism that
- (a) trains smaller models that outperform LLMs, and
- (b) does so using less training data than finetuning or distillation requires.
Our method extracts LLM-generated rationales and uses them as additional supervision for training small models within a multi-task framework.
We report three findings across 4 NLP benchmarks:
- First, compared with both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples.
- Second, compared with few-shot prompted LLMs, we achieve better performance with substantially smaller model sizes.
- Third, we reduce both the model size and the amount of data needed to outperform LLMs: our finetuned 770M T5 model surpasses the few-shot prompted 540B PaLM model using only 80% of the available data on a benchmark, whereas the same T5 model with standard finetuning struggles to match it even with 100% of the dataset.
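The multi-task framework described above trains one small model on two tasks that share weights: predicting the task label and generating the teacher LLM's rationale. A minimal sketch of that objective, assuming a simple weighted sum of two cross-entropy terms (the function names, toy probabilities, and `rationale_weight` parameter are illustrative assumptions, not the repo's actual API):

```python
import math

def cross_entropy(predicted_probs, target_index):
    """Negative log-likelihood of the target token/class."""
    return -math.log(predicted_probs[target_index])

def multitask_loss(label_probs, label_target,
                   rationale_probs, rationale_targets,
                   rationale_weight=1.0):
    """Sketch of L = L_label + lambda * L_rationale.

    label_probs: model's probability distribution over labels.
    rationale_probs: per-token distributions for the rationale
    sequence produced by the teacher LLM (rationale_targets are
    the teacher's token indices).
    """
    label_loss = cross_entropy(label_probs, label_target)
    # Average token-level cross-entropy over the rationale sequence.
    rationale_loss = sum(
        cross_entropy(p, t)
        for p, t in zip(rationale_probs, rationale_targets)
    ) / len(rationale_targets)
    return label_loss + rationale_weight * rationale_loss
```

At inference time only the label-prediction task is used, so the rationale branch adds supervision during training without any extra cost at deployment; setting `rationale_weight=0` recovers plain finetuning/distillation on labels alone.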
Results