【实战】基于阿里云百炼平台的知识蒸馏实战：用QwQ提升Qwen2.5-7B数学计算能力

小波才露尖尖角

已于 2025-05-15 22:45:41 修改

阅读量864

点赞数 22

文章标签：阿里云 LLM chatgpt

于 2025-05-15 21:16:42 首次发布

本文链接：https://blog.csdn.net/2504_90578160/article/details/147992198

版权

引言

在大模型应用落地的过程中，我们常常面临一个矛盾：大模型（如QwQ）拥有强大的能力但计算成本高昂，而小模型（如qwen2.5-7B）部署便捷却性能有限。知识蒸馏（Knowledge Distillation）技术正是解决这一矛盾的有效方法。

本文将详细介绍如何利用阿里云百炼平台，实现从QwQ到Qwen小模型的知识蒸馏全过程，显著提升小模型的计算效率与推理能力。

目标

通过知识蒸馏技术，利用教师模型的推理能力，生成推理数据，来对学生模型进行sft，从而提升学生模型在三位数乘法数学计算任务中的表现。

实验验证了仅通过生成推理类SFT数据和简单数据清洗，即可显著增强小模型的垂域能力。

知识蒸馏简介

什么是知识蒸馏？

知识蒸馏（Knowledge Distillation, KD）是一种迁移学习技术，通过让一个复杂的大模型（教师模型）指导轻量级的小模型（学生模型）学习，从而在减少参数量的同时保留性能。其核心思想是将教师模型的“知识”（如输出概率分布或精确答案）作为监督信号，训练学生模型。

本实验采用的蒸馏类型：QwQ（教师）生成精确答案作为学生模型Qwen2.5的训练标签。

实验设计与步骤

1. 数据生成

生成6000条三位数乘法题，确保问题唯一性，分为训练集（5500条）和测试集（500条）。

import json
import random

def generate_multiplication_questions(num_questions):
    questions = set()  # 使用集合确保唯一性
    while len(questions) < num_questions:
        num1 = random.randint(100, 999)
        num2 = random.randint(100, 999)
        question = f"{num1} * {num2} = ?"
        answer = num1 * num2
        questions.add((question, answer))
    return list(questions)

def main():
    num_questions = 6000
    questions_answers = generate_multiplication_questions(num_questions)
    data = [{"question": q, "answer": a} for q, a in questions_answers]

    with open("multiplication_questions.json", "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

if __name__ == '__main__':
    main()

2. 模型蒸馏（教师模型生成标签）

通过阿里云DashScope API调用QwQ模型，批量生成问题的正确答案，输出到qwq_output.json中

友情提示：

百练平台大模型免费的token额度不多，所以如果想用免费的自己跑一跑链路不额外花钱的话，建议生成70条数据，60条训练，10条对比，如果算力足够，那么数据就6000条

import json
import time
from http import HTTPStatus
import dashscope

# 这里要自己去申请key
dashscope.api_key = 'sk-xxx'

def call_function(question):
    response = dashscope.Generation.call(
        model='qwq-32b-preview',
        messages=[{'role': 'user', 'content': question}],
        result_format='message',
        stream=True,
        incremental_output=True,
        temperature=0.01
    )
    
    if response.status_code != HTTPStatus.OK:
        print(response)
        return ["执行时出错", 0]

    full_content = ""
    first_bag = None
    start_time = time.time()
    for chunk in response:
        if chunk.status_code == HTTPStatus.OK:
            full_content += chunk.output.choices[0].message.content
            if first_bag is None:
                first_bag = time.time() - start_time

    return [full_content, first_bag]

def main():
    with open("multiplication_questions.json", "r", encoding="utf-8") as f:
        questions_data = json.load(f)

    results = []
    for i, item in enumerate(questions_data, start=1):
        question = item["question"]
        answer = item["answer"]
        try:
            response, response_time = call_function(question)
            if response_time is not None:
                results.append({
                    "Prompt": question,
                    "Completion": response,
                    "Answer": answer,
                    "ResponseTime": response_time
                })
                print(f"已处理 {i}/{len(questions_data)} 个问题")
        except Exception as e:
            print(f"处理问题 {question} 时出错: {str(e)}")
        # ⚠️⚠️注意百练平台大模型免费的token额度不多，所以如果想用免费的自己跑一跑链路不额外花钱的话，建议生成100条数据
        if i>70:
          break;

    with open("qwq_output.json", "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=4)

if __name__ == '__main__':
    main()