如何使用LLM进行知识蒸馏：从GPT-4到GPT-3.5的模型微调

最新推荐文章于 2024-09-25 11:58:15 发布

llzwxh888

最新推荐文章于 2024-09-25 11:58:15 发布

阅读量507

点赞数 4

文章标签： gpt-3 java 人工智能 python

本文链接：https://blog.csdn.net/ppoojjj/article/details/140665653

版权

在这篇文章中，我们将探讨如何使用大语言模型（LLM）进行知识蒸馏。具体来说，我们将展示如何使用llama_index库从GPT-4 Judge模型中蒸馏知识到GPT-3.5 Judge模型。以下是我们将要进行的步骤：

生成数据集：训练集和测试集
进行知识蒸馏
评估微调后的GPT-3.5 Judge模型在测试数据集上的表现

第一步：生成数据集

首先，我们需要生成训练和测试数据集。我们将使用WikipediaReader读取多个城市的历史页面，并生成相关的问题和答案。

import os
import nest_asyncio
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.core.evaluation import DatasetGenerator
from llama_index.llms.openai import OpenAI

nest_asyncio.apply()

# 定义所需的城市
cities = ["San Francisco", "Toronto", "New York", "Vancouver", "Montreal", "Tokyo", "Singapore", "Paris"]

# 使用WikipediaReader加载数据
documents = WikipediaReader().load_data(pages=[f"History of {x}" for x in cities])

# 定义问题生成提示
QUESTION_GEN_PROMPT = (
    "You are a Teacher/ Professor. Your task is to setup "
    "a quiz/examination. Using the provided context, formulate "
    "a single question that captures an important fact from the "
    "context. Restrict the question to the context information provided."
)

# 生成问题
gpt_35_llm = OpenAI(model="gpt-3.5-turbo", temperature=0.3)
dataset_generator = DatasetGenerator.from_documents(documents, question_gen_query=QUESTION_GEN_PROMPT, llm=gpt_35_llm, num_questions_per_chunk=25)
qrd = dataset_generator.generate_dataset_from_nodes(num=350)

第二步：进行知识蒸馏

我们将使用GPT-4 Judge模型评估Llama-2生成的答案，并微调GPT-3.5模型以接近GPT-4的评估。

from llama_index.llms.openai import OpenAI
from llama_index.finetuning.callbacks import OpenAIFineTuningHandler
from llama_index.core.callbacks import CallbackManager
from llama_index.core.evaluation import CorrectnessEvaluator
from llama_index.finetuning import OpenAIFinetuneEngine

# 初始化GPT-4 Judge模型
finetuning_handler = OpenAIFineTuningHandler()
callback_manager = CallbackManager([finetuning_handler])
gpt_4_llm = OpenAI(temperature=0, model="gpt-4", callback_manager=callback_manager)
gpt4_judge = CorrectnessEvaluator(llm=gpt_4_llm)

# 评估训练数据集
for data_entry in train_dataset:
    eval_result = await gpt4_judge.aevaluate(
        query=data_entry["question"],
        response=data_entry["response_data"]["text"],
        context=data_entry["response_data"]["context"],
        reference=data_entry["reference"],
    )
    judgement = {"llm": "gpt_4", "score": eval_result.score, "text": eval_result.response}
    data_entry["evaluations"] = [judgement]

finetuning_handler.save_finetuning_events("correction_finetuning_events.jsonl")

# 微调GPT-3.5模型
finetune_engine = OpenAIFinetuneEngine("gpt-3.5-turbo", "correction_finetuning_events.jsonl")
finetune_engine.finetune()

第三步：评估微调后的模型

微调完成后，我们将评估微调后的GPT-3.5模型在测试数据集上的表现，并与GPT-4的评估结果进行比较。

import numpy as np

# 评估微调后的GPT-3.5模型
ft_llm = finetune_engine.get_finetuned_model()
ft_gpt_3p5_judge = CorrectnessEvaluator(llm=ft_llm)
for data_entry in test_dataset:
    eval_result = await ft_gpt_3p5_judge.aevaluate(
        query=data_entry["question"],
        response=data_entry["response_data"]["text"],
        context=data_entry["response_data"]["context"],
        reference=data_entry["reference"],
    )
    judgement = {"llm": "ft_gpt_3p5", "score": eval_result.score, "text": eval_result.response}
    data_entry["evaluations"] += [judgement]

# 计算相关性
scores = {"gpt_4": [], "gpt_3p5": [], "ft_gpt_3p5": []}
for d in test_dataset:
    for e in d["evaluations"]:
        scores[e["llm"]].append(e["score"])

np_scores_gpt_4 = np.array(scores["gpt_4"])
np_scores_gpt_3p5 = np.array(scores["gpt_3p5"])
np_scores_ft_gpt_3p5 = np.array(scores["ft_gpt_3p5"])

corr_ft = np.corrcoef(np_scores_gpt_4, np_scores_ft_gpt_3p5)[0, 1]
corr_no_ft = np.corrcoef(np_scores_gpt_4, np_scores_gpt_3p5)[0, 1]

print(f"微调后的GPT-3.5模型与GPT-4的相关性: {corr_ft}")
print(f"未微调的GPT-3.5模型与GPT-4的相关性: {corr_no_ft}")