手把手教学，DeepSeek-R1微调全流程拆解

最新推荐文章于 2025-05-08 10:26:45 发布

大模型老炮

最新推荐文章于 2025-05-08 10:26:45 发布

阅读量3.7k

点赞数 24

文章标签：职场和发展人工智能 Deepseek 学习大模型大模型教程 AI大模型

本文链接：https://blog.csdn.net/2401_85375151/article/details/145750009

版权

DeepSeek 通过发布其开源推理模型 DeepSeek-R1 颠覆了 AI 格局，该模型使用创新的强化学习技术，以极低的成本提供与 OpenAI 的 o1 相当的性能。

更令人印象深刻的是，DeepSeek 已将其推理能力提炼成几个较小的模型。这篇文章，我们将使用其蒸馏版本之一引导大家完成 DeepSeek-R1 的整个微调过程。

本文章将演示了如何微调其中一个模型（使用我们自己的自定义思维链数据集），然后保存和部署微调后的模型。

高级推理模型微调

DeepSeek 简介

DeepSeek-R1 是由深度求索（DeepSeek）公司开发的突破性推理模型。DeepSeek-R1 基于 DeepSeek-V3-Base（总共 671B 个参数，每次推理 37B 处于活动状态）构建，使用强化学习（RL）在提供最终答案之前生成思路链（CoT）。

为了使这些功能更易于访问，DeepSeek 将其 R1 输出提炼成几个较小的模型：

基于 Qwen 的蒸馏模型：1.5B、7B、14B 和 32B
基于 Llama 的蒸馏模型：8B 和 70B

注意：对于 14B 模型，正确的变体是 DeepSeek-R1-Distill-Qwen-14B。

1）为什么 DeepSeek-R1 越来越受欢迎

DeepSeek-R1 因其性能、可访问性和成本效益的结合而在 AI 社区中迅速受到关注。以下是它成为开发人员和研究人员首选的原因：

**开源可用性：**完全开源，允许不受限制地使用、修改和分发。
**具有成本效益的培训：**训练成本仅为 500 万美元，仅为大型语言模型成本的一小部分。
**强化学习和 CoT 推理：**采用先进的强化学习技术来开发思维链推理。
**高效蒸馏：**Distilled 模型在资源效率高的同时保持了强大的推理能力。
**活跃的社区和生态系统：**不断增长的工具、微调模型和社区驱动型资源的生态系统。

2）DeepSeek-R1 与 OpenAI 的 O3-Mini-High Reasoning 模型有何不同

虽然 DeepSeek-R1 和 OpenAI 的 O3-Mini-High 推理模型都是为高级问题解决而设计的，但它们有很大的不同：

a. 开源与专有：

DeepSeek-R1：完全开源。
OpenAI O3-Mini-High：专有的，有使用限制。

b. 费用和可访问性：

DeepSeek-R1：培训和作成本更低。
OpenAI O3-Mini-High：API 费用导致运营成本较高。

c. 性能和效率：

DeepSeek-R1：使用 RLHF 和 CoT 推理实现高效资源使用。
OpenAI O3-Mini-High：封闭的自然限制了优化洞察。

d. 社区和生态系统支持：

DeepSeek-R1：在 Hugging Face 上通过微调模型不断壮大的社区。
OpenAI O3-Mini-High：通过 OpenAI 的生态系统提供强大支持，但受到专有限制的限制。

这些差异使 DeepSeek-R1 成为没有专有限制的高推理性能的有吸引力的替代方案。

下面是微调 DeepSeek-R1 以进行高级推理的完整过程。

01 环境设置和身份验证

a. 安装依赖项

使用具有 GPU 访问权限的首选环境。run：

!pip install unsloth``!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

说明：这将安装 Unsloth，一个可加快微调速度（速度提高 2× 并减少 70% 内存使用量的框架）。

在这里插入图片描述

b. 登录Hugging Face和Weights & Biases

安全地检索 API Token：

from huggingface_hub import login``   ``hf_token = “your_huggingface_token”``login（hf_token）

然后，初始化权重和偏差（wandb）：

import wandb``   ``wb_token = “your_wandb_token”``wandb.login（key=wb_token）``run = wandb.init（` `project='合法 COT 数据集上的 Fine-tune-DeepSeek-R1-Distill-Qwen-14B'，``  job_type=“training”，  `` anonymous=“allow”``)

说明：这些步骤可确保安全的模型下载和实验跟踪。

02 加载 Model 和 Tokenizer

使用具有 4 位量化的 Unsloth 加载蒸馏的 14B 模型，DeepSeek-R1-Distill-Qwen-14B：

from unsloth import FastLanguageModel``   ``max_seq_length = 2048` `dtype = None` `load_in_4bit = True``model， tokenizer = FastLanguageModel.from_pretrained（` `model_name = “unsloth/DeepSeek-R1-Distill-Qwen-14B”，` `max_seq_length = max_seq_length，` `dtype = dtype，` `load_in_4bit = load_in_4bit，` `token = hf_token,``)

说明：为最多 2048 个 tokens 的序列配置模型，并使用 4-bit 量化来提高内存效率。

03 预微调推理

使用法律推理提示测试模型的基准性能。

定义 Prompt 并运行推理

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.` `Write a response that appropriately completes the request.` `Before answering, think carefully about the question and create a step-by-step chain of thought to ensure a logical and accurate response.``   ``### Instruction:``You are a legal expert with advanced knowledge in legal reasoning, case analysis, and interpretation of laws.` `Please answer the following legal question.``### Question:``{}``### Response:``<think>{}"""``   ``question = "A contract was signed between two parties, but one party claims they were under duress. What legal principles apply to determine the contract's validity?"``FastLanguageModel.for_inference(model)``inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")``outputs = model.generate(`    `input_ids=inputs.input_ids,`    `attention_mask=inputs.attention_mask,`    `max_new_tokens=1200,`    `use_cache=True,``)``response = tokenizer.batch_decode(outputs)``print(response[0].split("### Response:")[1])

说明：生成一个响应，其中包括模型的思路链，后跟其最终答案。

04 准备训练数据

加载并格式化数据集（这里我们使用 legal chain-of-mind 数据集）。

更新提示模板

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.` `Write a response that appropriately completes the request.` `Before answering, think carefully about the question and create a step-by-step chain of thought to ensure a logical and accurate response.``   ``### Instruction:``You are a legal expert with advanced knowledge in legal reasoning, case analysis, and interpretation of laws.` `Please answer the following legal question.``### Question:``{}``### Response:``<think>``{}``</think>``{}"""``EOS_TOKEN = tokenizer.eos_token

定义 formatting 函数

def formatting_prompts_func(examples):`    `inputs = examples["Question"]`    `cots = examples["Complex_CoT"]`    `outputs = examples["Response"]`    `texts = []`    `for input_text, cot, output_text in zip(inputs, cots, outputs):`        `text = train_prompt_style.format(input_text, cot, output_text) + EOS_TOKEN`        `texts.append(text)`    `return {"text": texts}

加载和映射数据集

from datasets import load_dataset``dataset = load_dataset("kienhoang123/QR-legal", "en", split="train[0:500]", trust_remote_code=True)``dataset = dataset.map(formatting_prompts_func, batched=True)``print(dataset["text"][0])``   ``####Note: This is a pseudo dataset. Please create or use an appropriate dataset for your own use case.

说明：使用问题、详细的思路和最终答案来格式化每个训练示例，并附加 EOS 令牌。

05 设置 LoRA 以进行微调

使用 LoRA （Low-Rank Adaptation）通过仅适配关键层来有效地微调模型：

model = FastLanguageModel.get_peft_model(`    `model,`    `r=16,`    `target_modules=[`        `"q_proj",`        `"k_proj",`        `"v_proj",`        `"o_proj",`        `"gate_proj",`        `"up_proj",`        `"down_proj",`    `],`    `lora_alpha=16,`    `lora_dropout=0,`    `bias="none",`    `use_gradient_checkpointing="unsloth",`    `random_state=3407,`    `use_rslora=False,`    `loftq_config=None,``)

说明：将 LoRA 适配器应用于关键投影层，从而减少微调期间的内存和计算要求。

06 配置和运行训练过程

从 TRL 初始化 SFTTrainer 以及相应的训练参数。

from trl import SFTTrainer``from transformers import TrainingArguments``from unsloth import is_bfloat16_supported``   ``trainer = SFTTrainer(`    `model=model,`    `tokenizer=tokenizer,`    `train_dataset=dataset,`    `dataset_text_field="text",`    `max_seq_length=max_seq_length,`    `dataset_num_proc=2,`    `args=TrainingArguments(`        `per_device_train_batch_size=2,`        `gradient_accumulation_steps=4,`        `warmup_steps=5,`        `max_steps=60,`        `learning_rate=2e-4,`        `fp16=not is_bfloat16_supported(),`        `bf16=is_bfloat16_supported(),`        `logging_steps=10,`        `optim="adamw_8bit",`        `weight_decay=0.01,`        `lr_scheduler_type="linear",`        `seed=3407,`        `output_dir="outputs",`    `),``)

开始训练：

trainer_stats = trainer.train()

说明：此配置使用小批量和有限的演示步骤。根据需要进行调整以进行全面微调。

07 微调后的推理

使用相同的提示结构测试微调后的模型。

question = "A contract was signed between two parties, but one party claims they were under duress. What legal principles apply to determine the contract’s validity?"``   ``   ``FastLanguageModel.for_inference(model)``inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")``outputs = model.generate(`    `input_ids=inputs.input_ids,`    `attention_mask=inputs.attention_mask,`    `max_new_tokens=1200,`    `use_cache=True,``)``response = tokenizer.batch_decode(outputs)``print(response[0].split("### Response:")[1])

说明：输出应具有简洁的思路链和清晰的最终答案。

08 保存和发布微调模型

本地保存

new_model_local = "DeepSeek-R1-Legal-COT"``model.save_pretrained(new_model_local)``tokenizer.save_pretrained(new_model_local)``model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")

推送到 Hugging Face Hub

new_model_online = "yourusername/DeepSeek-R1-Legal-COT"``model.push_to_hub(new_model_online)``tokenizer.push_to_hub(new_model_online)``model.push_to_hub_merged(new_model_online, tokenizer, save_method="merged_16bit")

说明：替换为您的实际存储库名称。合并版本集成了 LoRA 适配器，以便于部署。

“yourusername/DeepSeek-R1-Legal-COT”

09 在 Ollama 中使用微调模型

要将微调模型 DeepSeek-R1-Legal-COT 与 Ollama 结合使用，请执行以下步骤：

准备模型文件

确保您的微调模型以 SafeTensors 格式保存。
将模型文件组织到系统上的目录中。

创建 Modelfile

在包含模型文件的目录中，创建一个名为（不带任何扩展名）的文件。Modelfile
将以下行添加到：Modelfile

FROM /path/to/base/model``ADAPTER .

替换为您在微调过程中使用的基础模型的路径。/path/to/base/model
该行表示适配器（您的微调模型）位于当前目录中。ADAPTER .

使用 Ollama 构建模型

打开终端并导航到包含。Modelfile
执行以下命令，在 Ollama 中创建模型。

ollama create deepseek-r1-legal-cot

此命令将构建模型并使其可在 Ollama 中使用。

运行模型

成功创建模型后，你可以使用以下方法与模型进行交互：

ollama run deepseek-r1-legal-cot

此命令允许您输入提示并接收来自微调模型的响应。

10 其他注意事项

型号兼容性：确保中指定的基本模型与微调期间使用的基本模型匹配，以避免出现兼容性问题。Modelfile
量化：如果您希望优化模型的性能，请考虑在步骤中对其进行量化。例如：ollama create

ollama create --quantize q4_K_M deepseek-r1-legal-cot

此命令对模型进行量化，以减少内存使用并可能提高推理速度。

有关将模型和适配器导入 Ollama 的详细信息，请参阅官方 Ollama 文档。

https://github.com/ollama/ollama/blob/main/docs/import.md

11 其他提示和建议

硬件设置：使用至少具有 24–32GB VRAM 的 GPU。
数据预处理：确保您的数据集包含字段、和 .“Question”“Complex_CoT”“Response”
超参数优化：根据数据集大小调整和 epochs。max_steps
监控训练：使用 wandb 控制面板跟踪损失和指标。
LoRA 洞察：LoRA 仅调整关键投影层，从而减少内存使用。
部署：如果需要，将模型转换为 GGUF 等格式以进行本地部署。

Last but not least

DeepSeek-R1 代表了以推理为中心的 AI 的新时代。通过将高效的强化学习与监督式微调和蒸馏相结合，DeepSeek 生成的模型可与专有系统相媲美，同时具有开源性和成本效益。本指南将引导您完成每个步骤，从设置环境和加载模型，到数据准备和基于 LoRA 的微调，再到推理和部署。

对于 14B 蒸馏模型，请记住，正确的名称是 DeepSeek-R1-Distill-Qwen-14B。借助这些详细说明，我们现在可以微调和部署高性能推理模型，即使在适度的硬件上也是如此，从而为创新的 AI 应用程序铺平道路。

AI大模型学习路线

如果你对AI大模型入门感兴趣，那么你需要的话可以点击这里大模型重磅福利：入门进阶全套104G学习资源包免费分享！

扫描下方csdn官方合作二维码获取哦！

在这里插入图片描述

这是一份大模型从零基础到进阶的学习路线大纲全览，小伙伴们记得点个收藏！

请添加图片描述
第一阶段： 从大模型系统设计入手，讲解大模型的主要方法；

第二阶段： 在通过大模型提示词工程从Prompts角度入手更好发挥模型的作用；

第三阶段： 大模型平台应用开发借助阿里云PAI平台构建电商领域虚拟试衣系统；

第四阶段： 大模型知识库应用开发以LangChain框架为例，构建物流行业咨询智能问答系统；

第五阶段： 大模型微调开发借助以大健康、新零售、新媒体领域构建适合当前领域大模型；

第六阶段： 以SD多模态大模型为主，搭建了文生图小程序案例；

第七阶段： 以大模型平台应用与开发为主，通过星火大模型，文心大模型等成熟大模型构建大模型行业应用。

100套AI大模型商业化落地方案

请添加图片描述

大模型全套视频教程

请添加图片描述

200本大模型PDF书籍

请添加图片描述

👉学会后的收获：👈

• 基于大模型全栈工程实现（前端、后端、产品经理、设计、数据分析等），通过这门课可获得不同能力；

• 能够利用大模型解决相关实际项目需求：大数据时代，越来越多的企业和机构需要处理海量数据，利用大模型技术可以更好地处理这些数据，提高数据分析和决策的准确性。因此，掌握大模型应用开发技能，可以让程序员更好地应对实际项目需求；

• 基于大模型和企业数据AI应用开发，实现大模型理论、掌握GPU算力、硬件、LangChain开发框架和项目实战技能，学会Fine-tuning垂直训练大模型（数据准备、数据蒸馏、大模型部署）一站式掌握；

• 能够完成时下热门大模型垂直领域模型训练能力，提高程序员的编码能力：大模型应用开发需要掌握机器学习算法、深度学习框架等技术，这些技术的掌握可以提高程序员的编码能力和分析能力，让程序员更加熟练地编写高质量的代码。

LLM面试题合集

请添加图片描述

大模型产品经理资源合集

请添加图片描述

大模型项目实战合集

请添加图片描述

👉获取方式：
😝有需要的小伙伴，可以保存图片到wx扫描二v码免费领取【保证100%免费】🆓