如何在24G显存机器上搭建一个超过gpt效果的DeepSeek-R1？

最新推荐文章于 2025-05-12 20:09:11 发布

leader_song(小宋编码)

最新推荐文章于 2025-05-12 20:09:11 发布

阅读量958

点赞数 16

文章标签： gpt

本文链接：https://blog.csdn.net/leader_song/article/details/147755997

版权

DeepSeek-R1蒸馏模型概述与应用指南

![在这里插入图片描述](https://i-blog.csdnimg.cn/direct/5af5a8b13de14bd1a95837bbf1ccaf2a.png#pic_center)

引言

DeepSeek-R1作为一款先进的AI推理模型，在性能上已超越GPT-4o和Claude-3.5等主流开源模型。为满足更广泛应用需求，推出了基于不同架构的精简版模型，旨在提供高性能同时兼顾计算效率。

模型架构与变体

本系列提供以下六种精简版模型：

Qwen架构系列

+ DeepSeek-R1-Distill-Qwen-1.5B + DeepSeek-R1-Distill-Qwen-7B + DeepSeek-R1-Distill-Qwen-14B + DeepSeek-R1-Distill-Qwen-32B

Llama架构系列

+ DeepSeek-R1-Distill-Llama-8B + DeepSeek-R1-Distill-Llama-70B

性能概览

各精简模型在关键基准测试中表现优异：

模型优势

1. ** 高效性** ：精简设计，计算效率显著提升。 2. ** 强推理能力** ：继承自DeepSeek-R1的核心算法。 3. ** 开源开放** ：方便开发者自由使用和扩展。

与其他模型对比

与同类强化学习训练模型相比，我们的蒸馏方法：

计算成本更低
性能表现更优

例如，DeepSeek-R1-Distill-Qwen-32B精简版在AIME测试中优于同规模的强化学习版本。

使用指南

方法一：Ollama平台部署

```plain ollama run deepseek-r1:32b ```

方法二：vLLM框架运行

```css vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \

–tensor-parallel-size 2 \

–max-model-len 32768 \

–enforce-eager


<h3 id="4c7c805c"><font style="color:rgb(51, 51, 51);">模型显卡配置表</font></h3>
![](https://i-blog.csdnimg.cn/img_convert/ad1071c7dceef494a902054efccd638d.webp?x-oss-process=image/format,png)