DeepSeek V2 236B: China's Second Open-Source LLM with Over 100B Parameters

DeepSeek recently released the V2 version of its model. It follows the technical route of the DeepSeek-MoE (Mixture-of-Experts) model released in January, modeling with a large number of small-parameter experts, while adding further optimizations to both training and inference. The model has 236B parameters in total, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% of the training cost, cutting the KV cache by 93.3%, and raising the maximum generation throughput to 5.76 times.
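To get a feel for why a 93.3% KV-cache cut matters at this scale, the back-of-the-envelope script below compares the cache of plain multi-head attention against a compressed per-token latent of the kind the report describes. Every dimension in it is an assumed placeholder rather than the model's real configuration (and the official 93.3% figure is measured against DeepSeek 67B, which already uses grouped-query attention), so the resulting ratio is only indicative.

# Rough KV-cache sizing. All numbers are illustrative assumptions,
# not DeepSeek-V2's actual configuration.
BYTES_BF16 = 2

def mha_cache_bytes(n_layers, n_heads, head_dim, seq_len):
    # Standard attention caches a full key and a full value
    # per head, per layer, per token.
    return 2 * n_layers * n_heads * head_dim * seq_len * BYTES_BF16

def latent_cache_bytes(n_layers, latent_dim, seq_len):
    # A latent-compressed cache stores one small vector per layer, per token.
    return n_layers * latent_dim * seq_len * BYTES_BF16

# Hypothetical configuration for illustration only.
n_layers, n_heads, head_dim, seq_len = 60, 64, 128, 32_000
latent_dim = 576  # assumed compressed width

full = mha_cache_bytes(n_layers, n_heads, head_dim, seq_len)
small = latent_cache_bytes(n_layers, latent_dim, seq_len)
print(f"full MHA cache: {full / 2**30:.1f} GiB")
print(f"latent cache:   {small / 2**30:.1f} GiB")
print(f"reduction:      {1 - small / full:.1%}")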

At the start of the technical report, the DeepSeek team summarizes the model's results with several figures and two charts. The parameter count reaches 236B, yet because the model is a mixture of small experts, only a small fraction of parameters is activated at inference time, enabling high inference speed. On general capability, the model scores 78.5 on the MMLU multiple-choice benchmark, ranking second among open-source models: DeepSeek-V2 trails only the 70B LLaMA3 and surpasses their previously released V1-generation 67B non-MoE model. On cost efficiency, compared with the dense V1 model, V2 saves 42.5% of the training cost, cuts KV-cache memory usage at inference by 93.3%, and raises generation throughput to 5.76 times the original. With a YaRN-based length-extrapolation training method, the context window is extended to 128K. Below, we walk through the DeepSeek-V2 model in detail, combining the code and the technical report.
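To make the "many small experts, few activated" idea concrete, here is a minimal PyTorch sketch of a sparse expert mixture with top-k routing plus always-on shared experts, in the spirit of the DeepSeek-MoE design mentioned above. All dimensions, expert counts, and the per-token dispatch loop are made up for illustration; this is a toy, not DeepSeek-V2's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyExpert(nn.Module):
    """One small feed-forward expert."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.down(F.silu(self.up(x)))

class ToySparseMoE(nn.Module):
    """Toy mixture of many small experts with top-k routing.

    Only `n_shared + top_k` experts run per token, so the number of
    active parameters is a small fraction of the total. This is the
    property behind the 236B-total / 21B-active figures.
    """
    def __init__(self, d_model=64, d_hidden=128, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        self.routed = nn.ModuleList(TinyExpert(d_model, d_hidden) for _ in range(n_routed))
        self.shared = nn.ModuleList(TinyExpert(d_model, d_hidden) for _ in range(n_shared))
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x).softmax(dim=-1)          # (n_tokens, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        outputs = []
        for t in range(x.size(0)):                       # naive per-token dispatch
            y = sum(e(x[t]) for e in self.shared)        # shared experts always run
            for w, i in zip(weights[t], idx[t]):
                y = y + w * self.routed[i](x[t])         # only top-k routed experts run
            outputs.append(y)
        return torch.stack(outputs)

moe = ToySparseMoE()
tokens = torch.randn(3, 64)
print(moe(tokens).shape)  # torch.Size([3, 64])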

Base Model

| Benchmark | Domain | LLaMA3 70B | Mixtral 8x22B | DeepSeek-V1 (Dense-67B) | DeepSeek-V2 (MoE-236B) |
|---|---|---|---|---|---|
| MMLU | English | 78.9 | 77.6 | 71.3 | 78.5 |
| BBH | English | 81.0 | 78.9 | 68.7 | 78.9 |
| C-Eval | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| CMMLU | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| HumanEval | Code | 48.2 | 53.1 | 45.1 | 48.8 |
| MBPP | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| GSM8K | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| Math | Math | 42.2 | 42.5 | 18.7 | 43.6 |

Chat Model

| Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
|---|---|---|---|---|---|---|---|
| MMLU | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| BBH | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| C-Eval | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| CMMLU | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| HumanEval | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| MBPP | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| LiveCodeBench (0901-0401) | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| GSM8K | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| Math | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

Chinese Reasoning

| Model | Open/Closed Source | Overall Score | Chinese Reasoning | Chinese Language |
|---|---|---|---|---|
| gpt-4-1106-preview | Closed | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat (RL) | Open | 7.91 | 7.45 | 8.35 |
| erniebot-4.0-202404 (ERNIE Bot) | Closed | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat (SFT) | Open | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | Closed | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (ERNIE Bot) | Closed | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (Moonshot AI) | Closed | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (Tongyi Qianwen) | Open | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | Open | 6.43 | 5.75 | 7.11 |
| Yi-34B-Chat (01.AI) | Open | 6.12 | 4.86 | 7.38 |
| gpt-3.5-turbo-0613 | Closed | 6.08 | 5.35 | 6.71 |

Code

Text Generation

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="sequential",
    torch_dtype=torch.bfloat16,
    max_memory=max_memory,
    attn_implementation="eager",
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
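As a small optional tweak to the script above, transformers can also stream tokens to stdout as they are generated rather than waiting for the full completion. The snippet below reuses the `model`, `tokenizer`, and `text` objects already defined; only the `TextStreamer` import is new.

from transformers import TextStreamer

# Print the completion token by token as generate() produces it.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer(text, return_tensors="pt")
model.generate(**inputs.to(model.device), max_new_tokens=100, streamer=streamer)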

Chatbot

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="sequential",
    torch_dtype=torch.bfloat16,
    max_memory=max_memory,
    attn_implementation="eager",
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
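For a multi-turn conversation, the same chat-template call works: append the model's reply to `messages`, add the next user turn, and re-apply the template. A minimal continuation of the script above (the follow-up question is just an example):

# Continue the conversation: append the reply, then ask a follow-up question.
messages.append({"role": "assistant", "content": result})
messages.append({"role": "user", "content": "Now explain its average-case time complexity."})

input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
print(tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True))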

I used to think Alibaba's Qwen was unbeatable, but it turns out Hangzhou has another master! Haha!
