SmolLM2: The New Best Small Models for On-Device Applications


SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device.
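As a back-of-the-envelope illustration of what "lightweight enough to run on-device" means in memory terms (our own sketch, not figures from the release), the weights-only footprint scales linearly with parameter count and bytes per parameter:

```python
# Rough weights-only memory footprint per precision -- an illustrative
# estimate only; real runtime memory also includes activations and KV cache.
sizes = {"SmolLM2-135M": 135e6, "SmolLM2-360M": 360e6, "SmolLM2-1.7B": 1.7e9}
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

for name, n_params in sizes.items():
    footprints = ", ".join(
        f"{dtype}: {n_params * nbytes / 1e9:.2f} GB"
        for dtype, nbytes in bytes_per_param.items()
    )
    print(f"{name} -> {footprints}")
```

Even the 1.7B model fits in roughly 3.4 GB at fp16/bf16 by this estimate, which is why the family is practical on laptops and phones.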


The 1.7B variant shows significant advances over its predecessor SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse dataset mix: FineWeb-Edu, DCLM, The Stack, along with new mathematics and coding datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback.
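The full training recipe is not reproduced here; as a rough sketch of what a DPO stage with TRL can look like (the dataset name, hyperparameters, and starting checkpoint below are illustrative assumptions, not the authors' actual configuration, and the tokenizer argument is called `processing_class` in recent TRL releases but `tokenizer` in older ones):

```python
# Minimal DPO sketch with TRL -- illustrative only, not the SmolLM2 recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "HuggingFaceTB/SmolLM2-1.7B"  # in practice this would be the SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A binarized preference dataset with chosen/rejected completion pairs.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(output_dir="smollm2-dpo", beta=0.1, per_device_train_batch_size=2)
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```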

With the help of datasets such as Synth-APIGen-v0.1 developed by Argilla, the instruct model also supports tasks such as text rewriting, summarization, and function calling, as illustrated below.
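These capabilities are exercised through ordinary chat prompts. A small sketch for the rewriting task (the prompt text is our own example, not an official one):

```python
# Illustrative rewriting prompt for the instruct model.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{
    "role": "user",
    "content": "Rewrite the following in a formal tone: hey, the meeting got moved to 3pm, sorry!",
}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```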

Base pre-trained model

| Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B |
|---|---|---|---|---|
| HellaSwag | 68.7 | 61.2 | 66.4 | 62.9 |
| ARC (Average) | 60.5 | 49.2 | 58.5 | 59.9 |
| PIQA | 77.6 | 74.8 | 76.1 | 76.0 |
| MMLU-Pro (MCF) | 19.4 | 11.7 | 13.7 | 10.8 |
| CommonsenseQA | 43.6 | 41.2 | 34.1 | 38.0 |
| TriviaQA | 36.7 | 28.1 | 20.9 | 22.5 |
| Winogrande | 59.4 | 57.8 | 59.3 | 54.7 |
| OpenBookQA | 42.2 | 38.4 | 40.0 | 42.4 |
| GSM8K (5-shot) | 31.0 | 7.2 | 61.3 | 5.5 |
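The checkpoints in this table are plain language models without chat tuning, so they are used for raw text completion rather than with a chat template. A minimal sketch:

```python
# Raw text completion with the base (non-instruct) checkpoint -- the base
# model continues the input text rather than answering a chat turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Gravity is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```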

Instruction model

| Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
|---|---|---|---|---|
| IFEval (Average prompt/inst) | 56.7 | 53.5 | 47.4 | 23.1 |
| MT-Bench | 6.13 | 5.48 | 6.52 | 4.33 |
| OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | 46.9 | NaN |
| HellaSwag | 66.1 | 56.1 | 60.9 | 55.5 |
| ARC (Average) | 51.7 | 41.6 | 46.2 | 43.7 |
| PIQA | 74.4 | 72.3 | 73.2 | 71.6 |
| MMLU-Pro (MCF) | 19.3 | 12.7 | 24.2 | 11.7 |
| BBH (3-shot) | 32.2 | 27.6 | 35.3 | 25.7 |
| GSM8K (5-shot) | 48.2 | 26.8 | 42.8 | 4.62 |

Demo

```bash
pip install transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

device = "cuda"  # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant header so the model
# answers the question instead of continuing the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```
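For interactive on-device use, tokens can also be printed as they are generated. A small variation of the demo above using transformers' `TextStreamer` (our addition, not part of the original demo); it assumes `model`, `tokenizer`, and `inputs` are already defined as in the snippet above:

```python
# Streaming variant of the demo: prints each token to stdout as it is generated.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9,
               do_sample=True, streamer=streamer)
```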

Chat in TRL

```bash
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-1.7B-Instruct --device cpu
```

Base pre-trained model

| Metric | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
|---|---|---|---|
| HellaSwag | 54.5 | 51.2 | 51.8 |
| ARC (Average) | 53.0 | 45.4 | 50.1 |
| PIQA | 71.7 | 69.9 | 71.6 |
| MMLU (cloze) | 35.8 | 33.7 | 34.4 |
| CommonsenseQA | 38.0 | 31.6 | 35.3 |
| TriviaQA | 16.9 | 4.3 | 9.1 |
| Winogrande | 52.5 | 54.1 | 52.8 |
| OpenBookQA | 37.4 | 37.4 | 37.2 |
| GSM8K (5-shot) | 3.2 | 33.4 | 1.6 |

Instruction model

| Metric | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
|---|---|---|---|
| IFEval (Average prompt/inst) | 41.0 | 31.6 | 19.8 |
| MT-Bench | 3.66 | 4.16 | 3.37 |
| HellaSwag | 52.1 | 48.0 | 47.9 |
| ARC (Average) | 43.7 | 37.3 | 38.8 |
| PIQA | 70.8 | 67.2 | 69.4 |
| MMLU (cloze) | 32.8 | 31.7 | 30.6 |
| BBH (3-shot) | 27.3 | 30.7 | 24.4 |
| GSM8K (5-shot) | 7.43 | 26.8 | 1.36 |

Demo

```bash
pip install transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"

device = "cuda"  # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant header so the model
# answers the question instead of continuing the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```

Chat in TRL

```bash
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-360M-Instruct --device cpu
```