DeepSeek-R1: How the Qwen Tokenizer Changes Before and After Distillation

Article link: https://blog.csdn.net/sinat_37574187/article/details/145674987

Author: 木尧

Original: https://zhuanlan.zhihu.com/p/23524663411

According to the DeepSeek-R1 paper, DeepSeek-R1-Distill-Qwen-32B was trained by distillation SFT on the pretrained base model Qwen2.5-32B, not on the chat model Qwen2.5-32B-Instruct.

Qwen2.5-32B: https://modelscope.cn/models/Qwen/Qwen2.5-32B
Qwen2.5-32B-Instruct: https://modelscope.cn/models/Qwen/Qwen2.5-32B-Instruct
DeepSeek-R1-Distill-Qwen-32B: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

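The sections below compare the main configuration files of these three open-source models, analyze how their tokenizers differ, and finally test the chat templates and special tokens in practice.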

Config comparison: config.json

Conclusion: relative to the base model, config.json is unchanged by DeepSeek's distillation.
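A comparison like this can be reproduced with a few lines of Python. The following is only a minimal sketch, assuming both config.json files have already been downloaded locally; the function names `diff_dicts` and `diff_json_files` and the example paths are hypothetical and do not come from the original post.

```python
import json
from pathlib import Path

# Marker used when a key exists in only one of the two files.
MISSING = "<absent>"


def diff_dicts(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every top-level key that differs."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        if va != vb:
            changed[key] = (va, vb)
    return changed


def diff_json_files(path_a: str, path_b: str) -> dict:
    """Load two JSON files and compare their top-level keys."""
    a = json.loads(Path(path_a).read_text(encoding="utf-8"))
    b = json.loads(Path(path_b).read_text(encoding="utf-8"))
    return diff_dicts(a, b)


# Example call (hypothetical local paths):
# diff_json_files("Qwen2.5-32B/config.json",
#                 "DeepSeek-R1-Distill-Qwen-32B/config.json")
```

An empty result means the two files agree on every top-level key. The same helper works for tokenizer_config.json, since it only compares top-level values.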

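Config comparison: tokenizer_config.json

Conclusion: the tokenizer configuration file changes substantially after distillation, as shown in the figure.

- bos_token, eos_token, and pad_token were changed
- tokenizer_class was changed from Qwen2Tokenizer to LlamaTokenizerFast
- chat_template also follows DeepSeek's template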
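The fields listed above can also be read off a loaded tokenizer instead of the raw JSON. This is a rough sketch, assuming the `transformers` library is installed and that the Hugging Face Hub model IDs match the ModelScope names linked earlier (an assumption, not something the original post states). `summarize_tokenizer` and `load_and_summarize` are hypothetical helper names.

```python
def summarize_tokenizer(tok) -> dict:
    """Collect the special tokens and the Python class of a loaded tokenizer."""
    return {
        # Class of the loaded object; it can differ from the tokenizer_class
        # string in tokenizer_config.json (for example, a "Fast" variant).
        "loaded_class": type(tok).__name__,
        "bos_token": tok.bos_token,
        "eos_token": tok.eos_token,
        "pad_token": tok.pad_token,
    }


def load_and_summarize(model_id: str) -> dict:
    """Download (or read from cache) a tokenizer and summarize it."""
    # Imported lazily so summarize_tokenizer stays usable without transformers.
    from transformers import AutoTokenizer

    return summarize_tokenizer(AutoTokenizer.from_pretrained(model_id))


# Example calls (assumed Hub IDs, require network access):
# load_and_summarize("Qwen/Qwen2.5-32B")
# load_and_summarize("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")
```

Printing the two summaries side by side should make the bos_token, eos_token, and pad_token changes visible without opening the JSON files.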

Side note: Qwen's own base and instruct models are essentially identical here, except that the instruct model's chat_template adds "You are Qwen, created by Alibaba Cloud. " to the default system prompt.
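One way to confirm that two chat templates differ only in the default system prompt is a line-level diff. The sketch below uses Python's standard `difflib`; `changed_lines` is a hypothetical helper, and the two short strings are illustrative stand-ins rather than the full templates shipped with the models.

```python
import difflib


def changed_lines(template_a: str, template_b: str) -> list[str]:
    """Return only the lines removed from template_a ('- ') or added in template_b ('+ ')."""
    diff = difflib.ndiff(template_a.splitlines(), template_b.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Illustrative stand-ins for the base and instruct templates.
base_like = "<|im_start|>system\nYou are a helpful assistant.\n<|im_end|>"
instruct_like = (
    "<|im_start|>system\n"
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n"
    "<|im_end|>"
)

for line in changed_lines(base_like, instruct_like):
    print(line)
```

With real templates, the two strings would come from the chat_template field of each model's tokenizer_config.json.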

First, here is Qwen's chat template:

{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json