Evaluating DeepSeek and Qwen Models with OpenCompass

OpenCompass Model Evaluation

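Preface

OpenCompass is an open-source framework for evaluating large language models, designed to be comprehensive, efficient, and extensible. It brings together dataset management, model configuration, and result analysis, so researchers and developers can get started quickly and focus on model design and innovation.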

1. Environment Setup

# Create and activate the virtual environment
conda create --name opencompass python=3.10 -y
conda activate opencompass
# Install OpenCompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
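
To confirm that the editable install worked, a quick check from Python can help. The sketch below is a minimal example using only the standard library; the helper name is_installed is hypothetical and is not part of OpenCompass.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if the package can be found in the active environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # The environment above was created with Python 3.10.
    print("Python version:", sys.version.split()[0])
    print("opencompass importable:", is_installed("opencompass"))
```

Run it inside the activated opencompass environment; if the second line prints False, the pip install step most likely ran in a different environment.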

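2. Data Preparation

Download the data into the opencompass directory and unzip it there. Do not rename it or move it elsewhere; after unzipping you get a data directory.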

# Download the dataset into data/
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
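
On systems without unzip (for example a plain Windows shell), the same step can be done from Python. This is a minimal sketch with the standard zipfile module; the helper name extract_dataset is hypothetical, and the check for a data directory reflects the layout described above.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path: str, dest_dir: str = ".") -> Path:
    """Extract the archive into dest_dir and return the expected data/ path."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    data_dir = Path(dest_dir) / "data"
    if not data_dir.is_dir():
        # The archive is expected to unpack into a top-level data/ directory.
        raise FileNotFoundError(f"expected a data/ directory under {dest_dir}")
    return data_dir
```

Calling extract_dataset("OpenCompassData-core-20240207.zip") from the opencompass directory should leave the data directory in place without renaming or moving anything.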

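Here is a look at the evaluation dataset. The gsm8k directory contains four jsonl files; the excerpt below comes from the first file and shows that each record is a question-answer pair.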

{"question": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?", "answer": "Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.\nShe makes 9 * 2 = $<<9*2=18>>18 every day at the farmer\u2019s market.\n#### 18"}
{"question": "A robe takes 2 bolts of blue fiber and half that much white fiber.  How many bolts in total does it take?", "answer": "It takes 2/2=<<2/2=1>>1 bolt of white fiber\nSo the total amount of fabric is 2+1=<<2+1=3>>3 bolts of fabric\n#### 3"}
{"question": "Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?", "answer": "The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\nHe increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\nSo the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\nSo he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n#### 70000"}
{"question": "James decides to run 3 sprints 3 times a week.  He runs 60 meters each sprint.  How many total meters does he run a week?", "answer": "He sprints 3*3=<<3*3=9>>9 times\nSo he runs 9*60=<<9*60=540>>540 meters\n#### 540"}
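
Each record above ends its answer with a line starting with ####, followed by the final result, and embeds calculator annotations of the form <<expression=result>>. The sketch below shows one way to read such a record; the helper names are hypothetical and this is not how OpenCompass itself post-processes answers.

```python
import json
import re


def final_answer(answer_text: str) -> str:
    """Return the text after the '####' marker, which holds the final answer."""
    return answer_text.split("####")[-1].strip()


def strip_calc_annotations(answer_text: str) -> str:
    """Remove the <<expression=result>> calculator annotations."""
    return re.sub(r"<<.*?>>", "", answer_text)


# One record copied from the excerpt above.
sample_line = (
    '{"question": "James decides to run 3 sprints 3 times a week.  '
    'He runs 60 meters each sprint.  How many total meters does he run a week?", '
    '"answer": "He sprints 3*3=<<3*3=9>>9 times\\n'
    'So he runs 9*60=<<9*60=540>>540 meters\\n#### 540"}'
)
record = json.loads(sample_line)
print(final_answer(record["answer"]))
print(strip_calc_annotations(record["answer"]))
```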

3. Downloading the Models

Download the models from the ModelScope community: one DeepSeek model and one Qwen model.

from modelscope import snapshot_download

# Download one DeepSeek model and one Qwen model from ModelScope
deepseek_dir = snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B')
qwen_dir = snapshot_download('Qwen/Qwen1.5-1.8B-Chat')

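After downloading, the models are stored by default in the .cache directory under the user folder on the C drive. To choose a different location, pass an extra argument cache_dir='your/download/path'.
On Linux, the default location is the .cache directory under root.
The download finishes with output like this: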
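
The paths used later in this post store the model under a directory name where 1.5B appears as 1___5B. The sketch below works out where a model is expected to land; it assumes, based only on the paths shown in this post, that each '.' in the model name becomes '___' on disk. The helper names are hypothetical, and download_to simply forwards the cache_dir argument described above.

```python
from pathlib import Path


def default_hub_dir() -> Path:
    """Default cache location seen in this post (~/.cache/modelscope/hub)."""
    return Path.home() / ".cache" / "modelscope" / "hub"


def expected_model_dir(model_id: str, hub_dir: Path) -> Path:
    """Guess the on-disk directory, assuming '.' becomes '___' (inferred, not documented here)."""
    org, name = model_id.split("/", 1)
    return hub_dir / org / name.replace(".", "___")


def download_to(model_id: str, cache_dir: str) -> str:
    """Download a model into cache_dir instead of the default location."""
    # Imported lazily so the path helpers work without modelscope installed.
    from modelscope import snapshot_download
    return snapshot_download(model_id, cache_dir=cache_dir)


if __name__ == "__main__":
    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    print(expected_model_dir(model_id, default_hub_dir()))
```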

Downloading [model.safetensors]: 100%|███████████████████████████████████████████| 3.42G/3.42G [1:12:28<00:00, 845kB/s]
Processing 1 items: 100%|████████████████████████████████████████████████████████| 1.00/1.00 [1:12:31<00:00, 4.35ks/it]
2025-02-06 11:20:18,570 - modelscope - INFO - Download model 'Qwen/Qwen1.5-1.8B-Chat' successfully.
2025-02-06 11:20:18,571 - modelscope - INFO - Creating symbolic link [C:\Users\gyton\.cache\modelscope\hub\Qwen\Qwen1.5-1.8B-Chat].
(opencompass) PS G:\damoxing\opencompass>

Download log for the DeepSeek model:

(opencompass) PS G:\damoxing\opencompass> python .\download.py
Downloading Model to directory: C:\Users\gyton\.cache\modelscope\hub\deepseek-ai\DeepSeek-R1-Distill-Qwen-1.5B
2025-02-06 00:56:53,900 - modelscope - INFO - Got 9 files, start to download ...
Downloading [configuration.json]: 100%|███████████████████████████████████████████████| 73.0/73.0 [00:00<00:00, 220B/s]
Downloading [README.md]: 100%|████████████████████████████████████████████████████| 18.5k/18.5k [00:00<00:00, 55.0kB/s]
Downloading [generation_config.json]: 100%|█████████████████████████████████████████████| 181/181 [00:00<00:00, 519B/s]
Downloading [LICENSE]: 100%|██████████████████████████████████████████████████████| 1.04k/1.04k [00:00<00:00, 2.89kB/s]
Downloading [config.json]: 100%|██████████████████████████████████████████████████████| 679/679 [00:00<00:00, 1.82kB/s]
Downloading [tokenizer_config.json]: 100%|████████████████████████████████████████| 2.99k/2.99k [00:00<00:00, 8.78kB/s]
Downloading [figures/benchmark.jpg]: 100%|███████████████████████████████████████████| 759k/759k [00:05<00:00, 132kB/s]
Downloading [tokenizer.json]: 100%|████████████████████████████████████████████████| 6.71M/6.71M [00:21<00:00, 328kB/s]
Downloading [model.safetensors]: 100%|███████████████████████████████████████████| 3.31G/3.31G [1:43:02<00:00, 575kB/s]
Processing 9 items: 100%|██████████████████████████████████████████████████████████| 9.00/9.00 [1:43:02<00:00, 687s/it]
2025-02-06 02:39:56,779 - modelscope - INFO - Download model 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B' successfully.
2025-02-06 02:39:56,779 - modelscope - INFO - Creating symbolic link [C:\Users\gyton\.cache\modelscope\hub\deepseek-ai\DeepSeek-R1-Distill-Qwen-1.5B].
(opencompass) PS G:\damoxing\opencompass>

4. Evaluation
Change the model after --models to your own model; the dataset is gsm8k.

opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen

Running the evaluation command directly with the model path fails because the model cannot be found; the config file has to be edited by hand.
(opencompass) PS G:\damoxing\opencompass> opencompass --models C:\Users\gyton\.cache\modelscope\hub\deepseek-ai\DeepSeek-R1-Distill-Qwen-1___5B --datasets demo_gsm8k_chat_gen
signal.SIGALRM is not available on this platform
signal.SIGALRM is not available on this platform
02/06 09:00:03 - OpenCompass - INFO - Loading demo_gsm8k_chat_gen: C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\configs\./datasets\demo\demo_gsm8k_chat_gen.py
Traceback (most recent call last):
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\gyton\anaconda3\envs\opencompass\Scripts\opencompass.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\cli\main.py", line 227, in main
    cfg = get_config_from_arg(args)
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\utils\run.py", line 168, in get_config_from_arg
    for model in match_cfg_file(models_dir, [model_arg]):
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\utils\run.py", line 70, in match_cfg_file
    raise ValueError(err_msg)
ValueError: The provided pattern matches 0 or more than one config. Please verify your pattern and try again. You may use tools/list_configs.py to list or locate the configurations.
+----------------------------------------------------------------------------------+
| Not matched patterns                                                             |
|----------------------------------------------------------------------------------|
| C:\Users\gyton\.cache\modelscope\hub\deepseek-ai\DeepSeek-R1-Distill-Qwen-1___5B |
+----------------------------------------------------------------------------------+
(opencompass) PS G:\damoxing\opencompass>
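
The error message says the pattern matched zero or more than one config, which suggests that --models expects the name of a config file rather than a filesystem path. The sketch below lists candidate config names by keyword. The helper find_model_configs is hypothetical, and the directory layout (configs/models/<family>/<name>.py) is assumed from the log paths in this post; the tools/list_configs.py script named in the error message is the official way to do this.

```python
from pathlib import Path


def find_model_configs(configs_root: str, keyword: str) -> list[str]:
    """Return names of model config files whose file name contains the keyword."""
    models_dir = Path(configs_root) / "models"
    return sorted(
        path.stem for path in models_dir.rglob("*.py") if keyword in path.stem
    )


if __name__ == "__main__":
    # Path assumed relative to the cloned opencompass repository.
    for name in find_model_configs("opencompass/configs", "deepseek"):
        print(name)
```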

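The config file needs to be modified here. First, find the deepseek directory under configs.

There is no config for the 1.5B model that was downloaded, so pick one with a similar name and change its path.
Before the change: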

from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='deepseek-7b-chat-hf',
        path='deepseek-ai/deepseek-llm-7b-chat',
        max_out_len=1024,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]

After the change:

from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='deepseek-7b-chat-hf',
        path=r'C:\Users\gyton\.cache\modelscope\hub\deepseek-ai\DeepSeek-R1-Distill-Qwen-1___5B',  # raw string, so '\U' is not read as an escape
        max_out_len=1024,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
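
A Windows path written in an ordinary Python string literal is fragile, because sequences such as \U are interpreted as escapes. The sketch below shows two common ways around this, a raw string and a forward-slash form. It uses only the standard library; whether a given loader accepts forward slashes on Windows is an assumption that usually holds but is worth checking.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, so '\U' is not parsed as an escape.
windows_path = r'C:\Users\gyton\.cache\modelscope\hub\deepseek-ai\DeepSeek-R1-Distill-Qwen-1___5B'

# Forward slashes are another way to avoid escape issues in the config file.
posix_style = PureWindowsPath(windows_path).as_posix()
print(posix_style)
```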

Next, modify the Qwen config file:

from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='qwen1.5-1.8b-chat-hf',
        path='/root/autodl-tmp/llm/Qwen/Qwen1___5-1___8B-Chat',
        max_out_len=1024,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
        stop_words=['<|im_end|>', '<|im_start|>'],
    )
]
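
The Qwen config adds a stop_words entry. As a rough illustration of what stop words do in general, the sketch below cuts generated text at the first stop word. The function truncate_at_stop_words is hypothetical and only demonstrates the idea; it is not OpenCompass's internal implementation.

```python
def truncate_at_stop_words(text: str, stop_words: list[str]) -> str:
    """Cut the text at the earliest occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        index = text.find(word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


stop_words = ['<|im_end|>', '<|im_start|>']
print(truncate_at_stop_words('The answer is 18.<|im_end|><|im_start|>user', stop_words))
```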

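The configs already include a matching entry for this model, so only the path needs to change.
Running the same command on the local machine produced the following error: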

(opencompass) PS G:\damoxing\opencompass> opencompass --models hf_deepseek_7b_chat --datasets demo_gsm8k_chat_gen
signal.SIGALRM is not available on this platform
signal.SIGALRM is not available on this platform
02/06 09:21:28 - OpenCompass - INFO - Loading demo_gsm8k_chat_gen: C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\configs\./datasets\demo\demo_gsm8k_chat_gen.py
02/06 09:21:28 - OpenCompass - INFO - Loading hf_deepseek_7b_chat: C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\configs\./models\deepseek\hf_deepseek_7b_chat.py
02/06 09:21:28 - OpenCompass - INFO - Loading example: C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\configs\./summarizers\example.py
02/06 09:21:28 - OpenCompass - INFO - Current exp folder: outputs\default\20250206_092128
02/06 09:21:28 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/06 09:21:28 - OpenCompass - INFO - Partitioned into 1 tasks.
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\gyton\anaconda3\envs\opencompass\Scripts\opencompass.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\cli\main.py", line 308, in main
    runner(tasks)
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\runners\base.py", line 39, in __call__
    status_list = list(status)  # change into list format
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\concurrent\futures\_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\concurrent\futures\_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\gyton\anaconda3\envs\opencompass\lib\site-packages\opencompass\runners\local.py", line 166, in submit
    assert len(gpus) >= num_gpus
AssertionError
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
(opencompass) PS G:\damoxing\opencompass>

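After moving to a server and re-running, everything worked. The failed assertion len(gpus) >= num_gpus suggests the local environment exposed no usable GPU, possibly because of a mismatched torch version.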
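
Before launching an evaluation, it can save time to check how many GPUs PyTorch can actually see. The sketch below is one way to do that, assuming the failure above comes from no GPU being visible; the helper name visible_gpu_count is hypothetical.

```python
def visible_gpu_count() -> int:
    """Return the number of CUDA GPUs PyTorch can see, or 0 if none."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run the model on.
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


if __name__ == "__main__":
    count = visible_gpu_count()
    print("visible GPUs:", count)
    if count < 1:
        print("run_cfg=dict(num_gpus=1) cannot be satisfied on this machine")
```

A result of 0 on a machine that does have an NVIDIA GPU usually points to a CPU-only PyTorch build or a driver and CUDA mismatch.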

Evaluation Results

DeepSeek evaluation results:

(opencompass) root@autodl-container-dec7479388-77891342:~/autodl-tmp/opencompass# opencompass --models hf_deepseek_7b_chat --datasets demo_gsm8k_chat_gen
02/06 13:25:12 - OpenCompass - INFO - Loading demo_gsm8k_chat_gen: /root/autodl-tmp/opencompass/opencompass/configs/./datasets/demo/demo_gsm8k_chat_gen.py
02/06 13:25:12 - OpenCompass - INFO - Loading hf_deepseek_7b_chat: /root/autodl-tmp/opencompass/opencompass/configs/./models/deepseek/hf_deepseek_7b_chat.py
02/06 13:25:12 - OpenCompass - INFO - Loading example: /root/autodl-tmp/opencompass/opencompass/configs/./summarizers/example.py
02/06 13:25:12 - OpenCompass - INFO - Current exp folder: outputs/default/20250206_132512
02/06 13:25:12 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/06 13:25:12 - OpenCompass - INFO - Partitioned into 1 tasks.
launch OpenICLInfer[deepseek-7b-chat-hf/demo_gsm8k] on GPU 0                                                                                               
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [02:10<00:00, 130.55s/it]
02/06 13:27:22 - OpenCompass - INFO - Partitioned into 1 tasks.
launch OpenICLEval[deepseek-7b-chat-hf/demo_gsm8k] on CPU                                                                                                  
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00,  9.24s/it]
dataset     version    metric    mode      deepseek-7b-chat-hf
----------  ---------  --------  ------  ---------------------
demo_gsm8k  1d7fe4     accuracy  gen                     40.62
02/06 13:27:32 - OpenCompass - INFO - write summary to /root/autodl-tmp/opencompass/outputs/default/20250206_132512/summary/summary_20250206_132512.txt
02/06 13:27:32 - OpenCompass - INFO - write csv to /root/autodl-tmp/opencompass/outputs/default/20250206_132512/summary/summary_20250206_132512.csv


The markdown format results is as below:

| dataset | version | metric | mode | deepseek-7b-chat-hf |
|----- | ----- | ----- | ----- | -----|
| demo_gsm8k | 1d7fe4 | accuracy | gen | 40.62 |
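
The log also reports a CSV summary file. The sketch below reads scores from such a file, assuming its columns mirror the table above (dataset, version, metric, mode, then one column per model); the real file was not inspected here, so treat the layout as an assumption. The helper name read_scores is hypothetical.

```python
import csv
import io


def read_scores(csv_text: str, model_abbr: str) -> dict[str, float]:
    """Map each dataset name to the score in the given model's column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["dataset"]: float(row[model_abbr]) for row in reader}


# Sample content built from the table above; the real file layout is assumed.
sample_csv = (
    "dataset,version,metric,mode,deepseek-7b-chat-hf\n"
    "demo_gsm8k,1d7fe4,accuracy,gen,40.62\n"
)
print(read_scores(sample_csv, "deepseek-7b-chat-hf"))
```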

Qwen model:

(opencompass) root@autodl-container-dec7479388-77891342:~/autodl-tmp# opencompass --models hf_qwen1_5_1_8b_chat --datasets demo_gsm8k_chat_gen
02/06 13:28:33 - OpenCompass - INFO - Loading demo_gsm8k_chat_gen: /root/autodl-tmp/opencompass/opencompass/configs/./datasets/demo/demo_gsm8k_chat_gen.py
02/06 13:28:33 - OpenCompass - INFO - Loading hf_qwen1_5_1_8b_chat: /root/autodl-tmp/opencompass/opencompass/configs/./models/qwen/hf_qwen1_5_1_8b_chat.py
02/06 13:28:33 - OpenCompass - INFO - Loading example: /root/autodl-tmp/opencompass/opencompass/configs/./summarizers/example.py
02/06 13:28:33 - OpenCompass - INFO - Current exp folder: outputs/default/20250206_132833
02/06 13:28:33 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/06 13:28:33 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/gsm8k/
02/06 13:28:33 - OpenCompass - INFO - Partitioned into 1 tasks.
launch OpenICLInfer[qwen1.5-1.8b-chat-hf/demo_gsm8k] on GPU 0                                                                                              
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:32<00:00, 92.35s/it]
02/06 13:30:06 - OpenCompass - INFO - Partitioned into 1 tasks.
launch OpenICLEval[qwen1.5-1.8b-chat-hf/demo_gsm8k] on CPU                                                                                                 
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.15s/it]
dataset     version    metric    mode      qwen1.5-1.8b-chat-hf
----------  ---------  --------  ------  ----------------------
demo_gsm8k  1d7fe4     accuracy  gen                      26.56
02/06 13:30:16 - OpenCompass - INFO - write summary to /root/autodl-tmp/outputs/default/20250206_132833/summary/summary_20250206_132833.txt
02/06 13:30:16 - OpenCompass - INFO - write csv to /root/autodl-tmp/outputs/default/20250206_132833/summary/summary_20250206_132833.csv


The markdown format results is as below:

| dataset | version | metric | mode | qwen1.5-1.8b-chat-hf |
|----- | ----- | ----- | ----- | -----|
| demo_gsm8k | 1d7fe4 | accuracy | gen | 26.56 |

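DeepSeek-R1-Distill-Qwen-1.5B scores 40.62, well above the 26.56 scored by Qwen1.5-1.8B-Chat. The DeepSeek column is still labeled deepseek-7b-chat-hf, most likely because the abbr field in the borrowed config was left unchanged.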
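
To put the two scores side by side, the short calculation below computes the absolute gap in accuracy points and the relative gain over the Qwen score. It uses only the two numbers reported above; the variable names are illustrative.

```python
deepseek_score = 40.62  # DeepSeek-R1-Distill-Qwen-1.5B on demo_gsm8k
qwen_score = 26.56      # Qwen1.5-1.8B-Chat on demo_gsm8k

# Gap in accuracy points, and the same gap relative to the Qwen score.
absolute_gap = round(deepseek_score - qwen_score, 2)
relative_gain = round(100 * (deepseek_score - qwen_score) / qwen_score, 1)

print(f"absolute gap: {absolute_gap} points")
print(f"relative gain: {relative_gain}%")
```

Since demo_gsm8k is a small demo subset, the gap is best read as a rough indication rather than a precise ranking.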

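5. Summary

As an open-source framework, OpenCompass provides a complete solution for evaluating large language models. It combines dataset management, model configuration, and evaluation, and it is highly extensible and flexible, so developers can work on language models more efficiently. With more time, you can also download larger models with tens or hundreds of billions of parameters. You can evaluate your own fine-tuned models, and you can use public benchmark datasets or custom datasets.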
