InternLM2 (书生·浦语 2.0) LLM Hands-On, Day 06: OpenCompass | Evaluating the internlm2-chat-1_8b Model on the C-Eval Dataset

华尔街的幻觉

Published 2024-04-30 15:25:58


Article link: https://blog.csdn.net/sinat_29950703/article/details/138344526


Video: https://www.bilibili.com/video/BV1Pm41127jU/
Course documentation: https://github.com/InternLM/Tutorial/blob/camp2/opencompass/readme.md
Course homework: https://github.com/InternLM/Tutorial/blob/camp2/opencompass/homework.md

1. Setup

1.1 Environment setup

Compared with the official steps, I added a cd /root command so that opencompass is cloned into that directory.

studio-conda -o internlm-base -t opencompass
source activate opencompass
cd /root
git clone -b 0.2.4 https://github.com/open-compass/opencompass
cd opencompass
pip install -e . # or: pip install -r requirements.txt

1.2 Data preparation

cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
unzip OpenCompassData-core-20231110.zip
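Before moving on, it can help to confirm that the archive was unpacked where the evaluation will look for it. The sketch below assumes the archive unpacks into a data/ directory containing a ceval/ subdirectory; that layout is an assumption, so check it against your own copy. check_dataset_dir is a hypothetical helper, not part of OpenCompass.

```shell
#!/usr/bin/env bash
# Sketch: confirm the dataset archive was unpacked inside the OpenCompass checkout.
# Assumption: the archive unpacks into "data/", with one subdirectory per
# dataset (here "ceval"); adjust the names if your archive differs.

check_dataset_dir() {
    local root="$1"              # OpenCompass checkout, e.g. /root/opencompass
    local dataset="${2:-ceval}"  # dataset subdirectory to look for
    if [ -d "$root/data/$dataset" ]; then
        echo "ok: $root/data/$dataset"
        return 0
    fi
    echo "missing: $root/data/$dataset" >&2
    return 1
}

# Path taken from the steps above; prints a hint instead of failing hard.
check_dataset_dir /root/opencompass || echo "run the unzip step first"
```

If the check reports a missing directory, the most likely cause is running unzip from a directory other than /root/opencompass.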

1.3 List supported datasets and models

List all configurations related to InternLM and C-Eval:

python tools/list_configs.py internlm ceval


2. Launch the evaluation

python run.py --datasets ceval_gen --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug

The evaluation only started successfully after I applied all three of the following fixes.

Fix 1
```
pip install protobuf
```

Fix 2

export MKL_SERVICE_FORCE_INTEL=1
# or
export MKL_THREADING_LAYER=GNU
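The environment-variable workaround above can also be applied defensively, so that a value already set in the shell is not overwritten. This is a minimal sketch that assumes one of the two variables is enough and picks MKL_THREADING_LAYER=GNU; it is not taken from the course material.

```shell
#!/usr/bin/env bash
# Sketch: apply the MKL_THREADING_LAYER workaround without clobbering an existing value.
# Assumption: only one of the two variables is needed; this uses MKL_THREADING_LAYER=GNU.

# ":=" assigns the default only when the variable is unset or empty.
: "${MKL_THREADING_LAYER:=GNU}"
export MKL_THREADING_LAYER

echo "MKL_THREADING_LAYER=${MKL_THREADING_LAYER}"
```

To limit the setting to a single run rather than the whole session, the assignment can instead be written as a prefix on the python run.py command line.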

Fix 3: use a proxy (VPN) to get around network restrictions


Command breakdown

# Options are collected in a bash array so that each one can carry a comment.
args=(
  --datasets ceval_gen
  --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b  # HuggingFace model path
  --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b  # HuggingFace tokenizer path (can be omitted if it is the same as the model path)
  --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True  # arguments for building the tokenizer
  --model-kwargs device_map='auto' trust_remote_code=True  # arguments for building the model
  --max-seq-len 1024  # maximum sequence length the model accepts
  --max-out-len 16  # maximum number of tokens to generate
  --batch-size 2  # batch size
  --num-gpus 1  # number of GPUs needed to run the model
  --debug
)
python run.py "${args[@]}"
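Once the run finishes, the scores still have to be located. As far as I know, OpenCompass writes each run to outputs/default/<timestamp>/ with a summary/ subfolder holding the result tables; treat that layout as an assumption and compare it with your own output directory. latest_summary is a hypothetical helper, not an OpenCompass tool.

```shell
#!/usr/bin/env bash
# Sketch: print the path of the newest summary CSV written by an evaluation run.
# Assumption: results land in outputs/default/<timestamp>/summary/*.csv.

latest_summary() {
    local out_root="${1:-outputs/default}"
    local run csv newest=""
    # Timestamped run directories sort chronologically by name,
    # so the last glob match that has a CSV belongs to the newest run.
    for run in "$out_root"/*/; do
        for csv in "$run"summary/*.csv; do
            if [ -f "$csv" ]; then
                newest="$csv"
            fi
        done
    done
    if [ -z "$newest" ]; then
        echo "no summary CSV under $out_root" >&2
        return 1
    fi
    echo "$newest"
}

# Run from the OpenCompass checkout; prints a hint if nothing has been evaluated yet.
latest_summary outputs/default || echo "run the evaluation first"
```

The printed path can be passed to cat or opened in a spreadsheet to read the C-Eval scores.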
