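Goal
Use OpenCompass to evaluate the internlm2-chat-1.8b model on the C-Eval (ceval) dataset.
Reproduction steps
Set up the environment
List the available configs related to InternLM and C-Eval to see what the test set covers
Prepare the hyperparameters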
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='internlm2-1.8b-hf',  # column name shown in the results table
        path="/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
        tokenizer_path='/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        model_kwargs=dict(
            trust_remote_code=True,
            device_map='auto',
        ),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
            trust_remote_code=True,
        ),
        max_out_len=100,   # maximum number of generated tokens
        min_out_len=1,     # minimum number of generated tokens
        max_seq_len=2048,  # maximum input sequence length
        batch_size=8,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
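The tokenizer settings padding_side='left' and truncation_side='left' are easy to misread, so here is a minimal pure-Python sketch of what they mean for a batch of prompts. It is only an illustration on plain lists of token ids, not the tokenizer's actual implementation; the helper names (left_truncate, left_pad, prepare_batch) and the pad id 0 are assumptions made for this example.

```python
from typing import List


def left_truncate(ids: List[int], max_len: int) -> List[int]:
    """Drop tokens from the left so that at most max_len tokens remain."""
    if max_len <= 0:
        return []
    return ids[-max_len:]


def left_pad(ids: List[int], target_len: int, pad_id: int = 0) -> List[int]:
    """Prepend pad_id until the sequence has target_len tokens."""
    return [pad_id] * (target_len - len(ids)) + ids


def prepare_batch(prompts: List[List[int]], max_len: int, pad_id: int = 0) -> List[List[int]]:
    """Left-truncate every prompt, then left-pad all of them to a common length."""
    truncated = [left_truncate(p, max_len) for p in prompts]
    longest = max(len(p) for p in truncated)
    return [left_pad(p, longest, pad_id) for p in truncated]


# Hypothetical token ids; max_len stands in for max_seq_len from the config.
batch = prepare_batch([[1, 2, 3, 4, 5], [6, 7]], max_len=4, pad_id=0)
print(batch)  # [[2, 3, 4, 5], [0, 0, 6, 7]]
```

With both sides set to 'left', the end of every prompt (where the question sits) is kept and stays aligned at the right edge of the batch, which is the position a causal language model continues generating from.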
The path is as follows:
Problem encountered
File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
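The cause is that the automatically installed numpy version is too new; every version from numpy 2.0 upward seems to fail. This error usually means that a compiled extension (here, pandas) was built against a different numpy binary interface than the one currently installed.
I checked the requirements, and nothing there needs numpy 2.0 or later.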
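Before changing anything, it can help to confirm which numpy version is actually installed. The following is a minimal sketch that reads the installed version through the standard library, without importing numpy itself. The helper names numpy_major and needs_downgrade are made up for this example, and the "major version 2 or higher is a problem" rule simply encodes the observation above rather than any official compatibility statement.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(version_string: str) -> int:
    """Return the major version number from a string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def needs_downgrade(version_string: str) -> bool:
    """Assumption from the notes above: numpy 2.x triggers the error, 1.x does not."""
    return numpy_major(version_string) >= 2


try:
    installed = version("numpy")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("numpy is not installed in this environment")
elif needs_downgrade(installed):
    print(f"numpy {installed} is installed; a 1.x release may be needed")
else:
    print(f"numpy {installed} is installed; no downgrade needed")
```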
Solution
Downgrade numpy
pip install -U numpy==1.26.4
Note: this numpy version is simply a widely used 1.x release that I picked arbitrarily.
Resolved.
Evaluation results
dataset                                         version    metric         mode    internlm2-1.8b-hf
----------------------------------------------  ---------  -------------  ------  -----------------------
ceval-computer_network                          db9ce2     accuracy       gen     47.37
ceval-operating_system                          1c2571     accuracy       gen     47.37
ceval-computer_architecture                     a74dad     accuracy       gen     23.81
ceval-college_programming                       4ca32a     accuracy       gen     13.51
ceval-college_physics                           963fa8     accuracy       gen     42.11
ceval-college_chemistry                         e78857     accuracy       gen     33.33
ceval-advanced_mathematics                      ce03e2     accuracy       gen     10.53
The run finished and reported an accuracy score for each of many tasks; the rows above are an excerpt.
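To get a rough feel for the excerpt, the per-task scores can be parsed and averaged. This is a small sketch that works only on the seven rows quoted in these notes, so the number it prints is the unweighted mean of that excerpt and not the overall C-Eval score of the model. The variable names are chosen for this example.

```python
# The seven summary rows quoted above: dataset, version, metric, mode, score.
summary_text = """
ceval-computer_network db9ce2 accuracy gen 47.37
ceval-operating_system 1c2571 accuracy gen 47.37
ceval-computer_architecture a74dad accuracy gen 23.81
ceval-college_programming 4ca32a accuracy gen 13.51
ceval-college_physics 963fa8 accuracy gen 42.11
ceval-college_chemistry e78857 accuracy gen 33.33
ceval-advanced_mathematics ce03e2 accuracy gen 10.53
"""

# Map each dataset name to its accuracy (the last whitespace-separated field).
scores = {}
for line in summary_text.strip().splitlines():
    fields = line.split()
    scores[fields[0]] = float(fields[-1])

mean_accuracy = sum(scores.values()) / len(scores)
best_task = max(scores, key=scores.get)
worst_task = min(scores, key=scores.get)

print(f"tasks in excerpt: {len(scores)}")       # tasks in excerpt: 7
print(f"mean accuracy: {mean_accuracy:.2f}")    # mean accuracy: 31.15
print(f"lowest: {worst_task} ({scores[worst_task]})")
```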
References
https://github.com/InternLM/Tutorial/blob/camp3/docs/L1/OpenCompass/task.md
https://github.com/InternLM/Tutorial/blob/camp3/docs/L1/OpenCompass/readme.md