Hands-On: Evaluating InternLM-1.8B with OpenCompass

Basic task (completing this task clears the level)

Use OpenCompass to evaluate the internlm2-chat-1.8b model on the C-Eval dataset, document how you reproduced the result, and take screenshots.

Environment Setup

Create the Dev Machine and Conda Environment

On the dev machine creation page, select the Cuda11.7-conda image and a 10% A100 GPU slice.

Installation: GPU Environment

conda create -n opencompass python=3.10
conda activate opencompass
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Note: make sure to cd /root first
cd /root
git clone -b 0.2.4 https://github.com/open-compass/opencompass
cd opencompass
pip install -e .


apt-get update
apt-get install cmake
pip install -r requirements.txt
pip install protobuf
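After installation, a quick way to confirm that the key packages can be imported is a small standard-library check. This is a minimal sketch. The helper name missing_packages is hypothetical, and the package list (torch, opencompass) is simply the two main packages installed above. Run it inside the opencompass conda environment.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two main packages installed in the steps above.
    missing = missing_packages(["torch", "opencompass"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```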

Data Preparation

Evaluation Datasets

Extract the evaluation datasets into /root/opencompass/data/. (Note: in the git clone step above, opencompass must be cloned into /root.)

cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
cd /root/opencompass
unzip OpenCompassData-core-20231110.zip

A data folder will then appear under /root/opencompass.
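To double-check the extraction, you can confirm that the expected sub-folders exist under data/. This is a minimal sketch. The helper check_data_dirs is hypothetical. The expected paths (ceval/formal_ceval, mmlu) are assumptions about the archive layout, so adjust them to what you actually see after unzipping.

```python
from pathlib import Path


def check_data_dirs(data_root, expected):
    """Return the relative paths in `expected` that are not directories under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumed layout of OpenCompassData-core-20231110.zip after extraction.
    missing = check_data_dirs("/root/opencompass/data", ["ceval/formal_ceval", "mmlu"])
    print("missing:", missing or "none")
```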

Config Files for InternLM and C-Eval

List all configs related to InternLM and C-Eval:

python tools/list_configs.py internlm ceval

You will see:

+----------------------------------------+----------------------------------------------------------------------+
| Model                                  | Config Path                                                          |
|----------------------------------------+----------------------------------------------------------------------|
| hf_internlm2_1_8b                      | configs/models/hf_internlm/hf_internlm2_1_8b.py                      |
| hf_internlm2_20b                       | configs/models/hf_internlm/hf_internlm2_20b.py                       |
| hf_internlm2_7b                        | configs/models/hf_internlm/hf_internlm2_7b.py                        |
| hf_internlm2_base_20b                  | configs/models/hf_internlm/hf_internlm2_base_20b.py                  |
| hf_internlm2_base_7b                   | configs/models/hf_internlm/hf_internlm2_base_7b.py                   |
| hf_internlm2_chat_1_8b                 | configs/models/hf_internlm/hf_internlm2_chat_1_8b.py                 |
| hf_internlm2_chat_1_8b_sft             | configs/models/hf_internlm/hf_internlm2_chat_1_8b_sft.py             |
| hf_internlm2_chat_20b                  | configs/models/hf_internlm/hf_internlm2_chat_20b.py                  |
| hf_internlm2_chat_20b_sft              | configs/models/hf_internlm/hf_internlm2_chat_20b_sft.py              |
| hf_internlm2_chat_20b_with_system      | configs/models/hf_internlm/hf_internlm2_chat_20b_with_system.py      |
| hf_internlm2_chat_7b                   | configs/models/hf_internlm/hf_internlm2_chat_7b.py                   |
| hf_internlm2_chat_7b_sft               | configs/models/hf_internlm/hf_internlm2_chat_7b_sft.py               |
| hf_internlm2_chat_7b_with_system       | configs/models/hf_internlm/hf_internlm2_chat_7b_with_system.py       |
| hf_internlm2_chat_math_20b             | configs/models/hf_internlm/hf_internlm2_chat_math_20b.py             |
| hf_internlm2_chat_math_20b_with_system | configs/models/hf_internlm/hf_internlm2_chat_math_20b_with_system.py |
| hf_internlm2_chat_math_7b              | configs/models/hf_internlm/hf_internlm2_chat_math_7b.py              |
| hf_internlm2_chat_math_7b_with_system  | configs/models/hf_internlm/hf_internlm2_chat_math_7b_with_system.py  |
| hf_internlm_20b                        | configs/models/hf_internlm/hf_internlm_20b.py                        |
| hf_internlm_7b                         | configs/models/hf_internlm/hf_internlm_7b.py                         |
| hf_internlm_chat_20b                   | configs/models/hf_internlm/hf_internlm_chat_20b.py                   |
| hf_internlm_chat_7b                    | configs/models/hf_internlm/hf_internlm_chat_7b.py                    |
| hf_internlm_chat_7b_8k                 | configs/models/hf_internlm/hf_internlm_chat_7b_8k.py                 |
| hf_internlm_chat_7b_v1_1               | configs/models/hf_internlm/hf_internlm_chat_7b_v1_1.py               |
| internlm_7b                            | configs/models/internlm/internlm_7b.py                               |
| lmdeploy_internlm2_chat_20b            | configs/models/hf_internlm/lmdeploy_internlm2_chat_20b.py            |
| lmdeploy_internlm2_chat_7b             | configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py             |
| ms_internlm_chat_7b_8k                 | configs/models/ms_internlm/ms_internlm_chat_7b_8k.py                 |
+----------------------------------------+----------------------------------------------------------------------+


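The dataset portion of the listing is shown below. Note that this captured output lists the CMMLU and MMLU configs: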

+-------------------------------+-----------------------------------------------------------------+
| Dataset                       | Config Path                                                     |
|-------------------------------+-----------------------------------------------------------------|
| cmmlu_gen                     | configs/datasets/cmmlu/cmmlu_gen.py                             |
| cmmlu_gen_c13365              | configs/datasets/cmmlu/cmmlu_gen_c13365.py                      |
| cmmlu_ppl                     | configs/datasets/cmmlu/cmmlu_ppl.py                             |
| cmmlu_ppl_041cbf              | configs/datasets/cmmlu/cmmlu_ppl_041cbf.py                      |
| cmmlu_ppl_8b9c76              | configs/datasets/cmmlu/cmmlu_ppl_8b9c76.py                      |
| mmlu_clean_ppl                | configs/datasets/mmlu/mmlu_clean_ppl.py                         |
| mmlu_contamination_ppl_810ec6 | configs/datasets/contamination/mmlu_contamination_ppl_810ec6.py |
| mmlu_gen                      | configs/datasets/mmlu/mmlu_gen.py                               |
| mmlu_gen_23a9a9               | configs/datasets/mmlu/mmlu_gen_23a9a9.py                        |
| mmlu_gen_4d595a               | configs/datasets/mmlu/mmlu_gen_4d595a.py                        |
| mmlu_gen_5d1409               | configs/datasets/mmlu/mmlu_gen_5d1409.py                        |
| mmlu_gen_79e572               | configs/datasets/mmlu/mmlu_gen_79e572.py                        |
| mmlu_gen_a484b3               | configs/datasets/mmlu/mmlu_gen_a484b3.py                        |
| mmlu_ppl                      | configs/datasets/mmlu/mmlu_ppl.py                               |
| mmlu_ppl_ac766d               | configs/datasets/mmlu/mmlu_ppl_ac766d.py                        |
| mmlu_zero_shot_gen_47e2c0     | configs/datasets/mmlu/mmlu_zero_shot_gen_47e2c0.py              |
+-------------------------------+-----------------------------------------------------------------+
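tools/list_configs.py matches the given keywords against config file names. That is why the dataset listing depends on the keyword you pass: a ceval keyword does not pick up cmmlu_* or mmlu_* files. The sketch below illustrates this kind of fuzzy name matching with fnmatch. It is a simplified stand-in written for this note, not the tool's actual source. The functions match_names and find_configs are hypothetical.

```python
import fnmatch
from pathlib import Path


def match_names(names, keywords):
    """Return the names that contain any keyword (wildcard-style fuzzy match)."""
    # Wrapping a keyword in '*' lets it match anywhere in the name.
    return [n for n in names if any(fnmatch.fnmatch(n, f"*{kw}*") for kw in keywords)]


def find_configs(config_dir, keywords):
    """Map config stem -> path for every .py file under config_dir that matches."""
    paths = {p.stem: str(p) for p in sorted(Path(config_dir).rglob("*.py"))}
    return {stem: paths[stem] for stem in match_names(list(paths), keywords)}
```

For example, match_names(["ceval_gen", "cmmlu_gen", "mmlu_ppl"], ["mmlu"]) keeps cmmlu_gen and mmlu_ppl but not ceval_gen.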

Launching the Evaluation (10% A100, 8 GB)

Evaluating with Command-Line Arguments

In the opencompass folder, open configs/models/hf_internlm/hf_internlm2_chat_1_8b.py and paste in the following code:

from opencompass.models import HuggingFaceCausalLM


models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='internlm2-1.8b-hf',
        path="/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b",
        tokenizer_path='/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        model_kwargs=dict(
            trust_remote_code=True,
            device_map='auto',
        ),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
            trust_remote_code=True,
        ),
        max_out_len=100,
        min_out_len=1,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
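The tokenizer_kwargs above set padding_side='left' and truncation_side='left'. For batched generation with a decoder-only model, left padding keeps each prompt's last real token at the end of its row, so generation continues directly from the prompt. Left truncation drops the oldest tokens when a prompt exceeds max_seq_len. The following pure-Python sketch illustrates both behaviours. The helper left_pad_and_truncate and the pad id 0 are hypothetical; this is not the tokenizer's implementation.

```python
def left_pad_and_truncate(batch, max_len, pad_id=0):
    """Left-truncate each token list to max_len, then left-pad to the longest row.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    Expects a non-empty batch and max_len >= 1.
    """
    # truncation_side='left': keep only the last max_len tokens.
    truncated = [ids[-max_len:] for ids in batch]
    width = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        pad = width - len(ids)
        # padding_side='left': padding goes in front of the real tokens.
        input_ids.append([pad_id] * pad + ids)
        attention_mask.append([0] * pad + [1] * len(ids))
    return input_ids, attention_mask


ids, mask = left_pad_and_truncate([[5, 6, 7, 8, 9], [3, 4]], max_len=4)
# ids  -> [[6, 7, 8, 9], [0, 0, 3, 4]]
# mask -> [[1, 1, 1, 1], [0, 0, 1, 1]]
```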

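Once OpenCompass is installed and the datasets are prepared as described above, the following command evaluates InternLM2-Chat-1.8B on C-Eval. OpenCompass runs evaluation tasks in parallel by default, so it is a good idea to run the first evaluation with --debug to check for problems. In --debug mode, tasks run sequentially and their output is printed in real time.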
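To make the difference concrete, the sketch below mimics the two modes with a thread pool. In debug mode, tasks run one after another with output printed as they go. In the default mode, they are dispatched in parallel. run_tasks and the toy tasks are hypothetical, and this sketch says nothing about how OpenCompass's runners are actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, debug=False, max_workers=4):
    """Run zero-argument callables: sequentially with live output if debug, else in parallel."""
    if debug:
        results = []
        for i, task in enumerate(tasks):
            result = task()
            # Printed immediately, so a failing task is easy to locate.
            print(f"task {i} done: {result}")
            results.append(result)
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order even though tasks overlap.
        return list(pool.map(lambda t: t(), tasks))
```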

# Environment variables: set either one to avoid an MKL threading-layer error
export MKL_SERVICE_FORCE_INTEL=1
# or
export MKL_THREADING_LAYER=GNU
python run.py --datasets ceval_gen --models hf_internlm2_chat_1_8b --debug

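Command breakdown

# --datasets: dataset config to evaluate (dataset preparation)
# --models: model config to evaluate (model preparation)
# --debug: run tasks sequentially and print output in real time
python run.py \
    --datasets ceval_gen \
    --models hf_internlm2_chat_1_8b \
    --debug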
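If you prefer to launch the evaluation from a Python script, you can assemble the same command as an argument list and pass it to subprocess.run. This is a minimal sketch. build_command and launch are hypothetical helpers, and they use only the flags shown above (--datasets, --models, --debug) plus the MKL_SERVICE_FORCE_INTEL variable.

```python
import os
import subprocess


def build_command(datasets, models, debug=True):
    """Assemble the argv list for OpenCompass's run.py from config names."""
    cmd = ["python", "run.py", "--datasets", *datasets, "--models", *models]
    if debug:
        cmd.append("--debug")
    return cmd


def launch(cmd, workdir="/root/opencompass"):
    """Run the command inside the OpenCompass checkout with the MKL workaround set."""
    env = dict(os.environ, MKL_SERVICE_FORCE_INTEL="1")
    return subprocess.run(cmd, cwd=workdir, env=env, check=True)
```

Usage: launch(build_command(["ceval_gen"], ["hf_internlm2_chat_1_8b"])).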

If everything is working, you should see:

[2024-08-09 16:48:07,016] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...

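When the evaluation finishes, you will see output like the following. The log captured here comes from an MMLU run (mmlu_gen); a ceval_gen run produces the same kind of per-subset log and summary table for the C-Eval subsets.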

07/22 22:45:02 - OpenCompass - INFO - Partitioned into 57 tasks.
07/22 22:45:04 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_college_biology]: {'accuracy': 51.388888888888886}
07/22 22:45:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_college_chemistry]: {'accuracy': 34.0}
07/22 22:45:07 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_college_computer_science]: {'accuracy': 41.0}
07/22 22:45:08 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_college_mathematics]: {'accuracy': 32.0}
07/22 22:45:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_college_physics]: {'accuracy': 29.411764705882355}
07/22 22:45:11 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_electrical_engineering]: {'accuracy': 44.13793103448276}
07/22 22:45:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_astronomy]: {'accuracy': 48.026315789473685}
07/22 22:45:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_anatomy]: {'accuracy': 45.925925925925924}
07/22 22:45:15 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_abstract_algebra]: {'accuracy': 31.0}
07/22 22:45:16 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_machine_learning]: {'accuracy': 32.142857142857146}
07/22 22:45:18 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_clinical_knowledge]: {'accuracy': 51.320754716981135}
07/22 22:45:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_global_facts]: {'accuracy': 24.0}
07/22 22:45:20 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_management]: {'accuracy': 62.13592233009708}
07/22 22:45:22 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_nutrition]: {'accuracy': 48.36601307189542}
07/22 22:45:23 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_marketing]: {'accuracy': 65.8119658119658}
07/22 22:45:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_professional_accounting]: {'accuracy': 35.1063829787234}
07/22 22:45:26 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_geography]: {'accuracy': 56.060606060606055}
07/22 22:45:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_international_law]: {'accuracy': 49.586776859504134}
07/22 22:45:29 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_moral_scenarios]: {'accuracy': 24.46927374301676}
07/22 22:45:30 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_computer_security]: {'accuracy': 63.0}
07/22 22:45:32 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_microeconomics]: {'accuracy': 48.319327731092436}
07/22 22:45:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_professional_law]: {'accuracy': 31.095176010430247}
07/22 22:45:35 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_medical_genetics]: {'accuracy': 54.0}
07/22 22:45:36 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_professional_psychology]: {'accuracy': 42.48366013071895}
07/22 22:45:37 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_jurisprudence]: {'accuracy': 50.0}
07/22 22:45:39 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_world_religions]: {'accuracy': 60.81871345029239}
07/22 22:45:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_philosophy]: {'accuracy': 49.19614147909968}
07/22 22:45:41 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_virology]: {'accuracy': 37.34939759036144}
07/22 22:45:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_chemistry]: {'accuracy': 35.960591133004925}
07/22 22:45:44 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_public_relations]: {'accuracy': 53.63636363636364}
07/22 22:45:46 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_macroeconomics]: {'accuracy': 45.64102564102564}
07/22 22:45:47 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_human_sexuality]: {'accuracy': 54.19847328244275}
07/22 22:45:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_elementary_mathematics]: {'accuracy': 29.894179894179896}
07/22 22:45:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_physics]: {'accuracy': 34.437086092715234}
07/22 22:45:51 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_computer_science]: {'accuracy': 38.0}
07/22 22:45:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_european_history]: {'accuracy': 58.18181818181818}
07/22 22:45:54 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_business_ethics]: {'accuracy': 42.0}
07/22 22:45:55 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_moral_disputes]: {'accuracy': 43.641618497109825}
07/22 22:45:57 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_statistics]: {'accuracy': 40.27777777777778}
07/22 22:45:58 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_miscellaneous]: {'accuracy': 55.172413793103445}
07/22 22:45:59 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_formal_logic]: {'accuracy': 26.984126984126984}
07/22 22:46:01 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_government_and_politics]: {'accuracy': 60.62176165803109}
07/22 22:46:02 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_prehistory]: {'accuracy': 46.2962962962963}
07/22 22:46:04 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_security_studies]: {'accuracy': 55.10204081632652}
07/22 22:46:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_biology]: {'accuracy': 56.12903225806451}
07/22 22:46:07 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_logical_fallacies]: {'accuracy': 55.828220858895705}
07/22 22:46:08 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_world_history]: {'accuracy': 65.40084388185655}
07/22 22:46:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_professional_medicine]: {'accuracy': 45.588235294117645}
07/22 22:46:11 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_mathematics]: {'accuracy': 21.48148148148148}
07/22 22:46:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_college_medicine]: {'accuracy': 43.35260115606936}
07/22 22:46:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_us_history]: {'accuracy': 51.9607843137255}
07/22 22:46:15 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_sociology]: {'accuracy': 64.6766169154229}
07/22 22:46:16 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_econometrics]: {'accuracy': 31.57894736842105}
07/22 22:46:18 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_high_school_psychology]: {'accuracy': 65.5045871559633}
07/22 22:46:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_human_aging]: {'accuracy': 48.4304932735426}
07/22 22:46:20 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_us_foreign_policy]: {'accuracy': 69.0}
07/22 22:46:22 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/lukaemon_mmlu_conceptual_physics]: {'accuracy': 32.340425531914896}
dataset                                            version    metric            mode      opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b
-------------------------------------------------  ---------  ----------------  ------  ---------------------------------------------------------------------------------------
lukaemon_mmlu_college_biology                      caec7d     accuracy          gen                                                                                       51.39
lukaemon_mmlu_college_chemistry                    520aa6     accuracy          gen                                                                                       34
lukaemon_mmlu_college_computer_science             99c216     accuracy          gen                                                                                       41
lukaemon_mmlu_college_mathematics                  678751     accuracy          gen                                                                                       32
lukaemon_mmlu_college_physics                      4f382c     accuracy          gen                                                                                       29.41
lukaemon_mmlu_electrical_engineering               770ce3     accuracy          gen                                                                                       44.14
lukaemon_mmlu_astronomy                            d3ee01     accuracy          gen                                                                                       48.03
lukaemon_mmlu_anatomy                              72183b     accuracy          gen                                                                                       45.93
lukaemon_mmlu_abstract_algebra                     2db373     accuracy          gen                                                                                       31
lukaemon_mmlu_machine_learning                     0283bb     accuracy          gen                                                                                       32.14
lukaemon_mmlu_clinical_knowledge                   cb3218     accuracy          gen                                                                                       51.32
lukaemon_mmlu_global_facts                         ab07b6     accuracy          gen                                                                                       24
lukaemon_mmlu_management                           80876d     accuracy          gen                                                                                       62.14
lukaemon_mmlu_nutrition                            4543bd     accuracy          gen                                                                                       48.37
lukaemon_mmlu_marketing                            7394e3     accuracy          gen                                                                                       65.81
lukaemon_mmlu_professional_accounting              444b7f     accuracy          gen                                                                                       35.11
lukaemon_mmlu_high_school_geography                0780e6     accuracy          gen                                                                                       56.06
lukaemon_mmlu_international_law                    cf3179     accuracy          gen                                                                                       49.59
lukaemon_mmlu_moral_scenarios                      f6dbe2     accuracy          gen                                                                                       24.47
lukaemon_mmlu_computer_security                    ce7550     accuracy          gen                                                                                       63
lukaemon_mmlu_high_school_microeconomics           04d21a     accuracy          gen                                                                                       48.32
lukaemon_mmlu_professional_law                     5f7e6c     accuracy          gen                                                                                       31.1
lukaemon_mmlu_medical_genetics                     881ef5     accuracy          gen                                                                                       54
lukaemon_mmlu_professional_psychology              221a16     accuracy          gen                                                                                       42.48
lukaemon_mmlu_jurisprudence                        001f24     accuracy          gen                                                                                       50
lukaemon_mmlu_world_religions                      232c09     accuracy          gen                                                                                       60.82
lukaemon_mmlu_philosophy                           08042b     accuracy          gen                                                                                       49.2
lukaemon_mmlu_virology                             12e270     accuracy          gen                                                                                       37.35
lukaemon_mmlu_high_school_chemistry                ae8820     accuracy          gen                                                                                       35.96
lukaemon_mmlu_public_relations                     e7d39b     accuracy          gen                                                                                       53.64
lukaemon_mmlu_high_school_macroeconomics           a01685     accuracy          gen                                                                                       45.64
lukaemon_mmlu_human_sexuality                      42407c     accuracy          gen                                                                                       54.2
lukaemon_mmlu_elementary_mathematics               269926     accuracy          gen                                                                                       29.89
lukaemon_mmlu_high_school_physics                  93278f     accuracy          gen                                                                                       34.44
lukaemon_mmlu_high_school_computer_science         9965a5     accuracy          gen                                                                                       38
lukaemon_mmlu_high_school_european_history         eefc90     accuracy          gen                                                                                       58.18
lukaemon_mmlu_business_ethics                      1dec08     accuracy          gen                                                                                       42
lukaemon_mmlu_moral_disputes                       a2173e     accuracy          gen                                                                                       43.64
lukaemon_mmlu_high_school_statistics               8f3f3a     accuracy          gen                                                                                       40.28
lukaemon_mmlu_miscellaneous                        935647     accuracy          gen                                                                                       55.17
lukaemon_mmlu_formal_logic                         cfcb0c     accuracy          gen                                                                                       26.98
lukaemon_mmlu_high_school_government_and_politics  3c52f9     accuracy          gen                                                                                       60.62
lukaemon_mmlu_prehistory                           bbb197     accuracy          gen                                                                                       46.3
lukaemon_mmlu_security_studies                     9b1743     accuracy          gen                                                                                       55.1
lukaemon_mmlu_high_school_biology                  37b125     accuracy          gen                                                                                       56.13
lukaemon_mmlu_logical_fallacies                    9cebb0     accuracy          gen                                                                                       55.83
lukaemon_mmlu_high_school_world_history            048e7e     accuracy          gen                                                                                       65.4
lukaemon_mmlu_professional_medicine                857144     accuracy          gen                                                                                       45.59
lukaemon_mmlu_high_school_mathematics              ed4dc0     accuracy          gen                                                                                       21.48
lukaemon_mmlu_college_medicine                     38709e     accuracy          gen                                                                                       43.35
lukaemon_mmlu_high_school_us_history               8932df     accuracy          gen                                                                                       51.96
lukaemon_mmlu_sociology                            c266a2     accuracy          gen                                                                                       64.68
lukaemon_mmlu_econometrics                         d1134d     accuracy          gen                                                                                       31.58
lukaemon_mmlu_high_school_psychology               7db114     accuracy          gen                                                                                       65.5
lukaemon_mmlu_human_aging                          82a410     accuracy          gen                                                                                       48.43
lukaemon_mmlu_us_foreign_policy                    528cfe     accuracy          gen                                                                                       69
lukaemon_mmlu_conceptual_physics                   63588e     accuracy          gen                                                                                       32.34
mmlu-humanities                                    -          naive_average     gen                                                                                       47.19
mmlu-stem                                          -          naive_average     gen                                                                                       38.98
mmlu-social-science                                -          naive_average     gen                                                                                       53.9
mmlu-other                                         -          naive_average     gen                                                                                       47.13
mmlu                                               -          naive_average     gen                                                                                       45.85
mmlu-weighted                                      -          weighted_average  gen                                                                                       44.3
07/22 22:46:22 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240722_181832/summary/summary_20240722_181832.txt
07/22 22:46:22 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240722_181832/summary/summary_20240722_181832.csv
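The summary is also written as a CSV file, which is handy for post-processing. As the names suggest, naive_average is an unweighted mean of the subset scores, while weighted_average (the mmlu-weighted row) weights each subset, typically by its number of questions. The sketch below reads such a file and recomputes both kinds of mean. The column layout (dataset, version, metric, mode, then one column per model) is assumed from the printed table above. load_scores, mean_score and weighted_mean_score are hypothetical helpers, not OpenCompass functions.

```python
import csv


def load_scores(csv_path, model_column):
    """Return {dataset: score} for rows whose score cell is numeric."""
    scores = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                scores[row["dataset"]] = float(row[model_column])
            except (KeyError, TypeError, ValueError):
                # Skip rows without a numeric score (e.g. '-').
                continue
    return scores


def mean_score(scores, names):
    """Unweighted mean over the given subsets (the idea behind naive_average)."""
    return sum(scores[n] for n in names) / len(names)


def weighted_mean_score(scores, sizes):
    """Mean weighted by subset size (the idea behind weighted_average)."""
    total = sum(sizes.values())
    return sum(scores[n] * k for n, k in sizes.items()) / total
```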

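Breakdown of the command-line variant, which passes the HuggingFace model path directly instead of a model config:

# --datasets: dataset config to evaluate
# --hf-path: HuggingFace model path
# --tokenizer-path: HuggingFace tokenizer path (can be omitted if it is the same as the model path)
# --tokenizer-kwargs: arguments used to build the tokenizer
# --model-kwargs: arguments used to build the model
# --max-seq-len: maximum sequence length the model accepts
# --max-out-len: maximum number of tokens to generate
# --batch-size: batch size
# --num-gpus: number of GPUs needed to run the model
# --debug: run tasks sequentially and print output in real time
python run.py \
    --datasets mmlu_gen \
    --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
    --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
    --tokenizer-kwargs padding_side='left' truncation_side='left' trust_remote_code=True \
    --model-kwargs device_map='auto' trust_remote_code=True \
    --max-seq-len 1024 \
    --max-out-len 16 \
    --batch-size 2 \
    --num-gpus 1 \
    --debug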
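Options such as --tokenizer-kwargs and --model-kwargs take a list of key=value pairs that become keyword arguments for the tokenizer and model constructors. The shell strips the single quotes, so the program actually receives padding_side=left. Below is a simplified sketch of turning such pairs into a dict. The helpers parse_kwargs and build_parser are hypothetical; OpenCompass uses its own argument handling, which may convert values differently.

```python
import argparse
import ast


def parse_kwargs(pairs):
    """Turn ['k=v', ...] into a dict, evaluating Python literals where possible."""
    result = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        try:
            # 'True' -> True, '2' -> 2, "'left'" -> 'left'
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Bare words such as left or auto stay as plain strings.
            value = raw
        result[key] = value
    return result


def build_parser():
    """A toy parser with two kwargs-style options that accept one or more pairs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tokenizer-kwargs", nargs="+", default=[])
    parser.add_argument("--model-kwargs", nargs="+", default=[])
    return parser
```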

Evaluating with a Config File

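Besides configuring an experiment on the command line, OpenCompass lets you write the complete experiment configuration in a config file and run it directly with run.py. Config files are written in Python and must define both a datasets field and a models field. The config for this test lives in the configs folder. It uses the inheritance mechanism to pull in the required dataset and model configs, then combines them into the datasets and models fields in the required format. Run the following commands to create eval_tutorial_demo.py in the configs folder: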

cd /root/opencompass/configs
touch eval_tutorial_demo.py

Open eval_tutorial_demo.py and paste in the following code:

from mmengine.config import read_base

with read_base():
    from .datasets.ceval.ceval_gen import ceval_datasets
    from .models.hf_internlm.hf_internlm2_chat_1_8b import models as hf_internlm2_chat_1_8b_models

datasets = ceval_datasets
models = hf_internlm2_chat_1_8b_models
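A config file is plain Python, so several dataset lists can be combined before they are assigned to datasets. A pattern seen in many OpenCompass example configs concatenates every imported variable whose name ends in _datasets. The sketch below shows that idea outside of read_base(). It uses hypothetical stand-in lists (ceval_datasets, mmlu_datasets with made-up abbr values) in place of the real imports, and collect_datasets is a hypothetical helper.

```python
def collect_datasets(namespace):
    """Concatenate every list bound to a name ending in '_datasets'."""
    collected = []
    for name, value in namespace.items():
        if name.endswith("_datasets") and isinstance(value, list):
            collected.extend(value)
    return collected


# Hypothetical stand-ins for lists that read_base() would import.
ceval_datasets = [dict(abbr="ceval-computer_network"), dict(abbr="ceval-operating_system")]
mmlu_datasets = [dict(abbr="lukaemon_mmlu_anatomy")]

datasets = collect_datasets(dict(ceval_datasets=ceval_datasets, mmlu_datasets=mmlu_datasets))
```

In a real config file, you would pass locals() to such a helper after the with read_base(): block.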

To run the task, pass the path of the config file to run.py:

cd /root/opencompass
python run.py configs/eval_tutorial_demo.py --debug

If everything is working, you should see:

[2024-08-09 16:48:07,016] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...

When the evaluation finishes, you will see:
