书生·浦语大模型实战训练营--第二期第七节--OpenCompass大模型评测实战--homework

本文链接：https://blog.csdn.net/weixin_61573157/article/details/138098120

1.基础作业

一、配置环境

安装下面的顺序以及自己的文件路径配置环境

conda create -n opencompass python=3.10 -y

安装下面的包

absl-py
accelerate>=0.19.0
boto3
cn2an
cpm_kernels
datasets>=2.12.0
einops==0.5.0
evaluate>=0.3.0
fairscale
func_timeout
fuzzywuzzy
immutabledict
jieba
langdetect
ltp
mmengine-lite
nltk==3.8
numpy>=1.23.4
openai
OpenCC
opencompass
opencv-python-headless
pandas<2.0.0
prettytable
pyext
pypinyin
python-Levenshtein
rank_bm25==0.2.2
rapidfuzz
requests==2.31.0
rich
rouge
-e git+https://github.com/Isaac-JL-Chen/rouge_chinese.git@master#egg=rouge_chinese
rouge_score
sacrebleu
scikit_learn==1.2.1
seaborn
sentence_transformers==2.2.2
tabulate
tiktoken
timeout_decorator
tokenizers>=0.13.3
torch>=1.13.1
tqdm==4.64.1
transformers>=4.29.1
typer
protobuf

二、源码下载

# 下载源码
git clone -b 0.2.4 https://github.com/open-compass/opencompass

# 配置环境依赖库
pip install -r /root/autodl-tmp/opencompass/requirements.txt

# 解压评测数据集到 data/ 处
cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/autodl-tmp/opencompass/opencompass
unzip /share/temp/datasets/OpenCompassData-core-20231110.zip

# 列出所有跟 internlm 及 ceval 相关的配置
python /root/autodl-tmp/opencompass/opencompass/tools/list_configs.py

打开配置之后可以看到如下结果

三、启动评测

执行下列命令

# 启动评测 (10% A100 8GB 资源)
python /root/autodl-tmp/opencompass/opencompass/run.py --datasets ceval_gen --hf-path /root/autodl-tmp/opencompass/model/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /root/autodl-tmp/opencompass/model/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 

# 参数解释
python /root/autodl-tmp/opencompass/opencompass/run.py
--datasets ceval_gen \
--hf-path /root/autodl-tmp/opencompass/model/Shanghai_AI_Laboratory/internlm2-chat-1_8b \  # HuggingFace 模型路径
--tokenizer-path /root/autodl-tmp/opencompass/model/Shanghai_AI_Laboratory/internlm2-chat-1_8b \  # HuggingFace tokenizer 路径（如果与模型路径相同，可以省略）
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \  # 构建 tokenizer 的参数
--model-kwargs device_map='auto' trust_remote_code=True \  # 构建模型的参数
--max-seq-len 1024 \  # 模型可以接受的最大序列长度
--max-out-len 16 \  # 生成的最大 token 数
--batch-size 2  \  # 批量大小
--num-gpus 1  # 运行模型所需的 GPU 数量
--debug

结果如下：

记得注释掉这两个报错的包：/root/.conda/envs/opencompass/lib/python3.10/site-packages/opencompass/datasets/__init__.py里面的两行代码（我个人感觉是这个源码是有点问题的，他根本就没写全，总是漏了几个包）

模型加载

评测中

评测结果

dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b
----------------------------------------------  ---------  -------------  ------  ---------------------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                                       47.37
ceval-operating_system                          1c2571     accuracy       gen                                                                                       47.37
ceval-computer_architecture                     a74dad     accuracy       gen                                                                                       23.81
ceval-college_programming                       4ca32a     accuracy       gen                                                                                       13.51
ceval-college_physics                           963fa8     accuracy       gen                                                                                       42.11
ceval-college_chemistry                         e78857     accuracy       gen                                                                                       33.33
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                                       10.53
ceval-probability_and_statistics                65e812     accuracy       gen                                                                                       38.89
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                                       25
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                                       27.03
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                                       54.17
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                                       16.67
ceval-high_school_physics                       adf25f     accuracy       gen                                                                                       42.11
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                                       47.37
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                                       26.32
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                                       36.84
ceval-middle_school_biology                     86817c     accuracy       gen                                                                                       80.95
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                                       47.37
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                                       80
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                                       43.48
ceval-college_economics                         f3f4e6     accuracy       gen                                                                                       32.73
ceval-business_administration                   c1614e     accuracy       gen                                                                                       36.36
ceval-marxism                                   cf874c     accuracy       gen                                                                                       68.42
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                                       70.83
ceval-education_science                         591fee     accuracy       gen                                                                                       55.17
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                                       59.09
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                                       57.89
ceval-high_school_geography                     865461     accuracy       gen                                                                                       47.37
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                                       71.43
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                                       75
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                                       52.17
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                                       73.68
ceval-logic                                     f5b022     accuracy       gen                                                                                       27.27
ceval-law                                       a110a1     accuracy       gen                                                                                       29.17
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                                       47.83
ceval-art_studies                               2a1300     accuracy       gen                                                                                       42.42
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                                       51.72
ceval-legal_professional                        ce8787     accuracy       gen                                                                                       34.78
ceval-high_school_chinese                       315705     accuracy       gen                                                                                       42.11
ceval-high_school_history                       7eb30a     accuracy       gen                                                                                       65
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                                       86.36
ceval-civil_servant                             87d061     accuracy       gen                                                                                       42.55
ceval-sports_science                            70f27b     accuracy       gen                                                                                       52.63
ceval-plant_protection                          8941f9     accuracy       gen                                                                                       40.91
ceval-basic_medicine                            c409d6     accuracy       gen                                                                                       68.42
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                                       31.82
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                                       47.83
ceval-accountant                                002837     accuracy       gen                                                                                       36.73
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                                       38.71
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                                       51.61
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                                       36.73
ceval-physician                                 6e277d     accuracy       gen                                                                                       42.86
ceval-stem                                      -          naive_average  gen                                                                                       39.21
ceval-social-science                            -          naive_average  gen                                                                                       57.43
ceval-humanities                                -          naive_average  gen                                                                                       50.23
ceval-other                                     -          naive_average  gen                                                                                       44.62
ceval-hard                                      -          naive_average  gen                                                                                       32
ceval                                           -          naive_average  gen                                                                                       46.19

2.进阶作业

一、将自定义数据集提交至OpenCompass官网

创建数据集读取类

cd /root/autodl-tmp/opencompass/opencompass/opencompass/datasets
cp ceval.py ceval_selfmake.py

将下述代码替换进ceval_selfmake.py中

from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import FixKRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
from opencompass.datasets import CEval_SelfMakeDataset
from opencompass.utils.text_postprocessors import first_capital_postprocess

ceval_subject_mapping = {
    'computer_network': ['Computer Network', '计算机网络', 'STEM']
}
ceval_all_sets = list(ceval_subject_mapping.keys())

ceval_datasets = []
for _split in ["val"]:
    for _name in ceval_all_sets:
        _ch_name = ceval_subject_mapping[_name][1]
        ceval_infer_cfg = dict(
            ice_template=dict(
                type=PromptTemplate,
                template=dict(
                    begin="</E>",
                    round=[
                        dict(
                            role="HUMAN",
                            prompt=
                            f"以下是中国关于{_ch_name}考试的单项选择题，请选出其中的正确答案。\n{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\n答案: "
                        ),
                        dict(role="BOT", prompt="{answer}"),
                    ]),
                ice_token="</E>",
            ),
            retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),
            inferencer=dict(type=GenInferencer),
        )

        ceval_eval_cfg = dict(
            evaluator=dict(type=AccEvaluator),
            pred_postprocessor=dict(type=first_capital_postprocess))

        ceval_datasets.append(
            dict(
                type=CEval_SelfMakeDataset,
                path="./data/ceval/formal_ceval",
                name=_name,
                abbr="ceval-" + _name if _split == "val" else "ceval-test-" +
                _name,
                reader_cfg=dict(
                    input_columns=["question", "A", "B", "C", "D"],
                    output_column="answer",
                    train_split="dev",
                    test_split=_split),
                infer_cfg=ceval_infer_cfg,
                eval_cfg=ceval_eval_cfg,
            ))

del _split, _name, _ch_name

修改一下_init_.py

在~/opencompass/opencompass/datasets/__init__.py的最后加入一行：from .ceval_selfmake import *，表示从ceval_selfmake.py中导入数据集类，使该数据集可以被正常读取

创建数据集配置文件

cd /root/autodl-tmp/opencompass/opencompass/configs/datasets/ceval
cp ceval_gen_5f30c7.py ceval_gen_selfmake.py

将下述代码替换进ceval_gen_selfmake.py中

from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import FixKRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator
from opencompass.datasets import CEval_SelfMakeDataset
from opencompass.utils.text_postprocessors import first_capital_postprocess

ceval_subject_mapping = {
    'computer_network': ['Computer Network', '计算机网络', 'STEM']
}
ceval_all_sets = list(ceval_subject_mapping.keys())

ceval_datasets = []
for _split in ["val"]:
    for _name in ceval_all_sets:
        _ch_name = ceval_subject_mapping[_name][1]
        ceval_infer_cfg = dict(
            ice_template=dict(
                type=PromptTemplate,
                template=dict(
                    begin="</E>",
                    round=[
                        dict(
                            role="HUMAN",
                            prompt=
                            f"以下是中国关于{_ch_name}考试的单项选择题，请选出其中的正确答案。\n{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\n答案: "
                        ),
                        dict(role="BOT", prompt="{answer}"),
                    ]),
                ice_token="</E>",
            ),
            retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),
            inferencer=dict(type=GenInferencer),
        )

        ceval_eval_cfg = dict(
            evaluator=dict(type=AccEvaluator),
            pred_postprocessor=dict(type=first_capital_postprocess))

        ceval_datasets.append(
            dict(
                type=CEval_SelfMakeDataset,
                path="./data/ceval/formal_ceval",
                name=_name,
                abbr="ceval-" + _name if _split == "val" else "ceval-test-" +
                _name,
                reader_cfg=dict(
                    input_columns=["question", "A", "B", "C", "D"],
                    output_column="answer",
                    train_split="dev",
                    test_split=_split),
                infer_cfg=ceval_infer_cfg,
                eval_cfg=ceval_eval_cfg,
            ))

del _split, _name, _ch_name

修改 C-Eval 默认评测集：将以下代码替换为ceval_gen.py

from mmengine.config import read_base

with read_base():
    from .ceval_gen_selfmake import ceval_datasets  # noqa: F401, F403

启动评测 (10% A100 8GB 资源)

export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
python /root/autodl-tmp/opencompass/opencompass/run.py --datasets ceval_gen --hf-path /root/autodl-tmp/opencompass/model/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /root/autodl-tmp/opencompass/model/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug