I. Basic Assignment
1. List the supported datasets and models
2. Evaluation results:
20240118_111205
tabulate format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset version metric mode opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
---------------------------------------------- --------- ------------- ------ -------------------------------------------------------------------------
ceval-computer_network db9ce2 accuracy gen 31.58
ceval-operating_system 1c2571 accuracy gen 36.84
ceval-computer_architecture a74dad accuracy gen 28.57
ceval-college_programming 4ca32a accuracy gen 32.43
ceval-college_physics 963fa8 accuracy gen 26.32
ceval-college_chemistry e78857 accuracy gen 16.67
ceval-advanced_mathematics ce03e2 accuracy gen 21.05
ceval-probability_and_statistics 65e812 accuracy gen 38.89
ceval-discrete_mathematics e894ae accuracy gen 18.75
ceval-electrical_engineer ae42b9 accuracy gen 35.14
ceval-metrology_engineer ee34ea accuracy gen 50
ceval-high_school_mathematics 1dc5bf accuracy gen 22.22
ceval-high_school_physics adf25f accuracy gen 31.58
ceval-high_school_chemistry 2ed27f accuracy gen 15.79
ceval-high_school_biology 8e2b9a accuracy gen 36.84
ceval-middle_school_mathematics bee8d5 accuracy gen 26.32
ceval-middle_school_biology 86817c accuracy gen 61.9
ceval-middle_school_physics 8accf6 accuracy gen 63.16
ceval-middle_school_chemistry 167a15 accuracy gen 60
ceval-veterinary_medicine b4e08d accuracy gen 47.83
ceval-college_economics f3f4e6 accuracy gen 41.82
ceval-business_administration c1614e accuracy gen 33.33
ceval-marxism cf874c accuracy gen 68.42
ceval-mao_zedong_thought 51c7a4 accuracy gen 70.83
ceval-education_science 591fee accuracy gen 58.62
ceval-teacher_qualification 4e4ced accuracy gen 70.45
ceval-high_school_politics 5c0de2 accuracy gen 26.32
ceval-high_school_geography 865461 accuracy gen 47.37
ceval-middle_school_politics 5be3e7 accuracy gen 52.38
ceval-middle_school_geography 8a63be accuracy gen 58.33
ceval-modern_chinese_history fc01af accuracy gen 73.91
ceval-ideological_and_moral_cultivation a2aa4a accuracy gen 63.16
ceval-logic f5b022 accuracy gen 31.82
ceval-law a110a1 accuracy gen 25
ceval-chinese_language_and_literature 0f8b68 accuracy gen 30.43
ceval-art_studies 2a1300 accuracy gen 60.61
ceval-professional_tour_guide 4e673e accuracy gen 62.07
ceval-legal_professional ce8787 accuracy gen 39.13
ceval-high_school_chinese 315705 accuracy gen 63.16
ceval-high_school_history 7eb30a accuracy gen 70
ceval-middle_school_history 48ab4a accuracy gen 59.09
ceval-civil_servant 87d061 accuracy gen 53.19
ceval-sports_science 70f27b accuracy gen 52.63
ceval-plant_protection 8941f9 accuracy gen 59.09
ceval-basic_medicine c409d6 accuracy gen 47.37
ceval-clinical_medicine 49e82d accuracy gen 40.91
ceval-urban_and_rural_planner 95b885 accuracy gen 45.65
ceval-accountant 002837 accuracy gen 26.53
ceval-fire_engineer bc23f5 accuracy gen 22.58
ceval-environmental_impact_assessment_engineer c64e2d accuracy gen 64.52
ceval-tax_accountant 3a5e3c accuracy gen 34.69
ceval-physician 6e277d accuracy gen 40.82
ceval-stem - naive_average gen 35.09
ceval-social-science - naive_average gen 52.79
ceval-humanities - naive_average gen 52.58
ceval-other - naive_average gen 44.36
ceval-hard - naive_average gen 23.91
ceval - naive_average gen 44.16
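The summary rows can be checked by hand. The sketch below assumes that naive_average is the unweighted mean of the per-subject accuracies, and that the first 20 rows of the table (ceval-computer_network through ceval-veterinary_medicine) form the STEM group. Both assumptions are inferred from the table itself, not taken from the OpenCompass documentation. Under them, the computation reproduces the reported ceval-stem score.

```python
from statistics import fmean

# Accuracies of the first 20 rows of the table above, in order
# (assumed here to be the subjects behind the ceval-stem summary row).
stem_scores = [
    31.58, 36.84, 28.57, 32.43, 26.32, 16.67, 21.05, 38.89, 18.75, 35.14,
    50, 22.22, 31.58, 15.79, 36.84, 26.32, 61.9, 63.16, 60, 47.83,
]


def naive_average(scores):
    """Unweighted mean of per-subject scores, rounded to two decimals."""
    return round(fmean(scores), 2)


print(naive_average(stem_scores))  # 35.09, matching the ceval-stem row
```

Because every subject counts equally regardless of how many questions it contains, a small subject such as ceval-metrology_engineer moves the average as much as a large one.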
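II. Advanced Assignment
Use the TurboMind API to evaluate an LMDeploy-deployed InternLM-Chat-7B on C-Eval.
Following the OpenCompass tutorial on evaluating LMDeploy models, first modify the configuration file configs/eval_internlm_chat_turbomind_api.py: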
from mmengine.config import read_base
from opencompass.models.turbomind_api import TurboMindAPIModel
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLEvalTask, OpenICLInferTask
# with read_base():
#     # choose a list of datasets
#     from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
#     from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
#     from .datasets.SuperGLUE_WiC.SuperGLUE_WiC_gen_d06864 import WiC_datasets
#     from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
#     from .datasets.gsm8k.gsm8k_gen_1d7fe4 import gsm8k_datasets
#     from .datasets.humaneval.humaneval_gen_8e312c import humaneval_datasets
#     from .datasets.race.race_gen_69ee4f import race_datasets
#     from .datasets.crowspairs.crowspairs_gen_381af0 import crowspairs_datasets
#     # and output the results in a chosen format
#     from .summarizers.medium import summarizer
# datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])
with read_base():
    # choose a list of datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets

datasets = [*ceval_datasets]

_meta_template = dict(
    round=[
        dict(role='HUMAN', begin='[UNUSED_TOKEN_146]user\n', end='[UNUSED_TOKEN_145]\n'),
        dict(role='SYSTEM', begin='[UNUSED_TOKEN_146]system\n', end='[UNUSED_TOKEN_145]\n'),
        dict(role='BOT', begin='[UNUSED_TOKEN_146]assistant\n', end='[UNUSED_TOKEN_145]\n', generate=True),
    ],
    eos_token_id=92542
)

models = [
    dict(
        type=TurboMindAPIModel,
        abbr='internlm-chat-7b-turbomind',
        path="/root/LLM/internlm-chat-7b",
        api_addr='http://0.0.0.0:23333',
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        meta_template=_meta_template,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=2,
        task=dict(type=OpenICLInferTask)),
)

eval = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=2,
        task=dict(type=OpenICLEvalTask)),
)
Next, convert InternLM-Chat-7B to the TurboMind format and deploy it as an API service:
lmdeploy convert internlm-chat-7b /root/LLM/internlm-chat-7b/
lmdeploy serve api_server ./workspace --server-name 0.0.0.0 --server-port 23333 --max-batch-size 64 --tp 1
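Before starting the evaluation, it can help to confirm that the API service is accepting connections. The following is a minimal sketch using only the Python standard library; wait_for_server is a hypothetical helper name, and the check only verifies that the TCP port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_server(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == '__main__':
    # 23333 is the port passed to --server-port above.
    ready = wait_for_server('127.0.0.1', 23333, timeout=3.0)
    print('api_server reachable' if ready else 'api_server not reachable')
```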
Then run the evaluation:
python run.py configs/eval_internlm_chat_turbomind_api.py -w outputs/turbomind/internlm-chat-7b
Evaluation in progress:
Evaluation results:
20240122_090838
tabulate format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset version metric mode internlm-chat-7b-turbomind
---------------------------------------------- --------- -------- ------ ----------------------------
ceval-computer_network db9ce2 accuracy gen 36.84
ceval-operating_system 1c2571 accuracy gen 36.84
ceval-computer_architecture a74dad accuracy gen 47.62
ceval-college_programming 4ca32a accuracy gen 45.95
ceval-college_physics 963fa8 accuracy gen 31.58
ceval-college_chemistry e78857 accuracy gen 29.17
ceval-advanced_mathematics ce03e2 accuracy gen 36.84
ceval-probability_and_statistics 65e812 accuracy gen 22.22
ceval-discrete_mathematics e894ae accuracy gen 18.75
ceval-electrical_engineer ae42b9 accuracy gen 37.84
ceval-metrology_engineer ee34ea accuracy gen 41.67
ceval-high_school_mathematics 1dc5bf accuracy gen 27.78
ceval-high_school_physics adf25f accuracy gen 42.11
ceval-high_school_chemistry 2ed27f accuracy gen 52.63
ceval-high_school_biology 8e2b9a accuracy gen 42.11
ceval-middle_school_mathematics bee8d5 accuracy gen 36.84
ceval-middle_school_biology 86817c accuracy gen 66.67
ceval-middle_school_physics 8accf6 accuracy gen 68.42
ceval-middle_school_chemistry 167a15 accuracy gen 75
ceval-veterinary_medicine b4e08d accuracy gen 39.13
ceval-college_economics f3f4e6 accuracy gen 50.91
ceval-business_administration c1614e accuracy gen 39.39
ceval-marxism cf874c accuracy gen 63.16
ceval-mao_zedong_thought 51c7a4 accuracy gen 70.83
ceval-education_science 591fee accuracy gen 58.62
ceval-teacher_qualification 4e4ced accuracy gen 72.73
ceval-high_school_politics 5c0de2 accuracy gen 94.74
ceval-high_school_geography 865461 accuracy gen 52.63
ceval-middle_school_politics 5be3e7 accuracy gen 71.43
ceval-middle_school_geography 8a63be accuracy gen 75
ceval-modern_chinese_history fc01af accuracy gen 82.61
ceval-ideological_and_moral_cultivation a2aa4a accuracy gen 68.42
ceval-logic f5b022 accuracy gen 36.36
ceval-law a110a1 accuracy gen 20.83
ceval-chinese_language_and_literature 0f8b68 accuracy gen 34.78
ceval-art_studies 2a1300 accuracy gen 60.61
ceval-professional_tour_guide 4e673e accuracy gen 58.62
ceval-legal_professional ce8787 accuracy gen 39.13
ceval-high_school_chinese 315705 accuracy gen 52.63
ceval-high_school_history 7eb30a accuracy gen 70
ceval-middle_school_history 48ab4a accuracy gen 77.27
ceval-civil_servant 87d061 accuracy gen 57.45
ceval-sports_science 70f27b accuracy gen 63.16
ceval-plant_protection 8941f9 accuracy gen 50
ceval-basic_medicine c409d6 accuracy gen 42.11
ceval-clinical_medicine 49e82d accuracy gen 50
ceval-urban_and_rural_planner 95b885 accuracy gen 43.48
ceval-accountant 002837 accuracy gen 34.69
ceval-fire_engineer bc23f5 accuracy gen 38.71
ceval-environmental_impact_assessment_engineer c64e2d accuracy gen 51.61
ceval-tax_accountant 3a5e3c accuracy gen 38.78
ceval-physician 6e277d accuracy gen 48.98
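The two runs can be compared subject by subject. The sketch below uses a handful of rows copied from the two tables above and lists the per-subject difference (TurboMind API minus HuggingFace), largest gain first. The deltas helper is illustrative only, and the selection of rows is arbitrary.

```python
# A few rows copied from the two result tables above:
# subject -> (HuggingFace accuracy, TurboMind API accuracy)
scores = {
    'ceval-high_school_politics': (26.32, 94.74),
    'ceval-high_school_chemistry': (15.79, 52.63),
    'ceval-discrete_mathematics': (18.75, 18.75),
    'ceval-probability_and_statistics': (38.89, 22.22),
    'ceval-environmental_impact_assessment_engineer': (64.52, 51.61),
}


def deltas(table):
    """Return (subject, turbomind - huggingface) pairs, largest gain first."""
    diffs = [(name, round(tm - hf, 2)) for name, (hf, tm) in table.items()]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


for name, diff in deltas(scores):
    print(f'{name:50s} {diff:+.2f}')
```

Extending the dictionary to all 52 subjects gives a complete picture of where the two runs diverge.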