Large Model Study Notes, InternLM (书生·浦语) Part 6: LLM Evaluation with OpenCompass


LLM Evaluation with OpenCompass

Three questions about evaluation: Why / What / How


Why


What

Evaluation covers many tasks, including vertical (domain-specific) areas.

How


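Evaluation is either objective or subjective; subjective evaluation is further split into evaluation by humans and evaluation by a model acting as the judge.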
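For objective evaluation in generation mode (the gen mode in the results below), the model's free-text answer has to be reduced to an option letter before it can be compared with the reference. The sketch below illustrates that idea with a deliberately simple rule, taking the first character A to D in the output. It is an assumed simplification, not OpenCompass's actual post-processor, and the function names are hypothetical.

```python
import re
from typing import Optional, Sequence


def extract_choice(generation: str) -> Optional[str]:
    """Hypothetical helper: return the first character A-D in the output, or None.

    Simplified rule for illustration; it misfires on outputs such as
    "Answer: B", where the "A" of "Answer" comes first.
    """
    match = re.search(r"[A-D]", generation)
    return match.group(0) if match else None


def accuracy(generations: Sequence[str], references: Sequence[str]) -> float:
    """Hypothetical helper: percentage (0-100, as in the results table) of correct items."""
    if len(generations) != len(references):
        raise ValueError("generations and references must have the same length")
    if not references:
        return 0.0
    correct = sum(extract_choice(g) == r for g, r in zip(generations, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Two of three answers match the references.
    print(accuracy(["B", "答案是C", "I think A"], ["B", "C", "D"]))
```

The weakness noted in the docstring is exactly why answer extraction rules matter: the same generations can score differently under a different post-processing rule.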

Prompt engineering
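Prompt construction matters in evaluation because the same question can be scored differently depending on how it is presented, for example zero-shot versus few-shot. A minimal sketch of building both kinds of multiple-choice prompt follows; the template wording and the field names (question, A to D, answer) are assumptions for illustration, not the templates shipped with OpenCompass.

```python
from typing import Dict, List

# Hypothetical template; real benchmark configs define their own wording.
TEMPLATE = "{question}\nA. {A}\nB. {B}\nC. {C}\nD. {D}\nAnswer: "


def format_item(item: Dict[str, str], with_answer: bool = False) -> str:
    """Render one item; solved in-context examples also get their gold answer."""
    text = TEMPLATE.format(**item)  # str.format ignores the unused "answer" key
    return (text + item["answer"]) if with_answer else text


def build_prompt(item: Dict[str, str], shots: List[Dict[str, str]]) -> str:
    """Zero-shot when shots is empty, otherwise few-shot with solved examples first."""
    parts = [format_item(shot, with_answer=True) for shot in shots]
    parts.append(format_item(item))
    return "\n\n".join(parts)
```

Passing an empty list gives a zero-shot prompt; passing solved items prepends them as demonstrations, which is the usual few-shot setup.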


Mainstream evaluation frameworks


OpenCompass capability framework


  • Model layer
  • Capability layer
  • Method layer
  • Tool layer

Supports a wide range of models

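The evaluation pipeline splits the work into multiple tasks that run independently, making full use of the available compute resources.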
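The idea of splitting an evaluation into independent tasks can be sketched as a simple size-based partitioner: each dataset is cut into contiguous chunks of at most max_size samples, and every chunk becomes a task that could run on its own GPU or process. This is a hypothetical illustration of the concept; OpenCompass ships its own partitioners, whose actual logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical unit of work: one slice of one dataset."""
    dataset: str
    start: int  # inclusive sample index
    end: int    # exclusive sample index


def partition(dataset_sizes: Dict[str, int], max_size: int) -> List[Task]:
    """Split every dataset into contiguous chunks of at most max_size samples."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    tasks: List[Task] = []
    for name, size in dataset_sizes.items():
        for start in range(0, size, max_size):
            tasks.append(Task(name, start, min(start + max_size, size)))
    return tasks
```

For example, partition({"small": 5, "large": 23}, max_size=10) yields four tasks: one for the small dataset and three for the large one. Because the tasks share no state, each could be handed to a separate worker, which is what lets such a pipeline keep all devices busy.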
Output of LLM capability comparison results

Frontier exploration

Exploratory directions include:

  • Multimodal
  • Law
  • Medicine

Challenges


Practice

Creating the development environment and preparing the datasets

Viewing the supported datasets:

Launching the evaluation

Objective evaluation

The main entry point is the run.py script, which takes the following arguments:

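  • --datasets: the datasets to evaluate
  • --hf-path: path to the model files (HuggingFace format)
  • --tokenizer-path: path to the tokenizer
  • --max-seq-len: maximum input length the model reads
  • --max-out-len: maximum output length; usually set small for objective questions
  • --debug: debug mode, prints the full process output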
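The arguments above can be assembled into a single command line. The sketch below only builds the argument list and does not launch anything; the model path is a hypothetical placeholder, ceval_gen is assumed as the dataset config name, and the default lengths are illustrative values, with max_out_len kept small as suggested for objective questions.

```python
import shlex
from typing import List, Sequence


def build_run_cmd(
    datasets: Sequence[str],
    model_path: str,
    max_seq_len: int = 2048,  # illustrative value
    max_out_len: int = 16,    # kept small for objective (multiple-choice) questions
    debug: bool = True,
) -> List[str]:
    """Hypothetical helper: assemble a run.py invocation from the flags listed above."""
    cmd = ["python", "run.py", "--datasets", *datasets,
           "--hf-path", model_path,
           "--tokenizer-path", model_path,  # assumes the tokenizer sits next to the weights
           "--max-seq-len", str(max_seq_len),
           "--max-out-len", str(max_out_len)]
    if debug:
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    # "ceval_gen" and the model directory are assumed placeholders.
    print(shlex.join(build_run_cmd(["ceval_gen"], "/path/to/internlm2-chat-7b")))
```

Passing the returned list to subprocess.run(cmd, check=True) from the OpenCompass repository root would then start the run; building the list first makes it easy to check every flag before launching.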
Subjective evaluation

This mainly involves modifying the eval_subjective_alignbench.py config file; take care to adjust settings such as the model and max_out_len.
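In model-based subjective evaluation, a judge model writes a critique that ends with a score, and the scores are then aggregated. The sketch below assumes a hypothetical judge output convention of the form "Rating: [[8]]" and averages the scores per category; AlignBench's real judge prompt and parsing rules are defined in its OpenCompass config and may differ.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

# Assumed judge output convention, e.g. "... Rating: [[7]]"
SCORE_PATTERN = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def parse_score(judgement: str) -> Optional[float]:
    """Hypothetical helper: return the last [[score]] in a judge reply, or None."""
    found = SCORE_PATTERN.findall(judgement)
    return float(found[-1]) if found else None


def average_by_category(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """records are (category, judge reply) pairs; unparsable replies are skipped."""
    buckets = defaultdict(list)
    for category, judgement in records:
        score = parse_score(judgement)
        if score is not None:
            buckets[category].append(score)
    return {category: mean(scores) for category, scores in buckets.items()}
```

Skipping unparsable replies keeps one malformed judgement from breaking the run, but in practice it is worth counting how many were dropped, since a judge that often ignores the format biases the averages.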

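The model and dataset configs whose names match internlm and ceval can be listed with:

python tools/list_configs.py internlm ceval

Final results (the summary produced at the end of the run, in tabulate, csv and raw formats):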
20240122_153109
tabulate format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_model_repos_internlm2-chat-7b
----------------------------------------------  ---------  -------------  ------  --------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                          47.37
ceval-operating_system                          1c2571     accuracy       gen                                                                          57.89
ceval-computer_architecture                     a74dad     accuracy       gen                                                                          38.1
ceval-college_programming                       4ca32a     accuracy       gen                                                                          18.92
ceval-college_physics                           963fa8     accuracy       gen                                                                           5.26
ceval-college_chemistry                         e78857     accuracy       gen                                                                           0
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                           0
ceval-probability_and_statistics                65e812     accuracy       gen                                                                          11.11
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                          18.75
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                          18.92
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                          50
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                           0
ceval-high_school_physics                       adf25f     accuracy       gen                                                                          31.58
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                          26.32
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                          26.32
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                          21.05
ceval-middle_school_biology                     86817c     accuracy       gen                                                                          66.67
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                          52.63
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                          80
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                          39.13
ceval-college_economics                         f3f4e6     accuracy       gen                                                                          29.09
ceval-business_administration                   c1614e     accuracy       gen                                                                          30.3
ceval-marxism                                   cf874c     accuracy       gen                                                                          84.21
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                          70.83
ceval-education_science                         591fee     accuracy       gen                                                                          62.07
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                          77.27
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                          21.05
ceval-high_school_geography                     865461     accuracy       gen                                                                          47.37
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                          38.1
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                          58.33
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                          65.22
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                          89.47
ceval-logic                                     f5b022     accuracy       gen                                                                          13.64
ceval-law                                       a110a1     accuracy       gen                                                                          37.5
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                          47.83
ceval-art_studies                               2a1300     accuracy       gen                                                                          66.67
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                          82.76
ceval-legal_professional                        ce8787     accuracy       gen                                                                          30.43
ceval-high_school_chinese                       315705     accuracy       gen                                                                          21.05
ceval-high_school_history                       7eb30a     accuracy       gen                                                                          75
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                          68.18
ceval-civil_servant                             87d061     accuracy       gen                                                                          38.3
ceval-sports_science                            70f27b     accuracy       gen                                                                          63.16
ceval-plant_protection                          8941f9     accuracy       gen                                                                          68.18
ceval-basic_medicine                            c409d6     accuracy       gen                                                                          57.89
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                          45.45
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                          58.7
ceval-accountant                                002837     accuracy       gen                                                                          34.69
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                          12.9
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                          38.71
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                          42.86
ceval-physician                                 6e277d     accuracy       gen                                                                          51.02
ceval-stem                                      -          naive_average  gen                                                                          30.5
ceval-social-science                            -          naive_average  gen                                                                          51.86
ceval-humanities                                -          naive_average  gen                                                                          54.34
ceval-other                                     -          naive_average  gen                                                                          46.53
ceval-hard                                      -          naive_average  gen                                                                          11.63
ceval                                           -          naive_average  gen                                                                          43.04
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------

csv format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset,version,metric,mode,opencompass.models.huggingface.HuggingFace_model_repos_internlm2-chat-7b
ceval-computer_network,db9ce2,accuracy,gen,47.37
ceval-operating_system,1c2571,accuracy,gen,57.89
ceval-computer_architecture,a74dad,accuracy,gen,38.10
ceval-college_programming,4ca32a,accuracy,gen,18.92
ceval-college_physics,963fa8,accuracy,gen,5.26
ceval-college_chemistry,e78857,accuracy,gen,0.00
ceval-advanced_mathematics,ce03e2,accuracy,gen,0.00
ceval-probability_and_statistics,65e812,accuracy,gen,11.11
ceval-discrete_mathematics,e894ae,accuracy,gen,18.75
ceval-electrical_engineer,ae42b9,accuracy,gen,18.92
ceval-metrology_engineer,ee34ea,accuracy,gen,50.00
ceval-high_school_mathematics,1dc5bf,accuracy,gen,0.00
ceval-high_school_physics,adf25f,accuracy,gen,31.58
ceval-high_school_chemistry,2ed27f,accuracy,gen,26.32
ceval-high_school_biology,8e2b9a,accuracy,gen,26.32
ceval-middle_school_mathematics,bee8d5,accuracy,gen,21.05
ceval-middle_school_biology,86817c,accuracy,gen,66.67
ceval-middle_school_physics,8accf6,accuracy,gen,52.63
ceval-middle_school_chemistry,167a15,accuracy,gen,80.00
ceval-veterinary_medicine,b4e08d,accuracy,gen,39.13
ceval-college_economics,f3f4e6,accuracy,gen,29.09
ceval-business_administration,c1614e,accuracy,gen,30.30
ceval-marxism,cf874c,accuracy,gen,84.21
ceval-mao_zedong_thought,51c7a4,accuracy,gen,70.83
ceval-education_science,591fee,accuracy,gen,62.07
ceval-teacher_qualification,4e4ced,accuracy,gen,77.27
ceval-high_school_politics,5c0de2,accuracy,gen,21.05
ceval-high_school_geography,865461,accuracy,gen,47.37
ceval-middle_school_politics,5be3e7,accuracy,gen,38.10
ceval-middle_school_geography,8a63be,accuracy,gen,58.33
ceval-modern_chinese_history,fc01af,accuracy,gen,65.22
ceval-ideological_and_moral_cultivation,a2aa4a,accuracy,gen,89.47
ceval-logic,f5b022,accuracy,gen,13.64
ceval-law,a110a1,accuracy,gen,37.50
ceval-chinese_language_and_literature,0f8b68,accuracy,gen,47.83
ceval-art_studies,2a1300,accuracy,gen,66.67
ceval-professional_tour_guide,4e673e,accuracy,gen,82.76
ceval-legal_professional,ce8787,accuracy,gen,30.43
ceval-high_school_chinese,315705,accuracy,gen,21.05
ceval-high_school_history,7eb30a,accuracy,gen,75.00
ceval-middle_school_history,48ab4a,accuracy,gen,68.18
ceval-civil_servant,87d061,accuracy,gen,38.30
ceval-sports_science,70f27b,accuracy,gen,63.16
ceval-plant_protection,8941f9,accuracy,gen,68.18
ceval-basic_medicine,c409d6,accuracy,gen,57.89
ceval-clinical_medicine,49e82d,accuracy,gen,45.45
ceval-urban_and_rural_planner,95b885,accuracy,gen,58.70
ceval-accountant,002837,accuracy,gen,34.69
ceval-fire_engineer,bc23f5,accuracy,gen,12.90
ceval-environmental_impact_assessment_engineer,c64e2d,accuracy,gen,38.71
ceval-tax_accountant,3a5e3c,accuracy,gen,42.86
ceval-physician,6e277d,accuracy,gen,51.02
ceval-stem,-,naive_average,gen,30.50
ceval-social-science,-,naive_average,gen,51.86
ceval-humanities,-,naive_average,gen,54.34
ceval-other,-,naive_average,gen,46.53
ceval-hard,-,naive_average,gen,11.63
ceval,-,naive_average,gen,43.04
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------

raw format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------------
Model: opencompass.models.huggingface.HuggingFace_model_repos_internlm2-chat-7b
ceval-computer_network: {'accuracy': 47.368421052631575}
ceval-operating_system: {'accuracy': 57.89473684210527}
ceval-computer_architecture: {'accuracy': 38.095238095238095}
ceval-college_programming: {'accuracy': 18.91891891891892}
ceval-college_physics: {'accuracy': 5.263157894736842}
ceval-college_chemistry: {'accuracy': 0.0}
ceval-advanced_mathematics: {'accuracy': 0.0}
ceval-probability_and_statistics: {'accuracy': 11.11111111111111}
ceval-discrete_mathematics: {'accuracy': 18.75}
ceval-electrical_engineer: {'accuracy': 18.91891891891892}
ceval-metrology_engineer: {'accuracy': 50.0}
ceval-high_school_mathematics: {'accuracy': 0.0}
ceval-high_school_physics: {'accuracy': 31.57894736842105}
ceval-high_school_chemistry: {'accuracy': 26.31578947368421}
ceval-high_school_biology: {'accuracy': 26.31578947368421}
ceval-middle_school_mathematics: {'accuracy': 21.052631578947366}
ceval-middle_school_biology: {'accuracy': 66.66666666666666}
ceval-middle_school_physics: {'accuracy': 52.63157894736842}
ceval-middle_school_chemistry: {'accuracy': 80.0}
ceval-veterinary_medicine: {'accuracy': 39.130434782608695}
ceval-college_economics: {'accuracy': 29.09090909090909}
ceval-business_administration: {'accuracy': 30.303030303030305}
ceval-marxism: {'accuracy': 84.21052631578947}
ceval-mao_zedong_thought: {'accuracy': 70.83333333333334}
ceval-education_science: {'accuracy': 62.06896551724138}
ceval-teacher_qualification: {'accuracy': 77.27272727272727}
ceval-high_school_politics: {'accuracy': 21.052631578947366}
ceval-high_school_geography: {'accuracy': 47.368421052631575}
ceval-middle_school_politics: {'accuracy': 38.095238095238095}
ceval-middle_school_geography: {'accuracy': 58.333333333333336}
ceval-modern_chinese_history: {'accuracy': 65.21739130434783}
ceval-ideological_and_moral_cultivation: {'accuracy': 89.47368421052632}
ceval-logic: {'accuracy': 13.636363636363635}
ceval-law: {'accuracy': 37.5}
ceval-chinese_language_and_literature: {'accuracy': 47.82608695652174}
ceval-art_studies: {'accuracy': 66.66666666666666}
ceval-professional_tour_guide: {'accuracy': 82.75862068965517}
ceval-legal_professional: {'accuracy': 30.434782608695656}
ceval-high_school_chinese: {'accuracy': 21.052631578947366}
ceval-high_school_history: {'accuracy': 75.0}
ceval-middle_school_history: {'accuracy': 68.18181818181817}
ceval-civil_servant: {'accuracy': 38.297872340425535}
ceval-sports_science: {'accuracy': 63.1578947368421}
ceval-plant_protection: {'accuracy': 68.18181818181817}
ceval-basic_medicine: {'accuracy': 57.89473684210527}
ceval-clinical_medicine: {'accuracy': 45.45454545454545}
ceval-urban_and_rural_planner: {'accuracy': 58.69565217391305}
ceval-accountant: {'accuracy': 34.69387755102041}
ceval-fire_engineer: {'accuracy': 12.903225806451612}
ceval-environmental_impact_assessment_engineer: {'accuracy': 38.70967741935484}
ceval-tax_accountant: {'accuracy': 42.857142857142854}
ceval-physician: {'accuracy': 51.02040816326531}
ceval-stem: {'ceval-computer_network': 47.368421052631575, 'ceval-operating_system': 57.89473684210527, 'ceval-computer_architecture': 38.095238095238095, 'ceval-college_programming': 18.91891891891892, 'ceval-college_physics': 5.263157894736842, 'ceval-college_chemistry': 0.0, 'ceval-advanced_mathematics': 0.0, 'ceval-probability_and_statistics': 11.11111111111111, 'ceval-discrete_mathematics': 18.75, 'ceval-electrical_engineer': 18.91891891891892, 'ceval-metrology_engineer': 50.0, 'ceval-high_school_mathematics': 0.0, 'ceval-high_school_physics': 31.57894736842105, 'ceval-high_school_chemistry': 26.31578947368421, 'ceval-high_school_biology': 26.31578947368421, 'ceval-middle_school_mathematics': 21.052631578947366, 'ceval-middle_school_biology': 66.66666666666666, 'ceval-middle_school_physics': 52.63157894736842, 'ceval-middle_school_chemistry': 80.0, 'ceval-veterinary_medicine': 39.130434782608695, 'naive_average': 30.50061705625207}
ceval-social-science: {'ceval-college_economics': 29.09090909090909, 'ceval-business_administration': 30.303030303030305, 'ceval-marxism': 84.21052631578947, 'ceval-mao_zedong_thought': 70.83333333333334, 'ceval-education_science': 62.06896551724138, 'ceval-teacher_qualification': 77.27272727272727, 'ceval-high_school_politics': 21.052631578947366, 'ceval-high_school_geography': 47.368421052631575, 'ceval-middle_school_politics': 38.095238095238095, 'ceval-middle_school_geography': 58.333333333333336, 'naive_average': 51.86291158931812}
ceval-humanities: {'ceval-modern_chinese_history': 65.21739130434783, 'ceval-ideological_and_moral_cultivation': 89.47368421052632, 'ceval-logic': 13.636363636363635, 'ceval-law': 37.5, 'ceval-chinese_language_and_literature': 47.82608695652174, 'ceval-art_studies': 66.66666666666666, 'ceval-professional_tour_guide': 82.75862068965517, 'ceval-legal_professional': 30.434782608695656, 'ceval-high_school_chinese': 21.052631578947366, 'ceval-high_school_history': 75.0, 'ceval-middle_school_history': 68.18181818181817, 'naive_average': 54.340731439412956}
ceval-other: {'ceval-civil_servant': 38.297872340425535, 'ceval-sports_science': 63.1578947368421, 'ceval-plant_protection': 68.18181818181817, 'ceval-basic_medicine': 57.89473684210527, 'ceval-clinical_medicine': 45.45454545454545, 'ceval-urban_and_rural_planner': 58.69565217391305, 'ceval-accountant': 34.69387755102041, 'ceval-fire_engineer': 12.903225806451612, 'ceval-environmental_impact_assessment_engineer': 38.70967741935484, 'ceval-tax_accountant': 42.857142857142854, 'ceval-physician': 51.02040816326531, 'naive_average': 46.533350138807684}
ceval-hard: {'ceval-advanced_mathematics': 0.0, 'ceval-discrete_mathematics': 18.75, 'ceval-probability_and_statistics': 11.11111111111111, 'ceval-college_chemistry': 0.0, 'ceval-college_physics': 5.263157894736842, 'ceval-high_school_mathematics': 0.0, 'ceval-high_school_chemistry': 26.31578947368421, 'ceval-high_school_physics': 31.57894736842105, 'naive_average': 11.627375730994151}
ceval: {'ceval-computer_network': 47.368421052631575, 'ceval-operating_system': 57.89473684210527, 'ceval-computer_architecture': 38.095238095238095, 'ceval-college_programming': 18.91891891891892, 'ceval-college_physics': 5.263157894736842, 'ceval-college_chemistry': 0.0, 'ceval-advanced_mathematics': 0.0, 'ceval-probability_and_statistics': 11.11111111111111, 'ceval-discrete_mathematics': 18.75, 'ceval-electrical_engineer': 18.91891891891892, 'ceval-metrology_engineer': 50.0, 'ceval-high_school_mathematics': 0.0, 'ceval-high_school_physics': 31.57894736842105, 'ceval-high_school_chemistry': 26.31578947368421, 'ceval-high_school_biology': 26.31578947368421, 'ceval-middle_school_mathematics': 21.052631578947366, 'ceval-middle_school_biology': 66.66666666666666, 'ceval-middle_school_physics': 52.63157894736842, 'ceval-middle_school_chemistry': 80.0, 'ceval-veterinary_medicine': 39.130434782608695, 'ceval-college_economics': 29.09090909090909, 'ceval-business_administration': 30.303030303030305, 'ceval-marxism': 84.21052631578947, 'ceval-mao_zedong_thought': 70.83333333333334, 'ceval-education_science': 62.06896551724138, 'ceval-teacher_qualification': 77.27272727272727, 'ceval-high_school_politics': 21.052631578947366, 'ceval-high_school_geography': 47.368421052631575, 'ceval-middle_school_politics': 38.095238095238095, 'ceval-middle_school_geography': 58.333333333333336, 'ceval-modern_chinese_history': 65.21739130434783, 'ceval-ideological_and_moral_cultivation': 89.47368421052632, 'ceval-logic': 13.636363636363635, 'ceval-law': 37.5, 'ceval-chinese_language_and_literature': 47.82608695652174, 'ceval-art_studies': 66.66666666666666, 'ceval-professional_tour_guide': 82.75862068965517, 'ceval-legal_professional': 30.434782608695656, 'ceval-high_school_chinese': 21.052631578947366, 'ceval-high_school_history': 75.0, 'ceval-middle_school_history': 68.18181818181817, 'ceval-civil_servant': 38.297872340425535, 'ceval-sports_science': 63.1578947368421, 'ceval-plant_protection': 68.18181818181817, 'ceval-basic_medicine': 57.89473684210527, 'ceval-clinical_medicine': 45.45454545454545, 'ceval-urban_and_rural_planner': 58.69565217391305, 'ceval-accountant': 34.69387755102041, 'ceval-fire_engineer': 12.903225806451612, 'ceval-environmental_impact_assessment_engineer': 38.70967741935484, 'ceval-tax_accountant': 42.857142857142854, 'ceval-physician': 51.02040816326531, 'naive_average': 43.043391430358646}
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
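The naive_average rows in the summary are unweighted means of the per-subject accuracies. For example, ceval-hard (11.63) is the plain mean of its eight subjects, and the overall ceval score (43.04) is the mean over all 52 subjects, not the mean of the four category averages (which would be about 45.81). The sketch below reads rows in the csv format above and recomputes such an average; the helper names are hypothetical, and because the csv values are rounded to two decimals the recomputed value can differ slightly from the raw one.

```python
import csv
import io
from statistics import mean
from typing import Dict, Iterable

# Rows copied from the csv format output above: the ceval-hard subjects and their average.
SAMPLE_CSV = """\
dataset,version,metric,mode,opencompass.models.huggingface.HuggingFace_model_repos_internlm2-chat-7b
ceval-college_physics,963fa8,accuracy,gen,5.26
ceval-college_chemistry,e78857,accuracy,gen,0.00
ceval-advanced_mathematics,ce03e2,accuracy,gen,0.00
ceval-probability_and_statistics,65e812,accuracy,gen,11.11
ceval-discrete_mathematics,e894ae,accuracy,gen,18.75
ceval-high_school_mathematics,1dc5bf,accuracy,gen,0.00
ceval-high_school_physics,adf25f,accuracy,gen,31.58
ceval-high_school_chemistry,2ed27f,accuracy,gen,26.32
ceval-hard,-,naive_average,gen,11.63
"""

# Subjects that make up ceval-hard, as listed in the raw format output above.
CEVAL_HARD = [
    "ceval-advanced_mathematics", "ceval-discrete_mathematics",
    "ceval-probability_and_statistics", "ceval-college_chemistry",
    "ceval-college_physics", "ceval-high_school_mathematics",
    "ceval-high_school_chemistry", "ceval-high_school_physics",
]


def load_scores(csv_text: str) -> Dict[str, float]:
    """Hypothetical helper: map dataset name to the score in the last csv column."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    next(reader)  # skip header: dataset,version,metric,mode,<model>
    return {row[0]: float(row[-1]) for row in reader if row}


def naive_average(scores: Dict[str, float], subsets: Iterable[str]) -> float:
    """Hypothetical helper: unweighted mean of the given subsets' scores."""
    return mean(scores[name] for name in subsets)


if __name__ == "__main__":
    scores = load_scores(SAMPLE_CSV)
    print(f"{naive_average(scores, CEVAL_HARD):.4f} vs reported {scores['ceval-hard']}")
```

Because every subject counts equally regardless of how many questions it has, a few very small, very hard subjects can pull ceval-hard down sharply; that is worth keeping in mind when comparing these averages across models.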

Author: uncle_ll

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值