Latest Evaluation Results for Chinese Large Language Models

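C-Eval is a multi-level, multi-discipline Chinese evaluation suite that tests foundation models across STEM, social science, humanities, and other fields. Its leaderboard shows results for different models under zero-shot and few-shot settings. Results submitted by companies and research teams show that performance varies across subject areas, and that few-shot prompting is not necessarily better than zero-shot. Anyone may submit their own model's test results at any time.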

Source: Leaderboard | C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Leaderboard - C-Eval

Results for each subject category and the overall average are shown below. Each result comes from either zero-shot or few-shot prompting. Note that few-shot is not necessarily better than zero-shot; for example, zero-shot works better for many instruction-tuned models in our own runs. Where we tested a model in both zero-shot and few-shot settings, we report the setting with the higher overall average accuracy. (Model details, including the prompting format, can be viewed by clicking into each model on the leaderboard.)
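The reporting rule described above (keep whichever setting has the higher overall average) can be sketched in a few lines of Python. This is a minimal illustration, not the C-Eval team's code: the function name `pick_reported_setting`, the input layout, and the example numbers are all hypothetical. It also assumes the overall average is an unweighted mean of the four category scores, which may not match how the leaderboard computes its Avg column.

```python
from statistics import mean


def pick_reported_setting(results):
    """Return (setting_name, average) for the setting with the highest average.

    `results` maps a setting name (e.g. "zero-shot") to a dict of
    per-category accuracies. The unweighted mean is an assumption made
    for this sketch; the leaderboard's own averaging may differ.
    """
    averages = {name: mean(scores.values()) for name, scores in results.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Hypothetical numbers for illustration only; not taken from the leaderboard.
example = {
    "zero-shot": {"STEM": 50.0, "Social Science": 60.0, "Humanities": 55.0, "Others": 51.0},
    "few-shot": {"STEM": 48.0, "Social Science": 58.0, "Humanities": 54.0, "Others": 52.0},
}

setting, avg = pick_reported_setting(example)
print(setting, round(avg, 1))  # zero-shot 54.0
```

With these made-up scores the zero-shot run would be the one reported, since its mean (54.0) exceeds the few-shot mean (53.0).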

You are welcome to submit your model's test results to C-Eval at any time (either zero-shot or few-shot evaluation is fine). Submitted results will not be made public on the leaderboard unless you request it.

(Note: * indicates that the model was evaluated by the C-Eval team, while other results are obtained through user submissions.)

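| # | Model | Creator | Submission Date | Avg | Avg (Hard) | STEM | Social Science | Humanities | Others |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ChatGLM2 | Tsinghua & Zhipu.AI | 2023/6/25 | 71.1 | 50 | 64.4 | 81.6 | 73.7 | 71.3 |
| 1 | GPT-4* | OpenAI | 2023/5/15 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |
| 2 | SenseChat | SenseTime | 2023/6/20 | 66.1 | 45.1 | 58 | 78.4 | 67.2 | 68.8 |
| 3 | AiLMe-100B v1 | APUS | 2023/7/19 | 65.2 | 55.3 | 65.4 | 72.3 | 62.4 | 61.1 |
| 4 | InternLM | SenseTime & Shanghai AI Laboratory (equal contribution) | 2023/6/1 | 62.7 | 46 | 58.1 | 76.7 | 64.6 | 56.4 |
| 5 | Instruct-DLM-v2 | DeepLang AI | 2023/7/2 | 56.8 | 37.4 | 50.3 | 71.1 | 59.1 | 53.4 |
| 6 | DFM2.0 | AISpeech & SJTU | 2023/7/10 | 55.4 | 38.3 | 47.5 | 64.6 | 58.7 | 58.2 |
| 7 | ChatGPT* | OpenAI | 2023/5/15 | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
| 8 | Claude-v1.3* | Anthropic | 2023/5/15 | 54.2 | 39 | 51.9 | 61.7 | 52.1 | 53.7 |
| 9 | TeleChat-E | China Telecom Corporation Ltd. | 2023/7/4 | 54.2 | 41.5 | 51.1 | 63.1 | 53.8 | 52.3 |
| 10 | CPM | ModelBest | 2023/7/5 | 54.1 | 37.5 | 47.2 | 62.7 | 58.4 | 54.8 |
| 11 | Baichuan-13B | Baichuan | 2023/7/9 | 53.6 | 36.7 | 47 | 66.8 | 57.3 | 49.8 |
| 12 | DLM-v2 | DeepLang AI | 2023/7/2 | 53.5 | 35.3 | 47 | 64.7 | 56.4 | 52.1 |
| 13 | InternLM-7B | Shanghai AI Laboratory & SenseTime | 2023/7/5 | 52.8 | 37.1 | 48 | 67.4 | 55.4 | 45.8 |
| 14 | ChatGLM2-6B | Tsinghua & Zhipu.AI | 2023/6/24 | 51.7 | 37.1 | 48.6 | 60.5 | 51.3 | 49.8 |
| 15 | EduChat | ECNU | 2023/7/18 | 49.3 | 33.1 | 43.5 | 59.3 | 53.7 | 46.6 |
| 16 | SageGPT | 4Paradigm Inc. | 2023/6/21 | 49.1 | 39.1 | 46.6 | 54.6 | 45.8 | 51.8 |
| 17 | AndesLM-13B | AndesLM | 2023/6/18 | 46 | 29.7 | 38.1 | 61 | 51 | 41.9 |
| 18 | Claude-instant-v1.0* | Anthropic | 2023/5/15 | 45.9 | 35.5 | 43.1 | 53.8 | 44.2 | 45.4 |
| 19 | WestlakeLM-19B | Westlake University and Westlake Xinchen (Scietrain) | 2023/6/18 | 44.6 | 34.9 | 41.6 | 51 | 44.3 | 44.5 |
| 20 | bloomz-mt-176B* | BigScience | 2023/5/15 | 44.3 | 30.8 | 39 | 53 | 47.7 | 42.7 |
| 21 | 玉言 | Fuxi AI Lab, NetEase | 2023/6/20 | 44.3 | 30.6 | 39.2 | 54.5 | 46.4 | 42.2 |
| 22 | GLM-130B* | Tsinghua | 2023/5/15 | 44 | 30.7 | 36.7 | 55.8 | 47.7 | 43 |
| 23 | baichuan-7B | Baichuan | 2023/6/14 | 42.8 | 31.5 | 38.2 | 52 | 46.2 | 39.3 |
| 24 | CubeLM-13B | CubeLM | 2023/6/12 | 42.5 | 27.9 | 36 | 52.4 | 45.8 | 41.8 |
| 25 | Chinese-Alpaca-33B | Cui, Yang, and Yao | 2023/6/7 | 41.6 | 30.3 | 37 | 51.6 | 42.3 | 40.3 |
| 26 | Chinese-Alpaca-Plus-13B | Cui, Yang, and Yao | 2023/6/5 | 41.5 | 30.5 | 36.6 | 49.7 | 43.1 | 41.2 |
| 27 | ChatGLM-6B* | Tsinghua & Zhipu.AI | 2023/5/15 | 38.9 | 29.2 | 33.3 | 48.3 | 41.3 | 38 |
| 28 | LLaMA-65B* | Meta | 2023/5/15 | 38.8 | 31.7 | 37.8 | 45.6 | 36.1 | 37.1 |
| 29 | Chinese LLaMA-13B* | Cui et al. | 2023/5/15 | 33.3 | 27.3 | 31.6 | 37.2 | 33.6 | 32.8 |
| 30 | MOSS* | Fudan | 2023/5/15 | 33.1 | 28.4 | 31.6 | 37 | 33.4 | 32.1 |
| 31 | Chinese Alpaca-13B* | Cui et al. | 2023/5/15 | 30.9 | 24.4 | 27.4 | 39.2 | 32.5 | 28 |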