Llama vs. Qwen vs. DeepSeek, the open-source contest: CLiB open-source LLM leaderboard for reasoning and mathematical computation (03.05)

Article link: https://blog.csdn.net/easyllm/article/details/146514443

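For the overall capability ranking of open-source models, see: "Llama vs. Qwen vs. DeepSeek, the open-source contest: CLiB open-source LLM leaderboard (03.04)".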

The leaderboard for the reasoning and mathematical computation domain is as follows:

Output price unit: CNY per million tokens (元/M tok)

Rank	Model	Organization	Output price	Reasoning and math score
1	DeepSeek-R1	DeepSeek	16	92.75
2	deepseek-chat-v3	DeepSeek	8	92.47
3	DeepSeek-R1-Distill-Qwen-32B	DeepSeek	1.3	90.11
4	DeepSeek-R1-Distill-Qwen-14B	DeepSeek	0.7	89.83
5	phi-4	Microsoft	1	89.80
6	qwen2.5-72b-instruct	Alibaba	12	89.33
7	DeepSeek-R1-Distill-Llama-70B	DeepSeek	4.1	88.82
8	Llama-3.3-70B-Instruct	Meta	4.1	87.42
9	qwq-32b-preview	Alibaba	7	87.41
10	qwen2.5-math-72b-instruct	Alibaba	12	87.03
11	Llama-3.3-70B-Instruct-fp8	Meta	2.2	86.57
12	Hermes-3-Llama-3.1-405B	NousResearch	5.8	85.56
13	Meta-Llama-3.1-405B-Instruct	Meta	21	85.02
14	qwen2.5-32b-instruct	Alibaba	7	84.18
15	qwen2.5-14b-instruct	Alibaba	6	82.62
16	Llama-3.1-Nemotron-70B-Instruct-fp8	NVIDIA	2.2	81.31
17	DeepSeek-R1-Distill-Qwen-7B	DeepSeek	0.4	81.26
18	qwen2.5-7b-instruct	Alibaba	2	80.22
19	DeepSeek-R1-Distill-Llama-8B	DeepSeek	0.4	79.20
20	internlm2_5-20b-chat	Shanghai AI Laboratory	1	77.07
21	Mistral-Nemo-Instruct-2407	Mistral	0.6	75.58
22	Yi-1.5-34B-Chat	01.AI	1.3	75.27
23	internlm2_5-7b-chat	Shanghai AI Laboratory	0.4	74.42
24	glm-4-9b-chat	Zhipu AI	0.6	74.0
25	Llama-3.1-8B-Instruct	Meta	0.4	73.50
26	gemma-2-27b-it	Google	1.3	73.37
27	Meta-Llama-3.1-8B-Instruct-fp8	Meta	0.4	72.66
28	qwen2.5-3b-instruct	Alibaba	0	72.18
29	DeepSeek-R1-Distill-Qwen-1.5B	DeepSeek	0.1	72.05
30	gemma-2-9b-it	Google	0.6	70.63
31	Llama-3.2-3B-Instruct	Meta	0.2	69.88
32	Yi-1.5-9B-Chat	01.AI	0.4	60.87
33	qwen2.5-1.5b-instruct	Alibaba	0	49.60
34	Llama-3.2-1B-Instruct	Meta	0.2	49.02
35	Mistral-7B-Instruct-v0.3	Mistral	0.4	48.65
36	qwen2.5-0.5b-instruct	Alibaba	0	46.03
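
Since the table lists both an output price and a score, one way to read it is to ask which model scores highest within a given price budget. The snippet below is a minimal sketch of that lookup, not part of the CLiB benchmark itself: the names `Entry`, `parse`, and `best_under_budget` are hypothetical, and only the first six rows of the table are copied in (organization column omitted) to keep it short.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    rank: int
    model: str
    price: float  # output price, CNY per million tokens
    score: float  # reasoning and math score


# First six rows of the table above (rank, model, price, score), tab-separated.
ROWS = (
    "1\tDeepSeek-R1\t16\t92.75\n"
    "2\tdeepseek-chat-v3\t8\t92.47\n"
    "3\tDeepSeek-R1-Distill-Qwen-32B\t1.3\t90.11\n"
    "4\tDeepSeek-R1-Distill-Qwen-14B\t0.7\t89.83\n"
    "5\tphi-4\t1\t89.80\n"
    "6\tqwen2.5-72b-instruct\t12\t89.33\n"
)


def parse(text: str) -> list[Entry]:
    """Turn tab-separated rows into Entry records."""
    entries = []
    for line in text.strip().splitlines():
        rank, model, price, score = line.split("\t")
        entries.append(Entry(int(rank), model, float(price), float(score)))
    return entries


def best_under_budget(entries: list[Entry], max_price: float) -> Optional[Entry]:
    """Return the highest-scoring entry whose price is at most max_price."""
    affordable = [e for e in entries if e.price <= max_price]
    return max(affordable, key=lambda e: e.score, default=None)


if __name__ == "__main__":
    entries = parse(ROWS)
    for budget in (1.0, 10.0):
        print(budget, best_under_budget(entries, budget))
```

With these six rows, a budget of 1 CNY per million tokens selects DeepSeek-R1-Distill-Qwen-14B, and a budget of 10 selects deepseek-chat-v3.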

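The reasoning and mathematical computation domain currently covers six dimensions: deductive reasoning, commonsense reasoning, symbolic reasoning (BBH), arithmetic ability, grade 7 to 9 mathematics, and table question answering.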

For the full evaluation results, see: https://github.com/jeinlee1991/chinese-llm-benchmark

About the LLM evaluation platform EasyLLM: https://easyllm.site

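Most comprehensive: billed as the world's most comprehensive evaluation platform for LLM products, already covering about 200 models
Most up to date: evaluates each model's capability metrics daily and publishes leaderboards
Most convenient: no registration or VPN required; models from inside and outside China can be evaluated with one click
Transparent results: the methods, question sets, process, and scores of every evaluation are visible and traceable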