AI Large Language Models (LLMs) and Chatbots Roundup (Continuously Updated), by pickmind

Article link: https://blog.csdn.net/shanchuan2012/article/details/132800387

Original article: https://blog.pickmind.xyz/article/3c87123f-d283-4a05-8e43-4ee8550cf22f
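Contents:

Approved Large Models in China
The "Abyss Map" of Chinese Large Models
Open-source Large Language Models Leaderboard (International)
LMSYS Large Model Leaderboard (International)
Open LLM Leaderboard (International)
AlpacaEval Leaderboard (International)
CLUE 1.1 Overall Leaderboard (China)
CLiB Chinese LLM Capability Benchmark Leaderboard (China)
C-Eval Leaderboard (China)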

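Approved Large Models in China

Product	Company	Open source	Approval date	Link
ERNIE Bot (文心一言)	Baidu	No	2023-08-31	https://wenxin.baidu.com/
Doubao (豆包) | Skylark (云雀) large model	Douyin	No	2023-08-31	https://www.doubao.com/login
GLM large model	Zhipu AI	Yes	2023-08-31	https://chatglm.cn
Zidong Taichu (紫东太初) large model	Chinese Academy of Sciences	No	2023-08-31	https://xihe.mindspore.cn
Baichuan (百川) large model	Baichuan AI (百川智能)	Yes	2023-08-31	https://baichuan-ai.com/home
SenseNova (日日新) large model	SenseTime	No	2023-08-31	https://sensetime.com/cn
ABAB large model	MiniMax	No	2023-08-31	https://api.minimax.chat
Intern (书生)	Shanghai AI Laboratory	No	2023-08-31	https://intern-ai.org.cn/
Spark (星火) large model	iFLYTEK	No	2023-08-31	https://xinghuo.xfyun.cn/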

The "Abyss Map" of Chinese Large Models

[Image not preserved in this copy.]

Source: unknown.

Open-source Large Language Models Leaderboard (International)

https://accubits.com/large-language-models-leaderboard/

The leaderboard changes frequently; follow the link above for the latest rankings.


LMSYS Large Model Leaderboard (International)

From UC Berkeley.

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

The leaderboard changes frequently; follow the link above for the latest rankings.

Model	⭐ Arena Elo rating	📈 MT-bench (score)	MMLU	License
https://openai.com/research/gpt-4	1193	8.99	86.4	Proprietary
https://www.anthropic.com/index/introducing-claude	1161	7.9	77	Proprietary
https://www.anthropic.com/index/claude-2	1134	8.06	78.5	Proprietary
https://www.anthropic.com/index/introducing-claude	1130	7.85	73.4	Proprietary
https://openai.com/blog/chatgpt	1118	7.94	70	Proprietary
https://huggingface.co/lmsys/vicuna-33b-v1.3	1097	7.12	59.2	Non-commercial
https://huggingface.co/meta-llama/Llama-2-70b-chat-hf	1060	6.86	63	Llama 2 Community
https://huggingface.co/WizardLM/WizardLM-13B-V1.2	1046	7.2	52.7	Llama 2 Community
https://huggingface.co/lmsys/vicuna-13b-v1.5	1046	6.57	55.8	Llama 2 Community
https://huggingface.co/mosaicml/mpt-30b-chat	1043	6.39	50.4	CC-BY-NC-SA-4.0
https://huggingface.co/timdettmers/guanaco-33b-merged	1036	6.53	57.6	Non-commercial
https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf	1032			Llama 2 Community
https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#foundation_models	1008	6.4		Proprietary
https://huggingface.co/lmsys/vicuna-7b-v1.5	1003	6.17	49.8	Llama 2 Community
https://huggingface.co/meta-llama/Llama-2-13b-chat-hf	999	6.65	53.6	Llama 2 Community
https://huggingface.co/meta-llama/Llama-2-7b-chat-hf	979	6.27	45.8	Llama 2 Community
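
To help read the Arena Elo rating column, here is a minimal sketch that turns a rating gap into an expected win probability. It assumes the conventional Elo formula (base 10, scale 400); the leaderboard's exact computation is not described in this article, so treat the function name and parameters as illustrative rather than as the leaderboard's own code.

```python
def expected_win_probability(rating_a: float, rating_b: float,
                             scale: float = 400.0) -> float:
    """Conventional Elo expected score of A against B.

    Assumption: base 10 and scale 400, the usual Elo constants.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the first and last rows of the table above.
top_rating = 1193      # openai.com/research/gpt-4
bottom_rating = 979    # meta-llama/Llama-2-7b-chat-hf

p = expected_win_probability(top_rating, bottom_rating)
print(f"Expected win probability of the top entry over the bottom entry: {p:.3f}")
```

Under these assumptions, a gap of roughly 200 points corresponds to the higher-rated model being preferred in a bit over three quarters of head-to-head comparisons, while equal ratings give 0.5.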

Open LLM Leaderboard (International)

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The leaderboard changes frequently; follow the link above for the latest rankings.

T	Model	Average ⬆️	ARC	HellaSwag	MMLU	TruthfulQA
🔶	https://huggingface.co/uni-tianyan/Uni-TianYan https://huggingface.co/datasets/open-llm-leaderboard/details_uni-tianyan__Uni-TianYan	73.81	72.1	87.4	69.91	65.81
🔶	https://huggingface.co/fangloveskari/ORCA_LLaMA_70B_QLoRA https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__ORCA_LLaMA_70B_QLoRA	73.4	72.27	87.74	70.23	63.37
🔶	https://huggingface.co/garage-bAInd/Platypus2-70B-instruct https://huggingface.co/datasets/open-llm-leaderboard/details_garage-bAInd__Platypus2-70B-instruct	73.13	71.84	87.94	70.48	62.26
🔶	https://huggingface.co/upstage/Llama-2-70b-instruct-v2 https://huggingface.co/datasets/open-llm-leaderboard/details_upstage__Llama-2-70b-instruct-v2	72.95	71.08	87.89	70.58	62.25
🔶	https://huggingface.co/fangloveskari/Platypus_QLoRA_LLaMA_70b https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__Platypus_QLoRA_LLaMA_70b	72.94	72.1	87.46	71.02	61.18
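
The Average column in this table appears to be the unweighted mean of the four benchmark columns (ARC, HellaSwag, MMLU, TruthfulQA). That is an inference from the numbers shown here, not something the article states. The sketch below recomputes it for the five rows above; the `rows` dictionary and the `mean_score` helper are illustrative names.

```python
# Each entry: model -> (reported Average, [ARC, HellaSwag, MMLU, TruthfulQA]),
# copied from the table above.
rows = {
    "uni-tianyan/Uni-TianYan": (73.81, [72.1, 87.4, 69.91, 65.81]),
    "fangloveskari/ORCA_LLaMA_70B_QLoRA": (73.4, [72.27, 87.74, 70.23, 63.37]),
    "garage-bAInd/Platypus2-70B-instruct": (73.13, [71.84, 87.94, 70.48, 62.26]),
    "upstage/Llama-2-70b-instruct-v2": (72.95, [71.08, 87.89, 70.58, 62.25]),
    "fangloveskari/Platypus_QLoRA_LLaMA_70b": (72.94, [72.1, 87.46, 71.02, 61.18]),
}


def mean_score(scores: list[float]) -> float:
    """Unweighted mean of the benchmark scores."""
    return sum(scores) / len(scores)


for model, (reported, scores) in rows.items():
    computed = mean_score(scores)
    # Allow for rounding to two decimals in the published table.
    status = "matches" if abs(computed - reported) < 0.01 else "differs"
    print(f"{model}: reported {reported}, recomputed {computed:.3f} ({status})")
```

Because the ranking uses a plain average, a model can lead overall while trailing on an individual benchmark: the top row has the lowest MMLU score of the five.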

AlpacaEval Leaderboard (International)

From Stanford.

https://tatsu-lab.github.io/alpaca_eval/

The leaderboard changes frequently; follow the link above for the latest rankings.

Model Name	Win Rate	Length
GPT-4 https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json	95.28%	1365
https://ai.meta.com/llama/ https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-70b-chat-hf/model_outputs.json	92.66%	1790
Claude 2 https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2/model_outputs.json	91.36%	1069
https://github.com/imoneoi/openchat https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-v3.1-13b/model_outputs.json	89.49%	1484
ChatGPT https://github.com/tatsu-lab/alpaca_eval/blob/main/results/chatgpt/model_outputs.json	89.37%	827
https://huggingface.co/WizardLM/WizardLM-13B-V1.2 https://github.com/tatsu-lab/alpaca_eval/blob/main/results/wizardlm-13b-v1.2/model_outputs.json	89.17%	1635
https://huggingface.co/lmsys/vicuna-33b-v1.3 https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-33b-v1.3/model_outputs.json	88.99%	1479
Claude https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude/model_outputs.json	88.39%	1082
https://arxiv.org/abs/2308.06259 https://github.com/tatsu-lab/alpaca_eval/blob/main/results/humpback-llama2-70b/model_outputs.json	87.94%	1822
https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openbuddy-llama2-70b-v10.1/model_outputs.json	87.67%	1077

CLUE 1.1 Overall Leaderboard (China)

https://www.cluebenchmarks.com/rank.html

The leaderboard changes frequently; follow the link above for the latest rankings.

Rank	Model	Institution	Evaluation date	Score1.1	Certification	AFQMC	TNEWS1.1	IFLYTEK	OCNLI_50K	WSC1.1	CSL	CMRC2018	CHID1.1	C3 1.1
1	Yuyan (玉言)	NetEase Fuxi (网易伏羲)	23-07-31	87.050	Pending	86.45	74.04	67.96	86.33	95.73	97.6	84.25	95.956	95.138
2	HunYuan-NLP 1T	Tencent Hunyuan AI Large Model Team	22-11-26	86.918	Pending	85.11	70.44	67.54	86.5	96	96.2	87.9	98.848	93.723
3	Tongyi (通义)-AliceMind	DAMO Academy NLP	22-11-22	86.685	Pending	84.07	73.47	67.42	85.87	94.33	95.03	86.8	99.208	93.969
4	HUMAN	CLUE	19-12-01	86.678	Certified	81	71	80.3	90.3	98	84	92.4	87.10	96.00
5	CHAOS	OPPO Research Institute, Rongzhi (融智) Team	22-11-09	86.552	Pending	83.37	73.22	65.81	86.37	94.6	95.7	87.2	99.217	93.477
6	WenJin	Meituan NLP	22-10-20	86.313	Pending	84.49	73.04	64.38	86.23	94.44	95.67	86.25	98.898	93.415
7	OBERT	OPPO Xiaobu Assistant (小布助手)	22-11-07	84.783	Pending	81.02	67.75	66	84.53	91.3	99.93	84.05	97.578	90.892
8	HunYuan_nlp	Tencent TEG	22-05-11	84.730	Pending	83.37	64.01	66.58	85.23	92.27	93.87	87.9	98.512	90.831
9	ShenNonG	Yunxiaowei AI (云小微AI)	21-12-01	84.351	Pending	82.57	65.56	64.42	85.97	94.21	91.23	86.5	97.932	90.769
10	ShenZhou	QQ Browser Lab	21-09-19	83.873	Pending	80.55	65.36	67.65	86.37	89.08	90.97	87.85	97.923	89.108

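CLiB Chinese LLM Capability Benchmark Leaderboard (China)

https://github.com/jeinlee1991/chinese-llm-benchmark

The leaderboard changes frequently; follow the link above for the latest rankings.

Category	Model	Total score	Rank
Commercial	gpt4	95.8	1
Commercial	chatgpt-3.5	93.8	2
Commercial	ERNIE Bot (文心一言) v2.2	88.3	3
Commercial	SenseTime senseChat	83.2	4
Open source	BELLE-Llama2-13B-chat-0.4M	80.0	5
Open source	belle-llama-13b-2m	79.2	6
Commercial	Baichuan-53B	79.0	7
Commercial	iFLYTEK Spark (讯飞星火) v1.5	77.7	8
Commercial	360 Zhinao (360智脑)	77.0	9
Commercial	chatglm (official)	76.9	10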

C-Eval Leaderboard (China)

https://cevalbenchmark.com/static/leaderboard_zh.html

The leaderboard changes frequently; follow the link above for the latest rankings.

#	Model	Organization	Submission date	Average	Average (Hard)	STEM	Social Science	Humanities	Other
0	https://cevalbenchmark.com/static/model_zh.html?method=%E4%BA%91%E5%A4%A9%E4%B9%A6	深圳云天算法技术有限公司	2023/8/31	77.1	55.2	70.4	88	78.6	77.9
1	https://cevalbenchmark.com/static/model_zh.html?method=Galaxy	Zuoyebang	2023/8/23	73.7	60.5	71.4	86	71.6	68.8
2	https://cevalbenchmark.com/static/model_zh.html?method=YaYi	中科闻歌	2023/9/4	71.8	60.3	70.6	81.3	71.5	65.8
3	https://cevalbenchmark.com/static/model_zh.html?method=AiLMe-100B%20v3	APUS	2023/9/4	71.6	57.9	68.5	72.3	71.2	77
4	https://cevalbenchmark.com/static/model_zh.html?method=Mengzi	Langboat (澜舟科技)	2023/8/25	71.5	48.8	62.3	87.2	76.8	68.6
5	https://cevalbenchmark.com/static/model_zh.html?method=DFM2.0	AISpeech & SJTU	2023/9/2	71.2	46.1	59.1	80.5	75.5	80.3
6	https://cevalbenchmark.com/static/model_zh.html?method=ChatGLM2	Tsinghua & Zhipu.AI	2023/6/25	71.1	50	64.4	81.6	73.7	71.3
7	https://cevalbenchmark.com/static/model_zh.html?method=UniGPT2.0%EF%BC%88%E5%B1%B1%E6%B5%B7%EF%BC%89	Unisound (云知声)	2023/8/28	70	52.8	65.7	78.7	67	72.9
8	https://cevalbenchmark.com/static/model_zh.html?method=360GPT-S2	360	2023/8/29	69	42	59.4	82	70.6	72.9
9	https://cevalbenchmark.com/static/model_zh.html?method=InternLM-123B	Shanghai AI Lab & SenseTime	2023/8/22	68.8	50	63.5	81.4	72.7	63
10	https://cevalbenchmark.com/static/model_zh.html?method=GPT-4*	OpenAI	2023/5/15	68.7	54.9	67.1	77.6	64.5	67.8