A Roundup of AI Large Language Models (LLMs) and Chatbots (continuously updated), by pickmind

Original article: https://blog.pickmind.xyz/article/3c87123f-d283-4a05-8e43-4ee8550cf22f

Large models approved for release in China

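| Product | Company | Open source? | Approval date | Link |
| --- | --- | --- | --- | --- |
| ERNIE Bot (文心一言) | Baidu | | 2023-08-31 | https://wenxin.baidu.com/ |
| Doubao (豆包) / Skylark (云雀) model | Douyin | | 2023-08-31 | https://www.doubao.com/login |
| GLM model | Zhipu AI | | 2023-08-31 | https://chatglm.cn |
| Zidong Taichu (紫东太初) model | Chinese Academy of Sciences | | 2023-08-31 | https://xihe.mindspore.cn |
| Baichuan model | Baichuan Intelligence (百川智能) | | 2023-08-31 | https://baichuan-ai.com/home |
| SenseNova (日日新) model | SenseTime | | 2023-08-31 | https://sensetime.com/cn |
| ABAB model | MiniMax | | 2023-08-31 | https://api.minimax.chat |
| Intern (书生) | Shanghai AI Laboratory | | 2023-08-31 | https://intern-ai.org.cn/ |
| Spark (星火) model | iFLYTEK | | 2023-08-31 | https://xinghuo.xfyun.cn/ |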

"Abyss chart" of Chinese large models

[Image: abyss chart of Chinese large models; not included here]

Source: unknown.

Open-source Large Language Models Leaderboard (international)

https://accubits.com/large-language-models-leaderboard/

The leaderboard changes constantly; follow the link for the latest rankings.


LMSYS large-model leaderboard (international)

From UC Berkeley.

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

The leaderboard changes constantly; follow the link for the latest rankings.

| Model (link) | ⭐ Arena Elo rating | 📈 MT-bench (score) | MMLU | License |
| --- | --- | --- | --- | --- |
| https://openai.com/research/gpt-4 | 1193 | 8.99 | 86.4 | Proprietary |
| https://www.anthropic.com/index/introducing-claude | 1161 | 7.9 | 77 | Proprietary |
| https://www.anthropic.com/index/claude-2 | 1134 | 8.06 | 78.5 | Proprietary |
| https://www.anthropic.com/index/introducing-claude | 1130 | 7.85 | 73.4 | Proprietary |
| https://openai.com/blog/chatgpt | 1118 | 7.94 | 70 | Proprietary |
| https://huggingface.co/lmsys/vicuna-33b-v1.3 | 1097 | 7.12 | 59.2 | Non-commercial |
| https://huggingface.co/meta-llama/Llama-2-70b-chat-hf | 1060 | 6.86 | 63 | Llama 2 Community |
| https://huggingface.co/WizardLM/WizardLM-13B-V1.2 | 1046 | 7.2 | 52.7 | Llama 2 Community |
| https://huggingface.co/lmsys/vicuna-13b-v1.5 | 1046 | 6.57 | 55.8 | Llama 2 Community |
| https://huggingface.co/mosaicml/mpt-30b-chat | 1043 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
| https://huggingface.co/timdettmers/guanaco-33b-merged | 1036 | 6.53 | 57.6 | Non-commercial |
| https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf | 1032 | | | Llama 2 Community |
| https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#foundation_models | 1008 | 6.4 | | Proprietary |
| https://huggingface.co/lmsys/vicuna-7b-v1.5 | 1003 | 6.17 | 49.8 | Llama 2 Community |
| https://huggingface.co/meta-llama/Llama-2-13b-chat-hf | 999 | 6.65 | 53.6 | Llama 2 Community |
| https://huggingface.co/meta-llama/Llama-2-7b-chat-hf | 979 | 6.27 | 45.8 | Llama 2 Community |
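
One way to read the Arena Elo column is as relative win odds between two models. The following is a minimal sketch, assuming the conventional Elo expected-score formula (base 10, scale 400). The leaderboard's own fitting procedure may differ, and the function name `expected_score` is illustrative rather than part of any LMSYS tooling.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the conventional Elo model.

    Assumption: base-10 logistic curve with a 400-point scale.
    A tie counts as half a win.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the first and last rows of the leaderboard above.
top_rating, bottom_rating = 1193, 979
print(f"{expected_score(top_rating, bottom_rating):.2f}")  # prints 0.77
```

Under this assumption, a 214-point gap corresponds to the higher-rated model being preferred in roughly three out of four head-to-head comparisons.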

Open LLM Leaderboard (international)

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The leaderboard changes constantly; follow the link for the latest rankings.

| T | Model | Details | Average ⬆️ | ARC | HellaSwag | MMLU | TruthfulQA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 🔶 | https://huggingface.co/uni-tianyan/Uni-TianYan | https://huggingface.co/datasets/open-llm-leaderboard/details_uni-tianyan__Uni-TianYan | 73.81 | 72.1 | 87.4 | 69.91 | 65.81 |
| 🔶 | https://huggingface.co/fangloveskari/ORCA_LLaMA_70B_QLoRA | https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__ORCA_LLaMA_70B_QLoRA | 73.4 | 72.27 | 87.74 | 70.23 | 63.37 |
| 🔶 | https://huggingface.co/garage-bAInd/Platypus2-70B-instruct | https://huggingface.co/datasets/open-llm-leaderboard/details_garage-bAInd__Platypus2-70B-instruct | 73.13 | 71.84 | 87.94 | 70.48 | 62.26 |
| 🔶 | https://huggingface.co/upstage/Llama-2-70b-instruct-v2 | https://huggingface.co/datasets/open-llm-leaderboard/details_upstage__Llama-2-70b-instruct-v2 | 72.95 | 71.08 | 87.89 | 70.58 | 62.25 |
| 🔶 | https://huggingface.co/fangloveskari/Platypus_QLoRA_LLaMA_70b | https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__Platypus_QLoRA_LLaMA_70b | 72.94 | 72.1 | 87.46 | 71.02 | 61.18 |
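
For the five rows listed above, the Average column matches the unweighted mean of the four benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA). The sketch below recomputes it from the listed values; the names `rows` and `mean_score` are illustrative, and the check only covers this snapshot of the leaderboard.

```python
# (reported average, [ARC, HellaSwag, MMLU, TruthfulQA]) copied from the rows above
rows = {
    "Uni-TianYan": (73.81, [72.1, 87.4, 69.91, 65.81]),
    "ORCA_LLaMA_70B_QLoRA": (73.4, [72.27, 87.74, 70.23, 63.37]),
    "Platypus2-70B-instruct": (73.13, [71.84, 87.94, 70.48, 62.26]),
    "Llama-2-70b-instruct-v2": (72.95, [71.08, 87.89, 70.58, 62.25]),
    "Platypus_QLoRA_LLaMA_70b": (72.94, [72.1, 87.46, 71.02, 61.18]),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the benchmark scores."""
    return sum(scores) / len(scores)


for name, (reported, scores) in rows.items():
    recomputed = mean_score(scores)
    # The reported values are rounded to two decimals, so allow a small tolerance.
    matches = abs(recomputed - reported) < 0.01
    print(f"{name}: reported={reported}, recomputed={recomputed:.3f}, match={matches}")
```

Because the average weights all four benchmarks equally, a model can rank higher overall while trailing on an individual benchmark such as MMLU.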

AlpacaEval Leaderboard (international)

From Stanford.

https://tatsu-lab.github.io/alpaca_eval/

The leaderboard changes constantly; follow the link for the latest rankings.

| Model Name | Model outputs | Win Rate | Length |
| --- | --- | --- | --- |
| GPT-4 | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json | 95.28% | 1365 |
| https://ai.meta.com/llama/ | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-70b-chat-hf/model_outputs.json | 92.66% | 1790 |
| Claude 2 | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2/model_outputs.json | 91.36% | 1069 |
| https://github.com/imoneoi/openchat | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-v3.1-13b/model_outputs.json | 89.49% | 1484 |
| ChatGPT | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/chatgpt/model_outputs.json | 89.37% | 827 |
| https://huggingface.co/WizardLM/WizardLM-13B-V1.2 | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/wizardlm-13b-v1.2/model_outputs.json | 89.17% | 1635 |
| https://huggingface.co/lmsys/vicuna-33b-v1.3 | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-33b-v1.3/model_outputs.json | 88.99% | 1479 |
| Claude | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude/model_outputs.json | 88.39% | 1082 |
| https://arxiv.org/abs/2308.06259 | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/humpback-llama2-70b/model_outputs.json | 87.94% | 1822 |
| https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 | https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openbuddy-llama2-70b-v10.1/model_outputs.json | 87.67% | 1077 |

CLUE 1.1 overall leaderboard (China)

https://www.cluebenchmarks.com/rank.html

The leaderboard changes constantly; follow the link for the latest rankings.

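| Rank | Model | Institution | Evaluation date | Score1.1 | Certification | AFQMC | TNEWS1.1 | IFLYTEK | OCNLI_50K | WSC1.1 | CSL | CMRC2018 | CHID1.1 | C3 1.1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Yuyan (玉言) | NetEase Fuxi (网易伏羲) | 23-07-31 | 87.050 | Pending | 86.45 | 74.04 | 67.96 | 86.33 | 95.73 | 97.6 | 84.25 | 95.956 | 95.138 |
| 2 | HunYuan-NLP 1T | Tencent Hunyuan AI large-model team | 22-11-26 | 86.918 | Pending | 85.11 | 70.44 | 67.54 | 86.5 | 96 | 96.2 | 87.9 | 98.848 | 93.723 |
| 3 | Tongyi-AliceMind (通义-AliceMind) | DAMO Academy NLP | 22-11-22 | 86.685 | Pending | 84.07 | 73.47 | 67.42 | 85.87 | 94.33 | 95.03 | 86.8 | 99.208 | 93.969 |
| 4 | HUMAN | CLUE | 19-12-01 | 86.678 | Certified | 81 | 71 | 80.3 | 90.3 | 98 | 84 | 92.4 | 87.10 | 96.00 |
| 5 | CHAOS | OPPO Research Institute, Rongzhi (融智) team | 22-11-09 | 86.552 | Pending | 83.37 | 73.22 | 65.81 | 86.37 | 94.6 | 95.7 | 87.2 | 99.217 | 93.477 |
| 6 | WenJin | Meituan NLP | 22-10-20 | 86.313 | Pending | 84.49 | 73.04 | 64.38 | 86.23 | 94.44 | 95.67 | 86.25 | 98.898 | 93.415 |
| 7 | OBERT | OPPO Xiaobu Assistant (小布助手) | 22-11-07 | 84.783 | Pending | 81.02 | 67.75 | 66 | 84.53 | 91.3 | 99.93 | 84.05 | 97.578 | 90.892 |
| 8 | HunYuan_nlp | Tencent TEG | 22-05-11 | 84.730 | Pending | 83.37 | 64.01 | 66.58 | 85.23 | 92.27 | 93.87 | 87.9 | 98.512 | 90.831 |
| 9 | ShenNonG | Yunxiaowei AI (云小微AI) | 21-12-01 | 84.351 | Pending | 82.57 | 65.56 | 64.42 | 85.97 | 94.21 | 91.23 | 86.5 | 97.932 | 90.769 |
| 10 | ShenZhou | QQ Browser Lab (QQ浏览器实验室) | 21-09-19 | 83.873 | Pending | 80.55 | 65.36 | 67.65 | 86.37 | 89.08 | 90.97 | 87.85 | 97.923 | 89.108 |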

CLiB Chinese LLM capability benchmark leaderboard (China)

https://github.com/jeinlee1991/chinese-llm-benchmark

The leaderboard changes constantly; follow the link for the latest rankings.

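| Category | Model | Total score | Rank |
| --- | --- | --- | --- |
| Commercial | gpt4 | 95.8 | 1 |
| Commercial | chatgpt-3.5 | 93.8 | 2 |
| Commercial | ERNIE Bot (文心一言) v2.2 | 88.3 | 3 |
| Commercial | SenseTime senseChat | 83.2 | 4 |
| Open source | BELLE-Llama2-13B-chat-0.4M | 80.0 | 5 |
| Open source | belle-llama-13b-2m | 79.2 | 6 |
| Commercial | Baichuan-53B | 79.0 | 7 |
| Commercial | iFLYTEK Spark (讯飞星火) v1.5 | 77.7 | 8 |
| Commercial | 360 Zhinao (360智脑) | 77.0 | 9 |
| Commercial | chatglm (official) | 76.9 | 10 |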

Leaderboard - C-Eval (China)

https://cevalbenchmark.com/static/leaderboard_zh.html

The leaderboard changes constantly; follow the link for the latest rankings.

| # | Model (link) | Publisher | Submission date | Average | Average (Hard) | STEM | Social Science | Humanities | Others |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://cevalbenchmark.com/static/model_zh.html?method=%E4%BA%91%E5%A4%A9%E4%B9%A6 | Shenzhen Yuntian Algorithm Technology Co., Ltd. (深圳云天算法技术有限公司) | 2023/8/31 | 77.1 | 55.2 | 70.4 | 88 | 78.6 | 77.9 |
| 1 | https://cevalbenchmark.com/static/model_zh.html?method=Galaxy | Zuoyebang | 2023/8/23 | 73.7 | 60.5 | 71.4 | 86 | 71.6 | 68.8 |
| 2 | https://cevalbenchmark.com/static/model_zh.html?method=YaYi | Zhongke Wenge (中科闻歌) | 2023/9/4 | 71.8 | 60.3 | 70.6 | 81.3 | 71.5 | 65.8 |
| 3 | https://cevalbenchmark.com/static/model_zh.html?method=AiLMe-100B%20v3 | APUS | 2023/9/4 | 71.6 | 57.9 | 68.5 | 72.3 | 71.2 | 77 |
| 4 | https://cevalbenchmark.com/static/model_zh.html?method=Mengzi | Langboat Technology (澜舟科技) | 2023/8/25 | 71.5 | 48.8 | 62.3 | 87.2 | 76.8 | 68.6 |
| 5 | https://cevalbenchmark.com/static/model_zh.html?method=DFM2.0 | AISpeech & SJTU | 2023/9/2 | 71.2 | 46.1 | 59.1 | 80.5 | 75.5 | 80.3 |
| 6 | https://cevalbenchmark.com/static/model_zh.html?method=ChatGLM2 | Tsinghua & Zhipu.AI | 2023/6/25 | 71.1 | 50 | 64.4 | 81.6 | 73.7 | 71.3 |
| 7 | https://cevalbenchmark.com/static/model_zh.html?method=UniGPT2.0%EF%BC%88%E5%B1%B1%E6%B5%B7%EF%BC%89 | Unisound (云知声) | 2023/8/28 | 70 | 52.8 | 65.7 | 78.7 | 67 | 72.9 |
| 8 | https://cevalbenchmark.com/static/model_zh.html?method=360GPT-S2 | 360 | 2023/8/29 | 69 | 42 | 59.4 | 82 | 70.6 | 72.9 |
| 9 | https://cevalbenchmark.com/static/model_zh.html?method=InternLM-123B | Shanghai AI Lab & SenseTime | 2023/8/22 | 68.8 | 50 | 63.5 | 81.4 | 72.7 | 63 |
| 10 | https://cevalbenchmark.com/static/model_zh.html?method=GPT-4* | OpenAI | 2023/5/15 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |