[NLP] A Collection of High-Quality Chinese Pre-trained Models

Pre-trained Chinese NLP Models

Pre-trained language models have become a fundamental technology in natural language processing. This repository collects high-quality Chinese pre-trained models that are publicly available online (thanks to everyone who shares these resources) and will be updated continuously.

Note: 🤗 Hugging Face model downloads: the official Hugging Face Hub.
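Most of the checkpoints listed below are mirrored on the Hugging Face Hub. As a quick sanity check that your environment can pull them, here is a minimal loading sketch (assuming `transformers` and `torch` are installed; `hfl/chinese-roberta-wwm-ext` merely stands in for whichever repo id you pick from the tables):

```python
# Minimal sketch: pull one of the catalogued checkpoints from the Hugging Face Hub.
# Assumption: "hfl/chinese-roberta-wwm-ext" is used only as a stand-in repo id.
import torch
from transformers import AutoModel, AutoTokenizer

name = "hfl/chinese-roberta-wwm-ext"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("中文预训练模型示例", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
print(hidden.shape)
```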

Table of Contents

- NLU series: BERT, ChineseBERT, RoBERTa, ALBERT, NEZHA, MacBERT, WoBERT, XLNET, ELECTRA, ZEN, ERNIE, ERNIE3, RoFormer, StructBERT, Lattice-BERT, Mengzi-BERT, Bloom, TaCL, MC-BERT, Erlangshen, PERT, MobileBERT, GAU-α, DeBERTa, GlyphBERT, CKBERT, LERT
- NLG series: GPT, GPT-3, NEZHA-Gen, CPM-Generate, T5, T5-PEGASUS, Mengzi-T5, PanGu-Alpha, EVA, BART, Wenzhong, Yuyuan, RWKV
- NLU-NLG series: UniLM, Simbert, RoFormer-sim, Zhouwenwang, CPM-2, CPT, GLM, PLUG, OPD
- Multi-Modal: WenLan, CogView, Zidong Taichu, Mengzi-oscar, R2D2, Chinese-CLIP, TaiYi-CLIP, AltCLIP, AltDiffusion, Taiyi-Stable-Diffusion, wukong
- Table: SDCUP
- Updates

NLU Series

BERT

  • 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, et al. | arXiv | PDF
  • 2019 | Pre-Training with Whole Word Masking for Chinese BERT | Yiming Cui, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| BERT-Base | base | Google Drive | | Google Research | github | General |
| BERT-wwm | base | Google Drive / iFLYTEK Cloud (code 07Xj) | Google Drive | Yiming Cui | github | General |
| BERT-wwm-ext | base | Google Drive / iFLYTEK Cloud (code 4cMG) | Google Drive | Yiming Cui | github | General |
| bert-base-民事 (civil law) | base | Alibaba Cloud | | THUNLP | github | Legal |
| bert-base-刑事 (criminal law) | base | Alibaba Cloud | | THUNLP | github | Legal |
| BAAI-JDAI-BERT | base | JD Cloud | | JDAI | github | E-commerce customer-service dialogue |
| FinBERT | base | Google Drive / Baidu Netdisk (code 1cmp) | Google Drive / Baidu Netdisk (code 986f) | Value Simplex | github | FinTech |
| EduBERT | base | TAL AI | TAL AI | tal-tech | github | Education |
| guwenbert-base | base | | Baidu Netdisk (code 4jng) / huggingface | Ethan | github | Classical Chinese |
| guwenbert-large | large | | Baidu Netdisk (code m5sz) / huggingface | Ethan | github | Classical Chinese |
| BERT-CCPoem | small | | thunlp | THUNLP-AIPoet | github | Classical Chinese poetry |

Notes:

- wwm stands for Whole Word Masking: when any WordPiece sub-token of a word is masked, the remaining sub-tokens of the same word are masked as well.
- ext indicates training on additional, larger corpora.
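In use, the wwm checkpoints drop into the standard masked-language-model workflow unchanged. A minimal fill-mask sketch, assuming the HFL mirror `hfl/chinese-bert-wwm-ext` on the Hugging Face Hub:

```python
# Masked-token prediction with a whole-word-masking BERT checkpoint.
# Assumption: "hfl/chinese-bert-wwm-ext" is the Hub mirror of the BERT-wwm-ext
# weights listed above; swap in any other repo id from the tables.
from transformers import pipeline

fill = pipeline("fill-mask", model="hfl/chinese-bert-wwm-ext")
for pred in fill("今天天气非常[MASK]。", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```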

ChineseBERT

  • 2021 | ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information | Zijun Sun, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| ChineseBERT | base | | huggingface | ShannonAI | github | General |
| ChineseBERT | large | | huggingface | ShannonAI | github | General |

RoBERTa

  • 2019 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| RoBERTa-tiny-clue | tiny | Google Drive | Baidu Netdisk (code 8qvb) | CLUE | github | General |
| RoBERTa-tiny-pair | tiny | Google Drive | Baidu Netdisk (code 8qvb) | CLUE | github | General |
| RoBERTa-tiny3L768-clue | tiny | Google Drive | | CLUE | github | General |
| RoBERTa-tiny3L312-clue | tiny | Google Drive | Baidu Netdisk (code 8qvb) | CLUE | github | General |
| RoBERTa-large-pair | large | Google Drive | Baidu Netdisk (code 8qvb) | CLUE | github | General |
| RoBERTa-large-clue | large | Google Drive | Baidu Netdisk (code 8qvb) | CLUE | github | General |
| RBT3 | 3-layer base | Google Drive / iFLYTEK Cloud (code b9nx) | Google Drive | Yiming Cui | github | General |
| RBTL3 | 3-layer large | Google Drive / iFLYTEK Cloud (code vySW) | Google Drive | Yiming Cui | github | General |
| RBTL4 | 4-layer large | iFLYTEK Cloud (code e8dN) | | Yiming Cui | github | General |
| RBTL6 | 6-layer large | iFLYTEK Cloud (code XNMA) | | Yiming Cui | github | General |
| RoBERTa-wwm-ext | base | Google Drive / iFLYTEK Cloud (code Xe1p) | Google Drive | Yiming Cui | github | General |
| RoBERTa-wwm-ext-large | large | Google Drive / iFLYTEK Cloud (code u6gC) | Google Drive | Yiming Cui | github | General |
| RoBERTa-base | base | Google Drive / Baidu Netdisk | Google Drive / Baidu Netdisk | brightmart | github | General |
| RoBERTa-Large | large | Google Drive / Baidu Netdisk | Google Drive | brightmart | github | General |
| RoBERTa-tiny | tiny | huggingface | huggingface | DBIIR @ RUC | UER | General |
| RoBERTa-mini | mini | huggingface | huggingface | DBIIR @ RUC | UER | General |
| RoBERTa-small | small | huggingface | huggingface | DBIIR @ RUC | UER | General |
| RoBERTa-medium | medium | huggingface | huggingface | DBIIR @ RUC | UER | General |
| RoBERTa-base | base | huggingface | huggingface | DBIIR @ RUC | UER | General |

ALBERT

  • 2019 | ALBERT: A Lite BERT For Self-Supervised Learning Of Language Representations | Zhenzhong Lan, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Albert_tiny | tiny | Google Drive | Google Drive | brightmart | github | General |
| Albert_base_zh | base | Google Drive | Google Drive | brightmart | github | General |
| Albert_large_zh | large | Google Drive | Google Drive | brightmart | github | General |
| Albert_xlarge_zh | xlarge | Google Drive | Google Drive | brightmart | github | General |
| Albert_base | base | Google Drive | | Google Research | github | General |
| Albert_large | large | Google Drive | | Google Research | github | General |
| Albert_xlarge | xlarge | Google Drive | | Google Research | github | General |
| Albert_xxlarge | xxlarge | Google Drive | | Google Research | github | General |

NEZHA

  • 2019 | NEZHA: Neural Contextualized Representation for Chinese Language Understanding | Junqiu Wei, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| NEZHA-base | base | Google Drive / Baidu Netdisk (code ntn3) | lonePatient | HUAWEI | github | General |
| NEZHA-base-wwm | base | Google Drive / Baidu Netdisk (code f68o) | lonePatient | HUAWEI | github | General |
| NEZHA-large | large | Google Drive / Baidu Netdisk (code 7thu) | lonePatient | HUAWEI | github | General |
| NEZHA-large-wwm | large | Google Drive / Baidu Netdisk (code ni4o) | lonePatient | HUAWEI | github | General |
| WoNEZHA (word-based) | base | Baidu Netdisk (code qgkq) | | ZhuiyiTechnology | github | General |

MacBERT

  • 2020 | Revisiting Pre-Trained Models for Chinese Natural Language Processing | Yiming Cui, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| MacBERT-base | base | Google Drive / iFLYTEK Cloud (code E2cP) | | Yiming Cui | github | General |
| MacBERT-large | large | Google Drive / iFLYTEK Cloud (code 3Yg3) | | Yiming Cui | github | General |

WoBERT

  • 2020 | 提速不掉点:基于词颗粒度的中文WoBERT | 苏剑林. | spaces | Blog post
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| WoBERT | base | Baidu Netdisk (code kim2) | | ZhuiyiTechnology | github | General |
| WoBERT-plus | base | Baidu Netdisk (code aedw) | | ZhuiyiTechnology | github | General |

XLNET

  • 2019 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | Zhilin Yang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| XLNet-base | base | Google Drive / iFLYTEK Cloud (code uCpe) | Google Drive | Yiming Cui | github | General |
| XLNet-mid | middle | Google Drive / iFLYTEK Cloud (code 68En) | Google Drive | Yiming Cui | github | General |
| XLNet_zh_Large | large | Baidu Netdisk | | brightmart | github | General |

ELECTRA

  • 2020 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Kevin Clark, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| ELECTRA-180g-large | large | Google Drive / iFLYTEK Cloud (code Yfcy) | | Yiming Cui | github | General |
| ELECTRA-180g-small-ex | small | Google Drive / iFLYTEK Cloud (code GUdp) | | Yiming Cui | github | General |
| ELECTRA-180g-base | base | Google Drive / iFLYTEK Cloud (code Xcvm) | | Yiming Cui | github | General |
| ELECTRA-180g-small | small | Google Drive / iFLYTEK Cloud (code qsHj) | | Yiming Cui | github | General |
| legal-ELECTRA-large | large | Google Drive / iFLYTEK Cloud (code 7f7b) | | Yiming Cui | github | Legal |
| legal-ELECTRA-base | base | Google Drive / iFLYTEK Cloud (code 7f7b) | | Yiming Cui | github | Legal |
| legal-ELECTRA-small | small | Google Drive / iFLYTEK Cloud (code 7f7b) | | Yiming Cui | github | Legal |
| ELECTRA-tiny | tiny | Google Drive / Baidu Netdisk (code rs99) | | CLUE | github | General |

ZEN

  • 2019 | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations | Shizhe Diao, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| ZEN-Base | base | Google Drive / Baidu Netdisk | | Sinovation Ventures AI Institute | github | General |
| Erlangshen-ZEN2 | large | | huggingface | IDEA-CCNL | github | General |

ERNIE

  • 2019 | ERNIE: Enhanced Representation through Knowledge Integration | Yu Sun, et al. | arXiv | PDF

  • 2020 | SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis | Hao Tian, et al. | arXiv | PDF

  • 2020 | ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding | Dongling Xiao, et al. | arXiv | PDF

| Model | Version | PaddlePaddle | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| ernie-1.0-base | base | link | | PaddlePaddle | github | General |
| ernie_1.0_skep_large | large | link | | Baidu | github | Sentiment analysis |
| ernie-gram | base | link | | Baidu | github | General |

Notes:

- To convert the PaddlePaddle weights to TensorFlow, see tensorflow_ernie.
- To convert the PaddlePaddle weights to PyTorch, see ERNIE-Pytorch.
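Once converted, the weights load like any other `transformers` checkpoint. A hedged sketch, assuming the converted ERNIE 1.0 model is available on the Hub under `nghuyong/ernie-1.0-base-zh` (substitute whichever converted checkpoint you actually have):

```python
# Sketch: loading an ERNIE 1.0 checkpoint already converted to PyTorch
# (see ERNIE-Pytorch above). The repo id "nghuyong/ernie-1.0-base-zh" is an
# assumption; requires a transformers version with ERNIE support (>= 4.22).
from transformers import AutoModel, AutoTokenizer

name = "nghuyong/ernie-1.0-base-zh"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
print(model.config.hidden_size)  # 768 for the base configuration
```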

ERNIE3

  • 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | Yu Sun, et al. | arXiv | PDF

  • 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | Shuohuan Wang, et al. | arXiv | PDF

| Model | Version | PaddlePaddle | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| ernie-3.0-base | 12-layer, 768-hidden, 12-heads | link | huggingface | PaddlePaddle | github | General |
| ernie-3.0-medium | 6-layer, 768-hidden, 12-heads | link | huggingface | PaddlePaddle | github | General |
| ernie-3.0-mini | 6-layer, 384-hidden, 12-heads | link | huggingface | PaddlePaddle | github | General |
| ernie-3.0-micro | 4-layer, 384-hidden, 12-heads | link | huggingface | PaddlePaddle | github | General |
| ernie-3.0-nano | 4-layer, 312-hidden, 12-heads | link | huggingface | PaddlePaddle | github | General |

To convert the PaddlePaddle weights to PyTorch, see ERNIE-Pytorch.

RoFormer

  • 2021 | RoFormer: Enhanced Transformer with Rotary Position Embedding | Jianlin Su, et al. | arXiv | PDF

  • 2021 | Transformer升级之路:2、博采众长的旋转式位置编码 | 苏剑林. | spaces | Blog post

| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| roformer | base (L12) | Baidu Netdisk (code xy9x) | | ZhuiyiTechnology | github | General |
| roformer | small (L6) | Baidu Netdisk (code gy97) | | ZhuiyiTechnology | github | General |
| roformer-char | base (L12) | Baidu Netdisk (code bt94) | | ZhuiyiTechnology | github | General |
| roformerV2 | small (L6) | Baidu Netdisk (code ttn4) | | ZhuiyiTechnology | github | General |
| roformerV2 | base (L12) | Baidu Netdisk (code pfoh) | | ZhuiyiTechnology | github | General |
| roformerV2 | large (L24) | Baidu Netdisk (code npfv) | | ZhuiyiTechnology | github | General |

StructBERT

  • 2019 | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | Wei Wang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| StructBERT | large (L24) | Alibaba Cloud | | Alibaba | github | General |

Lattice-BERT

  • 2021 | Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models | Yuxuan Lai, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| LatticeBERT | tiny (L4) | Alibaba Cloud | | Alibaba | github | General |
| LatticeBERT | small (L6) | Alibaba Cloud | | Alibaba | github | General |
| LatticeBERT | base (L12) | Alibaba Cloud | | Alibaba | github | General |

Mengzi-BERT

  • 2021 | Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese | Zhuosheng Zhang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Mengzi-BERT | base (L12) | | huggingface | Langboat | github | General |
| Mengzi-BERT-fin | base (L12) | | huggingface | Langboat | github | Finance |

Bloom

  • 2022 | Bloom: BigScience Large Open-science Open-access Multilingual Language Model | huggingface bigscience | - | BLOG
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| bloom-6b4-zh | 6B (L30) | | huggingface | Langboat (also releases several other Chinese variants, from bloom-389m-zh to bloom-2b5-zh) | github | General |

TaCL

  • 2021 | TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning | Yixuan Su, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| TaCL | base (L12) | | huggingface | yxuansu | github | General |

MC-BERT

  • 2021 | MC-BERT: Conceptualized Representation Learning for Chinese Biomedical Text Mining | alibaba-research | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| MC-BERT | base (L12) | link | | alibaba-research | github | Biomedical |

Erlangshen (二郎神)

| Model | Version | Type | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Erlangshen | large (L24) | BERT | | huggingface | IDEA-CCNL | github | General Chinese |

PERT

  • 2022 | PERT: Pre-Training BERT with Permuted Language Model | Yiming Cui, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| PERT-base | base (12L) | Baidu Netdisk (code rcsw) | huggingface | Yiming Cui | github | General |
| PERT-large | large (24L) | Baidu Netdisk (code e9hs) | huggingface | Yiming Cui | github | General |

MobileBERT

  • 2020 | MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | Zhiqing Sun, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Chinese-MobileBERT-base-f2 | base | Baidu Netdisk (code 56bj) | | Yiming Cui | github | General |
| Chinese-MobileBERT-base-f4 | base | Baidu Netdisk (code v2v7) | | Yiming Cui | github | General |
| Chinese-MobileBERT-large-f2 | large | Baidu Netdisk (code 6m5a) | | Yiming Cui | github | General |
| Chinese-MobileBERT-large-f4 | large | Baidu Netdisk (code 3h9b) | | Yiming Cui | github | General |

GAU-α

  • 2022 | GAU-α: (FLASH) Transformer Quality in Linear Time | Weizhe Hua, et al. | arXiv | PDF | blog
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| chinese_GAU-alpha-char_L-24_H-768 | base | Download | | ZhuiyiTechnology | github | General |

DeBERTa

  • 2020 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Pengcheng He, et al. | arXiv | PDF |
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| DeBERTa-v2-Large | large | | huggingface | IDEA-CCNL | github | General |
| DeBERTa-v2-xLarge | xlarge | | huggingface | IDEA-CCNL | github | General |
| DeBERTa-v2 | base | | huggingface | IDEA-CCNL | github | General |

GlyphBERT

  • 2021 | GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph | Yuxin li, et al. | arXiv | PDF |
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| GlyphCRM-base | base | | huggingface | HITsz-TMG | github | General |

CKBERT

  • 2022 | Revisiting and Advancing Chinese Natural Language Understanding with Accelerated Heterogeneous Knowledge Pre-training | Zhang, Taolin, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| pai-ckbert-base-zh | base | | huggingface | Alibaba | github | General |
| pai-ckbert-large-zh | large | | huggingface | Alibaba | github | General |
| pai-ckbert-huge-zh | huge | | huggingface | Alibaba | github | General |

LERT

  • 2022 | LERT: A Linguistically-motivated Pre-trained Language Model | Yiming Cui et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Chinese-LERT-small | 15M | Baidu Netdisk (code 4vuy) | huggingface | Yiming Cui | github | General |
| Chinese-LERT-base | 400M | Baidu Netdisk (code 9jgi) | huggingface | Yiming Cui | github | General |
| Chinese-LERT-large | 1.2G | Baidu Netdisk (code s82t) | huggingface | Yiming Cui | github | General |

NLG Series

GPT

  • 2018 | Improving Language Understanding by Generative Pre-Training | Alec Radford, et al. | arXiv | PDF

  • 2019 | Language Models are Unsupervised Multitask Learners | Alec Radford, et al. | arXiv | PDF

| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| GPT2 | 30亿语料 (3B-scale corpus) | Google Drive / Baidu Netdisk (code ffz6) | | Caspar ZHANG | gpt2-ml | General |
| GPT2 | 15亿语料 (1.5B-scale corpus) | Google Drive / Baidu Netdisk (code q9vr) | | Caspar ZHANG | gpt2-ml | General |
| CDial-GPT_LCCC-base | base | | huggingface | thu-coai | CDial-GPT | Chinese dialogue |
| CDial-GPT2_LCCC-base | base | | huggingface | thu-coai | CDial-GPT | Chinese dialogue |
| CDial-GPT_LCCC-large | large | | huggingface | thu-coai | CDial-GPT | Chinese dialogue |
| GPT2-dialogue | base | | Google Drive / Baidu Netdisk (code osi6) | yangjianxin1 | GPT2-chitchat | Chit-chat dialogue |
| GPT2-mmi | base | | Google Drive / Baidu Netdisk (code 1j88) | yangjianxin1 | GPT2-chitchat | Chit-chat dialogue |
| GPT2-散文模型 (prose) | base | | Google Drive / Baidu Netdisk (code fpyu) | Zeyao Du | GPT2-Chinese | Prose |
| GPT2-诗词模型 (poetry) | base | | Google Drive / Baidu Netdisk (code 7fev) | Zeyao Du | GPT2-Chinese | Classical poetry |
| GPT2-对联模型 (couplet) | base | | Google Drive / Baidu Netdisk (code i5n0) | Zeyao Du | GPT2-Chinese | Couplets |
| roformer-gpt | base (L12) | Baidu Netdisk (code 2nnn) | | ZhuiyiTechnology | github | General |
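For generation, GPT-2-style checkpoints in Hugging Face format can be driven through the standard `text-generation` pipeline. A hedged sketch; the repo id `uer/gpt2-chinese-cluecorpussmall` is only an illustrative stand-in, not one of the checkpoints listed above:

```python
# Hedged sketch: free-form generation with a GPT-2-style Chinese checkpoint.
# Assumption: "uer/gpt2-chinese-cluecorpussmall" stands in for whichever
# converted checkpoint from the table you actually use.
from transformers import pipeline

generator = pipeline("text-generation", model="uer/gpt2-chinese-cluecorpussmall")
out = generator("这是很久之前的事情了,", max_length=60, do_sample=True, top_p=0.9)
print(out[0]["generated_text"])
```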

GPT-3

  • 2019 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Zihang Dai, et al. | arXiv | PDF

  • 2020 | Language Models are Few-Shot Learners | Tom B. Brown, et al. | arXiv | PDF

| Model | Version | Intro | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Chinese-Transformer-XL | 2.9B parameters (GPT-3 style) | Project page | Model download | THUDM | github | General |

NEZHA-Gen

  • 2019 | NEZHA: Neural Contextualized Representation for Chinese Language Understanding | Junqiu Wei, et al. | arXiv | PDF

  • 2018 | Improving Language Understanding by Generative Pre-Training | Alec Radford, et al. | arXiv | PDF

| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| NEZHA-Gen | base | Google Drive / Baidu Netdisk (code rb5m) | | HUAWEI | github | General |
| NEZHA-Gen | base | Google Drive / Baidu Netdisk (code ytim) | | HUAWEI | github | Poetry |

CPM-Generate

  • 2020 | CPM: A Large-scale Generative Chinese Pre-trained Language Model | Zhengyan Zhang, et al. | arXiv | PDF
| Model | Version | Resources | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| CPM | 2.6B parameters | Project page | Model download | Tsinghua AI | github | General |

Notes:

- To convert the PyTorch weights to TensorFlow, see CPM-LM-TF2.
- To convert the PyTorch weights to PaddlePaddle, see CPM-Generate-Paddle.

T5

  • 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Colin Raffel, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| T5 | small | huggingface | huggingface | DBIIR @ RUC | UER | General |

T5-PEGASUS

  • 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Colin Raffel, et al. | arXiv | PDF

  • 2019 | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | Jingqing Zhang, et al. | arXiv | PDF

  • 2021 | T5 PEGASUS:开源一个中文生成式预训练模型 | 苏剑林. | spaces | Blog post

| Model | Version | Keras | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| T5 PEGASUS | base | Baidu Netdisk (code 3sfn) | | ZhuiyiTechnology | github | General |
| T5 PEGASUS | small | Baidu Netdisk (code qguk) | | ZhuiyiTechnology | github | General |

To convert the Keras weights to PyTorch, see t5-pegasus-pytorch.

Mengzi-T5

  • 2021 | Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese | Zhuosheng Zhang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Mengzi-T5 | base (L12) | | huggingface | Langboat | github | General |

PanGu-Alpha

  • 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | Wei Zeng, et al. | arXiv | PDF
| Model | Size | Resources | Download | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| PanGu-α 2.6B (盘古α-2.6B) | 2.6G | Project page | Model download | PCL-Platform.Intelligence | github | General |
| PanGu-α 13B (盘古α-13B) | 12G | Project page | Model download | PCL-Platform.Intelligence | github | General |
| PanGu-α 2.6B, PyTorch version | 2.6G | Project page | Model download | PCL-Platform.Intelligence | github | General |
| PanGu-α 13B, PyTorch version | 12G | Project page | Model download | PCL-Platform.Intelligence | github | General |

EVA

  • 2021 | EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training | Hao Zhou, et al. | arXiv | PDF
| Model | Version | Intro | Download | Author | Source | Domain | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EVA | 2.8B parameters | Project page | Model download | thu-coai | github | Chinese open-domain dialogue | Login required to download |
| EVA2.0-xLarge | xlarge | Project page | huggingface | thu-coai | github | Chinese open-domain dialogue | |
| EVA2.0-large | large | Project page | huggingface | thu-coai | github | Chinese open-domain dialogue | |
| EVA2.0-base | base | Project page | huggingface | thu-coai | github | Chinese open-domain dialogue | |

BART

  • 2019 | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | Mike Lewis, et al. | arxiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| BART-base | base | | huggingface | fastNLP | github | General Chinese |
| BART-large | large | | huggingface | fastNLP | github | General Chinese |
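A hedged generation sketch for the fastNLP Chinese BART, assuming the converted weights are published as `fnlp/bart-base-chinese` and, as in the fastNLP release, ship with a BERT-style vocabulary:

```python
# Hedged sketch: conditional generation with the fastNLP Chinese BART.
# Assumptions: repo id "fnlp/bart-base-chinese" and a BERT-style tokenizer,
# per the fastNLP release notes; adjust to the checkpoint you actually use.
from transformers import BartForConditionalGeneration, BertTokenizer

name = "fnlp/bart-base-chinese"
tokenizer = BertTokenizer.from_pretrained(name)
model = BartForConditionalGeneration.from_pretrained(name)

inputs = tokenizer("北京是[MASK]的首都", return_tensors="pt")
ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```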

Wenzhong (闻仲)

| Model | Version | Type | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wenzhong | large (L24) | GPT2 | | huggingface | IDEA-CCNL | github | General Chinese |

Yuyuan (余元)

| Model | Version | Type | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Yuyuan | large (L24) | GPT2 | | huggingface | IDEA-CCNL | github | Medical |

RWKV

| Model | Version | Type | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RWKV | base (L12) | GPT-2-like | | github | PENG Bo | github | Fiction |

NLU-NLG Series

UniLM

  • 2019 | Unified Language Model Pre-training for Natural Language Understanding and Generation | Li Dong, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Unilm | base | Baidu Netdisk (code tblr) | Baidu Netdisk (code etwf) | YunwenTechnology | github | General |

Simbert

  • 2020 | 鱼与熊掌兼得:融合检索和生成的SimBERT模型 | 苏剑林. | spaces | Blog post
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| SimBERT Tiny | tiny | Baidu Netdisk (code 1tp7) | | ZhuiyiTechnology | github | General |
| SimBERT Small | small | Baidu Netdisk (code nu67) | | ZhuiyiTechnology | github | General |
| SimBERT Base | base | Baidu Netdisk (code 6xhq) | | ZhuiyiTechnology | github | General |

RoFormer-sim

  • 2021 | SimBERTv2来了!融合检索和生成的RoFormer-Sim模型 | 苏剑林. | spaces | Blog post
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| roformer-sim | base (L12) | Baidu Netdisk (code 2cgz) | | ZhuiyiTechnology | github | General |
| roformer-sim | small (L6) | Baidu Netdisk (code h68q) | | ZhuiyiTechnology | github | General |
| roformer-sim-v2 | base (L12) | Baidu Netdisk (code w15n) | | ZhuiyiTechnology | github | General |

Zhouwenwang (周文王)

| Model | Version | Type | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Zhouwenwang | base (L12) | RoFormer | | huggingface | IDEA-CCNL | github | General Chinese |
| Zhouwenwang | large (L24) | RoFormer | | huggingface | IDEA-CCNL | github | General Chinese |

CPM-2

  • 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models | Zhengyan Zhang, et al. | arXiv | PDF
| Model | Version | Intro | Download | Author | Source | Domain | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CPM-2 | 11B parameters | Project page | Model download | BAAI-WuDao | github | General | Application required to download |
| CPM-2 | 10B parameters | Project page | Model download | BAAI-WuDao | github | Chinese-English | Application required to download |
| CPM-2 | 198B parameters | Project page | Model download | BAAI-WuDao | github | Chinese-English | Application required to download |

CPT

  • 2021 | CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation | Yunfan Shao, et al. | arxiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| CPT-base | base (L12) | | huggingface | fastNLP | github | General |
| CPT-large | large (L24) | | huggingface | fastNLP | github | General |

GLM

  • 2022 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling | Zhengxiao Du, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| GLM | large | | Link | THUDM | github | General |
| GLM | xxlarge | | Link | THUDM | github | General |
| GLM-130B | 130B | | Link | THUDM | github | General |

PLUG

  • 2019 | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | Wei Wang, et al. | arXiv | PDF
  • 2020 | PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation | Bin Bi, et al. | ACL| PDF
| Model | Version | Download | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- |
| PLUG | large (L24, 27B) | AliceMind (application required) | Alibaba | github | General |

OPD

  • 2022 | TBD | arXiv | PDF

| Model | Version | Intro | Download | Author | Source | Domain | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OPD | 6.3B | Project page | Model download | thu-coai | github | Chinese open-domain dialogue | Application required to download |

Multi-Modal

WenLan

  • 2021 | WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training | Yuqi Huo, et al. | arXiv | PDF
| Model | Version | Intro | Download | Author | Source | Domain | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BriVL (WenLan) | 1B parameters | Project page | Model download | BAAI-WuDao | github | General Chinese image-text | Login required to download |

CogView

  • 2021 | CogView: Mastering Text-to-Image Generation via Transformers | Ming Ding, et al. | arXiv | PDF
| Model | Version | Intro | Download | Author | Source | Domain | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CogView | 4B parameters | Project page | Model download | THUDM | github | Chinese multimodal text-to-image generation | Login required to download |

Zidong Taichu (紫东太初)

| Model | Version | Intro | Download | Author | Source | Domain | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 紫东太初-light_vision_text | | Project page | Model download | CASIA (Institute of Automation, CAS) | github | Chinese image-text | Image-text pre-trained model of the Zidong Taichu multimodal foundation model |
| 紫东太初-text (GPT) | 3.2B parameters | Project page | Baidu Netdisk (code nos5) | CASIA | github | General Chinese | Text pre-trained model of the Zidong Taichu multimodal foundation model |
| 紫东太初-vision | | Project page | Model download | CASIA | github | Vision | Vision pre-trained model of the Zidong Taichu multimodal foundation model |
| 紫东太初-speech | | Project page | Model download | CASIA | github | Speech | Multi-task speech detection and recognition model of the Zidong Taichu multimodal foundation model |

Mengzi-oscar

  • 2021 | Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese | Zhuosheng Zhang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Mengzi-oscar | base (L12) | | huggingface | Langboat | github | Chinese multimodal (image-text) |

R2D2

  • 2022 | Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework | Chunyu Xie, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Homepage | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R2D2 (ViT-L) | large | | Google | yuxie11 | github | zero | Chinese multimodal (image-text) |
| PRD2 (ViT-L) | large | | Google | yuxie11 | github | zero | Chinese multimodal (image-text) |

Chinese-CLIP

  • 2021 | Learning Transferable Visual Models From Natural Language Supervision | Alec Radford, et al. | arXiv | PDF
  • 2022 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | An Yang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| CN-CLIP (RN50) | 77M | | aliyuncs | OFA-Sys | github | Chinese multimodal (image-text) |
| CN-CLIP (ViT-B/16) | 188M | | aliyuncs | OFA-Sys | github | Chinese multimodal (image-text) |
| CN-CLIP (ViT-L/14) | 406M | | aliyuncs | OFA-Sys | github | Chinese multimodal (image-text) |
| CN-CLIP (ViT-L/14@336px) | 407M | | aliyuncs | OFA-Sys | github | Chinese multimodal (image-text) |
| CN-CLIP (ViT-H/14) | 958M | | aliyuncs | OFA-Sys | github | Chinese multimodal (image-text) |
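For image-text matching, recent `transformers` versions ship a Chinese-CLIP integration. A hedged sketch, assuming transformers >= 4.24 and the Hub repo id `OFA-Sys/chinese-clip-vit-base-patch16` (the official `cn_clip` package is an alternative route):

```python
# Hedged sketch: image-text similarity with Chinese-CLIP via transformers.
# Assumptions: repo id "OFA-Sys/chinese-clip-vit-base-patch16"; "demo.jpg" is
# any local image you supply.
import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

name = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(name)
processor = ChineseCLIPProcessor.from_pretrained(name)

image = Image.open("demo.jpg")
texts = ["一只猫", "一条狗", "一辆汽车"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, 3)
print(logits.softmax(dim=-1))
```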

TaiYi-CLIP

  • 2021 | Learning Transferable Visual Models From Natural Language Supervision | Alec Radford, et al. | arXiv | PDF
  • 2022 | Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence | Junjie Wang, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Taiyi-CLIP-Roberta-large-326M-Chinese | base | | huggingface | IDEA-CCNL | github | Chinese multimodal (image-text) |

AltCLIP

  • 2022 | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Chen, Zhongzhi, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| AltCLIP | 3.22G | | huggingface | FlagAI | github | Chinese multimodal (image-text) |

AltDiffusion

  • 2022 | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Chen, Zhongzhi, et al. | arXiv | PDF
  • 2022 | High-Resolution Image Synthesis With Latent Diffusion Models | Rombach, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| AltDiffusion | 8.0G | | huggingface | FlagAI | github | Chinese multimodal (image-text) |

Taiyi-Stable-Diffusion

  • 2022 | Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence | Junjie Wang, et al. | arXiv | PDF
  • 2022 | High-Resolution Image Synthesis With Latent Diffusion Models | Rombach, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| Taiyi-Stable-Diffusion | 1B | | huggingface | IDEA-CCNL | github | Chinese multimodal (image-text) |
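A hedged text-to-image sketch with `diffusers`, assuming the checkpoint is published as `IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1` and a CUDA GPU is available:

```python
# Hedged sketch: Chinese text-to-image with the Taiyi Stable Diffusion weights
# via diffusers. The repo id below is an assumption; requires a CUDA GPU for
# the float16 setting shown here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
image = pipe("飞流直下三千尺,油画").images[0]
image.save("taiyi_demo.png")
```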

wukong

  • 2022 | Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Jiaxi Gu, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| CLIP | | url | | HUAWEI | github | Chinese multimodal (image-text) |
| FILIP | | url | | HUAWEI | github | Chinese multimodal (image-text) |
| wukong | | url | | HUAWEI | github | Chinese multimodal (image-text) |

Table

SDCUP

  • 2021 | Improving Text-to-SQL with Schema Dependency Learning | Binyuan Hui, et al. | arXiv | PDF
| Model | Version | TensorFlow | PyTorch | Author | Source | Domain |
| --- | --- | --- | --- | --- | --- | --- |
| sdcup | base | Alibaba Cloud | | Alibaba | github | Chinese tables |
| sdcup | large | Alibaba Cloud | | Alibaba | github | Chinese tables |

Updates
