GPT2 with multilingual support: a 1.5B-parameter pretrained Chinese model


GPT2 for Multiple Languages


Simplified GPT2 training scripts (based on Grover, supporting TPUs)

Ported BERT tokenizer, compatible with multilingual corpora

1.5B GPT2 pretrained Chinese model (~15G corpus, 100k steps)

Batteries-included Colab demo

1.5B GPT2 pretrained Chinese model (~30G corpus, 220k steps)

Pretrained Model

| Size | Language | Corpus | Vocab | Link | SHA256 |
| --- | --- | --- | --- | --- | --- |
| 1.5B parameters | Chinese | ~30G | CLUE (8021 tokens) | | e698cc97a7f5f706f84f58bb469d614e51d3c0ce5f9ab9bf77e01e3fcb41d482 |
| 1.5B parameters | Chinese | ~15G | Bert (21128 tokens) | | 4a6e5124df8db7ac2bdd902e6191b807a6983a7f5d09fb10ce011f9a073b183e |
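The SHA256 values listed for each pretrained model can be used to check that a downloaded checkpoint is intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of_file` and `verify_checkpoint` are illustrative and not part of the repository, and it assumes the published hash covers the single downloaded file.

```python
import hashlib

# SHA256 published above for the ~30G-corpus model (CLUE vocab).
EXPECTED_SHA256_30G = (
    "e698cc97a7f5f706f84f58bb469d614e"
    "51d3c0ce5f9ab9bf77e01e3fcb41d482"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte
    checkpoint files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_hex):
    """Return True if the file's SHA256 matches the expected hex string.

    The expected value is stripped and lowercased so that pasted hashes
    with stray whitespace or uppercase letters still compare correctly.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

To use it, call `verify_checkpoint(path_to_downloaded_file, EXPECTED_SHA256_30G)` with the local path of the downloaded checkpoint. A `False` result suggests a corrupted or incomplete download.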

Trained for 220k steps on a Cloud TPU v3-256 Pod.

[Figure: loss.png, training loss]

Google Colab

With just two clicks (not counting the Colab authorization process), the 1.5B pretrained Chinese model demo is ready to go:

[Figure: demo.png, the Colab demo]

Train

Disclaimer

The contents of this repository are for academic research purposes only, and we do not provide any conclusive remarks.

Citation

@misc{GPT2-ML,
  author = {Zhibo Zhang},
  title = {GPT2-ML: GPT-2 for Multiple Languages},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/imcaspar/gpt2-ml}},
}

Reference

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)
