Pretraining GPT-2: Train GPT-2 in Your Own Language


We all know that modern-day Natural Language Processing (NLP) has progressed by leaps and bounds in the past couple of years, following the development of attention networks and transformers. This progress paved the way for a plethora of new algorithms achieving state-of-the-art (SOTA) results on the various tasks of NLP.


OpenAI has been one of the leaders in providing its own language models (the most recently released being GPT-3), trained on a huge corpus of internet data. Since GPT-3 is a recent phenomenon, available only in English at the moment, and accessible only through an API provided by OpenAI, we shift our focus to its earlier version, GPT-2. To learn about the internal nuts and bolts of GPT-2, I'd suggest you go through this link. For a deeper dive into attention and transformers, here are some excellent links:


GPT-2, too, was released only for English, which makes things difficult for anyone trying to generate text in a different language.


So why not train your own GPT-2 model for text generation in your favourite language? That is exactly what we are going to do. So, without further ado, let us jump in.


For the demo, I have considered a non-Latin-alphabet script (Bengali here), because why not! I have used Huggingface's implementation of the model.

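To give a feel for what "Huggingface's implementation" looks like in code, here is a minimal sketch of instantiating a fresh GPT-2 with the `transformers` library. All hyperparameters below are illustrative assumptions (the "small" GPT-2 configuration), not values prescribed by this article:

```python
# A minimal sketch: building a randomly initialized GPT-2 with
# Huggingface's transformers library, ready to be pretrained from
# scratch. All hyperparameters are illustrative assumptions
# (the "small" GPT-2 configuration), not values fixed by this article.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50_000,  # assumption: matches the tokenizer you train on your corpus
    n_positions=1024,   # maximum sequence length the model can attend over
    n_embd=768,         # hidden size of the small GPT-2 variant
    n_layer=12,         # number of transformer blocks
    n_head=12,          # attention heads per block
)
model = GPT2LMHeadModel(config)  # random weights, no pretrained English knowledge
print(f"Trainable parameters: {model.num_parameters():,}")
```

Starting from a random initialization like this, rather than the released English checkpoint, is what lets the model learn a non-Latin script such as Bengali from your own corpus.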

1. Gathering the data

Gathering good-quality data is one of the most important stages, as all data scientists would agree. So, we are going to assume that you already have a folder containing your cleaned text data.
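To make that assumed starting point concrete, here is a small sketch that collects the corpus from such a folder. The folder name `bn_corpus` and the one-document-per-file layout are hypothetical; point it at wherever your data lives:

```python
# A small sketch of the assumed starting point: a folder of cleaned
# UTF-8 .txt files, one document per file. The folder name "bn_corpus"
# is hypothetical.
from pathlib import Path

def load_corpus(folder: str = "bn_corpus") -> list[str]:
    """Read every .txt file under `folder` and return one string per file."""
    return [
        path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    ]

documents = load_corpus()
print(f"Loaded {len(documents)} documents")
```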
