Distill-BERT: Using BERT for Smarter Text Generation

The field of natural language processing is now in an age where large-scale pretrained models are the first thing to try for almost any new task. Models like BERT, RoBERTa and ALBERT are so large and have been trained on so much data that they can generalize their pretrained knowledge to understand almost any downstream task you apply them to. But that's all they can do: understand. If you wanted to answer a question that wasn't multiple choice, write a story or an essay, or produce anything else that required free-form writing, you'd be out of luck.
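
To make the contrast concrete, here's what the "understanding" side looks like in code. This is a minimal sketch, assuming the HuggingFace transformers library and the stock bert-base-uncased checkpoint (my choices for illustration, not something the post prescribes): BERT maps text to a fixed set of scores, not to new text.

```python
# A minimal sketch of the "understanding" use case, assuming the HuggingFace
# transformers library and the stock bert-base-uncased checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The classification head here is freshly initialized; in practice you would
# fine-tune it on labeled data for your downstream task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one score per class, not free-form text
print(logits.softmax(dim=-1))
```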

Now don't get me wrong: just because BERT-like models can't write stories doesn't mean that there aren't models out there that can. Enter the Sequence to Sequence (Seq2Seq) model. When we write a story, we write the next word, sentence or even paragraph based on what we've written so far. This is exactly what Seq2Seq models are designed to do. They predict the most likely next word based on all the words they've seen so far, by modeling the text as a time series, i.e. the order of the previous words matters.
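
To see what "predict the most likely next word based on everything so far" means in practice, here's a minimal greedy-decoding sketch. The transformers library and the gpt2 checkpoint are assumptions for illustration; real systems use smarter decoding strategies (beam search, sampling), but the loop structure is the same:

```python
# A minimal sketch of autoregressive generation: pick the most likely next
# token given everything generated so far, append it, and repeat.
# Assumes the HuggingFace transformers library and the stock gpt2 checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
for _ in range(20):  # generate 20 tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits
    # The logits at the last position score every candidate next token,
    # conditioned on the whole sequence so far -- order matters.
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0]))
```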

Seq2Seq models have been around for a while, and several variants are used for text generation tasks like summarization and translating one language to another. The exploration of Seq2Seq models has culminated in models like GPT-2 and GPT-3, which can complete news snippets, stories, essays and even investment strategies, all from a few sentences of context! Fair warning though: not all of these generated pieces of text make very much sense when you read them. Probability distributions over words can only take you so far.
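
For a quick taste of this kind of completion, here's a minimal sketch using the high-level transformers pipeline with GPT-2 (GPT-3 is only available through an API, so gpt2 stands in here; the prompt and sampling settings are arbitrary choices of mine):

```python
# A minimal sketch of prompt completion with GPT-2 via the high-level
# pipeline API, assuming the HuggingFace transformers library.
from transformers import pipeline, set_seed

set_seed(42)  # make the sampling reproducible
generator = pipeline("text-generation", model="gpt2")

# Sampling from the model's probability distribution over next words; the
# output is fluent but, as noted above, not guaranteed to make sense.
out = generator(
    "In a quiet village by the sea,",
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
)
print(out[0]["generated_text"])
```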
