Distillation
The field of natural language processing has entered an era in which large-scale pretrained models are the first thing to try for almost any new task. Models like BERT, RoBERTa, and ALBERT are so large, and have been trained on so much data, that they can generalize their pretrained knowledge to understand almost any downstream task you use them for. But that's all they can do: understand. If you wanted to answer a question that wasn't multiple choice, write a story or an essay, or produce anything that required free-form writing, you'd be out of luck.
Now don’t get me wrong: just because BERT-like models can’t write stories doesn’t mean there aren’t models out there that can. Introducing the Sequence to Sequence (Seq2Seq) model. When we write a story, we write the next word, sentence, or even paragraph based on what we’ve written so far. This is exactly what Seq2Seq models are designed to do. They predict the most likely next word based on all the words they’ve seen so far, modeling text as a time series, i.e., one in which the order of the previous words matters.
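The next-word idea above can be sketched with a toy example. Here is a minimal, self-contained sketch of autoregressive generation in pure Python: a hypothetical bigram table stands in for the learned model (real Seq2Seq models learn far richer conditional distributions from data), and we repeatedly sample the next word conditioned on the previous one.

```python
import random

# Hypothetical bigram "language model" for illustration: each word maps to
# candidate next words with probabilities. A real Seq2Seq model learns these
# conditional distributions from large text corpora.
BIGRAMS = {
    "<s>": [("once", 0.6), ("the", 0.4)],
    "once": [("upon", 1.0)],
    "upon": [("a", 1.0)],
    "a": [("time", 0.9), ("story", 0.1)],
    "the": [("end", 1.0)],
}


def next_word(prev, rng):
    """Sample the next word from the distribution conditioned on the previous word."""
    words, probs = zip(*BIGRAMS[prev])
    return rng.choices(words, weights=probs, k=1)[0]


def generate(max_len=10, seed=0):
    """Generate words one at a time, each conditioned on what came before."""
    rng = random.Random(seed)
    out, prev = [], "<s>"
    for _ in range(max_len):
        if prev not in BIGRAMS:  # no continuation known: stop generating
            break
        prev = next_word(prev, rng)
        out.append(prev)
    return " ".join(out)


print(generate())
```

Real models condition on the entire preceding sequence rather than just the last word, and they sample from distributions computed by a neural network, but the generation loop has this same shape: predict, sample, append, repeat.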
Seq2Seq models have been around for a while, and several variants are used for text generation tasks like summarization and translating one language to another. The exploration of Seq2Seq models has culminated in the development of models like GPT-2 and GPT-3, which can complete news snippets, stories, essays, and even investment strategies, all from a few sentences of context! Fair warning, though: not all of these generated pieces of text make very much sense when you read them. Probability distributions over words can only take you so far.