I. GPT1
Paper: Improving Language Understanding by Generative Pre-Training
Link: https://cdn.openai.com/research-covers/languageunsupervised/language_understanding_paper.pdf
Key takeaway: the generative (language modeling) loss and the fine-tuning loss act jointly, so the downstream task adapts itself to the pre-trained model.
(I) Introduction
- Abstract: generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In other words, pre-train generatively on large amounts of unlabeled text, then fine-tune on each specific downstream task.
- Motivation: the ability to learn effectively from raw text is crucial to alleviating the dependence on supervised learning in natural language processing, i.e. learning well from raw text reduces NLP's reliance on labeled data.
- Introduction: explore a semi-supervised approach that combines unsupervised pre-training with supervised fine-tuning, learning a universal representation that transfers with little adaptation.
(II) Methods
Framework: a two-stage training procedure: generative pre-training (unsupervised learning on a large text corpus) followed by discriminative fine-tuning.
- First Stage: unsupervised pre-training
  - Language modeling objective $L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)$, where k is the size of the context window (see the sketch below).
  - The model is decoder-only: a multi-layer Transformer with multi-head self-attention.
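
A minimal PyTorch sketch of this first-stage objective under toy assumptions (the model name `TinyDecoderLM` and all sizes such as `vocab_size=10000` and `d_model=128` are illustrative, not from the paper): a decoder-only Transformer is trained to predict each token from its left context.

```python
# Sketch of the first-stage language-modeling objective; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDecoderLM(nn.Module):
    """Decoder-only Transformer: encoder layers + a causal mask, no cross-attention."""
    def __init__(self, vocab_size=10000, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def hidden(self, tokens):                       # tokens: (batch, seq_len)
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: position i may only attend to positions <= i.
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                       device=tokens.device), diagonal=1)
        return self.blocks(h, mask=causal)          # (batch, seq_len, d_model)

    def forward(self, tokens):
        return self.lm_head(self.hidden(tokens))    # (batch, seq_len, vocab_size)

def lm_loss(model, tokens):
    """L1: negative log-likelihood of each token given its left context."""
    logits = model(tokens[:, :-1])                  # predict token i from tokens < i
    targets = tokens[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```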
- Second Stage: supervised fine-tuning
  - The fine-tuning stage predicts the label y via $P(y \mid x^1, \dots, x^m) = \mathrm{softmax}(h_l^m W_y)$, so the only new parameters introduced during fine-tuning are the matrix $W_y$ (plus embeddings for the delimiter tokens).
  - To improve the generalization of the supervised model and to accelerate convergence, the language modeling loss is kept as an auxiliary objective: $L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})$ (see the sketch below).
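
A sketch of this second-stage objective on top of the toy model above; `FineTunedClassifier`, `finetune_loss`, and the label count are assumptions, while the auxiliary-loss weight `lam=0.5` matches the λ used in the paper.

```python
# Fine-tuning sketch (assumed names): classification head W_y on the pre-trained model,
# plus the auxiliary LM loss, i.e. L3 = L2 + lam * L1.
import torch.nn as nn
import torch.nn.functional as F

class FineTunedClassifier(nn.Module):
    def __init__(self, pretrained_lm, d_model=128, n_labels=2):
        super().__init__()
        self.lm = pretrained_lm                     # TinyDecoderLM from the sketch above
        self.W_y = nn.Linear(d_model, n_labels)     # the only newly introduced weight matrix

    def forward(self, tokens):
        h = self.lm.hidden(tokens)                  # (batch, seq_len, d_model)
        return self.W_y(h[:, -1])                   # predict y from the final token's state

def finetune_loss(model, tokens, labels, lam=0.5):
    # L2: supervised classification loss, P(y | x^1..x^m) = softmax(h_l^m W_y)
    l2 = F.cross_entropy(model(tokens), labels)
    # L1: auxiliary language-modeling loss on the same fine-tuning inputs
    lm_logits = model.lm(tokens[:, :-1])
    l1 = F.cross_entropy(lm_logits.reshape(-1, lm_logits.size(-1)),
                         tokens[:, 1:].reshape(-1))
    return l2 + lam * l1                            # L3 = L2 + lam * L1
```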
Task-specific input transformations are needed: text classification can be fed in directly, but textual entailment and question answering require specific transformations, as shown in the figure (a sketch of these transformations follows the list).
(1) Textual entailment: concatenate the premise and hypothesis token sequences with a delimiter token ($) in between.
(2) Similarity: the two sentences have no inherent ordering, so the input contains both possible sentence orders; the two resulting sequence embeddings are added element-wise and then fed into the linear output layer.
(3) Question Answering and Commonsense Reasoning: concatenate the context and question with each possible answer, process each sequence independently, and normalize via a softmax to produce an output distribution over the possible answers.
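
A sketch of these three input transformations; the helper names and the numeric ids chosen for the start, delimiter, and extract tokens are placeholders (in the paper these special tokens get randomly initialized embeddings).

```python
# Input-transformation sketch; special-token ids are made-up placeholders.
START, DELIM, EXTRACT = 1, 2, 3

def entailment_input(premise_ids, hypothesis_ids):
    # (1) premise $ hypothesis as one sequence
    return [START] + premise_ids + [DELIM] + hypothesis_ids + [EXTRACT]

def similarity_inputs(sent_a_ids, sent_b_ids):
    # (2) No inherent ordering: produce both orderings; the two final hidden states
    # are added element-wise before the linear output layer.
    return [
        [START] + sent_a_ids + [DELIM] + sent_b_ids + [EXTRACT],
        [START] + sent_b_ids + [DELIM] + sent_a_ids + [EXTRACT],
    ]

def qa_inputs(context_ids, question_ids, answer_ids_list):
    # (3) One sequence per candidate answer; each is scored independently and the
    # per-candidate scores are normalized with a softmax.
    return [
        [START] + context_ids + question_ids + [DELIM] + ans + [EXTRACT]
        for ans in answer_ids_list
    ]
```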