I. GPT1
Paper: Improving Language Understanding by Generative Pre-Training
Link: https://cdn.openai.com/research-covers/languageunsupervised/language_understanding_paper.pdf
Key takeaway: the generative (language modeling) loss and the fine-tuning loss act jointly, so the downstream task adapts itself to the pre-trained model.
(I) Introduction
- Abstract: generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In other words, pre-train generatively on large amounts of unlabeled text, then fine-tune on each specific downstream task.
- Motivation: the ability to learn effectively from raw text is crucial to alleviating the dependence on supervised learning in natural language processing, i.e. learning well from raw text reduces NLP's reliance on labeled data.
- Introduction: explore a semi-supervised approach that combines unsupervised pre-training with supervised fine-tuning, learning a universal representation that transfers with little adaptation.
(II) Methods
Framework: a two-stage training procedure: generative pre-training (unsupervised learning on a large text corpus) followed by discriminative fine-tuning.
- First Stage: unsupervised pre-training
  - Language modeling objective $L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)$, where k is the size of the context window (see the sketch below).
  - The model is decoder-only: a multi-layer Transformer with multi-head self-attention.
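
A minimal PyTorch sketch of this first-stage objective under toy assumptions (the model name `TinyDecoderLM` and all sizes such as `vocab_size=10000` and `d_model=128` are illustrative, not from the paper): a decoder-only Transformer is trained to predict each token from its left context.

```python
# Sketch of the first-stage language-modeling objective; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDecoderLM(nn.Module):
    """Decoder-only Transformer: encoder layers + a causal mask, no cross-attention."""
    def __init__(self, vocab_size=10000, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def hidden(self, tokens):                       # tokens: (batch, seq_len)
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: position i may only attend to positions <= i.
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                       device=tokens.device), diagonal=1)
        return self.blocks(h, mask=causal)          # (batch, seq_len, d_model)

    def forward(self, tokens):
        return self.lm_head(self.hidden(tokens))    # (batch, seq_len, vocab_size)

def lm_loss(model, tokens):
    """L1: negative log-likelihood of each token given its left context."""
    logits = model(tokens[:, :-1])                  # predict token i from tokens < i
    targets = tokens[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```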
- Second Stage: supervised fine-tuning
  - The fine-tuning stage predicts the label y via $P(y \mid x^1, \dots, x^m) = \mathrm{softmax}(h_l^m W_y)$, so the only new parameters introduced during fine-tuning are the matrix $W_y$ (plus embeddings for the delimiter tokens).
  - To improve the generalization of the supervised model and to accelerate convergence, the language modeling loss is kept as an auxiliary objective: $L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})$ (see the sketch below).
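
A sketch of this second-stage objective on top of the toy model above; `FineTunedClassifier`, `finetune_loss`, and the label count are assumptions, while the auxiliary-loss weight `lam=0.5` matches the λ used in the paper.

```python
# Fine-tuning sketch (assumed names): classification head W_y on the pre-trained model,
# plus the auxiliary LM loss, i.e. L3 = L2 + lam * L1.
import torch.nn as nn
import torch.nn.functional as F

class FineTunedClassifier(nn.Module):
    def __init__(self, pretrained_lm, d_model=128, n_labels=2):
        super().__init__()
        self.lm = pretrained_lm                     # TinyDecoderLM from the sketch above
        self.W_y = nn.Linear(d_model, n_labels)     # the only newly introduced weight matrix

    def forward(self, tokens):
        h = self.lm.hidden(tokens)                  # (batch, seq_len, d_model)
        return self.W_y(h[:, -1])                   # predict y from the final token's state

def finetune_loss(model, tokens, labels, lam=0.5):
    # L2: supervised classification loss, P(y | x^1..x^m) = softmax(h_l^m W_y)
    l2 = F.cross_entropy(model(tokens), labels)
    # L1: auxiliary language-modeling loss on the same fine-tuning inputs
    lm_logits = model.lm(tokens[:, :-1])
    l1 = F.cross_entropy(lm_logits.reshape(-1, lm_logits.size(-1)),
                         tokens[:, 1:].reshape(-1))
    return l2 + lam * l1                            # L3 = L2 + lam * L1
```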
Task-specific input transformations are needed: text classification can be fed in directly, but textual entailment and question answering require specific transformations, as shown in the figure (a sketch of these transformations follows the list).
(1) Textual entailment: concatenate the premise and hypothesis token sequences with a delimiter token ($) in between.
(2) Similarity: the two sentences have no inherent ordering, so the input contains both possible sentence orders; the two resulting sequence embeddings are added element-wise and then fed into the linear output layer.
(3) Question Answering and Commonsense Reasoning: concatenate the context and question with each possible answer, process each sequence independently, and normalize via a softmax to produce an output distribution over the possible answers.
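
A sketch of these three input transformations; the helper names and the numeric ids chosen for the start, delimiter, and extract tokens are placeholders (in the paper these special tokens get randomly initialized embeddings).

```python
# Input-transformation sketch; special-token ids are made-up placeholders.
START, DELIM, EXTRACT = 1, 2, 3

def entailment_input(premise_ids, hypothesis_ids):
    # (1) premise $ hypothesis as one sequence
    return [START] + premise_ids + [DELIM] + hypothesis_ids + [EXTRACT]

def similarity_inputs(sent_a_ids, sent_b_ids):
    # (2) No inherent ordering: produce both orderings; the two final hidden states
    # are added element-wise before the linear output layer.
    return [
        [START] + sent_a_ids + [DELIM] + sent_b_ids + [EXTRACT],
        [START] + sent_b_ids + [DELIM] + sent_a_ids + [EXTRACT],
    ]

def qa_inputs(context_ids, question_ids, answer_ids_list):
    # (3) One sequence per candidate answer; each is scored independently and the
    # per-candidate scores are normalized with a softmax.
    return [
        [START] + context_ids + question_ids + [DELIM] + ans + [EXTRACT]
        for ans in answer_ids_list
    ]
```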