These are the notes I took after reading the GPT-2 paper in full, recording the points that struck me as important. Most of it is excerpted directly from the paper; the rest is my own thoughts while reading, plus the places I didn't understand. Corrections and pointers are welcome!
Before diving into the GPT-2 paper, Zhang Junlin's discursive article 《效果惊人的GPT 2.0模型:它告诉了我们什么》 (https://zhuanlan.zhihu.com/p/56865533) makes a great primer; it is really well written!
Abstract
The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks.
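To make the "log-linear" claim concrete, here is a small sketch of my own (not from the paper): if performance scales log-linearly with capacity, then score ≈ a·log(params) + b for some slope a > 0. The four parameter counts below are the actual GPT-2 model sizes, but the scores are purely hypothetical numbers I made up for illustration.

```python
import math

# The four GPT-2 model sizes from the paper (parameter counts).
params = [117e6, 345e6, 762e6, 1542e6]
# Hypothetical task scores, NOT taken from the paper, just to illustrate the fit:
scores = [55.0, 60.1, 63.0, 65.2]

# Log-linear means score ≈ a * log(params) + b, i.e. linear in log(params).
xs = [math.log(p) for p in params]

# Least-squares slope/intercept by hand (no external libraries needed).
n = len(xs)
mx = sum(xs) / n
my = sum(scores) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, scores)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
print(f"score ≈ {a:.2f} * log(params) + {b:.2f}")
```

A positive slope a is what "increasing capacity improves performance in a log-linear fashion" means: each doubling of parameters buys roughly the same additive gain.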
Introduction
- Current systems are better characterized as narrow experts rather than competent generalists. We would like to move towards more general systems which can perform many tasks – eventually without the need to manually create and label a training dataset for each one.
- Multi-task learning
- The two most ambitious efforts to date have trained on a total of 10 and 17 (dataset, objective) pairs respectively.
- From a meta-learning perspective, each (dataset, objective) pair is a single training example sampled from the distribution of datasets and objectives.
- Cons: Multitask training may need just as many effective training pairs to realize its promise with current approaches. It will be very difficult to continue to scale the creation of datasets and the design of objectives to the degree that may be required to brute force our way there with current techniques.
Approach
- Learning to perform a single task can be expressed in a probabilistic framework as estimating a conditional distribution P(output | input). Since a general system should be able to perform many different tasks, even for the same input, it should condition not only on the input but also on the task to be performed. That is, it should model P(output | input, task).
- Task conditioning is often implemented at the architectural level, such as with task-specific encoder-decoder networks. However, language provides a flexible way to specify tasks, inputs, and outputs all as a sequence of symbols. (For example, a translation training example can be written as the sequence (translate to French, english text, french text). My note: since training this model is precisely teaching it knowledge of language, it should naturally also be able to understand task instructions given in language.)
- Language modeling is also able to, in principle, learn the tasks of McCann et al. (2018) without the need for explicit supervision of which symbols are the outputs to be predicted. The problem instead becomes whether we are able to, in practice, optimize the unsupervised objective to convergence.
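The "tasks as symbol sequences" idea above can be sketched in a few lines. This is my own illustration, not code from the paper; the separator tokens and the example strings are all made up, the point is only that the task description, input, and output become one ordinary token stream, so the model effectively sees P(output | input, task) as plain text.

```python
# Minimal sketch (my own, not from the paper): serialize any (task, input, output)
# triple into a single text sequence for a language model to train on.
def as_sequence(task: str, input_text: str, output_text: str) -> str:
    # All three parts become ordinary tokens in one stream; the "=>" separator
    # is an arbitrary illustrative choice.
    return f"{task}: {input_text} => {output_text}"

# The paper's translation example, written as one training sequence:
print(as_sequence("translate to french", "hello world", "bonjour le monde"))
# A different task, same format -- no task-specific architecture needed:
print(as_sequence("answer the question", "who wrote Hamlet?", "Shakespeare"))
```

At inference time, one would feed only the prefix ("translate to french: hello world =>") and let the model continue it, which is exactly the zero-shot task transfer setting the paper is after.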