Language Models are Unsupervised Multitask Learners: Paper Notes

These notes record the points I found important after reading the GPT-2 paper in full. Most of the material is quoted directly from the paper; my own thoughts while reading, and the parts I did not understand, are added as separate remarks. Corrections and pointers are welcome!

Before reading the GPT-2 paper, Zhang Junlin's overview article 《效果惊人的GPT 2.0模型:它告诉了我们什么》 (https://zhuanlan.zhihu.com/p/56865533) works well as a primer; it is very well written!

Abstract

The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks.

Introduction

  • Current systems are better characterized as narrow experts rather than competent generalists. We would like to move towards more general systems which can perform many tasks – eventually without the need to manually create and label a training dataset for each one.
  • Multi-task learning
    • The two most ambitious efforts to date have trained on a total of 10 and 17 (dataset, objective) pairs respectively.
    • From a meta-learning perspective, each (dataset, objective) pair is a single training example sampled from the distribution of datasets and objectives.
    • Cons: Multitask training may need just as many effective training pairs to realize its promise with current approaches. It will be very difficult to continue to scale the creation of datasets and the design of objectives to the degree that may be required to brute force our way there with current techniques.

Approach

  • Learning to perform a single task can be expressed in a probabilistic framework as estimating a conditional distribution P(output | input). Since a general system should be able to perform many different tasks, even for the same input, it should condition not only on the input but also on the task to be performed. That is, it should model P(output | input, task).
  • Task conditioning is often implemented at an architectural level, such as task-specific encoder-decoder networks. However, language provides a flexible way to specify tasks, inputs and outputs all as a sequence of symbols. For example, a translation training example can be written as the sequence (translate to french, english text, french text). My note: since training this model is precisely about learning knowledge of language, it should naturally also be able to understand task instructions given in language. A toy sketch of this serialization follows after this list.
  • Language modeling is also able to, in principle, learn the tasks of McCann et al. (2018) without the need for explicit supervision of which symbols are the outputs to be predicted. The problem instead becomes whether we are able to, in practice, optimize the unsupervised objective to convergence.
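To make the "tasks as text" idea concrete, here is a minimal Python sketch of my own (not code from the paper): each (task, input, output) triple is flattened into one string, and the ordinary next-token objective is applied to it, so the conditioning in p(output | input, task) is carried entirely by the text prefix. The serialization format, separators, and example sentences below are illustrative assumptions, not the actual WebText format.

```python
# A minimal sketch (not from the paper) of task conditioning expressed purely
# as text: every supervised example is serialized into a single token
# sequence, and a standard next-token language-modeling objective is applied.
# The task names, separators, and examples are illustrative assumptions.

def serialize(task: str, source: str, target: str) -> str:
    """Write a (task, input, output) triple as one flat string, analogous to
    the paper's (translate to french, english text, french text) example."""
    return f"{task} : {source} => {target}"

examples = [
    serialize("translate english to french", "hello world", "bonjour le monde"),
    serialize("answer the question", "who wrote Hamlet?", "Shakespeare"),
    serialize("summarize", "a very long article ...", "a short summary"),
]

# With this framing, p(output | input, task) is modeled by the same
# next-token predictor as p(text): the "task" is just the prefix of the
# sequence, so no task-specific heads or encoders are required.
def lm_training_pairs(text: str):
    """Yield (context, next_token) pairs for a toy character-level LM objective."""
    tokens = list(text)
    for i in range(1, len(tokens)):
        yield tokens[:i], tokens[i]

for context, nxt in list(lm_training_pairs(examples[0]))[:3]:
    print("".join(context), "->", nxt)
```

The point of the sketch is only that all three tasks share one training signal: the model never sees which symbols are "the outputs", it simply predicts the next token everywhere, which is exactly the setting the bullet above describes.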
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值