Raki's paper-reading notes: LAMOL: LANGUAGE MODELING FOR LIFELONG LANGUAGE LEARNING


Abstract & Introduction & Related Work

  • Research task
    lifelong learning
  • Existing methods and related work
  • Challenges
    • Most existing methods work on images or games rather than language
  • Key ideas
    • Propose a lifelong learning method based on a language model
    • Replay pseudo-samples of previous tasks, without requiring extra memory or model capacity
  • Experimental conclusions
    • Results show that LAMOL prevents catastrophic forgetting without any sign of intransigence, and that a single model can learn five very different language tasks in sequence
    • State of the art (SOTA)
    • In addition, task-specific tokens are added during pseudo-sample generation so that the generated samples are distributed evenly across all previous tasks; this extension stabilizes LLL and is especially useful when training on a large number of tasks
    • The paper also analyzes how different numbers of pseudo-samples affect LAMOL's final performance, with and without task-specific tokens

The core idea: train a single language model that can also generate pseudo-samples of earlier tasks, without requiring any extra space.

LAMOL

DATA FORMATTING

Inspired by the protocol used by decaNLP (McCann et al., 2018), the samples from the datasets are framed into a SQuAD-like scheme consisting of a context, a question, and an answer. The LM is simultaneously a QA model, but the data format depends on the training objective. When trained as a QA model, the LM learns to decode the answer after reading the context and the question. When trained as an LM, it learns to decode all three parts token by token.

In addition to the context, question, and answer, three special tokens are added:

  • ANS: inserted between the question and the answer. Since the context and question are known at inference time, decoding starts after feeding ANS
  • EOS: the last token of every sample; decoding stops when EOS is generated
  • GEN: the first token during pseudo-sample generation; decoding starts after feeding GEN
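
Below is a minimal sketch (not the authors' code) of how one sample could be serialized under the two objectives described above. The ANS/EOS/GEN tokens and the context-question-answer layout follow the paper; the helper name `format_sample` and the plain string concatenation are assumptions for illustration.

```python
def format_sample(context, question, answer, gen_token="GEN"):
    """Build the two token sequences used for QA training and LM training."""
    # QA objective: the model reads [context, question, ANS] and learns
    # to decode only the answer (loss is computed on the answer span).
    qa_input = f"{context} {question} ANS"
    qa_target = f"{answer} EOS"

    # LM objective: the model learns to decode the whole sample,
    # starting from the generation token (GEN or a task-specific token).
    lm_sequence = f"{gen_token} {context} {question} ANS {answer} EOS"
    return qa_input, qa_target, lm_sequence

# Toy SQuAD-style example.
qa_in, qa_out, lm_seq = format_sample(
    context="LAMOL is trained on five language tasks.",
    question="How many tasks?",
    answer="five",
)
print(qa_in)   # "... How many tasks? ANS"
print(lm_seq)  # "GEN ... ANS five EOS"
```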


TRAINING

A coefficient $\gamma$ controls the number of pseudo-samples relative to the new task's data, and generated samples whose confidence is too low are discarded.
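
A rough sketch of this pseudo-sample step (my own illustration, with a hypothetical `lm_generate` helper standing in for the LM's decoding routine): before training task $i$, the current model generates about $\gamma|T_i|$ samples starting from the GEN token, and malformed or low-confidence generations are dropped. The "exactly one ANS" check follows the data format above; the average-log-probability threshold is an assumed confidence test.

```python
def build_pseudo_set(lm_generate, new_task_size, gamma=0.2,
                     min_avg_log_prob=-3.0, max_tries=100_000):
    """Generate roughly gamma * |T_i| pseudo-samples with the current LM,
    dropping generations that are malformed or not confident enough."""
    target = int(gamma * new_task_size)
    pseudo, tries = [], 0
    while len(pseudo) < target and tries < max_tries:
        tries += 1
        # lm_generate is assumed to return (decoded_text, avg_log_prob).
        text, avg_log_prob = lm_generate(prompt="GEN")
        # A well-formed pseudo-sample has exactly one ANS separating the
        # question from the answer, and ends with EOS.
        well_formed = text.count("ANS") == 1 and text.rstrip().endswith("EOS")
        confident = avg_log_prob >= min_avg_log_prob  # assumed confidence filter
        if well_formed and confident:
            pseudo.append(text)
    return pseudo
```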

The QA loss and the LM loss are optimized jointly.
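Written out (my transcription of the objective the omitted figure showed), the combined objective sums the QA loss and a $\lambda$-weighted LM loss:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{QA} + \lambda\,\mathcal{L}_{LM}$$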

TASK-SPECIFIC TOKENS

When many tasks are trained, using the same GEN token for all of them becomes problematic, because the share of older tasks in the training mix shrinks roughly exponentially. For example, with $\gamma = 0.01$, the first task's share is about 1% while training the second task, but only about 0.01% while training the third. This is clearly harmful for LLL. To mitigate the problem, the GEN token can be replaced by a task-specific token for each task, telling the model to generate pseudo-samples belonging to that particular task.

Under this setting, every previous task receives the same number of pseudo-samples out of the $\gamma|T_i|$ total.

Note that because a specific token is added for each task, the LM's vocabulary size and embedding weights grow slightly as more tasks are trained.
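
To make the dilution argument concrete, here is a small back-of-the-envelope sketch (my own illustration, not from the paper) comparing the first task's share of the training mix with a shared GEN token versus with task-specific tokens that split the pseudo-samples evenly:

```python
gamma = 0.01  # pseudo-sample ratio from the example above

# Shared GEN token: pseudo-samples mirror the mix the model last saw,
# so task 1's share shrinks roughly geometrically with each new task.
share_task1 = 1.0                 # while training task 1 itself
for task_id in range(2, 6):       # training tasks 2..5
    share_task1 = share_task1 * gamma / (1 + gamma)
    print(f"shared GEN, training task {task_id}: "
          f"task 1 share ≈ {share_task1:.6%}")   # ~1%, ~0.01%, ...

# Task-specific tokens: the gamma * |T_i| pseudo-samples are split evenly
# over the i-1 previous tasks, so task 1's share no longer decays
# exponentially.
for task_id in range(2, 6):
    share = gamma / (task_id - 1) / (1 + gamma)
    print(f"task-specific tokens, training task {task_id}: "
          f"task 1 share ≈ {share:.4%}")
```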

EXPERIMENTAL RESULTS


CONCLUSION

We propose LAMOL, a simple yet effective LLL method based on language modeling. A single LM achieves LLL without extra model components and without keeping old examples. Moreover, any pretrained LM can be used, leveraging large amounts of unlabeled text to improve LLL. Finally, more tasks can be added whenever needed.

Remark

The method is surprisingly simple; it could probably serve as a base method. As expected of an ICLR paper, it is really solid.
