EMNLP 2020 | Selected Language Model Papers Explained

The AMiner platform, developed by the Department of Computer Science and Technology at Tsinghua University with fully independent Chinese intellectual property, contains a scientific knowledge graph of more than 230 million academic papers/patents and 136 million researchers, and provides professional science-intelligence services such as researcher evaluation, expert finding, intelligent reviewer assignment, and academic maps. Since going online in 2006, it has attracted visits from more than 10 million unique IPs across 220 countries and regions, with 2.3 million data downloads and over 11 million visits per year, making it an important data and experimentation platform for research on academic search and social network mining.

AMiner platform: https://www.aminer.cn

Introduction: EMNLP, the Conference on Empirical Methods in Natural Language Processing, organized by SIGDAT, a special interest group of the Association for Computational Linguistics (ACL), is a top-tier international conference in natural language processing and is rated as a Class A venue in the field.
According to official EMNLP 2020 figures, 3,359 papers were reviewed this year and 754 were accepted, an acceptance rate of 22.4%.

On the homepage of the EMNLP 2020 conference topic on the AMiner platform, AMiner generated a word cloud of hot topics from this year's paper data; Language Model is one of the more popular topics this year.

EMNLP 2020 topic page: https://www.aminer.cn/conf/emnlp2020/homepage

Today we present seven must-read papers related to Language Models. For more EMNLP 2020 papers on this topic, see the topic page linked above.

1. Paper: How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Link: https://www.aminer.cn/pub/5e4faa9f3a55ac969512bc33?conf=emnlp2020
Authors: Adam Roberts, Colin Raffel, Noam Shazeer
Summary:
Deep neural language models that have been pre-trained on unlabeled text have proven to be extremely performant when fine-tuned on downstream Natural Language Processing (NLP) tasks (Devlin et al, 2018; Yang et al, 2019; Liu et al, 2019; Lan et al, 2019; Raffel et al, 2019).
The authors take a different approach, evaluating the capability of language models on the practical task of open-domain question answering: they fine-tune the model to answer questions without access to any external knowledge or context.
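To make the closed-book setting concrete, below is a minimal sketch (not the authors' released code) of querying a seq2seq language model with only the question as input. It assumes the Hugging Face transformers library, and the small public "t5-small" checkpoint stands in for the much larger fine-tuned models used in the paper, so answer quality will differ.

```python
# Closed-book QA sketch: the model sees only the question, with no retrieved
# passage, so any correct answer must come from knowledge stored in its parameters.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Only the question is encoded; no external context is appended.
inputs = tokenizer("question: Who wrote On the Origin of Species?", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```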

2. Paper: On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689732a0?conf=emnlp2020
Authors: Jonathan Pilault, Raymond Li, Sandeep Subramanian, Chris Pal
Summary:
Language models (LMs) are trained to estimate the joint probability of an arbitrary sequence of words or characters using a large corpus of text.
Markovian assumptions and the curse of dimensionality make it harder for n-gram LMs to model long-range dependencies and to learn smooth functions that capture similarities between words in the vocabulary.
This has led to a preference for recurrent or feed-forward neural language models (Bengio et al, 2003; Mikolov et al, 2010) in recent years, due to their ability to learn expressive conditional probability distributions (Merity et al, 2017; Radford et al, 2019).
RNNs are limited by their sequential nature, making them 1) difficult to optimize and learn for long sequences with long-range dependencies (Hochreiter, 1998; Pascanu et al, 2013), and 2) hard to parallelize on modern hardware like GPUs, limiting their scalability.
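For reference, the joint probability mentioned above factorizes by the chain rule, and an n-gram LM approximates each conditional with a fixed-length Markov window; this is the textbook formulation, not an equation quoted from the paper.

```latex
p(w_1, \dots, w_T)
  = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})
  \approx \prod_{t=1}^{T} p(w_t \mid w_{t-n+1}, \dots, w_{t-1})
```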

3. Paper: Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Link: https://www.aminer.cn/pub/5f02e52b9e795e22854aeb37?conf=emnlp2020
Authors: Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
Summary:
Notable works include ELMo (Peters et al, 2018), BERT (Devlin et al, 2019), Transformer-XL (Dai et al, 2019), ALBERT (Lan et al, 2019), StructBERT (Wang et al, 2019b), and many others.
These models revolutionize the learning paradigms of various NLP tasks.
State-of-the-art language models mostly utilize self-supervised tasks during pre-training (for instance, masked language modeling and sentence prediction in BERT (Devlin et al, 2019)).
This unavoidably creates a learning gap between pre-training and fine-tuning.
For a group of similar tasks, conventional practice requires the parameters of all task-specific models to be initialized from the same pre-trained language model, ignoring how the learning processes in different domains are correlated and mutually reinforced.
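As a rough illustration of that conventional setup (not the paper's meta fine-tuning method), the sketch below initializes one classifier per domain from the same pre-trained checkpoint; each copy would then be fine-tuned in isolation. The domain names are hypothetical and the Hugging Face transformers library is assumed.

```python
# Conventional multi-domain baseline: every domain-specific model starts from the
# same pre-trained checkpoint and is fine-tuned independently, so nothing learned
# in one domain is shared with the others.
from transformers import BertForSequenceClassification

domains = ["laptop_reviews", "restaurant_reviews", "hotel_reviews"]  # hypothetical domains
models = {
    name: BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    for name in domains
}
# ...each models[name] would then be fine-tuned only on that domain's labeled data...
```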

4. Paper: Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
Link: https://www.aminer.cn/pub/5e4129b13a55ac9f8f89e019?conf=emnlp2020
Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho
Summary:
Neural sequence models trained with maximum likelihood estimation (MLE) have become a standard approach to modeling sequences in a variety of natural language applications such as machine translation (Bahdanau et al, 2015), dialogue modeling (Vinyals et al, 2015), and language modeling (Radford et al, 2018).
Despite this success, MLE-trained neural sequence models have been shown to exhibit issues such as length bias (Sountsov & Sarawagi, 2016; Stahlberg & Byrne, 2019) and degenerate repetition (Holtzman et al, 2019).
These issues are suspected to be related to the maximum likelihood objective’s local normalization, which results in a discrepancy between the learned model’s distribution and the distribution induced by the decoding algorithm used to generate sequences (Lafferty et al, 2001; Andor et al, 2016).
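For readers who want the objective written out, this is the standard per-token MLE loss for a locally normalized conditional sequence model (a textbook formulation, not an equation taken from the paper); decoding then draws sequences from an algorithm-induced distribution over the same model, which is where the discrepancy above arises.

```latex
\mathcal{L}_{\mathrm{MLE}}(\theta)
  = - \sum_{(x,\, y) \in \mathcal{D}} \sum_{t=1}^{|y|}
      \log p_\theta\bigl(y_t \mid y_{<t},\, x\bigr)
```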

5. Paper: DagoBERT: Generating Derivational Morphology with a Pretrained Language Model
Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689731a2?conf=emnlp2020
Authors: Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
Summary:
The question of what pretrained language models (PLMs) learn about language has attracted a lot of attention in NLP recently, with a focus on syntax (e.g., Goldberg, 2019) and semantics (e.g., Ethayarajh, 2019).
It is much less clear what PLMs learn about other aspects of language.
The authors investigate what PLMs learn about derivational morphology, taking BERT as the example PLM.

6. Paper: Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
Link: https://www.aminer.cn/pub/5eb78919da5629cf244303f4?conf=emnlp2020
Authors: Bill Yuchen Lin, Seyeon Lee, Rahul Khanna, Xiang Ren
Summary:
Pre-trained language models (PTLMs), such as BERT (Devlin et al, 2019), have yielded state-of-the-art performance on many natural language processing tasks.
Given PTLMs' cited ability to create general yet useful text representations, an investigation into their ability to encode commonsense knowledge into those representations is warranted; commonsense knowledge is often required for a full understanding of language.
Motivated by this and similar inquiries, probing tasks for analyzing PTLMs’ behaviors have been created.
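The sketch below shows the general flavor of such a probe (it is not the authors' exact NumerSense setup): a masked language model is asked to fill in a number word. It assumes the Hugging Face transformers library and the public "bert-base-uncased" checkpoint.

```python
# Query a masked LM for a numerical commonsense fact and inspect its top guesses.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("A bird usually has [MASK] legs."):
    print(prediction["token_str"], round(prediction["score"], 3))
```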

7. Paper: Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models
Link: https://www.aminer.cn/pub/5f7fe6d80205f07f68973254?conf=emnlp2020
Authors: Isabel Papadimitriou, Dan Jurafsky
Summary:
The authors train LSTM models on data with varying degrees of language-like structure, and evaluate their performance on natural language.
The authors freeze the LSTM parameters and fine-tune the word embeddings on the evaluation language.
This lets them see whether the training data induces language-like structure in the recurrent parameters of the LSTM, while removing vocabulary-level confounders.
By assessing whether representations are useful across languages, the authors examine the attributes of grammar.
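A minimal PyTorch sketch of that transfer setup follows: the recurrent weights are frozen and only the embedding table receives gradients. The dimensions and optimizer here are illustrative choices, not the authors' configuration.

```python
# Freeze the LSTM's recurrent parameters; only the word embeddings for the
# evaluation language can be updated.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 300, 512
embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

for param in lstm.parameters():
    param.requires_grad = False  # recurrent weights stay fixed

optimizer = torch.optim.Adam(embedding.parameters(), lr=1e-3)  # tune embeddings only

batch = torch.randint(0, vocab_size, (8, 20))  # dummy batch of token ids
hidden_states, _ = lstm(embedding(batch))  # gradients can flow only into the embeddings
```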

For a closer look at EMNLP 2020 papers, follow our official account or go straight to the EMNLP 2020 topic page via the link above, where the latest research directions and the most complete paper data are waiting for you.
