EMNLP 2020 | Selected Papers on Neural Machine Translation, Explained

The AMiner platform, developed by the Department of Computer Science at Tsinghua University, is fully independently developed intellectual property of China. The platform covers more than 230 million academic papers and patents and a scientific knowledge graph of 136 million researchers, and provides professional research-intelligence services such as researcher evaluation, expert discovery, intelligent reviewer assignment, and academic maps. Since going online in 2006, the system has attracted visits from more than 10 million unique IPs across 220 countries and regions, with 2.3 million data downloads and over 11 million annual visits, making it an important data and experimental platform for research on academic search and social network mining.

AMiner platform: https://www.aminer.cn

Introduction: EMNLP, the Conference on Empirical Methods in Natural Language Processing, is a top-tier international conference in natural language processing organized by SIGDAT, a special interest group of the Association for Computational Linguistics (ACL), and one of the leading venues in the field. EMNLP 2020 reviewed 3,359 submissions and accepted 754, an acceptance rate of 22.4%.
According to the AMiner word cloud of EMNLP papers over the past five years, Neural Machine Translation has unquestionably been one of the hottest topics in recent years.

What makes neural machine translation such an enduringly attractive topic? Let's take a close look at six selected papers and explore the answer together.

More EMNLP 2020 papers can be found on the dedicated topic page:
https://www.aminer.cn/conf/emnlp2020/papers

1. Paper: Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation

Link: https://www.aminer.cn/pub/5f7d934191e011346ad27e11?conf=emnlp2020

Authors: Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael R. Lyu, Zhaopeng Tu

Summary:
Neural machine translation (NMT) is a data-hungry approach, which requires a large amount of data to train a well-performing NMT model (Koehn and Knowles, 2017).
The authors observe a high overlapping ratio of the most inactive and active examples across random seeds, model capacity, and model architectures (§4.2).
These results provide empirical support for the hypothesis that inactive examples exist in large-scale datasets, a property that is invariant to the specific NMT model and depends on the data distribution itself.
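
The notion of an "inactive" example can be made concrete with a small sketch: score every training pair under a trained NMT model and flag the lowest-scoring fraction. The `score_fn` interface and the toy scorer below are illustrative assumptions, not the authors' actual identification procedure.

```python
from typing import Callable, List, Tuple

def flag_inactive_examples(
    pairs: List[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    inactive_ratio: float = 0.1,
) -> List[Tuple[str, str]]:
    """Return the training pairs the model scores lowest.

    score_fn is assumed to return something like a length-normalized
    log-probability of the target given the source under a trained NMT
    model; the lowest-scoring fraction is treated as 'inactive'.
    """
    ranked = sorted(pairs, key=lambda p: score_fn(*p))
    cutoff = int(len(ranked) * inactive_ratio)
    return ranked[:cutoff]

# Toy usage with a stand-in scorer (a real run would query an NMT model).
if __name__ == "__main__":
    toy_pairs = [("guten tag", "good day"),
                 ("hallo welt", "hello world"),
                 ("a b c d", "x")]
    toy_score = lambda src, tgt: -abs(len(src.split()) - len(tgt.split()))  # placeholder only
    print(flag_inactive_examples(toy_pairs, toy_score, inactive_ratio=0.34))
```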

2. Paper: Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689732e1?conf=emnlp2020

Authors: Dana Ruiter, Josef van Genabith, Cristina España-Bonet

Summary:
Human learners, when faced with a new task, generally focus on simple examples before applying what they learned to more complex instances.
This approach to learning, based on sampling from a curriculum of increasing complexity, has been shown to be beneficial for machines and is referred to as curriculum learning (CL) (Bengio et al., 2009).
The authors' method resembles self-paced learning (SPL) (Kumar et al., 2010) in that it uses the emerging model hypothesis to select, online, the samples that fit into its space, as opposed to most curriculum learning approaches, which rely on judgements by the target hypothesis, i.e. an external teacher (Hacohen and Weinshall, 2019), to design the curriculum.
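
As a rough illustration of this self-paced flavor (a generic sketch, not the authors' exact extraction procedure), the code below keeps only the candidate pairs the current model already finds easy and widens the threshold over time; `loss_fn` and `train_step` are assumed hooks into an existing NMT training loop.

```python
from typing import Callable, Iterable, List, Tuple

def self_paced_select(
    candidates: Iterable[Tuple[str, str]],
    loss_fn: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Keep only the candidates the current model already finds 'easy'."""
    return [pair for pair in candidates if loss_fn(*pair) <= threshold]

def train_with_growing_curriculum(candidates, loss_fn, train_step,
                                  start=1.0, growth=1.5, epochs=5):
    """Self-paced outer loop: admit harder pairs as training progresses."""
    threshold = start
    for _ in range(epochs):
        for src, tgt in self_paced_select(candidates, loss_fn, threshold):
            train_step(src, tgt)   # update the emerging model
        threshold *= growth        # widen the curriculum for the next epoch
```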

3. Paper: Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

Link: https://www.aminer.cn/pub/5f69d19b91e011a2f02706fa?conf=emnlp2020

Authors: Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan

Summary:
Recent advances in deep learning have led to significant improvements in Neural Machine Translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2014; Luong et al., 2015; Vaswani et al., 2017).
Most prior work has focused on looking back at a fixed number of previous source or target sentences as the document-level context (Tu et al., 2018; Voita et al., 2018; Zhang et al., 2018; Miculicich et al., 2018; Voita et al., 2019a,b).
The authors elect to attend only to the context in the previous n sentences, where n is a small number that usually does not cover the entire document.
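
A simple way to picture this restriction (a generic illustration, not the paper's exact long-short term masking scheme) is a self-attention mask in which a token may only attend to tokens from its own sentence or from the previous n sentences:

```python
import numpy as np

def context_window_mask(sentence_ids, n_prev: int) -> np.ndarray:
    """Boolean self-attention mask limiting document-level context.

    sentence_ids[i] is the index of the sentence that token i belongs to.
    Token i may attend to token j only if j is in the same sentence or in
    one of the previous n_prev sentences (True = attention allowed).
    """
    ids = np.asarray(sentence_ids)
    diff = ids[:, None] - ids[None, :]        # sent(i) - sent(j)
    return (diff >= 0) & (diff <= n_prev)

# Example: six tokens spread over three sentences, one previous sentence of context.
print(context_window_mask([0, 0, 1, 1, 2, 2], n_prev=1).astype(int))
```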

4. Paper: CSP: Code-Switching Pre-training for Neural Machine Translation

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689731a1?conf=emnlp2020

Authors: Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju

Summary:
Neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Cho et al., 2014; Bahdanau et al., 2015), which typically follows the encoder-decoder framework, directly applies a single neural network to transform the source sentence into the target sentence.
Model-fusion approaches seek to incorporate the sentence representation produced by a pre-trained model, such as BERT, into the NMT model (Yang et al., 2019b; Clinchant et al., 2019; Weng et al., 2019; Zhu et al., 2020; Lewis et al., 2019; Liu et al., 2020).
These approaches can leverage publicly available pre-trained checkpoints, but they have to change the NMT model to fuse in the sentence embedding computed by the pre-trained model.
Approaches that instead pre-train and then directly fine-tune the NMT model itself are more production-ready, since they keep the size and structure of the model the same as standard NMT systems.
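
To give a flavor of what code-switched pre-training data can look like, here is a minimal sketch that swaps some source words for target-language translations drawn from a bilingual lexicon. The lexicon, the replacement probability, and the toy German-English entries are assumptions for illustration; the exact lexicon construction and replacement scheme in the paper may differ.

```python
import random
from typing import Dict, List

def code_switch(tokens: List[str], lexicon: Dict[str, str],
                replace_prob: float = 0.25, seed: int = 0) -> List[str]:
    """Randomly replace source words with target-language translations.

    lexicon is a hypothetical source->target word dictionary (e.g. induced
    from word alignments); words without an entry are kept unchanged.
    """
    rng = random.Random(seed)
    return [lexicon[tok] if tok in lexicon and rng.random() < replace_prob
            else tok
            for tok in tokens]

# Toy example with a made-up German->English lexicon.
print(code_switch("das haus ist klein".split(),
                  {"haus": "house", "klein": "small"},
                  replace_prob=1.0))
# -> ['das', 'house', 'ist', 'small']
```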

5. Paper: On the Sparsity of Neural Machine Translation Models

Link: https://www.aminer.cn/pub/5f7d966891e011346ad27e6f?conf=emnlp2020

Authors: Yong Wang, Longyue Wang, Victor O. K. Li, Zhaopeng Tu

Summary:
Modern neural machine translation (NMT) models (Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017) employ sufficient capacity to fit massive data well by using a large number of parameters, and suffer from the widely recognized issue of over-parameterization.
The low utilization efficiency of parameters wastes computational resources (Qiao et al., 2019) and can leave the model stuck in a local optimum (Han et al., 2017; Yu et al., 2019).
Recent work has shown that such spare parameters can be reused to maximize model utilization in computer vision tasks such as image classification (Han et al., 2017; Qiao et al., 2019).
The authors empirically study this efficiency issue for NMT models.
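
As a concrete, generic example of how low-utility parameters can be identified (plain magnitude pruning, not the authors' specific method), the sketch below zeroes out the smallest-magnitude fraction of a weight matrix; the freed slots are the "spare" capacity that could then be reused.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold             # keep only larger weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
print(magnitude_prune(w, sparsity=0.5))            # about half the entries become 0
```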

6. Paper: Shallow-to-Deep Training for Neural Machine Translation

Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689732f0?conf=emnlp2020

Authors: Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

Summary:
Neural models have led to state-of-the-art results in machine translation (MT) (Bahdanau et al., 2015; Sutskever et al., 2014).
Many of these systems can broadly be characterized as following a multi-layer encoder-decoder neural network design: both the encoder and the decoder learn representations of word sequences with a stack of layers (Vaswani et al., 2017; Wu et al., 2016; Gehring et al., 2017), building on an interesting line of work on improving such models.
The decoder shares a similar architecture with the encoder but has an additional encoder-decoder attention sub-layer to capture the mapping between the two languages.
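
A rough sketch of the shallow-to-deep idea suggested by the title (train a shallow stack first, then initialize a deeper one by copying its top layers and continue training) is given below; the layer objects and the growth step are placeholders, not the authors' implementation.

```python
import copy
from typing import List

def grow_encoder(layers: List[dict], num_new: int) -> List[dict]:
    """Deepen an encoder stack by duplicating its top `num_new` layers.

    In a shallow-to-deep schedule, a shallow model is trained first and
    its (already trained) top layers are copied on top to initialize a
    deeper model, which is then trained further.
    """
    copied = [copy.deepcopy(layer) for layer in layers[-num_new:]]
    return layers + copied

# Toy usage: pretend each "layer" is just a dict of parameters.
shallow = [{"name": f"enc_layer_{i}"} for i in range(6)]
deeper = grow_encoder(shallow, num_new=6)   # grow a 6-layer encoder to 12 layers
print(len(deeper))                          # -> 12
```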

To learn more about EMNLP 2020 papers, follow the official account or go directly to the EMNLP 2020 topic page, where the most cutting-edge research directions and the most complete paper data are waiting for you.
