About #Today's arXiv Picks
This is a column from "AI Academic Frontier": each day, the editors select high-quality papers from arXiv and deliver them to readers.
Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding
Comment: Long paper at EMNLP 2021
Link: http://arxiv.org/abs/2109.01583
Abstract
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy and thus impede the performance of SLU models. In this paper, we focus on mitigating noise in augmented data. We develop a denoising training approach: multiple models are trained with data produced by various augmentation methods, and those models provide supervision signals to each other. The experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.
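The core idea of the abstract — models trained on different augmented sets supervising each other — can be sketched as a per-model loss that mixes cross-entropy on that model's own (noisy) labels with agreement against a peer model's predictions. The `alpha` weight and the KL-based agreement term below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def denoising_loss(logits_a, logits_b, noisy_labels, alpha=0.5):
    """Loss for model A: cross-entropy on its own augmented (noisy) labels,
    plus a KL term pulling it toward peer model B's predictions.
    alpha and the KL form are assumptions for illustration only."""
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    n = len(noisy_labels)
    # supervised term on A's own augmented data
    ce = -np.log(p_a[np.arange(n), noisy_labels] + 1e-12).mean()
    # agreement term: KL(p_b || p_a), the peer's supervision signal
    kl = (p_b * (np.log(p_b + 1e-12) - np.log(p_a + 1e-12))).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * kl
```

When the two models agree, the KL term vanishes and only the (down-weighted) noisy cross-entropy remains; disagreement on a noisy example inflates the loss, which is the mutual-supervision signal.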
Contrastive Representation Learning for Exemplar-Guided Paraphrase Generation
Comment: Findings of EMNLP 2021
Link: http://arxiv.org/abs/2109.01484
Abstract
Exemplar-Guided Paraphrase Generation (EGPG) aims to generate a target sentence which conforms to the style of the given exemplar while encapsulating the content information of the source sentence. In this paper, we propose a new method with the goal of learning a better representation of the style and the content. This method is mainly motivated by the recent success of contrastive learning, which has demonstrated its power in unsupervised feature extraction tasks. The idea is to design two contrastive losses with respect to the content and the style by considering two problem characteristics during training: the target sentence shares the same content with the source sentence, and the target sentence shares the same style with the exemplar. These two contrastive losses are incorporated into the general encoder-decoder paradigm. Experiments on two datasets, namely QQP-Pos and ParaNMT, demonstrate the effectiveness of our proposed contrastive losses.
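The two losses the abstract describes can be instantiated with a generic in-batch InfoNCE objective: one call pairs target content embeddings with source embeddings, the other pairs target style embeddings with exemplar embeddings. The InfoNCE form, the temperature, and the use of other batch rows as negatives are standard contrastive-learning assumptions, not necessarily the paper's exact losses:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """In-batch InfoNCE: anchors[i] should match positives[i]
    against all other rows of `positives` (the in-batch negatives)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = a @ p.T / temperature            # (batch, batch) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)  # stabilize log-sum-exp
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.diag(logp).mean()           # NLL of the matching diagonal

# Hypothetical usage with separately encoded representations:
#   content_loss = info_nce(target_content_emb, source_content_emb)
#   style_loss   = info_nce(target_style_emb, exemplar_style_emb)
#   total = seq2seq_loss + content_loss + style_loss
```

The loss is near zero when each target embedding is far closer to its own source/exemplar than to the rest of the batch, and grows as the pairing degrades.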
Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.01396
Abstract
Unlike traditional statistical MT, which decomposes the translation task into distinct, separately learned components, neural machine translation uses a single neural network to model the entire translation process. Despite neural machine translation being the de facto standard, it is still not clear how NMT models acquire different competences over the course of training, and how this mirrors the different models in traditional SMT. In this work, we look at the competences related to three core SMT components and find that during training, NMT first focuses on learning target-side language modeling, then improves translation quality, approaching word-by-word translation, and finally learns more complicated reordering patterns. We show that this behavior holds for several models and language pairs. Additionally, we explain how such an understanding of the training process can be useful in practice and, as an example, show how it can be used to improve vanilla non-autoregressive neural machine translation by guiding teacher model selection.
Detecting Speaker Personas from Conversational Texts
Comment: Accepted by EMNLP 2021
Link: http://arxiv.org/abs/2109.01330
Abstract
Personas are useful for dialogue response prediction. However, the personas used in current studies are pre-defined and hard to obtain before a conversation. To tackle this issue, we study a new task, named Speaker Persona Detection (SPD), which aims to detect speaker personas based on plain conversational text. In this task, a best-matched persona is searched out from candidates given the conversational text. This is a many-to-many semantic matching task because both contexts and personas in SPD are composed of multiple sentences. The long-term dependency and the dynamic redundancy among these sentences increase the difficulty of this task. We build a dataset for SPD, dubbed Persona Match on Persona-Chat (PMPC). Furthermore, we evaluate several baseline models and propose utterance-to-profile (U2P) matching networks for this task. The U2P models operate at a fine granularity, treating both contexts and personas as sets of multiple sequences. Each sequence pair is scored, and an interpretable overall score is obtained for a context-persona pair through aggregation. Evaluation results show that the U2P models significantly outperform their baseline counterparts.
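The fine-grained matching the abstract describes — score every (context utterance, persona profile sentence) pair, then aggregate into one context-persona score — can be sketched as below. Cosine similarity for pair scoring and max-then-mean aggregation are illustrative assumptions; the paper's networks are learned, not fixed functions:

```python
import numpy as np

def u2p_score(context_embs, persona_embs):
    """Utterance-to-profile matching sketch.
    context_embs: (num_utterances, dim) embeddings of context utterances.
    persona_embs: (num_profile_sentences, dim) embeddings of persona sentences.
    Scores every pair, then aggregates: each utterance takes its best-matching
    profile sentence (max), and utterance scores are averaged (mean)."""
    c = context_embs / np.linalg.norm(context_embs, axis=1, keepdims=True)
    p = persona_embs / np.linalg.norm(persona_embs, axis=1, keepdims=True)
    pair_scores = c @ p.T                  # (num_utterances, num_profile_sentences)
    return pair_scores.max(axis=1).mean()  # interpretable overall score
```

The intermediate `pair_scores` matrix is what makes the overall score interpretable: each entry says which profile sentence supports which utterance before aggregation collapses it to a scalar.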