Today's arXiv Picks | 15 New EMNLP 2021 Papers

About #今日arXiv精选 (Today's arXiv Picks)

This is a column run by 「AI 学术前沿」(AI Academic Frontier): each day the editors hand-pick high-quality papers from arXiv and deliver them to readers.

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

Comment: Accepted to EMNLP 2021 (main conference)

Link: http://arxiv.org/abs/2109.03228

Abstract

Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation. In this paper, we propose two new metrics, label loyalty and probability loyalty, that measure how closely a compressed model (i.e., student) mimics the original model (i.e., teacher). We also explore the effect of compression with regard to robustness under adversarial attacks. We benchmark quantization, pruning, knowledge distillation and progressive module replacing with loyalty and robustness. By combining multiple compression techniques, we provide a practical strategy to achieve better accuracy, loyalty and robustness.
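
To make the two metrics concrete, here is a minimal sketch (not the authors' reference implementation) of how label loyalty and probability loyalty could be computed from teacher and student output distributions; using the Jensen-Shannon distance for probability loyalty is an illustrative assumption rather than the paper's exact definition.

```python
# Hedged sketch of the two loyalty metrics; the JS-distance form of
# probability loyalty is an assumption for illustration.
import numpy as np
from scipy.spatial.distance import jensenshannon

def label_loyalty(teacher_probs: np.ndarray, student_probs: np.ndarray) -> float:
    """Fraction of examples where the student predicts the teacher's label."""
    return float(np.mean(teacher_probs.argmax(-1) == student_probs.argmax(-1)))

def probability_loyalty(teacher_probs: np.ndarray, student_probs: np.ndarray) -> float:
    """Distributional agreement, here 1 minus the mean Jensen-Shannon distance (base 2)."""
    dists = [jensenshannon(t, s, base=2) for t, s in zip(teacher_probs, student_probs)]
    return float(1.0 - np.mean(dists))

# Toy usage with random softmax-like outputs.
rng = np.random.default_rng(0)
teacher = rng.dirichlet(np.ones(3), size=16)
student = rng.dirichlet(np.ones(3), size=16)
print(label_loyalty(teacher, student), probability_loyalty(teacher, student))
```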

Unsupervised Conversation Disentanglement through Co-Training

Comment: Accepted to EMNLP 2021 main conference

Link: http://arxiv.org/abs/2109.03199

Abstract

Conversation disentanglement aims to separate intermingled messages into detached sessions, which is a fundamental task in understanding multi-party conversations. Existing work on conversation disentanglement relies heavily upon human-annotated datasets, which are expensive to obtain in practice. In this work, we explore training a conversation disentanglement model without referencing any human annotations. Our method is built upon a deep co-training algorithm, which consists of two neural networks: a message-pair classifier and a session classifier. The former retrieves local relations between two messages, while the latter assigns a message to a session by capturing context-aware information. Both networks are initialized with pseudo data built from an unannotated corpus. During the deep co-training process, we use the session classifier as a reinforcement learning component to learn a session-assigning policy by maximizing the local rewards given by the message-pair classifier. For the message-pair classifier, we enrich its training data by retrieving high-confidence message pairs from the disentangled sessions predicted by the session classifier. Experimental results on the large Movie Dialogue Dataset demonstrate that our proposed approach achieves competitive performance compared to previous supervised methods. Further experiments show that the predicted disentangled conversations can improve performance on the downstream task of multi-party response selection.

When differential privacy meets NLP: The devil is in the detail

Comment: Camera-ready for EMNLP 2021

Link: http://arxiv.org/abs/2109.03175

Abstract

Differential privacy provides a formal approach to the privacy of individuals. Applications of differential privacy in various scenarios, such as protecting users' original utterances, must satisfy certain mathematical properties. Our contribution is a formal analysis of ADePT, a differentially private auto-encoder for text rewriting (Krishna et al., 2021). ADePT achieves promising results on downstream tasks while providing tight privacy guarantees. Our proof reveals that ADePT is not differentially private, thus rendering the experimental results unsubstantiated. We also quantify the impact of the error in its private mechanism, showing that the true sensitivity is higher by at least a factor of 6 in an optimistic case of a very small encoder dimension, and that the proportion of utterances that are not privatized could easily reach 100% of the entire dataset. Our intention is neither to criticize the authors nor the peer-reviewing process, but rather to point out that if differential privacy applications in NLP rely on formal guarantees, these should be outlined in full and put under detailed scrutiny.
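
For readers less familiar with the mechanics at issue, the block below restates the textbook Laplace-mechanism requirement that the analysis hinges on: the noise scale must be calibrated to the true L1 sensitivity, so underestimating the sensitivity (here by at least a factor of 6) voids the claimed guarantee. This is the standard definition, not material taken from the paper itself.

```latex
% Standard Laplace mechanism, textbook form (not taken from the ADePT paper):
% M(x) = f(x) + Lap(b) is eps-differentially private only if b is calibrated
% to the true L1 sensitivity of f.
\[
  \Delta_f = \max_{x \sim x'} \lVert f(x) - f(x') \rVert_1,
  \qquad
  b \ge \frac{\Delta_f}{\varepsilon}
  \;\Longrightarrow\;
  \Pr[M(x) \in S] \le e^{\varepsilon}\, \Pr[M(x') \in S].
\]
% If the true \Delta_f is at least 6x the value used to set b, the mechanism
% no longer satisfies the stated eps-DP guarantee.
```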

Aspect-Controllable Opinion Summarization

Comment: EMNLP 2021

Link: http://arxiv.org/abs/2109.03171

Abstract

Recent work on opinion summarization produces general summaries based on a set of input reviews and the popularity of opinions expressed in them. In this paper, we propose an approach that allows the generation of customized summaries based on aspect queries (e.g., describing the location and room of a hotel). Using a review corpus, we create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers which are induced by a multi-instance learning model that predicts the aspects of a document at different levels of granularity. We fine-tune a pretrained model using our synthetic dataset and generate aspect-specific summaries by modifying the aspect controllers. Experiments on two benchmarks show that our model outperforms the previous state of the art and generates personalized summaries by controlling the number of aspects discussed in them.

How much pretraining data do language models need to learn syntax?

Comment: To be published in proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)

Link: http://arxiv.org/abs/2109.03160

Abstract

Transformer-based pretrained language models achieve outstanding results on many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer better performance across the different syntactic phenomena and come at a higher financial and environmental cost.

Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

Comment: EMNLP 2021 main conference

Link: http://arxiv.org/abs/2109.03158

Abstract

An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation has been extensively studied, e.g., gender-based variation, far less is known about how to characterize individual styles due to their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts and through an analogy-based probing task, showing that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we provide a description of idiolects through measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.

PAUSE: Positive and Annealed Unlabeled Sentence Embedding

Comment: Accepted by EMNLP 2021 main conference as a long paper (12 pages and 2 figures). For source code, see https://github.com/EQTPartners/pause

Link: http://arxiv.org/abs/2109.03155

Abstract

Sentence embedding refers to a set of effective and versatile techniques for converting raw text into numerical vector representations that can be used in a wide range of natural language processing (NLP) applications. The majority of these techniques are either supervised or unsupervised. Compared to the unsupervised methods, the supervised ones make fewer assumptions about optimization objectives and usually achieve better results. However, training requires a large number of labeled sentence pairs, which are not available in many industrial scenarios. To that end, we propose a generic and end-to-end approach -- PAUSE (Positive and Annealed Unlabeled Sentence Embedding), capable of learning high-quality sentence embeddings from a partially labeled dataset. We experimentally show that PAUSE achieves, and sometimes surpasses, state-of-the-art results using only a small fraction of labeled sentence pairs on various benchmark tasks. When applied to a real industrial use case where labeled samples are scarce, PAUSE allows us to extend our dataset without the liability of extensive manual annotation work.
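
As a rough illustration of the positive-unlabeled setup the name alludes to, the sketch below scores labeled positive pairs against unlabeled pairs whose soft-negative contribution is annealed in over training; the linear warm-up schedule and class-prior discount are assumptions for illustration, not PAUSE's published objective.

```python
# Generic positive-unlabeled objective with an annealed unlabeled term.
# Illustrative sketch only; not the paper's exact loss.
import torch
import torch.nn.functional as F

def pu_annealed_loss(pos_logits: torch.Tensor,
                     unl_logits: torch.Tensor,
                     step: int,
                     total_steps: int,
                     pos_prior: float = 0.3) -> torch.Tensor:
    """pos_logits / unl_logits: similarity logits for labeled-positive and
    unlabeled sentence pairs. Unlabeled pairs act as soft negatives, discounted
    by an assumed positive-class prior and ramped in over training."""
    anneal = min(1.0, step / max(1, total_steps // 2))  # linear warm-up (assumption)
    pos_loss = F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
    unl_loss = F.binary_cross_entropy_with_logits(unl_logits, torch.zeros_like(unl_logits))
    return pos_loss + anneal * (1.0 - pos_prior) * unl_loss

# Toy usage with random pair logits.
loss = pu_annealed_loss(torch.randn(8), torch.randn(32), step=100, total_steps=1000)
print(loss.item())
```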

Learning grounded word meaning representations on similarity graphs

Comment: Accepted to EMNLP 2021 (long paper)

Link: http://arxiv.org/abs/2109.03084

Abstract

This paper introduces a novel approach to learning visually grounded meaning representations of words as low-dimensional node embeddings on an underlying graph hierarchy. The lower level of the hierarchy models modality-specific word representations through dedicated but communicating graphs, while the higher level puts these representations together on a single graph to learn a representation jointly from both modalities. The topology of each graph models similarity relations among words, and is estimated jointly with the graph embedding. The assumption underlying this model is that words sharing similar meaning correspond to communities in an underlying similarity graph in a low-dimensional space. We name this model Hierarchical Multi-Modal Similarity Graph Embedding (HM-SGE). Experimental results validate the ability of HM-SGE to simulate human similarity judgements and concept categorization, outperforming the state of the art.

GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation

Comment: 14 pages, 5 figures. Accepted at EMNLP 2021

Link: http://arxiv.org/abs/2109.03079

Abstract

Practical dialogue systems require robust methods of detecting out-of-scope (OOS) utterances to avoid conversational breakdowns and related failure modes. Directly training a model with labeled OOS examples yields reasonable performance, but obtaining such data is a resource-intensive process. To tackle this limited-data problem, previous methods focus on better modeling the distribution of in-scope (INS) examples. We introduce GOLD as an orthogonal technique that augments existing data to train better OOS detectors operating in low-data regimes. GOLD generates pseudo-labeled candidates using samples from an auxiliary dataset and keeps only the most beneficial candidates for training through a novel filtering mechanism. In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics, achieving relative gains of 52.4%, 48.9% and 50.3% against median baseline performance. We also analyze the unique properties of OOS data to identify key factors for optimally applying our proposed method.
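
The augment-and-filter idea can be pictured with the sketch below: candidate utterances drawn from an auxiliary corpus are kept only if their nearest labeled neighbors in an embedding space are mostly known OOS examples. The random embeddings and the majority-vote filter are stand-ins for illustration, not GOLD's exact mechanism.

```python
# Illustrative augment-and-filter loop for OOS data augmentation.
# Random vectors stand in for sentence embeddings; the majority-vote filter
# is an assumption, not the paper's published filtering mechanism.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
labeled_emb = rng.normal(size=(200, 64))      # embeddings of labeled utterances
labeled_is_oos = rng.random(200) < 0.1        # small seed set of known OOS examples
aux_emb = rng.normal(size=(1000, 64))         # embeddings of auxiliary-corpus candidates

nn = NearestNeighbors(n_neighbors=5).fit(labeled_emb)
_, idx = nn.kneighbors(aux_emb)               # 5 nearest labeled neighbors per candidate

# Keep candidates whose labeled neighbors are mostly OOS; pseudo-label them OOS.
votes = labeled_is_oos[idx].mean(axis=1)
kept = np.where(votes >= 0.6)[0]
print(f"kept {len(kept)} of {len(aux_emb)} candidates as pseudo-OOS training data")
```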

Generate & Rank: A Multi-task Framework for Math Word Problems

Comment: Findings of EMNLP 2021

Link: http://arxiv.org/abs/2109.03034

Abstract

Math word problem (MWP) solving is a challenging and critical task in natural language processing. Many recent studies formalize MWP as a generation task and have adopted sequence-to-sequence models to transform problem descriptions into mathematical expressions. However, mathematical expressions are prone to minor mistakes, while the generation objective does not explicitly handle such mistakes. To address this limitation, we devise a new ranking task for MWP and propose Generate & Rank, a multi-task framework based on a generative pre-trained language model. By joint training with generation and ranking, the model learns from its own mistakes and is able to distinguish between correct and incorrect expressions. Meanwhile, we perform tree-based disturbance specially designed for MWP and an online update to boost the ranker. We demonstrate the effectiveness of our proposed method on benchmark datasets, and the results show that our method consistently outperforms baselines on all datasets. In particular, on the classical Math23k benchmark, our method is 7 points (78.4% → 85.4%) higher than the state of the art.
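
As a rough picture of how incorrect candidates for the ranker could be constructed, the toy sketch below perturbs a correct expression by swapping a single operator; the paper's tree-based disturbance operates on expression trees and is more elaborate, so treat this purely as an illustration.

```python
# Toy construction of negative ranking candidates by perturbing a correct
# expression; the paper's tree-based disturbance is more elaborate.
import random

def disturb(expr: str, rng: random.Random) -> str:
    """Return an incorrect variant of `expr` by replacing one operator."""
    tokens = list(expr)
    op_positions = [i for i, t in enumerate(tokens) if t in "+-*/"]
    i = rng.choice(op_positions)
    tokens[i] = rng.choice([op for op in "+-*/" if op != tokens[i]])
    return "".join(tokens)

rng = random.Random(0)
correct = "3*(5+2)"
negatives = {disturb(correct, rng) for _ in range(5)}
print(correct, "->", sorted(negatives))
# The ranker is then trained to score the correct expression above each negative.
```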

Don't Go Far Off: An Empirical Study on Neural Poetry Translation

Comment: EMNLP 2021 Camera ready

Link: http://arxiv.org/abs/2109.02972

Abstract

Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-sourced parallel poetic corpora, and to the intrinsic complexities involved in preserving the semantics, style, and figurative nature of poetry. We present an empirical investigation of poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-multilingual models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.

Exploiting Reasoning Chains for Multi-hop Science Question Answering

Comment: 14 pages, Findings of EMNLP 2021

Link: http://arxiv.org/abs/2109.02905

Abstract

We propose a novel Chain Guided Retriever-reader (CGR) framework to model the reasoning chain for multi-hop science question answering. Our framework is capable of performing explainable reasoning without the need for any corpus-specific annotations, such as ground-truth reasoning chains or human-annotated entity mentions. Specifically, we first generate reasoning chains from a semantic graph constructed from the Abstract Meaning Representation of retrieved evidence facts. A chain-aware loss, concerning both local and global chain information, is also designed to enable the generated chains to serve as distant supervision signals for training the retriever, where reinforcement learning is also adopted to maximize the utility of the reasoning chains. Our framework allows the retriever to capture step-by-step clues of the entire reasoning process, which is not only shown to be effective on two challenging multi-hop science QA tasks, namely OpenBookQA and ARC-Challenge, but also favors explainability.

Datasets: A Community Library for Natural Language Processing

Comment: EMNLP Demo 2021

Link: http://arxiv.org/abs/2109.02846

Abstract

The scale, variety, and quantity of publicly available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.
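
A brief usage sketch of the library (installable via `pip install datasets`); the GLUE/MRPC dataset below is only an illustrative choice.

```python
# Minimal usage of the `datasets` library described above.
from datasets import load_dataset

dataset = load_dataset("glue", "mrpc")   # downloads, caches, and memory-maps the corpus
print(dataset)                           # DatasetDict with train/validation/test splits
print(dataset["train"][0])               # a single example as a plain Python dict

# .map() transformations are cached and run over memory-mapped Arrow tables,
# so the same code behaves similarly for small datasets and large corpora.
upper = dataset["train"].map(lambda ex: {"sentence1": ex["sentence1"].upper()})
print(upper[0]["sentence1"])
```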

Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models

Comment: EMNLP 2021

Link: http://arxiv.org/abs/2109.02837

Abstract

Commonsense reasoning benchmarks have been largely solved by fine-tuning language models. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget the knowledge gained during pre-training. Recent works instead propose only lightweight model updates, as models may already possess useful knowledge from past experience, but a challenge remains in understanding which parts of a model, and to what extent, should be refined for a given task. In this paper, we investigate what models learn from commonsense reasoning datasets. We measure the impact of three different adaptation methods on the generalization and accuracy of models. Our experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.

Eliminating Sentiment Bias for Aspect-Level Sentiment Classification with Unsupervised Opinion Extraction

Comment: 11 pages, Findings of EMNLP 2021, 7-11 November 2021

Link: http://arxiv.org/abs/2109.02403

Abstract

Aspect-level sentiment classification (ALSC) aims at identifying the sentiment polarity of a specified aspect in a sentence. ALSC is a practical setting in aspect-based sentiment analysis because no opinion term labeling is needed, but it fails to interpret why a sentiment polarity is derived for the aspect. To address this problem, recent works fine-tune pre-trained Transformer encoders for ALSC to extract an aspect-centric dependency tree that can locate the opinion words. However, the induced opinion words only provide an intuitive cue far below human-level interpretability. Moreover, the pre-trained encoder tends to internalize an aspect's intrinsic sentiment, causing sentiment bias and thus affecting model performance. In this paper, we propose a span-based anti-bias aspect representation learning framework. It first eliminates the sentiment bias in the aspect embedding by adversarial learning against aspects' prior sentiment. Then, it aligns the distilled opinion candidates with the aspect by span-based dependency modeling to highlight the interpretable opinion terms. Our method achieves new state-of-the-art performance on five benchmarks, with the capability of unsupervised opinion extraction.
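
One common way to realize "adversarial learning against aspects' prior sentiment" is a gradient-reversal layer between the aspect representation and a sentiment discriminator; the snippet below is a generic sketch of that mechanism, an assumption about the mechanics rather than the paper's exact architecture.

```python
# Generic gradient-reversal layer (GRL) sketch for adversarial debiasing;
# not the paper's exact architecture.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float = 1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity on the forward pass, reversed (scaled) gradient on the backward pass.
        return -ctx.lambd * grad_output, None

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)

# Usage: aspect representation -> gradient reversal -> sentiment discriminator.
# Training the discriminator to predict the aspect's prior sentiment pushes the
# encoder to strip that sentiment signal out of the aspect representation.
aspect_repr = torch.randn(4, 16, requires_grad=True)
discriminator = torch.nn.Linear(16, 3)                  # 3 sentiment classes (assumed)
logits = discriminator(grad_reverse(aspect_repr))
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 1]))
loss.backward()                                         # gradients into aspect_repr are reversed
```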
