ACL 2023 Conference-Rush Essentials: Ready-to-Use Experiments



© Author | 刘沛羽 (Peiyu Liu)

Affiliation | Gaoling School of Artificial Intelligence, Renmin University of China

Research interests | Natural language processing, model compression

From | RUC AI Box

This post compiles common dataset descriptions and baseline methods from ACL 2022 papers, covering 20 papers across 14 subfields.

This compilation has the following features:

  1. Ready to use, supporting paper writing. Papers always include a dataset description section, and widely used datasets tend to have relatively "standard" descriptions. This post collects and organizes such descriptions from top-conference papers, helping us learn and accumulate conventional phrasings;

  2. Comprehensive coverage of datasets and baselines. This post covers 20 papers across 14 subfields. While we cannot guarantee that nothing is missed, it covers the vast majority of mainstream datasets. Readers get a broad view of current NLP tasks, and for any subfield of interest can quickly locate the baselines, evaluation datasets, and metrics, making it easy to reproduce and improve on the corresponding papers;

  3. Supporting experiment planning. A great many datasets are public, but no paper can run experiments on all of them to demonstrate a method's effectiveness. The material compiled here helps readers choose experimental tasks and datasets deliberately, with neither gaps nor wasted effort.

We previously shared the ready-to-use Abstract and Related Work, which aimed to help readers quickly form an accurate picture of a field from others' papers; if interested, see:

"ACL 2022 Conference-Rush Essentials: Ready-to-Use Abstract and Related Work"

Many articles compile datasets, but they mainly serve as dataset "resource pools" containing official links, introductions, and so on. This post keeps to the original intent of the "conference-rush essentials" series: supporting research and writing. We wish everyone smooth ACL 2023 submissions!

Pre-trained Language Models

[1] On the Sensitivity and Stability of Model Interpretations in NLP

Keywords: interpretability

Tasks and datasets:

  • text classification: SST-2, Yelp, AGNews

SST-2 and Yelp are sentiment classification tasks where models predict whether a review is negative (0) or positive (1). AGNews is a topic classification task that discriminates between world (0) and business (1) articles.

Baselines:

  • VaGrad, GradInp (gradient-based)

  • IngGrad, DeepLIFT (reference-based)

  • Occlusion, LIME (perturbation-based)
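These families differ in how they attribute a prediction to input tokens. As a concrete illustration of the gradient-based family, here is a minimal gradient-times-input saliency sketch in the spirit of GradInp; the SST-2 checkpoint name is a placeholder, and this is a generic sketch rather than the paper's exact implementation:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; any sequence classifier fine-tuned on SST-2 works.
name = "textattack/bert-base-uncased-SST-2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

inputs = tok("A gorgeous, witty, seductive movie.", return_tensors="pt")

# Embed tokens manually so we can take gradients w.r.t. the embeddings.
embeds = model.get_input_embeddings()(inputs["input_ids"])
embeds.retain_grad()
logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits
logits[0, logits.argmax()].backward()

# Gradient x input: per-token saliency = sum over embedding dimensions.
saliency = (embeds.grad * embeds).sum(-1).squeeze(0)
for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), saliency):
    print(f"{token:>12s}  {score.item():+.4f}")
```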

[2] Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Keywords: cross-lingual fine-tuning

Tasks and datasets:

  • part-of-speech tagging (POS), dependency parsing (DP): Universal Dependencies 2.7

  • named entity recognition (NER): MasakhaNER

  • natural language inference (NLI): AmericasNLI

Baselines:

  • MAD-X (adapter-based framework)

  • BITFIT

[3] Compression of Generative Pre-trained Language Models via Quantization

Keywords: model compression

Tasks and datasets:

  • Language Modeling: WikiText2, Penn Treebank (PTB), WikiText103

The task of language modeling is to predict the probability distribution over a sequence of words.

  • Next Utterance Prediction: Persona-Chat

The task of next utterance prediction is to predict the next utterance given the dialogue context. It tests the language understanding ability of generative models.

  • Abstractive Summarization: XSum

Abstractive summarization aims at generating a terse summary that captures the main ideas of the source article.
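As a worked note on the language modeling metric: these benchmarks are usually reported in perplexity, the exponential of the average per-token negative log-likelihood. A minimal sketch with GPT-2 as a stand-in for the paper's generative models:

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Stand-in model; the paper quantizes GPT-2-style generative LMs.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids

with torch.no_grad():
    # With labels given, HF causal LMs return the mean token NLL as .loss.
    nll = model(ids, labels=ids).loss

print(f"perplexity = {math.exp(nll.item()):.2f}")
```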

Baselines:

  • PACT, LSQ, LAQ
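PACT, LSQ, and LAQ are learned quantization schemes; all of them refine the same fake-quantization primitive. As a generic illustration (not any of these methods' learned variants), a uniform symmetric quantizer looks like:

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Uniform symmetric fake quantization: round onto a grid, return floats."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax          # one scale per tensor
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

w = torch.randn(4, 4)
print((w - fake_quantize(w, bits=4)).abs().max())  # quantization error
```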

[4] AdapLeR: Speeding up Inference by Adaptive Length Reduction

Keywords: model acceleration

Tasks and datasets:

  • sentiment: SST-2, IMDB

  • paraphrase: MRPC

  • topic classification: AG’s News

  • knowledge extraction: DBpedia

  • NLI: MNLI

  • question answering: QNLI

  • hate speech: HateXplain

Baselines:

  • BERT-base (the backbone)

  • DistilBERT (static compression method)

  • PoWER-BERT, TR-BERT (length reduction methods)

[5] ABC: Attention with Bounded-memory Control

Keywords: model acceleration

Tasks and datasets:

  • Language Modeling: WikiText-103

  • Machine Translation: WMT14 EN-DE (sentence-level translation), IWSLT14 ES-EN (document-level translation)

  • Masked Language Model Finetuning: BookCorpus, English Wikipedia, OpenWebText, and RealNews (pre-training); GLUE (fine-tuning)

Baselines:

  • Linformer

[6] PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models

Keywords: prompt

Tasks and datasets:

  • sentiment analysis: SST-2, SST-5, MR, CR

  • subjectivity classification: SUBJ

  • question classification: TREC

  • natural language inference: CB, RTE

  • question answering: QNLI

  • word sense disambiguation: WiC

  • paraphrase detection: MRPC, QQP

Baselines:

  • PET

  • Standard fine-tuning

[7] A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

Keywords: prompt, multimodal

Tasks and datasets:

  • visual question answering: VQAv2, OKVQA, GQA

  • image captioning: NoCaps, Flickr30k

  • categorical learning: miniImageNet

Baselines:

  • Frozen, PICa, SimVLM, Unified VLP (zero/few-shot vision-language learners)

  • Uniter_large, Oscar, SimVLM, VinVL, Unified VLP (full fine-tuned models)

  • VL-T5_no-vqa (pre-trained without visual question answering dataset)

  • Frozen and AFHN (miniImageNet)

Representation Learning

[8] A Contrastive Framework for Learning Sentence Representations from Pairwise and Triple-wise Perspective in Angular Space

Keywords: contrastive learning

Tasks and datasets:

  • unsupervised semantic textual similarity: STS tasks 2012-2016, STS Benchmark, SICK-Relatedness

  • SentEval transfer tasks: MR, CR, SUBJ, MPQA, SST-2, TREC, MRPC
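For reference, unsupervised STS is conventionally scored as the Spearman correlation between the cosine similarity of the two sentence embeddings and the human rating. A minimal sketch, with random vectors standing in for a real sentence encoder:

```python
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(emb_a: np.ndarray, emb_b: np.ndarray, gold: np.ndarray) -> float:
    """emb_a/emb_b: (n, d) sentence embeddings; gold: (n,) human scores."""
    cos = (emb_a * emb_b).sum(-1) / (
        np.linalg.norm(emb_a, axis=-1) * np.linalg.norm(emb_b, axis=-1)
    )
    return spearmanr(cos, gold).correlation

# Toy usage with random embeddings in place of a real encoder.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
print(sts_spearman(a, b, rng.uniform(0, 5, size=5)))
```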

Baselines:

  • GloVe embeddings, Skip-thought, average BERT embeddings from the last layer, BERT-Flow, BERT-Whitening (representative methods)

  • ISBERT, CT-BERT, ConSERT, SimCSE (contrastive learning methods)

Machine Translation

[9] Universal Conditional Masked Language Pre-training for Neural Machine Translation

Keywords: pre-training

Tasks and datasets:

  • autoregressive neural machine translation: En-Kk, De-En, En-Tr, En-Ro, En-Et, En-Fi, En-Lv, En-De, En-Cs, En-Fr

  • non-autoregressive neural machine translation: WMT14 En-De, WMT16 En-Ro and IWSLT14 En-De
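Both settings are typically reported in BLEU; a minimal sacrebleu sketch, with toy strings in place of real system output:

```python
import sacrebleu

hypotheses = ["The cat sat on the mat."]
references = [["The cat is sitting on the mat."]]  # one list per reference stream

# corpus_bleu expects the hypothesis list plus a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```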

Baselines:

  • mBART, mRASP, MASS, XLM, mBERT

Information Retrieval

[10] Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking

Keywords: model acceleration

Tasks and datasets:

  • passage and document ranking: MS MARCO
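MS MARCO re-ranking is conventionally reported with MRR@10; a minimal sketch of the metric (the ranking lists below are placeholders):

```python
def mrr_at_k(rankings: list[list[str]], relevant: list[set[str]], k: int = 10) -> float:
    """rankings[i]: doc ids ordered by the re-ranker for query i;
    relevant[i]: the set of relevant doc ids for query i."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank  # reciprocal rank of first hit
                break
    return total / len(rankings)

print(mrr_at_k([["d3", "d1", "d7"]], [{"d1"}]))  # -> 0.5
```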

Baselines:

  • Choices of first-stage retrieval models: fast BM25, uniCOIL, ColBERT

  • Re-ranking models and quantizers compared: BECR, PreTTR, BERT-base, TILDEv2

Dialogue

[11] A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

Tasks and datasets:

  • PersonaChat

Baselines:

  • Back Translation (BT)

  • CVAE

  • Entropy Filter

Reasoning

[12] Generated Knowledge Prompting for Commonsense Reasoning

Tasks and datasets:

  • NumerSense

NumerSense (Lin et al., 2020) consists of numerical statements about common objects and concepts, where for each sentence we need to recover a masked number word.

  • CommonsenseQA (CSQA)

CommonsenseQA (CSQA) (Talmor et al., 2019) is a 5-way multiple-choice QA dataset about common world scenarios.

  • CommonsenseQA 2.0 (CSQA2)

CommonsenseQA 2.0 (CSQA2) (Talmor et al., 2021) is a binary classification dataset where we need to judge whether commonsense statements are true or false.

  • QASC

QASC (Khot et al., 2020) is an 8-way multiple-choice QA dataset about grade school science.
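A common zero-shot scaffold for such multiple-choice datasets is to score each answer choice by its language-model log-likelihood conditioned on the question (plus any generated knowledge) and take the argmax. A rough sketch, with GPT-2 as a stand-in scorer rather than the paper's model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in scorer
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probs of the choice tokens, conditioned on the prompt."""
    full = tok(prompt + " " + choice, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(full).logits.log_softmax(-1)
    # The token at position t is predicted by the logits at position t-1.
    tgt = full[0, n_prompt:]
    return logits[0, n_prompt - 1 : -1].gather(-1, tgt[:, None]).sum().item()

q = "Where would you put uncooked pasta? Answer:"
choices = ["cupboard", "refrigerator", "oven"]
print(max(choices, key=lambda c: choice_logprob(q, c)))
```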

Baselines (knowledge generation):

  • No knowledge, Random sentences, Context sentences, Template-based, Retrieval-based

Sentiment Analysis

[13] Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis

Tasks and datasets:

  • Amazon reviews dataset

Baselines:

  • BERT-DAAT

  • SENTIX_Fix

  • Standard fine-tuning

  • Fine-tuning + AT (adds adversarial training on top of standard fine-tuning of vanilla PLMs)

  • Prompt-tuning (Hard) (uses the manually defined template "It is [MASK]" for prompt-tuning)

  • Prompt-tuning (Hard) + AT (adds adversarial training on top of Prompt-tuning (Hard))
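As an illustration of the hard-template baseline above, the "It is [MASK]" template reduces sentiment classification to masked-word scoring: append the template to the review and compare masked-LM scores of verbalizer words. A minimal sketch, with BERT-base as a stand-in masked LM and "great"/"terrible" as illustrative verbalizers, not necessarily the paper's:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def prompt_sentiment(review: str) -> str:
    text = f"{review} It is {tok.mask_token}."
    ids = tok(text, return_tensors="pt").input_ids
    mask_pos = (ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(ids).logits[0, mask_pos]
    # Illustrative verbalizers for positive/negative.
    pos = logits[tok.convert_tokens_to_ids("great")]
    neg = logits[tok.convert_tokens_to_ids("terrible")]
    return "positive" if pos > neg else "negative"

print(prompt_sentiment("The battery dies within an hour."))
```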

Simile Interpretation

[14] Can Pre-trained Language Models Interpret Similes as Smart as Human?

Tasks and datasets:

  • The Simile Property Probing Task: General Corpus, Teacher-designed Quizzes

Baselines:

  • EMB

  • Meta4meaning

  • ConScore

  • MIUWE

Multimodal

[15] Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

Tasks and datasets:

  • TWITTER-2015, TWITTER-2017

Baselines:

  • RAN, UMT, OSCGA, RpBERT (multimodal aspect term extraction (MATE))

  • TomBERT, CapTrBERT (multimodal aspect sentiment classification (MASC))

  • SPAN, D-GCN, BART (joint aspect sentiment analysis (JASA))

  • UMT+TomBERT, OSCGA+TomBERT, UMT-collapsed, OSCGA-collapsed, RpBERT-collapsed, JML (joint multimodal aspect-sentiment analysis (JMASA))

Text Generation

[16] A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization

Keywords: multi-document summarization

Datasets:

  • Multi-News

  • Wikipedia Current Events Portal (WCEP)

Baselines:

  • HiMAP, Hierarchical Transformer, GraphSum, GraphSum + RoBERTa, BART-Long (Multi-News)

  • TSR, BERTReg, Submodular+ABS, BART-WCEP-DynE-5 (WCEP)

Reading Comprehension

[17] AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension

Datasets:

  • ReClor

  • LogiQA

Baselines:

  • BERT, RoBERTa, XLNet (pre-trained language model based methods)

  • DAGN, Focal Reasoner, LReasoner

Code Understanding

[18] A Neural Network Architecture for Program Understanding Inspired by Human Behaviors

Tasks and datasets:

  • code summarization: TL-CodeSum, Java subset of CodeSearchNet

  • code clone detection: BigCloneBench 2014 (BCB), BCB-F (new dataset)

Baselines:

  • CodeNN, NCS, Rencos, CodeBERT, PLBART (code summarization)

  • CodeBERT, PLBART, ASTNN, FA-AST (code clone detection)

Information Extraction

[19] FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Tasks and datasets:

  • CORD

We evaluate on CORD (Park et al., 2019), which stands for the Consolidated Receipt Dataset for post-OCR parsing. The annotations cover 30 fine-grained semantic entities such as store name, menu price, table number, discount, etc.

  • FUNSD

FUNSD (Jaume et al., 2019) is a public dataset for form understanding in noisy scanned documents. It is a subset of the Truth Tobacco Industry Documents (TTID). The dataset consists of 199 annotated forms with 9,707 entities and 31,485 word-level annotations for 4 entity types: header, question, answer, and other.

  • Payment

We use the large-scale payment data (Majumder et al., 2020), which consists of around 10K documents and 7 semantic entity labels from human annotators. The corpus comes from different vendors with different layout templates.

Baselines:

  • SPADE

  • UniLMv2

  • LayoutLMv1

  • DocFormer

  • LayoutLMv2

  • TILT

Table Processing

[20] FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining

Tasks and datasets:

  • Formula Prediction: Enron

  • Table Question Answering: HiTab

  • Cell Type Classification: DeEx

Baselines:

  • SpreadsheetCoder, TaPEx, TUTA (Formula Prediction)

  • TaPas, BERT, TaPEx, TUTA (Table Question Answering)

  • CNN^BERT, Bi-LSTM, TaBERT, TaPas, TUTA (Cell Type Classification)

References:

1. https://aclanthology.org/2022.acl-long.188

2. https://aclanthology.org/2022.acl-long.125

3. https://aclanthology.org/2022.acl-long.331

4. https://aclanthology.org/2022.acl-long.1

5. https://aclanthology.org/2022.acl-long.515

6. https://aclanthology.org/2022.acl-long.254

7. https://aclanthology.org/2022.acl-long.197

8. https://aclanthology.org/2022.acl-long.336

9. https://aclanthology.org/2022.acl-long.442

10. https://aclanthology.org/2022.acl-long.51

11. https://aclanthology.org/2022.acl-long.550

12. https://aclanthology.org/2022.acl-long.225

13. https://aclanthology.org/2022.acl-long.174

14. https://aclanthology.org/2022.acl-long.543

15. https://aclanthology.org/2022.acl-long.152

16. https://aclanthology.org/2022.acl-long.351

17. https://aclanthology.org/2022.acl-long.494

18. https://aclanthology.org/2022.acl-long.353

19. https://aclanthology.org/2022.acl-long.260

20. https://aclanthology.org/2022.acl-long.82

