Today's arXiv Picks | 18 Recent Transformer Papers Worth Attention

About #Today's arXiv Picks

This is a column under「AI 学术前沿」(AI Academic Frontier): every day the editors select high-quality papers from arXiv and deliver them to readers.

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Category: NLP

Link: https://arxiv.org/abs/2108.13161

Abstract

Large-scale pre-trained language models have contributed significantly to natural language processing, but their effectiveness depends mainly on scaling the model parameters and on prompt design. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART).
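The abstract only names the approach, so as a rough, hedged sketch of the general idea behind differentiable prompts (trainable pseudo-token embeddings optimized by backpropagation while the pre-trained model stays frozen; the class and parameter names below are illustrative, not the paper's DART implementation):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Minimal sketch of a differentiable ("soft") prompt: a few trainable
    embedding vectors prepended to the input embeddings of a frozen LM.
    Illustrative of the general idea only, not the paper's DART code."""

    def __init__(self, n_prompt_tokens: int, hidden_size: int):
        super().__init__()
        # Trainable pseudo-token embeddings, updated by backpropagation.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden) from the frozen model's embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the soft prompt; only self.prompt receives gradients.
        return torch.cat([prompt, input_embeds], dim=1)
```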

Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning

Category: NLP

Link: https://arxiv.org/abs/2108.13032

Abstract

Shatter is an alternative self-attention architecture to the popular Transformer architecture. Shatter can be pretrained on GPUs in 7 days and matches the performance of BERT_Base.

ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models

Category: NLP

Link: https://arxiv.org/abs/2108.12472

Abstract

We present ReGen, a bidirectional generation approach for text and graphs that leverages Reinforcement Learning (RL) to improve performance. Our system provides state-of-the-art results on the WebNLG+ 2020 and TekGen datasets.

Exploring and Improving Mobile Level Vision Transformers

Category: Computer Vision

Link: https://arxiv.org/abs/2108.13015

Abstract

This paper studies vision transformer structures at the mobile level, where we observe a dramatic performance drop. We propose a novel irregular patch embedding module and an adaptive patch merging module to improve performance.

Multi-Channel Transformer Transducer for Speech Recognition

Category: Machine Learning

Link: https://arxiv.org/abs/2108.12953

Abstract

Multi-channel inputs offer several advantages over single-channel inputs for improving the robustness of on-device speech recognition systems. Recent work on a multi-channel transformer has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy.

Making Transformers Solve Compositional Tasks

Category: Artificial Intelligence

Link: https://arxiv.org/abs/2108.04378

Abstract

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. We show that the inductive biases given to the model by several design decisions significantly impact compositional generalization.

C5T5: Controllable Generation of Organic Molecules with Transformers

Category: Machine Learning

Link: https://arxiv.org/abs/2108.10307

Abstract

C5T5 is a novel self-supervised pretraining method that allows transformers to make zero-shot select-and-replace edits, altering substances towards desired property values. It operates on IUPAC names, a standardized molecular representation that intuitively encodes rich structural information.

ViTGAN: Training GANs with Vision Transformers

Category: Computer Vision

Link: https://arxiv.org/abs/2107.04589

Abstract

Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring fewer vision-specific inductive biases. We introduce novel regularization techniques for training GANs with ViTs. Empirically, our approach achieves performance comparable to the state-of-the-art CNN-based StyleGAN2.

Extracting Qualitative Causal Structure with Transformer-Based NLP

Category: Machine Learning

Link: https://arxiv.org/abs/2108.13304

Abstract

In everyday or academic language, we may express interactions between quantities, events, or entities. Qualitative causal relationships compactly express the direction, dependency, temporal constraints, and monotonicity constraints of discrete or continuous interactions. This paper presents a transformer-based NLP architecture that jointly identifies and extracts such qualitative causal relationships from text.

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

Category: NLP

Link: https://arxiv.org/abs/2106.06598

Abstract

We explore the use of pre-trained language models to learn sentiment information from written texts for speech sentiment analysis. Although spoken and written texts have different linguistic characteristics, they can complement each other in understanding sentiment. In these experiments, we demonstrate that the proposed approaches can consistently improve F1 scores.

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Category: Computer Vision

Link: https://arxiv.org/abs/2103.14030

Abstract

This paper presents a new vision Transformer that can serve as a general-purpose backbone for computer vision. It is compatible with a broad range of vision tasks, including image classification and dense prediction. Its performance surpasses the previous state-of-the-art by a large margin. The code and models will be made publicly available.

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

Category: Computer Vision

Link: https://arxiv.org/abs/2108.12630

Abstract

Group activity recognition is a crucial yet challenging problem. Previous methods either model spatial and temporal information separately, or directly aggregate individual features to form group features. We propose a novel group activity recognition network termed GroupFormer.

Vision Transformers with Patch Diversification

Category: Computer Vision

Link: https://arxiv.org/abs/2104.12753

Abstract

Vision transformers have demonstrated promising performance on challenging computer vision tasks, but directly training them may give sub-optimal results. We introduce novel loss functions in vision transformer training to encourage diversity across patch representations for more discriminative feature extraction.
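The abstract does not spell out the loss functions; the snippet below is a hedged sketch of one plausible patch-diversity regularizer that penalizes pairwise cosine similarity between patch representations. The function name and exact formulation are assumptions for illustration, not the paper's actual losses.

```python
import torch
import torch.nn.functional as F

def patch_diversity_penalty(patch_tokens: torch.Tensor) -> torch.Tensor:
    """Sketch of a patch-diversification regularizer: penalize the average
    pairwise cosine similarity between patch representations so patches do
    not collapse to near-identical vectors. Illustrative only; the paper's
    losses may differ.

    patch_tokens: (batch, num_patches, dim) output of a transformer block.
    """
    x = F.normalize(patch_tokens, dim=-1)           # unit-norm patch vectors
    sim = torch.matmul(x, x.transpose(1, 2))        # (batch, P, P) cosine similarities
    p = x.size(1)
    off_diag = sim - torch.eye(p, device=x.device)  # zero out self-similarity
    # Average off-diagonal similarity; add it (with a small weight) to the task loss.
    return (off_diag.sum(dim=(1, 2)) / (p * (p - 1))).mean()
```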

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Category: Machine Learning

Link: https://arxiv.org/abs/2108.12284

Abstract

Transformers, typically trained with default hyper-parameters from standard tasks, fail dramatically on systematic generalization. By revisiting model configurations, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and Mathematics.

Geometry-Free View Synthesis: Transformers and no 3D Priors

Category: Computer Vision

Link: https://arxiv.org/abs/2104.07652

Abstract

A transformer-based model can synthesize entirely novel views without any hand-engineered 3D biases. This is achieved by (i) a global attention mechanism for implicitly learning long-range 3D correspondences between source and target views, and (ii) a probabilistic formulation.

Fastformer: Additive Attention is All You Need

Category: NLP

Link: https://arxiv.org/abs/2108.09084

Abstract

Transformer is a powerful model for text understanding, but it is inefficient due to its quadratic complexity with respect to the input sequence length. In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use an additive attention mechanism to model global contexts.
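As a hedged, single-head sketch of the additive-attention idea described above (summarizing all tokens into global context vectors in linear time instead of computing pairwise interactions), the code below loosely follows the abstract's description; it is not the authors' implementation, and the module names are illustrative.

```python
import torch
import torch.nn as nn

class AdditiveAttentionPooling(nn.Module):
    """Additive attention: a learned scoring vector summarizes all tokens
    into one global context vector in O(N) time instead of O(N^2)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # learned scoring vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        alpha = torch.softmax(self.score(x) / x.size(-1) ** 0.5, dim=1)  # (B, N, 1)
        return (alpha * x).sum(dim=1)  # global context vector, (B, dim)

class FastformerBlockSketch(nn.Module):
    """Illustrative linear-complexity block loosely following the
    query -> global query, key interaction -> global key, value modulation
    flow described for Fastformer. Not the reference implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj, self.k_proj, self.v_proj = (nn.Linear(dim, dim) for _ in range(3))
        self.pool_q = AdditiveAttentionPooling(dim)
        self.pool_k = AdditiveAttentionPooling(dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        global_q = self.pool_q(q).unsqueeze(1)  # (B, 1, dim)
        p = global_q * k                        # element-wise query-key interaction
        global_k = self.pool_k(p).unsqueeze(1)  # (B, 1, dim)
        u = global_k * v                        # modulate values by the global key
        return self.out(u) + q                  # residual connection to the queries
```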

Space-time Mixing Attention for Video Transformer

Category: Computer Vision

Link: https://arxiv.org/abs/2106.05968

Abstract

This paper is on video recognition using Transformers. The model's complexity scales linearly with the number of frames in the video sequence and induces no overhead compared to an image-based Transformer model.

Span Fine-tuning for Pre-trained Language Models

Category: NLP

Link: https://arxiv.org/abs/2108.12848

Abstract

Pre-trained language models (PrLMs) have to carefully manage input units when training on large texts with vocabularies of millions of words. Previous works have shown that incorporating span-level information over successive words in pre-training could further improve the performance of PrLMs.
