Glancing Transformer for Non-Autoregressive Neural Machine Translation

Latest recommended article published on 2023-04-09 17:14:14

彭伟_02

Categories: NLP, Seq2Seq, NLG

Article link: https://blog.csdn.net/ganxiwu9686/article/details/119597102


NAT's conditional independence assumption prevents the model from learning word interdependency in the target sentence. (GLAT targets this missing dependency within the target sentence.)
Previous methods require multiple decoding passes, so their generation speed is measurably slower than that of vanilla NAT. (GLAT uses single-pass decoding during inference.)
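To make the independence assumption concrete, the two factorizations can be written side by side. This uses the standard formulation rather than anything specific to these notes; the symbols X (source sentence), Y (target sentence), y_t (the t-th target token), and T (target length) are notation assumed here.

```latex
p_{\mathrm{AT}}(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, X),
\qquad
p_{\mathrm{NAT}}(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid X)
```

In the NAT factorization, no term conditions on any other target token, which is why all positions can be predicted in parallel and also why dependencies between target words are not modeled.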

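GLAT achieves parallel text generation with only a single decoding pass.
GLM adopts an adaptive glancing sampling strategy. (The proportion of tokens sampled from Y is determined by the distance between the initial output Ỹ, generated directly by the non-autoregressive decoder, and the reference Y.)
Compared with vanilla NAT, GLAT improves by roughly 5 BLEU. Compared with AT, it is 0.9 BLEU lower but 7.9× faster.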


The Glancing Language Model
- First, an initial decoding pass generates the prediction Ỹ.
The Glancing Sampling Strategy
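- Compute the distance between Ỹ and the reference Y using the Hamming distance (during training, Ỹ and Y have the same length; otherwise the Levenshtein distance is used).
- Sample a proportion of tokens from Y as input to the second decoding pass. Positions that are not replaced use the encoder hidden representations as input, and the model predicts them non-autoregressively. Compute the loss and update the decoder based on this second pass.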
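The sampling steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `ratio` scaling factor, and the use of placeholder strings for encoder-derived inputs (`enc_inputs`) are all hypothetical. Treating only the unrevealed positions as loss targets is also an assumption, since the notes do not say which positions enter the loss.

```python
import random


def hamming_distance(pred, ref):
    """Count positions where two equal-length token sequences differ."""
    if len(pred) != len(ref):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(p != r for p, r in zip(pred, ref))


def glancing_sample(pred, ref, enc_inputs, ratio=0.5, rng=None):
    """Build the second-pass decoder inputs (illustrative sketch).

    pred       -- first-pass prediction (same length as ref)
    ref        -- reference target tokens
    enc_inputs -- stand-ins for encoder-derived decoder inputs, one per position
    ratio      -- assumed scaling factor: reveal int(ratio * distance) tokens
    """
    rng = rng or random.Random()
    distance = hamming_distance(pred, ref)
    num_reveal = int(ratio * distance)  # worse predictions reveal more tokens
    revealed = set(rng.sample(range(len(ref)), num_reveal))

    # Revealed positions take the reference token; the rest keep encoder input.
    inputs = [ref[i] if i in revealed else enc_inputs[i] for i in range(len(ref))]
    # Assumption: the loss is computed on the positions that were not revealed.
    loss_positions = [i for i in range(len(ref)) if i not in revealed]
    return inputs, loss_positions


pred = ["a", "b", "x", "y"]
ref = ["a", "b", "c", "d"]
enc = ["<h0>", "<h1>", "<h2>", "<h3>"]
inputs, loss_positions = glancing_sample(pred, ref, enc, ratio=1.0,
                                         rng=random.Random(0))
print(inputs, loss_positions)
```

The adaptive part is that `num_reveal` shrinks as the first-pass prediction improves: when the prediction already matches the reference, nothing is revealed and the second pass sees only encoder-derived inputs.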
Inference
- At decoding time, the output length has to be determined, using one of:
  - noisy parallel decoding (NPD), or
  - connectionist temporal classification (CTC)
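As a rough illustration of how CTC sidesteps explicit length prediction, the decoder can emit a fixed-length sequence that includes a blank symbol, and the final length falls out of the standard CTC collapse rule (merge consecutive repeats, then drop blanks). The sketch below shows only that collapse step; the `BLANK` symbol and the function name are hypothetical, and this is not taken from the GLAT code.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical blank symbol for this sketch


def ctc_collapse(tokens, blank=BLANK):
    """Apply the CTC collapse rule: merge consecutive repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(tokens)]
    return [tok for tok in merged if tok != blank]


raw = ["a", "a", BLANK, "a", "b", "b", BLANK]
print(ctc_collapse(raw))  # ['a', 'a', 'b']
```

A blank between two identical tokens keeps them distinct, so the 7-slot raw output above yields a 3-token sentence without any separate length predictor.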