Effective Approaches to Attention-based Neural Machine Translation


This paper is from EMNLP 2015. Building on basic attention, it explores variations — trying different score functions and different alignment functions — and mainly presents the concrete operation of two attention mechanisms: global and local.

Abstract

Goals:
1) Build on existing attentional NMT to explore more effective attentional architectures.
2) Propose two simple and effective attention mechanisms, global and local, and validate them on WMT translation tasks (a gain of up to 5.0 BLEU points over non-attentional systems).
Results:
1) 25.9 BLEU on the WMT'15 English-to-German translation task.
2) A 1.0 BLEU improvement over the existing best system backed by NMT and an n-gram reranker.

Introduction

1) Current progress in NMT.
2) Advantages of NMT.
3) Attention is commonly used in encoder-decoder setups across modalities.
4) Global and local attention (building on "Neural machine translation by jointly learning to align and translate" and "Show, attend and tell: Neural image caption generation with visual attention", respectively).
5) A brief survey of other contributions to attention, such as score computation and weighting functions.

Neural Machine Translation

1) A review of NMT fundamentals and current progress: the basic encoder-decoder formulation, the usual decomposition of the decoder with an RNN or one of its variants, plus pointers to related papers.
2) This paper's architecture builds on "Sequence to sequence learning with neural networks" and "Addressing the rare word problem in neural machine translation", but uses stacked LSTMs (illustrated in Figure 1 of the paper).
The training objective is:

$J = \sum_{(x, y) \in \mathbb{D}} -\log p(y \mid x)$

Attention-based Models

This section is the core of the paper. It details how global and local attention are computed, step by step, and additionally proposes an input-feeding approach. The difference between the two attention variants is whether all source positions participate in the computation.
The generation formulas are:

$\tilde{h}_t = \tanh(W_c [c_t; h_t])$

$p(y_t \mid y_{<t}, x) = \mathrm{softmax}(W_s \tilde{h}_t)$

The first formula produces the attentional hidden state at time $t$ by concatenating the context vector $c_t$ with the hidden state $h_t$; the second produces the predictive distribution, i.e., the prediction at time $t$.
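These two formulas can be sketched in NumPy with toy dimensions and random weights (all variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d, V = 4, 10                        # toy hidden size and vocabulary size
rng = np.random.default_rng(0)
h_t = rng.standard_normal(d)        # decoder hidden state at time t
c_t = rng.standard_normal(d)        # source-side context vector from attention
W_c = rng.standard_normal((d, 2 * d))
W_s = rng.standard_normal((V, d))

# attentional hidden state: h~_t = tanh(W_c [c_t; h_t])
h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))
# predictive distribution: p(y_t | y_<t, x) = softmax(W_s h~_t)
p_y = softmax(W_s @ h_tilde)
```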

Global Attention

The heart of the structure is the figure of the global attention model (Figure 2 in the paper). $a_t$ is an alignment vector whose length equals the source sequence length; it is a key step in computing attention and is derived from the current target state $h_t$ and each source hidden state $\bar{h}_s$:

$a_t(s) = \mathrm{align}(h_t, \bar{h}_s) = \dfrac{\exp(\mathrm{score}(h_t, \bar{h}_s))}{\sum_{s'} \exp(\mathrm{score}(h_t, \bar{h}_{s'}))}$

The authors give three ways to compute the score: dot, general, and concat:

$\mathrm{score}(h_t, \bar{h}_s) = \begin{cases} h_t^\top \bar{h}_s & \text{(dot)} \\ h_t^\top W_a \bar{h}_s & \text{(general)} \\ v_a^\top \tanh(W_a [h_t; \bar{h}_s]) & \text{(concat)} \end{cases}$
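A minimal sketch of the three score functions and the resulting alignment vector and context vector, with toy sizes (variable names are mine, not the paper's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d, S = 4, 6                            # toy hidden size, source length
rng = np.random.default_rng(0)
h_t = rng.standard_normal(d)           # current target hidden state
h_bar = rng.standard_normal((S, d))    # source hidden states h_bar_s
W_a = rng.standard_normal((d, d))
W_cat = rng.standard_normal((d, 2 * d))
v_a = rng.standard_normal(d)

score_dot = h_bar @ h_t                          # dot:     h_t . h_bar_s
score_gen = h_bar @ (W_a @ h_t)                  # general: h_t^T W_a h_bar_s
score_cat = np.array([v_a @ np.tanh(W_cat @ np.concatenate([h_t, hs]))
                      for hs in h_bar])          # concat

a_t = softmax(score_dot)   # alignment vector over the S source positions
c_t = a_t @ h_bar          # context vector: weighted average of source states
```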
The authors also mention their earlier, location-based variant, $a_t = \mathrm{softmax}(W_a h_t)$, which computes the weights over the source hidden states from the current target hidden state alone via a softmax; the paper includes a figure comparing the two.

For the origin of the global attention idea, see: Neural machine translation by jointly learning to align and translate.

Local Attention

The heart of the structure is the figure of the local attention model (Figure 3 in the paper).
The main question here is which source hidden states should participate in the computation. The model first computes an aligned position $p_t$ to determine a window $[p_t - D, p_t + D]$; $D$ is chosen empirically, and thanks to the fixed window the alignment vector $a_t$ has fixed dimension $2D + 1$. The paper gives two ways to set $p_t$. The first, monotonic alignment (local-m), simply sets $p_t = t$. The second, predictive alignment (local-p), computes

$p_t = S \cdot \mathrm{sigmoid}(v_p^\top \tanh(W_p h_t))$

where $W_p$ and $v_p$ are parameters learned to predict the position and $S$ is the source sentence length, so the sigmoid keeps $p_t$ within $[0, S]$. To concentrate the weights near $p_t$, so that source positions inside the window receive most of the mass, the alignment weights are additionally multiplied by a Gaussian:

$a_t(s) = \mathrm{align}(h_t, \bar{h}_s)\, \exp\!\left(-\dfrac{(s - p_t)^2}{2\sigma^2}\right)$

with $\sigma = D/2$ and $s$ an integer position within the window centered at $p_t$.
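Predictive alignment (local-p) with the Gaussian reweighting can be sketched as follows, reusing the toy setup above (a dot score stands in for the base alignment; names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d, S, D = 4, 20, 2                    # toy hidden size, source length, window
rng = np.random.default_rng(0)
h_t = rng.standard_normal(d)
h_bar = rng.standard_normal((S, d))
W_p = rng.standard_normal((d, d))
v_p = rng.standard_normal(d)

# p_t = S * sigmoid(v_p^T tanh(W_p h_t)) -- predicted window center in [0, S]
p_t = S / (1 + np.exp(-(v_p @ np.tanh(W_p @ h_t))))

# base alignment (dot score here), multiplied by a Gaussian with sigma = D/2
sigma = D / 2
pos = np.arange(S)
a_t = softmax(h_bar @ h_t) * np.exp(-((pos - p_t) ** 2) / (2 * sigma ** 2))
# weights now peak near p_t; positions outside [p_t - D, p_t + D] are ~0
```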

Input-feeding Approach

The procedure is shown in the input-feeding figure (Figure 4 in the paper): the attentional vector $\tilde{h}_t$ is concatenated with the decoder input at the next time step, so the model is informed about past alignment choices.
These are, in my view, all of the paper's key parts.
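Input-feeding can be sketched as a loop in which each decoder step receives the previous step's attentional vector appended to the word embedding. The step function below is a schematic stand-in, not the paper's LSTM:

```python
import numpy as np

d, T = 4, 3                            # toy hidden size, target length
rng = np.random.default_rng(0)

def decoder_step(x, h):
    # stand-in for an LSTM step: a tanh projection of [x; h]
    W = np.ones((d, x.size + h.size)) * 0.1
    return np.tanh(W @ np.concatenate([x, h]))

h = np.zeros(d)                        # decoder state
h_tilde = np.zeros(d)                  # attentional vector from previous step
for t in range(T):
    emb = rng.standard_normal(d)       # embedding of target word y_{t-1}
    x_t = np.concatenate([emb, h_tilde])   # input-feeding: append h~_{t-1}
    h = decoder_step(x_t, h)
    # ... attention over source states would produce c_t here ...
    h_tilde = np.tanh(h)               # placeholder for tanh(W_c [c_t; h])
```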

Others

The remainder covers two parts: the experiments and the analysis of results.

Experimental details:

1. Data preprocessing and architecture
1) Filter out sentence pairs whose lengths exceed 50 words and shuffle mini-batches as training proceeds.
2) Stacking LSTM models have 4 layers, each with 1000 cells, and 1000-dimensional embeddings.
2. Parameters
1) Parameters are uniformly initialized in [−0.1, 0.1].
3. Training tricks
1) Train for 10 epochs using plain SGD.
2) A simple learning rate schedule is employed: start with a learning rate of 1; after 5 epochs, begin halving the learning rate every epoch.
3) The mini-batch size is 128.
4) The normalized gradient is rescaled whenever its norm exceeds 5.
5) Dropout with probability 0.2 for the LSTMs.
6) For dropout models, train for 12 epochs and start halving the learning rate after 8 epochs. For local attention models, the window size D = 10 is set empirically.
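The learning rate schedule in 3.2) can be written out as a small helper (the function name and signature are mine, not from the paper's code):

```python
def learning_rate(epoch, base=1.0, start_halving_after=5):
    """Plain-SGD schedule from the paper: lr = 1 for the first 5 epochs,
    then halved every subsequent epoch."""
    if epoch <= start_halving_after:
        return base
    return base * 0.5 ** (epoch - start_halving_after)

schedule = [learning_rate(e) for e in range(1, 11)]
# epochs 1-5: 1.0; epoch 6: 0.5; ...; epoch 10: 0.03125
```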

Experimental Results and Analysis

Paper: https://arxiv.org/pdf/1508.04025

Implementation: MATLAB
Hardware: Tesla K40 (about 1K target words per second)

