2018.2. Unsupervised Neural Machine Translation Reading Notes

2018.2 - Mikel Artetxe, Kyunghyun Cho - Unsupervised Neural Machine Translation

UPV/EHU, New York University

ICLR2018

Abstract

  • This paper
    • build upon the recent work on unsupervised embedding mappings
    • a slightly modified attentional encoder-decoder model
      • combination of denoising and back-translation
    • novelty
      • Dual structure => handle both directions together
      • Shared encoder
      • fixed cross-lingual embeddings in the encoder during training
  • Result
    • no parallel resource
      • WMT 2014 French -> English: 15.56 BLEU
      • WMT 2014 German -> English: 10.21 BLEU
    • combined with 100,000 parallel sentences
      • WMT 2014 French -> English: 21.81 BLEU
      • WMT 2014 German -> English: 15.24 BLEU
  • Related Works
    • unsupervised cross-lingual embeddings
    • statistical decipherment for machine translation
    • low-resource neural machine translation

Method

System Architecture

(Figure: system architecture)

  • encoder: a two-layer bidirectional RNN (GRU cells with 600 hidden units)
  • decoder: a two-layer RNN (GRU cells with 600 hidden units)
  • dimension of the embeddings: 300
  • attention mechanism: global attention with the general alignment function (same as in this reference); see the sketch after this list
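
Below is a minimal PyTorch sketch of the architecture described above, not the authors' code: the module names (`SharedEncoder`, `Decoder`) and the exact wiring of the attention and output layers are my own assumptions; only the hyper-parameters (2 layers, 600 GRU units, 300-dim embeddings, the "general" global attention, and the frozen cross-lingual embeddings in the encoder) come from the notes.

```python
import torch
import torch.nn as nn

EMB_DIM, HID_DIM, LAYERS = 300, 600, 2

class SharedEncoder(nn.Module):
    """One encoder shared by both languages; its embeddings are the
    pre-trained cross-lingual vectors and stay frozen during training."""
    def __init__(self, pretrained_embeddings):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=True)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, num_layers=LAYERS,
                          bidirectional=True, batch_first=True)

    def forward(self, src_ids):                      # (batch, src_len)
        enc_out, _ = self.rnn(self.embed(src_ids))   # (batch, src_len, 2*HID_DIM)
        return enc_out

class Decoder(nn.Module):
    """Language-specific decoder with Luong-style 'general' global attention:
    score(h_t, h_s) = h_t^T W h_s. Decoder state initialization is omitted."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, num_layers=LAYERS, batch_first=True)
        self.attn_W = nn.Linear(2 * HID_DIM, HID_DIM, bias=False)
        self.out = nn.Linear(HID_DIM + 2 * HID_DIM, vocab_size)

    def forward(self, tgt_ids, enc_out, hidden=None):
        dec_out, hidden = self.rnn(self.embed(tgt_ids), hidden)        # (batch, tgt_len, HID_DIM)
        scores = torch.bmm(dec_out, self.attn_W(enc_out).transpose(1, 2))
        weights = torch.softmax(scores, dim=-1)                         # attention over source
        context = torch.bmm(weights, enc_out)                           # (batch, tgt_len, 2*HID_DIM)
        logits = self.out(torch.cat([dec_out, context], dim=-1))
        return logits, hidden
```
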
Unsupervised Training
  • for each sentence in language L1, train by alternating between 2 steps (see the sketch after this list):
    • STEP 1: Denoising: shared encoder + L1 decoder
      • noise: random swaps between contiguous words
    • STEP 2: On-the-fly back-translation, in 2 parts
      • PART 1: translate the sentence in inference mode: shared encoder + L2 decoder
      • PART 2: train on the resulting pseudo translation to reconstruct the original sentence: shared encoder + L1 decoder
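
A rough sketch of one training iteration on an L1 batch, assuming the `SharedEncoder` / `Decoder` modules from the architecture sketch above. It only illustrates the alternation: `greedy_translate` is a hypothetical helper, and summing the two losses in a single update is a simplification (the system alternates objectives and language directions across mini-batches).

```python
import random
import torch
import torch.nn.functional as F

def add_noise(token_ids, swap_prob=0.5):
    """Denoising corruption: randomly swap contiguous tokens
    (a simplified version of the paper's noise model)."""
    noisy = list(token_ids)
    i = 0
    while i < len(noisy) - 1:
        if random.random() < swap_prob:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
            i += 2      # do not move the same token twice
        else:
            i += 1
    return noisy

def reconstruction_loss(encoder, decoder, src_ids, tgt_ids):
    """Teacher-forced cross-entropy for reconstructing tgt_ids from src_ids."""
    enc_out = encoder(src_ids)
    logits, _ = decoder(tgt_ids[:, :-1], enc_out)          # predict the next token
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt_ids[:, 1:].reshape(-1))

def train_step_L1(batch_L1, encoder, decoder_L1, decoder_L2, optimizer):
    # STEP 1 -- denoising: reconstruct the clean L1 sentence from a noised copy
    noisy = torch.stack([torch.tensor(add_noise(s.tolist())) for s in batch_L1])
    loss_denoise = reconstruction_loss(encoder, decoder_L1, noisy, batch_L1)

    # STEP 2, PART 1 -- on-the-fly back-translation in inference mode (no gradients)
    with torch.no_grad():
        pseudo_L2 = greedy_translate(encoder, decoder_L2, batch_L1)   # hypothetical helper
    # STEP 2, PART 2 -- recover the original L1 sentence from the pseudo L2 source
    loss_bt = reconstruction_loss(encoder, decoder_L1, pseudo_L2, batch_L1)

    optimizer.zero_grad()
    (loss_denoise + loss_bt).backward()   # simplification: the paper alternates objectives
    optimizer.step()
```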

Experiments

  • Datasets:
  • Corpus Preprocessing:
    • tokenization and truecasing
    • byte pair encoding (BPE): helps translate rare words correctly
    • learning on the monolingual corpus (see the preprocessing sketch at the end of these notes):
      • 50,000 operations
      • Limit the vocabulary to the most frequent 50,000 tokens.
      • Replace the rest with a special token.
      • Accelerate training: discard sentences with more than 50 elements.
  • Cross-lingual Embeddings:
  • Training:
    • Cross-entropy loss function
    • Training each system took about 4-5 days on a single Titan X GPU for the full unsupervised variant.
  • Decoding (see the toy decoding sketch at the end of these notes):
    • At training time (for on-the-fly back-translation): greedy decoding
    • At test time: beam search
  • Result:
    (Figure: BLEU results table)
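
For the corpus preprocessing bullets above, here is a hedged Python sketch of the vocabulary truncation and length filtering steps (BPE itself would be learned separately with a subword tool and is omitted). The `<unk>` placeholder name is my own choice, since the notes do not name the special token.

```python
from collections import Counter

VOCAB_SIZE = 50_000      # keep the 50,000 most frequent tokens
MAX_LEN = 50             # discard sentences with more than 50 elements
UNK = "<unk>"            # assumed name for the special replacement token

def build_vocab(sentences):
    """Count tokens over the corpus and keep only the most frequent ones."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, _ in counts.most_common(VOCAB_SIZE)}

def preprocess(sentences):
    """Replace out-of-vocabulary tokens and drop overly long sentences."""
    vocab = build_vocab(sentences)
    kept = []
    for sent in sentences:
        if len(sent) > MAX_LEN:          # accelerate training: drop long sentences
            continue
        kept.append([tok if tok in vocab else UNK for tok in sent])
    return kept
```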
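
And a toy, model-agnostic illustration of the decoding difference noted above: greedy decoding commits to the single most probable token at each step (cheap, used for training-time back-translation), while beam search keeps the `beam_size` best partial hypotheses (used at test time). `step_fn(prefix)` is a hypothetical callable returning a {token: log-probability} dict for the next position; end-of-sentence and length-normalization handling is simplified.

```python
def greedy_decode(step_fn, max_len, eos="</s>"):
    """Pick the argmax token at every step."""
    prefix = []
    for _ in range(max_len):
        log_probs = step_fn(prefix)
        token = max(log_probs, key=log_probs.get)
        prefix.append(token)
        if token == eos:
            break
    return prefix

def beam_decode(step_fn, max_len, beam_size=5, eos="</s>"):
    """Keep the beam_size best partial hypotheses by cumulative log-probability."""
    beams = [([], 0.0)]                               # (prefix, score)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:          # finished hypotheses stay as they are
                candidates.append((prefix, score))
                continue
            for token, logp in step_fn(prefix).items():
                candidates.append((prefix + [token], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams[0][0]                                # best-scoring hypothesis
```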