2018.2-Mikel Artetxe, Kyunghyun Cho-Unsupervised Neural Machine Translation
UPV/EHU, New York University
ICLR 2018
Abstract
- This paper
- builds upon recent work on unsupervised cross-lingual embedding mappings
- uses a slightly modified attentional encoder-decoder model
- trained with a combination of denoising and back-translation
- novelty
- Dual structure => handle both directions together
- Shared encoder
- fixed cross-lingual embeddings in the encoder during training
- Result
- no parallel resource
- WMT 2014 French -> English: 15.56 BLEU
- WMT 2014 German -> English: 10.21 BLEU
- combined with 100,000 parallel sentences
- WMT 2014 French -> English: 21.81 BLEU
- WMT 2014 German -> English: 15.24 BLEU
- Related Work
- unsupervised cross-lingual embeddings
- statistical decipherment for machine translation
- low-resource neural machine translation
Method
System Architecture
- encoder: a two-layer bidirectional RNN (GRU cells with 600 hidden units)
- decoder: a two-layer RNN (GRU cells with 600 hidden units)
- embedding dimension: 300
- attention mechanism: global attention with the general alignment function (following Luong et al., 2015); see the sketch after this list
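A minimal PyTorch sketch of this architecture. The class and variable names, the `cross_lingual_emb` matrix, and the single-step decoder interface are my own assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

EMB_DIM, HID_DIM, LAYERS = 300, 600, 2   # dimensions reported in the paper

class SharedEncoder(nn.Module):
    """Two-layer bidirectional GRU encoder shared across both languages;
    its embeddings are the fixed (frozen) cross-lingual embeddings."""
    def __init__(self, cross_lingual_emb):                 # (vocab, EMB_DIM) tensor
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(cross_lingual_emb, freeze=True)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, num_layers=LAYERS,
                          bidirectional=True, batch_first=True)

    def forward(self, src):                                # src: (batch, src_len)
        out, _ = self.rnn(self.emb(src))                   # (batch, src_len, 2*HID_DIM)
        return out

class GeneralAttention(nn.Module):
    """Global (Luong-style) attention with the 'general' score h_t^T W h_s."""
    def __init__(self, dec_dim, enc_dim):
        super().__init__()
        self.W = nn.Linear(enc_dim, dec_dim, bias=False)

    def forward(self, dec_h, enc_out):                     # dec_h: (batch, dec_dim)
        scores = torch.bmm(self.W(enc_out), dec_h.unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)             # over source positions
        return torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)   # context vector

class Decoder(nn.Module):
    """Language-specific two-layer GRU decoder with attention (one per language)."""
    def __init__(self, vocab_size):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, num_layers=LAYERS, batch_first=True)
        self.attn = GeneralAttention(HID_DIM, 2 * HID_DIM)
        self.out = nn.Linear(HID_DIM + 2 * HID_DIM, vocab_size)

    def forward(self, prev_tok, hidden, enc_out):          # one decoding step
        dec_out, hidden = self.rnn(self.emb(prev_tok).unsqueeze(1), hidden)
        context = self.attn(dec_out.squeeze(1), enc_out)
        logits = self.out(torch.cat([dec_out.squeeze(1), context], dim=1))
        return logits, hidden
```

The key design point from the paper is that the encoder (with frozen cross-lingual embeddings) is shared by both languages, while each language has its own decoder.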
Unsupervised Training
- for each sentence in language L1, training alternates between two steps (see the sketch after this list):
- STEP 1: Denoising: shared encoder + L1 decoder
- noise: random swaps of contiguous words
- STEP 2: On-the-fly Backtranslation, including 2 parts
- PART 1: translate the sentence in inference mode: shared encoder + L2 decoder
- PART 2: train to recover the original sentence from that translation: shared encoder + L1 decoder
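A sketch of one training iteration for an L1 batch under these two steps (the symmetric pass for L2 is analogous). `translate` and `train_step` are hypothetical helpers standing in for inference-mode decoding and a cross-entropy update:

```python
import random

def add_noise(tokens):
    """Denoising corruption: for a sentence of N tokens, make N/2 random
    swaps of contiguous tokens (following the paper's description)."""
    tokens = list(tokens)
    for _ in range(len(tokens) // 2):          # loop is empty for sentences of < 2 tokens
        i = random.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

def training_iteration(batch_l1, encoder, dec_l1, dec_l2, translate, train_step):
    # STEP 1: denoising - reconstruct each clean sentence from its noisy version
    noisy = [add_noise(s) for s in batch_l1]
    train_step(encoder, dec_l1, src=noisy, tgt=batch_l1)

    # STEP 2: on-the-fly back-translation
    # PART 1: translate L1 -> L2 with the current model, in inference mode (greedy)
    pseudo_l2 = translate(encoder, dec_l2, batch_l1)
    # PART 2: train to recover the original L1 sentences from the pseudo L2 source
    train_step(encoder, dec_l1, src=pseudo_l2, tgt=batch_l1)
```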
Experiments
- Datasets:
- Train: WMT 2014 French-English & German-English
- Test: newstest2014; tokenized BLEU (multi-bleu.perl script)
- Corpus Preprocessing:
- tokenization and truecasing
- byte pair encoding (BPE): helps translate rare words correctly
- learned on each monolingual corpus: 50,000 merge operations
- Limit the vocabulary to the 50,000 most frequent tokens.
- Replace the rest with a special unknown token.
- Accelerate training: discard sentences with more than 50 elements (vocabulary cap and length filter sketched below).
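A plain-Python sketch of the vocabulary cap and length filter described above, applied after BPE. The `<unk>` symbol and helper names are assumptions, not taken from the paper:

```python
from collections import Counter

VOCAB_SIZE, MAX_LEN, UNK = 50_000, 50, "<unk>"     # "<unk>" symbol is an assumption

def build_vocab(corpus):
    """Keep only the 50,000 most frequent (post-BPE) tokens."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, _ in counts.most_common(VOCAB_SIZE)}

def preprocess(corpus, vocab):
    """Drop over-long sentences and map out-of-vocabulary tokens to UNK."""
    for sent in corpus:
        if len(sent) > MAX_LEN:                    # discard sentences with > 50 elements
            continue
        yield [tok if tok in vocab else UNK for tok in sent]
```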
- Cross-lingual Embeddings:
- Training:
- Cross-entropy loss function (one update step sketched below)
- Training each system took about 4-5 days on a single Titan X GPU for the full unsupervised variant.
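A sketch of a single cross-entropy update, one possible concretization of the `train_step` placeholder in the back-translation sketch above. The explicit `optimizer` argument, the `pad_id` default, and the use of teacher forcing are assumptions about the training loop:

```python
import torch.nn as nn

def train_step(encoder, decoder, src, tgt, optimizer, pad_id=0):
    """One cross-entropy update of the shared encoder and one decoder.
    src/tgt are padded LongTensors of shape (batch, len); pad_id is assumed."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    enc_out = encoder(src)
    hidden, loss = None, 0.0
    # teacher forcing: predict tgt[:, t+1] from the gold token tgt[:, t]
    for t in range(tgt.size(1) - 1):
        logits, hidden = decoder(tgt[:, t], hidden, enc_out)
        loss = loss + criterion(logits, tgt[:, t + 1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```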
- Decoding:
- Training time (for on-the-fly back-translation): greedy decoding (sketched below)
- Test time: beam search
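A sketch of greedy decoding as it would be used for on-the-fly back-translation at training time (batch size 1 for simplicity; `bos_id`/`eos_id` and the shapes follow the architecture sketch above and are assumptions). At test time the paper uses beam search instead, which is not shown here:

```python
import torch

def greedy_decode(encoder, decoder, src, bos_id, eos_id, max_len=50):
    """Pick the most probable token at every step until <eos> or max_len."""
    with torch.no_grad():
        enc_out = encoder(src)                    # src: (1, src_len)
        hidden = None                             # GRU starts from a zero state
        tok = torch.tensor([bos_id])
        output = []
        for _ in range(max_len):
            logits, hidden = decoder(tok, hidden, enc_out)
            tok = logits.argmax(dim=-1)           # greedy choice
            if tok.item() == eos_id:
                break
            output.append(tok.item())
    return output
```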
- Result: