WMT English-German Translation

1. The University of Cambridge’s Machine Translation Systems for WMT18

1. Basic Architecture

Combines the three most commonly used architectures: recurrent, convolutional, and self-attention-based models such as the Transformer.

2. System Combination

If we want to combine $q$ models $M_1, \dots, M_q$, we first divide them into two groups by selecting a $p$ with $1 \le p \le q$.

Then, the first group $M_1, \dots, M_p$ contributes full-posterior scores and the second group $M_{p+1}, \dots, M_q$ contributes MBR-based scores.

Full-posterior model scores are computed as follows:
(formula image from the original post, not reproduced here)

Combined scores are computed as follows:
(formula image from the original post, not reproduced here)
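As a rough sketch of the idea only (my own notation, assuming a standard weighted log-linear combination; not necessarily the paper's exact equations), the full-posterior part scores a hypothesis $y$ for source $x$ as

$$S_{\text{full}}(y \mid x) = \sum_{k=1}^{p} \lambda_k \log P_{M_k}(y \mid x)$$

and the combined score adds the MBR-based contributions of the remaining models,

$$S(y \mid x) = S_{\text{full}}(y \mid x) + \sum_{k=p+1}^{q} \lambda_k \, S^{\text{MBR}}_{M_k}(y \mid x),$$

where the $\lambda_k$ are interpolation weights and $S^{\text{MBR}}_{M_k}$ is the expected-utility (MBR) score computed with model $M_k$.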

3. Data
1. Language detection (Nakatani, 2010) is applied to all available monolingual and parallel data.
2. ParaCrawl is additionally filtered with the following heuristics (a Python sketch is given after the list):
  • No words contain more than 40 characters.
  • Sentences must not contain HTML tags.
  • The minimum sentence length is 4 words.
  • The character ratio between source and target must not exceed 1:3 or 3:1.
  • Source and target sentences must be equal after stripping out non-numerical characters.
  • Sentences must end with punctuation marks.
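A minimal sketch of these heuristics in Python (the regexes, punctuation set, and example pair are my own choices; the paper does not prescribe an implementation):

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")        # crude HTML-tag detector
NON_NUMERIC = re.compile(r"[^0-9]")      # strips everything except digits
SENTENCE_END = re.compile(r"[.!?]$")     # sentence-final punctuation

def keep_pair(src: str, tgt: str) -> bool:
    """Return True if the (src, tgt) pair passes all ParaCrawl heuristics."""
    for sent in (src, tgt):
        words = sent.split()
        if len(words) < 4:                          # minimum length: 4 words
            return False
        if any(len(w) > 40 for w in words):         # no word longer than 40 characters
            return False
        if HTML_TAG.search(sent):                   # no HTML tags
            return False
        if not SENTENCE_END.search(sent.strip()):   # must end with punctuation
            return False
    ratio = len(src) / max(len(tgt), 1)             # character-length ratio
    if ratio > 3 or ratio < 1 / 3:                  # must stay within 1:3 .. 3:1
        return False
    # digits must match after stripping non-numerical characters
    if NON_NUMERIC.sub("", src) != NON_NUMERIC.sub("", tgt):
        return False
    return True

if __name__ == "__main__":
    print(keep_pair("We shipped 40 units in 2017.",
                    "Wir lieferten 40 Einheiten im Jahr 2017."))  # True
```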

2. NTT’s Neural Machine Translation Systems for WMT 2018

1. Basic Architecture

Transformer Big

2. Data
  • Noisy Data Filtering
  1. use a language model (e.g., KenLM) to evaluate each sentence's naturalness
  2. use a word alignment model (e.g., fast_align) to check whether the sentence pair conveys the same meaning
  • Synthetic Corpus
  1. translate monolingual sentences with the Transformer -> pseudo-parallel corpora
  2. back-translate and evaluate -> select the high-scoring sentence pairs
  • Right-to-Left Re-ranking (see the sketch after this list)
  1. the R2L model re-ranks the n-best hypotheses generated by the Left-to-Right (L2R) model (n = 10)
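A minimal sketch of the R2L re-ranking step in Python (the interpolation weight, length normalization, and scoring functions are placeholder assumptions, not NTT's exact setup):

```python
from typing import Callable, List, Tuple

def rerank_nbest(
    nbest: List[Tuple[str, float]],      # (hypothesis, L2R log-prob); n = 10 in the paper
    r2l_score: Callable[[str], float],   # returns an R2L log-prob for a hypothesis
    weight: float = 0.5,                 # interpolation weight (placeholder value)
) -> List[Tuple[str, float]]:
    """Re-rank L2R n-best hypotheses with a right-to-left model."""
    rescored = []
    for hyp, l2r in nbest:
        length = max(len(hyp.split()), 1)
        # length-normalized interpolation of the two decoding directions
        score = ((1 - weight) * l2r + weight * r2l_score(hyp)) / length
        rescored.append((hyp, score))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    # dummy scores standing in for real model outputs
    nbest = [("das ist ein test", -4.2), ("dies ist ein test", -4.5)]
    fake_r2l = {"das ist ein test": -5.0, "dies ist ein test": -3.9}
    print(rerank_nbest(nbest, fake_r2l.get))
```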

3. Microsoft’s Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data

1. Basic Architecture

Transformer Big + Ensemble-decoding + R2L Reranking

2. Data
  • Dual conditional cross-entropy filtering

    For a sentence pair $(x, y)$, the score is computed as follows (a small Python sketch is given after this list):

    $$\mathrm{adq}(x, y) = \exp\Big(-\big(|H_A(y|x) - H_B(y|x)| + \tfrac{1}{2}\,(H_A(y|x) + H_B(y|x))\big)\Big)$$

    where $A$ and $B$ are translation models trained on the same data but in inverse directions (here $A = W_{de \rightarrow en}$ and $B = W_{en \rightarrow de}$).

    $$H_M(y|x) = -\frac{1}{|y|} \sum_{t=1}^{|y|} \log P_M(y_t \mid y_{<t}, x)$$

    where $P_M$ is the probability distribution defined by model $M$.

  • Data weighting

    Sentence instance weighting is a feature available in Marian (Junczys-Dowmunt et al., 2018).

    sentence score = data weight × cross-entropy -> sort and select sentence pairs by this score
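A minimal sketch of computing and ranking by the adq score in Python (the cross-entropy values are made-up placeholders; in practice $H_A$ and $H_B$ come from the two inverse-direction models, and the negation is assumed to cover both terms as written above):

```python
import math

def adq(h_a: float, h_b: float) -> float:
    """Dual conditional cross-entropy score for one sentence pair.

    h_a, h_b: length-normalized cross-entropies H_A(y|x) and H_B(y|x)
    from two models trained on the same data in inverse directions.
    Lower (and more similar) cross-entropies give a score closer to 1.
    """
    return math.exp(-(abs(h_a - h_b) + 0.5 * (h_a + h_b)))

if __name__ == "__main__":
    # hypothetical cross-entropy values for three candidate pairs
    pairs = {
        "good pair": (1.2, 1.3),
        "disagreeing pair": (0.9, 4.0),
        "noisy pair": (6.0, 6.2),
    }
    for name, (h_a, h_b) in sorted(pairs.items(),
                                   key=lambda kv: adq(*kv[1]), reverse=True):
        print(f"{name}: adq = {adq(h_a, h_b):.4f}")
```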
