Neural Machine Translation: WMT14 English-French Baseline Systems

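This post reviews progress since 2017 on WMT14 English-French translation baseline systems, including GNMT's 32K-wordpiece model, the Transformer base and big models, RNMT+, ConvS2S, and Fairseq. The models handle vocabulary differently, for example with wordpieces or BPE, and the reported results show Fairseq reaching a high score of 43.2 on WMT'14.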

 

Recent (since 2017) WMT14 English-French baseline records

 

1. GNMT

   https://arxiv.org/pdf/1609.08144.pdf

   Corpus processing: a shared source and target vocabulary of 32K wordpieces

     For the wordpiece models, we train 3 different models with vocabulary sizes of 8K, 16K, and 32K. Table 4 summarizes our results on the WMT En→Fr dataset. In this table, we also compare against other strong baselines without model ensembling. As can be seen from the table, “WPM-32K”, a wordpiece model with a shared source and target vocabulary of 32K wordpieces, performs well on this dataset and achieves the best quality as well as the fastest inference speed.
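To make the idea of a shared source and target wordpiece vocabulary concrete, here is a minimal sketch. It is not GNMT's implementation: the tiny hand-written vocabulary, the `_` word-start marker convention, the `<unk>` fallback, and the greedy longest-match segmentation are all assumptions chosen for illustration, whereas the real model learns its 32K wordpieces from the training data. The only point shown is that one vocabulary segments both English and French text.

```python
from typing import List, Set

# Hypothetical toy vocabulary shared by both languages (illustration only).
SHARED_VOCAB: Set[str] = {"_the", "_trans", "lation", "_la", "_traduc", "tion"}


def segment_word(word: str, vocab: Set[str], unk: str = "<unk>") -> List[str]:
    """Split one word into pieces by greedy longest-match-first lookup."""
    token = "_" + word  # assumed convention: "_" marks the start of a word
    pieces: List[str] = []
    start = 0
    while start < len(token):
        end = len(token)
        # Shrink the candidate until it is found in the vocabulary.
        while end > start and token[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matches at this position: give up on the whole word.
            return [unk]
        pieces.append(token[start:end])
        start = end
    return pieces


def encode(sentence: str, vocab: Set[str]) -> List[str]:
    """Segment a whitespace-tokenized sentence with the shared vocabulary."""
    pieces: List[str] = []
    for word in sentence.split():
        pieces.extend(segment_word(word, vocab))
    return pieces


if __name__ == "__main__":
    print(encode("the translation", SHARED_VOCAB))  # English side
    print(encode("la traduction", SHARED_VOCAB))    # French side
```

Because both sides use the same piece inventory, the source and target can share a single set of token IDs, which is the property the "shared source and target vocabulary" wording refers to.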

    On WMT En→Fr, the training set contains 36M sentence pairs. In both cases, we use newstest2014 as the test sets to compare against previous work. The combination of newstest2012 and newstest2013 is used as the development
