Recent (since 2017) WMT14 English–French baseline records
1. GNMT
https://arxiv.org/pdf/1609.08144.pdf
Corpus processing: a shared source and target vocabulary of 32K wordpieces
For the wordpiece models, we train 3 different models with vocabulary sizes of 8K, 16K, and 32K. Table 4 summarizes our results on the WMT En→Fr dataset. In this table, we also compare against other strong baselines without model ensembling. As can be seen from the table, “WPM-32K”, a wordpiece model with a shared source and target vocabulary of 32K wordpieces, performs well on this dataset and achieves the best quality as well as the fastest inference speed.
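To make the wordpiece idea concrete, here is a minimal sketch of the greedy longest-match-first segmentation commonly used to apply a trained wordpiece vocabulary (the `##` continuation prefix and the toy `vocab` below are illustrative assumptions, not GNMT's actual vocabulary or exact implementation):

```python
def wordpiece_tokenize(word, vocab, unk="<unk>"):
    """Segment a word into wordpieces by greedy longest-match-first lookup.

    Pieces that continue a word are marked with a "##" prefix,
    following the common wordpiece convention (illustrative only).
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a hit.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # mark as a word-internal continuation piece
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            return [unk]  # no piece covers this position; fall back to <unk>
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary for demonstration.
vocab = {"un", "##aff", "##able", "word", "##piece"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("wordpiece", vocab))  # ['word', '##piece']
```

A larger shared vocabulary (e.g. 32K pieces) yields fewer, longer pieces per sentence, which is one reason WPM-32K achieves both good quality and fast inference in the GNMT paper.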
On WMT En→Fr, the training set contains 36M sentence pairs. newstest2014 is used as the test set to compare against previous work, and the combination of newstest2012 and newstest2013 is used as the development set.