Title: On The Evaluation of Machine Translation Systems Trained With Back-Translation
Abstract: Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training data. In this work, we show that this conjecture is not empirically supported and that backtranslation improves translation quality of both naturally occurring text as well as translationese according to professional human translators. We provide empirical evidence to support the view that back-translation is preferred by humans because it produces more fluent outputs. BLEU cannot capture human preferences because references are translationese when source sentences are natural text. We recommend complementing BLEU with a language model score to measure fluency.
Link: https://arxiv.org/pdf/1908.05204.pdf

Key points:
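1. In NLP, back-translation (BT) is a data augmentation technique: monolingual text in the target language is translated back into the source language to create additional training pairs, as shown in the figure below.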
![e0a177f831b6f649009b234f334283aa.png](https://i-blog.csdnimg.cn/blog_migrate/0d6539aef4a6487cde7f64b15eed2428.png)
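2. BLEU is a quantitative metric for machine translation quality, based on n-gram overlap between the system output and reference translations. The authors find that using BT for data augmentation improves BLEU in the X* -> Y direction, where the source X* is itself a translation, i.e. translationese (Table 1). They also find that translationese is easier to translate than original text (Table 2).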
![655365cb3398af3447ec81769d249b12.png](https://i-blog.csdnimg.cn/blog_migrate/b053efbe88e02ee6d5837e5b2d433336.jpeg)
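3. Human evaluators checking translation quality prefer the output of BT-trained systems, but this preference is not reflected in BLEU.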
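4. The authors examine why BLEU fails: a likely cause is that translationese texts resemble one another. They quantify this with perplexity (see https://www.zhihu.com/question/58482430 for an explanation); a lower perplexity means the language model fits the text better.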
![64b0d8df55c9866151a8f8104ec4a250.png](https://i-blog.csdnimg.cn/blog_migrate/a23618cbd7b09e33e6fb8f81e46efda1.jpeg)
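5. Native-level speakers will certainly not prefer translationese, because it is less fluent. However, BLEU as currently used cannot capture this preference for fluency.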
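
To make the perplexity idea in points 4 and 5 concrete, here is a minimal sketch of scoring fluency with a language model. It is not the paper's setup: the toy bigram model with add-one smoothing, the `BigramLM` class, and the example sentences are all assumptions made for illustration, standing in for whatever language model one would actually use.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token appears as a context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        self.vocab = {"</s>"}
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        v = len(self.vocab) + 1
        return (self.bigram_counts[(prev, cur)] + 1) / (self.context_counts[prev] + v)

    def perplexity(self, tokens):
        # exp of the average negative log-probability per predicted token.
        padded = ["<s>"] + tokens + ["</s>"]
        log_prob = sum(math.log(self.prob(p, c)) for p, c in zip(padded, padded[1:]))
        return math.exp(-log_prob / (len(padded) - 1))


# Hypothetical training corpus of "natural" target-language text.
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = BigramLM(train)

fluent = ["the", "cat", "sat"]      # word order seen in the training text
disfluent = ["sat", "the", "cat"]   # same words, unnatural order

fluent_ppl = lm.perplexity(fluent)
disfluent_ppl = lm.perplexity(disfluent)
print(fluent_ppl < disfluent_ppl)
```

Under these assumptions, the sentence with natural word order gets the lower perplexity, which is the kind of fluency signal the abstract suggests reporting alongside BLEU. A real evaluation would use a language model trained on large amounts of natural target-language text.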