Perplexity as a Metric: Critical Thoughts on Machine Translation Evaluation Metrics

weixin_39841610


Article link: https://blog.csdn.net/weixin_39841610/article/details/112367623

Title: On The Evaluation of Machine Translation Systems Trained With Back-Translation

Abstract: Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training data. In this work, we show that this conjecture is not empirically supported and that backtranslation improves translation quality of both naturally occurring text as well as translationese according to professional human translators. We provide empirical evidence to support the view that back-translation is preferred by humans because it produces more fluent outputs. BLEU cannot capture human preferences because references are translationese when source sentences are natural text. We recommend complementing BLEU with a language model score to measure fluency.

Link: https://arxiv.org/pdf/1908.05204.pdf

Key points:

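1. In NLP, back-translation (BT) is a data augmentation technique. In the notation of the original figure (not reproduced here), X is the source, Y is the target, and each * marks one pass of translation.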

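2. BLEU is a quantitative metric for machine translation quality, built on n-gram overlap with reference translations. The authors find that using BT for data augmentation improves BLEU on X*->Y, the reverse direction in which the source is itself translationese (Table 1). They also find that translationese is easier to translate than original text (Table 2).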

OP stands for "on parallel data only", i.e. a mix of direct and reverse data.

3. Human evaluators prefer the output of BT-trained systems, but this preference does not show up in BLEU.

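4. The authors examine why BLEU fails: a likely reason is that translationese texts resemble one another. This is quantified with perplexity (see https://www.zhihu.com/question/58482430 for an explanation); the lower the perplexity, the better the model.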

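5. Native-level speakers will not prefer translationese, because it is less fluent. BLEU as it is currently used, however, does not capture this preference for fluency.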
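Points 2 and 4 rely on two quantities: the n-gram overlap that BLEU is built on, and perplexity. The sketch below is only an illustration, not the paper's evaluation code. The function names `clipped_precision` and `perplexity` are made up for this example, the example sentences are invented, and the per-token probabilities are assumed to come from some language model that is not shown here. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch omits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against a single reference.

    Each hypothesis n-gram is credited at most as many times as it
    occurs in the reference. This is the building block of BLEU only.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / total


def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token.

    token_probs are the probabilities a language model (assumed, not
    implemented here) assigned to each token of a sentence.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat is on the mat".split()
    print(clipped_precision(hyp, ref, 1))
    print(clipped_precision(hyp, ref, 2))
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

For these toy sentences, 5 of the 6 hypothesis unigrams and 3 of the 5 bigrams are matched, and a model that gives every token probability 0.25 has a perplexity of about 4. Overlap rewards closeness to the reference, while perplexity only asks how predictable the output is to a language model, which is why the paper suggests reporting a language model score alongside BLEU as a fluency signal.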
