[Paper Reading] A Study of Translation Edit Rate with Targeted Human Annotation

A Study of Translation Edit Rate with Targeted Human Annotation


Matthew Snover and Bonnie Dorr
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
{snover,bonnie}@umiacs.umd.edu


Key points from this paper:

1、Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation.

2、Automatic machine translation evaluation metrics include BLEU, METEOR, NIST, TER, and others.

3、We define a new, more intuitive measure of “goodness” of MT output: specifically, the number of edits needed to fix the output so that it semantically matches a correct translation.

4、The GALE (Global Autonomous Language Exploitation) research program (Olive, 2005) recently introduced a new error measure called Translation Edit Rate (TER). It was originally designed to count the number of edits (including phrasal shifts) that a human performs to change a hypothesis so that it is both fluent and has the correct meaning. This was then decomposed into two steps: defining a new reference, and finding the minimum number of edits so that the hypothesis exactly matches one of the references. All edits, including shifts, were defined to have a cost of one. Finding only the minimum number of edits, without generating a new reference, is the measure defined as TER. Finding the minimum number of edits to a new targeted reference is defined as human-targeted TER (HTER).

5、BLEU (Papineni et al., 2002) scores a translation by counting how many n-grams of varying lengths in the system output also occur in the set of references.
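The n-gram counting step can be sketched in Python. This is a minimal sketch of BLEU's clipped ("modified") n-gram precision only, assuming whitespace-tokenized input. The function names `ngrams` and `modified_precision` are hypothetical. Full BLEU also combines the precisions for n = 1 to 4 with a geometric mean and multiplies the result by a brevity penalty; both are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, refs, n):
    """Clipped n-gram precision: a hypothesis n-gram is credited at most as
    many times as it occurs in any single reference."""
    hyp_counts = ngrams(hyp, n)
    if not hyp_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any one reference.
    max_ref_counts = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())
```

Clipping stops a degenerate output such as "the the the the" from earning full unigram credit. Against the reference "the cat is on the mat", "the" is credited at most twice, so the unigram precision is 2/4 = 0.5.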

6、METEOR (Banerjee and Lavie, 2005) is an evaluation measure that counts the number of exact word matches between the system output and the reference. Unmatched words are then stemmed and matched. Additional penalties are assessed for word reordering between the hypothesis and the reference. This method has been shown to correlate very well with human judgments.
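A simplified METEOR-style score can be sketched as follows, assuming a single reference and whitespace tokens. It follows the two matching stages described above (exact matches, then stemmed matches). It also uses the 2005 scoring formulas: a recall-weighted F-mean and a fragmentation penalty based on the number of contiguous matched chunks.

This sketch makes two simplifications:
- `crude_stem` is a hypothetical stand-in for the Porter stemmer.
- The greedy left-to-right alignment replaces METEOR's choice of the largest alignment with the fewest crossing mappings.

All function names are hypothetical.

```python
def crude_stem(word):
    """HYPOTHETICAL stand-in for the Porter stemmer: strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def align(hyp, ref):
    """Greedy two-stage unigram alignment: exact matches first, then stemmed
    matches among words still unmatched. Returns (hyp_index, ref_index)
    pairs sorted by hyp_index."""
    pairs, used_h, used_r = [], set(), set()
    for normalize in (lambda w: w, crude_stem):
        for i, h in enumerate(hyp):
            if i in used_h:
                continue
            for j, r in enumerate(ref):
                if j not in used_r and normalize(h) == normalize(r):
                    pairs.append((i, j))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(pairs)


def count_chunks(pairs):
    """A chunk is a maximal run of matches that are adjacent, in the same
    order, in both the hypothesis and the reference."""
    chunks, prev = 0, None
    for i, j in pairs:
        if prev is None or (i, j) != (prev[0] + 1, prev[1] + 1):
            chunks += 1
        prev = (i, j)
    return chunks


def meteor_like(hyp, ref):
    """Simplified METEOR-style score for one hypothesis and one reference."""
    pairs = align(hyp, ref)
    matches = len(pairs)
    if matches == 0:
        return 0.0
    precision, recall = matches / len(hyp), matches / len(ref)
    fmean = 10 * precision * recall / (recall + 9 * precision)  # weighted toward recall
    penalty = 0.5 * (count_chunks(pairs) / matches) ** 3        # grows with reordering
    return fmean * (1 - penalty)
```

More chunks for the same number of matches means more reordering, so the penalty rises. A perfect match forms a single chunk, which keeps the penalty small but not zero.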

7、TER is defined as the minimum number of edits needed to change a hypothesis so that it exactly matches one of the references, normalized by the average length of the references.
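As a formula, with $h$ the hypothesis, $R$ the set of references, and $\mathrm{edits}(h, r)$ the minimum number of edits turning $h$ into reference $r$ (notation assumed here, not taken from the summary):

$$\mathrm{TER}(h) = \frac{\min_{r \in R} \mathrm{edits}(h, r)}{\frac{1}{|R|}\sum_{r \in R} |r|}$$

For example, a hypothesis that needs 2 edits against references averaging 10 words has TER = 2/10 = 0.2. Lower is better.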

8、Possible edits include the insertion, deletion, and substitution of single words as well as shifts of word sequences.

10、The number of insertions, deletions, and substitutions is calculated using dynamic programming. A greedy search finds the set of shifts by repeatedly selecting the shift that most reduces the number of insertions, deletions, and substitutions, until no more beneficial shifts remain.
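The dynamic-programming and greedy-shift steps can be sketched together in Python. This is a brute-force illustration that assumes short, whitespace-tokenized sentences; `edit_distance`, `apply_shift`, `ter_edits`, and `ter` are hypothetical names. The search tries every possible block move, whereas real TER implementations restrict which shifts are considered to keep the search tractable. Each accepted shift is charged one edit, following the cost-of-one rule above.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertion, deletion, substitution; cost 1 each)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, 1):
            cur[j] = min(prev[j] + 1,             # delete hypothesis word h
                         cur[j - 1] + 1,          # insert reference word r
                         prev[j - 1] + (h != r))  # match (0) or substitute (1)
        prev = cur
    return prev[-1]


def apply_shift(words, start, length, dest):
    """Move words[start:start+length] so it begins at index `dest` of the remaining list."""
    block = words[start:start + length]
    rest = words[:start] + words[start + length:]
    return rest[:dest] + block + rest[dest:]


def ter_edits(hyp, ref):
    """Greedy shift search, then edit distance; every shift costs one edit."""
    shifts = 0
    best = edit_distance(hyp, ref)
    while True:
        candidate = None  # (distance after shift, shifted hypothesis)
        n = len(hyp)
        for start in range(n):
            for length in range(1, n - start + 1):
                for dest in range(n - length + 1):
                    if dest == start:
                        continue  # putting the block back in place is a no-op
                    moved = apply_shift(hyp, start, length, dest)
                    d = edit_distance(moved, ref)
                    if d < best and (candidate is None or d < candidate[0]):
                        candidate = (d, moved)
        if candidate is None:
            break  # no shift reduces insertions/deletions/substitutions further
        best, hyp = candidate
        shifts += 1
    return best + shifts


def ter(hyp, refs):
    """Minimum edits over the references, normalized by the average reference length."""
    edits = min(ter_edits(hyp, ref) for ref in refs)
    avg_len = sum(len(ref) for ref in refs) / len(refs)
    return edits / avg_len
```

Take the reference "the saudis denied this week" and the hypothesis "this week the saudis denied". Plain edit distance is 4: delete two words and insert two. One shift of "this week" to the end leaves no other edits, so the sketch reports 1 edit and TER = 1/5 = 0.2.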

12、In both TER and HTER, the majority of the edits were substitutions and deletions.
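A per-type breakdown like this one can be obtained by tracing back through the dynamic-programming table. The sketch below works under these assumptions:
- `edit_type_counts` is a hypothetical helper.
- It covers insertions, deletions, and substitutions only; shifts would be counted separately by the shift search.
- When several optimal paths exist, it breaks ties in a fixed order (match or substitution first, then deletion, then insertion), so another implementation could report a different mix for the same sentence pair.

```python
def edit_type_counts(hyp, ref):
    """Tally insertions, deletions, and substitutions on one optimal edit path."""
    n, m = len(hyp), len(ref)
    # d[i][j] = edit distance between hyp[:i] and ref[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]))

    counts = {"ins": 0, "del": 0, "sub": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            if hyp[i - 1] != ref[j - 1]:
                counts["sub"] += 1  # diagonal step with differing words
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["del"] += 1      # hypothesis word removed
            i -= 1
        else:
            counts["ins"] += 1      # reference word added to the hypothesis
            j -= 1
    return counts
```

For the hypothesis "the cat sat on mat" and the reference "the dog sat on the mat", it reports one substitution (cat to dog) and one insertion (the).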

13、In an analysis of shift size and distance, we found that most shifts are short (1 word) and move words by fewer than 7 positions.
