# BLEU for Machine Translation Evaluation: A Detailed Walkthrough of the Computation

## 1. Introduction

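BLEU (Bilingual Evaluation Understudy) is a metric most readers will already know, and introductions to the concept are easy to find with any search engine. The original paper is *BLEU: a Method for Automatic Evaluation of Machine Translation*, published by IBM.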

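$$BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log P_n\right)$$

$$BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$$

Here $P_n$ is the modified n-gram precision, $w_n$ are weights summing to 1 (typically $N = 4$ and $w_n = 1/4$), $c$ is the length of the candidate translation and $r$ is the reference length it is compared against.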
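The two formulas translate almost line for line into Python. This is a minimal sketch, assuming the precisions $P_1,\dots,P_N$ are already computed and all positive. The function names `brevity_penalty` and `bleu_from_parts` are hypothetical and are not NLTK's API.

```python
import math


def brevity_penalty(c, r):
    """Hypothetical helper: BP = 1 if c > r, else exp(1 - r/c)."""
    if c > r:
        return 1.0
    return math.exp(1 - r / c)


def bleu_from_parts(precisions, weights, c, r):
    """Hypothetical helper: BLEU = BP * exp(sum_n w_n * log P_n), assuming every P_n > 0."""
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(c, r) * math.exp(log_sum)


# Uniform weights w_n = 1/N with N = 4
print(bleu_from_parts([1.0, 1.0, 1.0, 1.0], [0.25] * 4, c=10, r=10))  # 1.0
# A candidate half as long as the reference is penalized by e^(1 - 2) = e^(-1), about 0.368
print(bleu_from_parts([1.0, 1.0, 1.0, 1.0], [0.25] * 4, c=5, r=10))
```

Even with perfect precisions, a candidate shorter than the reference is penalized. That is exactly the case that precision alone cannot detect.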

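NLTK's `nltk.align.bleu_score` module implements these formulas. It consists mainly of three functions: two private functions that compute $P_n$ and BP respectively, and one function that combines them into the BLEU score.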

```python
# Compute the BLEU score
def bleu(candidate, references, weights): ...

# (1) Private function: modified n-gram precision
def _modified_precision(candidate, references, n): ...

# (2) Private function: brevity penalty BP
def _brevity_penalty(candidate, references): ...
```

The running example is the one from the original paper: one candidate translation and three reference translations.

Candidate:

It is a guide to action which ensures that the military always obeys the commands of the party

References:

1. It is a guide to action that ensures that the military will forever heed Party commands
2. It is the guiding principle which guarantees the military forces always being under the command of the Party
3. It is the practical guide for the army always to heed the directions of the party
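The counts in the tables that follow are consistent with plain whitespace tokenization that preserves case. For instance, `party` matches only Reference 3, because References 1 and 2 write `Party`. Here is a minimal sketch of that tokenization assumption:

```python
# Assumption: tokens are whitespace-separated and case-sensitive
candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]

print(len(candidate))                          # 18
print([len(ref) for ref in references])        # [16, 18, 16]
print(["party" in ref for ref in references])  # [False, False, True]
```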

## 2. Computing the Modified n-gram Precision ($P_n$)

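```python
from collections import Counter

from nltk.util import ngrams


def _modified_precision(candidate, references, n):
    # Count every n-gram in the candidate translation
    counts = Counter(ngrams(candidate, n))

    # Candidate shorter than n tokens: no n-grams, so precision is 0
    if not counts:
        return 0

    # For each candidate n-gram, the largest number of times it occurs in any single reference
    max_counts = {}
    for reference in references:
        reference_counts = Counter(ngrams(reference, n))
        for ngram in counts:
            max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])

    # Clip each candidate count at that maximum reference count
    clipped_counts = {ngram: min(count, max_counts[ngram]) for ngram, count in counts.items()}

    # Python 3 true division (Python 2 would need `from __future__ import division`)
    return sum(clipped_counts.values()) / sum(counts.values())
```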
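The `ngrams` helper used above comes from NLTK (`nltk.util.ngrams`). It yields consecutive n-token tuples. For readers without NLTK installed, a stdlib stand-in shows what gets fed into `Counter`. The name `ngrams_stdlib` is hypothetical.

```python
from collections import Counter


def ngrams_stdlib(tokens, n):
    """Hypothetical stand-in for nltk.util.ngrams: consecutive n-token tuples."""
    # zip(tokens[0:], tokens[1:], ..., tokens[n-1:]) stops at the shortest slice
    return zip(*(tokens[i:] for i in range(n)))


tokens = "It is a guide".split()
print(list(ngrams_stdlib(tokens, 2)))  # [('It', 'is'), ('is', 'a'), ('a', 'guide')]
print(list(ngrams_stdlib(tokens, 5)))  # [] because the sentence is shorter than n

# Unigrams are 1-tuples, so the key for "the" is ("the",)
print(Counter(ngrams_stdlib("the commands of the party".split(), 1))[("the",)])  # 2
```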

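### Modified 1-gram precision

Columns: the count in the candidate, the count in each reference, the maximum count over the references, and the clipped count $\min(\text{candidate count}, \text{max ref count})$.

| 1-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max ref count | Clipped count |
|---|---|---|---|---|---|---|
| the | 3 | 1 | 4 | 4 | 4 | 3 |
| obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| a | 1 | 1 | 0 | 0 | 1 | 1 |
| which | 1 | 0 | 1 | 0 | 1 | 1 |
| ensures | 1 | 1 | 0 | 0 | 1 | 1 |
| guide | 1 | 1 | 0 | 1 | 1 | 1 |
| always | 1 | 0 | 1 | 1 | 1 | 1 |
| is | 1 | 1 | 1 | 1 | 1 | 1 |
| of | 1 | 0 | 1 | 1 | 1 | 1 |
| to | 1 | 1 | 0 | 1 | 1 | 1 |
| commands | 1 | 1 | 0 | 0 | 1 | 1 |
| that | 1 | 2 | 0 | 0 | 2 | 1 |
| It | 1 | 1 | 1 | 1 | 1 | 1 |
| action | 1 | 1 | 0 | 0 | 1 | 1 |
| party | 1 | 0 | 0 | 1 | 1 | 1 |
| military | 1 | 1 | 1 | 0 | 1 | 1 |

$P_1=\frac{17}{18}=0.944444444$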
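Each row of the table can be recomputed mechanically. This is a minimal sketch, assuming whitespace tokenization. The helper `table_row` is hypothetical. It reproduces the columns (candidate count, counts in References 1 to 3, maximum reference count, clipped count) and then $P_1$.

```python
from collections import Counter
from fractions import Fraction

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]

cand_counts = Counter(candidate)
ref_counts = [Counter(ref) for ref in references]


def table_row(word):
    """Hypothetical helper: (candidate, ref1, ref2, ref3, max ref count, clipped count)."""
    per_ref = [rc[word] for rc in ref_counts]
    max_ref = max(per_ref)
    return (cand_counts[word], *per_ref, max_ref, min(cand_counts[word], max_ref))


print(table_row("the"))   # (3, 1, 4, 4, 4, 3)
print(table_row("that"))  # (1, 2, 0, 0, 2, 1)

# Sum of clipped counts over all distinct candidate words, divided by candidate length
clipped_total = sum(table_row(word)[-1] for word in cand_counts)
p1 = Fraction(clipped_total, sum(cand_counts.values()))
print(p1)  # 17/18
```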

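### Modified 2-gram precision

| 2-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max ref count | Clipped count |
|---|---|---|---|---|---|---|
| ensures that | 1 | 1 | 0 | 0 | 1 | 1 |
| guide to | 1 | 1 | 0 | 0 | 1 | 1 |
| which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
| obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
| commands of | 1 | 0 | 0 | 0 | 0 | 0 |
| that the | 1 | 1 | 0 | 0 | 1 | 1 |
| a guide | 1 | 1 | 0 | 0 | 1 | 1 |
| of the | 1 | 0 | 1 | 1 | 1 | 1 |
| always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| the commands | 1 | 0 | 0 | 0 | 0 | 0 |
| to action | 1 | 1 | 0 | 0 | 1 | 1 |
| the party | 1 | 0 | 0 | 1 | 1 | 1 |
| is a | 1 | 1 | 0 | 0 | 1 | 1 |
| action which | 1 | 0 | 0 | 0 | 0 | 0 |
| It is | 1 | 1 | 1 | 1 | 1 | 1 |
| military always | 1 | 0 | 0 | 0 | 0 | 0 |
| the military | 1 | 1 | 1 | 0 | 1 | 1 |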

$P_2=\frac{10}{17}=0.588235294$
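The denominators of $P_1$ to $P_4$ (18, 17, 16 and 15) are simply the number of n-grams in the 18-token candidate. A sentence of $L$ tokens contains $L - n + 1$ n-grams, counted with repetition:

$$\#\text{n-grams}(L, n) = L - n + 1, \qquad L = 18:\quad 18,\ 17,\ 16,\ 15 \quad \text{for } n = 1, 2, 3, 4$$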

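### Modified 3-gram precision

| 3-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max ref count | Clipped count |
|---|---|---|---|---|---|---|
| ensures that the | 1 | 1 | 0 | 0 | 1 | 1 |
| which ensures that | 1 | 0 | 0 | 0 | 0 | 0 |
| action which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
| a guide to | 1 | 1 | 0 | 0 | 1 | 1 |
| military always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| the commands of | 1 | 0 | 0 | 0 | 0 | 0 |
| commands of the | 1 | 0 | 0 | 0 | 0 | 0 |
| to action which | 1 | 0 | 0 | 0 | 0 | 0 |
| the military always | 1 | 0 | 0 | 0 | 0 | 0 |
| obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 |
| It is a | 1 | 1 | 0 | 0 | 1 | 1 |
| of the party | 1 | 0 | 0 | 1 | 1 | 1 |
| is a guide | 1 | 1 | 0 | 0 | 1 | 1 |
| that the military | 1 | 1 | 0 | 0 | 1 | 1 |
| always obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
| guide to action | 1 | 1 | 0 | 0 | 1 | 1 |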

$P_3=\frac{7}{16}=0.4375$

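### Modified 4-gram precision

| 4-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max ref count | Clipped count |
|---|---|---|---|---|---|---|
| to action which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
| action which ensures that | 1 | 0 | 0 | 0 | 0 | 0 |
| guide to action which | 1 | 0 | 0 | 0 | 0 | 0 |
| obeys the commands of | 1 | 0 | 0 | 0 | 0 | 0 |
| which ensures that the | 1 | 0 | 0 | 0 | 0 | 0 |
| commands of the party | 1 | 0 | 0 | 0 | 0 | 0 |
| ensures that the military | 1 | 1 | 0 | 0 | 1 | 1 |
| a guide to action | 1 | 1 | 0 | 0 | 1 | 1 |
| always obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 |
| that the military always | 1 | 0 | 0 | 0 | 0 | 0 |
| the commands of the | 1 | 0 | 0 | 0 | 0 | 0 |
| the military always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| military always obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
| is a guide to | 1 | 1 | 0 | 0 | 1 | 1 |
| It is a guide | 1 | 1 | 0 | 0 | 1 | 1 |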

$P_4=\frac{4}{15}=0.266666667$
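To check $P_1$ through $P_4$ in one pass, here is a self-contained sketch. It mirrors `_modified_precision` but uses only the standard library and returns an exact `Fraction`. The names `ngram_counts` and `modified_precision` are hypothetical, and whitespace tokenization is assumed.

```python
from collections import Counter
from fractions import Fraction


def ngram_counts(tokens, n):
    """Hypothetical helper: Counter of consecutive n-token tuples."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def modified_precision(candidate, references, n):
    """Hypothetical re-implementation: clipped matches / candidate n-grams, exact."""
    counts = ngram_counts(candidate, n)
    if not counts:
        return Fraction(0)
    # Maximum count of each n-gram in any single reference (missing keys read as 0)
    max_counts = Counter()
    for reference in references:
        for ngram, count in ngram_counts(reference, n).items():
            max_counts[ngram] = max(max_counts[ngram], count)
    clipped = sum(min(count, max_counts[ngram]) for ngram, count in counts.items())
    return Fraction(clipped, sum(counts.values()))


candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]

for n in range(1, 5):
    print(n, modified_precision(candidate, references, n))
# 1 17/18
# 2 10/17
# 3 7/16
# 4 4/15
```

Using `Fraction` keeps the results in the same form as the hand-computed $\frac{a}{b}$ values, so there is no float rounding to reason about.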

With $N=4$, $w_n=0.25$ and the natural logarithm:

$\sum_{n=1}^{N}w_n\log P_n=0.25\log P_1+0.25\log P_2+0.25\log P_3+0.25\log P_4=-0.684055269517$
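A quick numeric check of this sum, assuming the four precisions from the tables and uniform weights:

```python
import math

precisions = [17 / 18, 10 / 17, 7 / 16, 4 / 15]  # P_1 .. P_4 from the tables
weights = [0.25] * 4                             # uniform weights, N = 4

# math.log is the natural logarithm
log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
print(round(log_sum, 6))  # -0.684055
```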

## 3. Computing the Brevity Penalty

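```python
import math


def _brevity_penalty(candidate, references):
    c = len(candidate)
    ref_lens = (len(reference) for reference in references)
    # Python compares tuples element by element, e.g. (0, 1) > (1, 0) is False.
    # The key (distance to c, length) therefore picks the reference whose length is
    # closest to the candidate's and, when several are equally close, the shortest one.
    # Example: candidate length 10, reference lengths 9 and 11 -> r = 9.
    r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
    print('r:', r)  # debug output of the chosen reference length

    if c > r:
        return 1
    else:
        # True division in Python 3; with Python 2 ints, r / c would floor (11 / 10 == 1)
        return math.exp(1 - r / c)
```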
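For the running example, the candidate has $c = 18$ tokens and the reference lengths are 16, 18 and 16. So $r = 18$, the condition $c > r$ does not hold, and $BP = e^{1 - 18/18} = 1$. The sketch below applies the same selection rule to plain lengths rather than token lists. The names `closest_ref_length` and `brevity_penalty` are hypothetical. It also checks the tie-break example from the code comment.

```python
import math


def closest_ref_length(c, ref_lens):
    """Hypothetical helper: reference length closest to c; ties go to the shorter one."""
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))


def brevity_penalty(c, ref_lens):
    """Hypothetical helper: BP from the candidate length and the reference lengths."""
    r = closest_ref_length(c, ref_lens)
    if c > r:
        return 1.0
    return math.exp(1 - r / c)


print(closest_ref_length(18, [16, 18, 16]))  # 18
print(brevity_penalty(18, [16, 18, 16]))     # 1.0
print(closest_ref_length(10, [9, 11]))       # 9 (tie at distance 1, shorter wins)
print(brevity_penalty(10, [9, 11]))          # 1.0 because c > r
print(round(brevity_penalty(8, [9, 11]), 4)) # 0.8825, i.e. e^(1 - 9/8)
```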

## 4. Putting It Together

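BLEU ranges over [0, 1], where 0 is the worst score and 1 the best. Every $P_n \le 1$ and $BP \le 1$, so their product cannot exceed 1.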
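The body of the combining `bleu` function is not reproduced here. The following is a minimal sketch of how the pieces could fit together, assuming whitespace tokenization and uniform weights. `ngram_counts`, `modified_precision`, `brevity_penalty` and `bleu` are hypothetical stdlib re-implementations, not NLTK's code. On the running example, this sketch gives $BLEU = 1 \cdot e^{-0.684055} \approx 0.5046$.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Hypothetical helper: Counter of consecutive n-token tuples."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def modified_precision(candidate, references, n):
    """Hypothetical helper: clipped n-gram matches / candidate n-grams."""
    counts = ngram_counts(candidate, n)
    if not counts:
        return 0.0
    max_counts = Counter()
    for reference in references:
        for ngram, count in ngram_counts(reference, n).items():
            max_counts[ngram] = max(max_counts[ngram], count)
    clipped = sum(min(count, max_counts[ngram]) for ngram, count in counts.items())
    return clipped / sum(counts.values())


def brevity_penalty(candidate, references):
    """Hypothetical helper: closest reference length, ties to the shorter one."""
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, weights):
    """Hypothetical sketch: BP * exp(sum_n w_n * log P_n) for n = 1..len(weights)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, len(weights) + 1)]
    # log(0) is undefined; this sketch treats any zero precision as a score of 0
    if min(precisions) == 0:
        return 0.0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(candidate, references) * math.exp(log_sum)


candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]

score = bleu(candidate, references, [0.25] * 4)
print(round(score, 4))  # 0.5046
```

Returning 0 on a zero precision is one design choice among several. Smoothing methods exist precisely to avoid that cliff on short sentences.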
