Introduction to Computing BLEU
reference: http://acl2014.org/acl2014/W14-33/pdf/W14-3346.pdf
1. Intuition
The goal of BLEU is to measure how closely a candidate translation (translate) matches a reference translation (reference), i.e. its precision.
Concretely, it starts from precision at the word level.
Let R be the set of words in the reference:
e.g. reference = 'Today is a nice day'
R = {'Today', 'is', 'a', 'nice', 'day'}
Let T be the set of words in the candidate:
e.g. translate = 'It is a nice day today'
T = {'It', 'is', 'a', 'nice', 'day', 'today'}
Then
precision = number of matches / total number of candidate words
          = m/l
          = len(R∩T)/len(T)
With the example above (matching 'Today'/'today' case-insensitively):
R = {'Today', 'is', 'a', 'nice', 'day'}
T = {'It', 'is', 'a', 'nice', 'day', 'today'}
precision = 5/6
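This word-level precision can be sketched in a few lines of Python. A toy illustration only: tokenization here is just a lowercase whitespace `split`, an assumption made for the example.

```python
# Toy word-level precision: the fraction of candidate tokens that also
# appear in the reference token set (case-insensitive, no clipping yet).
def word_precision(reference: str, translate: str) -> float:
    ref_set = {w.lower() for w in reference.split()}
    trans_tokens = [w.lower() for w in translate.split()]
    matched = sum(1 for w in trans_tokens if w in ref_set)
    return matched / len(trans_tokens)

print(word_precision('Today is a nice day', 'It is a nice day today'))  # 5/6 ≈ 0.833
```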
2. Introducing n-grams
Looking only at single-word precision ignores where the words occur:
e.g.
reference = 'I am fine'
translate = 'I am not fine'
p = 3/4 # yet the meanings differ greatly
So the "joint distribution" of n consecutive words also matters, i.e. n-grams.
Let n = 2:
R_1 = {'I', 'am', 'fine'}
T_1 = {'I', 'am', 'not', 'fine'}
R_2 = {'I am', 'am fine'}
T_2 = {'I am', 'am not', 'not fine'}
p_1 = 3/4
p_2 = 1/3
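The n-gram precisions above can be reproduced with a small sketch (still unclipped; whitespace tokenization is assumed):

```python
# Slide a window of size n over the token list to get all n-grams.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Unclipped n-gram precision: candidate n-grams found in the reference.
def ngram_precision(reference: str, translate: str, n: int) -> float:
    ref = set(ngrams(reference.split(), n))
    cand = ngrams(translate.split(), n)
    return sum(1 for g in cand if g in ref) / len(cand)

print(ngram_precision('I am fine', 'I am not fine', 1))  # 3/4
print(ngram_precision('I am fine', 'I am not fine', 2))  # 1/3
```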
3. Averaging the n-gram scores
BLEU usually uses the geometric mean:
p(n, translate, reference) = pow(product(p_1..p_n), 1/n)
n: the maximum n-gram order
Continuing the example above:
p(n, translate, reference) = pow(p_1*p_2, 1/2)
                           = sqrt(1/4)
                           = 0.5
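The geometric mean step can be written directly (note `math.prod` requires Python 3.8+):

```python
import math

# Geometric mean of the n-gram precisions p_1..p_N.
def geo_mean(precisions):
    return math.pow(math.prod(precisions), 1 / len(precisions))

print(geo_mean([3/4, 1/3]))  # sqrt(1/4) = 0.5
```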
4. Pitfall: repeated words
reference: https://www.cnblogs.com/by-dream/p/7679284.html
Computing precision as above can be inflated simply by repeating words from the reference:
e.g.
translate = 'I am fine I am fine'
reference = 'I am fine'
p_1 = 6/6 = 1
p_2 = 4/5
p(n,t,r) = sqrt(1 * 4/5) = 0.8944
Solution: clip the match counts,
m_n = sum over i of min(count(R[ri]), count(T[ri]))
ri: the i-th word/n-gram in the reference set
i.e. each n-gram is only credited up to the number of times it occurs in the reference.
With clipping, the example above becomes:
e.g.
translate = 'I am fine I am fine'
reference = 'I am fine'
p_1 = 3/6 = 1/2
p_2 = 2/5
p(n,t,r) = sqrt(1/2 * 2/5) = sqrt(1/5) = 0.447
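Clipped counting can be sketched with `collections.Counter` (whitespace tokenization assumed, as before):

```python
from collections import Counter

# Clipped n-gram precision: each candidate n-gram is credited at most as
# many times as it occurs in the reference.
def clipped_precision(reference: str, translate: str, n: int) -> float:
    def grams(s):
        t = s.split()
        return Counter(tuple(t[i:i + n]) for i in range(len(t) - n + 1))
    ref, cand = grams(reference), grams(translate)
    matched = sum(min(c, ref[g]) for g, c in cand.items())
    return matched / sum(cand.values())

print(clipped_precision('I am fine', 'I am fine I am fine', 1))  # 3/6 = 0.5
print(clipped_precision('I am fine', 'I am fine I am fine', 2))  # 2/5 = 0.4
```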
5. Length penalty
Precision alone does not account for the length difference between translate and reference; in particular, a candidate much shorter than the reference can still reach a high precision. BLEU therefore introduces a brevity penalty:
BP(translate, reference)
  = min(1.0, exp(1 - len(reference)/len(translate)))
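A direct translation of the BP formula:

```python
import math

# Brevity penalty: 1.0 when the candidate is at least as long as the
# reference, decaying exponentially when the candidate is shorter.
def brevity_penalty(len_ref: int, len_trans: int) -> float:
    return min(1.0, math.exp(1.0 - len_ref / len_trans))

print(brevity_penalty(3, 6))  # candidate longer than reference -> 1.0
print(brevity_penalty(6, 3))  # candidate half as long -> exp(-1) ≈ 0.368
```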
6. Basic BLEU formula
BLEU(n, translate, reference) =
  p(n, translate, reference) * BP(translate, reference)
7. Pitfall: an unmatched p_i forces BLEU = 0
BLEU is usually a "document"-level similarity; once it is applied to single sentences, some n-gram orders may have no matches at all:
e.g.
translate = 'I like beijing'
reference = 'I am fine'
p_1 = 1/3
p_2 = 0/2 = 0
p(n,t,r) = sqrt(1/3 * 0) = 0
bleu = BP * 0 = 0
The paper referenced above provides 7 smoothing methods:
7.1. Smoothing 1: add a small constant
p_n = (m_n + sigma)/l_n
where sigma is a very small value.
Continuing the previous example:
translate = 'I like beijing'
reference = 'I am fine'
sigma = 0.001
p_1 = 1.001/3
p_2 = 0.001/2
p(n,t,r) = 0.012916397846665041
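Following the formula as written above (sigma added to every match count), the example can be checked numerically:

```python
import math

# Smoothing 1: add a small sigma to each match count so a zero-count
# n-gram order no longer forces the whole geometric mean to zero.
def smooth1_bleu_p(matches, lengths, sigma=0.001):
    ps = [(m + sigma) / l for m, l in zip(matches, lengths)]
    return math.pow(math.prod(ps), 1 / len(ps))

# m_1 = 1 of 3 unigrams, m_2 = 0 of 2 bigrams (the example above)
print(smooth1_bleu_p([1, 0], [3, 2]))  # ≈ 0.01292
```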
7.2. Smoothing 2: add one
p_n = (m_n + 1)/(l_n + 1)
7.3. Smoothing 3: handle m = 0 separately
invcnt = 1
for n in 1 to N:
    if m_n == 0:
        invcnt = invcnt * 2
        m_n = 1/invcnt
    p_n = m_n/l_n
    ...
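The loop above can be sketched as:

```python
# Smoothing 3: whenever an n-gram order has zero matches, substitute
# 1/invcnt and double invcnt, so repeated zero orders get ever smaller credit.
def smooth3_precisions(matches, lengths):
    invcnt = 1.0
    ps = []
    for m, l in zip(matches, lengths):
        if m == 0:
            invcnt *= 2
            m = 1.0 / invcnt
        ps.append(m / l)
    return ps

print(smooth3_precisions([1, 0, 0], [3, 2, 1]))  # [1/3, 0.25, 0.25]
```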
7.4. Smoothing 4: an upgraded smoothing 3
invcnt = invcnt * K / ln(len(translate))
where K is a user-defined parameter.
7.5. Smoothing 5: smooth the match counts m
m_n = (m_(n-1) + m_n + m_(n+1))/3  # for n > 0
m_0 = m_1 + 1
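Taking the formula as written above (a plain three-point average, with m_0 = m_1 + 1 and the count beyond the last order taken as 0), a sketch:

```python
# Smoothing 5: replace each match count with the average of its
# neighbouring n-gram orders; zeros get filled in by their neighbours.
def smooth5_matches(matches):
    # pad with m_0 = m_1 + 1 on the left and 0 on the right
    padded = [matches[0] + 1] + list(matches) + [0]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(matches) + 1)]

print(smooth5_matches([3, 0, 1]))  # [7/3, 4/3, 1/3]
```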
7.6. Smoothing 6: smooth the precision
p_n = (m_n + alpha*p_n0)/(l_n + alpha)
p_n0 = p_(n-1) * p_(n-1)/p_(n-2)
where alpha is a user-defined parameter.
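A sketch of one step of this interpolation; the alpha value and the example numbers are made up purely for illustration:

```python
# Smoothing 6: interpolate p_n with a value p_n0 extrapolated from the
# two lower orders, weighted by a user-chosen alpha.
def smooth6_precision(m_n, l_n, p_prev, p_prev2, alpha=5.0):
    p_n0 = p_prev * p_prev / p_prev2  # extrapolated precision
    return (m_n + alpha * p_n0) / (l_n + alpha)

# m_n = 0 of 2 n-grams; p_(n-1) = 1/3, p_(n-2) = 2/3 (hypothetical values)
print(smooth6_precision(0, 2, p_prev=1/3, p_prev2=2/3))  # (0 + 5*(1/6)) / 7 ≈ 0.119
```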
7.7. Smoothing 7: smoothing 4 combined with smoothing 5
1. if m_n == 0: m_n = 1/invcnt, with
   invcnt = invcnt * K / ln(len(translate))
2. m_n = (m_(n-1) + m_n + m_(n+1))/3  for n > 0
8 Summary
The general BLEU formula:
BLEU = P*BP
P = pow(product(p_1..p_n), 1/n)
p_n = m_n / l_n
BP = min(1, exp(1 - l_r/l_t))
where
p_n: the precision of the candidate (translate) against the reference at gram order n
m_n: the (clipped) number of candidate n-grams that match the reference at order n
l_n: the number of n-grams the candidate is split into at order n
l_r: the number of words in the reference
l_t: the number of words in the candidate
9 Log BLEU
When n is large, product(p_1..p_n) can underflow, so the product is converted to a sum via ln.
Derivation:
BLEU = P*BP
ln(BLEU) = ln(P) + ln(BP)
         = ln(P) + min(ln(1), 1 - l_r/l_t)
         = ln(P) + min(0, 1 - l_r/l_t)
         = ln(pow(product(p_1..p_n), 1/n)) + min(0, 1 - l_r/l_t)
         = 1/n*ln(product(p_1..p_n)) + min(0, 1 - l_r/l_t)
         = 1/n*sum(ln(p_1)..ln(p_n)) + min(0, 1 - l_r/l_t)
         = 1/n*sum(ln(m_1)-ln(l_1) .. ln(m_n)-ln(l_n)) + min(0, 1 - l_r/l_t)
Therefore:
log_BLEU = log_P + log_BP
log_P = 1/n*sum(ln_p_1..ln_p_n)
log_BP = min(0, 1 - l_r/l_t)
ln_p_n = ln(m_n) - ln(l_n)
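The derivation above translates into a short function; exponentiating recovers ordinary BLEU:

```python
import math

# Log-space BLEU: sum log-precisions instead of multiplying, which
# avoids underflow when n is large and the individual p_n are tiny.
def log_bleu(matches, lengths, len_ref, len_trans):
    n = len(matches)
    log_p = sum(math.log(m) - math.log(l) for m, l in zip(matches, lengths)) / n
    log_bp = min(0.0, 1.0 - len_ref / len_trans)
    return log_p + log_bp

# p_1 = 3/4, p_2 = 1/3, candidate no shorter than reference -> BP = 1
print(math.exp(log_bleu([3, 1], [4, 3], 3, 4)))  # sqrt(3/4 * 1/3) = 0.5
```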
10 BLEU code
The code has two parts:
├── normalizer.py # converts a String into a List[str]
└── similarity.py # computes the score between ref and trans
10.1 normalizer.py
# -*- coding:utf-8 -*-
# CREATED BY: bohuai jiang
# CREATED ON: 2020/10/10 2:42 PM
# LAST MODIFIED ON:
# AIM: convert a string into a token list and n-gram windows
from typing import Iterator, List
from abc import ABC, abstractmethod
import re


class NormalizerABC(ABC):
    def len(self, string: str, gram: int = 1) -> int:
        # number of n-gram windows of size `gram` in the token list
        length = len(self.to_list(string))
        return max(0, length - gram + 1)

    @abstractmethod
    def to_list(self, string: str) -> List[str]:
        ...

    def normalize(self, string: str, gram: int) -> Iterator[List[str]]:
        # yield every consecutive window of `gram` tokens
        str_list = self.to_list(string)
        length = len(str_list)
        start = 0
        for i in range(gram, length + 1):
            yield str_list[start:i]
            start += 1


class StrToListEN(NormalizerABC):
    def to_list(self, string: str) -> List[str]:
        # lowercase, then keep alphanumeric runs as tokens
        string = string.lower()
        return re.findall('[a-z0-9]+', string)
10.2 similarity.py
This uses a combination of smoothing 3 and smoothing 5.
One change from the paper: we want the score to be 0 when the two texts/sentences are completely unrelated.
# -*- coding:utf-8 -*-
# CREATED BY: bohuai jiang
# CREATED ON: 2020/9/25 11:21 AM
# LAST MODIFIED ON:
# AIM: compute the similarity of two sentences in the same language
from .normalizer import StrToListEN, NormalizerABC
from collections import defaultdict
import math


class BlueSimilarity:
    '''
    ref: http://acl2014.org/acl2014/W14-33/pdf/W14-3346.pdf
    uses smoothing 3 + smoothing 5:
        p_n = m_n / l_n
        p(n,t,r) = pow(product(p_1..p_n), 1/n)
        m_n = sum_i min(count(R[i]), count(T[i]))   # clipped counts
        m_n = (m_(n-1) + m_n + m_(n+1)) / 3         # smoothing 5, for n > 0
        m_0 = m_1 + 1
        m_n = 1/invcnt if m_n == 0 and n > 1        # smoothing 3
        BP = min(1, exp(1 - l_target/l_trans))
        BLEU = BP * p(n,t,r)
    '''

    def __init__(self, normalizer: NormalizerABC = StrToListEN()):
        '''
        :param normalizer: the logic that turns a str into a token list
        '''
        self.normalizer = normalizer

    def init_m_catch(self):
        # reset the per-call caches
        self.m_cache = dict()
        self.invcnt = 1.0

    # --------------- #
    #    algorithm    #
    # --------------- #
    def count_n(self, value: str, n_gram: int):
        # count every n-gram of the given order
        token_counter = defaultdict(int)
        for token in self.normalizer.normalize(value, n_gram):
            if token:
                token_counter[tuple(token)] += 1
        return token_counter

    def m_n(self, n: int, trans: str, ref: str) -> float:
        # -- check the cache first -- #
        if n in self.m_cache:
            return self.m_cache[n]
        if n == 0:
            # - smoothing 5: m_0 = m_1 + 1
            return self.m_n(1, trans, ref) + 1
        trans_n = self.count_n(trans, n)
        ref_n = self.count_n(ref, n)
        # clipped match count
        m = 0
        for i in ref_n:
            m += min(ref_n[i], trans_n.get(i, 0))
        # - smoothing 3: replace a zero count by 1/invcnt - #
        # (only for n > 1, so completely unrelated sentences still score 0)
        if m == 0 and n > 1:
            self.invcnt *= 2
            m = 1 / self.invcnt
        self.m_cache[n] = m
        return m

    def Pn(self, trans: str, target: str, n_gram: int) -> float:
        Pn = 1.0
        for n in range(1, n_gram + 1):
            mn = self.m_n(n, trans, target)
            m_pre = self.m_n(n - 1, trans, target)
            m_next = self.m_n(n + 1, trans, target)
            # - smoothing 5: average with the neighbouring orders
            mn = (m_pre + mn + m_next) / 3
            ln = self.normalizer.len(trans, n)
            try:
                Pn *= mn / ln
            except ZeroDivisionError:
                break
        return math.pow(Pn, 1 / n_gram)

    def BP(self, len_trans: int, len_target: int) -> float:
        # brevity penalty: only penalize candidates shorter than the target
        return min(1.0, math.exp(1.0 - float(len_target) / float(len_trans)))

    def get_bleu(self, trans: str, target: str, n_gram: int):
        self.init_m_catch()
        trans_len = len(self.normalizer.to_list(trans))
        target_len = len(self.normalizer.to_list(target))
        return self.BP(trans_len, target_len) * self.Pn(trans, target, n_gram)