Reading notes: "Automatic Testing and Improvement of Machine Translation"

专心致志写BUG

Category: reading

Article link: https://blog.csdn.net/weixin_43975374/article/details/119489518

Automatic Testing and Improvement of Machine Translation

ABSTRACT

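This paper presents TransRepair, a fully automatic approach for testing and repairing the consistency of machine translation systems. TransRepair combines mutation with metamorphic testing to detect inconsistency bugs (without access to human oracles). It then adopts probability-reference or cross-reference to post-process the translations, repairing the inconsistencies in a grey-box or black-box manner. Our evaluation on two state-of-the-art translators, Google Translate and Transformer, indicates that TransRepair has a high precision (99%) on generating input pairs with consistent translations. With these tests, using automatic consistency metrics and manual assessment, we find that Google Translate and Transformer have approximately 36% and 40% inconsistency bugs, respectively. Black-box repair fixes 28% and 19% of the bugs on average for Google Translate and Transformer. Grey-box repair fixes 30% of the bugs on average for Transformer. Manual inspection indicates that the translations repaired by our approach improve consistency in 87% of cases (degrading it in 2%), and that our repairs have better translation acceptability in 27% of cases (worse in 8%).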


Approach

1: Automatic Test Input Generation

1.1 Context-similarity Corpus Building

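To perform context-similar word replacement, the key step is to find a word that can be replaced with another (similar) word without harming the sentence structure. The new sentence produced by the replacement should yield a translation that is consistent with the translation of the original sentence.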

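Word vectors capture the meaning of a word through its context. To measure similarity, we use word vectors trained from text corpora. In our approach, the similarity between words $w_1$ and $w_2$, denoted $sim(w_1, w_2)$, is computed with the formula below, where $v_x$ denotes the vector of word $x$.
$sim(w_1, w_2) = \frac{v_{w_1} \cdot v_{w_2}}{|v_{w_1}| |v_{w_2}|}$
To build a reliable context-similarity corpus, we use two word-vector models and take the intersection of their results. The first model is GloVe, trained on Wikipedia 2014 data and GigaWord 5 data. The second model is SpaCy, a multi-task CNN trained on OntoNotes, which includes data collected from telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, and weblogs. When two words have a similarity above 0.9 in both models, we consider the pair context-similar and add it to the context-similarity corpus. In total, this approach collects 131,933 word pairs.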
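
The corpus construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the two-dimensional vectors are made up, and the names `toy_glove`, `toy_spacy`, `similar_pairs`, and `build_corpus` are hypothetical stand-ins for the real GloVe and SpaCy models.

```python
import math


def cosine_similarity(u, v):
    """sim(w1, w2) = (v_w1 . v_w2) / (|v_w1| |v_w2|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(vectors, threshold=0.9):
    """Return the unordered word pairs whose similarity exceeds the threshold."""
    words = sorted(vectors)
    pairs = set()
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if cosine_similarity(vectors[w1], vectors[w2]) > threshold:
                pairs.add((w1, w2))
    return pairs


def build_corpus(model_a, model_b, threshold=0.9):
    """Keep only the pairs that both models consider similar (intersection)."""
    return similar_pairs(model_a, threshold) & similar_pairs(model_b, threshold)


# Made-up vectors standing in for the two real embedding models (assumption).
toy_glove = {"good": [0.9, 0.1], "great": [0.88, 0.15], "car": [0.1, 0.95]}
toy_spacy = {"good": [0.8, 0.2], "great": [0.82, 0.25], "car": [0.9, 0.3]}

corpus = build_corpus(toy_glove, toy_spacy)
print(corpus)
```

In the toy data the second model rates "car" as close to both other words, but the first model does not, so only the pair that both models agree on survives the intersection.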

1.2 Translation Input Mutation

1.2.1 Word replacement

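For each word in the original sentence, we search the corpus for a matching entry. If a match is found, we replace the word with a context-similar word to produce a mutated input sentence. Compared with the original sentence, each mutant contains exactly one replaced word. To reduce the chance of generating mutants that cannot be parsed, we only replace nouns, adjectives, and numbers.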
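
A rough sketch of this single-word replacement step follows. It assumes whitespace tokenization and a hand-written part-of-speech lookup (`TOY_POS`), both of which are simplifications; the function name `generate_mutants` and the coarse tag names are hypothetical.

```python
def generate_mutants(sentence, corpus, pos_of, allowed_pos=("NOUN", "ADJ", "NUM")):
    """Return mutants of `sentence`, each differing from it in exactly one word."""
    tokens = sentence.split()

    # Build a symmetric lookup: word -> set of context-similar words.
    similar = {}
    for w1, w2 in corpus:
        similar.setdefault(w1, set()).add(w2)
        similar.setdefault(w2, set()).add(w1)

    mutants = []
    for i, token in enumerate(tokens):
        # Only nouns, adjectives, and numbers are candidates for replacement.
        if pos_of(token) not in allowed_pos:
            continue
        for replacement in sorted(similar.get(token, ())):
            mutated = tokens[:i] + [replacement] + tokens[i + 1:]
            mutants.append(" ".join(mutated))
    return mutants


# Hand-assigned tags standing in for a real POS tagger (assumption).
TOY_POS = {"men": "NOUN", "like": "VERB", "good": "ADJ", "food": "NOUN"}
toy_corpus = {("men", "women"), ("good", "great"), ("like", "love")}

mutants = generate_mutants(
    "men like good food", toy_corpus, lambda tok: TOY_POS.get(tok, "X")
)
print(mutants)
```

The verb "like" has a similar word in the toy corpus but is skipped because verbs are not in the allowed set.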

1.2.2 Structural filtering

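A generated mutant may fail to parse, because the replacement word may not fit the context of the new sentence. For example, "one" and "another" are context-similar words, but "a good one" parses while "a good another" does not. To handle such parsing failures, we apply an additional constraint to check the generated mutants. Specifically, we apply structural filtering based on the Stanford Parser. Suppose the original sentence is $s = w_1, w_2, ..., w_i, ..., w_n$ and the mutated sentence is $s' = w_1, w_2, ..., w_i', ..., w_n$ (that is, $w_i$ in $s$ is replaced with $w_i'$ in $s'$). For each sentence, the Stanford Parser outputs $l(w_i)$, the part-of-speech tag of each word as used in the Penn Treebank Project. If $l(w_i) \neq l(w_i')$, we remove $s'$ from the mutant candidates, because the mutation has changed the syntactic structure.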
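
The filter can be illustrated as below. The tagger is passed in as a function so that a real parser could be plugged in; here `toy_tagger` is a hypothetical lookup table with hand-assigned Penn Treebank style tags, not actual Stanford Parser output.

```python
def keeps_structure(original, mutant, tag_sentence):
    """True if every replaced word keeps the POS tag of the word it replaced."""
    orig_tokens, mut_tokens = original.split(), mutant.split()
    if len(orig_tokens) != len(mut_tokens):
        return False
    orig_tags = tag_sentence(orig_tokens)
    mut_tags = tag_sentence(mut_tokens)
    for i, (w, w_prime) in enumerate(zip(orig_tokens, mut_tokens)):
        # Compare l(w_i) with l(w_i') at the mutated position(s).
        if w != w_prime and orig_tags[i] != mut_tags[i]:
            return False
    return True


def structural_filter(original, mutants, tag_sentence):
    """Drop mutants whose replaced word changed the part-of-speech tag."""
    return [m for m in mutants if keeps_structure(original, m, tag_sentence)]


# Hand-assigned tags for illustration only (assumption).
TOY_TAGS = {"a": "DT", "good": "JJ", "fine": "JJ", "one": "NN", "another": "DT"}


def toy_tagger(tokens):
    return [TOY_TAGS.get(tok, "UNK") for tok in tokens]


kept = structural_filter("a good one", ["a good another", "a fine one"], toy_tagger)
print(kept)
```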

1.3 Automatic Test Oracle Generation

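To perform testing, we need to augment the generated test inputs with test oracles, that is, checks that decide whether an inconsistency bug has been found. To do so, we assume that the unchanged part of the sentence (everything except the replaced word) preserves its adequacy and fluency in translation. Adequacy means that the translation conveys the same meaning, with no information lost, added, or distorted; fluency means that the output reads naturally and is grammatically correct.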

Two problems (where $t(s)$ denotes the translation of sentence $s$):

Replacing $w$ with $w'$ may change the entire translation of the sentence.
It is not easy to accurately map the words $w$ and $w'$ to their counterpart(s) in the translated text.

Solution:

Calculate the similarity of subsequences of $t(s)$ and $t(s')$, and use the largest similarity to approximate the consistency level between $t(s)$ and $t(s')$.
Use Wdiff to split the translations into segments, then compare them.
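
One way to read the solution above is sketched here. This is an interpretation under stated assumptions rather than the paper's exact procedure: Python's `difflib.SequenceMatcher` stands in for Wdiff, each differing slice is deleted in turn to form a pair of subsequences, and a simple LCS-based ratio (the hypothetical `lcs_similarity`) is used as the similarity metric.

```python
from difflib import SequenceMatcher


def lcs_similarity(a, b):
    """Length of the longest common subsequence, normalized by the longer input."""
    if not a or not b:
        return 0.0  # nothing left to compare
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def consistency(t_s, t_s_prime):
    """Approximate consistency as the best similarity over diff-deleted subsequences."""
    a, b = t_s.split(), t_s_prime.split()
    opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    diffs = [(i1, i2, j1, j2) for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]
    if not diffs:
        return 1.0  # identical translations
    best = 0.0
    for i1, i2, j1, j2 in diffs:
        # Delete one differing slice from each translation, then compare the rest.
        sub_a = a[:i1] + a[i2:]
        sub_b = b[:j1] + b[j2:]
        best = max(best, lcs_similarity(sub_a, sub_b))
    return best


only_word_changed = consistency("the boys like good food", "the girls like good food")
more_changed = consistency("he likes good food very much", "he likes great food a lot")
print(only_word_changed, more_changed)
```

When the two translations differ only in the slice that corresponds to the replaced word, deleting that slice leaves identical subsequences and the score is at its maximum; additional differences elsewhere pull the score down.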

1.4 Automatic Inconsistency Repair

1.4.1 Overall Repair Process

For a translation $t(s)$ that has an inconsistency bug, generate a set of mutants of $s$ and obtain their translations $t(s_1), t(s_2), ..., t(s_n)$.
Rank these mutants.
Apply word alignment to obtain the mapped words between $s$ and $t(s)$.
Repair the translation.

Ranking methods:

Translation Ranking based on Probability

Grey-box: choose the mutant whose translation has the highest probability.

Translation Ranking based on Cross-reference

Black-box: choose the translation with the largest mean similarity score with respect to the other translations.
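
The two ranking strategies can be sketched as follows. The probabilities are made-up numbers, and `token_similarity` (built on `difflib.SequenceMatcher`) is only a placeholder for whatever similarity metric is actually used; all function names here are hypothetical.

```python
from difflib import SequenceMatcher


def token_similarity(a, b):
    """Placeholder similarity between two translations, computed on tokens."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def rank_by_probability(candidates):
    """Grey-box: candidates are (translation, probability); highest probability first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def rank_by_cross_reference(translations, similarity=token_similarity):
    """Black-box: score each translation by its mean similarity to all the others."""
    scored = []
    for i, t in enumerate(translations):
        others = [similarity(t, u) for j, u in enumerate(translations) if j != i]
        mean = sum(others) / len(others) if others else 0.0
        scored.append((t, mean))
    return sorted(scored, key=lambda c: c[1], reverse=True)


# Made-up probabilities, as a translation model might expose them (assumption).
by_prob = rank_by_probability([("he likes good food", 0.2), ("he likes great food", 0.7)])

by_xref = rank_by_cross_reference([
    "he likes good food",
    "he likes good meals",
    "he likes great food",
    "food is liked by him",
])
print(by_prob[0][0])
print(by_xref[0][0])
```

The probability ranking needs access to the model's output scores, which is why it is grey-box; the cross-reference ranking only needs the translated strings, so it also works with a translator that is available purely as a service.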
