MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Article link: https://blog.csdn.net/Hekena/article/details/128410611

MoverScore is a metric for evaluating the quality of text generation systems.
Common text generation tasks include summarization, machine translation, image captioning, and data-to-text generation.

Introduction

The introduction states the goal as follows:
"Our goal in this paper is to devise an automated evaluation metric assigning a single holistic score to any system-generated text by comparing it against human references for content matching."
As this goal makes clear, the score is produced by comparing the generated text against human-annotated references.

Unlike BERTScore, which matches at the level of individual tokens (one-to-one), MoverScore measures similarity at the level of n-grams (many-to-one).

The weights are IDF (inverse document frequency) values.
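As a rough illustration of IDF weighting, the sketch below computes one IDF value per token from a small tokenized corpus. The function name `idf_weights` and the plus-one smoothing `log((N + 1) / (df + 1))` are assumptions made for this example; the post does not say which IDF variant is used.

```python
import math
from collections import Counter

def idf_weights(documents):
    """Map each token to a smoothed IDF value.

    documents: list of token lists. The smoothing log((N + 1) / (df + 1))
    is an assumption for this sketch, not taken from the post.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in documents for tok in set(doc))
    return {tok: math.log((n_docs + 1) / (count + 1)) for tok, count in df.items()}

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
idf = idf_weights(docs)
# Tokens that occur in every document get weight 0; rarer tokens weigh more.
print(idf["the"], idf["cat"], idf["dog"])
```

Rare tokens such as "dog" receive larger weights than frequent ones such as "the", so they contribute more to the comparison.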
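A sequence $x = (x_1, x_2, \dots, x_m)$, once split into n-grams, is written $x^n$.
The pairwise comparison is captured by a distance matrix $C$ with entries $C_{ij} = d(x_i^n, y_j^n)$, the distance between the i-th n-gram of x and the j-th n-gram of y.
The distance metric is the Euclidean distance, where $E(\cdot)$ denotes the embedding of an n-gram:
$$d(x_i^n, y_j^n) = \lVert E(x_i^n) - E(y_j^n) \rVert_2$$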
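The Euclidean distance matrix described above can be sketched in a few lines. The names `cost_matrix`, `x_emb` and `y_emb` are hypothetical, and the two-dimensional vectors are toy values standing in for real n-gram embeddings.

```python
import math

def cost_matrix(x_emb, y_emb):
    """Entry [i][j] is the Euclidean distance between the i-th vector of x
    and the j-th vector of y."""
    return [[math.dist(u, v) for v in y_emb] for u in x_emb]

# Toy n-gram embeddings (made-up values for illustration only).
x_emb = [[0.0, 0.0], [3.0, 4.0]]
y_emb = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]

C = cost_matrix(x_emb, y_emb)
print(C)  # [[0.0, 3.0, 4.0], [5.0, 4.0, 3.0]]
```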
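The embedding of $x_i^n$, the i-th n-gram, is the IDF-weighted sum of the embeddings of the tokens it contains:
$$E(x_i^n) = \sum_{k=i}^{i+n-1} \mathrm{idf}(x_k) \cdot E(x_k)$$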
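A minimal sketch of this weighted sum follows, assuming the token embeddings and per-token IDF weights are already available as plain lists. The function name `ngram_embeddings` and the toy numbers are illustrative only.

```python
def ngram_embeddings(token_embs, token_idf, n):
    """Return one vector per contiguous n-gram: the IDF-weighted sum of the
    embeddings of the n tokens it spans."""
    dim = len(token_embs[0])
    result = []
    for i in range(len(token_embs) - n + 1):
        vec = [0.0] * dim
        for k in range(i, i + n):
            for d in range(dim):
                vec[d] += token_idf[k] * token_embs[k][d]
        result.append(vec)
    return result

# Three tokens with 2-d embeddings and made-up IDF weights.
embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [2.0, 1.0, 0.5]

bigrams = ngram_embeddings(embs, weights, n=2)
print(bigrams)  # [[2.0, 1.0], [0.5, 1.5]]
```

With n = 1 each vector is simply the token embedding scaled by its IDF weight.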

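When more than the last layer's output is used, the outputs of several layers have to be combined into one final representation. The paper proposes power means (p-means) for this: the layer outputs $z_1, \dots, z_L$ are aggregated element-wise with one or more power means, and the results are concatenated:
$$h_p(z_1, \dots, z_L) = \left( \frac{z_1^p + \cdots + z_L^p}{L} \right)^{1/p}$$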
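To make the power-means aggregation concrete, here is a small sketch that combines several layer vectors for one token. The default exponents p = 1, +inf and -inf (mean, max and min) are an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

def power_mean(values, p):
    """Power mean of a list of numbers; p = +inf gives max, p = -inf gives min.
    Finite non-integer p assumes non-negative inputs."""
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def aggregate_layers(layer_vectors, ps=(1, math.inf, -math.inf)):
    """Apply each power mean element-wise across layers, then concatenate."""
    columns = list(zip(*layer_vectors))  # one tuple per embedding dimension
    out = []
    for p in ps:
        out.extend(power_mean(col, p) for col in columns)
    return out

# Two layers, each giving a 2-d vector for the same token (toy values).
layers = [[1.0, 4.0], [3.0, 2.0]]
agg = aggregate_layers(layers)
print(agg)  # [2.0, 3.0, 3.0, 4.0, 1.0, 2.0]
```

The output has one block of dimensions per exponent, so three exponents triple the embedding size.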

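Word Mover's Distance (WMD) is defined as:
$$\mathrm{WMD}(x^n, y^n) = \min_{F \ge 0} \langle C, F \rangle \quad \text{s.t.} \quad F\mathbf{1} = f_{x^n}, \; F^\top \mathbf{1} = f_{y^n}$$
Here $C$ is the matrix of n-gram distances $d(x_i^n, y_j^n)$, $F$ is the transport flow matrix, and $f_{x^n}$ and $f_{y^n}$ are the n-gram weight vectors. $\langle C, F \rangle$ denotes the sum of all entries of the element-wise product of C and F.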
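The minimization behind WMD is a transportation problem: move the weight of the n-grams of x onto the n-grams of y at the lowest total cost. The sketch below solves tiny instances with a successive-shortest-path min-cost flow written in plain Python. It is only one way to solve the problem and is not claimed to be the solver used in the paper; the names `wmd`, `fx` and `fy` are hypothetical, and the weights are assumed to sum to the same total on both sides.

```python
import math

def wmd(cost, fx, fy, eps=1e-9):
    """Minimum of <C, F> with row sums fx and column sums fy (small inputs only)."""
    m, n = len(fx), len(fy)
    source, sink, size = m + n, m + n + 1, m + n + 2
    edges = []                        # each edge: [target, residual capacity, unit cost]
    adj = [[] for _ in range(size)]   # adjacency lists of edge indices

    def add_edge(u, v, cap, c):
        # Forward edge at an even index, its reverse at the following odd index.
        adj[u].append(len(edges)); edges.append([v, cap, c])
        adj[v].append(len(edges)); edges.append([u, 0.0, -c])

    for i in range(m):
        add_edge(source, i, fx[i], 0.0)
        for j in range(n):
            add_edge(i, m + j, math.inf, cost[i][j])
    for j in range(n):
        add_edge(m + j, sink, fy[j], 0.0)

    total = 0.0
    while True:
        # Bellman-Ford: cheapest path from source to sink in the residual graph.
        dist = [math.inf] * size
        prev = [-1] * size
        dist[source] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for e in adj[u]:
                    v, cap, c = edges[e]
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + c, e, True
            if not changed:
                break
        if dist[sink] == math.inf:
            return total              # no more weight can be moved
        # Bottleneck capacity along the path found.
        push, v = math.inf, sink
        while v != source:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        # Send the flow and update residual capacities.
        v = sink
        while v != source:
            e = prev[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            v = edges[e ^ 1][0]
        total += push * dist[sink]

C = [[0.0, 1.0], [1.0, 0.0]]
print(wmd(C, [0.5, 0.5], [0.5, 0.5]))    # identical distributions: nothing to move
print(wmd(C, [0.5, 0.5], [0.25, 0.75]))  # a quarter of the mass moves at cost 1
```

For realistic sentence lengths a dedicated optimal-transport or linear-programming library would be the more practical choice.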

Variations

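The paper proposes variants along four dimensions:
(i) the granularity of the embeddings, i.e. the size n of the n-grams;
(ii) the choice of pretrained embedding mechanism: static embeddings with word2vec versus contextualized embeddings with ELMo and BERT;
(iii) the fine-tuning task used for BERT: does fine-tuning on NLI (natural language inference) yield better representations?
(iv) the aggregation technique (power means or others).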