Similarity Series 9: unify USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Latest recommended article published 2024-08-09 09:30:34

YingJingh



Column: Similarity · Tags: classification

Article link: https://blog.csdn.net/Hekena/article/details/127873378


USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Key feature: multiple model variants

On Topical-Chat, a Transformer is trained to produce the response, r, conditioned on the dialog context, c, and the fact, f. The input to the Transformer is the concatenation of c and f.
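
The concatenated input can be pictured with a small sketch. The function name `build_model_input`, the `<sep>` separator token, and the ordering (context turns first, then the fact) are all hypothetical choices for illustration; the notes only say that c and f are concatenated.

```python
def build_model_input(context_turns, fact, sep=" <sep> "):
    """Concatenate dialog context c and fact f into a single input string.

    Hypothetical sketch: the separator token and the context-then-fact
    ordering are assumptions, not details taken from the paper.
    """
    context = sep.join(context_turns)  # flatten the multi-turn context c
    return context + sep + fact        # append the fact f after the context


if __name__ == "__main__":
    print(build_model_input(["Hi!", "Do you like jazz?"],
                            "Miles Davis recorded Kind of Blue."))
```

A real model would operate on token IDs rather than raw strings, but the structure of the input (context followed by the grounding fact) is the same idea.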

Different decoding strategies are used to obtain four different outputs from this model.

- Standard argmax sampling
- Nucleus sampling (Holtzman et al., 2019), used at three different rates: p = {0.3, 0.5, 0.7}
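
The four decoding strategies above (one argmax plus nucleus sampling at three values of p) can be illustrated for a single decoding step. This is a minimal sketch over a toy list of logits, assuming plain Python; it is not the paper's code, and a real decoder would repeat this choice token by token inside the Transformer's generation loop.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_decode(logits):
    """Argmax choice: always pick the single most probable token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def nucleus_sample(logits, p, rng=random):
    """Top-p (nucleus) sampling: keep the smallest set of most-probable
    tokens whose cumulative probability reaches p, then sample from that
    set in proportion to the tokens' probabilities."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    weights = [probs[i] for i in nucleus]  # random.choices normalizes these
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # toy next-token scores
    rng = random.Random(0)
    print("argmax:", argmax_decode(logits))
    for p in (0.3, 0.5, 0.7):
        print(f"nucleus p={p}:", nucleus_sample(logits, p, rng))
```

Smaller p shrinks the candidate set toward the argmax token, so lower rates give safer, more repetitive responses, while higher rates allow more diverse ones. This is why the four outputs differ in quality and are useful for evaluating a metric.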

Data construction

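Crowdsourcing was not used, and the paper explains why: (1) the annotation instructions are long, (2) a preliminary annotation pass followed by a group discussion was carried out, and (3) having many annotations from a small number of annotators makes it possible to examine annotator subjectivity.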

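**Annotation process:** Annotators were given a set of instructions (Appendix A). A small-scale preliminary annotation was carried out, in which each annotator annotated 5 dialog contexts (30 responses in total). Inter-annotator agreement was computed for every question. After the preliminary pass and a discussion session, the instructions were refined (for example, Maintains Context was changed to a 3-point scale instead of a 2-point scale). Once the instructions were revised, the full annotation effort was carried out.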
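
The notes do not say which agreement statistic was used. As a hypothetical illustration only, one common choice is a leave-one-out Spearman correlation: correlate each annotator's scores with the mean of the other annotators' scores, then average. The function names below are invented for this sketch, and it should not be read as the paper's exact procedure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def leave_one_out_agreement(scores):
    """scores[a][i] is annotator a's score for response i.

    Returns the mean Spearman correlation between each annotator and the
    average of the remaining annotators.
    """
    n_annot = len(scores)
    correlations = []
    for a in range(n_annot):
        others = [scores[b] for b in range(n_annot) if b != a]
        mean_others = [sum(col) / len(others) for col in zip(*others)]
        correlations.append(spearman(scores[a], mean_others))
    return sum(correlations) / n_annot
```

Computing this separately for each quality (Understandable, Natural, and so on) gives one agreement number per question, which matches the per-question check described above.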

The annotated qualities are:

- Understandable (0–1): Is the response understandable given the previous context?
- Natural (1–3): Does the response seem to be something that a person would naturally say?
- Maintains Context (1–3): Does the response serve as a valid continuation of the preceding conversation?
- Interesting (1–3): Is the response dull or interesting?
- Uses Knowledge (0–1): Given the fact that the response is conditioned on, how well does the response use that fact?
- Overall Quality (1–5): Given your answers above, what is your overall impression of the quality of this utterance?
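
To make the scales concrete, here is a minimal sketch of how one annotation record could be checked against the ranges listed above. The field names and the assumption that every score is an integer are hypothetical choices for illustration, not the paper's data format.

```python
# Inclusive score range for each annotated quality, as listed above.
# The snake_case field names are hypothetical.
SCALES = {
    "understandable": (0, 1),
    "natural": (1, 3),
    "maintains_context": (1, 3),
    "interesting": (1, 3),
    "uses_knowledge": (0, 1),
    "overall_quality": (1, 5),
}


def validate_annotation(annotation):
    """Return a list of problems in one annotation record (empty if valid).

    Assumes scores are plain integers on the discrete scales above.
    """
    problems = []
    for quality, (low, high) in SCALES.items():
        if quality not in annotation:
            problems.append(f"missing {quality}")
            continue
        score = annotation[quality]
        if isinstance(score, bool) or not isinstance(score, int) or not low <= score <= high:
            problems.append(f"{quality}={score!r} outside {low}-{high}")
    for extra in sorted(annotation.keys() - SCALES.keys()):
        problems.append(f"unknown field {extra}")
    return problems
```

Note that the two binary qualities (Understandable, Uses Knowledge) start at 0, while the 3-point and 5-point scales start at 1. An off-by-one check like this catches the most common data-entry slip when the scales are mixed.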

For the PersonaChat data, three models were used to generate system outputs: a sequence-to-sequence model (Seq2Seq), an LSTM language model (LM), and a Key-Value Profile Memory Network (KV-MemNN).