Paper Reading: Best of Both Worlds: Transferring Knowledge from D to G

First, pretrain D and G. Then fix D and let G repeatedly sample responses, updating G with the supervision signal from D. Gumbel-Softmax is used here to get around the non-differentiable sampling step.
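As a minimal sketch of that last point (PyTorch; the temperature and tensor shapes are assumed for illustration, not taken from the paper):

```python
import torch
import torch.nn.functional as F

# Unnormalized next-token scores from G for a batch of 4 partial responses
# over a 1000-word vocabulary (shapes are illustrative only).
logits = torch.randn(4, 1000)

# Straight-through Gumbel-Softmax: the forward pass yields a one-hot sample,
# while the backward pass uses the soft relaxation, so D's score on the
# sampled response can be back-propagated into G despite the discrete choice.
tau = 0.5  # temperature (assumed value); lower values approach hard sampling
sampled_tokens = F.gumbel_softmax(logits, tau=tau, hard=True)  # (4, 1000), one-hot rows
```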

The authors start from the generic/safe-response problem of MLE (or, equivalently, cross-entropy) training: an MLE-trained generative model tends to "game" MLE by averaging over the training corpus, and therefore gravitates toward frequent responses.

One reason for this emergent behavior is that the space of possible next utterances in a dialog is highly multi-modal (there are many possible paths a dialog may take in the future). In the face of such highly multi-modal output distributions, models 'game' MLE by latching on to the head of the distribution or the frequent responses, which by nature tend to be generic and widely applicable.

One class of solutions to this MLE problem is sequence-level training, specifically using reinforcement learning to optimize task-specific sequence metrics. Unfortunately, dialogue has no automatic metric that is highly correlated with human judgement (in other words, the authors argue there is no good objective function for dialogue).

My impression: because the source constrains the target so weakly, the target sequence lives in a very large space (semantic or otherwise), yet the MLE constraint is overly tight, and the fact that most generation datasets pair each source with only a single target makes this worse. This is where a learnable supervisor, denoted D, becomes important. The key question is how to learn a genuinely good D that can cope with the huge output space and assign a high score to a response that is not the ground truth but is still appropriate, and whether that can be learned on the same dataset.

This is also the authors' idea: such responses should lie close to one another in the hidden space that D maps them into.

D learns a task-dependent perceptual similarity and learns to recognize multiple correct responses in the feature space.  The interaction between responses is captured via the similarity between the learned embeddings.

Note that this is the Visual Dialog setting, which is more constrained than open-domain dialogue.

The core part, the discriminator loss:

In particular, it needs to encourage perceptually meaningful similarities. 

The N-pair loss objective encourages learning a space in which the ground truth answer is scored higher than other options, and at the same time, encourages options similar to ground truth answers to score better than dissimilar ones.

Unlike the multi-class logistic loss, options that are correct but different from the ground-truth option are not overly penalized, and thus can be useful in providing a reliable signal to the generator. The L2 norm of the embedding vectors is regularized to be small.
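A small sketch of what such an N-pair objective could look like for D (PyTorch; the tensor names, shapes, and regularization weight are my assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def n_pair_loss(context_emb, option_embs, gt_index, l2_weight=1e-3):
    """N-pair-style discriminator loss (sketch).

    context_emb: (B, D)    embedding of image + dialog history + question
    option_embs: (B, K, D) embeddings of the K candidate answers
    gt_index:    (B,)      index of the ground-truth answer among the K options
    """
    # Similarity between the context and every candidate answer.
    scores = torch.bmm(option_embs, context_emb.unsqueeze(2)).squeeze(2)  # (B, K)

    # Softmax cross-entropy over the options: the ground truth should score
    # highest, but options that lie close to it in embedding space keep
    # relatively high scores instead of being pushed down by a hard margin.
    loss = F.cross_entropy(scores, gt_index)

    # Keep embedding norms small, as the post notes.
    reg = context_emb.pow(2).sum(dim=1).mean() + option_embs.pow(2).sum(dim=2).mean()
    return loss + l2_weight * reg
```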

Generator loss
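The post's own derivation of this part did not survive, so below is only a rough sketch of how the pieces described above could be wired together: the frozen D embeds the Gumbel-Softmax-sampled response, and G is pushed to score well under D's learned similarity. All helper names (`G.next_token_logits`, `D.embed_context`, `D.embed_response`) are hypothetical, and this is not the paper's exact objective.

```python
import torch.nn.functional as F

def generator_step(G, D, batch, optimizer, tau=0.5):
    """One generator update with a frozen discriminator (rough sketch)."""
    for p in D.parameters():                      # D stays fixed in this phase
        p.requires_grad_(False)

    logits = G.next_token_logits(batch)                    # (B, T, V), hypothetical helper
    tokens = F.gumbel_softmax(logits, tau=tau, hard=True)  # differentiable token samples

    ctx = D.embed_context(batch)      # (B, D) context embedding, hypothetical helper
    resp = D.embed_response(tokens)   # (B, D) embedding of the sampled response

    # Encourage the sampled response to lie close to the context in D's
    # perceptual-similarity space (a stand-in for the paper's actual loss).
    loss = -(ctx * resp).sum(dim=1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```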

 
