Problem: addressing the data problem
Dialogue data requirements are large.
The Wizard-of-Oz approach (I have not looked into it in detail) yields dialogues with limited coverage of possible dialogue progressions.
Background:
1. Conditions that must be covered:
dialogue flow: the information carried by each utterance
surface form: interchangeable ways of expressing the same meaning (the two vary independently of each other)
database states: the possible outcomes of database queries
noise conditions: the accuracy of speech-to-text conversion
2. Sentences produced by a user simulator may differ from those of real users.
3. M2M: uses self-play and crowd-sourcing to obtain a good dialogue dataset (which they then train on with SL).
Contributions:
Adapts to new domains.
Builds on M2M and adds RL:
(1) exhaustively generate dialogue templates for a given task using dialogue self-play between a simulated user and a task-independent programmed system agent, (2) obtain natural language rewrites of these templates using crowd sourcing, (3) train an end-to-end conversational agent on this fully annotated dataset, achieving a reasonable task completion rate, and (4) deploy this agent to interact with users and collect user feedback, which serves as a reward value to continuously improve the agent’s policy with on-line reinforcement learning updates.
1) Use self-play (a technique from game-playing) between the simulator and a programmed agent to generate template dialogues
(template utterances take the form "slot is value")
2) Use crowd-sourcing to produce fluent natural language that reads more like everyday speech
3) Deploy the agent directly to real users and use their feedback for on-line RL training
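The self-play step in 1) can be illustrated with a toy sketch (my own illustration, not the paper's implementation): a rule-based user simulator and a rule-based system agent exchange dialogue acts over a task schema, and the exchanged acts form an annotated dialogue template. The goal slots and act names here are invented for the example.

```python
def user_simulator(goal, state):
    """Inform the next unfilled slot's value, or affirm when done."""
    for slot, value in goal.items():
        if slot not in state:
            return ("inform", slot, value)  # e.g. inform(date=tomorrow)
    return ("affirm", None, None)

def system_agent(goal_slots, state):
    """Request the next missing slot, or offer to complete the task."""
    for slot in goal_slots:
        if slot not in state:
            return ("request", slot)
    return ("offer", None)

def self_play(goal):
    """Run one dialogue; return the template as a list of annotated turns."""
    state, template = {}, []
    while True:
        sys_act = system_agent(list(goal), state)
        template.append(("system",) + sys_act)
        if sys_act[0] == "offer":
            template.append(("user", "affirm", None, None))
            return template
        usr_act = user_simulator(goal, state)
        template.append(("user",) + usr_act)
        if usr_act[0] == "inform":
            state[usr_act[1]] = usr_act[2]

goal = {"movie": "Inferno", "date": "tomorrow", "num_tickets": "2"}
for turn in self_play(goal):
    print(turn)
```

Because both sides are deterministic programs operating on dialogue acts, every turn comes out fully annotated for free; natural-language variety is added later by the crowd workers.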
A problem when training with RL: data in a new domain is scarce. The remedies: take the annotations emitted by the simulator and crowd-source diverse natural-language realizations of them; then, during real interactions, collect rating feedback from users.
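The "feedback as reward" idea can be sketched with a minimal REINFORCE-style update (a toy softmax policy over candidate actions, not the paper's end-to-end model): the user's rating is plugged in as the reward that scales the policy-gradient step.

```python
import math
import random

class SoftmaxPolicy:
    def __init__(self, n_actions, lr=0.1):
        self.theta = [0.0] * n_actions
        self.lr = lr

    def probs(self):
        exps = [math.exp(t) for t in self.theta]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.probs()):
            acc += p
            if r <= acc:
                return a
        return len(self.theta) - 1

    def update(self, action, reward):
        """REINFORCE: theta += lr * reward * grad log pi(action)."""
        p = self.probs()
        for a in range(len(self.theta)):
            grad = (1.0 if a == action else 0.0) - p[a]
            self.theta[a] += self.lr * reward * grad

random.seed(0)
policy = SoftmaxPolicy(n_actions=3)
for _ in range(500):
    a = policy.sample()
    feedback = 1.0 if a == 2 else -0.2  # pretend users rate action 2 highly
    policy.update(a, feedback)
print(policy.probs())  # probability mass shifts toward action 2
```

The on-line aspect is that each real interaction contributes one (action, feedback) pair and the policy is updated immediately, so the agent keeps improving after deployment.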
The simulator uses a task-independent architecture.
For the rewrite step, my guess is that a template is applied first, and the crowd workers only need to fill in the corresponding values at the right positions.
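My reading above can be made concrete with a small sketch (an assumption about the mechanics, not confirmed by the paper): template utterances carry slot placeholders, and a crowd-sourced rewrite counts as valid only if every slot value survives verbatim.

```python
def render(template, slots):
    """Fill $slot placeholders with concrete values."""
    out = template
    for slot, value in slots.items():
        out = out.replace("$" + slot, value)
    return out

def rewrite_is_valid(rewrite, slots):
    """A crowd-sourced rewrite must preserve all slot values."""
    return all(value in rewrite for value in slots.values())

slots = {"time": "6pm", "num_tickets": "2"}
template = "book $num_tickets tickets for the $time show"
print(render(template, slots))        # book 2 tickets for the 6pm show
print(rewrite_is_valid("Yes, the 6pm show, 2 please", slots))  # True
```

Pinning the slot values like this is what keeps the annotations valid after rewriting: workers can freely change the phrasing around the values without touching the labels.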
In contrast, M2M significantly reduces the crowd-sourcing expense by automatically annotating a majority of the dialogue turns and annotating the remaining turns with two simpler crowd-sourcing tasks: “Does this utterance contain this particular slot value?” and “Do these two utterances have the same meaning?”, which are easier for the average crowd worker. (This is M2M's annotation scheme.)
In this paper, by contrast, the utterances are rewritten:
The paraphrase task is designed as a “contextual rewrite” task where a crowd worker sees the full dialogue template. This encourages the crowd worker to inject linguistic phenomena like coreference (“Reserve that restaurant”) and lexical entrainment (“Yes, the 6pm show”) into the utterances.
Experimental results:
No comparison with other systems, because the datasets differ; there are only RL-vs-SL comparisons and simulator-vs-real-user comparisons.