Each training example has three parts: ctx, input, and out. ctx is the goal, input is the dialogue, and out is the negotiation result. For example:
<input> 3 2 3 1 1 1 </input> <dialogue> THEM: im a reader , so id like the books . . . . you may have the hats and ball <eos> YOU: let me have two books and the hats <eos> THEM: its a trilogy so i really need to hold on to all the books <eos> YOU: cant do it <eos> THEM: ok , well best i can do is 2 books and the ball then . . . anything less and i cant make a deal <eos> YOU: so the hats and a book for me ? <eos> THEM: yes <eos> YOU: <selection> </dialogue> <output> item0=1 item1=3 item2=0 item0=2 item1=0 item2=1 </output> <partner_input> 3 3 3 0 1 1 </partner_input>
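In the example above, ctx = [3, 2, 3, 1, 1, 1] (the `<input>` field, which reads as a (count, value) pair for each of the three item types).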
input dialog=THEM: im a reader , so id like the books … . you may have the hats and ball YOU: let me have two books and the hats THEM: its a trilogy so i really need to hold on to all the books YOU: cant do it THEM: ok , well best i can do is 2 books and the ball then … anything less and i cant make a deal YOU: so the hats and a book for me ? THEM: yes YOU:
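out = [1, 3, 0, 2, 0, 1] (the `<output>` field: item0=1 item1=3 item2=0 for YOU, then item0=2 item1=0 item2=1 for THEM). The sequence [3, 3, 3, 0, 1, 1] is the `<partner_input>` field, i.e. the partner's ctx, not the negotiation result.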
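The field layout described above can be illustrated with a small parser. This is only a sketch, not the original preprocessing code: the function name `parse_example` is made up for illustration, and it assumes every `<output>` token has the form `itemK=N` (any other output token would need extra handling). It reads `<output>` as per-item counts and returns `<partner_input>` as a separate field.

```python
import re


def parse_example(line):
    """Split one tagged training line into (ctx, dialogue, out, partner_ctx)."""

    def field(tag):
        # Grab the whitespace-separated tokens between <tag> and </tag>.
        m = re.search(r"<{0}>(.*?)</{0}>".format(tag), line)
        if m is None:
            raise ValueError("missing <%s> field" % tag)
        return m.group(1).split()

    ctx = [int(tok) for tok in field("input")]
    dialogue = field("dialogue")  # keeps <eos> and <selection> as tokens
    # Assumption: each output token looks like "item0=1"; keep only the count.
    out = [int(tok.split("=")[1]) for tok in field("output")]
    partner_ctx = [int(tok) for tok in field("partner_input")]
    return ctx, dialogue, out, partner_ctx


example = (
    "<input> 3 2 3 1 1 1 </input> "
    "<dialogue> THEM: yes <eos> YOU: <selection> </dialogue> "
    "<output> item0=1 item1=3 item2=0 item0=2 item1=0 item2=1 </output> "
    "<partner_input> 3 3 3 0 1 1 </partner_input>"
)
ctx, dialogue, out, partner_ctx = parse_example(example)
```

The dialogue in `example` is shortened to keep the snippet compact; the tags and numeric fields are the ones from the sample line.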
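The model first encodes the input goal ctx with a GRU, $\mathrm{GRU}_g$; its final hidden state is $h_g$.
A second GRU, $\mathrm{GRU}_w$, then encodes the input dialog together with $h_g$ and produces the hidden state $h_t$. At each step it takes the embedding $E x_{t-1}$ of the previous token concatenated with $h_g$: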
$h_t = \mathrm{GRU}_w(h_{t-1}, [E x_{t-1}, h_g])$
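To make the recurrence concrete, here is a minimal pure-Python sketch of one way to run it. Everything in it is an assumption made for illustration: the toy dimensions, the random weights, the helper names (`gru_step`, `random_matrix`), the omission of biases, and the random vector standing in for the $h_g$ that $\mathrm{GRU}_g$ would produce. The update rule is the standard GRU in the convention h_t = (1 - z) * h_{t-1} + z * candidate; some libraries swap the roles of z and 1 - z.

```python
import math
import random


def matvec(W, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def gru_step(h_prev, x, p):
    """One GRU update (biases omitted for brevity)."""
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev)))  # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev)))  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in add(matvec(p["Wh"], x), matvec(p["Uh"], gated))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
vocab_size, emb_dim, goal_dim, hid_dim = 6, 4, 3, 4  # toy sizes (assumed)
emb = random_matrix(vocab_size, emb_dim, rng)        # row k plays the role of E x for token k
h_g = [rng.uniform(-1, 1) for _ in range(goal_dim)]  # stand-in for GRU_g's final state

in_dim = emb_dim + goal_dim
params = {}
for name in ("Wz", "Wr", "Wh"):
    params[name] = random_matrix(hid_dim, in_dim, rng)   # input-to-hidden weights
for name in ("Uz", "Ur", "Uh"):
    params[name] = random_matrix(hid_dim, hid_dim, rng)  # hidden-to-hidden weights

tokens = [2, 5, 1]      # hypothetical token ids x_0, x_1, x_2
h = [0.0] * hid_dim     # h_0
for tok in tokens:
    x = emb[tok] + h_g  # list concatenation: [E x_{t-1}, h_g]
    h = gru_step(h, x, params)
```

The point of the sketch is the input construction: the goal encoding $h_g$ is appended to the token embedding at every step, so each update of $h_t$ is conditioned on the goal.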
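$h_t$ is then passed through a linear layer that reuses the embedding matrix $E$, and the result is normalized over the vocabulary (a softmax) to give the probability of the next token:
$p_\theta(x_t \mid x_{0..t-1}, g) \propto \exp(E^T h_t)$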
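As a worked illustration of this output layer, the sketch below computes the logits $E^T h_t$ and normalizes them with a softmax. The function name `next_token_probs` and the toy values of `E` and `h_t` are assumptions chosen for illustration; `E` is stored as d rows of length V, so the hidden size must equal the embedding size d.

```python
import math


def next_token_probs(E, h_t):
    """Return softmax(E^T h_t) for E stored as d rows of length V."""
    vocab_size = len(E[0])
    # logit_v = sum_i E[i][v] * h_t[i], i.e. the v-th entry of E^T h_t.
    logits = [sum(E[i][v] * h_t[i] for i in range(len(h_t)))
              for v in range(vocab_size)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


E = [[0.2, -0.1, 0.5],
     [0.0, 0.3, -0.4]]  # toy embedding matrix: d = 2, V = 3
h_t = [1.0, -1.0]       # toy hidden state
probs = next_token_probs(E, h_t)
```

Dividing by the sum of exponentials is what turns the proportionality in the formula into a proper distribution over the vocabulary.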
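The input dialog and input goal are fed into another bidirectional GRU, $\mathrm{GRU}_o$, to produce the output out:
Finally, the loss function is computed: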