At bottom this is still a linear combination; related reading: learning POMDPs.
How Time Matters: Learning Time-Decay Attention for Contextual Spoken Language Understanding in Dialogues
Time Masking: Leveraging Temporal Information in Spoken Dialogue Systems
Decay-Function-Free Time-Aware Attention to Context and Speaker Indicator for Spoken Language Understanding
1. Model the system and the user separately (by role):
$$\begin{aligned} \mathbf{v}_{\mathrm{cur}} &= \operatorname{BLSTM}\left(\mathbf{x},\, W_{\mathrm{his}} \cdot \mathbf{v}_{\mathrm{his}}\right) \\ \mathbf{o} &= \operatorname{sigmoid}\left(W_{\mathrm{SLU}} \cdot \mathbf{v}_{\mathrm{cur}}\right) \end{aligned}$$
$$\begin{aligned} \mathbf{v}_{\mathrm{his}} &= \sum_{\mathrm{role}} \mathbf{v}_{\mathrm{his,role}} \\ &= \sum_{\mathrm{role}} \operatorname{BLSTM}_{\mathrm{role}}\left(x_{t,\mathrm{role}}\right) \end{aligned}$$
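A minimal numpy sketch of this role-separated history encoding and the sigmoid SLU output. All names, dimensions, and data here are hypothetical, and a mean-pool + linear map stands in for the role-specific BLSTM encoders; it only shows how the per-role history summaries are summed into v_his and fed into the current-utterance prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size

def encode(utterances, W):
    """Stand-in for BLSTM_role: mean-pool word vectors across the role's
    utterances, then apply a linear map. A real model runs a BLSTM."""
    return np.tanh(W @ np.mean(np.vstack(utterances), axis=0))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dialogue history: word-vector matrices per role.
history = {
    "user":   [rng.normal(size=(3, D)), rng.normal(size=(4, D))],
    "system": [rng.normal(size=(2, D))],
}
W_role = {r: rng.normal(size=(D, D)) / np.sqrt(D) for r in history}

# v_his = sum over roles of BLSTM_role(x_{t,role})
v_his = sum(encode(utts, W_role[r]) for r, utts in history.items())

# Current-utterance encoding conditioned on the history summary
# (concatenation + linear map stands in for BLSTM(x, W_his · v_his)).
x = rng.normal(size=(5, D))                   # current utterance word vectors
W_his = rng.normal(size=(D, D)) / np.sqrt(D)
W_cur = rng.normal(size=(D, 2 * D)) / np.sqrt(2 * D)
v_cur = np.tanh(W_cur @ np.concatenate([np.mean(x, axis=0), W_his @ v_his]))

# o = sigmoid(W_SLU · v_cur): per-label probabilities for multi-label SLU.
W_slu = rng.normal(size=(4, D)) / np.sqrt(D)  # 4 hypothetical semantic labels
o = sigmoid(W_slu @ v_cur)
print(o.shape)  # (4,)
```

The sigmoid (rather than softmax) output matches the multi-label nature of SLU: an utterance can trigger several intent/slot labels at once.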
$$\mathbf{v}_{\mathrm{his}}^{U} = \sum_{\mathrm{role}} \operatorname{BLSTM}_{\mathrm{role}}\left(x_{t,\mathrm{role}},\, \left\{\alpha_{u_{j}} \mid u_{j} \in \mathrm{role}\right\}\right)$$
$$\mathbf{v}_{\mathrm{his}}^{R} = \sum_{\mathrm{role}} \alpha_{\mathrm{role}} \cdot \mathbf{v}_{\mathrm{his,role}}, \qquad \alpha_{\mathrm{role}} = \max_{j}\left(\alpha_{u_{j}}\right)$$
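The utterance-level weights α_{u_j} and the role-level weight α_role = max(α_{u_j}) can be sketched in numpy as follows. The dot-product scoring against the current utterance is an assumption for illustration (the papers above also derive these weights from time-decay functions); all tensors are toy data.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # toy hidden size

# Hypothetical per-utterance history encodings for each role.
utt_enc = {
    "user":   rng.normal(size=(2, D)),  # 2 past user utterances
    "system": rng.normal(size=(3, D)),  # 3 past system utterances
}
v_cur = rng.normal(size=D)              # current-utterance summary vector

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Utterance-level attention alpha_{u_j}, normalized within each role
# (dot-product scoring against v_cur is an assumed scoring function).
alpha_u = {r: softmax(encs @ v_cur) for r, encs in utt_enc.items()}

# v_his^U: attention-weighted sum inside each role, then summed over roles.
v_his_U = sum(alpha_u[r] @ utt_enc[r] for r in utt_enc)

# Role-level attention: alpha_role = max_j alpha_{u_j}, reweighting each
# role's (unweighted) history summary v_{his,role}.
v_his_role = {r: utt_enc[r].mean(axis=0) for r in utt_enc}
alpha_role = {r: alpha_u[r].max() for r in utt_enc}
v_his_R = sum(alpha_role[r] * v_his_role[r] for r in utt_enc)
print(v_his_U.shape, v_his_R.shape)  # (8,) (8,)
```

Taking the max of a role's utterance weights is a cheap way to let the most relevant single utterance decide how much that whole role's history matters, without adding new parameters.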