Study notes on domain-slot-value papers

Latest recommended article published on 2024-08-23 08:50:32

zixufang

Article link: https://blog.csdn.net/yagreenhand/article/details/103152320

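These notes focus on the MultiWOZ 2.0 dataset.
These five models still need a careful comparison: read the experiment sections and the ablation studies.
Popular approach: model the context as a sequence, feed in a single domain-slot pair alongside it, and then either generate or retrieve the value.
Generation-based: TRADE, COMER, SOM-DST
Retrieval-based: SUMBT
Generation and retrieval combined: DS-DST
Results: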

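MultiWOZ 2.1 corrects errors in the state annotations. I do not like the NADST and ML-BST models: one relies on deep modeling, and the other uses fertility to fix the length of each value. Still, both improve markedly over TRADE and COMER.
The first three models, which use a decoder, do not use a candidate list; they rely on the context and the vocab together.
Most models treat domain-slot as a combined unit and add a domain classification step; some generate the domain separately.
Use of the previous belief state $B_{t-1}$: COMER encodes it in the same way as the user utterance $u_t$ and lets it take part in attention. SOM-DST extracts it explicitly but does not use it during generation, because the update decision is made independently. The retrieval-based models do not use it.
An explicit decision on whether to update is often not used at all; only SOM-DST and DSTReader use one.
To do: look at how other papers categorize their experimental results.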

TRADE:

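The slot gate decides whether the value generator is needed for a slot (in TRADE it has three classes: ptr, none and dontcare, hence $G_j \in \mathbb{R}^3$). $c_{j0}$ is the context vector obtained by combining the dialogue context with the domain-slot pair.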

$G_{j}=\operatorname{Softmax}\left(W_{g} \cdot\left(c_{j 0}\right)^{\top}\right) \in \mathbb{R}^{3}$
$\begin{aligned} P_{j k}^{\text {final }} &=p_{j k}^{\text {gen }} \times P_{j k}^{\text {vocab }} \\ &+\left(1-p_{j k}^{\text {gen }}\right) \times P_{j k}^{\text {history }} \in \mathbb{R}^{|V|} \end{aligned}$
$\begin{array}{c}{P_{j k}^{\text {vocab }}=\operatorname{Softmax}\left(E \cdot\left(h_{j k}^{\mathrm{dec}}\right)^{\top}\right) \in \mathbb{R}^{|V|}} \\ {P_{j k}^{\text {history }}=\operatorname{Softmax}\left(H_{t} \cdot\left(h_{j k}^{\mathrm{dec}}\right)^{\top}\right) \in \mathbb{R}^{\left|X_{t}\right|}}\end{array}$

$\begin{aligned} p_{j k}^{\mathrm{gen}} &= \operatorname{Sigmoid}\left(W_{1} \cdot\left[h_{j k}^{\mathrm{dec}} ; w_{j k} ; c_{j k}\right]\right) \in \mathbb{R}^{1} \\ c_{j k} &= P_{j k}^{\mathrm{history}} \cdot H_{t} \in \mathbb{R}^{d_{hdd}} \end{aligned}$
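Put abstractly, the decoder first produces a hidden state, which is then applied to the context and to the vocab separately. The context branch implements the copy mechanism, while the vocab branch is the ordinary decoding mode. The weight that balances the two comes from a simple linear model followed by a sigmoid, whose inputs are the decoder cell's input (the previously generated word $w_{jk}$), the decoder hidden state $h_{jk}^{\mathrm{dec}}$, and the context vector $c_{jk}$, which itself depends on that hidden state.
~~They are applied to the context and the vocab separately in order to decide whether the value already appears in the utterance or has to be fetched from the database.~~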
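
The mixing of the two distributions can be sketched in a few lines of plain Python. This is a minimal illustration, not the TRADE implementation: the helper name `mix_distributions`, the toy vocabulary size and the token ids are all assumptions. The attention over the $|X_t|$ history tokens is scattered onto vocabulary ids so that both terms live in $\mathbb{R}^{|V|}$.

```python
def mix_distributions(p_gen, p_vocab, p_history, history_ids):
    """Combine the vocabulary and copy distributions (hypothetical helper).

    p_gen       -- scalar in [0, 1], weight of the vocabulary branch
    p_vocab     -- list of length |V|, softmax over the vocabulary
    p_history   -- list of length |X_t|, attention over history tokens
    history_ids -- vocabulary id of each history token
    """
    # Scatter-add the attention over history positions onto vocabulary ids.
    copy_dist = [0.0] * len(p_vocab)
    for token_id, weight in zip(history_ids, p_history):
        copy_dist[token_id] += weight
    # Weighted sum of the generate branch and the copy branch.
    return [p_gen * pv + (1.0 - p_gen) * pc
            for pv, pc in zip(p_vocab, copy_dist)]


# Toy example: vocabulary of 4 words, dialogue history of 3 tokens.
p_vocab = [0.1, 0.2, 0.3, 0.4]
p_history = [0.5, 0.25, 0.25]
history_ids = [2, 0, 2]  # word 2 appears twice in the history
p_final = mix_distributions(0.5, p_vocab, p_history, history_ids)
```

Because word 2 is attended to twice in the history, the copy branch pushes it above word 3, which the vocab branch alone would have preferred.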

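COMER
"Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation"
Step-by-step generation: the goal is building DST modules capable of generating structured sequences.
This paper relies on the context only and uses an encoder-decoder structure. Generation works as follows:
It also adds things to the decoder, namely residual connections and memory.
The three decoders that generate domain, slot and value share parameters, so the model treats DST as a conditional sequence generation problem.
Inputs and outputs of the decoder: $\mathbf{s}, \mathbf{h}_{s}=\operatorname{CMRD}\left(\mathbf{e}_{x}, \mathbf{c}, H_{B}, H_{a}, H_{u}\right)$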

1) The LSTM first outputs $h_0$
$\mathbf{h}_{0}=\operatorname{LSTM}\left(\mathbf{e}_{x_{t}}, \mathbf{q}_{t-1}\right)$
2) The successive $h_i$ before and after each Attn step are finally concatenated into a single $r$
$\begin{array}{l}{\mathbf{h}_{1}=\mathbf{h}_{0}+\mathbf{c}} \\ {\mathbf{h}_{2}=\mathbf{h}_{1}+\operatorname{Attn}_{\mathrm{belief}}\left(\mathbf{h}_{1}, H_{e}\right)} \\ {\mathbf{h}_{3}=\mathbf{h}_{2}+\operatorname{Attn}_{\mathrm{sys}}\left(\mathbf{h}_{2}, H_{a}\right)} \\ {\mathbf{h}_{4}=\mathbf{h}_{3}+\operatorname{Attn}_{\mathrm{usr}}\left(\mathbf{h}_{3}, H_{u}\right)} \\ {\mathbf{r}=\mathbf{h}_{1} \oplus \mathbf{h}_{2} \oplus \mathbf{h}_{3} \oplus \mathbf{h}_{4} \in R^{4 d_{m}}}\end{array}$
3) The Attn module
$\begin{aligned} \mathbf{a} &=W_{1}^{T} \mathbf{h}+\mathbf{b}_{1} & & \in R^{d_{m}} \\ \mathbf{c} &=\operatorname{softmax}\left(H^{T} \mathbf{a}\right) & & \in R^{l} \\ \mathbf{h} &=H \mathbf{c} & & \in R^{d_{m}} \\ \mathbf{h}_{a} &=W_{2}^{T} \mathbf{h}+\mathbf{b}_{2} & & \in R^{d_{m}} \end{aligned}$
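
The Attn module and the residual chain that produces $r$ can be sketched in plain Python as follows. This is only a sketch under assumptions: the function names (`attn`, `residual_chain`), the toy sizes ($d_m = 2$), and the identity weights with zero biases are illustrative and not taken from the paper. Each memory `H` is stored as a $d_m \times l$ list of rows.

```python
import math


def matvec_t(W, x):
    """Return W^T x, with W stored as a list of len(x) rows."""
    return [sum(W[i][j] * x[i] for i in range(len(x)))
            for j in range(len(W[0]))]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def attn(h, H, W1, b1, W2, b2):
    """One Attn block: query h of size d_m, memory H of shape d_m x l."""
    a = [u + v for u, v in zip(matvec_t(W1, h), b1)]          # a = W1^T h + b1
    c = softmax(matvec_t(H, a))                               # c = softmax(H^T a)
    ctx = [sum(row[j] * c[j] for j in range(len(c))) for row in H]  # h = H c
    return [u + v for u, v in zip(matvec_t(W2, ctx), b2)]     # h_a = W2^T h + b2


def residual_chain(h0, c, memories, params):
    """Build r = h1 (+) h2 (+) h3 (+) h4 from h0, c and three memories."""
    h = [u + v for u, v in zip(h0, c)]                        # h1 = h0 + c
    states = [h]
    for H, p in zip(memories, params):
        h = [u + v for u, v in zip(h, attn(h, H, *p))]        # residual add
        states.append(h)
    return [v for s in states for v in s]                     # concatenation


# Toy setup: identity weights and zero biases (assumed values).
I = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
H_belief = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]   # every column is [1, 2]
H_sys = [[0.5, -0.5], [1.0, 0.0]]
H_usr = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0]]
r = residual_chain([0.5, -0.5], [0.5, 0.5],
                   [H_belief, H_sys, H_usr], [(I, zero, I, zero)] * 3)
```

With the belief memory's columns all equal to $[1, 2]$, the attention output is $[1, 2]$ whatever the weights are, so the second block of `r` is $h_1 + [1, 2]$; this makes the residual structure easy to check by hand.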
4) $r$ passes through a multi-layer MLP module
$\begin{aligned} \mathbf{r}_{1} &=\sigma\left(W_{1}^{T} \mathbf{r}_{0}+\mathbf{b}_{1}\right) \\ \mathbf{r}_{2} &=\sigma\left(W_{2}^{T} \mathbf{r}_{1}+\mathbf{b}_{2}\right) \\ \mathbf{r}_{3} &=\sigma\left(W_{3}^{T} \mathbf{r}_{2}+\mathbf{b}_{3}\right) \\ \mathbf{h}_{k} &=\sigma\left(W_{4}^{T} \mathbf{r}_{3}+\mathbf{b}_{4}\right) \end{aligned}$
5) Dropout and a softmax produce the output
$\begin{aligned} \mathbf{h}_{s} &=\operatorname{dropout}\left(\mathbf{h}_{k}\right) & & \in R^{d_{m}} \\ \mathbf{h}_{o} &=W_{k}^{T} \mathbf{h}_{s}+\mathbf{b}_{s} & & \in R^{d_{e}} \\ \mathbf{p}_{s} &=\operatorname{softmax}\left(E^{T} \mathbf{h}_{o}\right) & & \in R^{d_{v}} \\ s &=\operatorname{argmax}\left(\mathbf{p}_{s}\right) & & \in R \end{aligned}$
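
A rough sketch of the MLP and output steps at inference time, in plain Python. Several things here are assumptions made only for illustration: $\sigma$ is taken to be the logistic sigmoid, the hidden layer sizes are arbitrary, the weights are random, and dropout is treated as the identity because no training is involved. `decode_step` is a hypothetical name.

```python
import math
import random


def affine_t(W, x, b):
    """Return W^T x + b, with W stored as a list of len(x) rows."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def decode_step(r, mlp_layers, W_k, b_s, E):
    """Map the concatenated state r to a token id and its distribution."""
    x = r
    for W, b in mlp_layers:                      # r_1, r_2, r_3, h_k
        x = sigmoid(affine_t(W, x, b))
    h_s = x                                      # dropout is a no-op at inference
    h_o = affine_t(W_k, h_s, b_s)                # h_o in R^{d_e}
    p_s = softmax(affine_t(E, h_o, [0.0] * len(E[0])))   # softmax(E^T h_o)
    s = max(range(len(p_s)), key=p_s.__getitem__)        # argmax
    return s, p_s


# Toy sizes and random weights (all assumed).
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


d_m, d_e, d_v = 2, 3, 5
sizes = [4 * d_m, 4, 4, 4, d_m]                  # r -> r_1 -> r_2 -> r_3 -> h_k
mlp_layers = [(rand_matrix(a, b), [0.0] * b) for a, b in zip(sizes, sizes[1:])]
W_k, b_s, E = rand_matrix(d_m, d_e), [0.0] * d_e, rand_matrix(d_e, d_v)
r = [rng.uniform(-1, 1) for _ in range(4 * d_m)]
s, p_s = decode_step(r, mlp_layers, W_k, b_s, E)
```

Sharing the embedding matrix $E$ between input and output is what lets one decoder emit domain, slot and value tokens from the same vocabulary.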

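I think the multi-layer MLP has a large effect. The basic algorithm of the Attn module is very simple, and stacking layers tends to give good results. The mining of the text itself happens in the Attn module.
Supervised training should work well here, although overfitting is possible.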

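SOM-DST:
To overcome the limitation of looking values up in an ontology, previous work either directly generates or extracts the value from the dialogue context for every slot, allowing open vocabulary-based DST. In other words, the source of values changes from a candidate set to the vocabulary.
Remaining problem:
These approaches, while scalable and robust to handling unseen slot values,
do not efficiently perform DST since they predict the dialogue state from scratch at every dialogue turn.
SOM-DST maintains a table of domain-slot entries. At every turn it decides for each slot whether an update is needed, and for the slots that need one it generates the new value.
The update algorithm is:
The predictor uses pretrained BERT. Each slot has a classifier that decides whether the slot needs to be updated, and $h^{[\mathrm{CLS}]}$ is used for the domain label, i.e. to judge which domain the current turn belongs to.
The generator follows a TRADE-like design with a copy mechanism, so the output is decided jointly by the context and the vocab: $g_{t}^{j, k}=\operatorname{GRU}\left(g_{t}^{j, k-1}, e_{t}^{j, k}\right)$, with initial inputs $g_{t}^{j, 0}=h_{t}^{\mathrm{X}}=\tanh \left(W_{\mathrm{pool}} h_{t}^{[\mathrm{CLS}]}\right)$ and $e_{t}^{j, 0}=h_{t}^{[\mathrm{SLOT}]^{j}}$. (The previous turn's value is not used during generation, because the slot is being updated.)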
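
The table-update logic can be sketched as below, using the four SOM-DST operations (CARRYOVER, DELETE, DONTCARE, UPDATE). This is a simplified illustration, not the paper's code: `update_state`, the slot names and the stand-in `fake_generator` are hypothetical, and the predicted operations are supplied by hand rather than by a BERT classifier.

```python
def update_state(prev_state, operations, generate_value):
    """Apply one predicted operation per slot to the previous state.

    prev_state     -- dict mapping domain-slot name to value (None = empty)
    operations     -- dict mapping domain-slot name to an operation string
    generate_value -- callable(slot) -> str, run only for UPDATE slots
    """
    new_state = {}
    for slot, prev_value in prev_state.items():
        op = operations[slot]
        if op == "CARRYOVER":
            new_state[slot] = prev_value          # keep the old value as is
        elif op == "DELETE":
            new_state[slot] = None                # reset to empty
        elif op == "DONTCARE":
            new_state[slot] = "dontcare"
        elif op == "UPDATE":
            new_state[slot] = generate_value(slot)  # decoder runs only here
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_state


# Toy turn: only one of four slots triggers the generator.
prev_state = {
    "hotel-area": "north",
    "hotel-pricerange": "cheap",
    "hotel-parking": "yes",
    "hotel-stars": None,
}
operations = {
    "hotel-area": "UPDATE",
    "hotel-pricerange": "CARRYOVER",
    "hotel-parking": "DELETE",
    "hotel-stars": "DONTCARE",
}
calls = []


def fake_generator(slot):
    calls.append(slot)        # record which slots reach the decoder
    return "centre"


new_state = update_state(prev_state, operations, fake_generator)
```

The point of the design is visible in `calls`: the generator is invoked for a single slot instead of for every slot at every turn.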

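Analysis of results:
(Read with open questions in mind; the results may change.)
Why four operations (DELETE, DONTCARE, UPDATE, CARRYOVER)? The first three replace the traditional single update. Is that because DELETE and DONTCARE are usually implicit changes and therefore need explicit modeling more?
The ablation shows that **slot operation prediction matters a lot.** During decoding, feeding the ground-truth token at every step does not have a large effect, although it does help, which suggests that the GRU's own predictive ability is still weak.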

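SUMBT:

Matching approach:
BERT encodes the context, which is then matched against all candidate values.
In addition, earlier papers handled domain and slot separately; here they are unified.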

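DS-DST:
Slots are directly split into two types, picklist-based and span-based, and each type is handled separately.
Every slot requires its own joint encoding with the context.
The slot gate decides whether to update, by applying an MLP to BERT's first output. ($t$ indexes the turn and $j$ indexes the domain-slot pair.)
$P_{tj}^{\mathrm{gate}}=\operatorname{softmax}\left(W_{\mathrm{gate}} \cdot\left(r_{tj}^{\mathrm{CLS}}\right)^{\top}+b_{\mathrm{gate}}\right)$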
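For span-based slots, an MLP is applied directly to the context token representations to produce the start and end positions.
$\left[\alpha_{tj}^{\mathrm{start}}, \alpha_{tj}^{\mathrm{end}}\right]=W_{\mathrm{span}} \cdot\left(\left[r_{tj}^{1}, \ldots, r_{tj}^{K}\right]\right)^{\top}+b_{\mathrm{span}}$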
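
A small sketch of span extraction from start and end scores, in plain Python. The token vectors, weights and helper names (`span_logits`, `best_span`) are made up for illustration. Choosing the pair with the highest summed score subject to start ≤ end is an assumption borrowed from common extractive QA decoding; the notes above do not say how the final span is selected.

```python
def span_logits(token_reprs, W_span, b_span):
    """Score every token as a span start and as a span end.

    token_reprs -- K vectors of size d (context token representations)
    W_span      -- 2 x d weight matrix: row 0 for start, row 1 for end
    b_span      -- 2 biases
    """
    start = [sum(w * x for w, x in zip(W_span[0], r)) + b_span[0]
             for r in token_reprs]
    end = [sum(w * x for w, x in zip(W_span[1], r)) + b_span[1]
           for r in token_reprs]
    return start, end


def best_span(start, end):
    """Return (i, j) with i <= j maximising start[i] + end[j]."""
    best = None
    for i, s in enumerate(start):
        for j in range(i, len(end)):
            score = s + end[j]
            if best is None or score > best[0]:
                best = (score, i, j)
    return best[1], best[2]


# Toy context with hand-set 2-d token representations (assumed values).
tokens = ["i", "want", "a", "cheap", "hotel"]
token_reprs = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.5]]
W_span = [[1.0, 0.0], [0.0, 1.0]]
b_span = [0.0, 0.0]

start, end = span_logits(token_reprs, W_span, b_span)
i, j = best_span(start, end)
value = " ".join(tokens[i:j + 1])
```

Here the extracted value is the single token at position 3, since that token has the highest start and end scores.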
For picklist-based slots:
BERT encodes the candidate list, $Y_{j}=\operatorname{BERT}\left([\mathrm{CLS}] \oplus V_{j} \oplus[\mathrm{SEP}]\right)$, and the candidates are then compared with the context by cosine similarity, $\cos \left(r_{t j}^{\mathrm{CLS}}, y_{l}^{\mathrm{CLS}}\right)$
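
The picklist matching step reduces to a nearest-neighbour search under cosine similarity. A minimal sketch, assuming the [CLS] vectors have already been computed; the 2-d vectors, the candidate values and the name `pick_value` are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_value(r_cls, candidates):
    """Return the candidate whose [CLS] vector is closest to r_cls.

    r_cls      -- context/slot [CLS] representation
    candidates -- dict mapping candidate value -> its [CLS] representation
    """
    return max(candidates, key=lambda value: cosine(r_cls, candidates[value]))


# Toy vectors standing in for BERT outputs (assumed values).
r_cls = [1.0, 0.2]
candidates = {
    "cheap": [0.9, 0.1],
    "moderate": [0.1, 0.9],
    "expensive": [-1.0, 0.0],
}
chosen = pick_value(r_cls, candidates)
```

Since every candidate in the picklist is scored, the cost per slot grows with the size of the candidate list, which is one reason to reserve this path for slots with a small closed set of values.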