CM-Net:
Previous approaches:
1. Slot filling and intent detection handled separately: model the utterance with an RNN or CNN, then apply a classification algorithm.
2. Slot filling and intent detection treated as related:
- Goo et al. (2018): the intent influences slot filling (once the intent is produced, it is injected into the LSTM that generates the slots).
- Zhang et al. (2018) propose a capsule model: word -> slot -> intent, then routed back. (limited in capturing complicated correlations among words, slots and intents; local context information, which has been shown highly useful for slot filling (Mesnil et al., 2014), is not explicitly modeled.)
- The CM-Net approach (quoting the paper):
- directly capture semantic relationships among words, slots and intents, which is conducted simultaneously at each word position in a collaborative manner.
- alternately perform information exchange among the task-specific features referred from memories, local context representations and global sequential information via the well-designed block, named CM-block.
The algorithm itself is actually quite simple:
Previously: the slots were obtained from $p(y^{slot} \mid \mathbf{H})$ (with Viterbi decoding at test time), and the intent was predicted by averaging the $\mathbf{h}_t$ and applying a classifier. CM-Net couples the slot and intent computations.
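A minimal PyTorch sketch of those earlier-style decoders, under my own assumptions about shapes (the CRF/Viterbi layer that would sit on top of the slot emissions is omitted):

```python
import torch
import torch.nn as nn

class BaselineDecoders(nn.Module):
    """Per-token slot emission scores plus mean-pooled intent classifier (CRF/Viterbi omitted)."""
    def __init__(self, hidden_dim, n_slots, n_intents):
        super().__init__()
        self.slot_proj = nn.Linear(hidden_dim, n_slots)
        self.intent_proj = nn.Linear(hidden_dim, n_intents)

    def forward(self, H):                                   # H: (batch, seq_len, hidden_dim)
        slot_logits = self.slot_proj(H)                     # emissions for p(y^slot | H)
        intent_logits = self.intent_proj(H.mean(dim=1))     # average of h_t, then classify
        return slot_logits, intent_logits
```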
The slot and intent memories hold the initially extracted, coarse-grained slot/intent features (randomly initialized, then learned).
$$
\begin{aligned}
\widetilde{\mathbf{h}}_{t}^{int} &= ATT\left(\mathbf{h}_{t}, \mathbf{M}^{\mathrm{int}}\right) \\
\mathbf{h}_{t}^{slot} &= ATT\left(\left[\mathbf{h}_{t} ; \widetilde{\mathbf{h}}_{t}^{int}\right], \mathbf{M}^{\mathrm{slot}}\right)
\end{aligned}
$$
$$
\begin{aligned}
ATT\left(\mathbf{h}_{t}, \mathbf{M}^{x}\right) &= \sum_{i} \alpha_{i} \mathbf{m}_{i}^{x} \\
\alpha_{i} &= \frac{\exp\left(\mathbf{u}^{\top} s_{i}\right)}{\sum_{j} \exp\left(\mathbf{u}^{\top} s_{j}\right)} \\
s_{i} &= \mathbf{h}_{t}^{\top} \mathbf{W} \mathbf{m}_{i}^{x}
\end{aligned}
$$
$$
\begin{aligned}
\widetilde{\mathbf{h}}_{t}^{slot} &= ATT\left(\mathbf{h}_{t}, \mathbf{M}^{\mathrm{slot}}\right) \\
\mathbf{h}_{t}^{int} &= ATT\left(\left[\mathbf{h}_{t} ; \widetilde{\mathbf{h}}_{t}^{slot}\right], \mathbf{M}^{\mathrm{int}}\right)
\end{aligned}
$$
That is, $\mathbf{h}_t \rightarrow \widetilde{\mathbf{h}}_{t}^{int} \rightarrow \mathbf{h}_{t}^{slot}$: in the first step, the ATT function retrieves an intent-memory summary conditioned on $\mathbf{h}_t$; in the second step, this intent-enriched representation interacts with the slot memory to obtain the position-specific slot information. The reverse direction (slot first, then intent) is computed symmetrically.
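A rough PyTorch sketch of this memory attention; I collapse the scoring into a single bilinear form, so it is not exactly the paper's parameterisation, and all names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Bilinear attention over a memory: ATT(q, M) = sum_i softmax_i(q^T W m_i) * m_i."""
    def __init__(self, query_dim, mem_dim):
        super().__init__()
        self.W = nn.Linear(query_dim, mem_dim, bias=False)

    def forward(self, q, M):                     # q: (batch, query_dim), M: (n_mem, mem_dim)
        scores = self.W(q) @ M.t()               # (batch, n_mem)
        alpha = torch.softmax(scores, dim=-1)
        return alpha @ M                         # (batch, mem_dim): weighted memory readout

# Two-stage retrieval at one word position t (dims are made up for the example):
d, n_int, n_slot = 128, 16, 64
h_t = torch.randn(8, d)                          # word-level hidden state
M_int = torch.randn(n_int, d)                    # intent memory (randomly initialised, then learned)
M_slot = torch.randn(n_slot, d)                  # slot memory

att_int, att_slot = MemoryAttention(d, d), MemoryAttention(2 * d, d)
h_int_tilde = att_int(h_t, M_int)                                   # intent-memory summary
h_slot = att_slot(torch.cat([h_t, h_int_tilde], dim=-1), M_slot)    # intent-aware slot retrieval
```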
Borrowing from a 2018 model, CM-Net uses the S-LSTM (Yue Zhang's work; the authors suggest it could be replaced with BERT) and combines the retrieved information with the original representations to obtain new local features (see the figure in the paper):
It produces the six S-LSTM state-transition quantities ${i,o,f,l,r,u}$. (Explanation: "the hidden state is updated with abundant information from different perspectives, namely word embeddings, local contexts, slots and intents representations.")
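A heavily simplified sketch of that fusion idea; this is not the actual S-LSTM update with its six gates, just a gated convex mix of the four information sources, with hypothetical names and dimensions:

```python
import torch
import torch.nn as nn

class LocalFusionCell(nn.Module):
    """Schematic gated fusion of word embedding, local context (left/right neighbours),
    and the retrieved slot/intent features; the real CM-block uses the full S-LSTM
    update with {i, o, f, l, r, u} gates."""
    def __init__(self, dim):
        super().__init__()
        self.gates = nn.Linear(5 * dim, 4)       # one mixing weight per information source
        self.proj = nn.Linear(dim, dim)

    def forward(self, x_t, h_left, h_right, h_slot, h_int):          # all: (batch, dim)
        feats = torch.stack([x_t, (h_left + h_right) / 2, h_slot, h_int], dim=1)  # (B, 4, dim)
        g = torch.softmax(self.gates(torch.cat([x_t, h_left, h_right, h_slot, h_int], -1)), -1)
        mixed = (g.unsqueeze(-1) * feats).sum(dim=1)   # convex combination of the four sources
        return torch.tanh(self.proj(mixed))            # updated local hidden state
```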
The third step is the global one: a BiLSTM over the fused local features captures global sequence information.
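Sketch of that global step (dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Third step: a BiLSTM over the fused local features to get global sequence information.
bilstm = nn.LSTM(input_size=128, hidden_size=64, bidirectional=True, batch_first=True)
local_feats = torch.randn(8, 20, 128)       # (batch, seq_len, feature_dim)
global_feats, _ = bilstm(local_feats)       # (8, 20, 128): 64 per direction, concatenated
```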
The final prediction functions are straightforward.
Overall, it is deep information fusion; the naming is quite fancy.
I am still not fully convinced, though. Can it really do this well without external KB information?
It is hard to see this generalizing broadly, because the extraction features and intent classification the model learns come solely from the training data and are not hooked to the structural properties of the sequence.
SF-ID network:
Its numbers are not as high as the previous model's (ELMo and BERT are very strong, but CM-Net surpasses them).
$$
c_{slot}^{i}=\sum_{j=1}^{T} \alpha_{i, j}^{S} h_{j}
$$

($c_{inte}$ is obtained analogously, also produced by attention.)
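A sketch of these context vectors; the scoring function here is a simplified bilinear one and the pooled $c_{inte}$ is my own shortcut, so treat it only as a shape illustration:

```python
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    """Per-position slot contexts c_slot^i = sum_j alpha_{i,j} h_j, plus one pooled intent
    context c_inte (simplified scoring, not necessarily the paper's exact form)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, dim, bias=False)

    def forward(self, H):                           # H: (batch, T, dim)
        scores = self.score(H) @ H.transpose(1, 2)  # (batch, T, T): position i attends to j
        alpha = torch.softmax(scores, dim=-1)
        c_slot = alpha @ H                          # (batch, T, dim): c_slot^i for every i
        c_inte = c_slot.mean(dim=1)                 # (batch, dim): pooled intent context (sketch)
        return c_slot, c_inte
```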
SF subnet:
$$
\begin{aligned}
f &= \sum V * \tanh\left(c_{slot}^{i}+W * c_{inte}\right) \\
r_{slot}^{i} &= f \cdot c_{slot}^{i}
\end{aligned}
$$
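A sketch of the SF subnet under my reading that $f$ acts as a single scalar fusion gate (the exact reduction over $i$ may differ in the paper; dimensions are illustrative):

```python
import torch
import torch.nn as nn

class SFSubnet(nn.Module):
    """SF subnet sketch: f = sum_i v^T tanh(c_slot^i + W c_inte), then r_slot^i = f * c_slot^i."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, c_slot, c_inte):                            # (B, T, dim), (B, dim)
        fused = torch.tanh(c_slot + self.W(c_inte).unsqueeze(1))  # (B, T, dim)
        f = self.v(fused).sum(dim=1)                              # (B, 1): scalar fusion gate
        return f.unsqueeze(1) * c_slot                            # (B, T, dim): reinforced r_slot^i
```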
ID subnet:
$$
\begin{aligned}
r &= \sum_{i=1}^{T} \alpha_{i} \cdot r_{slot}^{i} \\
\alpha_{i} &= \frac{\exp\left(e_{i, i}\right)}{\sum_{j=1}^{T} \exp\left(e_{i, j}\right)} \\
e_{i, j} &= W * \tanh\left(V_{1} * r_{slot}^{i}+V_{2} * h_{j}+b\right) \\
r_{inte} &= r + c_{inte}
\end{aligned}
$$
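A sketch of the ID subnet following the formulas above (illustrative dimensions):

```python
import torch
import torch.nn as nn

class IDSubnet(nn.Module):
    """ID subnet sketch: e_{i,j} = w^T tanh(V1 r_slot^i + V2 h_j + b),
    alpha_i = exp(e_{i,i}) / sum_j exp(e_{i,j}), r = sum_i alpha_i r_slot^i, r_inte = r + c_inte."""
    def __init__(self, dim):
        super().__init__()
        self.V1 = nn.Linear(dim, dim, bias=False)
        self.V2 = nn.Linear(dim, dim, bias=True)        # the bias plays the role of b
        self.w = nn.Linear(dim, 1, bias=False)

    def forward(self, r_slot, H, c_inte):               # (B, T, dim), (B, T, dim), (B, dim)
        e = self.w(torch.tanh(self.V1(r_slot).unsqueeze(2)      # (B, T, 1, dim)
                              + self.V2(H).unsqueeze(1))        # (B, 1, T, dim)
                   ).squeeze(-1)                                # (B, T, T)
        alpha = torch.softmax(e, dim=-1).diagonal(dim1=1, dim2=2)   # (B, T): alpha_i
        r = (alpha.unsqueeze(-1) * r_slot).sum(dim=1)            # (B, dim)
        return r + c_inte                                        # r_inte
```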
Iteration Mechanism: (the $f$ function of the SF subnet is replaced by the one below; essentially $r_{inte}$ takes the place of $c_{inte}$)
$$
f=\sum V * \tanh\left(c_{slot}^{i}+W * r_{inte}\right)
$$
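Continuing the SFSubnet / IDSubnet sketches above, the iteration then just cycles SF -> ID with $r_{inte}$ feeding back in (the number of iterations is a hyper-parameter; shapes are illustrative):

```python
import torch

B, T, dim = 8, 20, 128
H = torch.randn(B, T, dim)
c_slot, c_inte = torch.randn(B, T, dim), torch.randn(B, dim)
sf, intent_net = SFSubnet(dim), IDSubnet(dim)     # classes from the sketches above

inte_feat = c_inte                                # first pass uses the raw intent context
for _ in range(3):                                # number of iterations is a hyper-parameter
    r_slot = sf(c_slot, inte_feat)                # f = sum_i v^T tanh(c_slot^i + W * inte_feat)
    inte_feat = intent_net(r_slot, H, c_inte)     # r_inte replaces c_inte in the next SF pass
```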
Output generation:
$$
\begin{aligned}
y_{inte} &= \operatorname{softmax}\left(W_{inte}^{hy} \operatorname{concat}\left(h_{T}, r_{inte}\right)\right) \\
y_{slot}^{i} &= \operatorname{softmax}\left(W_{slot}^{hy} \operatorname{concat}\left(h_{i}, r_{slot}^{i}\right)\right)
\end{aligned}
$$
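A sketch of these prediction layers (a CRF would normally sit on top of the slot probabilities; shapes are illustrative):

```python
import torch
import torch.nn as nn

class SFIDOutputs(nn.Module):
    """Intent from [h_T; r_inte], slot label at each position i from [h_i; r_slot^i]."""
    def __init__(self, dim, n_slots, n_intents):
        super().__init__()
        self.W_inte = nn.Linear(2 * dim, n_intents)
        self.W_slot = nn.Linear(2 * dim, n_slots)

    def forward(self, H, r_slot, r_inte):               # (B, T, d), (B, T, d), (B, d)
        y_inte = torch.softmax(self.W_inte(torch.cat([H[:, -1], r_inte], dim=-1)), dim=-1)
        y_slot = torch.softmax(self.W_slot(torch.cat([H, r_slot], dim=-1)), dim=-1)
        return y_slot, y_inte
```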
From the analysis, the intent seems to have limited influence on the slots: in the intent -> slot -> intent chain, the first step barely changes the results.
The information fusion here is also fairly simple, and the iteration mechanism is not particularly elaborate.
That said, producing the intent first does help slot prediction more.
DCMTL method:
Segment tagging and named entity tagging can be regarded as syntactic labeling, while slot filling is more like semantic labeling. With the information sharing ability of multi-task learning, once we learn the syntactic structure of an input sentence, filling in the semantic labels becomes much easier.
The slot labels are large-scale, informative and diverse in the E-commerce setting, and the syntactic structure of the input Chinese utterances is complicated, so the slot filling problem becomes hard to solve. If we directly train an end-to-end sequential model, the tagging performance suffers severely from data sparsity. When we try to handle slot filling (a semantic labeling task), low-level tasks such as named entity tagging or segment tagging (syntactic labeling tasks) may make mistakes first. If the low-level tasks go wrong, so does the target slot filling task: it is easy to make wrong decisions in the low-level tasks if we try to fill in all the labels at once, and the errors then propagate and hurt the high-level target, slot filling.
This motivates their model:
“However, when it comes to problems where different tasks maintain a strict order, in another word, the performance of high-level task dramatically depends on low-level tasks, the hierarchy structure is not compact and effective enough. Therefore, we propose cascade and residual connections to allow high-level tasks to take the tagging results and hidden states from low-level tasks as additional input. These connections serves as “shortcuts” that create a more closely coupled and efficient model. We call it deep cascade multi-task learning,”
In the original hierarchical setup, the lower layer's hidden states serve as the hidden input to the next layer. The authors add two improvements: the higher task receives not only the lower layer's hidden states but also its output (cascade connection), and additionally the lower layer's own input (residual connection).
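A sketch of how I understand the cascade and residual connections for one low-level/high-level task pair (names and dimensions are my own, not the paper's):

```python
import torch
import torch.nn as nn

class CascadeResidualLayer(nn.Module):
    """The high-level task consumes the low-level task's hidden states (hierarchical),
    its predicted tag distribution (cascade connection), and the low-level layer's own
    input (residual connection)."""
    def __init__(self, in_dim, hidden_dim, n_low_tags):
        super().__init__()
        self.low_rnn = nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.low_out = nn.Linear(2 * hidden_dim, n_low_tags)
        high_in = 2 * hidden_dim + n_low_tags + in_dim       # hidden + cascade + residual
        self.high_rnn = nn.LSTM(high_in, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x):                                    # x: (B, T, in_dim), low-level input
        low_h, _ = self.low_rnn(x)                           # low-level (syntactic) hidden states
        low_tags = torch.softmax(self.low_out(low_h), -1)    # cascade connection: soft tag scores
        high_in = torch.cat([low_h, low_tags, x], dim=-1)    # residual connection adds x back in
        high_h, _ = self.high_rnn(high_in)                   # features for the slot filling task
        return low_tags, high_h
```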
Since the experiments use a completely different dataset and do not compare against this year's methods, the comparison is not very strong, but the work is still worth learning from, in particular the Cascade Connection and Residual Connection; the writing is also quite good.