Notes on slot filling & intent detection (SF & ID) papers

CM-Net:
Previous approaches:
1. Slot filling and intent detection handled separately: an RNN or CNN encodes the utterance, and a classifier is then applied to each task independently.
2. Slot filling and intent detection modeled as related tasks:

  • Goo et al. (2018): the intent influences slot filling (once the intent is produced, it is injected as an extra input into the LSTM that generates the slots).
  • Zhang et al. (2018): a capsule network with word -> slot -> intent routing, then routed back. (Criticisms: "limited in capturing complicated correlations among words, slots and intents"; also, "local context information, which has been shown highly useful for the slot filling (Mesnil et al., 2014), is not explicitly modeled.")
  • CM-Net's approach:
    • "directly capture semantic relationships among words, slots and intents, which is conducted simultaneously at each word position in a collaborative manner."
    • "alternately perform information exchange among the task-specific features referred from memories, local context representations and global sequential information via the well-designed block, named CM-block."

The algorithm itself is actually simple:
Previously: slots were predicted as $p(y^{slot} \mid H)$ (with Viterbi decoding at test time), and the intent was obtained by averaging the hidden states $h_t$ and feeding the result to a classifier. CM-Net couples the slot and intent computations.
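For reference, a minimal PyTorch sketch of that conventional baseline; it uses a per-token softmax head instead of the CRF/Viterbi decoder mentioned above, and all names and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class JointBaseline(nn.Module):
    """Independent slot tagging and intent classification over a shared BiLSTM encoder."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_slots, num_intents):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)      # p(y_slot | H), per token
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)  # classifier on mean-pooled h_t

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        h, _ = self.encoder(self.embed(tokens))      # h: (batch, seq_len, 2*hidden_dim)
        slot_logits = self.slot_head(h)              # per-token slot scores
        intent_logits = self.intent_head(h.mean(dim=1))  # average of h_t -> intent classifier
        return slot_logits, intent_logits
```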

The slot memory and intent memory store coarse, initially extracted slot and intent features (randomly initialized).
$$\begin{aligned} \widetilde{\mathbf{h}}_{t}^{int} &= ATT\left(\mathbf{h}_{t}, \mathbf{M}^{\mathrm{int}}\right) \\ \mathbf{h}_{t}^{slot} &= ATT\left(\left[\mathbf{h}_{t};\, \widetilde{\mathbf{h}}_{t}^{int}\right], \mathbf{M}^{\mathrm{slot}}\right) \end{aligned}$$

$$ATT\left(\mathbf{h}_{t}, \mathbf{M}^{x}\right) = \sum_{i} \alpha_{i}\, \mathbf{m}_{i}^{x}, \qquad \alpha_{i} = \frac{\exp\left(\mathbf{u}^{\top} s_{i}\right)}{\sum_{j} \exp\left(\mathbf{u}^{\top} s_{j}\right)}, \qquad s_{i} = \mathbf{h}_{t}^{\top} \mathbf{W}\, \mathbf{m}_{i}^{x}$$

$$\begin{aligned} \widetilde{\mathbf{h}}_{t}^{slot} &= ATT\left(\mathbf{h}_{t}, \mathbf{M}^{\mathrm{slot}}\right) \\ \mathbf{h}_{t}^{int} &= ATT\left(\left[\mathbf{h}_{t};\, \widetilde{\mathbf{h}}_{t}^{slot}\right], \mathbf{M}^{\mathrm{int}}\right) \end{aligned}$$
From $h_t$ to $\widetilde{\mathbf{h}}_{t}^{int}$ to $\mathbf{h}_{t}^{slot}$: in the first step, the ATT function produces an intent-memory summary conditioned on $h_t$; in the second step, this intent-enriched representation attends over the slot memory to obtain the slot-specific representation (and symmetrically in the other direction for the intent).
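A minimal sketch of this memory attention and the two-step ("deliberate") retrieval, assuming PyTorch and made-up dimensions; the $\mathbf{u}^{\top}$ projection in the score is folded into the bilinear form here, so this is a simplified variant rather than the paper's exact attention:

```python
import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Simplified ATT(h, M): bilinear scores over memory slots, softmax-weighted sum."""
    def __init__(self, query_dim, mem_dim):
        super().__init__()
        self.W = nn.Parameter(torch.empty(query_dim, mem_dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, h, memory):               # h: (batch, query_dim), memory: (num_mem, mem_dim)
        scores = h @ self.W @ memory.t()        # s_i = h^T W m_i          -> (batch, num_mem)
        alpha = torch.softmax(scores, dim=-1)   # attention weights alpha_i
        return alpha @ memory                   # sum_i alpha_i m_i        -> (batch, mem_dim)


def deliberate_retrieval(h_t, slot_mem, intent_mem, att_int, att_slot):
    """Slot direction of the two-step retrieval: summarize the intent memory first,
    then query the slot memory with the concatenation [h_t; h~_int].
    att_slot must be built with query_dim = query_dim + mem_dim."""
    h_int_tilde = att_int(h_t, intent_mem)                          # h~^int_t = ATT(h_t, M^int)
    return att_slot(torch.cat([h_t, h_int_tilde], -1), slot_mem)    # ATT([h_t; h~^int_t], M^slot)
```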
Borrowing from a 2018 model, CM-Net uses an S-LSTM (Yue Zhang's work; the authors suggest it could be replaced by BERT) to combine the retrieved information with the original representation, producing a new local feature, as in the figure below:

This produces the six LSTM gate/state quantities $\{i, o, f, l, r, u\}$. (Explanation from the paper: "the hidden state is updated with abundant information from different perspectives, namely word embeddings, local contexts, slots and intents representations.")
The third step is the global one: a BiLSTM captures global sequence information.
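A very rough sketch of one such fusion step, with a plain linear update plus BiLSTM standing in for the S-LSTM local update (the six-gate machinery is not reproduced; all names and shapes are assumptions):

```python
import torch
import torch.nn as nn

class CMBlockLike(nn.Module):
    """One fusion step: memory-enriched token features -> local update -> global BiLSTM."""
    def __init__(self, hidden_dim, mem_dim):
        super().__init__()
        # stand-in for the S-LSTM local update over words + slot/intent memory features
        self.local = nn.Linear(hidden_dim + 2 * mem_dim, hidden_dim)
        # global step: BiLSTM over the sequence (hidden_dim assumed even)
        self.global_rnn = nn.LSTM(hidden_dim, hidden_dim // 2,
                                  batch_first=True, bidirectional=True)

    def forward(self, h, h_slot, h_int):         # each: (batch, seq_len, *)
        fused = torch.cat([h, h_slot, h_int], dim=-1)
        local = torch.tanh(self.local(fused))    # local feature from words + memories
        global_h, _ = self.global_rnn(local)     # global sequence information
        return global_h
```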
The final output layer is straightforward.
Overall, it comes down to deep fusion of information; the naming is rather fancy.
Still, I am not entirely convinced: can it really do this well without external KB information?
It is hard to see this generalizing broadly, since the extraction features and intent classifier the model learns come purely from the training data and are not tied to the structural properties of the sequence.

SF-ID network:
Its numbers are not as high as the previous model's (ELMo and BERT are very strong, but CM-Net surpasses them).

$$c_{slot}^{i} = \sum_{j=1}^{T} \alpha_{i,j}^{S}\, h_{j}$$
(the intent context $c_{inte}$ is produced analogously via attention; a combined code sketch of the SF and ID subnets follows the output equations below)
SF subnet:
$$f = \sum V \ast \tanh\left(c_{slot}^{i} + W \ast c_{inte}\right)$$
$$r_{slot}^{i} = f \cdot c_{slot}^{i}$$
ID subnet:
$$r = \sum_{i=1}^{T} \alpha_{i} \cdot r_{slot}^{i}$$
$$\alpha_{i} = \frac{\exp\left(e_{i,i}\right)}{\sum_{j=1}^{T} \exp\left(e_{i,j}\right)}$$
$$e_{i,j} = W \ast \tanh\left(V_{1} \ast r_{slot}^{i} + V_{2} \ast h_{j} + b\right)$$
$$r_{inte} = r + c_{inte}$$
Iteration mechanism: the $f$ in the SF subnet is replaced with the following, i.e. $r_{inte}$ takes the place of $c_{inte}$:
$$f = \sum V \ast \tanh\left(c_{slot}^{i} + W \ast r_{inte}\right)$$
Output:
$$y_{inte} = \operatorname{softmax}\left(W_{inte}^{hy} \operatorname{concat}\left(h_{T}, r_{inte}\right)\right)$$
$$y_{slot}^{i} = \operatorname{softmax}\left(W_{slot}^{hy} \operatorname{concat}\left(h_{i}, r_{slot}^{i}\right)\right)$$
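A compact sketch of the SF and ID subnets plus the iteration, following the equations above in slot-first order; parameter shapes, initialization, and the exact handling of bias terms are assumptions:

```python
import torch
import torch.nn as nn

class SFIDLike(nn.Module):
    """SF subnet followed by ID subnet, optionally iterated (slot-first order)."""
    def __init__(self, hidden_dim, num_slots, num_intents):
        super().__init__()
        d = hidden_dim
        self.V = nn.Parameter(torch.randn(d) * 0.02)   # vector V in the reinforce gate f
        self.W_f = nn.Linear(d, d, bias=False)         # W applied to the intent context
        self.V1 = nn.Linear(d, d, bias=False)          # V1, V2, w used for the scores e_{i,j}
        self.V2 = nn.Linear(d, d, bias=True)
        self.w_e = nn.Linear(d, 1, bias=False)
        self.slot_out = nn.Linear(2 * d, num_slots)
        self.intent_out = nn.Linear(2 * d, num_intents)

    def forward(self, h, c_slot, c_inte, num_iter=1):
        # h: (B, T, d) encoder states; c_slot: (B, T, d) slot contexts; c_inte: (B, d) intent context
        inte = c_inte
        for _ in range(num_iter):
            # SF subnet: f = sum_d V * tanh(c_slot^i + W * inte), then r_slot^i = f * c_slot^i
            f = (self.V * torch.tanh(c_slot + self.W_f(inte).unsqueeze(1))).sum(-1, keepdim=True)
            r_slot = f * c_slot                                                     # (B, T, d)
            # ID subnet: e_{i,j} = w^T tanh(V1 r_slot^i + V2 h_j + b)
            e = self.w_e(torch.tanh(self.V1(r_slot).unsqueeze(2)
                                    + self.V2(h).unsqueeze(1))).squeeze(-1)         # (B, T, T)
            alpha = torch.exp(e.diagonal(dim1=1, dim2=2)) / torch.exp(e).sum(-1)    # exp(e_ii)/sum_j exp(e_ij)
            r = (alpha.unsqueeze(-1) * r_slot).sum(dim=1)                           # (B, d)
            inte = r + c_inte                               # r_inte; replaces c_inte in the next iteration
        y_inte = self.intent_out(torch.cat([h[:, -1], inte], dim=-1))   # concat(h_T, r_inte)
        y_slot = self.slot_out(torch.cat([h, r_slot], dim=-1))          # concat(h_i, r_slot^i)
        return y_slot, y_inte                               # logits; softmax / cross-entropy applied outside
```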
From the analysis, the intent seems to have limited influence on slot filling: in the intent -> slot -> intent order, the first step barely changes the results.
The information fusion here is also fairly simple, and the iteration scheme is nothing extravagant.
Producing the intent first is more helpful for slot prediction.

The DCMTL approach:
Segment tagging and named entity tagging can be regarded as syntactic labeling, while slot filling is more like semantic labeling. With the help of the information sharing ability of multi-task learning, once we learn the information of syntactic structure of an input sentence, filling the semantic labels becomes much easier.
The slot labels are large-scaled, informative and diverse in the case of E-commerce, and the syntactic structure of input Chinese utterances is complicated, so the slot filling problem becomes hard to solve. If we directly train an end-to-end sequential model, the tagging performance will suffer severely from data sparsity. When we try to handle slot filling (which can be seen as a semantic labeling task), some low-level tasks such as named entity tagging or segment tagging (which can be seen as syntactic labeling tasks) may make mistakes first. If the low-level tasks go wrong, so does the target slot filling task. That is to say, it is easy to make wrong decisions in the low-level tasks if we try to fill in all the labels at once. The error then propagates and leads to bad performance on slot filling, which is our high-level target.
They then propose their model:
“However, when it comes to problems where different tasks maintain a strict order, in another word, the performance of high-level task dramatically depends on low-level tasks, the hierarchy structure is not compact and effective enough. Therefore, we propose cascade and residual connections to allow high-level tasks to take the tagging results and hidden states from low-level tasks as additional input. These connections serves as “shortcuts” that create a more closely coupled and efficient model. We call it deep cascade multi-task learning,”
In the original hierarchical setup, one layer's hidden states are simply fed as the hidden input of the next layer. The authors make two improvements: the cascade connection feeds the next layer not only the previous layer's hidden states but also its output (the tagging results), and the residual connection additionally feeds in the previous layer's input, as in the sketch below.
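A minimal sketch of how the cascade and residual connections might look for a two-level stack (e.g. segment tagging feeding slot filling); names, shapes, and the use of soft tag distributions as the cascade input are assumptions:

```python
import torch
import torch.nn as nn

class CascadeResidualTagger(nn.Module):
    """Low-level (e.g. segment) tagger feeding a high-level (slot) tagger.
    Cascade connection: the low-level tag distribution is passed upward.
    Residual connection: the shared input embeddings are passed upward as well."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_low_tags, num_slot_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.low_rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.low_head = nn.Linear(2 * hidden_dim, num_low_tags)
        # high level sees: low hidden states + low tag outputs (cascade) + raw embeddings (residual)
        self.high_rnn = nn.LSTM(2 * hidden_dim + num_low_tags + emb_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slot_tags)

    def forward(self, tokens):                                    # tokens: (batch, seq_len)
        x = self.embed(tokens)
        low_h, _ = self.low_rnn(x)                                # hidden states of the low-level task
        low_tags = torch.softmax(self.low_head(low_h), dim=-1)    # cascade: soft tag predictions
        high_in = torch.cat([low_h, low_tags, x], dim=-1)         # cascade + residual input
        high_h, _ = self.high_rnn(high_in)
        return self.slot_head(high_h), self.low_head(low_h)       # slot logits, low-level logits
```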
Since the evaluation is on a completely different dataset and does not include this year's methods, the comparison is not very strong, but the ideas (cascade connection and residual connection) are still worth learning from, and the writing is good too.
