[Paper Notes] A Symmetric Local Search Network for Emotion-Cause Pair Extraction


Abstract

Problem addressed: Emotion-cause pair extraction (ECPE)

Existing approach: a two-step method (Xia & Ding, 2019)

Limitation of Xia & Ding (2019): the correlation between emotion clauses and cause clauses is not considered

To tackle this task, a two-step method was proposed by a previous study, which first extracted emotion clauses and cause clauses individually, then paired the emotion and cause clauses, and filtered out the pairs without causality.

Core of the authors' work: local search

Symmetric Local Search Network (SLSN): perform the detection and matching simultaneously by local search

The two symmetric subnetworks of SLSN:

  • the emotion subnetwork
  • the cause subnetwork

Components of each subnetwork:

  • a clause representation learner
  • a local pair searcher (LPS) - a specially-designed cross-subnetwork component

SLSN consists of two symmetric subnetworks, namely the emotion subnetwork and the cause subnetwork.

Each subnetwork is composed of a clause representation learner and a local pair searcher. The local pair searcher is a specially-designed cross-subnetwork component which can extract the local emotion-cause pairs.

Introduction

The authors' view: to mimic human behavior, the detection and matching of emotions/causes should be considered simultaneously.

However, when humans deal with the ECPE task, they usually consider the detection and matching problems at the same time.


Advantage of local search: emotion-cause pairs whose clauses are far apart can be ruled out.

The advantage of local search is that the wrong pairs (e.g., (c4, c12)) beyond the local context scope can be avoided. Additionally, when local searching the cause clause corresponding to the target emotion clause, humans not only judge whether the clause is a cause clause, but also consider whether it matches the target emotion clause.

Specifically, the LPS introduces a local context window to limit the scope of context for local search.

Symmetric Local Search Network

Task Definition

Each dataset $D$ contains multiple documents $d$, where

$$d = [c_1, c_2, \cdots, c_n]$$

  • $c^e$ denotes an emotion clause
  • $c^c$ denotes a cause clause

Goal: extract all emotion-cause pairs

$$P = \{\cdots, (c^e, c^c), \cdots\}$$
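As a toy illustration of the task definition (a hypothetical example, not from the paper's corpus):

```python
# Hypothetical ECPE example (not from the paper's corpus).
# A document d is a list of clauses; the target output is the set P
# of (emotion clause, cause clause) index pairs.
document = [
    "I lost my wallet on the subway",  # c1: cause clause   (c^c)
    "so I felt really upset",          # c2: emotion clause (c^e)
    "and I went home early",           # c3: neither
]
expected_pairs = {(2, 1)}  # P = {(c^e, c^c)} = {(c2, c1)}, 1-based indices
```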

An Overview of SLSN


  • input - a sequence of clauses from a document
  • output - the local pair labels for these clauses

For each clause $c_i$, SLSN predicts two kinds of labels:

  • E-LC label $\hat y^{elc}_i$
    • the emotion label (E-label) $\hat y^e_i$ of the i-th clause
    • the local cause labels (LC-label) $(\hat y^c_{i-1}, \hat y^c_i, \hat y^c_{i+1})$ of the clauses near the i-th clause
  • C-LE label $\hat y^{cle}_i$
    • the cause label (C-label) $\hat y^c_i$ of the i-th clause
    • the local emotion labels (LE-label) $(\hat y^e_{i-1}, \hat y^e_i, \hat y^e_{i+1})$ of the clauses near the i-th clause

Let $P_{elc}$ be the E-C pair set derived from $\hat y^{elc}_i$ and $P_{cle}$ the E-C pair set derived from $\hat y^{cle}_i$; the final E-C pair set is $P_{elc} \cup P_{cle}$.
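A minimal sketch of how the two label sets could be decoded into the final pair set, assuming binary labels and a window of size k (the function and label encoding are assumptions, not the paper's code):

```python
def decode_pairs(e_labels, lc_labels, c_labels, le_labels, k=1):
    """e_labels[i]: predicted E-label of clause i (1 = emotion clause);
    lc_labels[i]: the 2k+1 LC-labels for clauses i-k..i+k (valid only
    when e_labels[i] == 1); c_labels/le_labels: symmetric C-net outputs."""
    n = len(e_labels)
    p_elc, p_cle = set(), set()
    for i in range(n):
        if e_labels[i] == 1:                      # emotion clause found by E-net
            for off, y in zip(range(-k, k + 1), lc_labels[i]):
                j = i + off
                if 0 <= j < n and y == 1:         # local cause inside the window
                    p_elc.add((i, j))             # pair is (emotion, cause)
        if c_labels[i] == 1:                      # cause clause found by C-net
            for off, y in zip(range(-k, k + 1), le_labels[i]):
                j = i + off
                if 0 <= j < n and y == 1:         # local emotion inside the window
                    p_cle.add((j, i))
    return p_elc | p_cle                          # final set: P_elc ∪ P_cle
```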

Components of SLSN


The two subnetworks of SLSN:

  • the emotion subnetwork (E-net)
    • for the E-LC label prediction
  • the cause subnetwork (C-net)
    • for the C-LE label prediction

E-net and C-net have similar structures in terms of word embedding, clause encoder, and hidden state learning.

Word Embedding

Input: $d = [c_1, c_2, \cdots, c_n]$ with $c_i = [w^1_i, w^2_i, \cdots, w^{l_i}_i]$ ($c_i$ contains $l_i$ words)

Output: $v_i = [v^1_i, v^2_i, \cdots, v^{l_i}_i]$
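A one-line sketch of this step (vocabulary size and embedding dimension are assumptions):

```python
import torch
import torch.nn as nn

# Word embedding lookup: maps the word ids of clause c_i to vectors v_i.
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=200)
w_i = torch.tensor([[3, 17, 256, 42]])   # word ids of a clause with l_i = 4 words
v_i = embedding(w_i)                     # v_i: (1, 4, 200)
```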

Clause Encoder

Structure: word-level Bi-LSTM & attention

Purpose: learn the representation of clauses

[Taking E-net as an example]

Bi-LSTM input: $v_i = [v^1_i, v^2_i, \cdots, v^{l_i}_i]$

Bi-LSTM output & attention input: $r_i = [r^1_i, r^2_i, \cdots, r^{l_i}_i]$

The attention maps $r_i$ to the clause representation $s^e_i$ by aggregation:

$$
u^j_i = \tanh(W_w r^j_i + b_w) \\
a^j_i = \frac{\exp((u^j_i)^T u_s)}{\sum_t \exp((u^t_i)^T u_s)} \\
s^e_i = \sum_j a^j_i r^j_i
$$
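A minimal PyTorch sketch of this encoder, following the formulas above (module name, dimensions, and initialization are assumptions):

```python
import torch
import torch.nn as nn

class ClauseEncoder(nn.Module):
    """Word-level Bi-LSTM + attention (a sketch; sizes are assumptions)."""
    def __init__(self, emb_dim=200, hidden_dim=100):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.W_w = nn.Linear(2 * hidden_dim, 2 * hidden_dim)   # W_w and b_w
        self.u_s = nn.Parameter(torch.randn(2 * hidden_dim))   # context vector u_s

    def forward(self, v_i):                     # v_i: (batch, l_i, emb_dim)
        r_i, _ = self.bilstm(v_i)               # r_i: (batch, l_i, 2*hidden_dim)
        u = torch.tanh(self.W_w(r_i))           # u_i^j = tanh(W_w r_i^j + b_w)
        a = torch.softmax(u @ self.u_s, dim=1)  # a_i^j: attention over words
        s_i = (a.unsqueeze(-1) * r_i).sum(1)    # s_i^e = sum_j a_i^j r_i^j
        return s_i                              # clause representation
```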

Hidden State Learning

A clause-level Bi-LSTM is used.

[Taking E-net as an example]

Input: $[s^e_1, s^e_2, \cdots, s^e_n]$

Output: $[h^e_1, h^e_2, \cdots, h^e_n]$

Similarly, C-net outputs $[h^c_1, h^c_2, \cdots, h^c_n]$.
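A corresponding sketch for this step (sizes are assumptions):

```python
import torch
import torch.nn as nn

# Clause-level Bi-LSTM over the stacked clause vectors [s_1^e, ..., s_n^e];
# C-net runs the same structure on its own clause representations.
clause_bilstm = nn.LSTM(input_size=200, hidden_size=100,
                        batch_first=True, bidirectional=True)
s = torch.randn(1, 8, 200)   # a document with n = 8 clauses
h, _ = clause_bilstm(s)      # h: (1, 8, 200), i.e. [h_1^e, ..., h_n^e]
```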

Local Pair Searcher

A symmetric structure is built in E-net and C-net to predict the local pair labels of each clause.

[Taking E-net as an example]

When predicting the E-label, E-net uses only the clause's emotion hidden state, followed by a softmax:

$$\hat y^e_i = \mathrm{softmax}(W_e h^e_i + b_e)$$

When predicting the LC-label in E-net: if the current clause is predicted not to be an emotion clause, the corresponding LC-label is a zero vector; otherwise, LC-labels are predicted for all clauses within the local context window.

The LPS first computes the emotion attention ratio $\lambda_j$ of each clause:

$$
\gamma(h^e_i, h^c_j) = h^e_i h^c_j \\
\lambda_j = \frac{\exp(\gamma(h^e_i, h^c_j))}{\sum^{i+k}_{j=i-k} \exp(\gamma(h^e_i, h^c_j))}
$$

Here, $\gamma(h^e_i, h^c_j)$ is the emotion attention function measuring the correlation between a local cause and the target emotion, obtained by multiplying the emotion hidden state with the cause hidden state; a softmax over the window yields the emotion attention ratio $\lambda_j$, which is used to scale the original hidden states:

$$q^{lc}_j = \lambda_j \cdot h^c_j$$

where $q^{lc}_j$ is the scaled cause hidden state of the j-th local context clause.

The authors also use a local Bi-LSTM layer to learn the contextualized representation of each local context clause:

$$
\overrightarrow{o_j} = \overrightarrow{\mathrm{LSTM}_{lc}}(q^{lc}_j), \quad j \in [i-k, i+k] \\
\overleftarrow{o_j} = \overleftarrow{\mathrm{LSTM}_{lc}}(q^{lc}_j), \quad j \in [i-k, i+k]
$$

Finally, $\overrightarrow{o_j}$ and $\overleftarrow{o_j}$ are concatenated into $o_j$, from which the LC-label $\hat y^{lc}_j$ of the local context clause at position j is predicted:

$$\hat y^{lc}_j = \mathrm{softmax}(W_{lc}\, o_j + b_{lc})$$
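Putting the LPS equations together, a minimal PyTorch sketch of the E-net side (class name, dimensions, and window size are assumptions; the C-net side is symmetric):

```python
import torch
import torch.nn as nn

class LocalPairSearcher(nn.Module):
    """E-net side LPS (a sketch; sizes and window size are assumptions)."""
    def __init__(self, hidden=200, k=1):
        super().__init__()
        self.k = k
        self.local_bilstm = nn.LSTM(hidden, hidden // 2, batch_first=True,
                                    bidirectional=True)
        self.W_lc = nn.Linear(hidden, 2)          # softmax classifier for LC-labels

    def forward(self, h_e_i, h_c_window):
        # h_e_i: (hidden,) emotion hidden state of the target clause i;
        # h_c_window: (2k+1, hidden) cause hidden states of clauses i-k..i+k.
        # (If clause i is predicted non-emotion, the LC-label is a zero vector
        # and this routine is skipped.)
        gamma = h_c_window @ h_e_i                # gamma(h_i^e, h_j^c) = h_i^e h_j^c
        lam = torch.softmax(gamma, dim=0)         # emotion attention ratios lambda_j
        q = lam.unsqueeze(-1) * h_c_window        # q_j^lc = lambda_j * h_j^c
        o, _ = self.local_bilstm(q.unsqueeze(0))  # contextualized o_j (fwd ++ bwd)
        y_lc = torch.softmax(self.W_lc(o.squeeze(0)), dim=-1)
        return y_lc                               # (2k+1, 2): \hat y_j^lc
```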

Model Training

E-net predicts the E-LC label and C-net predicts the C-LE label, so the loss function of SLSN is a weighted average of the two:

$$
L = \alpha L^{elc} + (1 - \alpha) L^{cle} \\
L^{elc} = \beta L^e + (1 - \beta) L^{lc} \\
L^{cle} = \beta L^c + (1 - \beta) L^{le}
$$

$L^e$, $L^{lc}$, $L^c$, and $L^{le}$ are the cross-entropy losses for predicting the E-label $\hat y^e_i$, LC-label $\hat y^{lc}_i$, C-label $\hat y^c_i$, and LE-label $\hat y^{le}_i$, respectively:

$$
L^e = -\frac{1}{n} \sum^n_{i=1} \eta\, y^e_i \log(\hat y^e_i) \\
L^{lc} = -\frac{1}{p^e(2k+1)} \sum^n_{i=1} I(\hat y^e_i = 1) \sum^{i+k}_{j=i-k} y^{lc}_j \log(\hat y^{lc}_j) \\
L^c = -\frac{1}{n} \sum^n_{i=1} \eta\, y^c_i \log(\hat y^c_i) \\
L^{le} = -\frac{1}{p^c(2k+1)} \sum^n_{i=1} I(\hat y^c_i = 1) \sum^{i+k}_{j=i-k} y^{le}_j \log(\hat y^{le}_j)
$$

where $I(\cdot)$ is an indicator function.
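A sketch of assembling the overall loss from the four terms (α and β are the paper's weighting hyperparameters; the default values and the use of η as a class weight are assumptions):

```python
import torch.nn.functional as F

def slsn_loss(logits_e, y_e, logits_c, y_c, L_lc, L_le,
              alpha=0.5, beta=0.5, eta=None):
    """L_lc / L_le are assumed precomputed window-level cross-entropy terms,
    accumulated only over clauses predicted as emotion / cause (the I(.))."""
    L_e = F.cross_entropy(logits_e, y_e, weight=eta)  # eta: class-weight tensor
    L_c = F.cross_entropy(logits_c, y_c, weight=eta)
    L_elc = beta * L_e + (1 - beta) * L_lc            # E-net loss
    L_cle = beta * L_c + (1 - beta) * L_le            # C-net loss
    return alpha * L_elc + (1 - alpha) * L_cle        # weighted average
```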

References

Rui Xia and Zixiang Ding. 2019. Emotion-cause pair extraction: A new task to emotion analysis in texts. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pages 1003–1012.
