A Symmetric Local Search Network for Emotion-Cause Pair Extraction
Abstract
Problem to solve: Emotion-cause pair extraction (ECPE)
Existing approach: a two-step method (Xia & Ding, 2019)
Drawback of Xia & Ding (2019): the correlation between emotion clauses and cause clauses is not considered
To tackle this task, a two-step method was proposed by a previous study, which first extracted emotion clauses and cause clauses individually, then paired the emotion and cause clauses, and filtered out the pairs without causality.
Core of the authors' work: local search
Symmetric Local Search Network (SLSN): perform the detection and matching simultaneously by local search
The two symmetric subnetworks of SLSN:
- the emotion subnetwork
- the cause subnetwork
Components of each subnetwork:
- a clause representation learner
- a local pair searcher (LPS): a specially-designed cross-subnetwork component
SLSN consists of two symmetric subnetworks, namely the emotion subnetwork and the cause subnetwork.
Each subnetwork is composed of a clause representation learner and a local pair searcher. The local pair searcher is a specially-designed cross-subnetwork component which can extract the local emotion-cause pairs.
Introduction
The authors argue that, to mimic how humans behave, the detection and matching of emotions and causes should be considered simultaneously.
However, when humans deal with the ECPE task, they usually consider the detection and matching problems at the same time.
Advantage of local search: emotion-cause pairs that are far apart can be ignored.
The advantage of local search is that the wrong pairs (e.g., (c4, c12)) beyond the local context scope can be avoided. Additionally, when local searching the cause clause corresponding to the target emotion clause, humans not only judge whether the clause is a cause clause, but also consider whether it matches the target emotion clause.
Specifically, the LPS introduces a local context window to limit the scope of context for local search.
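To make the window concrete, here is a minimal sketch (the function `window_indices` and the clamping at document boundaries are my own illustration, not from the paper):

```python
def window_indices(i: int, k: int, n: int) -> list:
    """Clause indices inside the local context window [i-k, i+k],
    clamped to the valid range [0, n-1] for an n-clause document."""
    return [j for j in range(i - k, i + k + 1) if 0 <= j < n]

# For a 12-clause document with window size k = 2, clause c12 (index 11)
# never enters the window of clause c4 (index 3):
print(window_indices(3, 2, 12))  # [1, 2, 3, 4, 5]
```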
Symmetric Local Search Network
Task Definition
Each dataset $D$ contains multiple documents $d = [c_1, c_2, \cdots, c_n]$.
- Let $c^e$ denote an emotion clause
- Let $c^c$ denote a cause clause
Goal: extract all emotion-cause pairs

$$P = \{\cdots, (c^e, c^c), \cdots\}$$
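As a toy illustration of the task (a hypothetical document, not an example from the paper):

```python
# Hypothetical 5-clause document; indices stand in for clause texts.
d = ["c1", "c2", "c3", "c4", "c5"]

# Suppose c4 is an emotion clause caused by c3 and by c4 itself
# (a clause may be its own cause); the target output is then:
P = {("c4", "c3"), ("c4", "c4")}
```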
An Overview of SLSN
- input - a sequence of clauses from a document
- output - the local pair labels for these clauses
For each clause $c_i$, SLSN predicts two kinds of labels:
- E-LC label $\hat y^{elc}_i$:
  - the emotion label (E-label) $\hat y^e_i$ of the i-th clause
  - the local cause labels (LC-label) $(\hat y^c_{i-1}, \hat y^c_i, \hat y^c_{i+1})$ of the clauses near the i-th clause
- C-LE label $\hat y^{cle}_i$:
  - the cause label (C-label) $\hat y^c_i$ of the i-th clause
  - the local emotion labels (LE-label) $(\hat y^e_{i-1}, \hat y^e_i, \hat y^e_{i+1})$ of the clauses near the i-th clause
Let $P_{elc}$ be the E-C pair set derived from $\hat y^{elc}_i$ and $P_{cle}$ the E-C pair set derived from $\hat y^{cle}_i$; the final E-C pair set is $P_{elc} \cup P_{cle}$.
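A minimal decoding sketch showing how the two label types could be merged into the final pair set (function name and label layout are my assumptions):

```python
def decode_pairs(e_labels, lc_labels, c_labels, le_labels, k):
    """Merge E-LC and C-LE predictions into the final pair set.

    e_labels[i] / c_labels[i]: predicted E-/C-label of clause i (0 or 1).
    lc_labels[i] / le_labels[i]: lists of 2k+1 labels for the window
    [i-k, i+k] around clause i. Pairs are (emotion_idx, cause_idx).
    """
    n = len(e_labels)
    p_elc, p_cle = set(), set()
    for i in range(n):
        for offset, y in enumerate(lc_labels[i], start=-k):
            j = i + offset
            if e_labels[i] == 1 and y == 1 and 0 <= j < n:
                p_elc.add((i, j))          # emotion i with local cause j
        for offset, y in enumerate(le_labels[i], start=-k):
            j = i + offset
            if c_labels[i] == 1 and y == 1 and 0 <= j < n:
                p_cle.add((j, i))          # local emotion j with cause i
    return p_elc | p_cle                   # P_elc ∪ P_cle
```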
Components of SLSN
The two subnetworks of SLSN:
- the emotion subnetwork (E-net)
- for the E-LC label prediction
- the cause subnetwork (C-net)
- for the C-LE label prediction
E-net and C-net have similar structures in terms of word embedding, clause encoder, and hidden state learning.
Word Embedding
Input: $d = [c_1, c_2, \cdots, c_n]$ with $c_i = [w^1_i, w^2_i, \cdots, w^{l_i}_i]$ (clause $c_i$ contains $l_i$ words)
Output: $v_i = [v^1_i, v^2_i, \cdots, v^{l_i}_i]$
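A minimal PyTorch sketch of this lookup (vocabulary size, dimension, and tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

V, d_w = 10000, 200               # illustrative vocabulary size / embedding dim
embedding = nn.Embedding(V, d_w)  # could be initialized with pretrained vectors

word_ids = torch.tensor([[3, 17, 256, 9]])  # one clause c_i with l_i = 4 words
v_i = embedding(word_ids)                   # shape: (1, 4, d_w)
```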
Clause Encoder
Structure: word-level Bi-LSTM & attention
Purpose: learn the representation of clauses
[Taking E-net as an example]
Bi-LSTM input: $v_i = [v^1_i, v^2_i, \cdots, v^{l_i}_i]$
Bi-LSTM output & attention input: $r_i = [r^1_i, r^2_i, \cdots, r^{l_i}_i]$
The attention layer maps $r_i$ to $s^e_i$ and aggregates the word representations:
$$u^j_i = \tanh(W_w r^j_i + b_w)$$
$$a^j_i = \frac{\exp((u^j_i)^T u_s)}{\sum_t \exp((u^t_i)^T u_s)}$$
$$s^e_i = \sum_j a^j_i r^j_i$$
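A PyTorch sketch of this attention step (class name, shapes, and parameter initialization are my assumptions):

```python
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Aggregate word-level Bi-LSTM states r_i into a clause vector s_i."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)     # W_w, b_w
        self.u_s = nn.Parameter(torch.randn(hidden_dim))  # context vector u_s

    def forward(self, r: torch.Tensor) -> torch.Tensor:
        # r: (batch, l_i, hidden_dim) word states of one clause
        u = torch.tanh(self.proj(r))                # u_i^j
        scores = u @ self.u_s                       # (u_i^j)^T u_s
        a = torch.softmax(scores, dim=1)            # a_i^j over words
        return (a.unsqueeze(-1) * r).sum(dim=1)     # s_i = Σ_j a_i^j r_i^j
```

In the C-net, the same computation with separate parameters would produce $s^c_i$.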
Hidden State Learning
A clause-level Bi-LSTM is used.
[Taking E-net as an example]
Input: $[s^e_1, s^e_2, \cdots, s^e_n]$
Output: $[h^e_1, h^e_2, \cdots, h^e_n]$
Similarly, C-net outputs $[h^c_1, h^c_2, \cdots, h^c_n]$.
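A matching sketch of this step (dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Clause-level Bi-LSTM; dimensions are illustrative assumptions.
clause_lstm = nn.LSTM(input_size=200, hidden_size=100,
                      bidirectional=True, batch_first=True)

s = torch.randn(1, 8, 200)  # [s_1^e, ..., s_n^e] for a document of n = 8 clauses
h, _ = clause_lstm(s)       # h: (1, 8, 200), i.e. [h_1^e, ..., h_n^e]
```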
Local Pair Searcher
Symmetric structures are built in E-net and C-net to predict the local pair labels of each clause.
[Taking E-net as an example]
When predicting the E-label, E-net uses only the clause's emotion hidden state, followed by a softmax:
$$\hat y^e_i = \mathrm{softmax}(W_e h^e_i + b_e)$$
When E-net predicts the LC-label: if the current clause is predicted not to be an emotion clause, the corresponding LC-label is a zero vector; otherwise, LC-labels are predicted for all clauses within the local context window.
The LPS first computes the emotion attention ratio $\lambda_j$ of each clause:
$$\gamma(h^e_i, h^c_j) = h^e_i h^c_j$$
$$\lambda_j = \frac{\exp(\gamma(h^e_i, h^c_j))}{\sum^{i+k}_{j=i-k} \exp(\gamma(h^e_i, h^c_j))}$$
Here, $\gamma(h^e_i, h^c_j)$ is the emotion attention function measuring the correlation between a local cause and the target emotion, obtained by multiplying the emotion hidden state with the cause hidden state; a softmax over the window yields the emotion attention ratio $\lambda_j$, which is then used to scale the original hidden states:
$$q^{lc}_j = \lambda_j \cdot h^c_j$$
where $q^{lc}_j$ is the scaled cause hidden state of the j-th local context clause.
The authors also use a local Bi-LSTM layer to learn a contextualized representation of each local context clause:
$$\overrightarrow{o_j} = \overrightarrow{\mathrm{LSTM}_{lc}}(q^{lc}_j), \quad j \in [i-k, i+k]$$
$$\overleftarrow{o_j} = \overleftarrow{\mathrm{LSTM}_{lc}}(q^{lc}_j), \quad j \in [i-k, i+k]$$
Finally, $\overrightarrow{o_j}$ and $\overleftarrow{o_j}$ are concatenated into $o_j$, which is used to predict the LC-label $\hat y^{lc}_j$ of the local context clause at position $j$:
$$\hat y^{lc}_j = \mathrm{softmax}(W_{lc}\, o_j + b_{lc})$$
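Putting the LPS steps together, here is a minimal sketch of the E-net side (module names, dimensions, and the omission of the zero-vector gating for non-emotion clauses are my assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class LocalPairSearcher(nn.Module):
    """E-net side of the LPS (sketch): predict the E-label of clause i
    and the LC-labels of clauses in the window [i-k, i+k]."""
    def __init__(self, hidden_dim: int, k: int):
        super().__init__()
        self.k = k
        self.emo_clf = nn.Linear(hidden_dim, 2)                # W_e, b_e
        self.local_lstm = nn.LSTM(hidden_dim, hidden_dim // 2,  # hidden_dim assumed even
                                  bidirectional=True, batch_first=True)
        self.lc_clf = nn.Linear(hidden_dim, 2)                 # W_lc, b_lc

    def forward(self, h_e, h_c, i):
        # h_e, h_c: (n, hidden_dim) emotion / cause hidden states of all clauses
        y_e = torch.softmax(self.emo_clf(h_e[i]), dim=-1)      # E-label of clause i
        # In the full model the LC-label is a zero vector when clause i is
        # predicted as non-emotion; that gating is omitted here for brevity.
        lo, hi = max(i - self.k, 0), min(i + self.k + 1, h_c.size(0))
        window = h_c[lo:hi]                                    # local context clauses
        gamma = window @ h_e[i]                                # γ(h_i^e, h_j^c)
        lam = torch.softmax(gamma, dim=0)                      # attention ratios λ_j
        q = lam.unsqueeze(-1) * window                         # q_j^lc = λ_j · h_j^c
        o, _ = self.local_lstm(q.unsqueeze(0))                 # local Bi-LSTM
        y_lc = torch.softmax(self.lc_clf(o.squeeze(0)), dim=-1)  # LC-labels ŷ_j^lc
        return y_e, y_lc
```

The C-net side mirrors this with the roles of $h^e$ and $h^c$ swapped, producing the C-label and LE-labels.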
Model Training
E-net predicts the E-LC label and C-net predicts the C-LE label, so the loss function of SLSN is a weighted combination of the two:
$$L = \alpha L^{elc} + (1 - \alpha) L^{cle}$$
$$L^{elc} = \beta L^e + (1 - \beta) L^{lc}$$
$$L^{cle} = \beta L^c + (1 - \beta) L^{le}$$
$L^e$, $L^{lc}$, $L^c$, and $L^{le}$ are the cross-entropy losses for predicting the E-label $\hat y^e_i$, the LC-label $\hat y^{lc}_i$, the C-label $\hat y^c_i$, and the LE-label $\hat y^{le}_i$, respectively:
$$L^e = -\frac{1}{n} \sum^n_{i=1} \eta\, y^e_i \log(\hat y^e_i)$$
$$L^{lc} = -\frac{1}{p^e(2k+1)} \sum^n_{i=1} I(\hat y^e_i = 1) \sum^{i+k}_{j=i-k} y^{lc}_j \log(\hat y^{lc}_j)$$
$$L^c = -\frac{1}{n} \sum^n_{i=1} \eta\, y^c_i \log(\hat y^c_i)$$
$$L^{le} = -\frac{1}{p^c(2k+1)} \sum^n_{i=1} I(\hat y^c_i = 1) \sum^{i+k}_{j=i-k} y^{le}_j \log(\hat y^{le}_j)$$
where $I(\cdot)$ is an indicator function.
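A sketch of how these losses could be combined (a minimal sketch; it assumes one-hot gold labels, probability outputs, a scalar class weight for $\eta$, and approximates the indicator $I(\cdot)$ with an argmax over predicted probabilities):

```python
import torch

def local_loss(y_local_hat, y_local, gate):
    """Cross-entropy over window labels, counted only for clauses whose
    gate (predicted E- or C-label) is 1, normalized by p · (2k+1)."""
    p = gate.sum().clamp(min=1)                        # p^e or p^c
    per_clause = -(y_local * torch.log(y_local_hat)).sum(dim=(1, 2))
    return (gate * per_clause).sum() / (p * y_local.size(1))

def slsn_loss(y_e_hat, y_e, y_lc_hat, y_lc,
              y_c_hat, y_c, y_le_hat, y_le,
              alpha=0.5, beta=0.5, eta=1.0):
    """Combined SLSN loss. *_hat: predicted probabilities; gold labels
    one-hot. Shapes: (n, 2) for E/C, (n, 2k+1, 2) for LC/LE.
    eta is simplified to a scalar weight here."""
    n = y_e.size(0)
    L_e = -(eta * y_e * torch.log(y_e_hat)).sum() / n      # E-label CE
    L_c = -(eta * y_c * torch.log(y_c_hat)).sum() / n      # C-label CE
    L_lc = local_loss(y_lc_hat, y_lc, y_e_hat.argmax(-1))  # I(ŷ_i^e = 1)
    L_le = local_loss(y_le_hat, y_le, y_c_hat.argmax(-1))  # I(ŷ_i^c = 1)
    L_elc = beta * L_e + (1 - beta) * L_lc
    L_cle = beta * L_c + (1 - beta) * L_le
    return alpha * L_elc + (1 - alpha) * L_cle
```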
References
Rui Xia and Zixiang Ding. 2019. Emotion-cause pair extraction: A new task to emotion analysis in texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1003–1012.