Reading notes: "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge"

SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge

Abstract

Most existing pre-trained language representation models ignore the linguistic knowledge of text, even though such knowledge can facilitate language understanding in NLP tasks. To benefit the downstream tasks of sentiment analysis, we propose a novel language representation model called SentiLARE, which injects word-level linguistic knowledge, including part-of-speech (POS) tags and sentiment polarity (inferred from SentiWordNet), into the pre-trained model. First, we propose a context-aware sentiment attention mechanism to acquire the sentiment polarity of each word together with its POS tag by querying SentiWordNet. Then we design a new pre-training task, the label-aware masked language model, to build knowledge-aware language representations. Experiments show that SentiLARE achieves state-of-the-art performance on various sentiment analysis tasks.

Model

Our task is defined as follows: given a text sequence $X = (x_1, x_2, \cdots, x_n)$ of length $n$, our goal is to obtain a representation $H = (h_1, h_2, \cdots, h_n)^\top \in \mathbb{R}^{n \times d}$ of the whole sequence that captures both contextual information and linguistic knowledge, where $d$ denotes the dimension of the representation.

Figure 1 gives an overview of our model, which consists of two steps:
1) Acquiring the part-of-speech tag and the sentiment polarity for each word;
2) Conducting pre-training via the label-aware masked language model, which contains two pre-training sub-tasks, i.e., early fusion and late supervision.
Compared with existing BERT-style pre-trained models, this model enriches the input sequence with linguistic knowledge such as part-of-speech tags and sentiment polarity, and uses the label-aware masked language model to capture the relationship between the sentence-level language representation and word-level linguistic knowledge.

Linguistic Knowledge Acquisition

Input: a text sequence $X = (x_1, x_2, \cdots, x_n)$, where $x_i\ (1 \le i \le n)$ denotes a word in the vocabulary.
Stanford Log-Linear Part-of-Speech Tagger: obtain the part-of-speech tag $pos_i$ of each word $x_i$; for simplicity, only five tags are considered: verb ($v$), noun ($n$), adjective ($a$), adverb ($r$), and others ($o$).
SentiWordNet: querying with the pair $(x_i, pos_i)$ returns $m$ different senses,

  1. each of which contains a sense number, a positive/negative score, and a gloss: $(SN_i^{(j)}, P\mathrm{score}_i^{(j)}/N\mathrm{score}_i^{(j)}, G_i^{(j)})$, $1 \le j \le m$.

  2. ($SN_i^{(j)}$ denotes the rank of each sense, $P\mathrm{score}_i^{(j)}/N\mathrm{score}_i^{(j)}$ denote the positive/negative scores given by SentiWordNet, and $G_i^{(j)}$ is the gloss, i.e., the definition of each sense.)

  3. Inspired by SentiWordNet, we propose a context-aware attention mechanism that considers both the sense rank and the context-gloss similarity to determine the attention weight of each sense:
    $$\alpha_i^{(j)} = \mathrm{softmax}\left(\frac{1}{SN_i^{(j)}} \cdot \mathrm{sim}(X, G_i^{(j)})\right)$$

  • $\frac{1}{SN_i^{(j)}}$ approximates the effect of sense frequency, since a smaller sense rank indicates that the sense is used more frequently in natural language.
  • $\mathrm{sim}(X, G_i^{(j)})$ denotes the textual similarity between the context and the gloss of each sense, an important feature commonly used in unsupervised word sense disambiguation. To compute the similarity between $X$ and $G_i^{(j)}$, we encode them with Sentence-BERT (SBERT), which achieves state-of-the-art performance on semantic textual similarity tasks, and take the cosine similarity between the resulting vectors:
    $$\mathrm{sim}(X, G_i^{(j)}) = \cos(\mathrm{SBERT}(X), \mathrm{SBERT}(G_i^{(j)}))$$

Having obtained the attention weight of each sense, the sentiment score of each pair $(x_i, pos_i)$ is computed by simply weighting the scores of all the senses:
$$s(x_i, pos_i) = \sum_{j=1}^{m} \alpha_i^{(j)} \left(P\mathrm{score}_i^{(j)} - N\mathrm{score}_i^{(j)}\right)$$

Finally, the word-level sentiment polarity $polar_i$ for the pair $(x_i, pos_i)$ is assigned Positive/Negative/Neutral when $s(x_i, pos_i)$ is positive/negative/zero, respectively. Note that if no sense can be found for $(x_i, pos_i)$ in SentiWordNet, $polar_i$ is assigned Neutral.
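The scoring steps above can be sketched in plain Python. Everything here is illustrative: the sense table for ("good", adjective) is made up, and the context-gloss similarities are fixed constants standing in for SBERT cosine scores (in practice the ranks and scores come from SentiWordNet, e.g. via NLTK's `sentiwordnet` corpus reader, and the similarities from SBERT).

```python
import math

def attention_weights(ranks, sims):
    """Context-aware attention: softmax over (1 / SN) * sim across senses."""
    logits = [s / r for r, s in zip(ranks, sims)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for stability
    z = sum(exps)
    return [e / z for e in exps]

def word_sentiment(senses, sims):
    """senses: list of (sense rank SN, Pscore, Nscore) for one (word, POS) pair.
    sims: context-gloss similarities, one per sense.
    Returns (score, polarity)."""
    if not senses:  # no sense found in SentiWordNet -> Neutral
        return 0.0, "Neutral"
    ranks = [sn for sn, _, _ in senses]
    alpha = attention_weights(ranks, sims)
    score = sum(a * (p - n) for a, (_, p, n) in zip(alpha, senses))
    if score > 0:
        return score, "Positive"
    if score < 0:
        return score, "Negative"
    return score, "Neutral"

# Hypothetical senses for ("good", a): (sense rank, Pscore, Nscore)
senses = [(1, 0.75, 0.0), (2, 0.5, 0.0), (3, 0.0, 0.25)]
sims = [0.6, 0.3, 0.1]  # assumed context-gloss cosine similarities
score, polar = word_sentiment(senses, sims)
print(polar)  # Positive: the highly-ranked, context-similar senses dominate
```

Note how the $1/SN$ factor downweights rare senses even before the similarity is taken into account, matching the intuition that lower ranks mean more frequent usage.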

Pre-training Task

The steps above yield the knowledge-enhanced text sequence $X_k = \{(x_i, pos_i, polar_i)\}_{i=1}^{n}$.

We design a novel supervised pre-training task called the label-aware masked language model (LA-MLM), which introduces the sentence-level sentiment label $l$ in the pre-training phase to capture the dependency between the sentence-level language representation and individual words. It contains two independent sub-tasks: early fusion and late supervision.

Early Fusion

$$(h_{cls}^{EF}, h_1^{EF}, \ldots, h_n^{EF}, h_{sep}^{EF}) = \mathrm{Transformer}(\hat{X}_k, l)$$
The input $\hat{X}_k$ combines:

  • the embeddings used in BERT
  • the part-of-speech (POS) embedding
  • the word-level polarity embedding

The model is required to predict the word, the POS tag, and the word-level polarity at the masked positions, respectively.
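A minimal sketch of how the early-fusion input could be assembled, assuming made-up embedding tables and a toy hidden size: each position sums a BERT-style token embedding with the POS and word-level polarity embeddings, and the sentence label $l$ contributes its own embedding (returned separately here; where exactly it is injected follows the paper's architecture, not this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy hidden size, not the paper's
POS_TAGS = ["v", "n", "a", "r", "o"]
POLARITIES = ["Positive", "Negative", "Neutral"]
LABELS = [0, 1]                          # sentence-level sentiment labels

# Hypothetical randomly initialized embedding tables
tok_emb = {w: rng.normal(size=d) for w in ["this", "movie", "is", "great"]}
pos_emb = {p: rng.normal(size=d) for p in POS_TAGS}
pol_emb = {p: rng.normal(size=d) for p in POLARITIES}
lab_emb = {l: rng.normal(size=d) for l in LABELS}

def early_fusion_input(x_k, label):
    """x_k: knowledge-enhanced sequence [(word, pos, polar), ...].
    Returns an (n, d) matrix summing the three word-level embeddings,
    plus the label embedding for the sentence-level sentiment label l."""
    rows = [tok_emb[w] + pos_emb[p] + pol_emb[s] for w, p, s in x_k]
    return np.stack(rows), lab_emb[label]

X_k = [("this", "o", "Neutral"), ("movie", "n", "Neutral"),
       ("is", "v", "Neutral"), ("great", "a", "Positive")]
H_in, l_vec = early_fusion_input(X_k, 1)
print(H_in.shape)  # (4, 8)
```

In a real run the masked positions would then be replaced by a [MASK] embedding before feeding the matrix to the Transformer, and three separate heads would predict the word, POS tag, and polarity there.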

Late Supervision

Based on the hidden states of [CLS] and the masked positions, the model predicts the sentence-level label and the word-level information:
$$(h_{cls}^{LS}, h_1^{LS}, \ldots, h_n^{LS}, h_{sep}^{LS}) = \mathrm{Transformer}(\hat{X}_k)$$
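In contrast to early fusion, the label here is a prediction target rather than an input. A toy sketch (random stand-in hidden states, hypothetical dimensions) of the two kinds of prediction heads:

```python
import numpy as np

rng = np.random.default_rng(1)
d, vocab_size, n_labels = 8, 20, 2       # made-up sizes for illustration

W_label = rng.normal(size=(d, n_labels))  # head on h_cls -> sentence label l
W_word = rng.normal(size=(d, vocab_size)) # head on a masked h_i -> word id

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h_cls = rng.normal(size=d)     # stand-ins for the Transformer hidden states
h_masked = rng.normal(size=d)

label_probs = softmax(h_cls @ W_label)    # supervises the sentence-level label
word_probs = softmax(h_masked @ W_word)   # recovers the masked word
print(label_probs.shape, word_probs.shape)  # (2,) (20,)
```

Analogous heads on the masked positions would also predict the POS tag and word-level polarity; training would minimize the sum of the cross-entropy losses of these predictions.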
