An Interactive Multi-Task Learning Network for End-to-End Aspect-Based Sentiment Analysis (20.11.09)

Abstract

  1. This task is usually done in a pipeline manner, with aspect term extraction performed first, followed by sentiment prediction toward the extracted aspect terms.
  2. While easier to develop, such an approach does not fully exploit joint information from the two sub-tasks and does not use all available sources of training information that might be helpful, such as document-level labeled sentiment corpora.
  3. In this paper, we propose an interactive multi-task learning network (IMN) which is able to jointly learn multiple related tasks simultaneously at both the token level and the document level.
  4. Unlike conventional multi-task learning methods that rely on learning common features for the different tasks, IMN introduces a message passing architecture where information is iteratively passed to the different tasks through a shared set of latent variables.
  5. Experimental results demonstrate superior performance of the proposed method against multiple baselines on three benchmark datasets.

3. Proposed Method

  • The IMN architecture is shown in Figure 1. It accepts a sequence of tokens {x_1, ..., x_n} as input to a feature extraction component f_θs that is shared among all tasks.
  • This component consists of a word embedding layer followed by a few feature extraction layers.
  • Specifically, we employ m^s layers of CNNs after the word embedding layer in f_θs.
  • The output of f_θs is a sequence of latent vectors {h_1^s, h_2^s, ..., h_n^s} shared among all the tasks.
  • After initialization by f_θs, this sequence of latent vectors is later updated by combining information propagated from the different task components through message passing.
  • We denote h_i^s(t) as the value of the shared latent vector corresponding to x_i after t rounds of message passing, with h_i^s(0) denoting the value after initialization.
  • The sequence of shared latent vectors {h_1^s, h_2^s, ..., h_n^s} is used as input to the different task-specific components. Each task-specific component has its own sets of latent and output variables. The output variables correspond to a label sequence in a sequence tagging task: in AE, we assign each token a label indicating whether it belongs to any aspect or opinion term, while in AS, we label each word with its sentiment.
  • In a classification task, the output corresponds to the label of the input instance: the sentiment of the document for the document-level sentiment classification task (DS), and the domain of the document for the document-level domain classification task (DD).
  • At each iteration, appropriate information is passed back to the shared latent vectors to be combined; this could be the values of the output variables or the latent variables, depending on the task. In addition, we also allow messages to be passed between the components in each iteration.
  • Specifically for this problem, we send information from the AE task to the AS task, as shown in Figure 1. After T iterations of message passing, which allows information to be propagated through multiple hops, we use the values of the output variables as predictions. For this problem, we only use the outputs of AE and AS during inference, as these are the end tasks, while the other tasks are only used for training. We now describe each component and how it is used in learning and inference. (A minimal sketch of the shared encoder is given after Figure 1 below.)

[Figure 1: the IMN architecture (image not recovered)]
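Below is a minimal PyTorch sketch of the shared feature extractor f_θs described above: a word embedding layer followed by m^s CNN layers that produce the shared latent vectors {h_1^s, ..., h_n^s}. Class and hyperparameter names (SharedEncoder, emb_dim, num_layers, etc.) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Sketch of the shared feature extractor f_theta_s:
    a word embedding layer followed by m^s CNN layers.
    All names and hyperparameters are illustrative assumptions."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=300, num_layers=2, kernel_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        convs = []
        in_dim = emb_dim
        for _ in range(num_layers):
            # 'same' padding keeps the sequence length unchanged
            convs.append(nn.Conv1d(in_dim, hidden_dim, kernel_size, padding=kernel_size // 2))
            in_dim = hidden_dim
        self.convs = nn.ModuleList(convs)

    def forward(self, token_ids):              # token_ids: (batch, n)
        x = self.embedding(token_ids)          # (batch, n, emb_dim) word embeddings
        h = x.transpose(1, 2)                  # (batch, emb_dim, n) for Conv1d
        for conv in self.convs:
            h = torch.relu(conv(h))
        h = h.transpose(1, 2)                  # (batch, n, hidden_dim): {h_1^s, ..., h_n^s}
        return x, h                            # word embeddings and shared latent vectors
```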

  • 3.1 Aspect-Level Tasks
    -1) AE aims to extract all the aspect and opinion terms appearing in a sentence, which is formulated as a sequence tagging problem with the BIO tagging scheme.
    -2) Specifically, we use five class labels: Y^ae = {BA, IA, BP, IP, O}, indicating the beginning of and inside of an aspect term, the beginning of and inside of an opinion term, and other words, respectively.
    -3) We also formulate AS as a sequence tagging problem with labels Y^as = {pos, neg, neu}, indicating token-level positive, negative, and neutral sentiment orientations.
    -4) Table 1 shows an example of an aspect-level training instance with gold AE and AS labels. In aspect-level datasets, only aspect terms are annotated with sentiment. Thus, when modeling AS as a sequence tagging problem, we label each token that is part of an aspect term with the sentiment label of the corresponding aspect term.
    -5) For example, as shown in Table 1, we label "fish" as pos, and label "variety", "of", "fish" as neg, based on the gold sentiment labels of the two aspect terms "fish" and "variety of fish" respectively. Since the other tokens do not have gold AS labels, we ignore the predictions on them when computing the training loss for AS.
    -6) The AE component f_θae is parameterized by θ^ae and outputs {ŷ_1^ae, ..., ŷ_n^ae}. The AS component f_θas is parameterized by θ^as and outputs {ŷ_1^as, ..., ŷ_n^as}. The AE and AS encoders consist of m^ae and m^as layers of CNNs respectively, and they map the shared representations to {h_1^ae, h_2^ae, ..., h_n^ae} and {h_1^as, h_2^as, ..., h_n^as} respectively. For the AS encoder, we employ an additional self-attention layer on top of the stacked CNNs. As shown in Figure 1, we make ŷ_i^ae, the outputs from AE, available to AS in the self-attention layer, as the sentiment task could benefit from knowing the predictions of opinion terms. Specifically, the self-attention matrix A ∈ R^{n×n} is computed as in Eq. (1).
    [Eq. (1): self-attention weight computation; the equation image is not recovered]
    -7) The first term in Eq. (1) indicates the semantic relevance between h_i^as and h_j^as with parameter matrix W^as, the second term is a distance-relevant factor, which decreases with increasing distance between the ith token and the jth token, and the third term P_j^op denotes the predicted probability that the jth token is part of any opinion term. The probability P_j^op can be computed by summing the predicted probabilities of the opinion-related labels BP and IP in ŷ_j^ae. In this way, AS is directly influenced by the predictions of AE. We set the diagonal elements of A to zero, as we only consider context words for inferring the sentiment of the target token. The self-attention layer outputs h_i'^as = Σ_{j=1}^{n} A_ij h_j^as. (A minimal sketch of this attention layer is given after this list.)
    -8) In AE, we concatenate the word embedding, the initial shared representation h_i^s(0), and the task-specific representation h_i^ae as the final representation of the ith token. In AS, we concatenate h_i^s(0) and h_i'^as as the final representation. For each task, we employ a fully-connected layer with softmax activation as the decoder, which maps the final token representation to the probability distribution ŷ_i^ae (ŷ_i^as).
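Since the image for Eq. (1) is not recovered, the sketch below implements one plausible reading of item -7): a bilinear semantic-relevance score scaled by a distance-decay factor, plus the opinion probability P_j^op, with the diagonal zeroed out and row-wise normalization. The exact functional form in the paper may differ; all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpinionAwareSelfAttention(nn.Module):
    """One plausible reading of Eq. (1): a bilinear relevance score combined
    with a distance-decay factor and the opinion probability P_j^op.
    The paper's exact form is not recovered here; this is an assumption."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_as = nn.Linear(hidden_dim, hidden_dim, bias=False)  # parameter matrix W^as

    def forward(self, h_as, p_op):
        # h_as: (batch, n, d) AS token representations; p_op: (batch, n) opinion probabilities
        n = h_as.size(1)
        relevance = torch.bmm(self.W_as(h_as), h_as.transpose(1, 2))   # (batch, n, n) semantic relevance
        pos = torch.arange(n, device=h_as.device, dtype=h_as.dtype)
        dist = (pos.unsqueeze(0) - pos.unsqueeze(1)).abs() + 1.0       # (n, n) distance factor, avoids /0
        score = relevance / dist.unsqueeze(0) + p_op.unsqueeze(1)      # combine the three terms
        # zero out the diagonal: only context words contribute to a token's sentiment
        mask = torch.eye(n, device=h_as.device).bool().unsqueeze(0)
        score = score.masked_fill(mask, float('-inf'))
        A = F.softmax(score, dim=-1)                                   # row-normalized attention matrix
        return torch.bmm(A, h_as)                                      # h_i'^as = sum_j A_ij h_j^as
```

Here p_op would be obtained from the AE output by summing the predicted probabilities of the BP and IP labels for each token, e.g. something like p_op = y_ae[..., BP_IDX] + y_ae[..., IP_IDX] with hypothetical label indices.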
  • 3.2 Document-Level Tasks
    -1) To address the issue of insufficient aspect-level training data, IMN is able to exploit knowledge from document-level labeled sentiment corpora, which are more readily available. We introduce two document-level classification tasks to be jointly trained with AE and AS. One is document-level sentiment classification (DS), which predicts the sentiment of an input document. The other is document-level domain classification (DD), which predicts the domain label of an input document.
    -2) As shown in Figure 1, the task-specific operation f_θo consists of m^o layers of CNNs that map the shared representations {h_1^s, ..., h_n^s} to {h_1^o, ..., h_n^o}, an attention layer att^o, and a decoding layer dec^o, where o ∈ {ds, dd} is the task symbol. The attention weight is computed as in Eq. (2). (A minimal sketch of this component is given after this list.)
    [Eq. (2): document-level attention weights; the equation image is not recovered]
    -3) where W^o is a parameter vector. The final document representation is computed as h^o = Σ_{i=1}^{n} a_i^o h_i^o. We employ a fully-connected layer with softmax activation as the decoding layer, which maps h^o to ŷ^o.
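A minimal sketch of a document-level component (o ∈ {ds, dd}): m^o CNN layers over the shared vectors, attention pooling into a document vector h^o, and a softmax decoder. Because the image for Eq. (2) is not recovered, the attention score is implemented here as a simple learned projection of each token representation; the paper's exact scoring function may differ, and all names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DocumentTaskHead(nn.Module):
    """Sketch of a document-level component (o in {ds, dd}):
    CNN layers over the shared vectors, attention pooling, softmax decoder.
    The attention scoring function is an assumption (Eq. (2) image missing)."""
    def __init__(self, hidden_dim, num_classes, num_layers=2, kernel_size=3):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size, padding=kernel_size // 2)
            for _ in range(num_layers)
        ])
        self.att = nn.Linear(hidden_dim, 1, bias=False)   # parameter vector W^o
        self.dec = nn.Linear(hidden_dim, num_classes)     # decoding layer dec^o

    def forward(self, h_s):                               # h_s: (batch, n, d) shared vectors
        h = h_s.transpose(1, 2)
        for conv in self.convs:
            h = torch.relu(conv(h))
        h = h.transpose(1, 2)                             # {h_1^o, ..., h_n^o}
        a = F.softmax(self.att(h).squeeze(-1), dim=-1)    # attention weights a_i^o over tokens
        doc = torch.bmm(a.unsqueeze(1), h).squeeze(1)     # h^o = sum_i a_i^o h_i^o
        return F.softmax(self.dec(doc), dim=-1), a        # prediction y^o and attention weights a^o
```

The attention weights a^o are returned as well because, as described in Section 3.3, they are fed back into the message passing update.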
  • 3.3 Message Passing Mechanism
    -1) To exploit interactions between different tasks, the message passing mechanism aggregates the predictions of the different tasks from the previous iteration, and uses this knowledge to update the shared latent vectors {h_1^s, ..., h_n^s} at the current iteration. Specifically, the message passing mechanism integrates the knowledge from ŷ_i^ae, ŷ_i^as, ŷ^ds, a_i^ds, and a_i^dd computed on an input {x_1, ..., x_n}, and the shared hidden vector h_i^s is updated as follows (reconstructed from the description below; the equation image is not recovered):
    [Eq. (3): h_i^s(t) = f_θre([h_i^s(t−1) : ŷ_i^ae(t−1) : ŷ_i^as(t−1) : ŷ^ds(t−1) : a_i^ds(t−1) : a_i^dd(t−1)])]
    -2) where t > 0 and [:] denotes the concatenation operation. We employ a fully-connected layer with ReLU activation as the re-encoding function f_θre. To update the shared representations, we incorporate ŷ_i^ae(t−1) and ŷ_i^as(t−1), the outputs of AE and AS from the previous iteration, so that this information is available to both tasks in the current round of computation. (A minimal sketch of this update is given after this list.)
    -3) We also incorporate information from DS and DD. ŷ^ds indicates the overall sentiment of the input sequence, which could be helpful for AS. The attention weights a_i^ds and a_i^dd generated by DS and DD reflect how sentiment-relevant and how domain-relevant the ith token is, respectively. A token that is more sentiment-relevant or domain-relevant is more likely to be an opinion word or aspect word. This information is useful for the aspect-level tasks.
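A minimal sketch of the re-encoding step in Eq. (3), assuming the shared vector is concatenated with the previous-iteration AE/AS label distributions, the DS prediction, and the DS/DD attention weights, then passed through a fully-connected layer with ReLU. Label-set sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MessagePassingUpdate(nn.Module):
    """Sketch of Eq. (3): re-encode each shared vector h_i^s from the
    concatenation of the previous shared vector and previous-iteration
    predictions/attention weights, via a fully-connected layer + ReLU."""
    def __init__(self, hidden_dim, n_ae_labels=5, n_as_labels=3, n_ds_labels=3):
        super().__init__()
        in_dim = hidden_dim + n_ae_labels + n_as_labels + n_ds_labels + 2  # +2 for a_i^ds, a_i^dd
        self.f_re = nn.Linear(in_dim, hidden_dim)

    def forward(self, h_s, y_ae, y_as, y_ds, a_ds, a_dd):
        # h_s: (batch, n, d); y_ae: (batch, n, 5); y_as: (batch, n, 3)
        # y_ds: (batch, 3) document sentiment; a_ds, a_dd: (batch, n) attention weights
        n = h_s.size(1)
        y_ds_tok = y_ds.unsqueeze(1).expand(-1, n, -1)        # broadcast the document prediction to tokens
        msg = torch.cat(
            [h_s, y_ae, y_as, y_ds_tok, a_ds.unsqueeze(-1), a_dd.unsqueeze(-1)], dim=-1
        )
        return torch.relu(self.f_re(msg))                      # h_i^s(t)
```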
  • 3.4 Learning
    -1) Instances for aspect-level problems only have aspect-level labels, while instances for document-level problems only have document labels. IMN is trained on aspect-level and document-level instances alternately.
    -2) When trained on aspect-level instances, the loss function is as follows:
    [Eq. (4): aspect-level loss L_a; the equation image is not recovered]
    -3) where T denotes the maximum number of iterations in the message passing mechanism, N_a denotes the total number of aspect-level training instances, n_i denotes the number of tokens contained in the ith training instance, and y_{i,j}^ae (y_{i,j}^as) denotes the one-hot encoding of the gold label for AE (AS).
    -4) l is the cross-entropy loss applied to each token. In aspect-level datasets, only aspect terms have sentiment annotations. We label each token that is part of any aspect term with the sentiment of the corresponding aspect term. During model training, we only consider the AS predictions on these aspect-term-related tokens for computing the AS loss, and ignore the sentiments predicted on other tokens.
    -5) When trained on document-level instances, we minimize the following loss:
    [Eq. (5): document-level loss L_d; the equation image is not recovered]
    -6) where N_ds and N_dd denote the number of training instances for DS and DD respectively, and y_i^ds and y_i^dd denote the one-hot encoding of the gold label. Message passing iterations are not used when training on document-level instances.
    -7) For learning, we first pretrain the network on the document-level instances (minimizing L_d) for a few epochs, so that DS and DD can make reasonable predictions. Then the network is trained on aspect-level instances and document-level instances alternately with ratio r, to minimize L_a and L_d. (A minimal sketch of this schedule is given after this list.)
    -8) The overall training process is given in Algorithm 1. D_a, D_ds, and D_dd denote the aspect-level training set and the training sets for DS and DD respectively. D_ds and D_a are from similar domains. D_dd contains review documents from at least two domains, with y_i^dd denoting the domain label, where one of the domains is similar to the domains of D_a and D_ds.
    -9) In this way, linguistic knowledge can be transferred from DS and DD to AE and AS, as they are semantically relevant. We fix θ_ds and θ_dd when updating the parameters for L_a, since we do not want them to be affected by the small number of aspect-level training instances.
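The sketch below illustrates two points from this subsection: the masked AS loss (computed only on aspect-term tokens) and the alternating training schedule (pretrain on document-level data, then alternate aspect-level and document-level updates with ratio r). model.aspect_loss and model.doc_loss are hypothetical helpers standing in for the forward passes that compute L_a and L_d; this is not the authors' Algorithm 1, and the per-iteration terms of L_a and the freezing of θ_ds/θ_dd are omitted.

```python
import torch
import torch.nn.functional as F

def as_token_loss(as_logits, as_gold, aspect_mask):
    """Cross-entropy for AS computed only on tokens that belong to an aspect term.
    as_logits: (n, 3); as_gold: (n,) class indices; aspect_mask: (n,) bool."""
    if aspect_mask.sum() == 0:
        return as_logits.new_zeros(())
    return F.cross_entropy(as_logits[aspect_mask], as_gold[aspect_mask])

def train_imn(model, aspect_loader, doc_loader, optimizer,
              pretrain_epochs=2, ratio_r=1, epochs=10):
    """Alternating schedule sketch: pretrain on document-level data (L_d),
    then alternate one aspect-level batch with `ratio_r` document-level batches.
    `model.aspect_loss` / `model.doc_loss` are hypothetical helpers returning scalar losses."""
    for _ in range(pretrain_epochs):
        for doc_batch in doc_loader:
            optimizer.zero_grad()
            model.doc_loss(doc_batch).backward()        # minimize L_d
            optimizer.step()

    doc_iter = iter(doc_loader)
    for _ in range(epochs):
        for aspect_batch in aspect_loader:
            optimizer.zero_grad()
            # minimize L_a (the paper keeps theta_ds, theta_dd fixed here; omitted in this sketch)
            model.aspect_loss(aspect_batch).backward()
            optimizer.step()
            for _ in range(ratio_r):                    # interleave document-level updates
                try:
                    doc_batch = next(doc_iter)
                except StopIteration:
                    doc_iter = iter(doc_loader)
                    doc_batch = next(doc_iter)
                optimizer.zero_grad()
                model.doc_loss(doc_batch).backward()
                optimizer.step()
```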

4. Experiments

  • 4.1 Experimental Settings
  1. We run experiments on three benchmark datasets, taken from SemEval 2014 (Pontiki et al., 2014) and SemEval 2015 (Pontiki et al., 2015).
  2. We use two document-level datasets from He et al. (2018b). One is from the Yelp restaurant domain, and the other is from the Amazon electronics domain.
  3. We use the concatenation of the two datasets with domain labels as D_dd.
  4. We use the Yelp dataset as D_ds when D_a is either D1 or D3, and use the electronics dataset as D_ds when D_a is D2.

  • 4.3 Results and Analysis
    -. Table 3 shows the comparison results. Note that IMN performs co-extraction of aspect and opinion terms in AE, which utilizes additional opinion term labels during training, while the baseline methods except CMLA do not consider this information in their original models.
    -. From Table 3, we observe that IMN−d is able to significantly outperform the other baselines on F1-I. IMN further boosts the performance and outperforms the best F1-I results from the baselines by 2.29%, 1.77%, and 2.61% on D1, D2, and D3 respectively.
    -. Specifically, for AE (F1-a and F1-o), IMN−d performs the best in most cases. For AS (acc-s and F1-s), IMN outperforms the other methods by large margins. PIPELINE, IMN−d, and the pipeline methods with dTrans also perform reasonably well on this task, outperforming the other baselines by moderate margins.
    -. All these models utilize knowledge from larger corpora by either joint training of document-level tasks or using domain-specific embeddings. This suggests that domain-specific knowledge is very helpful, and both joint training and domain-specific embeddings are effective ways to transfer such knowledge.
    -. We also show the results of IMN−d and IMN when only the general-purpose embeddings (without domain-specific embeddings) are used for initialization.
    -. They are denoted as IMN−d wo DE and IMN wo DE. IMN wo DE performs only marginally below IMN. This indicates that the knowledge captured by domain-specific embeddings could be similar to that captured by joint training of the document-level tasks.
    -. IMN−d is more affected without domain-specific embeddings, while it still outperforms all other baselines except DECNN-dTrans. DECNN-dTrans is a very strong baseline as it exploits additional knowledge from larger corpora for both tasks. IMN−d wo DE is competitive with DECNN-dTrans even without utilizing additional knowledge, which suggests the effectiveness of the proposed network structure.
  • Ablation study
    -. To investigate the impact of the different components, we start with a vanilla model which consists of f_θs, f_θae, and f_θas only, without any informative message passing, and add the other components one at a time.
    -. Table 4 shows the results of the different model variants.

  • +Opinion transmission denotes the operation of providing the additional information P_j^op to the self-attention layer, as shown in Eq. (1).

  • +Message passing-a denotes propagating the outputs from the aspect-level tasks only at each message passing iteration.

  • +DS and +DD denote adding DS and DD with parameter sharing only.

  • +Message passing-d denotes involving the document-level information in message passing.

  • We observe that +Message passing-a and +Message passing-d contribute the most to the performance gains, which demonstrates the effectiveness of the proposed message passing mechanism.

  • We also observe that simply adding the document-level tasks (+DS/DD) with parameter sharing only marginally improves the performance of IMN−d.

  • This again indicates that domain-specific knowledge has already been captured by the domain embeddings, while knowledge obtained from DD and DS via parameter sharing could be redundant in this case.

  • However, +Message passing-d is still helpful, with considerable performance gains, showing that the aspect-level tasks can benefit from knowing the predictions of the relevant document-level tasks.

  • Impact of T
    We have demonstrated the effectiveness of the message passing mechanism. Here, we investigate the impact of the maximum number of iterations T. Table 6 shows the change of F1-I on the test sets as T increases. We find that convergence is quickly achieved within two or three iterations, and further iterations do not provide considerable performance improvement.

  • Case study
    -. To better understand in which conditions the proposed method helps, we examine the instances that are misclassified by PIPELINE and INABSA but correctly classified by IMN.
    -. For aspect extraction, we find the message passing mechanism is particularly helpful in two scenarios.

  • First, it helps to better recognize uncommon aspect terms by utilizing information from the opinion contexts. As shown in example 1 in Table 5, PIPELINE and INABSA fail to recognize "build", as it is an uncommon aspect term in the training set, while IMN is able to correctly recognize it.

  • We find that when no message passing iteration is performed, IMN also fails to recognize "build".

  • However, when we analyze the predicted sentiment distribution on each token in the sentence, we find that apart from "durability", only "build" has a strong positive sentiment, while the sentiment distributions on the other tokens are more uniform. This is an indicator that "build" is also an aspect term. IMN is able to aggregate such knowledge with the message passing mechanism, so that it is able to correctly recognize "build" in later iterations.

  • For the same reason, the message passing mechanism also helps to avoid extracting terms on which no opinion is expressed. As observed in example 2, both PIPELINE and INABSA extract "Pizza". However, since no opinion is expressed in the given sentence, "Pizza" should not be considered an aspect term. IMN avoids extracting this kind of term by aggregating knowledge from opinion prediction and sentiment prediction.

  • For aspect-level sentiment, since IMN is trained on larger document-level labeled corpora with balanced sentiment classes, in general it better captures the meaning of domain-specific opinion words (example 3), better captures the sentiments of complex expressions such as negation (example 4), and better recognizes the minor sentiment classes in the aspect-level datasets (negative and neutral in our case).

  • In addition, we find that the knowledge propagated by the document-level tasks through message passing is helpful.
    For example, the sentiment-relevant attention weights help to recognize uncommon opinion words, which further helps to correctly predict the sentiments of the aspect terms. As observed in example 5, PIPELINE and INABSA are unable to recognize "scratches easily" as the opinion term, and they also make a wrong sentiment prediction on the aspect term "aluminum".
    IMN learns that "scratches" is sentiment-relevant from the sentiment-relevant attention weights aggregated via previous iterations of message passing, and is thus able to extract "scratches easily". Since the opinion predictions from AE are sent to the self-attention layer in the AS component, the correct opinion predictions further help to infer the correct sentiment towards "aluminum".

5. Conclusion

  • We propose an interactive multi-task learning network, IMN, for jointly learning aspect and opinion term co-extraction and aspect-level sentiment classification.

  • The proposed IMN introduces a novel message passing mechanism that allows informative interactions between tasks, enabling their correlations to be better exploited. In addition, IMN is able to learn from multiple training data sources, allowing the fine-grained token-level tasks to benefit from document-level labeled corpora. The proposed architecture can potentially be applied to similar tasks such as relation extraction, semantic role labeling, etc.

