Aspect extraction for opinion mining with a deep convolutional neural network (2020.10.21)

Abstract

  • In this paper, we present the first deep learning approach to aspect extraction in opinion mining.
  • We used a 7-layer deep convolutional neural network to tag each word in opinionated sentences as either an aspect or a non-aspect word. We also developed a set of linguistic patterns for the same purpose and combined them with the neural network. The resulting ensemble classifier, coupled with a word-embedding model for sentiment analysis, allowed our approach to obtain significantly better accuracy than state-of-the-art methods.

1. Introduction

  • Opinion mining techniques can be used for the creation and automated upkeep of review and opinion aggregation websites.
  • In opinion mining, different levels of analysis granularity have been proposed, each one having its own advantages and drawbacks. Aspect-based opinion mining [4,5] focuses on the relations between aspects and document polarity.
  • There are two types of aspects defined in aspect-based opinion mining: explicit aspects and implicit aspects.
  • Explicit aspects are words in the opinionated document that explicitly denote the opinion target. For instance, opinion targets such as screen and resolution can be explicitly mentioned in the text. In contrast, an implicit aspect is a concept that represents the opinion target of an opinionated document but is not specified explicitly in the text. One can infer that the sentence "This camera is sleek and very affordable" implicitly contains a positive opinion of the aspects appearance and price of the entity camera. These same aspects would be explicit in an equivalent sentence: "The appearance of this camera is sleek and its price is very affordable."
  • Most of the previous works in aspect term extraction have either used conditional random fields (CRFs) [6,7] or linguistic patterns [4,8]. Both of these approaches have their own limitations: CRF is a linear model, so it needs a large number of features to work well; linguistic patterns need to be crafted by hand, and they crucially depend on the grammatical accuracy of the sentences.
  • In this paper, we overcome both limitations by using a convolutional neural network (CNN), a non-linear supervised classifier that can more easily fit the data. Previously, [9] used such a network to solve a range of tasks (not aspect extraction), on which it outperformed other state-of-the-art NLP methods. In addition, we use linguistic patterns to further improve the performance of the method, though in this case the above-mentioned issues inherent in linguistic patterns affect the framework.
  • This paper is the first one to introduce the application of a deep learning approach to the task of aspect extraction. Our experimental results show that a deep CNN is more efficient for aspect extraction than existing approaches. We also introduced specific linguistic patterns and combined the linguistic pattern approach with the deep learning approach for the aspect extraction task.

2. Related work

  • Aspect extraction for opinion mining was first studied by Hu and Liu [4]. They introduced the distinction between explicit and implicit aspects. However, the authors only dealt with explicit aspects and used a set of rules based on statistical observations.
  • Hu and Liu's method was later improved by Popescu and Etzioni [10] and by Blair-Goldensohn et al. [11]. Popescu and Etzioni [10] assumed the product class is known in advance; their algorithm detects whether a noun or noun phrase is a product feature by computing the pointwise mutual information between the noun phrase and the product class.
  • Scaffidi et al. [12] presented a method that uses a language model to identify product features. They assumed that product features are mentioned more frequently in product reviews than in general natural language text. However, their method seems to have low precision, since the retrieved aspects are affected by noise. Some methods treated aspect term extraction as sequence labeling and used CRFs for it; such methods have performed very well on the datasets, even in cross-domain experiments.
  • Topic modeling has been widely used as a basis to perform extraction and grouping of aspects [13,14]. Two models were considered: pLSA [15] and LDA [16]. Both models introduce a latent variable "topic" between the observable variables "document" and "word" to analyze the semantic topic distribution of documents. In topic models, each document is represented as a random mixture of latent topics, where each topic is characterized by a distribution over words. Such methods have been gaining popularity in social media analysis, e.g., for emergent political topic detection in Twitter [17]. The LDA model defines a Dirichlet probabilistic generative process for the document-topic distribution: in each document, a latent aspect is chosen according to a multinomial distribution controlled by a Dirichlet prior α; then, given an aspect, a word is drawn according to another multinomial distribution controlled by another Dirichlet prior β (a small illustrative sketch of this generative process is given at the end of this section).
    Among existing works employing these models are the extraction of global aspects (such as the brand of a product) and local aspects (such as the property of a product) [18], the extraction of key phrases [19], the rating of multi-aspects [20], and the summarization of aspects and sentiments [21]. [22] employed the maximum entropy method to train a switch variable based on the POS tags of words and used it to separate aspect and sentiment words.
  • McAuliffe and Blei [23] added user feedback to LDA as a response variable associated with each document. Lu and Zhai [24] proposed a semi-supervised model.
  • Poria et al. [26] integrated common-sense computing [27] in the calculation of word distributions in the LDA algorithm, thus enabling the shift from syntax to semantics in aspect-based sentiment analysis.
  • Wang et al. [28] proposed two semi-supervised models for product aspect extraction based on the use of seeding aspects. In the category of supervised methods, [29] employed seed words to guide topic models to learn topics of specific interest to a user, while [20] and [30] employed seeding words to extract related product aspects from product reviews.
  • On the other hand, recent approaches using deep CNNs [9,31] showed significant performance improvement over state-of-the-art methods on a range of natural language processing (NLP) tasks. Collobert et al. [9] fed word embeddings into a CNN to solve standard NLP problems such as named entity recognition (NER), part-of-speech (POS) tagging, and semantic role labeling.
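As a concrete illustration of the LDA generative process described above, here is a minimal NumPy sketch. The topic count K, vocabulary size V, and prior values are hypothetical choices for illustration; this shows only document generation, not the inference algorithms used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 5, 1000                          # number of topics and vocabulary size (illustrative)
alpha = np.full(K, 0.1)                 # Dirichlet prior over per-document topic mixtures
beta = np.full(V, 0.01)                 # Dirichlet prior over per-topic word distributions

# One word distribution per topic, each drawn from Dirichlet(beta).
phi = rng.dirichlet(beta, size=K)       # shape (K, V)

def generate_document(n_words):
    """Generate one document under the LDA generative process."""
    theta = rng.dirichlet(alpha)        # document-topic mixture ~ Dirichlet(alpha)
    z = rng.choice(K, size=n_words, p=theta)          # latent topic/aspect per token
    return [int(rng.choice(V, p=phi[k])) for k in z]  # word id per token

doc = generate_document(20)             # a toy 20-word "document" of word ids
```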

3. Some background on deep CNN

4. Training CNN for sequential data

We used a special training algorithm suitable for sequential data, proposed by Collobert et al. [9], which we briefly summarize here [34].

  • The algorithm trains the neural network by back-propagation in order to maximize the likelihood over training sentences. Consider the network parameter θ. Let $h_y$ be the output score for the likelihood of an input $x$ to have the tag $y$. Then the probability of assigning the label $y$ to $x$ is calculated as
    $$p(y \mid x, \theta) = \frac{e^{h_y}}{\sum_j e^{h_j}}.$$
    In aspect term extraction, the terms can be organized as chunks and are also often surrounded by opinion terms. Hence, it is important to consider the sentence structure as a whole in order to obtain additional clues.
    Suppose there are $T$ tokens in a sentence, $y$ is a tag sequence, and $h_{t,i}$ is the network score for the $t$-th token having the $i$-th tag. We introduce the transition score $A_{i,j}$ for moving from tag $i$ to tag $j$. Then the score of the sentence $s$ having the tag path $y$ is defined by
    $$s(x, y, \theta) = \sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + h_{t, y_t} \right). \quad (8)$$
    Normalizing this score over all possible tag paths yields the tag path probability. From (8) we can write the log-likelihood:
    $$\log p(y \mid x, \theta) = s(x, y, \theta) - \operatorname{logadd}_{\forall y'}\, s(x, y', \theta). \quad (9)$$
    The number of possible tag paths grows exponentially with sentence length; however, using dynamic programming techniques, one can compute in polynomial time the score for all paths that end in a given tag [9]. Let $\delta_t(k)$ denote the log-add score of all paths that end with tag $k$ at token $t$. Then, using recursion, we obtain
    $$\delta_t(k) = h_{t,k} + \operatorname{logadd}_i \left( \delta_{t-1}(i) + A_{i,k} \right). \quad (10)$$
  • For the sake of brevity, we shall not delve into the details of the recursive procedure, which can be found in [9]. The next equation gives the log-add over all the paths up to token $T$:
    $$\operatorname{logadd}_{\forall y'}\, s(x, y', \theta) = \operatorname{logadd}_k\, \delta_T(k). \quad (11)$$
  • Using these equations, we can maximize the log-likelihood (9), with its normalizer computed via (10) and (11), over all training pairs. For inference, we need to find the best tag path using the Viterbi algorithm, i.e., the tag path that maximizes the sentence score (8); this amounts to replacing the logadd in (10) with a max. A minimal sketch of these recursions follows.
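To make the recursions concrete, here is a minimal NumPy sketch of the sentence-level log-likelihood (8)-(11) and Viterbi decoding. The emission scores h (T×K, from the network) and the transition matrix A (K×K) are assumed inputs; this is an illustration of the scheme of [9], not the authors' code.

```python
import numpy as np

def log_add(a, axis=None):
    """Numerically stable log-sum-exp (the "logadd" of Eqs. (9)-(11))."""
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def sentence_log_likelihood(h, A, y):
    """Log-likelihood (9) of tag path y, given emission scores h (T x K, one row
    per token) and transition scores A (K x K). The transition from a start
    state to the first tag is omitted for simplicity."""
    T, K = h.shape
    # Score of the given path, Eq. (8).
    path_score = h[0, y[0]] + sum(A[y[t - 1], y[t]] + h[t, y[t]] for t in range(1, T))
    # Forward recursion (10): delta[k] = logadd score of all paths ending in tag k.
    delta = h[0].copy()
    for t in range(1, T):
        delta = h[t] + log_add(delta[:, None] + A, axis=0)
    return path_score - log_add(delta)    # normalizer is Eq. (11)

def viterbi_decode(h, A):
    """Best tag path for inference: the same recursion with logadd -> max."""
    T, K = h.shape
    delta, back = h[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + A       # scores[i, j]: best path via tag i -> tag j
        back[t] = scores.argmax(axis=0)
        delta = h[t] + scores.max(axis=0)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```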

5. Our network architecture

  • The features of an aspect term depend on its surrounding words. Thus, we used a window of 5 words around each word in a sentence, i.e., ±2 words. We formed the local features of that window and considered them to be features of the middle word. Then, the feature vector was fed to the CNN.
  • The network contained one input layer, two convolution layers, two max-pool layers, and a fully connected layer with softmax output. The first convolution layer consisted of 100 feature maps with filter size 2. The second convolution layer had 50 feature maps with filter size 3. The stride in each convolution layer was 1, as we wanted to tag each word. A max-pooling layer followed each convolution layer; the pool size we used in the max-pool layers was 2. We used regularization with dropout on the penultimate layer, with a constraint on the L2-norms of the weight vectors, training for 30 epochs. The output of each convolution layer was computed using a non-linear function; in our case we used the hyperbolic tangent.
  • As features, we used word embeddings trained on two different corpora. We also used some additional features and rules to boost the accuracy; see Section 7. The CNN produces local features around each word in a sentence and then combines these features into a global feature vector. Since the kernel size for the two convolution layers was different, the dimensionality Lx×Ly mentioned in Section 3 was 3×300 and 2×300, respectively. The input layer was 65×300, where 65 was the maximum number of words in a sentence and 300 the dimensionality of the word embeddings used for each word.
  • The process was performed for each word in a sentence. Unlike the traditional max-likelihood learning scheme, we trained the system using back-propagation after convolving all tokens in the sentence. Namely, we stored the weights, biases, and features for each token after convolution, and back-propagated the error to correct them only once all tokens were processed, using the training scheme from Section 4. A rough sketch of this architecture is given below.
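The following PyTorch sketch mirrors the architecture as described (input 65×300, conv layers with 100 maps of width 2 and 50 maps of width 3, max-pooling of size 2, tanh activations, dropout, fully connected softmax output). Where the text is silent (the dropout rate, how the fully connected layer maps onto per-word tags), the choices below are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AspectCNN(nn.Module):
    """Sketch of the described tagger: two Conv1d layers (100 maps, width 2;
    then 50 maps, width 3), each followed by max-pooling of size 2, tanh
    activations, dropout, and a fully connected layer producing one score
    per word per tag (B-A / I-A / O)."""

    def __init__(self, emb_dim=300, max_len=65, n_tags=3):
        super().__init__()
        self.conv1 = nn.Conv1d(emb_dim, 100, kernel_size=2, stride=1)
        self.conv2 = nn.Conv1d(100, 50, kernel_size=3, stride=1)
        self.pool = nn.MaxPool1d(2)
        self.drop = nn.Dropout(0.5)                      # rate not given in the text
        conv_out = (((max_len - 1) // 2) - 2) // 2 * 50  # 15 * 50 = 750 for max_len=65
        self.fc = nn.Linear(conv_out, max_len * n_tags)
        self.max_len, self.n_tags = max_len, n_tags

    def forward(self, x):                  # x: (batch, max_len, emb_dim)
        x = x.transpose(1, 2)              # Conv1d expects (batch, channels, length)
        x = self.pool(torch.tanh(self.conv1(x)))
        x = self.pool(torch.tanh(self.conv2(x)))
        x = self.drop(x.flatten(1))
        return self.fc(x).view(-1, self.max_len, self.n_tags)  # per-word tag scores

scores = AspectCNN()(torch.randn(4, 65, 300))   # -> (4, 65, 3), fed to a softmax
```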

6. Datasets used

  • 6.1. Word embeddings
  • 6.1.1. Google embeddings
    Mikolov et al. [35] presented two different neural network models for creating word embeddings (word2vec): CBOW and Skip-gram.
  • 6.1.2. Our Amazon embeddings
  • 6.2. Evaluation corpora
    For training and evaluation of the proposed approach, we used two corpora:
  • the aspect-based sentiment analysis dataset developed by Qiu et al. [37]; see Table 1, and
  • the SemEval 2014 dataset, which consists of training and test sets from two domains, Laptop and Restaurant; see Table 2.
    Annotations in both corpora were encoded according to IOB2, a widely used coding scheme for representing sequences. In this encoding, the first word of each chunk starts with a "B-Type" tag, "I-Type" marks the continuation of a chunk, and "O" is used to tag words that are not part of a chunk. In our case we are only interested in determining whether a word or chunk is an aspect, so we used only the tags "B-A", "I-A", and "O". An illustrative example of IOB2 tagging: "The/O battery/B-A life/I-A is/O great/O." A small conversion sketch follows.
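Here is a small sketch of how sentences with marked aspect chunks can be converted to these tags; the (start, end) span format is a hypothetical input representation chosen for illustration.

```python
def iob2_tags(tokens, aspect_spans):
    """Tag a tokenized sentence with B-A / I-A / O labels.
    `aspect_spans` is a list of (start, end) token-index pairs (end exclusive)
    marking aspect chunks -- a hypothetical input format for illustration."""
    tags = ["O"] * len(tokens)
    for start, end in aspect_spans:
        tags[start] = "B-A"                 # first word of the aspect chunk
        for i in range(start + 1, end):
            tags[i] = "I-A"                 # continuation of the chunk
    return tags

tokens = "The battery life of this camera is great".split()
print(list(zip(tokens, iob2_tags(tokens, [(1, 3)]))))
# [('The', 'O'), ('battery', 'B-A'), ('life', 'I-A'), ('of', 'O'), ...]
```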

7. Features and rules used

  • 7.1. Features
  1. Word embeddings (Section 6.1): each word was encoded as a 300-dimensional vector, which was fed to the network.
  2. Part-of-speech (POS) tags: most aspect terms are either nouns or noun chunks, so we used the POS tag of the word as an additional feature. We used 6 basic parts of speech (noun, verb, adjective, adverb, preposition, conjunction), encoded as a 6-dimensional binary vector. We used the Stanford Tagger as the POS tagger.

These two feature vectors were concatenated and fed to the CNN, so for each word the final feature vector is 306-dimensional, as sketched below.
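A minimal sketch of this feature construction, assuming the 300-d embedding comes from the lookup of Section 6.1 and the POS tag has already been mapped onto the six coarse classes:

```python
import numpy as np

POS_CLASSES = ["noun", "verb", "adjective", "adverb", "preposition", "conjunction"]

def word_features(embedding, pos_tag):
    """Concatenate a 300-d word embedding with a 6-d one-hot POS vector,
    yielding the 306-d per-word feature described above. `pos_tag` is assumed
    to be the Stanford Tagger output already mapped to the six coarse classes."""
    pos_vec = np.zeros(len(POS_CLASSES))
    if pos_tag in POS_CLASSES:
        pos_vec[POS_CLASSES.index(pos_tag)] = 1.0
    return np.concatenate([embedding, pos_vec])

features = word_features(np.random.randn(300), "noun")
assert features.shape == (306,)               # 300 + 6
```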

  • 7.2. Linguistic patterns

8. Experimental results


  • Table 1 shows that our method improves over the state-of-the-art methods of Popescu and Etzioni [10] and of dependency-based propagation [37] by 5%-10%. Paired t-tests showed that all our improvements are statistically significant at the 95% confidence level.

  • Table 4 shows the accuracy of our aspect term extraction framework in the laptop and restaurant domains. The framework gave better accuracy on restaurant-domain reviews, since the variety of aspect terms there is smaller than in the laptop domain. However, in both cases recall was lower than precision. Table 4 also shows the improvement in precision and recall when the POS feature is used.

  • The CNN suffered from low recall, i.e., it missed some valid aspect terms. Linguistic analysis of the syntactic structure of the sentences substantially helped to overcome some drawbacks of the machine learning-based analysis; see Table 5.

  • As to the linguistic patterns, the removal of stop-words, Rule 1, and Rule 3 were most beneficial. Fig. 1 shows a visualization of Table 5.

  • Table 6 and Fig. 2 show the comparison between the proposed method and the state of the art on the SemEval dataset.

  • As on the SemEval dataset, linguistic patterns together with the CNN increased the overall accuracy.

  • We also experimented on the aspect dataset originally developed by Qiu et al. This is to date the largest comprehensive aspect-based sentiment analysis dataset; Table 1, left part, shows its details.

  • The linguistic patterns performed better on this dataset than on the SemEval dataset:
    1) One of the possible reasons for this is that most of the sentences in this dataset are grammatically correct and contain only one aspect term.
    2) Here we combined the linguistic patterns and a CNN to achieve even better results than the approach of Qiu et al. [37], which is based only on linguistic patterns.
    3) Our experimental results showed that this ensemble algorithm (CNN+LP) can better understand the semantics of the text than the pure LP-based algorithm of [37], and thus extracts more salient aspect terms. Table 8 and Fig. 3 show the performance and comparison of the different frameworks.

  • We believe that there are two key reasons for our framework to outperform state-of-the-art approaches. First, a deep CNN, which is non-linear in nature, fits the data better than linear models such as CRF. Second, the pre-trained word embedding features help our framework to outperform state-of-the-art methods that do not use word embeddings. The main advantage of our framework is that it does not need any feature engineering, which minimizes development cost and time.

9. Conclusion

  • We have introduced the first deep learning-based approach to aspect extraction. As expected, this approach gave a significant improvement in performance over state-of-the-art approaches. We proposed a specific deep CNN architecture that comprises seven layers: the input layer, consisting of word embedding features for each word in the sentence; two convolution layers, each followed by a max-pooling layer; a fully connected layer; and, finally, the output layer, containing one neuron per word.
  • We also developed a set of heuristic linguistic patterns and integrated them with the deep learning classifier. In the future, we plan to extend and refine these patterns.