Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification (20.10.13)

Article link: https://blog.csdn.net/fuchengguo666/article/details/109060140

Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

1. Abstract

The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text.
The only supervision required by JST model learning is domain-independent polarity word priors.
In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors.
We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset.
Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs better than or comparably to previous approaches.
Nevertheless, our approach is much simpler and does not require difficult parameter tuning.

2. Related Work

MAP, ME, ASO, SCL, SFA, SVM

3. Model: Joint Sentiment-Topic (JST) Model

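4. Polarity Words Extracted by JST

The JST model makes it possible to cluster different terms that share similar sentiment.
We can see that JST appears to better capture the sentiment-association distributions in the source and target domains.
These observations motivate us to explore the polarity-bearing topics extracted by JST for cross-domain sentiment classification, since grouping words from different domains that bear similar sentiment has the effect of overcoming the difference in data distribution between the two domains.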

5. Domain Adaptation using JST

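As discussed in the model section, the JST model directly models P(l|d), the probability of a sentiment label given a document, so document polarity can be classified accordingly.
Since JST model learning does not require document labels, the source-domain data can be augmented by adding the most confident pseudo-labeled documents from the target domain, as labeled by the JST model, following the algorithm given in the paper.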
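
The augmentation step above can be sketched roughly as follows. This is only an illustration, not the paper's algorithm: the input format (one dict of P(l|d) values per target document), the function names, and the confidence threshold of 0.9 are all assumptions made for the example.

```python
def select_pseudo_labeled(doc_label_probs, threshold=0.9):
    """Return (doc_index, label) pairs whose top label probability meets the threshold.

    doc_label_probs: one dict per target-domain document, mapping a sentiment
    label to P(l|d). This input format is hypothetical.
    """
    selected = []
    for idx, probs in enumerate(doc_label_probs):
        label, prob = max(probs.items(), key=lambda kv: kv[1])
        if prob >= threshold:  # threshold value is an assumption
            selected.append((idx, label))
    return selected


def augment_source(source_data, target_docs, doc_label_probs, threshold=0.9):
    """Append confidently pseudo-labeled target documents to the source training set."""
    augmented = list(source_data)
    for idx, label in select_pseudo_labeled(doc_label_probs, threshold):
        augmented.append((target_docs[idx], label))
    return augmented


# Toy usage with made-up documents and probabilities.
source = [("great plot", "pos"), ("dull acting", "neg")]
target = ["sturdy blender", "broke in a week", "it is a kettle"]
probs = [
    {"pos": 0.95, "neg": 0.05},
    {"pos": 0.08, "neg": 0.92},
    {"pos": 0.55, "neg": 0.45},  # not confident enough, so it is left out
]
augmented = augment_source(source, target, probs)
```

A classifier would then be trained on `augmented` instead of `source` alone. Only the two confidently labeled target documents are added here.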

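6. Experiments

The model is evaluated on two datasets:
-1) the movie review (MR) data
-2) the multi-domain sentiment (MDS) dataset.
-The movie review data consists of 1000 positive and 1000 negative movie reviews drawn from the IMDB movie archive, while the multi-domain sentiment dataset contains four different types of product reviews extracted from Amazon.com: books, DVDs, electronics, and kitchen appliances.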

6.1 Experimental Setup
6.2 Supervised Sentiment Classification
-We performed 5-fold cross validation for the performance evaluation of supervised sentiment classification. (Cross validation was covered in earlier notes.)
-We tested several classifiers, including Naive Bayes (NB) and support vector machines (SVMs) from WEKA, and maximum entropy (ME) from MALLET.
-The results show that ME outperforms NB and SVM on average.
-Therefore, we only report results from ME trained on document vectors, with each term weighted by its frequency.

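-When T is set to 5, there are 5 topic groups under each of the positive, negative, and neutral sentiment labels, giving 15 feature classes in total.
Before model training, the topics generated by the JST model for each document are simply added to its bag-of-words (BOW) feature representation. Figure 2 shows the classification results on the five different domains as the number of topics varies from 1 to 200.
It can be seen that classification accuracy is highest when the number of topics is set to 1. Increasing the number of topics lowers accuracy, but it stabilizes beyond 15 topics.
-In all domains, ME with JST feature augmentation still outperforms ME without feature augmentation (the baseline model).
-It is worth pointing out that the JST model with a single topic becomes the standard LDA model with only three sentiment topics.
-Nevertheless, we have proposed an effective way to incorporate domain-independent word polarity prior information into model learning.
-The JST model with word polarity priors incorporated performs significantly better than the LDA model without such prior information.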
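
The feature augmentation described above might look something like the sketch below. It assumes that a trained JST model supplies one (sentiment label, topic index) pair per token. That input format, the label names, and the `JST_<label>_<topic>` feature names are hypothetical and chosen only for illustration.

```python
from collections import Counter

SENTIMENT_LABELS = ("pos", "neg", "neu")  # three sentiment labels, as in the notes


def augment_features(tokens, topic_assignments):
    """Combine term-frequency BOW features with sentiment-topic features.

    tokens: list of words in one document.
    topic_assignments: list of (sentiment_label, topic_index) pairs, one per
    token, in a hypothetical output format of a trained JST model.
    """
    features = Counter(tokens)  # word features weighted by frequency
    for label, topic in topic_assignments:
        features[f"JST_{label}_{topic}"] += 1  # hypothetical feature name
    return dict(features)


def num_topic_features(num_topics):
    """Number of distinct sentiment-topic feature classes for T topics."""
    return len(SENTIMENT_LABELS) * num_topics


# Toy usage with made-up tokens and assignments.
tokens = ["great", "battery", "great"]
assignments = [("pos", 0), ("neu", 2), ("pos", 0)]
feats = augment_features(tokens, assignments)
```

With T = 5, `num_topic_features(5)` gives the 15 feature classes mentioned above. With T = 1 it gives 3, matching the single-topic case with three sentiment topics.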
6.3 Domain Adaptation
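-We conducted domain adaptation experiments on the MDS dataset, which contains four different domains: books (B), DVDs (D), electronics (E), and kitchen appliances (K). We randomly split each domain's data into a training set of 1600 instances and a test set of 400 instances. A classifier trained on the training set of one domain is tested on the test set of a different domain. We performed 5 random splits and report the results averaged over the 5 runs.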
-Comparison with Baseline Models
We compare our proposed approaches with two baseline models. The first one (denoted as “Base” in Table 3) is an ME classifier trained without adaptation.
Finally, we performed feature selection by selecting the top 2000 features according to the information gain criteria.
There are 12 cross-domain sentiment classification tasks in total.
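
As a rough illustration of selecting features by information gain, the sketch below treats each feature as a binary presence/absence indicator and ranks features by how much they reduce label entropy. The binary treatment, the function names, and the toy data are assumptions. The notes do not say how the paper computes the score.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, feature):
    """Information gain of a binary presence/absence feature w.r.t. the labels."""
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def top_k_features(docs, labels, k):
    """Rank all features by information gain and keep the k best."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda f: information_gain(docs, labels, f), reverse=True)
    return ranked[:k]


# Toy usage: each document is a set of features (made-up data).
docs = [{"great", "the"}, {"great", "film"}, {"dull", "the"}, {"dull", "film"}]
labels = ["pos", "pos", "neg", "neg"]
selected = top_k_features(docs, labels, k=2)
```

In this toy data "great" and "dull" perfectly separate the classes, while "the" and "film" carry no information. The notes use k = 2000 rather than 2.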
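-Table 3 shows the adaptation loss results, where the result for each domain and each method is averaged over all three possible adaptation tasks by varying the source domain. The adaptation loss is calculated with respect to the in-domain gold-standard classification result.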
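
The 12 tasks and the per-domain averaging can be made concrete with a small sketch. It assumes the adaptation loss is the in-domain accuracy minus the cross-domain accuracy. The notes only say the loss is computed relative to the in-domain gold standard, so that exact formula is an assumption. The accuracy numbers are made up for illustration and are not results from the paper.

```python
from itertools import permutations

DOMAINS = ("B", "D", "E", "K")  # books, DVDs, electronics, kitchen appliances

# Every ordered (source, target) pair with source != target: 4 * 3 = 12 tasks.
TASKS = list(permutations(DOMAINS, 2))


def average_adaptation_loss(in_domain_acc, cross_acc, target):
    """Mean of (in-domain accuracy - cross-domain accuracy) over the three sources.

    in_domain_acc: dict mapping a target domain to the accuracy of a classifier
    trained and tested on that domain.
    cross_acc: dict mapping (source, target) to the accuracy of a classifier
    trained on source and tested on target.
    The subtraction used here is an assumed definition of adaptation loss.
    """
    losses = [
        in_domain_acc[target] - cross_acc[(s, t)]
        for s, t in TASKS
        if t == target
    ]
    return sum(losses) / len(losses)


# Made-up accuracies (percent) for target domain K only.
in_domain = {"K": 90.0}
cross = {("B", "K"): 80.0, ("D", "K"): 82.0, ("E", "K"): 87.0}
loss_k = average_adaptation_loss(in_domain, cross, "K")
```

Here the three losses are 10, 8, and 3 points, so the averaged loss for K is 7 points under the assumed definition.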
Parameter Sensitivity
Comparison with Existing Approaches
-In Figure 3, we compare our proposed approach with two other domain adaptation algorithms for sentiment classification, SCL and SFA.
Our proposed JST-IG approach outperforms SCL on average and achieves results comparable to SFA.

7. Conclusion

In this paper, we have studied polarity-bearing topics generated from the JST model and shown that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance on both the movie review data and the multi-domain sentiment dataset.
Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach outperforms SCL and gives results similar to SFA. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.