【SO-PMI】Sentiment classification of movie reviews using contextual valence shifters

Notes on Sentiment classification of movie reviews using contextual valence shifters [2006.11.2]

In this paper, the authors propose two methods for determining the sentiment expressed by a movie review. Rather than considering only the semantic polarity of individual terms, the first method extends term counting with valence shifters, drawing its term lists from the General Inquirer, CTRW, and a large Web corpus. As these existing resources were added, the authors found that extending the term-counting method with contextual valence shifters improves classification accuracy.
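
As a minimal sketch of the first method, the snippet below counts sentiment terms and lets valence shifters directly in front of a term adjust its weight. The tiny word lists, the base weight of 2, the +1/-1 adjustments, and the rule of only looking at shifters immediately preceding the term are all assumptions made for illustration; they are not taken from these notes.

```python
import re

# Tiny illustrative lexicons (hypothetical stand-ins for GI / CTRW entries).
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful"}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very", "deeply"}
DIMINISHERS = {"rather", "barely"}
SHIFTERS = NEGATIONS | INTENSIFIERS | DIMINISHERS

BASE_WEIGHT = 2  # assumed weight of an unshifted sentiment term


def score_review(text):
    """Sum signed term weights, adjusted by the shifters right before each term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            sign = 1
        elif token in NEGATIVE:
            sign = -1
        else:
            continue
        weight = BASE_WEIGHT
        # Walk backwards over the contiguous run of shifters before the term.
        j = i - 1
        while j >= 0 and tokens[j] in SHIFTERS:
            if tokens[j] in INTENSIFIERS:
                weight += 1      # intensifier strengthens the term
            elif tokens[j] in DIMINISHERS:
                weight -= 1      # diminisher weakens the term
            else:
                sign = -sign     # negation flips the polarity
            j -= 1
        total += sign * weight
    return total


def classify(text):
    """Label a review by the sign of its score (a zero score is left neutral here)."""
    score = score_review(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these assumed weights, "very good" scores higher than "good", and "not good" counts against the review instead of for it, which is the kind of degree information plain term counting misses.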

The second method uses an SVM with unigram features and then adds bigrams that consist of a valence shifter and another word. The results show that the SVM classifies far more accurately than the first method, but the valence-shifter bigrams improve performance only slightly.
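
The feature set for the second method can be sketched as follows: every unigram becomes a feature, and a bigram is added only when its first word is a valence shifter. The shifter list, the `first_second` feature naming, and the use of counts rather than presence flags are assumptions for illustration; the resulting dictionary could then be passed to any SVM implementation.

```python
import re
from collections import Counter

# Hypothetical shifter list; the paper's actual list comes from GI and CTRW.
VALENCE_SHIFTERS = {"not", "never", "very", "rather", "barely"}


def extract_features(text):
    """Return unigram counts plus counts of bigrams that start with a valence shifter."""
    tokens = re.findall(r"[a-z']+", text.lower())
    features = Counter(tokens)  # unigram features
    for first, second in zip(tokens, tokens[1:]):
        if first in VALENCE_SHIFTERS:
            features[f"{first}_{second}"] += 1  # shifter bigram feature
    return features
```

For example, `extract_features("not good, not very funny")` contains the bigram features `not_good`, `not_very`, and `very_funny` alongside the unigrams, but no `good_not`, because "good" is not a shifter.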

Finally, the authors combine the term-counting method with the SVM and show that the combination achieves better results than either method alone. The SVM tends to be weak at classifying positive reviews, whereas the term-counting method, despite its low overall accuracy, handles positive reviews better, so the two methods complement each other.
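
These notes do not record how the two classifiers were combined, so the following is only one plausible sketch: scale each score to roughly [-1, 1] and take a weighted sum. The function name `combined_label`, the scale constants, and the weight are all hypothetical.

```python
def combined_label(term_score, svm_margin, term_scale=10.0, svm_scale=2.0, svm_weight=0.5):
    """Combine a term-counting score and an SVM margin by a weighted sum (assumed scheme)."""
    # Clip each scaled score to [-1, 1] so neither classifier dominates by magnitude alone.
    t = max(-1.0, min(1.0, term_score / term_scale))
    s = max(-1.0, min(1.0, svm_margin / svm_scale))
    combined = (1.0 - svm_weight) * t + svm_weight * s
    return "positive" if combined > 0 else "negative"
```

In this sketch a confident positive term-counting score can override a weakly negative SVM margin, e.g. `combined_label(6, -0.2)` is positive, while a confident SVM still wins over a weak term-counting score, e.g. `combined_label(2, -1.5)` is negative.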

Overall, the main innovation of this paper is to add degrees of intensity to polarity through valence shifters.

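Tools and choices used:

  • Lemmatization: Xerox Incremental Parser (XIP)
  • Word sense disambiguation:
    - Add up all possible sentiment senses of the word; whichever orientation has more senses wins
    - Take the most likely sentiment orientation as the sentiment of the word (the better option!)
  • Unigram selection
    - Keep words that occur at least three times in the dataset
    - Keep only words that appear in GI
    - Use all words in GI and CTRW plus adjectives as the feature set (the better option!); this was also shown to improve performance when combined with the SVM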

Results and discussions:

  • Valence shifters improve the classification of reviews in the term-counting system.
  • Adding positive and negative terms from CTRW generally improved classification accuracy, with or without valence shifters. Adding overstatements and understatements from CTRW made no difference.
  • Adding a large number of positive and negative terms with automatically computed SO-PMI values does not always improve performance, because the positive/negative labels computed with SO-PMI are not always reliable.
  • The accuracy of the SVM method is much higher than that of the term-counting method. The positive and negative terms contribute substantially to the success of the ML method, while the valence-shifter bigrams are less significant.
  • Combining the term-counting system with the ML system slightly improved the results because the two classifiers do not make the same kinds of classification errors.

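Terms used in the paper

  • General Inquirer (GI): one of the earliest sentiment lexicons, which also serves as a computer program for sentiment analysis. Its words were collected from existing dictionaries and manually classified as positive or negative to build the lexicon; the sentiment words come from the Harvard IV-4 dictionary and the Lasswell dictionary. [It lists English word senses; tags include positive, negative, overstatement, and understatement.]

  • CTRW (Choose the Right Word): a dictionary of synonyms that lists nuances of lexical meaning.

  • Term-counting method: count the positive and negative words in a review and decide its polarity from the counts. Its advantage is that it needs no training.

  • Valence shifters: negations, intensifiers, and diminishers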

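  • PMI (pointwise mutual information)
    $$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(y \mid x)}{p(y)}$$
    The more strongly x and y are related, the larger the ratio of p(x, y) to p(x)p(y). The two conditional forms may be easier to interpret: the larger the probability of x given y is relative to the probability of x on its own, the more strongly x and y are related.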

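  • SO-PMI (semantic orientation from pointwise mutual information): gives the degree to which each term is positive or negative
    $$\text{SO-PMI}(word_1) = \sum_{pword \in Pwords} \mathrm{PMI}(word_1, pword) - \sum_{nword \in Nwords} \mathrm{PMI}(word_1, nword)$$
    where Pwords and Nwords are sets of positive and negative seed words.
    SO-PMI(word1) > 0: positive orientation, i.e. a commendatory word
    SO-PMI(word1) = 0: neutral orientation, i.e. a neutral word
    SO-PMI(word1) < 0: negative orientation, i.e. a derogatory word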
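
To make the PMI and SO-PMI definitions concrete, here is a small sketch that estimates both from document counts, using PMI(x, y) = log(p(x, y) / (p(x) p(y))) and SO-PMI as the summed PMI with positive seed words minus the summed PMI with negative seed words. The base-2 logarithm, the count dictionaries, and the seed words are assumptions for illustration, and the sketch assumes every word/seed pair co-occurs at least once (a real corpus would need smoothing for zero counts).

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), with probabilities estimated as count / total."""
    # Algebraically equal to the ratio of probabilities, written on counts directly.
    return math.log2(count_xy * total / (count_x * count_y))


def so_pmi(word, pos_seeds, neg_seeds, count, cooc, total):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds."""
    def assoc(seed):
        # cooc is keyed by an unordered pair of words (hypothetical data layout).
        return pmi(cooc[frozenset((word, seed))], count[word], count[seed], total)

    return sum(assoc(s) for s in pos_seeds) - sum(assoc(s) for s in neg_seeds)


# Made-up counts over 100 documents, purely for illustration.
count = {"superb": 10, "good": 20, "bad": 20}
cooc = {
    frozenset(("superb", "good")): 8,
    frozenset(("superb", "bad")): 1,
}
orientation = so_pmi("superb", ["good"], ["bad"], count, cooc, total=100)
```

With these made-up counts, "superb" co-occurs with "good" four times more often than chance (PMI = 2) and with "bad" half as often as chance (PMI = -1), so its SO-PMI is 3, a positive orientation.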
