【SO-PMI】Sentiment classification of movie reviews using contextual valence shifters

Notes on "Sentiment classification of movie reviews using contextual valence shifters" [2006.11.2]

In this paper, the authors propose two methods for determining the sentiment expressed by a movie review. Rather than considering only the semantic polarity of individual terms, the first method extends term counting with contextual valence shifters, drawing on the General Inquirer, CTRW, and a large Web corpus. With these existing resources added, they found that extending the term-counting method with contextual valence shifters improves classification accuracy.
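To make the first method concrete, here is a minimal sketch of term counting extended with valence shifters. The word lists are tiny hypothetical stand-ins for the General Inquirer and CTRW lexicons, and the weights (`BASE`, `STEP`) and the two-word look-back `WINDOW` are assumptions chosen for illustration, not values taken from these notes.

```python
import re

# Hypothetical toy lexicons standing in for the real resources.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "dull"}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very", "extremely"}
DIMINISHERS = {"rather", "slightly"}

BASE = 2    # assumed base weight of a sentiment term
STEP = 1    # assumed amount an intensifier adds or a diminisher removes
WINDOW = 2  # assumed number of preceding words inspected for shifters


def score_review(text):
    """Sum signed term weights, adjusted by the shifters just before each term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            sign = 1
        elif tok in NEGATIVE:
            sign = -1
        else:
            continue
        magnitude = BASE
        j = i - 1
        # Walk backwards over consecutive shifters within the window.
        while j >= 0 and i - j <= WINDOW:
            prev = tokens[j]
            if prev in INTENSIFIERS:
                magnitude += STEP
            elif prev in DIMINISHERS:
                magnitude -= STEP
            elif prev in NEGATIONS:
                sign = -sign
            else:
                break
            j -= 1
        total += sign * magnitude
    return total


def classify(text):
    """Map the summed score to a label; no training is involved."""
    score = score_review(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With plain term counting, "not very good" would count as positive; in this sketch the intensifier raises the weight and the negation flips its sign, so the phrase scores negative.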

The second method uses an SVM with unigram features and then adds bigrams that contain a valence shifter. The results show that the SVM classifies far more accurately than the first method, but the valence shifter bigrams bring only a slight improvement.
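The feature set for the second method might look like the sketch below: unigram counts plus bigrams whose first word is a valence shifter. The shifter list and the `first_second` feature naming are hypothetical, and reading "bigrams that contain a valence shifter" as "a shifter followed by the next word" is an assumption.

```python
import re
from collections import Counter

# Hypothetical shifter list: negations, intensifiers, and diminishers.
VALENCE_SHIFTERS = {"not", "never", "very", "extremely", "rather", "slightly"}


def extract_features(text):
    """Return unigram counts plus counts of shifter-led bigrams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    features = Counter(tokens)  # unigram features
    for first, second in zip(tokens, tokens[1:]):
        if first in VALENCE_SHIFTERS:
            # Only bigrams that start with a valence shifter are kept.
            features[first + "_" + second] += 1
    return features
```

Feature dictionaries like these could then be vectorized and passed to a linear SVM, for example with scikit-learn's `DictVectorizer` and `LinearSVC`; that choice of toolkit is an assumption, not something these notes specify.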

Finally, the authors also combine the term-counting method with the SVM, and the combination achieves better results than either method alone. The SVM tends to be weak at classifying positive reviews, whereas the term-counting method, despite its low overall accuracy, handles positive reviews better, so the two methods complement each other.
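One simple way to combine two such classifiers is to blend their signed scores. The sketch below is only an illustration of that idea: the function name, the `weight` and `term_scale` parameters, and the clipping step are all hypothetical and do not describe the combination scheme actually used in the paper.

```python
def combined_label(term_score, svm_margin, weight=0.5, term_scale=10.0):
    """Blend two signed scores; a non-negative blend means a positive review.

    term_score: signed score from the term-counting system.
    svm_margin: signed distance of the review from the SVM hyperplane.
    """
    # Squash both scores into [-1, 1] so neither one dominates by scale alone.
    term_part = max(-1.0, min(1.0, term_score / term_scale))
    svm_part = max(-1.0, min(1.0, svm_margin))
    blended = weight * term_part + (1.0 - weight) * svm_part
    return "positive" if blended >= 0 else "negative"
```

When the SVM is only weakly negative but the term-counting score is strongly positive, the blend can overturn the SVM's decision, which is the kind of complementarity described above.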

Above all, the innovative part of this paper is that it adds degree to polarity through valence shifters.


  • Lemmatization: Xerox Incremental Parser (XIP)
  • Word sense disambiguation:
  • Unigram selection

Results and discussions:

  • Valence shifters improve the classification of reviews in the term-counting system.
  • Adding positive and negative terms from CTRW generally improved classification accuracy, with or without valence shifters, but adding overstatements and understatements from CTRW made no difference.
  • Adding a large number of positive and negative terms with automatically computed SO-PMI values does not always improve performance, because the positive/negative labels computed with SO-PMI are not always reliable.
  • The accuracy of the SVM method is much higher than that of the term-counting method. The positive and negative terms contribute substantially to the success of the ML method, while the valence shifter bigrams are less significant.
  • Combining the term-counting system and the ML system slightly improved the results, because the two classifiers do not make the same kinds of classification errors.


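  • General Inquirer: one of the earliest sentiment lexicons and computer programs for sentiment analysis. Its words were collected from existing dictionaries and manually classified as positive or negative to build the lexicon; the sentiment words come from the Harvard IV-4 dictionary and the Lasswell dictionary. [Entries are English word senses; tags include positive, negative, overstatement, and understatement.]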

  • CTRW (Choose the Right Word): a dictionary of synonyms that lists nuances of lexical meaning.

  • Term-counting method: count the positive and negative words and decide the polarity from the counts. Its advantage is that it requires no training.

  • Valence shifters: negations, intensifiers, and diminishers

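  • PMI (pointwise mutual information)
    PMI(x; y) = log [p(x, y) / (p(x) p(y))] = log [p(x|y) / p(x)] = log [p(y|x) / p(y)]
    The more strongly x and y are related, the larger the ratio of p(x, y) to p(x) p(y). The last two forms, which use conditional probabilities, may explain this better: the larger the probability of x given that y occurs, relative to the probability of x on its own, the more strongly x and y are related.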

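  • SO-PMI (semantic orientation from pointwise mutual information): gives the degree to which each term is positive or negative
    SO-PMI(word1) > 0: positive orientation, i.e., a commendatory term
    SO-PMI(word1) = 0: neutral orientation, i.e., a neutral term
    SO-PMI(word1) < 0: negative orientation, i.e., a derogatory term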
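The PMI and SO-PMI entries above can be tried out on a toy example. A common formulation (Turney and Littman's) takes SO-PMI of a word as the sum of its PMI with a set of positive seed words minus the sum of its PMI with a set of negative seed words. The sketch below assumes that formulation and swaps in document co-occurrence counts over a tiny made-up corpus in place of Web search hit counts; the corpus, the seed words, and the smoothing constant are all hypothetical.

```python
import math

# Tiny hypothetical corpus; each inner list holds one document's tokens.
CORPUS = [
    ["excellent", "superb", "acting"],
    ["superb", "excellent", "plot"],
    ["poor", "dreadful", "acting"],
    ["dreadful", "poor", "plot"],
    ["excellent", "acting"],
]
POSITIVE_SEEDS = ["excellent"]  # assumed seed words
NEGATIVE_SEEDS = ["poor"]


def doc_count(*words):
    """Number of documents that contain all of the given words."""
    return sum(all(w in doc for w in words) for doc in CORPUS)


def pmi(x, y, smoothing=0.01):
    """log2 of p(x, y) / (p(x) p(y)), estimated from document counts.

    The smoothing constant avoids log(0) for pairs that never co-occur.
    """
    n = len(CORPUS)
    p_xy = (doc_count(x, y) + smoothing) / n
    p_x = (doc_count(x) + smoothing) / n
    p_y = (doc_count(y) + smoothing) / n
    return math.log2(p_xy / (p_x * p_y))


def so_pmi(word):
    """PMI with the positive seeds minus PMI with the negative seeds."""
    positive = sum(pmi(word, seed) for seed in POSITIVE_SEEDS)
    negative = sum(pmi(word, seed) for seed in NEGATIVE_SEEDS)
    return positive - negative
```

In this toy corpus "superb" only co-occurs with the positive seed, so its SO-PMI is above zero, while "dreadful" only co-occurs with the negative seed and falls below zero. With so little data the values are noisy, which echoes the remark above that SO-PMI labels are not always reliable.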







