Original title: Sentiment Analysis of iPhone Reviews (with Python Source Code and Review Data)
First, scrape the data from the web pages; each page holds ten reviews and is saved as one txt file.
Data link:
Reply to the public account datadw with the keyword "苹果" to get it.
The approach below uses pre-built dictionaries (a lexicon-based method):
Prepare four dictionaries: stopwords, negation words, degree adverbs, and sentiment words. The links are also given here:
Reply to the public account datadw with the keyword "苹果" to get them.
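Before loading the real dictionaries, the core idea of the lexicon-based method can be sketched with a toy scorer: sentiment words carry a base score, degree adverbs scale it, and negation words flip it. The tokens and scores below are made up for illustration and are not from the actual dictionaries.

```python
# Toy lexicon-based scorer (made-up words and scores, not the real dictionaries)
sentiment = {'good': 2.0, 'bad': -2.0}   # sentiment word -> polarity score
negations = {'not'}                      # negation words flip polarity
degrees = {'very': 1.5, 'slightly': 0.5} # degree adverbs scale intensity

def score(tokens):
    total, weight = 0.0, 1.0
    for tok in tokens:
        if tok in negations:
            weight *= -1.0              # flip polarity for the next sentiment word
        elif tok in degrees:
            weight *= degrees[tok]      # scale intensity
        elif tok in sentiment:
            total += weight * sentiment[tok]
            weight = 1.0                # reset modifiers after each sentiment word
    return total

score(['very', 'good'])  # 3.0
score(['not', 'good'])   # -2.0
```

The four real dictionaries below play exactly these roles, just with Chinese entries and scores from the BosonNLP lexicon.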
from collections import defaultdict

# Stopword list
f = open(r'C:/Users/user/Desktop/stopword.dic', encoding='utf-8')
stopwords = [line.strip() for line in f.readlines()]
f.close()

# (1) Sentiment words
f1 = open(r'C:/Users/user/Desktop/BosonNLP_sentiment_score.txt', encoding='utf-8')
senList = f1.readlines()
senDict = defaultdict(float)
for s in senList:
    s = s.strip()
    senDict[s.split(' ')[0]] = float(s.split(' ')[1])

# (2) Negation words
f2 = open(r'C:/Users/user/Desktop/notDict.txt', encoding='utf-8')
notList = [x.strip() for x in f2.readlines() if x.strip() != '']

# (3) Degree adverbs
f3 = open(r'C:/Users/user/Desktop/degreeDict.txt', encoding='utf-8')
degreeList = f3.readlines()
degreeDict = defaultdict(float)
for d in degreeList:
    d = d.strip()
    degreeDict[d.split(',')[0]] = float(d.split(',')[1])
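The parsing logic above (one `word score` pair per space-separated line) can be checked without the real files by reading from an in-memory sample; the words and scores here are invented for illustration.

```python
from collections import defaultdict
import io

# In-memory stand-in for a sentiment-score file: "word<space>score" per line.
# The entries are made up; the real BosonNLP lexicon has ~100k entries.
sample = io.StringIO("不错 1.5\n糟糕 -2.0\n")

senDict = defaultdict(float)
for line in sample:
    word, val = line.strip().split(' ')
    senDict[word] = float(val)

senDict['不错']    # 1.5
senDict['未登录词']  # 0.0 -- defaultdict(float) returns 0.0 for unseen words
```

Using `defaultdict(float)` means a word missing from the lexicon silently scores 0.0 instead of raising `KeyError`, which is convenient when scoring arbitrary review text.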
Import the data and segment it into words:
import jieba

def sent2word(sentence):
    """
    Segment a sentence into words.
    Delete stopwords.
    """
    segList = jieba.cut(sentence)
    segResult = []
    for w in segList:
        segResult.append(w)
    newSent = []
    for word in segResult:
        if word in stopwords: