Sentiment Analysis: Chinese Word Segmentation, Word-Frequency Counting, and More (with Code)

Latest recommended article published on 2023-04-08 15:53:19

小李爱发呆


Article link: https://blog.csdn.net/weixin_45899520/article/details/108965894


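This script matches collected comments against a positive-word list, counts how many positive words occur, and can write the matched words to a file. Negative words can be handled the same way by swapping in a negative-word list. It follows the usual pattern of jieba segmentation plus stop-word filtering: the word-list loader written for stop words is reused here to load the positive-word list, and only words on that list are kept.
The positive-word list is available for download.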
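
The core idea (keep only the tokens that appear in a word list, then count them) can be sketched on in-memory data before any file handling is involved. This is a minimal sketch, not the script itself: the helper name `count_matches`, the sample tokens, and the three-word lexicon are all made up for illustration, and the tokens are written out by hand in the shape `jieba.cut` would return so that the example runs without jieba installed.

```python
from collections import Counter

def count_matches(tokens, lexicon):
    """Count how often each lexicon word occurs in an already-segmented token list."""
    return Counter(tok for tok in tokens if tok in lexicon)

# Hypothetical data: hand-written tokens standing in for jieba.cut output,
# and a tiny positive-word set standing in for the downloaded word list.
tokens = ["这部", "电影", "很", "精彩", "，", "演员", "也", "很", "出色", "，", "精彩"]
positive_words = {"精彩", "出色", "感人"}

matches = count_matches(tokens, positive_words)
total = sum(matches.values())  # total number of positive-word occurrences

print(matches)
print(total)
```

Storing the word list in a `set` keeps each membership test cheap, which matters once the list holds thousands of entries.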

from collections import Counter
import jieba

# Load a word list (one word per line) into a set for fast membership tests.
# The function was written for stop words, but here it loads the positive-word list.
def stopwordslist(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:  # open the file as UTF-8
        return {line.strip() for line in f if line.strip()}

# Path to the positive-word list; custom positive words can be added to this file.
# Loaded once here instead of once per sentence.
praise_words = stopwordslist('E:\\pythonimg\\情感词典\\praise.txt')

# Segment a sentence and keep only the words found in the positive-word list
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    # Join with spaces so adjacent matches are not fused into a different word
    return ' '.join(word for word in sentence_seged if word in praise_words)

matched_path = 'E:\\pythonimg\\已匹配褒义词.txt'  # where the matched positive words are saved

# Read the reviews line by line and save the matched positive words
with open('E:\\pythonimg\\comment\\0.txt', 'r', encoding='utf-8') as inputs, \
        open(matched_path, 'w', encoding='utf-8') as outputs:
    for line in inputs:
        line_seg = seg_sentence(line)  # the return value is a string
        if line_seg:
            outputs.write(line_seg + '\n')

# Read the matched words back. They are already segmented, so split on
# whitespace instead of running jieba over them a second time.
with open(matched_path, 'r', encoding='utf-8') as fr:
    data = dict(Counter(fr.read().split()))

count = sum(data.values())
print(count)  # total number of positive-word occurrences
'''
count=0
with open('E:\\pythonimg\\褒义词及词频.txt','w',encoding='utf-8') as fw:  # write each positive word and its frequency to a file
    for k, v in data.items():
        fw.write('%s,%d\n' % (k, v))
        count+=v
'''
#print(count)  # total number of positive-word occurrences
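
The same approach extends to negative words, and the word-frequency output in the commented-out block can be sorted by count. The following is a rough sketch under stated assumptions: `tally_polarity` and `write_frequencies` are hypothetical helper names, the tokens and both word sets are invented sample data, and the result is written to an in-memory `io.StringIO` buffer rather than to a file on disk.

```python
import csv
import io
from collections import Counter

def tally_polarity(tokens, positive, negative):
    """Return separate counters for positive-word and negative-word matches."""
    pos = Counter(t for t in tokens if t in positive)
    neg = Counter(t for t in tokens if t in negative)
    return pos, neg

def write_frequencies(counter, stream):
    """Write 'word,count' rows to a text stream, most frequent word first."""
    writer = csv.writer(stream, lineterminator="\n")
    for word, freq in counter.most_common():
        writer.writerow([word, freq])

# Hypothetical sample: tokens as a segmenter might produce them, plus two tiny lexicons.
tokens = ["剧情", "精彩", "但是", "结局", "无聊", "精彩", "拖沓"]
positive = {"精彩", "出色"}
negative = {"无聊", "拖沓"}

pos_counts, neg_counts = tally_polarity(tokens, positive, negative)

buffer = io.StringIO()  # stands in for a file opened with encoding='utf-8'
write_frequencies(pos_counts, buffer)

print(sum(pos_counts.values()), sum(neg_counts.values()))
print(buffer.getvalue())
```

To write to disk instead, pass a file object opened with `open(path, 'w', encoding='utf-8', newline='')` in place of the buffer.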