Word frequency statistics and word cloud display for the Government Work Report using jieba and wordcloud


inthebox2018


Article link: https://blog.csdn.net/tdjack/article/details/86738824


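This post records how a self-taught learner used Python's jieba and wordcloud libraries to compute word frequencies and draw word clouds for the Government Work Report. The text is first tokenized with jieba, specific words are then excluded, word frequencies are counted, and the most frequent words are printed. Finally, wordcloud generates a plain word cloud and one shaped like a map of China.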

Summary generated by CSDN using AI.
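As a rough illustration of the filtering and counting steps mentioned in the summary, here is a minimal sketch that works on a list that has already been tokenized. The function name `count_words`, its parameters, and the sample tokens are assumptions made up for illustration and are not part of the original script. It uses `collections.Counter` from the standard library rather than a hand-written dictionary.

```python
from collections import Counter


def count_words(tokens, excludes=(), top_n=10):
    """Count tokens, skipping single-character tokens and excluded words.

    Hypothetical helper for illustration; not part of the original script.
    """
    excluded = set(excludes)
    counts = Counter(
        token for token in tokens
        if len(token) > 1 and token not in excluded
    )
    # most_common returns (word, count) pairs, highest count first
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample tokens standing in for the output of jieba.lcut(txt)
    sample = ["发展", "我们", "改革", "发展", "的", "改革", "发展"]
    for word, count in count_words(sample, excludes=["我们"]):
        print(word, count)
```

With real input, the `tokens` argument would be the list returned by `jieba.lcut`, and `excludes` would be the exclusion list.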

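I don't have a formal CS background and have been learning Python on and off, so this is a short record for future reference. The file named by `filename` in the code must be in the same folder as the .py file, and the chinamap image must be in that same folder too. The image needs a white background.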
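Because the script uses relative paths, it only finds its input files when run from the folder that contains them. One possible way to make this more robust is to resolve paths against an explicit base folder with `pathlib`. This is a sketch under assumptions: the helper names `resolve_input` and `read_text_file` are hypothetical and do not appear in the original script.

```python
from pathlib import Path


def resolve_input(name, base_dir=None):
    """Return the absolute path of `name` inside `base_dir`.

    Hypothetical helper for illustration. If base_dir is omitted, the
    current working directory is used. Raises FileNotFoundError with a
    clear message when the file is missing.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"{name} not found in {base}")
    return path


def read_text_file(name, base_dir=None):
    """Read a UTF-8 text file located via resolve_input."""
    return resolve_input(name, base_dir).read_text(encoding="utf-8")
```

In a script, passing `Path(__file__).parent` as `base_dir` points at the script's own folder, for example `txt = read_text_file('zfgzbg2018.txt', Path(__file__).parent)`.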

import jieba
from wordcloud import WordCloud
from imageio import imread  # scipy.misc.imread was removed in SciPy 1.2; imageio.imread is the suggested replacement

### Exclusion word list ("我们" means "we")
excludes = ["我们"]

# Open and read the file; the with block closes it automatically
filename = 'zfgzbg2018.txt'  # filename is a variable
with open(filename, "r", encoding="utf-8") as file:
    txt = file.read()

# Tokenize; jieba.lcut returns a list, so words is a list
words = jieba.lcut(txt)

### Word frequency count
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    elif word in counts:
        counts[word] = counts[word] + 1
    else:
        counts[word] = 1
'''
# Alternatively, the elif and else branches can be written in this shorter form
    else:
        counts[word] = counts.get(word, 0) + 1