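Read the 三国演义.txt file, count how often each character appears, plot the word-frequency results with Matplotlib, and generate a word cloud of the characters.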
1. Tokenize the text and count occurrences
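import jieba

# Read the full text of the novel
with open('三国演义(白话文版).txt', 'r', encoding='utf-8') as m:
    s = m.read()

# Load the stopword list; a set keeps the membership test below fast
with open('中文停用词库.txt', 'r', encoding='utf-8') as f:
    stopwords = set(f.read().split())

# Segment the text with jieba.cut
word_list = jieba.cut(s)

# Count every token that is not a stopword and is longer than one character
dic = {}
for word in word_list:
    if word not in stopwords and len(word) > 1:
        dic[word] = dic.get(word, 0) + 1

# Sort by frequency, highest first
fre_list = sorted(dic.items(), key=lambda x: x[1], reverse=True)

# Build the output string, then write it to the file once the loop has finished
st = ''
with open('三国词频_人名.txt', 'w', encoding='utf-8') as f:
    for word, fre in fre_list:
        st += '{} {}\n'.format(word, fre)
    f.write(st)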
Result:
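The counts from step 1 cover every word longer than one character, not only people, and the novel often refers to one person by several names (for example 孔明 and 诸葛亮). Here is a minimal sketch of one way to narrow the list down to characters, assuming a small hand-written alias table. The names `ALIASES` and `merge_aliases` are illustrative and not part of the original script, and the table would need to be extended for real use.

```python
from collections import Counter

# Hypothetical alias table: alternate name -> canonical name.
# Extend it with whichever characters you care about.
ALIASES = {
    '孔明': '诸葛亮',
    '玄德': '刘备',
    '云长': '关羽',
    '孟德': '曹操',
}


def merge_aliases(counts, aliases=ALIASES):
    """Keep only known characters and fold alias counts into the canonical name."""
    canonical = set(aliases.values())
    merged = Counter()
    for word, n in counts.items():
        name = aliases.get(word, word)  # map an alias to its canonical name
        if name in canonical:           # drop words that are not listed characters
            merged[name] += n
    return merged
```

With the dictionary from step 1, `merge_aliases(dic).most_common(10)` would give the ten most frequent characters in the table. jieba's part-of-speech tagger (`jieba.posseg`, which tags person names as `nr`) is another option if you would rather not maintain the table by hand.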
2. Generate the word cloud
import wordcloud

# Load the "word count" pairs written in step 1
freqs = {}
with open('三国词频_人名.txt', 'r', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        if len(parts) == 2:
            freqs[parts[0]] = int(parts[1])

# generate() re-tokenizes its input and drops numbers by default, so the counts
# from step 1 would be lost; generate_from_frequencies() uses them directly
wcloud = wordcloud.WordCloud(font_path='C:/Windows/Fonts/simkai.ttf', background_color="white", width=1000, max_words=500, height=860, margin=2).generate_from_frequencies(freqs)
wcloud.to_file("三国词频cloud.png")
Result:
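The introduction also mentions plotting the frequencies with Matplotlib. Here is a rough sketch of a bar chart of the most frequent entries, assuming `fre_list` is the list of `(word, count)` pairs from step 1. The function names and the output file name `三国词频bar.png` are illustrative choices, and the `SimHei` setting assumes that font is installed.

```python
def top_entries(fre_list, n=10):
    """Return the n most frequent (word, count) pairs, highest first."""
    return sorted(fre_list, key=lambda x: x[1], reverse=True)[:n]


def plot_top_words(fre_list, n=10, out_path='三国词频bar.png'):
    """Save a bar chart of the n most frequent words to out_path."""
    # Imported here so top_entries() works even without Matplotlib installed
    import matplotlib
    matplotlib.use('Agg')  # render to a file without opening a window
    from matplotlib import pyplot as plt

    plt.rcParams["font.sans-serif"] = ["SimHei"]  # font able to draw Chinese labels

    top = top_entries(fre_list, n)
    words = [w for w, _ in top]
    counts = [c for _, c in top]

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(words, counts)
    ax.set_xlabel('Word')
    ax.set_ylabel('Occurrences')
    ax.set_title('Top {} words'.format(len(top)))
    fig.savefig(out_path, dpi=150, bbox_inches='tight')
    plt.close(fig)
```

Calling `plot_top_words(fre_list, n=20)` after step 1 would write the chart next to the word cloud image.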
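Fixing the "simkai font not found" error:
Download the font file, open it, and click Install.
I have attached the font package; see whether you can download it.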
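If installing the font is not an option, another approach is to check for a usable font before building the word cloud. Here is a minimal sketch, assuming a Windows machine. Only the first path comes from the script above. The other two are guesses at common locations of Chinese fonts, and `FONT_CANDIDATES` and `find_font` are illustrative names.

```python
import os

# Candidate font files. Only the first entry appears in the original script;
# the others are assumptions about typical Windows installs.
FONT_CANDIDATES = [
    'C:/Windows/Fonts/simkai.ttf',
    'C:/Windows/Fonts/simhei.ttf',
    'C:/Windows/Fonts/msyh.ttc',
]


def find_font(candidates=FONT_CANDIDATES):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

The result can be passed as `font_path=find_font()` when constructing `WordCloud`. If it returns `None`, no candidate was found and Chinese text will probably not render correctly, so installing a font as described above is still the fix.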