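1. Install the required third-party libraries
jieba (an excellent third-party library for Chinese word segmentation)
pyecharts (an excellent data visualization library)
Download link for 《三国演义》.txt (Romance of the Three Kingdoms; extraction code: kist)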
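Installing the libraries with PyCharm
Open PyCharm and choose Settings from the [File] menu.
On the settings page that opens, click the [+] on the right. A new page appears: search for the library you want in the box at the top, then install it.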
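After installing, it can be useful to confirm that both libraries are visible to the interpreter the project uses. The snippet below is a minimal sketch of such a check, using only the standard library. The helper name `missing_packages` is made up for this illustration and is not part of jieba or pyecharts.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check for the two libraries this article relies on.
missing = missing_packages(["jieba", "pyecharts"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("jieba and pyecharts are both available.")
```

If a name is reported as missing, install it from the PyCharm page described above (or with pip) and run the check again.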
2. Write the code
import jieba  # Chinese word segmentation library
import os

print("Top ten characters by number of appearances:")
# The novel is stored as GB18030 text; read it in full and close the file afterwards
with open('三国演义.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
words = jieba.lcut(txt)  # split the text into a list of words
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"  # merge different names that refer to the same person
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
for word, count in items[:10]:
    print("{}:{}".format(word, count))  # print the top ten
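The chain of elif branches above grows with every alias that is added. As an alternative sketch (not the author's code), the same merging can be written with a lookup dictionary and `collections.Counter`. The names `ALIASES` and `count_names` are invented for this illustration, and a small hand-written token list stands in for the output of `jieba.lcut(txt)` so the example runs without the novel.

```python
from collections import Counter

# Illustrative alias table: each variant maps to one canonical name.
# The pairs mirror the elif branches in the article's code.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(words, aliases=ALIASES):
    """Count tokens longer than one character, merging known aliases."""
    counts = Counter()
    for word in words:
        if len(word) == 1:  # skip single-character tokens
            continue
        counts[aliases.get(word, word)] += 1
    return counts


# A tiny hand-made token list standing in for jieba.lcut(txt).
sample = ["玄德", "曰", "孔明", "诸葛亮", "玄德曰", "云长", "关公", "丞相"]
for name, count in count_names(sample).most_common(3):
    print("{}:{}".format(name, count))
```

With this layout, adding another alias only needs one more dictionary entry, and `most_common(10)` replaces the manual sort and slice.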
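The result is shown in the figure below:
As you can see, many of these entries are not character names, so we need to remove them. The revised code is as follows: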
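import jieba  # Chinese word segmentation library
import os

print("Top ten characters by number of appearances:")
# The novel is stored as GB18030 text; read it in full and close the file afterwards
with open('三国演义.txt', 'r', encoding='gb18030') as f:
    txt = f.read()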
remove = {"将军", "却说", "不能", "后主", "上马", &