Full text of threekingdoms.txt for download: https://python123.io/resources/pye/threekingdoms.txt
CalThreeKingdomsV1.py:
#CalThreeKingdomsV1.py
import jieba

# Read the whole novel; the with block closes the file afterwards
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Output:
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ZHUYUA~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.086 seconds.
Prefix dict has been built successfully.
曹操 953
孔明 836
将军 772
却说 656
玄德 585
关公 510
丞相 491
二人 469
不可 440
荆州 425
玄德曰 390
孔明曰 390
不能 384
如此 378
张飞 358
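The counting loop in V1 can also be expressed with collections.Counter from the standard library. The following is only a minimal sketch: the helper name top_words and the sample token list are illustrative assumptions, standing in for the result of jieba.lcut(txt) so that the snippet runs without jieba or the novel file.

```python
from collections import Counter

def top_words(words, n=15):
    """Return the n most frequent multi-character tokens as (word, count) pairs."""
    # Same filter as V1: single-character tokens are ignored
    counts = Counter(w for w in words if len(w) > 1)
    return counts.most_common(n)

# Hypothetical tokens standing in for jieba.lcut(txt)
sample = ["曹操", "曰", "孔明", "曹操", "玄德", "孔明", "曹操"]
for word, count in top_words(sample, 3):
    print("{0:<10}{1:>5}".format(word, count))
```

In the real script, top_words(jieba.lcut(txt)) would replace the manual dictionary, the sort, and the range(15) indexing.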
CalThreeKingdomsV2.py:
#CalThreeKingdomsV2.py
import jieba

with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
# Frequent words from the V1 output that are not character names
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single-character tokens, then map alternative names and
    # "name + 曰" tokens to a single name per character
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    # Count at loop level, not under else, so merged names are counted too
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # no KeyError if an excluded word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
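Output (曹操 and 孔明 keep their V1 counts and neither 刘备 nor 关羽 appears, so the alternative names were evidently not merged in this run, which is what happens if the counting statement sits under the final else branch rather than at loop level):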
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ZHUYUA~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.766 seconds.
Prefix dict has been built successfully.
曹操 953
孔明 836
张飞 358
商议 344
如何 338
主公 331
军士 317
吕布 300
左右 294
军马 293
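One possible way to tidy up the elif chain in V2 is to keep the alternative names in a lookup table. This is a sketch under stated assumptions, not the original author's code: the names ALIASES, EXCLUDES, and count_characters are hypothetical, the table holds only the mappings already listed in V2, and the function takes a ready-made token list so that it runs without jieba.

```python
# Alternative name -> canonical name, taken from the elif chain in V2
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}
# Non-name words excluded in V2
EXCLUDES = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}

def count_characters(words, aliases=ALIASES, excludes=EXCLUDES):
    """Count tokens after merging aliases; return (name, count) pairs, most frequent first."""
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        name = aliases.get(word, word)  # fall back to the token itself
        if name in excludes:
            continue
        counts[name] = counts.get(name, 0) + 1  # every kept token is counted
    return sorted(counts.items(), key=lambda x: x[1], reverse=True)

# Hypothetical tokens standing in for jieba.lcut(txt)
sample = ["玄德", "曰", "玄德曰", "刘备", "将军", "丞相", "曹操"]
print(count_characters(sample))
```

Because the counting line is reached for every token that is kept, merged names such as 刘备 and 关羽 accumulate the counts of all their aliases, and adding a new alias only requires a new table entry.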
Further optimization: