threekingdoms.txt: Downloading the Chinese Text of Romance of the Three Kingdoms and Counting Character Appearances

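This article uses the jieba word segmentation tool to analyze the text of Romance of the Three Kingdoms. It first lists the most frequent words, then merges the different tokens that refer to the same character (for example, 孔明曰 is counted as 孔明), and finally excludes common words that do not identify a character, which gives a more accurate ranking of high-frequency words.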

Full text of threekingdoms.txt: https://python123.io/resources/pye/threekingdoms.txt

CalThreeKingdomsV1.py:

#CalThreeKingdomsV1.py
import jieba

# Read the whole novel; the with block closes the file afterwards
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}
for word in words:
    if len(word) == 1:  # skip single characters and punctuation
        continue
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))

Output:

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ZHUYUA~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.086 seconds.
Prefix dict has been built successfully.
曹操          953
孔明          836
将军          772
却说          656
玄德          585
关公          510
丞相          491
二人          469
不可          440
荆州          425
玄德曰         390
孔明曰         390
不能          384
如此          378
张飞          358
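
The same tally can be written more compactly with collections.Counter from the standard library. The sketch below is an alternative to the explicit dictionary loop, not the script that produced the output above. The helper name top_words and the sample token list are made up for illustration; with the real text, the tokens would come from jieba.lcut(txt).

```python
from collections import Counter

def top_words(tokens, n=15):
    """Return the n most frequent tokens that are longer than one character."""
    counts = Counter(tok for tok in tokens if len(tok) > 1)
    return counts.most_common(n)

# Hypothetical sample; with the real text use: tokens = jieba.lcut(txt)
sample = ["曹操", "曰", "孔明", "曹操", "，", "玄德", "孔明", "曹操"]
for word, count in top_words(sample, 3):
    print("{0:<10}{1:>5}".format(word, count))
```

Counter.most_common sorts by count in descending order, so the separate list conversion and sort step are no longer needed.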

CalThreeKingdomsV2.py:

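#CalThreeKingdomsV2.py
import jieba

with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
# High-frequency words that are not character names
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    # Count after the if/elif chain so that merged aliases are tallied too
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # no KeyError if a word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))

Output as originally posted. In that run the counting statement was indented under the final else, so aliases were dropped instead of merged: 曹操 and 孔明 keep their V1 counts, and 刘备 and 关羽 are missing from the top 10. With the counting statement placed as above, the merged totals are higher and the ranking changes: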

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ZHUYUA~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.766 seconds.
Prefix dict has been built successfully.
曹操          953
孔明          836
张飞          358
商议          344
如何          338
主公          331
军士          317
吕布          300
左右          294
军马          293
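
The chain of elif branches can also be expressed as a lookup table, which makes it harder to skip the counting step by accident. This is a minimal sketch, assuming the same alias pairs and exclusion set as CalThreeKingdomsV2.py; the names ALIASES, EXCLUDES and count_characters and the sample tokens are hypothetical, and the real tokens would come from jieba.lcut(txt).

```python
from collections import Counter

# Alias table copied from the merges in CalThreeKingdomsV2.py
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}
EXCLUDES = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}

def count_characters(tokens, aliases=ALIASES, excludes=EXCLUDES):
    """Count tokens after mapping aliases to one canonical name."""
    counts = Counter()
    for tok in tokens:
        if len(tok) == 1:
            continue  # single characters and punctuation
        name = aliases.get(tok, tok)  # fall back to the token itself
        if name not in excludes:
            counts[name] += 1
    return counts

# Hypothetical sample tokens for illustration only
sample = ["玄德曰", "将军", "玄德", "刘备", "丞相", "曹操", "关公", "曰"]
counts = count_characters(sample)
for word, count in counts.most_common(3):
    print("{0:<10}{1:>5}".format(word, count))
```

Excluded words are skipped while counting, so there is no later del step that could fail on a missing key.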

Further optimization:
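
One possible next step, sketched under assumptions rather than taken from the original scripts: the V2 top 10 still contains words that are not personal names (商议, 如何, 主公, 军士, 左右, 军马), and tokens such as 玄德曰 carry a trailing 曰 ("said"). The sketch below excludes the former and strips the latter. Treating 主公 as a non-name and stripping every trailing 曰 are judgment calls, and the names EXTRA_EXCLUDES, normalize and count_names are hypothetical.

```python
from collections import Counter

# Words from the V2 top 10 that are not personal names (a judgment call)
EXTRA_EXCLUDES = {"商议", "如何", "主公", "军士", "左右", "军马"}

def normalize(token):
    """Strip a trailing 曰 from tokens such as 玄德曰 (assumed heuristic)."""
    if len(token) >= 3 and token.endswith("曰"):
        return token[:-1]
    return token

def count_names(tokens, excludes):
    """Count normalized tokens, skipping short and excluded ones."""
    counts = Counter()
    for tok in tokens:
        tok = normalize(tok)
        if len(tok) > 1 and tok not in excludes:
            counts[tok] += 1
    return counts

# Hypothetical sample tokens for illustration only
sample = ["孔明曰", "商议", "孔明", "张飞", "军马", "张飞曰", "曰"]
result = count_names(sample, EXTRA_EXCLUDES)
for word, count in result.most_common(2):
    print("{0:<10}{1:>5}".format(word, count))
```

In practice the exclusion set would be extended round by round: run the script, look at the top of the list, add the non-name words found there, and run it again.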