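Chinese word-frequency counting, using Romance of the Three Kingdoms as the example: print the 15 words that occur most often. The text can be downloaded from https://python123.io/resources/pye/threekingdoms.txt
The main tool involved is the jieba package, which segments Chinese text into words.
The code is as follows: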
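The counting step can be sketched on its own, without the novel or jieba installed. The snippet below is only an illustration: `top_words` and `sample_tokens` are hypothetical names not taken from the original script, and the token list is hand-written to stand in for what `jieba.lcut` would return. It uses `collections.Counter` from the standard library in place of a hand-built dictionary.

```python
from collections import Counter


def top_words(tokens, n=15):
    """Return the n most frequent tokens, ignoring single-character ones."""
    # Single-character tokens are mostly punctuation and particles, so skip them.
    counts = Counter(t for t in tokens if len(t) > 1)
    return counts.most_common(n)


# Hand-written stand-in (an assumption) for the list jieba.lcut(txt) would return.
sample_tokens = ["曹操", "说", "孔明", "曹操", "，", "刘备", "孔明", "曹操"]

for word, count in top_words(sample_tokens):
    print(f"{word}\t{count}")
```

With real data, the list returned by `jieba.lcut` would be passed in place of `sample_tokens`.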
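# Uses "Romance of the Three Kingdoms" as the example; download: https://python123.io/resources/pye/threekingdoms.txt
import jieba

# Read the whole file; the with-block closes it afterwards
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the text with jieba into a list of all words
counts = {}  # build the counts dictionary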
for word in words:
    if len(word) == 1:
        continue