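Example 10: Text Word Frequency Count – Hamlet
Text word frequency count: given an article, which words appear in it, and which words appear most often?
Solution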
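def getText():
    # Read the whole file and lowercase it so "The" and "the" count as one word.
    with open("hamlet.txt", "r") as f:
        txt = f.read().lower()
    # Replace punctuation with spaces so that split() yields clean words.
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, ' ')
    return txt

dic = {}
hamletText = getText()
words = hamletText.split()
for w in words:
    # Count each word, starting from 0 the first time it is seen.
    dic[w] = dic.get(w, 0) + 1
# Sort by count, then by word, both in descending order.
data = sorted(dic.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
# Print the 10 most frequent words (fewer if the text has fewer distinct words).
for word, count in data[:10]:
    print(word, count)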
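The same counting idea can also be sketched with the standard library's collections.Counter. This is only an alternative illustration, not the course's reference answer: the function name top_words and the sample sentence are made up for the example, and string.punctuation also strips the apostrophe, which the punctuation list in the solution above does not.

```python
from collections import Counter
import string


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Map every ASCII punctuation character to a space.
    # Note: this set includes the apostrophe, unlike the listing above.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    # Lowercase, strip punctuation, and split on whitespace.
    words = text.lower().translate(table).split()
    # Counter tallies the words; most_common(n) returns the top n by count.
    return Counter(words).most_common(n)


# Hypothetical sample text, used here instead of hamlet.txt.
sample = "To be, or not to be: that is the question."
for word, count in top_words(sample, 3):
    print(f"{word:<10}{count:>5}")
```

To run it on the play itself, pass the contents of hamlet.txt as text. Words with equal counts come back in the order they were first seen, so ties may be ordered differently than with the sorted() call above.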
Source of the exercise:
Python语言程序设计 (Python Language Programming), 13th session