Word frequency counting for an English text in Python (using Hamlet as an example)

Rajer911

Last modified 2022-09-20 20:04:44

Column: python-嵩天主编-上机课代码 · Tags: python

First published 2022-09-20 19:53:55

Original link: https://blog.csdn.net/weixin_64118613/article/details/126960299

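# Common function words (plus a few frequent non-name words such as "good", "come", "let")
# that would otherwise crowd the top of the ranking
excludes = {"the","and","of","you","a","i","my","in","to","it","that","is","not","his","this","but","with","for","your","me","be","as","he","what","him","so","have","will","do","no","we","are","on","all","our","by","or","shall","if","o","good","come","they","now","more","let"}

def getText():
    # Read the whole file and close it afterwards; UTF-8 is an assumed encoding
    with open("hamlet.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    txt = txt.lower()  # case-fold so "The" and "the" count as the same word
    # Replace punctuation with spaces; the apostrophe is not in the list, so "o'er" stays one word
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`‘{|}~':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()  # split on any run of whitespace
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1  # unseen words start at 0
for word in excludes:
    counts.pop(word, None)  # unlike del, no KeyError if the word never occurs
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # highest count first
for word, count in items[:10]:  # slicing also copes with fewer than 10 distinct words
    print("{0:<10}{1:>5}".format(word, count))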
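For comparison, here is a minimal alternative sketch using the standard library's `collections.Counter` and a regular expression for tokenizing. It is not the author's method: `top_words` is a hypothetical helper name, and treating a word as a run of letters with optional internal apostrophes is an assumption of this sketch. It works on a plain string, so it can be tried without `hamlet.txt`.

```python
import re
from collections import Counter

# Hypothetical helper: counts the words in `text`, skipping any in `excludes`,
# and returns the `n` most frequent (word, count) pairs, highest count first.
def top_words(text, excludes=frozenset(), n=10):
    # Assumption: a word is a run of ASCII letters, optionally joined by
    # internal apostrophes (e.g. "o'er"); everything else is a separator.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in excludes)
    # most_common(n) sorts by count and returns at most n pairs
    return counts.most_common(n)

if __name__ == "__main__":
    sample = "To be, or not to be: that is the question. Be bold!"
    for word, count in top_words(sample, excludes={"that", "is", "the"}, n=3):
        print("{0:<10}{1:>5}".format(word, count))
```

To run it on the play, pass the text read from `hamlet.txt` together with the `excludes` set above. Because the regex tokenizes differently from the replace-and-split loop (digits are dropped, for instance), the counts can differ slightly from the original program's output.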