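Counting English word frequencies with Python

Counting English word frequencies with Python (using the 1986-2017 postgraduate entrance exam (kaoyan) English papers as an example)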

Screenshot of a run:
[image not included]
Code:

# Words to skip: common function words, option labels, question numbers, etc.
excludes = ['the', 'of', 'to', 'and', 'in', 'a', 'is', 'were', 'was', 'you',
            'I', 'he', 'his', 'there', 'those', 'she', 'her', 'their',
            'that', '[a]', '[b]', '[c]', '[d]', 'them', 'or','for','as',
            'are','on','it','be','with','by','have','from','not','they',
            'more','but','an','at','we','has','can','this','your','which','will',
            'one','should','points)','________','________.','all','than','what',
            'people','if','been','its','new','our','would','part','may','some','i',
            'who','answer','when','most','so','section','no','into','do','only',
            'each','other','following','had','such','much','out','--','up','these',
            'even','how','directions:','use','because','(10','time','(15','[d].',
            '-','it.','[b],','[a],','however,','1','c','1','2','b','d','a','(10',
            '2','12','13','29','3','4','5','6','7','8','9','10','11','14',
            '15','20','22','23','24','25','26','27']

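def gettext():
    # Read the whole file and lowercase it so that counting is case-insensitive.
    with open("1986年到2017年考研英语2真题.txt", "r") as f:
        txt = f.read().lower()
    # Replace each punctuation mark with a space (not an empty string),
    # so that words on either side of it are not glued together.
    for ch in '!"#$%&()*+,./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")
    return txt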

txt = gettext()
words = txt.split()
counts = {}
for word in words:
    # Count only the words that are not in the exclusion list.
    if word not in excludes:
        counts[word] = counts.get(word, 0) + 1
            
countslist=list(counts.items())
countslist.sort(key=lambda x:x[1],reverse=True)

# Print the ten most frequent words (fewer if the text has fewer distinct words).
for word, count in countslist[:10]:
    print("{0:<10}{1:>5}".format(word, count))
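
For comparison, the same counting can be sketched with the standard library's collections.Counter. This is only a minimal sketch and not the script above: the function name top_words, the short STOP_WORDS set, and the sample sentence are made up for illustration, and it tokenizes with a regular expression instead of replacing punctuation character by character.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration only;
# the `excludes` list in the article is much longer.
STOP_WORDS = {"the", "of", "to", "and", "in", "a", "is", "on"}

def top_words(text, stop_words=STOP_WORDS, n=10):
    """Return the n most common words in text, skipping stop words."""
    # Lowercase, then extract runs of letters (apostrophes allowed inside a word).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Sample text made up for this sketch.
sample = "The cat sat on the mat. The mat is in the house of the cat."
for word, count in top_words(sample, n=3):
    print("{0:<10}{1:>5}".format(word, count))
```

Using a set for the stop words makes each membership check constant time, and most_common returns the result already sorted by count, so the separate list and sort steps are not needed.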
    