A Complete English Word-Frequency Count Example, Reading from a File

Latest recommended article published on 2020-11-24 11:12:21

weixin_30752377

Original link: http://www.cnblogs.com/marsk/p/7604086.html

Download a long English novel and analyze its word frequencies.

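1. Read in the string to analyze

2. Split the text and extract the words

3. Build the counting dictionary

4. Exclude grammatical (function) words

5. Sort

6. Output the TOP(20)

7. Briefly explain the output.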

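# Read in the string to analyze; "with" closes the file automatically
with open('F:\\wanghao.txt', 'r') as f:
    text = f.read()

# Convert all uppercase letters to lowercase
text = text.lower()

# Replace the punctuation marks that act as separators (, . ? ! :) with spaces
for ch in ',.?!:':
    text = text.replace(ch, ' ')

# Split into individual words; split() with no argument also splits on
# newlines and tabs and never produces empty strings
words = text.split()

# Exclude grammatical (function) words
exp = {'is', 'and', 'that', 'it', 'a', 'our', 'have', 'the', 'for', 'of', 'as', 'on', 'be', 'will', 'we', 'can', 'with', 'all', 'more', 'in', 'to', 'this', 'an', 'own', 'how', 'at', 'are', 'one'}
word_set = set(words) - exp

# Build the counting dictionary: word -> number of occurrences
dic = {}
for w in word_set:
    dic[w] = words.count(w)
items = list(dic.items())

# Sort by count, highest first
items.sort(key=lambda x: x[1], reverse=True)

# Output the TOP 20; slicing avoids an IndexError if there are fewer than 20 words
for item in items[:20]:
    print(item)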
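
The same steps can also be sketched with the standard library's collections.Counter, which counts every word in a single pass instead of calling list.count once per distinct word. This is only an illustrative alternative, not the approach used in the post above: the function name top_words, the shortened STOP_WORDS set, and the sample sentence are all assumptions made for the example, and the regular expression is one possible way to extract words.

```python
import re
from collections import Counter

# Hypothetical, shortened stop-word set for illustration only
STOP_WORDS = {'is', 'and', 'that', 'it', 'a', 'the', 'of', 'to', 'in'}


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase, then extract runs of letters; an apostrophe is allowed
    # inside a word so that "don't" stays in one piece
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Count every word that is not a stop word in a single pass
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


if __name__ == '__main__':
    # Made-up sample text; replace it with the contents of a real file
    sample = "The cat saw the dog. The dog saw a bird, and the bird saw the cat!"
    for word, count in top_words(sample, n=3):
        print(word, count)
```

To run it on a novel, read the file into a string first (for example with open(...) as f: text = f.read()) and pass that string to top_words.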