Counting Words in a Text File

Latest recommended article published 2018-10-11 20:52:00

阳阳唐

Article link: https://blog.csdn.net/yirexiao/article/details/79163335

```python
path = 'Walden1.txt'  # path to the text file
with open(path, 'r') as text:  # open the text file for reading
    words = text.read().split()  # split the file's contents into words on whitespace
    print(words)
    for word in words:
        # print('{}-{} times'.format(word, words.count(word)))
        print('%s-%s times' % (word, words.count(word)))
```

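The code above has several problems:

1) A word with punctuation attached (for example a trailing comma or period) is counted separately from the bare word;

2) A word that occurs more than once has its count printed once per occurrence;

3) A word capitalized at the start of a sentence is counted separately from its lowercase form.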
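To see the three problems concretely, the first program's logic can be run on a short in-memory string instead of Walden1.txt. The helper name `naive_report` and the sample sentence are made up for illustration; the loop body does the same thing as the first program.

```python
def naive_report(text):
    """Hypothetical helper: the first program's logic applied to a string."""
    words = text.split()  # whitespace split only: punctuation and case are kept
    # one output line per word occurrence, exactly like the original loop
    return ['%s-%s times' % (word, words.count(word)) for word in words]


sample = "The pond is deep. the pond, is clear."
for line in naive_report(sample):
    print(line)
```

In the output, `is-2 times` is printed twice, `pond` and `pond,` are counted separately, and `The` and `the` each get a count of 1.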

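```python
import string

path = 'Walden1.txt'
with open(path, 'r') as text:
    # strip(string.punctuation) removes punctuation from both ends of each token;
    # lower() makes 'The' and 'the' the same word
    words = [raw_word.strip(string.punctuation).lower() for raw_word in text.read().split()]
words_index = set(words)  # each distinct word once
counts_dict = {index: words.count(index) for index in words_index}  # one count() per distinct word
# print words from most to least frequent
for word in sorted(counts_dict, key=lambda x: counts_dict[x], reverse=True):
    print('{}——{}times'.format(word, counts_dict[word]))
```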
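The second program's logic can also be wrapped in a function that takes a string, which makes it easy to check without Walden1.txt. This is a sketch, not the author's code: the name `count_words` is hypothetical, and two changes are assumptions added here. Tokens that become empty after stripping are skipped, and ties are broken alphabetically so the order is deterministic (the original sorts by count alone, and because string hashing is randomized per run, words with equal counts such as `that` and `is` can come out in either order).

```python
import string


def count_words(text):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    words = [raw.strip(string.punctuation).lower() for raw in text.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation, e.g. '--'
    counts = {w: words.count(w) for w in set(words)}  # one count() per distinct word
    # highest count first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for word, n in count_words("The pond is deep -- the pond, is clear."):
    print('{}: {} times'.format(word, n))
```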

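Partial output (the line `——566times` has no word in front of it: tokens made up entirely of punctuation are stripped down to the empty string, which is then counted like any other word):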

```
the——7346times
and——4602times
of——3492times
to——3107times
a——3033times
in——2060times
i——2011times
it——1715times
that——1336times
is——1336times
as——1212times
not——1057times
for——987times
was——886times
or——885times
with——883times
which——869times
but——812times
my——781times
he——761times
be——741times
his——724times
they——715times
on——711times
by——692times
are——673times
have——672times
at——653times
this——569times
——566times
if——553times
```
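Running both programs shows that the second one is also faster. The first calls `words.count()` once for every word occurrence, rescanning the whole list each time (roughly n² comparisons for n words), while the second calls it only once per distinct word.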
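Both programs still rely on `list.count()`, which scans the entire word list on every call. A single pass with `collections.Counter` (a standard-library alternative, not used in the original post) avoids the repeated scans. The sketch below compares it with the second program's dict comprehension on synthetic data; the function names, the word list, and the sizes are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit
from collections import Counter


def count_with_list_count(words):
    # the second program's approach: one full scan of `words` per distinct word
    return {w: words.count(w) for w in set(words)}


def count_with_counter(words):
    # a single pass over `words`
    return dict(Counter(words))


# synthetic data: 200 distinct words, each repeated 25 times (5,000 tokens)
words = ['w%d' % (i % 200) for i in range(5000)]

assert count_with_list_count(words) == count_with_counter(words)
for fn in (count_with_list_count, count_with_counter):
    seconds = timeit.timeit(lambda: fn(words), number=5)
    print('%s: %.4f s' % (fn.__name__, seconds))
```

`Counter.most_common()` also returns the words sorted by count, so it can take the place of the `sorted(...)` call as well.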