Counting repeated words
Text source: https://pan.baidu.com/s/1o75GKZ4
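# Path to the text file on the author's machine
path = r'C:\Users\Administrator\PycharmProjects\untitled1\Walden.txt'
with open(path, 'r', encoding='utf-8') as text:
    # Read the whole file and split it on whitespace
    words = text.read().split()
print(words)
# Print the count for every token, once per occurrence in the text
for word in words:
    print('{}-{} times'.format(word, words.count(word)))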
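Conclusions:
- Words with attached punctuation are counted as separate words;
- Some words have their count printed more than once;
- Because Python is case-sensitive, words with a capitalized first letter are counted separately.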
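The three issues above can be reproduced on a short in-memory sample. This is only a sketch: the sentence and the name naive_counts are made up for illustration and do not come from Walden.txt.

```python
# Hypothetical sample text (an assumption, not taken from Walden.txt)
sample = "The pond. the pond, and The woods"
words = sample.split()

# Same naive approach as the script above: count every token exactly as it appears
naive_counts = [(word, words.count(word)) for word in words]
for word, n in naive_counts:
    print('{}-{} times'.format(word, n))
```

In the output, "pond." and "pond," are treated as different words, "The" is reported twice because the loop visits every occurrence, and "The" and "the" get separate counts.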
We now adjust the counting method to address these points by preprocessing each word before counting.
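As a small sketch of what this kind of preprocessing does, the snippet below applies str.strip(string.punctuation) and str.lower() to a handful of tokens and then removes duplicates with set. The token list and the names raw_words, cleaned and unique_words are assumptions chosen for illustration.

```python
import string

# Hypothetical tokens (an assumption, chosen to show the edge cases)
raw_words = ["The", "pond.", "the", "pond,", "don't", '"Woods!"']

# strip() removes punctuation only at the two ends of each token;
# lower() folds case so "The" and "the" become the same word
cleaned = [w.strip(string.punctuation).lower() for w in raw_words]
print(cleaned)

# A set keeps each distinct word once, so each count is reported only once
unique_words = set(cleaned)
print(sorted(unique_words))
```

Note that strip leaves punctuation inside a token untouched, so "don't" keeps its apostrophe.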
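import string
# Path to the text file on the author's machine
path = r'C:\Users\Administrator\PycharmProjects\untitled1\Walden.txt'
with open(path, 'r', encoding='utf-8') as text:
    # Strip leading and trailing punctuation and lowercase every word
    words = [raw_word.strip(string.punctuation).lower() for raw_word in text.read().split()]
# Distinct words, so that each one is counted only once
words_index = set(words)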
counts_dict = {index: