Problem
Given any plain-text English file, count how many times each word occurs in it.
Code
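# -*- coding: utf-8 -*-
from collections import defaultdict
import re


def count_word(file_name):
    """Return a dict mapping each word in file_name to its number of occurrences."""
    result = defaultdict(int)  # a missing key starts at 0
    try:
        with open(file_name, encoding='utf-8') as f:
            for line in f:
                # \b\w+\b: a run of word characters between two word boundaries
                words = re.findall(r'\b\w+\b', line)
                for word in words:
                    result[word] += 1
    except (OSError, UnicodeDecodeError) as e:
        # Catch only file/encoding errors instead of a bare except
        print('Error: file does not exist or could not be read:', e)
        return None
    return result


if __name__ == '__main__':
    file_name = '0001.py'  # path of the text file to analyse
    result = count_word(file_name)
    if result is not None:  # avoid calling .values() on None after an error
        # zip(values, keys) yields (count, word) tuples, so sorting compares counts first
        word_sorted = sorted(zip(result.values(), result.keys()), reverse=True)
        for num, word in word_sorted:
            print(word, 'appears', num, 'times')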
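For comparison, the same count can be sketched with collections.Counter from the standard library, which folds the defaultdict(int) counting and the sort by frequency (most_common) into one object. This is an alternative, not the approach used above; count_words_counter and the sample sentence are hypothetical, and like count_word it is case-sensitive.

```python
import re
from collections import Counter


def count_words_counter(text):
    """Hypothetical helper: count words in a string, case-sensitive like count_word."""
    # Same pattern as count_word: runs of word characters between word boundaries
    return Counter(re.findall(r'\b\w+\b', text))


if __name__ == '__main__':
    sample = 'the cat and the hat and the bat'
    # most_common() returns (word, count) pairs, highest count first
    for word, num in count_words_counter(sample).most_common():
        print(word, 'appears', num, 'times')
```

To count a file, pass its contents, e.g. count_words_counter(f.read()) inside a with open(file_name, encoding='utf-8') as f: block. Note that this reads the whole file into memory, whereas count_word processes it line by line.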
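Key points
- Regular expressions: the pattern \b\w+\b; for details see 正则表达式30分钟入门教程 (a 30-minute introductory regular-expression tutorial).
- Using defaultdict; for details see "defaultdict objects".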
- Comparing dictionary entries by value; for details see 字典的运算 ("Calculating with dictionaries").
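The three points above can be seen in isolation in the following sketch. The sample line, the letter list and the variable names are made up for illustration only.

```python
import re
from collections import defaultdict

# 1. Regular expression: \b\w+\b finds runs of word characters ([A-Za-z0-9_],
#    plus Unicode letters in Python 3) between word boundaries.
line = "Don't stop: it's snake_case 2 o'clock"
words = re.findall(r'\b\w+\b', line)
print(words)  # apostrophes split contractions: 'Don', 't', ...

# 2. defaultdict(int): a missing key is created with int() == 0,
#    so counts[w] += 1 never raises KeyError (a plain dict would).
counts = defaultdict(int)
for w in ['b', 'a', 'c', 'a', 'b', 'a']:
    counts[w] += 1

# 3. Comparing by value: zip(values, keys) builds (count, key) tuples,
#    which compare by count first and by key only when counts are equal.
by_count = sorted(zip(counts.values(), counts.keys()), reverse=True)
print(by_count)
print(max(zip(counts.values(), counts.keys())))
```

Because the tuples fall back to comparing the word when counts are equal, sorted(..., reverse=True) in count_word lists tied words in reverse alphabetical order.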