Python Every Day, 第 39 期
题目:
任一个英文的纯文本文件,统计其中的单词出现的个数。
如下:
In our world , one creature without any rivals is a lifeless creature.
If a man lives without rivals, he is bound to be satisfied with the present and will not strive for the better.
He would hold back before all difficulties and decline in inaction and laziness.
Adverse environment tends to cultivate successful people. Therefore, your rivals are not your opponents or those you grudge.
Instead , they are your good friends!
In our lives, we need some rivals to "push us into the river", leaving us striving ahead in all difficulties and competitions.
In our work, we need some rivals to be picky about us and supervise our work with rigorous requirements and standards.
Due to our rivals, we can bring out our potential to the best; Due to our rivals, we will continuously promote our capabilities when competing with them!
常规解法
完整参考代码如下:
# coding=utf-8
from collections import defaultdict
import re
# 替换除了n't这类连字符外的所有非单词字符和数字字符
def replace(s):
if s.group(1) == 'n\'t':
return s.group(1)
return ' '
def cal(filename='english.txt'):
# 使用lambda来定义简单的函数
dic = defaultdict(lambda: 0)#dic = defaultdict(int)也可以
with open(filename, 'r') as f:
data = f.read()
# 全部变为小写字母
data = data.lower()
# 替换除了n't这类连字符外的所有非单词字符和数字字符
data = re.sub(r'(n[\']t)|([\W\d])', replace, data)
datalist = re.split(r'[\s\n]+', data)
for item in datalist:
dic[item] += 1
del dic['']
return dic
if __name__ == '__main__':
dic = cal()
for key, val in dic.items():
print('%15s ---> 出现了 %s 次' % (key,val))
执行结果部分截图:
以上,是一种比较常规的方法,有没有更简单的方法?并且要求按照出现次数排序显示。
Pythonic的解法:
我在每日一题36期中简单的讲了一下内置模块collections的用法
这里就可以借用里面的Counter方法的功能去完成我们的问题,
即:
Counter用于追踪值的出现次数(列表中每个元素出现的次数)
# coding=utf-8
import re
from collections import Counter
def cal(filename='english.txt'):
with open(filename, 'r') as f:
data = f.read()
data = data.lower().replace(',', '').replace('.', '')
# 替换除了n't这类连字符外的所有非单词字符和数字字符
datalist = re.split(r'[\s\n]+', data)
return Counter(datalist).most_common()
if __name__ == '__main__':
dic = cal()
for i in range(len(dic)):
print('%15s ---> 出现了 %s 次' % (dic[i][0], dic[i][1]))
输出结果部分截图:
以上,便是今天的分享,希望大家喜欢,觉得内容不错的,欢迎点击「在看」支持,谢谢各位。
如需查看更多[Python Every Day]系列,请点击我的主页的【每日一题】菜单。
感谢您的阅读