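Counting Words in Python
Problem
Implement the function "count_words()" in Python, which takes a string "s" and a number "n" as input and returns the "n" most frequently occurring words in "s". The return value should be a list of tuples, the "n" most frequent words paired with their counts, "[(<word>, <count>), (<word>, <count>), …]", sorted in descending order of count.
You may assume that all input is lowercase and contains no punctuation or other characters (only letters and single separating spaces). In case of a tie, order the tied words alphabetically.
For example: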
print(count_words("betty bought a bit of butter but the butter was bitter", 3))
Output:
[('butter', 2), ('a', 1), ('betty', 1)]
Code:
"""Count words."""
from collections import Counter


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # TODO: Count the number of occurrences of each word in s
    res_c = Counter(s.split())
    res_l = res_c.items()
    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)
    # Sort alphabetically first, then by count. The second sort is stable,
    # so words with equal counts stay in alphabetical order.
    res_alphabet = sorted(res_l, key=lambda x: x[0])
    res_time = sorted(res_alphabet, key=lambda x: x[1], reverse=True)
    top_n = res_time[0:n]
    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    return top_n


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()
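The solution uses two sorting passes to implement a compound ordering rule, where one key is ascending (the word) and the other is descending (the count).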
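As an aside, the same ordering can probably be reached with a single sort when the descending key is numeric, because a number can be negated. The sketch below is an alternative to the solution above, not a replacement for it; the name count_words_single_sort is hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_words_single_sort(s, n):
    """Return the n most frequent words in s, ties broken alphabetically."""
    counts = Counter(s.split())
    # Negating the count makes an ascending sort put higher counts first,
    # while the word (second element of the key) breaks ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


print(count_words_single_sort("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
```

This trick only applies when the descending key can be negated. For non-numeric keys, the two-pass approach is the more general tool.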
Starting with Python 2.2, sorts are guaranteed to be stable. This means that when multiple records have the same key, their original order is preserved.
IN:from operator import itemgetter
IN:data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
IN:sorted(data, key=itemgetter(0))
OUT:[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
Here two records contain 'blue', and ('blue', 1) comes before ('blue', 2) in the original list; that order is preserved after sorting.
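Stability also holds when reverse=True is passed: records with equal keys keep their original relative order rather than being flipped. That is why the second pass in count_words() leaves tied words in alphabetical order. A minimal sketch, reusing the same sample data (the variable name descending is introduced here for illustration):

```python
from operator import itemgetter

data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

# Sort by colour in descending order. Within each colour, the records
# keep the order they had in the input list.
descending = sorted(data, key=itemgetter(0), reverse=True)
print(descending)
# [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
```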
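This very useful property lets you build a complex sort from a series of sorting steps, some ascending and some descending. For example, to sort the student data by descending grade and then by ascending age, sort on age first and then on grade: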
>>> from operator import attrgetter
>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
...
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
>>> s
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
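The "secondary key first, primary key last" pattern can be wrapped in a small helper that takes any number of keys. The following is a sketch under assumptions: the name multisort, its specs parameter, and the namedtuple stand-in for the Student class are all illustrative choices, not part of the original solution.

```python
from collections import namedtuple
from operator import attrgetter

# Lightweight stand-in for the Student class, used only in this sketch.
Student = namedtuple("Student", ["name", "grade", "age"])


def multisort(records, specs):
    """Sort records by several attributes.

    specs is a list of (attribute, reverse) pairs ordered from the
    primary key to the least significant key.
    """
    result = list(records)
    # Apply the least significant key first; each later stable sort
    # keeps the order established by the earlier ones among ties.
    for attribute, reverse in reversed(specs):
        result.sort(key=attrgetter(attribute), reverse=reverse)
    return result


students = [
    Student('john', 'A', 15),
    Student('jane', 'B', 12),
    Student('dave', 'B', 10),
]
ordered = multisort(students, [("grade", True), ("age", False)])
print([tuple(s) for s in ordered])
# [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
```

The helper copies the input before sorting, so the caller's list is left unchanged.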