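Sorting is usually done with the built-in sorted function, but since Counter.most_common can return the top n entries, it is worth implementing both and comparing their efficiency.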
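As a quick illustration of the two approaches being compared, here is a minimal sketch on a small hand-made dictionary. The `counts` data is hypothetical and the snippet uses Python 3 syntax, so it is not part of the benchmark script itself.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample data: key -> number of occurrences
counts = {"a": 3, "b": 1, "c": 5, "d": 2}

# Top two entries by count via Counter.most_common
top_counter = Counter(counts).most_common(2)

# The same top two via sorted on (key, count) pairs, largest count first
top_sorted = sorted(counts.items(), key=itemgetter(1), reverse=True)[:2]

print(top_counter)
print(top_sorted)
```

Both calls return a list of (key, count) tuples ordered by count, which is why the two can be benchmarked against each other.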
# coding: utf-8
import copy
import random
import sys
import time
from collections import Counter
from collections import defaultdict
from operator import itemgetter

# Python 2 only: force the default string encoding to UTF-8
reload(sys)
sys.setdefaultencoding("utf-8")

thresh = 100000  # number of random draws
mapper = defaultdict(int)
random_creator = random.Random()
const = 10000  # draws are bucketed into integers in [0, const)
for index in xrange(thresh):
    # random() returns a float in [0.0, 1.0)
    random_number = random_creator.random()
    mapper[int(random_number * const)] += 1
# Give each approach its own copy of the counts
mapper_1 = copy.deepcopy(mapper)
mapper_2 = copy.deepcopy(mapper)

# counter (the timed region includes building the Counter and printing)
start = time.time()
counter = Counter(mapper_1)
print counter.most_common(const)
print str(time.time() - start)

# sorted (the timed region includes printing)
start = time.time()
print sorted(mapper_2.iteritems(), key=itemgetter(1), reverse=True)
print str(time.time() - start)

On a 6500 machine, the output is as follows:
[(9298, 25), (647, 22), (848, 22), (1365, 22), (2036, 22), (3566, 22)……
0.150000095367
[(9298, 25), (647, 22), (848, 22), (1365, 22), (2036, 22), (3566, 22)……
0.138999938965
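It looks like sorted still comes out slightly ahead, by about 0.011 seconds in this single run. Note that the Counter timing also includes constructing the Counter, and both timings include printing. The code is provided for reference; thanks for reading.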
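The figures above come from a single run whose timed regions include printing, so a repeated measurement may give a clearer picture. The following is a rough Python 3 sketch of such a measurement using the standard timeit module. The fixed seed, the function names `by_most_common` and `by_sorted`, and the repeat count are assumptions for illustration and are not taken from the script above.

```python
import random
import timeit
from collections import Counter
from operator import itemgetter

THRESH = 100000  # number of random draws, as in the script above
CONST = 10000    # number of buckets, as in the script above

# Fixed seed (an assumption) so repeated runs use the same data
rng = random.Random(0)
mapper = Counter(int(rng.random() * CONST) for _ in range(THRESH))


def by_most_common():
    # Ask Counter for up to CONST entries, highest count first
    return mapper.most_common(CONST)


def by_sorted():
    # Sort all (key, count) pairs by count, highest count first
    return sorted(mapper.items(), key=itemgetter(1), reverse=True)


# Time only the ordering step, repeated several times, with no printing inside
REPEATS = 20
t_counter = timeit.timeit(by_most_common, number=REPEATS)
t_sorted = timeit.timeit(by_sorted, number=REPEATS)

print("most_common: %.6f s per call" % (t_counter / REPEATS))
print("sorted:      %.6f s per call" % (t_sorted / REPEATS))
```

Results will vary by machine and Python version, so treat any difference between the two numbers as indicative rather than conclusive.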