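This problem comes up a lot in practice, and I've run into it many times myself. At first I took the naive approach of stuffing the counts into a dict one element at a time, and it turned out to be painfully slow. Later I tried the collections library mentioned in the answer above, which was indeed much faster than the naive approach. But that still isn't the ideal way to do it. Then I found out about numpy, which wraps a lot of functions for scientific computing and has worked well for me every time. Okay, to the point: the short script below shows several ways to solve the OP's problem, along with a rough benchmark of each (note: I'm still a Python beginner and learning as I go, so if anything is wrong, corrections are welcome):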
import collections
import time

import numpy as np


def list_to_dict_count(lst):
    # Naive: lst.count() rescans the whole list for every element, so O(n^2).
    dic = {}
    for i in lst:
        dic[i] = lst.count(i)
    return dic


def list_to_dict(lst):
    # Single pass over the list, bumping a counter per element.
    dic = {}
    for i in lst:
        if i not in dic:
            dic[i] = 1
        else:
            dic[i] += 1
    return dic


def collect(lst):
    # Standard library: Counter is a dict subclass made for counting.
    return dict(collections.Counter(lst))


def unique(lst):
    # numpy: np.unique returns the sorted unique values plus their counts.
    return dict(zip(*np.unique(lst, return_counts=True)))


def generate_data(num=10000000):
    # num random integers in [0, num // 10); integer division keeps the bound an int.
    return np.random.randint(num // 10, size=num)


if __name__ == "__main__":
    lst = list(generate_data())

    t1 = time.time()
    d1 = unique(lst)
    t2 = time.time()
    print("unique took: %.2fs" % (t2 - t1))  # measured on my machine: 2.31s

    t1 = t2
    d2 = collect(lst)
    t2 = time.time()
    print("collect took: %.2fs" % (t2 - t1))  # measured on my machine: 6.05s

    t1 = t2
    d3 = list_to_dict(lst)
    t2 = time.time()
    print("list_to_dict took: %.2fs" % (t2 - t1))  # measured on my machine: 4.73s

    t1 = t2
    d4 = list_to_dict_count(lst)
    t2 = time.time()
    print("list_to_dict_count took: %.2fs" % (t2 - t1))  # too slow, I gave up waiting

    assert d1 == d2
    assert d1 == d3
    assert d1 == d4
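If you want to see all four approaches actually finish, including the lst.count() one, here is a scaled-down sketch of the same benchmark. The function names, the `timed` helper, and the data size are my own choices for illustration and are not from the script above. I also swapped in time.perf_counter, which is intended for measuring intervals, and a seeded generator so that runs are repeatable.

```python
import collections
import time

import numpy as np


def count_with_list_count(lst):
    # Quadratic: every lst.count(x) call scans the entire list again.
    return {x: lst.count(x) for x in lst}


def count_with_loop(lst):
    # One pass, one dict update per element.
    dic = {}
    for x in lst:
        dic[x] = dic.get(x, 0) + 1
    return dic


def count_with_counter(lst):
    return dict(collections.Counter(lst))


def count_with_unique(lst):
    values, counts = np.unique(lst, return_counts=True)
    # tolist() turns NumPy scalars into plain Python ints.
    return dict(zip(values.tolist(), counts.tolist()))


def timed(func, lst):
    # Hypothetical helper: returns the result and the elapsed seconds.
    start = time.perf_counter()
    result = func(lst)
    return result, time.perf_counter() - start


def run_benchmark(num=2000, seed=0):
    # Small num so that even the quadratic version completes quickly.
    rng = np.random.default_rng(seed)
    lst = rng.integers(num // 10, size=num).tolist()
    results = {}
    for func in (count_with_unique, count_with_counter,
                 count_with_loop, count_with_list_count):
        result, elapsed = timed(func, lst)
        results[func.__name__] = result
        print("%s took: %.4fs" % (func.__name__, elapsed))
    return lst, results


if __name__ == "__main__":
    run_benchmark()
```

At this size the timings are too small to rank the fast methods reliably. The point is only that all four return the same dict, and that the lst.count() version falls behind quickly as num grows.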
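Conclusion: make good use of the wheels that others have already built (the standard library, extension libraries, and other packages). I don't know who built numpy, but it really is powerful. I'm also attaching the official link for the numpy.unique function, which has other neat uses too, haha: numpy.unique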
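As for those other uses: besides return_counts, numpy.unique also accepts return_index and return_inverse. Below is a small sketch on a made-up array (the data and variable names are just for illustration), which also shows how to get a dict with plain Python ints as keys rather than NumPy scalars.

```python
import numpy as np

data = np.array([3, 1, 3, 2, 1, 3])

values, first_index, inverse, counts = np.unique(
    data, return_index=True, return_inverse=True, return_counts=True
)

print(values)       # sorted unique values: [1 2 3]
print(first_index)  # index of each value's first occurrence in data: [1 3 0]
print(inverse)      # position of each data element within values: [2 0 2 1 0 2]
print(counts)       # how many times each value occurs: [2 1 3]

# values[inverse] rebuilds the original array.
rebuilt = values[inverse]

# tolist() converts NumPy scalars to plain Python ints before building the dict.
count_dict = dict(zip(values.tolist(), counts.tolist()))
print(count_dict)   # {1: 2, 2: 1, 3: 3}
```

Note that the unique values come back sorted, so the dict built this way is ordered by key, not by first appearance in the list.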