Adding elements to a Python dict in a loop: an efficient way to count elements in a dictionary using a loop in Python

I have a list of values. While looping over it, I want to count the number of elements in each class (i.e. 1, 2, 3, 4, 5).

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

mydict = dict()

for index in mylist:
    mydict[index] = +1

mydict

Out[344]: {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}

I want to get this result instead:

{1: 6, 2: 5, 3: 3, 4: 1, 5: 4}

Solution
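The loop in the question stores 1 for every key because `mydict[index] = +1` is a plain assignment of the expression `+1` (unary plus), not an increment. As a minimal sketch of a direct fix, assuming the goal is to keep the original loop shape, the old count can be read with dict.get before adding 1:

```python
mylist = [1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5]

# `mydict[index] = +1` assigns the value +1 on every pass, so each key
# ends up as 1. Reading the previous count first makes it an increment.
mydict = {}
for index in mylist:
    mydict[index] = mydict.get(index, 0) + 1  # 0 if the key is new

print(mydict)
```

This is the same approach as the dict.get variant timed further down.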

For your smaller example, with a limited diversity of elements, you can use a set and a dict comprehension:

>>> mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

>>> {k:mylist.count(k) for k in set(mylist)}

{1: 6, 2: 5, 3: 3, 4: 1, 5: 4}

To break it down, set(mylist) uniquifies the list and makes it more compact:

>>> set(mylist)

{1, 2, 3, 4, 5}

Then the dictionary comprehension steps through the unique values and sets the count from the list.
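To make that step concrete, here is a sketch of the same comprehension written as an explicit loop. The function name count_with_set is made up for illustration and is not part of the original answer:

```python
mylist = [1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5]

def count_with_set(values):
    """Explicit-loop equivalent of {k: values.count(k) for k in set(values)}."""
    counts = {}
    for k in set(values):            # one iteration per distinct value
        counts[k] = values.count(k)  # each count() call scans the whole list
    return counts

result = count_with_set(mylist)
print(result)
```

Note that the list is scanned once per distinct value, which is cheap here (five scans of a short list) but matters for the larger input discussed later.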

For a small list like this one, with only a few distinct values, this is also significantly faster than using Counter and faster than using setdefault in the timings below:

from __future__ import print_function
from collections import Counter
from collections import defaultdict
import random

mylist=[1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]*10

def s1(mylist):
    # set and dict comprehension
    return {k:mylist.count(k) for k in set(mylist)}

def s2(mylist):
    # Counter
    return Counter(mylist)

def s3(mylist):
    # setdefault
    mydict=dict()
    for index in mylist:
        mydict[index] = mydict.setdefault(index, 0) + 1
    return mydict

def s4(mylist):
    # fromkeys / count
    mydict={}.fromkeys(mylist,0)
    for k in mydict:
        mydict[k]=mylist.count(k)
    return mydict

def s5(mylist):
    # dict.get
    mydict={}
    for k in mylist:
        mydict[k]=mydict.get(k,0)+1
    return mydict

def s6(mylist):
    # defaultdict
    mydict=defaultdict(int)
    for i in mylist:
        mydict[i] += 1
    return mydict

def s7(mylist):
    # fromkeys / loop
    mydict={}.fromkeys(mylist,0)
    for e in mylist:
        mydict[e]+=1
    return mydict

if __name__ == '__main__':
    import timeit
    n=1000000
    print(timeit.timeit("s1(mylist)", setup="from __main__ import s1, mylist",number=n))
    print(timeit.timeit("s2(mylist)", setup="from __main__ import s2, mylist, Counter",number=n))
    print(timeit.timeit("s3(mylist)", setup="from __main__ import s3, mylist",number=n))
    print(timeit.timeit("s4(mylist)", setup="from __main__ import s4, mylist",number=n))
    print(timeit.timeit("s5(mylist)", setup="from __main__ import s5, mylist",number=n))
    print(timeit.timeit("s6(mylist)", setup="from __main__ import s6, mylist, defaultdict",number=n))
    print(timeit.timeit("s7(mylist)", setup="from __main__ import s7, mylist",number=n))

On my machine that prints (Python 3):

18.123854104997008 # set and dict comprehension

78.54796334600542 # Counter

33.98185228800867 # setdefault

19.0563529439969 # fromkeys / count

34.54294775899325 # dict.get

21.134678319009254 # defaultdict

22.760544238000875 # fromkeys / loop

For larger lists, such as 10 million integers with more diverse elements (about 1,500 distinct random ints), use defaultdict or fromkeys in a loop:

from __future__ import print_function
from collections import Counter
from collections import defaultdict
import random

mylist = [random.randint(0,1500) for _ in range(10000000)]

def s1(mylist):
    # set and dict comprehension
    return {k:mylist.count(k) for k in set(mylist)}

def s2(mylist):
    # Counter
    return Counter(mylist)

def s3(mylist):
    # setdefault
    mydict=dict()
    for index in mylist:
        mydict[index] = mydict.setdefault(index, 0) + 1
    return mydict

def s4(mylist):
    # fromkeys / count
    mydict={}.fromkeys(mylist,0)
    for k in mydict:
        mydict[k]=mylist.count(k)
    return mydict

def s5(mylist):
    # dict.get
    mydict={}
    for k in mylist:
        mydict[k]=mydict.get(k,0)+1
    return mydict

def s6(mylist):
    # defaultdict
    mydict=defaultdict(int)
    for i in mylist:
        mydict[i] += 1
    return mydict

def s7(mylist):
    # fromkeys / loop
    mydict={}.fromkeys(mylist,0)
    for e in mylist:
        mydict[e]+=1
    return mydict

if __name__ == '__main__':
    import timeit
    n=1
    print(timeit.timeit("s1(mylist)", setup="from __main__ import s1, mylist",number=n))
    print(timeit.timeit("s2(mylist)", setup="from __main__ import s2, mylist, Counter",number=n))
    print(timeit.timeit("s3(mylist)", setup="from __main__ import s3, mylist",number=n))
    print(timeit.timeit("s4(mylist)", setup="from __main__ import s4, mylist",number=n))
    print(timeit.timeit("s5(mylist)", setup="from __main__ import s5, mylist",number=n))
    print(timeit.timeit("s6(mylist)", setup="from __main__ import s6, mylist, defaultdict",number=n))
    print(timeit.timeit("s7(mylist)", setup="from __main__ import s7, mylist",number=n))

Prints:

2825.2697427899984 # set and dict comprehension

42.607481333994656 # Counter

22.77713537499949 # setdefault

2853.11187016801 # fromkeys / count

23.241977066005347 # dict.get

15.023175164998975 # defaultdict

18.28165417900891 # fromkeys / loop

You can see that the solutions that rely on count, which rescans the entire list once per distinct value, suffer catastrophically on the large list compared with the solutions that make a single pass over it.
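A rough way to see why is to compare how many list elements each style has to visit. The sketch below is an illustration only: the function names, the fixed seed, and the reduced sample size of 20,000 are assumptions chosen so that it runs quickly, and the visit counts are estimates rather than measured timings.

```python
import random
from collections import defaultdict

def count_based(values):
    """One full list scan per distinct value (the style of s1 and s4)."""
    return {k: values.count(k) for k in set(values)}

def single_pass(values):
    """One visit per element (the style of s6)."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    return dict(counts)

random.seed(0)  # fixed seed so repeated runs use the same sample
sample = [random.randint(0, 1500) for _ in range(20000)]

distinct = len(set(sample))
# Estimated element visits: count() scans the whole list for each key.
visits_count_based = distinct * len(sample)
visits_single_pass = len(sample)

# Both styles must agree on the counts; only the amount of work differs.
same_result = count_based(sample) == single_pass(sample)
print(distinct, visits_count_based // visits_single_pass, same_result)
```

The count-based work grows with the number of distinct values times the list length, while the single-pass work grows with the list length alone, which is consistent with the gap in the timings above.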
