python dict初始化大小_是否可以给出一个python dict的初始容量(并且它是有用的)...

I am filling a python dict with around 10,000,000 items. My understanding of dict (or hashtables) is that when too much elements get in them, the need to resize, an operation that cost quite some time.

Is there a way to say to a python dict that you will be storing at least n items in it, so that it can allocate memory from the start? Or will this optimization not do any good to my running speed?

(And no, I have not checked that the slowness of my small script is because of this, I actually wouldn't now how to do that. This is however something I would do in Java, set the initial capacity of the HashSet right)

解决方案

First off, I've heard rumor that you can set the size of a dictionary at initialization, but I have never seen any documentation or PEP describing how this would be done.

With this in mind I ran an analysis on your quantity of items, described below. While it may take some time to resize the dictionary each time I would recommend moving ahead without worrying about it, at least until you can test its performance.

The two rules that concern us in determining resizing is number of elements and factor of resizing. A dictionary will resize itself when it is 2/3 full on the addition of the element putting it over the 2/3 mark. Below 50,000 elements it will increase by a factor of 4, above that amount by a factor of 2. Using your estimate of 10,000,000 elements (between 2^23 and 2^24) your dictionary will resize itself 15 times (7 times below 50k, 8 times above). Another resize would occur just past 11,100,000.

Resizing and replacing the current elements in the hashtable does take some time, but I wonder if you'd notice it with whatever else you have going on in the code nearby. I just put together a timing suite comparing inserts at five places along each boundary from dictionary sizes of 2^3 through 2^24, and the "border" additions average 0.4 nanoseconds longer than the "non-border" additions. This is 0.17% longer... probably acceptable. The minimum for all operations was 0.2085 microseconds, and max was 0.2412 microseconds.

Hope this is insightful, and if you do check the performance of your code please follow-up with an edit! My primary resource for dictionary internals was the splendid talk given by Brandon Rhodes at PyCon 2010: The Mighty Dictionary

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值