Thread-safe dictionaries in Python: optimizing the parsing of a large Python dictionary with multithreading

Let's take a small example Python dictionary, where the values are lists of integers.

example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821],
                 'key2': [754, 915, 622, 149, 279, 192, 312, 203, 742, 846],
                 'key3': [586, 521, 470, 476, 693, 426, 746, 733, 528, 565]}

Let's say I need to parse the values of the lists, which I've implemented in the following function:

def manipulate_values(input_list):
    return_values = []
    for i in input_list:
        new_value = i ** 2 - 13
        return_values.append(new_value)
    return return_values

Now, I can easily parse the values of this dictionary as follows:

for key, value in example_dict1.items():
    example_dict1[key] = manipulate_values(value)

resulting in the following:

example_dict1 = {'key1': [134676, 887, 717396, 232311, 786756, 427703, 120396, 254003, 170556, 674028],
                 'key2': [568503, 837212, 386871, 22188, 77828, 36851, 97331, 41196, 550551, 715703],
                 'key3': [343383, 271428, 220887, 226563, 480236, 181463, 556503, 537276, 278771, 319212]}

That works very well for small dictionaries.

My problem is, I have a massive dictionary with millions of keys and long lists. If I were to apply the above approach, the algorithm would be prohibitively slow.

How could I optimize the above?

(1) Multithreading: are there more efficient options for multithreading this for loop over the dictionary besides the traditional threading module?

(2) Would a better data structure be appropriate?

I'm asking because I'm quite stuck on how best to proceed in this case. I don't see a better data structure than a dictionary, but the for loops across the dictionary (and then across the value lists) are quite slow. There may be something out there that has been designed to be faster.

EDIT: As you can imagine, this is somewhat of a toy example: the function in question is a bit more complicated than x**2 - 13.

I'm more interested in how to work with a dictionary with millions of keys and long lists of values.

Solution

If you can store everything inside a numpy array, processing will be faster. I multiplied the length of each list by 500,000 to test scalability, and these are my results:

from timeit import timeit
import numpy as np

n = 500000
example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821]*n,
                 'key2': [754, 915, 622, 149, 279, 192, 312, 203, 742, 846]*n,
                 'key3': [586, 521, 470, 476, 693, 426, 746, 733, 528, 565]*n}

def manipulate_values(input_list):
    return_values = []
    for i in input_list:
        new_value = i ** 2 - 13
        return_values.append(new_value)
    return return_values

With your method:

for_with_dictionary = timeit("""
for key, value in example_dict1.items():
    example_dict1[key] = manipulate_values(value)
""", "from __main__ import example_dict1, manipulate_values", number=5)

print(for_with_dictionary)

>>> 33.2095841

With numpy:

numpy_broadcasting = timeit("""
array = np.array(list(example_dict1.values()))
array = array ** 2 - 13
""", "from __main__ import example_dict1, np", number=5)

print(numpy_broadcasting)

>>> 5.039885

That is a significant speedup: roughly 6.6x (33.2 s versus 5.0 s).
