Thread-safe dictionaries in Python: optimizing the parsing of a large Python dictionary with multithreading

Let's take a small example Python dictionary, where the values are lists of integers.

example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821],
                 'key2': [754, 915, 622, 149, 279, 192, 312, 203, 742, 846],
                 'key3': [586, 521, 470, 476, 693, 426, 746, 733, 528, 565]}

Let's say I need to parse the values of the lists, which I've implemented in the following function:

def manipulate_values(input_list):
    return_values = []
    for i in input_list:
        new_value = i ** 2 - 13
        return_values.append(new_value)
    return return_values

Now, I can easily parse the values of this dictionary as follows:

for key, value in example_dict1.items():
    example_dict1[key] = manipulate_values(value)

resulting in the following:

example_dict1 = {'key1': [134676, 887, 717396, 232311, 786756, 427703, 120396, 254003, 170556, 674028],
                 'key2': [568503, 837212, 386871, 22188, 77828, 36851, 97331, 41196, 550551, 715703],
                 'key3': [343383, 271428, 220887, 226563, 480236, 181463, 556503, 537276, 278771, 319212]}

That works very well for small dictionaries.

My problem is, I have a massive dictionary with millions of keys and long lists. If I were to apply the above approach, the algorithm would be prohibitively slow.

How could I optimize the above?

(1) Multithreading: are there more efficient options for multithreading this for loop over the dictionary besides the traditional threading module?

(2) Would a better data structure be appropriate?

I'm asking because I'm quite stuck on how best to proceed in this case. I don't see a better data structure than a dictionary, but the for loops across the dictionary (and then across the value lists) are quite slow. There may be something out there that has been designed to be faster.

EDIT: As you can imagine, this is somewhat of a toy example: the function in question is a bit more complicated than x**2 - 13.

I'm more interested in how to work with a dictionary with millions of keys and long lists of values.

Solution

If you can store everything inside a numpy array, processing will be faster. I multiplied the length of each list by 500,000 to test scalability, and these are my results:

from timeit import timeit
import numpy as np

n = 500000
example_dict1 = {'key1': [367, 30, 847, 482, 887, 654, 347, 504, 413, 821]*n,
                 'key2': [754, 915, 622, 149, 279, 192, 312, 203, 742, 846]*n,
                 'key3': [586, 521, 470, 476, 693, 426, 746, 733, 528, 565]*n}

def manipulate_values(input_list):
    return_values = []
    for i in input_list:
        new_value = i ** 2 - 13
        return_values.append(new_value)
    return return_values

With your method:

for_with_dictionary = timeit("""
for key, value in example_dict1.items():
    example_dict1[key] = manipulate_values(value)
""", "from __main__ import example_dict1, manipulate_values", number=5)

print(for_with_dictionary)

>>> 33.2095841

With numpy:

numpy_broadcasting = timeit("""
array = np.array(list(example_dict1.values()))
array = array ** 2 - 13
""", "from __main__ import example_dict1, np", number=5)

print(numpy_broadcasting)

>>> 5.039885

That is a significant speedup: roughly 6.6x (33.2 s versus 5.0 s).
