Python performance solutions: Python multiprocessing performance

This should be my third and final question regarding my attempts to increase the performance of some statistical analysis that I am doing with Python. I have two versions of my code (single core vs. multiprocessing). Since the code has to decompress/unpack quite a few binary strings, I expected to gain performance by using multiple cores; sadly, I noticed that performance actually decreased when using multiple cores.

Does anyone have a possible explanation for what I am observing? (Scroll down to the April 16th update for more information.)

The key part of the program is the function numpy_array (plus decode in the multiprocessing version); a code snippet is shown below (the full code is accessible via pastebin, further below):

def numpy_array(data, peaks):
    rt_counter = 0
    for x in peaks:
        if rt_counter % (len(peaks) / 20) == 0:
            update_progress()
        peak_counter = 0
        # each entry in peaks is a base64-encoded buffer of big-endian 32-bit values
        data_buff = base64.b64decode(x)
        buff_size = len(data_buff) / 4
        unpack_format = ">%dL" % buff_size
        index = 0
        for y in struct.unpack(unpack_format, data_buff):
            # reinterpret the unsigned int's bit pattern as a float
            buff1 = struct.pack("I", y)
            buff2 = struct.unpack("f", buff1)[0]
            # even positions fill column 0 of a pair, odd positions column 1
            if index % 2 == 0:
                data[rt_counter][1][peak_counter][0] = float(buff2)
            else:
                data[rt_counter][1][peak_counter][1] = float(buff2)
                peak_counter += 1
            index += 1
        rt_counter += 1
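For reference, the inner pack/unpack loop amounts to reading the decoded buffer as big-endian 32-bit floats and splitting the values into pairs. Below is a minimal vectorized sketch of that idea with numpy; it is my own reformulation (the decode_one_spectrum name is not from the original code), and it assumes the byte layout described above:

import base64
import numpy as np

def decode_one_spectrum(encoded):
    # Vectorized equivalent of the inner loop (assumption about intent):
    # the >L unpack followed by a native pack/unpack round trip is the same
    # as reading the bytes as big-endian 32-bit floats.
    values = np.frombuffer(base64.b64decode(encoded), dtype=">f4")
    # even indices -> column 0, odd indices -> column 1, one row per peak
    return values.reshape(-1, 2)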

The multiprocessing version performs this with a set of functions; I will show the two key ones below:

def tonumpyarray(mp_arr):
    # wrap the shared multiprocessing array in a numpy array (no copy)
    return np.frombuffer(mp_arr.get_obj())

def numpy_array(shared_arr, peaks):
    processors = mp.cpu_count()
    with contextlib.closing(mp.Pool(processes=processors,
                                    initializer=pool_init,
                                    initargs=(shared_arr, ))) as pool:
        chunk_size = int(len(peaks) / processors)
        map_parameters = []
        for i in range(processors):
            counter = i * chunk_size
            chunk = peaks[i * chunk_size:(i + 1) * chunk_size]
            map_parameters.append((chunk, counter))
        pool.map(decode, map_parameters)

def decode((chunk, counter)):  # Python 2 tuple-parameter syntax
    # re-wrap the shared array as a structured numpy view
    # (the dtype specification is cut off in this excerpt)
    data = tonumpyarray(shared_arr).view(
        [("f0","
    for x in chunk:
        peak_counter = 0
        data_buff = base64.b64decode(x)
        buff_size = len(data_buff) / 4
        unpack_format = ">%dL" % buff_size
        index = 0
        for y in struct.unpack(unpack_format, data_buff):
            buff1 = struct.pack("I", y)
            buff2 = struct.unpack("f", buff1)[0]
            #with shared_arr.get_lock():
            if index % 2 == 0:
                data[counter][1][peak_counter][0] = float(buff2)
            else:
                data[counter][1][peak_counter][1] = float(buff2)
                peak_counter += 1
            index += 1
        counter += 1
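The pool_init initializer is not shown in this excerpt (it is in the pastebin). A typical implementation, assumed here, simply stashes the shared array in a module-level global so that decode can reach it in each worker:

def pool_init(shared_arr_):
    # Runs once in each worker process: store the inherited shared array
    # in a module-level global so decode() can access it.
    global shared_arr
    shared_arr = shared_arr_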

The full program code can be accessed via these pastebin links:

The performance that I am observing with a file containing 239 timepoints and ~180k measurement pairs per timepoint is ~2.5 minutes for single core and ~3.5 minutes for multiprocessing.

PS: The two previous questions (from my first ever attempts at parallelization):

-- April 16th --

I have been profiling my program with the cProfile library (calling cProfile.run("main()") in __main__), which shows that there is one step slowing everything down:

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    23   85.859    3.733   85.859    3.733  {method 'acquire' of 'thread.lock' objects}

The thing that I do not understand here is that thread.lock objects are used in threading (to my understanding) but should not show up in multiprocessing, since each process runs a single thread (besides having its own locking mechanism). So how does this occur, and why does a single call take 3.7 seconds?
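One way to check whether that time is real work or just the parent process waiting on the pool is to profile inside the workers themselves. Below is a minimal sketch, not part of the original post; the decode_profiled wrapper and the output filenames are made up for illustration and assume the decode function shown above:

import cProfile
import os

def decode_profiled(args):
    # Hypothetical wrapper: run decode under its own profiler inside the worker
    # process and dump one stats file per worker, so time is attributed to the
    # process that actually does the work.
    profiler = cProfile.Profile()
    result = profiler.runcall(decode, args)
    profiler.dump_stats("decode_worker_%d.prof" % os.getpid())
    return result

# In numpy_array(), map the wrapper instead of decode:
#     pool.map(decode_profiled, map_parameters)
# Afterwards, inspect each dump with:
#     python -m pstats decode_worker_<pid>.prof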

Solution

Shared data is a well-known cause of slowdowns due to synchronization.

Can you split your data among processes, or give each process an independent copy? Then your processes would not need to synchronize anything up until the moment when all calculations are done.

Then I"d let the master process join the output of all worker processors into one coherent set.

The approach may take extra RAM, but RAM is cheap nowadays.

If you ask, I"m also puzzled by 3700 ms per thread lock acquisition. OTOH profiling may be mistaken about special calls like this.
