Python multiprocessing processes go to sleep after a while

I have a script that walks through a directory and searches all files with a given extension (e.g. .xml) for given strings, replacing them. To achieve this I used the Python multiprocessing library.
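(The per-file function isn't shown in the question; as a reference for the snippets below, here is a minimal sketch of what such a check_file could look like. The OLD/NEW search-and-replace pair is a hypothetical placeholder.)

def check_file(path):
    """Hypothetical per-file job: replace OLD with NEW, return the hit count."""
    OLD, NEW = 'old-string', 'new-string'  # placeholder search/replace pair
    with open(path) as f:
        text = f.read()
    count = text.count(OLD)
    if count:
        with open(path, 'w') as f:
            f.write(text.replace(OLD, NEW))
    return count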

As an example I am using 1,100 .xml files totalling around 200 MB of data. The complete run takes 8 minutes on my 2015 15" MacBook Pro.

But after a few minutes, one process after another goes to sleep, which I can see in "top" (here after about 7 minutes):

top output

PID   COMMAND  %CPU  TIME      #TH  #WQ  #PORT  MEM    PURG  CMPR  PGRP  PPID  STATE     BOOSTS  %CPU_ME  %CPU_OTHRS
1007  Python    0.0  07:03.51  1    0    7      5196K  0B    0B    998   998   sleeping  *0[1]   0.00000  0.00000
1006  Python   99.8  07:29.07  1/1  0    7      4840K  0B    0B    998   998   running   *0[1]   0.00000  0.00000
1005  Python    0.0  02:10.02  1    0    7      4380K  0B    0B    998   998   sleeping  *0[1]   0.00000  0.00000
1004  Python    0.0  04:24.44  1    0    7      4624K  0B    0B    998   998   sleeping  *0[1]   0.00000  0.00000
1003  Python    0.0  04:25.34  1    0    7      4572K  0B    0B    998   998   sleeping  *0[1]   0.00000  0.00000
1002  Python    0.0  04:53.40  1    0    7      4612K  0B    0B    998   998   sleeping  *0[1]   0.00000  0.00000

So now only one process is doing all the work, while the others went to sleep after about 4 minutes.

Code snippet

import multiprocessing

# set the pool size to the number of cores in the computer
pool_size = multiprocessing.cpu_count()

# create the pool
pool = multiprocessing.Pool(processes=pool_size)

# give the pool the function and the input data - here once per file in file_list
pool_outputs = pool.map(check_file, file_list)

# no more tasks: close the pool and wait for all workers to finish
pool.close()
pool.join()

So why do all the processes go to sleep?

My guess: the file list is split up front among the workers in the pool (the same number of files each), and a few workers are just "lucky" enough to get the small files, and therefore finish earlier. Can this be true? I was expecting it to work more like a queue, so that every worker gets a new file when it finishes its current one, until the list is empty.
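(That guess is essentially right: Pool.map splits the iterable into chunks up front, with a default chunksize of roughly len(file_list) / (4 * pool_size), so a worker that draws a chunk of small files finishes early and sleeps. A minimal sketch of two ways to get queue-like scheduling from the same Pool, reusing the pool, check_file and file_list names from the snippet above:)

# hand out one file at a time instead of pre-splitting the list into chunks
pool_outputs = pool.map(check_file, file_list, chunksize=1)

# or: pull tasks one by one and collect results in completion order
pool_outputs = list(pool.imap_unordered(check_file, file_list, chunksize=1))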

Solution

As @Felipe-Lema pointed out, it is a classic case of RTFM.

I reworked that part of the script to use multiprocessing Queues instead of a Pool, which also improved the runtime:

import multiprocessing
from multiprocessing import Process, Queue


def check_files(file_list):
    """Checks and replaces lines in files.

    @param file_list: list of files to search
    @return: list of per-file results (number of occurrences)
    """
    # as many workers as CPUs are available (HT included)
    workers = multiprocessing.cpu_count()

    # create two queues: one for files, one for results
    work_queue = Queue()
    done_queue = Queue()
    processes = []

    # add every file to the work queue
    for filename in file_list:
        work_queue.put(filename)

    # start the worker processes
    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        # one sentinel per worker, so every worker eventually sees 'STOP'
        work_queue.put('STOP')

    # wait until all processes have finished
    # (note: for very large result sets, drain done_queue before joining,
    # or the workers can block on a full queue and join() never returns)
    for p in processes:
        p.join()

    done_queue.put('STOP')

    # collect the results and return them
    results = []
    for status in iter(done_queue.get, 'STOP'):
        if status is not None:
            results.append(status)

    return results
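(The worker function itself isn't shown in the answer; a minimal sketch of what it could look like, assuming check_file takes one filename and returns a count, as in the question:)

def worker(work_queue, done_queue):
    """Hypothetical worker: process filenames until the 'STOP' sentinel."""
    for filename in iter(work_queue.get, 'STOP'):
        # check_file is the per-file search-and-replace from the question
        done_queue.put(check_file(filename))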
