Python multiprocessing: writing to the same Excel file

I am new to Python and I am trying to save the results of five different processes to one Excel file (each process writes to a different sheet). I have read various posts here, but still can't get it done, as I'm very confused about pool.map, queues, and locks, and I'm not sure what is required here to accomplish this task.

This is my code so far:

list_of_days = ["2017.03.20", "2017.03.21", "2017.03.22", "2017.03.23", "2017.03.24"]

results = pd.DataFrame()

if __name__ == '__main__':
    global list_of_days
    writer = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
    nr_of_cores = multiprocessing.cpu_count()
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(processes=nr_of_cores, initializer=init, initargs=(l,))
    pool.map(f, range(len(list_of_days)))
    pool.close()
    pool.join()

def init(l):
    global lock
    lock = l

def f(k):
    global results
    # *** DO SOME STUFF HERE***
    results = results[ *** finished pandas dataframe *** ]
    lock.acquire()
    results.to_excel(writer, sheet_name=list_of_days[k])
    writer.save()
    lock.release()

The result is that only one sheet gets created in the Excel file (I assume it is from the process that finishes last). Some questions about this code:

How do I avoid defining global variables?

Is it even possible to pass DataFrames around?

Should I move the locking to main instead?

I'd really appreciate some input here, as I consider mastering multiprocessing instrumental. Thanks

Solution

1) Why did you implement time.sleep in several places in your 2nd method?

In __main__, time.sleep(0.1) gives the started process a timeslice to start up.

In f2(fq, q), it gives the queue a timeslice to flush all buffered data to the pipe, since q.get_nowait() is used (see the sketch below for a sleep-free alternative).

In w(q), it was only for testing, to simulate a long run of writer.to_excel(...); I removed that one.
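As an aside, the sleep in f2 can be replaced with an explicit flush: multiprocessing.Queue provides close() and join_thread(), where join_thread() blocks until the queue's feeder thread has pushed all buffered items into the pipe. A minimal sketch of f2's exit path using them (same names as in the worker shown further below):

def f2(fq, q):
    while True:
        try:
            k = fq.get_nowait()
        except queue.Empty:
            # block until the feeder thread has flushed all buffered
            # results into the pipe, then exit cleanly
            q.close()
            q.join_thread()
            return
        # *** DO SOME STUFF HERE***
        results = pd.DataFrame(df_)
        q.put((list_of_days[k], results))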

2) What is the difference between pool.map and pool = [mp.Process( ... )]?

Using pool.map needs no queue and no parameter passing, and the code is shorter. The worker process has to return its result immediately and terminate. pool.map keeps starting new processes until all iterations are done; the results have to be processed after that.

Using pool = [mp.Process( ... )] starts n processes. A process terminates on queue.Empty.

Can you think of a situation where you would prefer one method over the other?

Method 1: quick setup, serialized, only interested in the results so you can continue.

Method 2: if you want the whole workload to run in parallel.

You can't use a global writer in the processes: each worker gets its own copy of the interpreter state, so every worker's writer.save() rewrites myfile.xlsx with only its own sheet, and the last save wins. That is why only one sheet survives. The writer instance has to belong to a single process.

Usage of mp.Pool, for instance:

import multiprocessing as mp
import pandas as pd

def f1(k):
    # *** DO SOME STUFF HERE***
    results = pd.DataFrame(df_)
    return results

if __name__ == '__main__':
    pool = mp.Pool()
    results = pool.map(f1, range(len(list_of_days)))

    writer = pd.ExcelWriter('../test/myfile.xlsx', engine='xlsxwriter')
    for k, result in enumerate(results):
        result.to_excel(writer, sheet_name=list_of_days[k])
    writer.save()
    pool.close()

This leads to .to_excel(...) being called sequentially in the __main__ process.
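On current pandas versions, where ExcelWriter.save() has been removed in favor of close(), the same sequential pattern is usually written with context managers, and concurrent.futures offers the equivalent of mp.Pool. A sketch under those assumptions, reusing f1 and list_of_days from above:

from concurrent.futures import ProcessPoolExecutor
import pandas as pd

if __name__ == '__main__':
    with ProcessPoolExecutor() as ex:
        results = list(ex.map(f1, range(len(list_of_days))))

    # the with block closes (and thereby saves) the workbook on exit
    with pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter') as writer:
        for day, result in zip(list_of_days, results):
            result.to_excel(writer, sheet_name=day)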

If you want the .to_excel(...) calls to run in parallel, you have to use mp.Queue().

For instance:

The worker process:

# mp.Queue exceptions have to be imported from the queue module
try:
    # Python 3
    import queue
except ImportError:
    # Python 2
    import Queue as queue

def f2(fq, q):
    while True:
        try:
            k = fq.get_nowait()
        except queue.Empty:
            exit(0)
        # *** DO SOME STUFF HERE***
        results = pd.DataFrame(df_)
        q.put((list_of_days[k], results))
        time.sleep(0.1)
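This works because f_q is fully loaded before the workers start; otherwise a worker could hit queue.Empty while items are still on the way. A common alternative, shown here as a hypothetical variant, is a blocking get() with one sentinel per worker:

def f2(fq, q):
    while True:
        k = fq.get()             # blocking get: no startup race
        if k is None:            # one None sentinel per worker ends it
            return
        # *** DO SOME STUFF HERE***
        results = pd.DataFrame(df_)
        q.put((list_of_days[k], results))

# in __main__, after enqueueing the real work items:
#     for _ in pool:
#         f_q.put(None)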

The writer process:

def w(q):
    writer = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
    while True:
        try:
            titel, result = q.get()
        except ValueError:
            # the 'STOP' sentinel is a plain string and cannot be
            # unpacked into two names, so it lands here
            writer.save()
            exit(0)
        result.to_excel(writer, sheet_name=titel)
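Relying on the unpacking ValueError to detect the sentinel works, but an explicit check is easier to read. A variant of w(q) under the same assumptions:

def w(q):
    writer = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
    while True:
        item = q.get()
        if item == 'STOP':       # explicit sentinel check
            writer.save()        # writer.close() on pandas >= 2.0
            return
        titel, result = item
        result.to_excel(writer, sheet_name=titel)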

The __main__ process:

if __name__ == '__main__':
    w_q = mp.Queue()
    w_p = mp.Process(target=w, args=(w_q,))
    w_p.start()
    time.sleep(0.1)

    f_q = mp.Queue()
    for i in range(len(list_of_days)):
        f_q.put(i)

    pool = [mp.Process(target=f2, args=(f_q, w_q,)) for p in range(os.cpu_count())]
    for p in pool:
        p.start()
        time.sleep(0.1)

    for p in pool:
        p.join()

    w_q.put('STOP')
    w_p.join()

Tested with Python 3.4.2, pandas 0.19.2 and xlsxwriter 0.9.6.
