I'm trying to implement multiprocessing for this loop. It fails to modify the array and does not seem to run the jobs in order (it returns the array before the last function has finished).
import multiprocessing
import numpy

def func(i, array):
    array[i] = i**2
    print(i**2)

def main(n):
    array = numpy.zeros(n)
    if __name__ == '__main__':
        jobs = []
        for i in range(0, n):
            p = multiprocessing.Process(target=func, args=(i, array))
            jobs.append(p)
            p.start()
    return array

print(main(10))
Solution
Processes do not share memory. Your program first creates an array full of zeroes, then starts 10 processes; each one calls func on a copy of the array as it was when the process started, never on the original array.
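To see this concretely, here is a minimal sketch (the array size and index are arbitrary, chosen just for the demo) showing that a child process mutates only its own copy, while the parent's array stays untouched:

```python
import multiprocessing
import numpy

def func(i, array):
    # This writes into the child process's copy of the array only.
    array[i] = i ** 2

if __name__ == '__main__':
    array = numpy.zeros(5)
    p = multiprocessing.Process(target=func, args=(2, array))
    p.start()
    p.join()
    # The parent's array is unchanged; the write happened in the child's copy.
    print(array)  # still all zeros
```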
It seems like what you're really trying to accomplish is this:
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array

def modify_array(index, sharedarray):
    sharedarray[index] = index ** 2
    print([x for x in sharedarray])

def main(n):
    lock = Lock()
    array = Array('i', n, lock=lock)  # shared array of n ints, not hard-coded 10
    if __name__ == '__main__':
        for i in range(0, n):
            p = Process(target=modify_array, args=(i, array))
            p.start()
            p.join()
    return list(array)

main(10)
Output:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 4, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 4, 9, 0, 0, 0, 0, 0, 0]
[0, 1, 4, 9, 16, 0, 0, 0, 0, 0]
[0, 1, 4, 9, 16, 25, 0, 0, 0, 0]
[0, 1, 4, 9, 16, 25, 36, 0, 0, 0]
[0, 1, 4, 9, 16, 25, 36, 49, 0, 0]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 0]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
That said, multiprocessing may be the wrong tool here: spawning an additional process carries far more overhead than starting a new thread, or even than staying single-threaded and using an event loop to trigger actions.
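For comparison, here is the same loop with threads (a sketch, not a benchmark): threads share the parent's memory, so the NumPy array really is modified in place, and starting a thread is much cheaper than spawning a process.

```python
import threading
import numpy

def func(i, array):
    # Threads share the process's memory, so this write is visible to the caller.
    array[i] = i ** 2

def main(n):
    array = numpy.zeros(n)
    threads = [threading.Thread(target=func, args=(i, array)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before returning the array
    return array

print(main(10))
```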
An example of using concurrency within a single Python process (an asyncio event loop dispatching the work to an executor behind the scenes) may look like the following:
import numpy as np
from asyncio import get_event_loop, wait, ensure_future

def modify_array(index, array):
    array[index] = index ** 2
    print([x for x in array])

async def task(loop, function, index, array):
    await loop.run_in_executor(None, function, index, array)

def main(n):
    loop = get_event_loop()
    jobs = list()
    array = np.zeros(n)  # size the array from n, not a hard-coded 10
    for i in range(0, n):
        jobs.append(
            ensure_future(
                task(loop, modify_array, i, array)
            )
        )
    loop.run_until_complete(wait(jobs))
    loop.close()

main(10)
This is a popular pattern these days: using an asyncio event loop to run tasks concurrently. However, since your workload is CPU-bound NumPy code rather than I/O, I question how valuable this pattern will be for you.
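If you do want true parallelism for CPU-bound work, a multiprocessing.Pool that returns results to the parent (rather than sharing memory) is often the simplest route. A sketch under that assumption:

```python
import numpy as np
from multiprocessing import Pool

def square(i):
    return i ** 2

def main(n):
    # Pool.map preserves the input order, so no locks or shared arrays are
    # needed; each worker simply returns its result to the parent process.
    with Pool() as pool:
        results = pool.map(square, range(n))
    return np.array(results)

if __name__ == '__main__':
    print(main(10))
```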