While I was using multiprocessing, I found out that global variables are not shared between processes.
Example of the Issue
Let me first provide an example of the issue that I was facing.
I have two input lists that two processes will read from, each appending its items to a final list, after which the aggregated list is printed to stdout:
import multiprocessing

final_list = []

input_list_one = ['one', 'two', 'three', 'four', 'five']
input_list_two = ['six', 'seven', 'eight', 'nine', 'ten']

def worker(data):
    # append every item to the module-level final_list
    for item in data:
        final_list.append(item)

process1 = multiprocessing.Process(target=worker, args=(input_list_one,))
process2 = multiprocessing.Process(target=worker, args=(input_list_two,))

process1.start()
process2.start()
process1.join()
process2.join()

print(final_list)
When running the example:
$ python3 mp_list_issue.py
[]
As you can see, the final list is still empty, even though both workers appended to it. Each process works on its own copy of the global final_list, so the appends happen inside the child processes and never reach the parent.
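To make that visible, here is a minimal sketch (assuming the default fork start method on Linux, as in the example above; the os.getpid() call and the extra prints are only there for illustration) that prints the list from inside the child and again from the parent:

import multiprocessing
import os

final_list = []
input_list_one = ['one', 'two', 'three', 'four', 'five']

def worker(data):
    for item in data:
        final_list.append(item)
    # the child sees its own copy of final_list with the appended items
    print('child', os.getpid(), final_list)

process = multiprocessing.Process(target=worker, args=(input_list_one,))
process.start()
process.join()

# the parent's copy was never touched
print('parent', os.getpid(), final_list)

The child prints the five items, while the parent still prints an empty list.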
Resolution
We need to use a list created by multiprocessing.Manager(), i.e. manager.list().
From Python’s Documentation:
multiprocessing.Manager() returns a started SyncManager object which
can be used for sharing objects between processes. The returned
manager object corresponds to a spawned child process and has methods
which will create shared objects and return corresponding proxies.
import multiprocessing

# the Manager starts a separate server process that holds the shared objects
manager = multiprocessing.Manager()
final_list = manager.list()

input_list_one = ['one', 'two', 'three', 'four', 'five']
input_list_two = ['six', 'seven', 'eight', 'nine', 'ten']

def worker(data):
    # each append is forwarded through the list proxy to the manager process
    for item in data:
        final_list.append(item)

process1 = multiprocessing.Process(target=worker, args=(input_list_one,))
process2 = multiprocessing.Process(target=worker, args=(input_list_two,))

process1.start()
process2.start()
process1.join()
process2.join()

print(final_list)
Now when we run the script, we can see that both processes appended to the shared list (the exact ordering can vary with process scheduling):
$ python3 mp_list.py
['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
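Relying on the module-level final_list works, but the shared list can also be passed to the workers explicitly, which keeps the worker function free of globals. Below is a minimal sketch of the same example with the proxy passed as an argument; the if __name__ == '__main__' guard and the with-block around the Manager are additions that are not in the original script, but they make the example safe on spawn-based platforms (Windows, macOS) as well:

import multiprocessing

def worker(data, shared):
    # the ListProxy forwards each append to the manager process
    for item in data:
        shared.append(item)

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        final_list = manager.list()

        inputs = [
            ['one', 'two', 'three', 'four', 'five'],
            ['six', 'seven', 'eight', 'nine', 'ten'],
        ]

        processes = [
            multiprocessing.Process(target=worker, args=(data, final_list))
            for data in inputs
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()

        # convert the proxy to a plain list before the manager shuts down
        print(list(final_list))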