url: https://www.linuxjournal.com/content/multiprocessing-python
1 Basic concepts
- Python’s “multiprocessing” module feels like threads, but actually launches processes.
- The downside of threads is the global interpreter lock (GIL), which lets only one thread execute Python code at a time. Because a thread cedes the GIL whenever it waits on I/O, threads are a good idea when dealing with I/O.
- When dealing with lots of computation, it is preferable to take full advantage of a multicore system. In Python, that means using processes.
- The dilemma is whether to launch easy-to-use threads, even though they don’t really run in parallel, or to launch new processes over which we have little control. The answer is somewhere in the middle: the standard library’s “multiprocessing” module gives the feeling of working with threads but actually works with processes.
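To make the thread/process distinction concrete, here is a minimal sketch. The helper name `report_pid` is invented for this illustration. A thread should report the same process ID as the main program, while a `multiprocessing.Process` should report a different one.

```python
import multiprocessing
import os
import threading


def report_pid(label):
    """Print the ID of the process this worker runs in (hypothetical helper)."""
    print("{0}: pid={1}".format(label, os.getpid()))


if __name__ == "__main__":
    print("main: pid={0}".format(os.getpid()))

    # A thread lives inside the current process, so it reports the same PID.
    t = threading.Thread(target=report_pid, args=("thread",))
    t.start()
    t.join()

    # A Process is a separate operating-system process with its own PID.
    p = multiprocessing.Process(target=report_pid, args=("process",))
    p.start()
    p.join()
```

The `if __name__ == "__main__":` guard keeps the launch code from running again when a child process re-imports the script, which happens on platforms that spawn rather than fork.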
2 First example: same result
2.1 threading
The “multiprocessing” module is designed to look and feel like the “threading” module, and it largely succeeds in doing so. For example, the following is a simple example of a multithreaded program:
import random
import threading
import time

def hello(n):
    time.sleep(random.randint(1, 3))  # simulate work of varying length
    print("[{0}] Hello!".format(n))

for i in range(10):
    threading.Thread(target=hello, args=(i,)).start()
print("Done!")
But “Done!” is printed before the threads finish, because the main program does not wait for them. To correct this, we can call join on each thread to wait for it to complete.
threads = []
for i in range(10):
    t = threading.Thread(target=hello, args=(i,))
    threads.append(t)
    t.start()
for one_thread in threads:
    one_thread.join()  # block until this thread has finished
print("Done!")
2.2 multiprocessing
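import multiprocessing

# The guard is needed on platforms that spawn child processes (e.g. Windows, macOS).
if __name__ == "__main__":
    processes = []
    for i in range(10):
        t = multiprocessing.Process(target=hello, args=(i,))
        processes.append(t)
        t.start()
    for one_process in processes:
        one_process.join()
    print("Done!")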
The output is the same as in the threaded version.
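Because both classes accept the same `target` and `args` parameters and both offer `start()` and `join()`, a single launcher can drive either one. The sketch below assumes a hypothetical helper, `run_workers`, that is not part of the original article. It takes either `threading.Thread` or `multiprocessing.Process` as the worker class.

```python
import multiprocessing
import threading


def hello(n):
    print("[{0}] Hello!".format(n))


def run_workers(worker_cls, count):
    """Start `count` workers of the given class and wait for all of them.

    Hypothetical helper: worker_cls is threading.Thread or
    multiprocessing.Process, which share this part of their API.
    """
    workers = [worker_cls(target=hello, args=(i,)) for i in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return len(workers)


if __name__ == "__main__":
    run_workers(threading.Thread, 3)
    run_workers(multiprocessing.Process, 3)
    print("Done!")
```

Swapping the class is the only change needed here, which is the sense in which the two modules look and feel alike.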
So what’s the difference between threading and multiprocessing?
3 Second example: the difference
Perhaps the biggest difference is that threads share global variables while separate processes don’t.
3.1 Threads share global variables
Here’s a simple example of how a function running in a thread can modify a global variable.
It is only meant to prove a point; if we really want to modify global variables from within a thread, we should use a lock.
import random
import threading
import time

mylist = []

def hello(n):
    time.sleep(random.randint(1, 3))
    mylist.append(threading.get_ident())  # bad in real code!
    print("[{0}] Hello!".format(n))

threads = []
for i in range(10):
    t = threading.Thread(target=hello, args=(i,))
    threads.append(t)
    t.start()
for one_thread in threads:
    one_thread.join()
print("Done!")
print(len(mylist))
print(mylist)
Each thread runs the function, which appends the thread’s ID to that list and then returns.
Don’t do that in real code, because Python’s data structures are not guaranteed to be thread-safe!
output:
Done!
10
[123145344081920, 123145354592256, 123145375612928, …] #mylist
The global variable mylist is shared by the threads.
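The earlier advice about using a lock can be sketched as follows. The names `mylist_lock` and `run_threads` are invented for this illustration, and the random sleep is left out to keep the example short.

```python
import threading

mylist = []
mylist_lock = threading.Lock()  # hypothetical name; guards every access to mylist


def hello(n):
    # Hold the lock while touching the shared list so that only one
    # thread modifies it at a time.
    with mylist_lock:
        mylist.append(threading.get_ident())


def run_threads(count):
    """Start `count` threads running hello() and wait for all of them."""
    threads = [threading.Thread(target=hello, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    run_threads(10)
    print(len(mylist))
```

Using the lock as a context manager (`with mylist_lock:`) releases it even if the code inside raises an exception.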
3.2 Processes don’t share global variables
import multiprocessing
import os
import random
import time

mylist = []

def hello(n):
    time.sleep(random.randint(1, 3))
    mylist.append(os.getpid())  # modifies only this process's copy of mylist
    print("[{0}] Hello!".format(n))

if __name__ == "__main__":
    processes = []
    for i in range(10):
        t = multiprocessing.Process(target=hello, args=(i,))
        processes.append(t)
        t.start()
    for one_process in processes:
        one_process.join()
    print("Done!")
    print(len(mylist))
    print(mylist)
The output from this program is as follows:
0
[] #mylist is empty
Each process created with “multiprocessing” gets its own copy of the global mylist list. The appends happen in the children’s copies, which disappear when those processes exit, so the parent’s list stays empty.
3.3 Queues share data between processes
In multiprocessing, queues can bridge the gap between processes.
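import multiprocessing
import os
import random
import time

def hello(n, q):
    time.sleep(random.randint(1, 3))
    q.put(os.getpid())  # send this process's ID back to the parent
    print("[{0}] Hello!".format(n))

if __name__ == "__main__":
    q = multiprocessing.Queue()
    processes = []
    for i in range(10):
        # Pass the queue explicitly so that every child uses the same one.
        t = multiprocessing.Process(target=hello, args=(i, q))
        processes.append(t)
        t.start()
    for one_process in processes:
        one_process.join()
    # Each process put exactly one item, so read one item per process
    # instead of relying on q.empty(), which is not reliable.
    mylist = [q.get() for _ in processes]
    print("Done!")
    print(len(mylist))
    print(mylist)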
The Queue instance is designed to be shared across the different processes. Moreover, it can handle any Python object that can be pickled.
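Objects placed on a `multiprocessing.Queue` are pickled before they are sent to the other process, so a quick way to check whether a value can travel through a queue is to try pickling it. A minimal sketch, where `is_picklable` is a hypothetical helper and not part of the `multiprocessing` API:

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj can be pickled (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    # Plain data such as numbers, strings, lists and dicts pickles fine.
    print(is_picklable({"pid": 1234, "tags": ["a", "b"]}))
    # A lock is tied to the current process and cannot be pickled.
    print(is_picklable(threading.Lock()))
```

Values that fail this check, such as locks or open sockets, generally need a different mechanism than a queue.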
4 Conclusion
Threading is easy to work with, but because of the GIL, threads don’t truly execute Python code in parallel. The “multiprocessing” module provides an API that’s almost identical to that of “threading” while running the work in separate processes.