Original: Multiprocessing in Python | Set 1 (Introduction)
Original: Multiprocessing in Python | Set 2 (Communication between processes)
Original: Synchronization and Pooling of processes in Python
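What is multiprocessing?
Multiprocessing refers to the ability of a system to support more than one processor running at the same time. Applications in a multiprocessing system are broken into smaller routines that run independently.
Why multiprocessing?
A single-processor system that has to handle several processes at once can only suspend one process and switch to another, sharing the CPU in time slices. A multiprocessing system can have multiple processors or a multi-core processor, so the CPU can execute several tasks at the same time, each task on its own processor or core.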
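How many tasks can truly run at the same time depends on how many CPUs the machine has. As a small sketch (the helper name describe_cpus is made up for illustration), the standard library can report that number:

```python
import multiprocessing


def describe_cpus():
    """Return a short description of how many CPUs the OS reports."""
    count = multiprocessing.cpu_count()
    return "This machine reports {} CPU(s)".format(count)


if __name__ == "__main__":
    print(describe_cpus())
```

The reported count is the number of CPUs in the system, which may be larger than the number the current process is actually allowed to use.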
The multiprocessing module
Creating a first program
# importing the multiprocessing module
import multiprocessing

def print_cube(num):
    """
    function to print cube of given num
    """
    print("Cube: {}".format(num * num * num))

def print_square(num):
    """
    function to print square of given num
    """
    print("Square: {}".format(num * num))

if __name__ == "__main__":
    # creating processes
    p1 = multiprocessing.Process(target=print_square, args=(10, ))
    p2 = multiprocessing.Process(target=print_cube, args=(10, ))
    # starting process 1
    p1.start()
    # starting process 2
    p2.start()
    # wait until process 1 is finished
    p1.join()
    # wait until process 2 is finished
    p2.join()
    # both processes finished
    print("Done!")
Square: 100
Cube: 1000
Done!
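- Import the multiprocessing module
- Create a process with Process()
  - target: the function the process executes
  - args: the arguments passed to the target function
- Start the process: start()
- Block the current program until process p has finished: p.join()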
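The same steps scale to any number of processes by keeping them in a list. The sketch below is only an illustration: print_power and run_workers are hypothetical names, and exitcode is used to confirm that each target function returned normally.

```python
import multiprocessing


def print_power(base, exponent):
    """Hypothetical worker: print base raised to exponent."""
    print("{} ** {} = {}".format(base, exponent, base ** exponent))


def run_workers(jobs):
    """Start one process per (base, exponent) job and return the exit codes."""
    processes = [
        multiprocessing.Process(target=print_power, args=job)
        for job in jobs
    ]
    # start every process first so they can run concurrently
    for p in processes:
        p.start()
    # then wait for all of them
    for p in processes:
        p.join()
    # exitcode is 0 when the target function returned without an error
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    print(run_workers([(10, 2), (10, 3)]))
```

Starting all processes before joining any of them matters: calling join() right after each start() would run the jobs one after another.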
Process IDs
# importing the multiprocessing module
import multiprocessing
import os

def worker1():
    # printing process id
    print("ID of process running worker1: {}".format(os.getpid()))

def worker2():
    # printing process id
    print("ID of process running worker2: {}".format(os.getpid()))

if __name__ == "__main__":
    # printing main program process id
    print("ID of main process: {}".format(os.getpid()))
    # creating processes
    p1 = multiprocessing.Process(target=worker1)
    p2 = multiprocessing.Process(target=worker2)
    # starting processes
    p1.start()
    p2.start()
    # process IDs
    print("ID of process p1: {}".format(p1.pid))
    print("ID of process p2: {}".format(p2.pid))
    # wait until processes are finished
    p1.join()
    p2.join()
    # both processes finished
    print("Both processes finished execution!")
    # check if processes are alive
    print("Process p1 is alive: {}".format(p1.is_alive()))
    print("Process p2 is alive: {}".format(p2.is_alive()))
ID of main process: 22008
ID of process p1: 5336
ID of process p2: 25660
ID of process running worker1: 5336
ID of process running worker2: 25660
Both processes finished execution!
Process p1 is alive: False
Process p2 is alive: False
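- The process ID of the main Python script differs from the IDs of the processes created through the multiprocessing module. os.getpid() returns the ID of the process that is currently running.
- Each process runs independently and has its own memory space.
- As soon as the target function finishes, the child process terminates. is_alive() reports whether a process is still alive.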
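These points can be checked programmatically. In the sketch below (the function names and the use of an Event to keep the child running are illustrative choices, not part of the original article), the child reports its own pid through shared memory and the parent compares it with Process.pid, sampling is_alive() before and after the child finishes.

```python
import multiprocessing
import os


def report(child_pid, release):
    """Record this process's pid, then wait until the parent releases it."""
    child_pid.value = os.getpid()
    release.wait()


def inspect_child():
    child_pid = multiprocessing.Value('i', 0)
    release = multiprocessing.Event()
    p = multiprocessing.Process(target=report, args=(child_pid, release))
    p.start()
    # the child blocks on release.wait(), so it is still running here
    alive_while_running = p.is_alive()
    release.set()
    p.join()
    return {
        "pid_from_parent": p.pid,
        "pid_from_child": child_pid.value,
        "main_pid": os.getpid(),
        "alive_while_running": alive_while_running,
        "alive_after_join": p.is_alive(),
    }


if __name__ == "__main__":
    print(inspect_child())
```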
Separate memory spaces
import multiprocessing

# empty list with global scope
result = []

def square_list(mylist):
    """
    function to square a given list
    """
    global result
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]
    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist,))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()
    # print global result list
    print("Result(in main program): {}".format(result))
Result(in process p1): [1, 4, 9, 16]
Result(in main program): []
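The program above prints the global variable result in two places:
- In the square_list function. Process p1 calls this function, so only the copy of result that lives in p1's memory space is modified.
- In the main program, after p1 has finished. The main program runs in a different process, and its copy of result is still an empty list.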
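Sharing data between processes
Shared memory objects
The multiprocessing module provides Array and Value objects for sharing data between processes:
- Array: a ctypes array allocated from shared memory
- Value: a ctypes object allocated from shared memory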
import multiprocessing

def square_list(mylist, result, square_sum):
    """
    function to square a given list
    """
    # append squares of mylist to result array
    for idx, num in enumerate(mylist):
        result[idx] = num * num
    # square_sum value
    square_sum.value = sum(result)
    # print result Array
    print("Result(in process p1): {}".format(result[:]))
    # print square_sum Value
    print("Sum of squares(in process p1): {}".format(square_sum.value))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]
    # creating Array of int data type with space for 4 integers
    result = multiprocessing.Array('i', 4)
    # creating Value of int data type
    square_sum = multiprocessing.Value('i')
    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()
    # print result array
    print("Result(in main program): {}".format(result[:]))
    # print square_sum Value
    print("Sum of squares(in main program): {}".format(square_sum.value))
Result(in process p1): [1, 4, 9, 16]
Sum of squares(in process p1): 30
Result(in main program): [1, 4, 9, 16]
Sum of squares(in main program): 30
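- Create a multiprocessing.Array object named result:
  result = multiprocessing.Array('i', 4)
  The first argument is the data type: 'i' stands for integer and 'd' for a double-precision float.
  The second argument is the size of the array.
- Create a multiprocessing.Value object named square_sum:
  square_sum = multiprocessing.Value('i')
  An initial value can be given as a second argument, for example multiprocessing.Value('i', 10).
- Pass the Array and Value objects to the child process:
  p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))
- Assign to the Array and Value objects:
  for idx, num in enumerate(mylist):
      result[idx] = num * num
  square_sum.value = sum(result)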
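The steps above use the 'i' type code and an Array created from a size. As a complementary sketch (scale and run_scale are hypothetical names), the 'd' type code works the same way, and an Array can also be initialized directly from a sequence:

```python
import multiprocessing


def scale(values, factor, total):
    """Multiply every element in place and store the sum in total."""
    for i in range(len(values)):
        values[i] = values[i] * factor
    # values[:] copies the shared array into an ordinary list
    total.value = sum(values[:])


def run_scale():
    # Array initialized from a sequence instead of a size
    values = multiprocessing.Array('d', [1.5, 2.5, 4.0])
    # Value with an explicit initial value
    total = multiprocessing.Value('d', 0.0)
    p = multiprocessing.Process(target=scale, args=(values, 2.0, total))
    p.start()
    p.join()
    return values[:], total.value


if __name__ == "__main__":
    print(run_scale())
```

Because both objects live in shared memory, the changes made in the child process are visible in the parent after join().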
Server Process
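Creating a manager with multiprocessing.Manager() starts a server process (it is not started automatically when the Python program starts).
The server process holds the Python objects and lets other processes manipulate them through proxies.
The multiprocessing module provides the Manager class to control the server process. Managers provide a way to create data that can be shared between processes.
Server process managers are more flexible than shared memory objects: they can be made to support arbitrary object types, and they support lists, dictionaries, queues, Value, Array and more out of the box. A single manager can also be shared by processes on different computers over a network. They are, however, slower than shared memory objects.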
import multiprocessing

def print_records(records):
    """
    function to print record(tuples) in records(list)
    """
    for record in records:
        print("Name: {0}\nScore: {1}\n".format(record[0], record[1]))

def insert_record(record, records):
    """
    function to add a new record to records(list)
    """
    records.append(record)
    print("New record added!\n")

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        # creating a list in server process memory
        records = manager.list([('Sam', 10), ('Adam', 9), ('Kevin',9)])
        # new record to be inserted in records
        new_record = ('Jeff', 8)
        # creating new processes
        p1 = multiprocessing.Process(target=insert_record, args=(new_record, records))
        p2 = multiprocessing.Process(target=print_records, args=(records,))
        # running process p1 to insert new record
        p1.start()
        p1.join()
        # running process p2 to print records
        p2.start()
        p2.join()
New record added!
Name: Sam
Score: 10
Name: Adam
Score: 9
Name: Kevin
Score: 9
Name: Jeff
Score: 8
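A manager can hand out dictionaries in the same way as lists. The following sketch (record_square and collect_squares are illustrative names) lets several processes write into one manager.dict() and copies the result into an ordinary dict before the manager shuts down:

```python
import multiprocessing


def record_square(n, squares):
    """Store n * n under key n in the shared dictionary proxy."""
    squares[n] = n * n


def collect_squares(numbers):
    with multiprocessing.Manager() as manager:
        # dictionary that lives in the server process
        squares = manager.dict()
        processes = [
            multiprocessing.Process(target=record_square, args=(n, squares))
            for n in numbers
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # copy while the manager is still running; the proxy is
        # unusable once the with block has exited
        return dict(squares)


if __name__ == "__main__":
    print(collect_squares([1, 2, 3, 4]))
```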
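Communication between processes
Effective use of multiple processes usually requires some communication between them, so that work can be divided and results can be combined. multiprocessing supports two kinds of communication channel between processes:
- Queue
- Pipe
Queue
A Queue passes Python objects back and forth between processes. multiprocessing.Queue is similar to queue.Queue.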
import multiprocessing

def square_list(mylist, q):
    """
    function to square a given list
    """
    # append squares of mylist to queue
    for num in mylist:
        q.put(num * num)

def print_queue(q):
    """
    function to print queue elements
    """
    print("Queue elements:")
    while not q.empty():
        print(q.get())
    print("Queue is now empty!")

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]
    # creating multiprocessing Queue
    q = multiprocessing.Queue()
    # creating new processes
    p1 = multiprocessing.Process(target=square_list, args=(mylist, q))
    p2 = multiprocessing.Process(target=print_queue, args=(q,))
    # running process p1 to square list
    p1.start()
    p1.join()
    # running process p2 to get queue elements
    p2.start()
    p2.join()
Queue elements:
1
4
9
16
Queue is now empty!
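The example above runs the producer to completion before the consumer starts, which is why polling q.empty() works there. When both processes run at the same time, a common alternative is to send a sentinel value that marks the end of the data. The sketch below assumes None is never a real item; producer, consumer and run_pipeline are illustrative names.

```python
import multiprocessing

SENTINEL = None  # assumption: None never appears as a real item


def producer(numbers, tasks):
    """Put the square of every number on the queue, then the sentinel."""
    for n in numbers:
        tasks.put(n * n)
    tasks.put(SENTINEL)


def consumer(tasks, results):
    """Collect items until the sentinel arrives, then report them."""
    collected = []
    while True:
        item = tasks.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        collected.append(item)
    results.put(collected)


def run_pipeline(numbers):
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(numbers, tasks))
    p2 = multiprocessing.Process(target=consumer, args=(tasks, results))
    # both processes run concurrently
    p1.start()
    p2.start()
    # read the result before joining so the consumer is never left
    # waiting to flush data into a queue nobody reads
    collected = results.get()
    p1.join()
    p2.join()
    return collected


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))
```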
Pipe
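A pipe has exactly two endpoints, so it is preferred over a queue when only two processes need to talk to each other. The Pipe() function returns a pair of connection objects that represent the two ends of the pipe. Each connection object has send() and recv() methods, and by default the pipe is two-way.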
import multiprocessing

def sender(conn, msgs):
    """
    function to send messages to other end of pipe
    """
    for msg in msgs:
        conn.send(msg)
        print("Sent the message: {}".format(msg))
    conn.close()

def receiver(conn):
    """
    function to print the messages received from other
    end of pipe
    """
    while True:
        msg = conn.recv()
        if msg == "END":
            break
        print("Received the message: {}".format(msg))

if __name__ == "__main__":
    # messages to be sent
    msgs = ["hello", "hey", "hru?", "END"]
    # creating a pipe
    parent_conn, child_conn = multiprocessing.Pipe()
    # creating new processes
    p1 = multiprocessing.Process(target=sender, args=(parent_conn,msgs))
    p2 = multiprocessing.Process(target=receiver, args=(child_conn,))
    # running processes
    p1.start()
    p2.start()
    # wait until processes finish
    p1.join()
    p2.join()
Sent the message: hello
Sent the message: hey
Sent the message: hru?
Sent the message: END
Received the message: hello
Received the message: hey
Received the message: hru?
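If two processes (or threads) read from or write to the same end of a pipe at the same time, the data in the pipe may become corrupted. Queues, by contrast, synchronize access between processes, at the cost of extra complexity. This is why queues are described as thread-safe and process-safe.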
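Since each end of a pipe can both send and receive, one pipe is enough for a request/reply exchange between two processes. In this sketch (squaring_service and ask_for_squares are hypothetical names, and None is assumed to be the stop signal), each end is used by exactly one process, which avoids the corruption problem described above.

```python
import multiprocessing


def squaring_service(conn):
    """Reply to each number received with its square until None arrives."""
    while True:
        n = conn.recv()
        if n is None:
            break
        conn.send(n * n)
    conn.close()


def ask_for_squares(numbers):
    # Pipe() is duplex by default: both ends can send and receive
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=squaring_service, args=(child_conn,))
    p.start()
    replies = []
    for n in numbers:
        parent_conn.send(n)                 # request
        replies.append(parent_conn.recv())  # reply
    parent_conn.send(None)                  # ask the service to stop
    p.join()
    parent_conn.close()
    return replies


if __name__ == "__main__":
    print(ask_for_squares([1, 2, 3]))
```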
Synchronization between processes
Process synchronization is a mechanism which ensures that two or more concurrent processes do not execute a critical section at the same time. A critical section is the part of each process's code that accesses a shared (critical) resource.
Concurrent access to a shared resource can lead to a race condition. A race condition occurs when two or more processes access shared data and try to change it at the same time; the resulting value of the variable becomes unpredictable.
# Python program to illustrate
# the concept of race condition
# in multiprocessing
import multiprocessing

# function to withdraw from account
def withdraw(balance):
    for _ in range(10000):
        balance.value = balance.value - 1

# function to deposit to account
def deposit(balance):
    for _ in range(10000):
        balance.value = balance.value + 1

def perform_transactions():
    # initial balance (in shared memory)
    balance = multiprocessing.Value('i', 100)
    # creating new processes
    p1 = multiprocessing.Process(target=withdraw, args=(balance,))
    p2 = multiprocessing.Process(target=deposit, args=(balance,))
    # starting processes
    p1.start()
    p2.start()
    # wait until processes are finished
    p1.join()
    p2.join()
    # print final balance
    print("Final balance = {}".format(balance.value))

if __name__ == "__main__":
    for _ in range(10):
        # perform same transaction process 10 times
        perform_transactions()
Running the code may produce unpredictable results such as the following:
Final balance = 1311
Final balance = 199
Final balance = 558
Final balance = -2265
Final balance = 1371
Final balance = 1158
Final balance = -577
Final balance = -1300
Final balance = -341
Final balance = 157
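The code above performs 10000 withdrawals and 10000 deposits on an account with an initial balance of 100, so the expected final balance is 100. The ten runs nevertheless produce different values, because the two processes access the shared data balance concurrently.
The figure below shows one possible execution order of processes p1 and p2 next to the order we expect: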
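One such interleaving can be traced by hand without starting any processes. The sketch below is a single-process simulation (lost_update and serial_update are made-up names): it replays the read and write steps of one withdrawal and one deposit in the problematic order and in the expected order.

```python
def lost_update(initial):
    """Replay an interleaving in which both processes read before either writes."""
    balance = initial
    p1_read = balance        # p1 (withdraw) reads the balance
    p2_read = balance        # p2 (deposit) reads the same, still unchanged, balance
    balance = p1_read - 1    # p1 writes its result
    balance = p2_read + 1    # p2 overwrites it; the withdrawal is lost
    return balance


def serial_update(initial):
    """Replay the expected order: each read-modify-write finishes before the next."""
    balance = initial
    balance = balance - 1    # p1 withdraws
    balance = balance + 1    # p2 deposits
    return balance


if __name__ == "__main__":
    print("interleaved:", lost_update(100))
    print("serial:", serial_update(100))
```

Starting from 100, the interleaved order ends at 101 while the serial order ends at 100. Repeated over thousands of iterations, such lost updates explain the scattered balances shown earlier.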
Lock
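multiprocessing provides a Lock class to deal with race conditions. Lock is implemented using a Semaphore object provided by the operating system. A semaphore is a synchronization object that controls access by multiple processes to a common resource in a parallel programming environment. It is simply a value in a designated place in operating system (or kernel) storage that each process can check and then change. Depending on the value it finds, a process can either use the resource or will find that it is already in use and must wait for some time before trying again. Semaphores can be binary (0 or 1) or can have additional values. Typically, a process that uses a resource changes the semaphore's value so that subsequent processes checking it know to wait until the resource is released.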
# Python program to illustrate
# the concept of locks
# in multiprocessing
import multiprocessing

# function to withdraw from account
def withdraw(balance, lock):
    for _ in range(10000):
        lock.acquire()
        balance.value = balance.value - 1
        lock.release()

# function to deposit to account
def deposit(balance, lock):
    for _ in range(10000):
        lock.acquire()
        balance.value = balance.value + 1
        lock.release()

def perform_transactions():
    # initial balance (in shared memory)
    balance = multiprocessing.Value('i', 100)
    # creating a lock object
    lock = multiprocessing.Lock()
    # creating new processes
    p1 = multiprocessing.Process(target=withdraw, args=(balance,lock))
    p2 = multiprocessing.Process(target=deposit, args=(balance,lock))
    # starting processes
    p1.start()
    p2.start()
    # wait until processes are finished
    p1.join()
    p2.join()
    # print final balance
    print("Final balance = {}".format(balance.value))

if __name__ == "__main__":
    for _ in range(10):
        # perform same transaction process 10 times
        perform_transactions()
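A variation on the program above, offered as a sketch rather than as the article's method: a Value created with the default settings carries its own lock, available through get_lock(), and a lock can be used in a with statement so that it is released even if the protected code raises. The function final_balance and its parameters are illustrative.

```python
import multiprocessing


def withdraw(balance, times):
    for _ in range(times):
        # the with statement acquires the lock and always releases it
        with balance.get_lock():
            balance.value -= 1


def deposit(balance, times):
    for _ in range(times):
        with balance.get_lock():
            balance.value += 1


def final_balance(initial=100, times=10000):
    # Value is created with an associated lock by default
    balance = multiprocessing.Value('i', initial)
    p1 = multiprocessing.Process(target=withdraw, args=(balance, times))
    p2 = multiprocessing.Process(target=deposit, args=(balance, times))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return balance.value


if __name__ == "__main__":
    for _ in range(10):
        print("Final balance = {}".format(final_balance()))
```

With the read-modify-write step protected, every run ends with the initial balance.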
Pooling between processes
Consider a program that computes the square of every value in a list:
# Python program to find
# squares of numbers in a given list
def square(n):
    return (n*n)

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4,5]
    # empty list to store result
    result = []
    for num in mylist:
        result.append(square(num))
    print(result)
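The code above uses only one CPU core, while the other cores may sit idle. To make use of all the cores, multiprocessing provides the Pool class, which represents a pool of worker processes.
As the figure above shows, Pool automatically distributes the tasks among the available cores or processes; the user does not need to create processes explicitly.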
# Python program to understand
# the concept of pool
import multiprocessing
import os

def square(n):
    print("Worker process id for {0}: {1}".format(n, os.getpid()))
    return (n*n)

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4,5]
    # creating a pool object
    p = multiprocessing.Pool()
    # map list to target function
    result = p.map(square, mylist)
    # no more tasks will be submitted; wait for the workers to exit
    p.close()
    p.join()
    print(result)
Worker process id for 2: 4152
Worker process id for 1: 4151
Worker process id for 4: 4151
Worker process id for 3: 4153
Worker process id for 5: 4152
[1, 4, 9, 16, 25]
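- Create a process pool: p = multiprocessing.Pool()
  - processes: the number of worker processes
  - maxtasksperchild: the maximum number of tasks each worker process handles
  - initializer: a function that each worker process runs when it starts
  - initargs: the arguments passed to initializer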
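The parameters listed above can be combined as in the following sketch. All names here (init_worker, power, powers, the module-level _exponent) are hypothetical; the initializer stores a per-worker setting once, so that it does not have to be sent along with every task.

```python
import multiprocessing

_exponent = None  # set inside each worker process by init_worker


def init_worker(exponent):
    """Runs once in every worker process when it starts."""
    global _exponent
    _exponent = exponent


def power(n):
    """Task function: uses the value stored by the initializer."""
    return n ** _exponent


def powers(numbers, exponent):
    # the with statement shuts the pool down once the block is left
    with multiprocessing.Pool(
        processes=2,              # two worker processes
        initializer=init_worker,  # called once per worker
        initargs=(exponent,),     # arguments for the initializer
        maxtasksperchild=10,      # replace a worker after 10 tasks
    ) as pool:
        # map blocks until every result is available
        return pool.map(power, numbers)


if __name__ == "__main__":
    print(powers([1, 2, 3, 4], 3))
```

pool.map returns the results in the same order as the input list, regardless of which worker handled which item.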