Multiprocessing in Python

Original: Multiprocessing in Python | Set 1 (Introduction)
Original: Multiprocessing in Python | Set 2 (Communication between processes)
Original: Synchronization and Pooling of processes in Python

What is multiprocessing?

Multiprocessing refers to a computer's ability to support more than one processor at the same time. Applications in a multiprocessing system are broken into smaller routines that run independently.

Why multiprocessing?

A single-processor system handling multiple processes can only suspend a process and rotate the processes through time slices. A multiprocessing system can have either multiple processors or a multi-core processor; the CPU can then execute multiple tasks at once, with each task using its own processor.

The multiprocessing module

Creating a first program

# importing the multiprocessing module 
import multiprocessing 

def print_cube(num): 
	""" 
	function to print cube of given num 
	"""
	print("Cube: {}".format(num * num * num)) 

def print_square(num): 
	""" 
	function to print square of given num 
	"""
	print("Square: {}".format(num * num)) 

if __name__ == "__main__": 
	# creating processes 
	p1 = multiprocessing.Process(target=print_square, args=(10, )) 
	p2 = multiprocessing.Process(target=print_cube, args=(10, )) 

	# starting process 1 
	p1.start() 
	# starting process 2 
	p2.start() 

	# wait until process 1 is finished 
	p1.join() 
	# wait until process 2 is finished 
	p2.join() 

	# both processes finished 
	print("Done!") 

Square: 100
Cube: 1000
Done!
  • Import the multiprocessing module
  • Create a process with Process()
    • target: the function the process will execute
    • args: the arguments passed to the target function
  • Start the process: start()
  • p.join(): block the current program until process p has finished

Process IDs

# importing the multiprocessing module 
import multiprocessing
import os

def worker1():
    # printing process id
    print("ID of process running worker1: {}".format(os.getpid()))

def worker2():
    # printing process id
    print("ID of process running worker2: {}".format(os.getpid()))

if __name__ == "__main__":
    # printing main program process id
    print("ID of main process: {}".format(os.getpid()))

    # creating processes
    p1 = multiprocessing.Process(target=worker1)
    p2 = multiprocessing.Process(target=worker2)

    # starting processes
    p1.start()
    p2.start()

    # process IDs
    print("ID of process p1: {}".format(p1.pid))
    print("ID of process p2: {}".format(p2.pid))

    # wait until processes are finished
    p1.join()
    p2.join()

    # both processes finished
    print("Both processes finished execution!")

    # check if processes are alive
    print("Process p1 is alive: {}".format(p1.is_alive()))
    print("Process p2 is alive: {}".format(p2.is_alive()))

ID of main process: 22008
ID of process p1: 5336
ID of process p2: 25660
ID of process running worker1: 5336
ID of process running worker2: 25660
Both processes finished execution!
Process p1 is alive: False
Process p2 is alive: False
  • The process ID of the main Python script differs from the IDs of the new processes spawned by the multiprocessing module; os.getpid() returns the PID of the currently running process
  • Each process runs independently and has its own memory space
  • A child process terminates as soon as it finishes executing its target function; is_alive() reports whether the process is currently alive

Separate memory spaces

import multiprocessing

# empty list with global scope
result = []

def square_list(mylist):
    """
    function to square a given list
    """
    global result
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist,))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()

    # print global result list
    print("Result(in main program): {}".format(result))

Result(in process p1): [1, 4, 9, 16]
Result(in main program): []

In the program above, the global variable result is printed in two places:

  • In the square_list function, which process p1 calls: only the result variable in p1's memory space is modified
  • In the main program, after p1 has finished: because the main program runs in a different process, its result variable is still an empty list

Sharing data between processes

Shared memory objects

The multiprocessing module provides Array and Value objects for sharing data between processes:

  • Array: a ctypes array allocated in shared memory
  • Value: a ctypes object allocated in shared memory
import multiprocessing

def square_list(mylist, result, square_sum):
    """
    function to square a given list
    """
    # append squares of mylist to result array
    for idx, num in enumerate(mylist):
        result[idx] = num * num

    # square_sum value
    square_sum.value = sum(result)

    # print result Array
    print("Result(in process p1): {}".format(result[:]))

    # print square_sum Value
    print("Sum of squares(in process p1): {}".format(square_sum.value))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    # creating Array of int data type with space for 4 integers
    result = multiprocessing.Array('i', 4)

    # creating Value of int data type
    square_sum = multiprocessing.Value('i')

    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))

    # starting process
    p1.start()

    # wait until process is finished
    p1.join()

    # print result array
    print("Result(in main program): {}".format(result[:]))

    # print square_sum Value
    print("Sum of squares(in main program): {}".format(square_sum.value))
Result(in process p1): [1, 4, 9, 16]
Sum of squares(in process p1): 30
Result(in main program): [1, 4, 9, 16]
Sum of squares(in main program): 30
  • Creating the multiprocessing.Array object result:
  result = multiprocessing.Array('i', 4)

The first argument is the data type: 'i' is for signed integers, 'd' is for double-precision floats.
The second argument is the size of the array.

  • Creating the multiprocessing.Value object square_sum (the second argument is an optional initial value):
  square_sum = multiprocessing.Value('i', 10)
  • Passing the Array and Value objects to the child process:
p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))
  • Assigning to the Array and Value objects:
 for idx, num in enumerate(mylist):
      result[idx] = num * num
 square_sum.value = sum(result)


Server Process

When a Python program starts, a server process is started along with it. Whenever a new process is needed, the parent process communicates with the server process and asks it to fork a new process.
The server process holds all the Python objects and allows other processes to manipulate them through proxies.
The multiprocessing module provides the Manager class to control a server process. Managers provide a way to create data that can be shared between processes.

A server process manager is more flexible than shared memory objects: it supports arbitrary object types, such as lists, dicts, queues, Value, Array, and so on. A single manager can be shared by processes on different computers over a network. It is, however, slower than shared memory.

import multiprocessing

def print_records(records):
    """
    function to print record(tuples) in records(list)
    """
    for record in records:
        print("Name: {0}\nScore: {1}\n".format(record[0], record[1]))

def insert_record(record, records):
    """
    function to add a new record to records(list)
    """
    records.append(record)
    print("New record added!\n")

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        # creating a list in server process memory
        records = manager.list([('Sam', 10), ('Adam', 9), ('Kevin',9)])
        # new record to be inserted in records
        new_record = ('Jeff', 8)

        # creating new processes
        p1 = multiprocessing.Process(target=insert_record, args=(new_record, records))
        p2 = multiprocessing.Process(target=print_records, args=(records,))

        # running process p1 to insert new record
        p1.start()
        p1.join()

        # running process p2 to print records
        p2.start()
        p2.join()

New record added!
Name: Sam
Score: 10
Name: Adam
Score: 9
Name: Kevin
Score: 9
Name: Jeff
Score: 8


Communication between processes

Effective multiprocessing programs usually need communication between processes so that work can be divided and results aggregated. multiprocessing supports two kinds of communication channels between processes:

  • Queue
  • Pipe

Queue

A Queue is used to pass Python objects back and forth between processes. multiprocessing.Queue is a near clone of queue.Queue.

import multiprocessing

def square_list(mylist, q):
    """
    function to square a given list
    """
    # append squares of mylist to queue
    for num in mylist:
        q.put(num * num)

def print_queue(q):
    """
    function to print queue elements
    """
    print("Queue elements:")
    while not q.empty():
        print(q.get())
    print("Queue is now empty!")

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    # creating multiprocessing Queue
    q = multiprocessing.Queue()

    # creating new processes
    p1 = multiprocessing.Process(target=square_list, args=(mylist, q))
    p2 = multiprocessing.Process(target=print_queue, args=(q,))

    # running process p1 to square list
    p1.start()
    p1.join()

    # running process p2 to get queue elements
    p2.start()
    p2.join()
Queue elements:
1
4
9
16
Queue is now empty!


Pipe

A pipe has exactly two ends, so Pipe is preferred over Queue when two-way communication between two processes is needed. The Pipe() function returns a pair of connection objects representing the two ends of the pipe; each connection object has send() and recv() methods.

import multiprocessing

def sender(conn, msgs):
    """
    function to send messages to other end of pipe
    """
    for msg in msgs:
        conn.send(msg)
        print("Sent the message: {}".format(msg))
    conn.close()

def receiver(conn):
    """
    function to print the messages received from other
    end of pipe
    """
    while 1:
        msg = conn.recv()
        if msg == "END":
            break
        print("Received the message: {}".format(msg))

if __name__ == "__main__":
    # messages to be sent
    msgs = ["hello", "hey", "hru?", "END"]

    # creating a pipe
    parent_conn, child_conn = multiprocessing.Pipe()

    # creating new processes
    p1 = multiprocessing.Process(target=sender, args=(parent_conn,msgs))
    p2 = multiprocessing.Process(target=receiver, args=(child_conn,))

    # running processes
    p1.start()
    p2.start()

    # wait until processes finish
    p1.join()
    p2.join()

Sent the message: hello
Sent the message: hey
Sent the message: hru?
Sent the message: END
Received the message: hello
Received the message: hey
Received the message: hru?

If two processes (or threads) try to read from or write to the same end of a pipe at the same time, the data in the pipe may be corrupted. Queues, by contrast, perform proper synchronization between processes, at the price of added complexity. That is why queues are said to be thread and process safe.

Synchronization between processes

Process synchronization mechanisms ensure that two or more concurrent processes do not simultaneously execute their critical sections. A critical section is the part of a process that accesses a shared resource.
Concurrent access to a shared resource can lead to a race condition: when two or more processes access the resource and try to change it at the same time, the resulting value of the variable becomes unpredictable.

# Python program to illustrate 
# the concept of race condition 
# in multiprocessing 
import multiprocessing 

# function to withdraw from account 
def withdraw(balance):	 
	for _ in range(10000): 
		balance.value = balance.value - 1

# function to deposit to account 
def deposit(balance):	 
	for _ in range(10000): 
		balance.value = balance.value + 1

def perform_transactions(): 

	# initial balance (in shared memory) 
	balance = multiprocessing.Value('i', 100) 

	# creating new processes 
	p1 = multiprocessing.Process(target=withdraw, args=(balance,)) 
	p2 = multiprocessing.Process(target=deposit, args=(balance,)) 

	# starting processes 
	p1.start() 
	p2.start() 

	# wait until processes are finished 
	p1.join() 
	p2.join() 

	# print final balance 
	print("Final balance = {}".format(balance.value)) 

if __name__ == "__main__": 
	for _ in range(10): 

		# perform same transaction process 10 times 
		perform_transactions() 

Running this code may produce unpredictable results such as:

Final balance = 1311
Final balance = 199
Final balance = 558
Final balance = -2265
Final balance = 1371
Final balance = 1158
Final balance = -577
Final balance = -1300
Final balance = -341
Final balance = 157

The code above performs 10000 deposits and 10000 withdrawals against an initial balance of 100, so the expected final balance is 100. Ten runs, however, produce ten different values, because both processes access the shared balance concurrently and their updates interleave unpredictably.

Lock

multiprocessing provides the Lock class to deal with race conditions. Lock is implemented using a Semaphore object provided by the operating system. A semaphore is a synchronization object that controls access by multiple processes to a common resource in a parallel programming environment. It is simply a value at a designated place in operating system (or kernel) storage that each process can check and then change. Depending on the value found, the process can use the resource, or it will find that the resource is already in use and must wait until it is free before trying again. Semaphores can be binary (0 or 1) or can take additional values. Typically, a process using a resource changes the semaphore value so that subsequent processes checking the semaphore know to wait for the resource to be released.

# Python program to illustrate 
# the concept of locks 
# in multiprocessing 
import multiprocessing 

# function to withdraw from account 
def withdraw(balance, lock):	 
	for _ in range(10000): 
		lock.acquire() 
		balance.value = balance.value - 1
		lock.release() 

# function to deposit to account 
def deposit(balance, lock):	 
	for _ in range(10000): 
		lock.acquire() 
		balance.value = balance.value + 1
		lock.release() 

def perform_transactions(): 

	# initial balance (in shared memory) 
	balance = multiprocessing.Value('i', 100) 

	# creating a lock object 
	lock = multiprocessing.Lock() 

	# creating new processes 
	p1 = multiprocessing.Process(target=withdraw, args=(balance,lock)) 
	p2 = multiprocessing.Process(target=deposit, args=(balance,lock)) 

	# starting processes 
	p1.start() 
	p2.start() 

	# wait until processes are finished 
	p1.join() 
	p2.join() 

	# print final balance 
	print("Final balance = {}".format(balance.value)) 

if __name__ == "__main__": 
	for _ in range(10): 

		# perform same transaction process 10 times 
		perform_transactions() 

With the lock guarding every update to balance, each of the ten runs now prints Final balance = 100.

Pooling of processes

Computing the square of each value in a list:

# Python program to find 
# squares of numbers in a given list 
def square(n): 
	return (n*n) 

if __name__ == "__main__": 

	# input list 
	mylist = [1,2,3,4,5] 

	# empty list to store result 
	result = [] 

	for num in mylist: 
		result.append(square(num)) 

	print(result) 

The code above uses only one CPU core while the others may sit idle. To make use of all the cores, multiprocessing provides the Pool class, which represents a pool of worker processes.
Pool distributes the tasks to the available cores or processes automatically; the user does not need to create processes explicitly.

# Python program to understand 
# the concept of pool 
import multiprocessing 
import os 

def square(n): 
	print("Worker process id for {0}: {1}".format(n, os.getpid())) 
	return (n*n) 

if __name__ == "__main__": 
	# input list 
	mylist = [1,2,3,4,5] 

	# creating a pool object 
	p = multiprocessing.Pool() 

	# map list to target function 
	result = p.map(square, mylist) 

	print(result) 

Worker process id for 2: 4152
Worker process id for 1: 4151
Worker process id for 4: 4151
Worker process id for 3: 4153
Worker process id for 5: 4152
[1, 4, 9, 16, 25]
  • Create a process pool: p = multiprocessing.Pool()
    • processes: the number of worker processes
    • maxtasksperchild: the maximum number of tasks a worker process can complete before it is replaced by a fresh one
    • initializer: a function each worker process calls when it starts
    • initargs: the arguments passed to initializer