Advanced Python: multiprocessing

happy__19

Article link: https://blog.csdn.net/scorpion_zs/article/details/53212954

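Python's GIL (Global Interpreter Lock) ensures that only one thread executes Python bytecode at any given moment. As a result, multiple threads in CPython cannot run Python code on several cores in parallel. Multithreading noticeably improves throughput only for I/O-bound programs, because a thread releases the GIL while it waits on I/O and other threads can run in the meantime. For CPU-bound programs it brings essentially no benefit. To make full use of a multi-core CPU and let Python run some tasks truly in parallel, we turn to multi-process programming: each process has its own interpreter and therefore its own GIL. multiprocessing is Python's multi-process module, and it makes developing multi-process applications straightforward. It provides components such as Process, Pool, Queue and Manager.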

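1 The Process class

1.1 Constructor

def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

group: should always be None; it exists only for compatibility with threading.Thread
target: the callable the process runs (a function, or any callable object, such as an instance of a class that implements __call__)
name: the process name
args: tuple of positional arguments for target
kwargs: dict of keyword arguments for target
daemon: if not None, sets the process's daemon flag (keyword-only, Python 3.3+)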
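Here is a minimal sketch of how name, args and kwargs fit together. The functions greet and run_greeter are hypothetical names chosen for illustration. A Queue (section 3) is used here only because the return value of target is discarded, so the child needs some other way to hand a result back:

```python
from multiprocessing import Process, Queue

def greet(q, greeting, name="world"):
    # Runs in the child process. A target's return value is discarded,
    # so the result is sent back to the parent through the queue.
    q.put(f"{greeting}, {name}")

def run_greeter():
    q = Queue()
    p = Process(target=greet, name="greeter",
                args=(q, "hello"),                   # positional arguments
                kwargs={"name": "multiprocessing"})  # keyword arguments
    p.start()
    message = q.get()    # read before join() so the child can flush the queue
    p.join()
    return p.name, message

if __name__ == "__main__":
    print(run_greeter())
```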

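1.2 Instance methods

is_alive(): returns whether the process is still running
start(): starts the process. The child process is created and run() is called inside it; exactly when it gets CPU time is up to the OS scheduler. It must be called at most once per Process object
join([timeout]): blocks the caller until the process terminates, or until timeout seconds have elapsed
terminate(): stops the process immediately, whether or not its task has finished (SIGTERM on POSIX). Exit handlers and finally clauses in the child are not run
run(): the method that start() invokes in the child. The default run() calls target(*args, **kwargs); a subclass can override it instead (see Example 3)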
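The following sketch exercises join() with a timeout, is_alive() and terminate() on a child that would otherwise sleep for 30 seconds. sleeper and lifecycle_demo are hypothetical helpers, and the 0.5-second timeout is an arbitrary value chosen for the demo:

```python
import time
from multiprocessing import Process

def sleeper(seconds):
    time.sleep(seconds)

def lifecycle_demo():
    p = Process(target=sleeper, args=(30,))
    p.start()
    p.join(timeout=0.5)           # gives up after about 0.5 s; the child is still sleeping
    still_running = p.is_alive()  # True at this point
    p.terminate()                 # stop it without waiting for sleeper() to return
    p.join()                      # wait until the terminated child has actually exited
    return still_running, p.is_alive(), p.exitcode

if __name__ == "__main__":
    # On POSIX the exit code printed here is -15 (killed by SIGTERM)
    print(lifecycle_demo())
```

Note that join(timeout) does not raise when the timeout expires. It simply returns, so is_alive() is the way to find out whether the process really finished.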

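1.3 Attributes

authkey: the process's authentication key (a byte string), used to authenticate connections between processes, for example to a Manager. A child process inherits it from its parent
daemon: the daemon flag. It must be set before start() is called. A daemonic process is terminated when its parent exits, and it may not create child processes of its own
exitcode: the process's exit code. It is None while the process is still running, and -N if the process was killed by signal N
name: the process name
pid: the process ID (None before the process has been started)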
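The short sketch below shows how these attributes change over a process's life. exit_with and attributes_demo are hypothetical names, and sys.exit(3) is used only to produce a recognizable exit code:

```python
import sys
from multiprocessing import Process

def exit_with(code):
    sys.exit(code)              # the child's exit status becomes Process.exitcode

def attributes_demo():
    p = Process(target=exit_with, args=(3,), name="worker-1")
    p.daemon = True             # allowed only before start()
    pid_before = p.pid          # None: the process has not been started yet
    p.start()
    p.join()
    return pid_before, p.pid is not None, p.name, p.daemon, p.exitcode

if __name__ == "__main__":
    print(attributes_demo())    # (None, True, 'worker-1', True, 3)
```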

1.4 Examples

Example 1: target is a function

#!/usr/bin/env python3

import time
import random
from multiprocessing import Process

def foo(i):
    print(time.ctime(), "process the %d begin ......" % i)
    time.sleep(random.uniform(1, 3))    # simulate 1 to 3 seconds of work
    print(time.ctime(), "process the %d end !!!!!!" % i)

if __name__ == "__main__":
    print(time.ctime(), "process begin......")

    p_lst = list()
    for i in range(4):
        p_lst.append(Process(target=foo, args=(i,)))    # create 4 child processes
    # start the child processes
    for p in p_lst:
        p.start()
    # wait for all child processes to finish
    for p in p_lst:
        p.join()

    print(time.ctime(), "process end!!!!!")

Example 2: target is a callable object

#!/usr/bin/env python3

import time
import random
from multiprocessing import Process

class Foo:
    def __init__(self, i):
        self.i = i

    def __call__(self):
        '''
        Makes Foo instances callable, so an instance can be passed as target
        '''
        print(time.ctime(), "process the %d begin ......" % self.i)
        time.sleep(random.uniform(1, 3))
        print(time.ctime(), "process the %d end !!!!!!" % self.i)

if __name__ == "__main__":
    print(time.ctime(), "process begin......")

    p_lst = list()
    for i in range(4):
        p_lst.append(Process(target=Foo(i)))    # create 4 child processes
    # start the child processes
    for p in p_lst:
        p.start()
    # wait for all child processes to finish
    for p in p_lst:
        p.join()

    print(time.ctime(), "process end!!!!!")

Example 3: subclass Process and create instances of the subclass

#!/usr/bin/env python3

import time
import random
from multiprocessing import Process

class MyProcess(Process):

    def __init__(self, i):
        super().__init__()    # initialize the Process base class first
        self.i = i

    def run(self):
        '''
        Overrides run(): start() calls this method in the child process
        '''
        print(time.ctime(), "process the %d begin ......" % self.i)
        time.sleep(random.uniform(1, 3))
        print(time.ctime(), "process the %d end !!!!!!" % self.i)

if __name__ == "__main__":
    print(time.ctime(), "process begin......")

    p_lst = list()
    for i in range(4):
        p_lst.append(MyProcess(i))    # create 4 child processes
    # start the child processes
    for p in p_lst:
        p.start()
    # wait for all child processes to finish
    for p in p_lst:
        p.join()

    print(time.ctime(), "process end!!!!!")

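2 The Pool class

Managing many processes (dozens or hundreds) directly with the Process class quickly becomes tedious. A Pool (process pool) manages a fixed set of worker processes for you. Tasks submitted to the pool are queued. While every worker is busy, new tasks wait, and as soon as a worker finishes its current task it picks up the next one. Workers are reused across tasks; the pool does not create a new process for each task.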

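2.1 Constructor

def __init__(self, processes=None, initializer=None, initargs=(), maxtasksperchild=None, context=None)

processes: number of worker processes in the pool. If None, the number of CPUs is used (os.cpu_count(), or os.process_cpu_count() from Python 3.13 on). The default is usually fine
initializer: if not None, every worker process calls initializer(*initargs) when it starts
initargs: tuple of arguments passed to initializer
maxtasksperchild: number of tasks a worker completes before it exits and is replaced by a fresh worker. The default None keeps workers alive for as long as the pool exists
context: the multiprocessing context used to start the workers (Python 3.4+)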
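initializer and initargs are easiest to understand in action. In the minimal sketch below, each worker stores a per-process setting once, at startup. init_worker, add_offset, the module-level _offset variable and the value 100 are all hypothetical, chosen for illustration:

```python
from multiprocessing import Pool

_offset = None    # hypothetical per-worker global, set once by the initializer

def init_worker(offset):
    # Called once in each worker process as it starts (including replacements)
    global _offset
    _offset = offset

def add_offset(x):
    return x + _offset

def initializer_demo():
    with Pool(processes=2, initializer=init_worker, initargs=(100,),
              maxtasksperchild=5) as pool:
        return pool.map(add_offset, range(5))

if __name__ == "__main__":
    print(initializer_demo())    # [100, 101, 102, 103, 104]
```

One common use of this pattern is to set up expensive per-worker state, such as a connection, once per worker instead of once per task. maxtasksperchild=5 is included here only to show where that argument goes.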

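2.2 Instance methods

apply(self, func, args=(), kwds={}): runs func in a worker, blocks until the result is ready, and returns it. Because the caller waits for each task in turn, tasks submitted this way do not run in parallel, so it is rarely used
apply_async(self, func, args=(), kwds={}, callback=None, error_callback=None): non-blocking version of apply(). It immediately returns an AsyncResult whose get() method returns the result, or re-raises the exception raised in the worker
map(self, func, iterable, chunksize=None): parallel equivalent of the built-in map (with a single iterable). It blocks until all results are ready and returns them as a list
map_async(self, func, iterable, chunksize=None, callback=None, error_callback=None): non-blocking version of map(); returns an AsyncResult
close(): closes the pool to new tasks. The workers exit once all submitted tasks have finished
terminate(): stops the worker processes immediately, without finishing outstanding work
join(): blocks the main process until the worker processes exit. It must be called after close() or terminate()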
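This sketch compares the blocking and non-blocking methods. square and pool_methods_demo are hypothetical names. apply_async returns an AsyncResult right away, and you only get the value by calling get() on it:

```python
from multiprocessing import Pool

def square(x):
    return x * x

def pool_methods_demo():
    collected = []
    with Pool(processes=2) as pool:
        blocking = pool.map(square, range(5))            # waits for every result
        pending = [pool.apply_async(square, (i,)) for i in range(5)]
        values = [r.get(timeout=10) for r in pending]    # get() waits for each task
        # the callback runs in the main process with the full result list
        pool.map_async(square, range(3), callback=collected.append).wait()
    return blocking, values, collected

if __name__ == "__main__":
    print(pool_methods_demo())
```

If a task raises an exception, get() re-raises it in the main process. The example in 2.3 never calls get(), so an exception inside foo would pass unnoticed there.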

2.3 Example

#!/usr/bin/env python3

import time
import random
from multiprocessing import Pool

def foo(i):
    print(time.ctime(), "process the %d begin ......" % i)
    time.sleep(random.uniform(1, 3))
    print(time.ctime(), "process the %d end !!!!!!" % i)

if __name__ == "__main__":

    print(time.ctime(), "process begin......")
    pool = Pool(processes=2)    # at most 2 worker processes run in parallel
    for i in range(4):
        pool.apply_async(foo, args=(i,))    # submit 4 tasks to the pool

    pool.close()    # no further tasks will be submitted
    pool.join()     # wait until all submitted tasks have finished

    print(time.ctime(), "process end!!!!!")

Output:

Fri Nov 18 13:57:22 2016 process begin......
Fri Nov 18 13:57:22 2016 process the 0 begin ......
Fri Nov 18 13:57:22 2016 process the 1 begin ......
Fri Nov 18 13:57:23 2016 process the 1 end !!!!!!
Fri Nov 18 13:57:23 2016 process the 2 begin ......
Fri Nov 18 13:57:24 2016 process the 0 end !!!!!!
Fri Nov 18 13:57:24 2016 process the 3 begin ......
Fri Nov 18 13:57:25 2016 process the 2 end !!!!!!
Fri Nov 18 13:57:25 2016 process the 3 end !!!!!!
Fri Nov 18 13:57:25 2016 process end!!!!!

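3 The Queue class

Queue is mainly used to pass messages and share data between processes. Besides Queue, processes can also communicate through Pipe(), which connects exactly two endpoints, typically one in each of two processes.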
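Here is a minimal Pipe() sketch. Pipe() returns a pair of Connection objects (duplex by default), and each process uses one end. child and pipe_demo are hypothetical names:

```python
from multiprocessing import Pipe, Process

def child(conn):
    msg = conn.recv()          # blocks until the parent sends something
    conn.send(msg.upper())     # reply on the same, duplex connection
    conn.close()

def pipe_demo():
    parent_conn, child_conn = Pipe()    # two connected endpoints
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("ping")
    reply = parent_conn.recv()
    p.join()
    return reply

if __name__ == "__main__":
    print(pipe_demo())    # PING
```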

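3.1 Constructor

def __init__(self, maxsize=0)

maxsize: maximum number of items the queue can hold. If maxsize <= 0, the limit is set to a very large value, the platform's semaphore maximum (on the author's system it was 2147483647)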

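3.2 Instance methods

put(self, obj, block=True, timeout=None)

1. block is True: if the queue is full, put() waits for a free slot. With timeout=None it waits indefinitely. With a positive timeout it waits at most timeout seconds and then raises queue.Full (Queue.Full in Python 2)
2. block is False: if the queue is full, queue.Full is raised immediately (timeout is ignored)

get(self, block=True, timeout=None)

1. block is True: if the queue is empty, get() waits for an item. With timeout=None it waits indefinitely. With a positive timeout it waits at most timeout seconds and then raises queue.Empty (Queue.Empty in Python 2)
2. block is False: if the queue is empty, queue.Empty is raised immediately (timeout is ignored)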
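You can check these blocking rules with a queue of size 1 in a single process. The sketch below assumes Python 3, where the exceptions come from the standard queue module; full_and_empty_demo is a hypothetical name:

```python
import queue    # multiprocessing.Queue raises queue.Full and queue.Empty
from multiprocessing import Queue

def full_and_empty_demo():
    q = Queue(maxsize=1)
    q.put("a")
    events = []
    try:
        q.put("b", block=True, timeout=0.1)    # queue is full: waits 0.1 s, then raises
    except queue.Full:
        events.append("full")
    events.append(q.get(timeout=1))            # takes "a"
    try:
        q.get(block=False)                     # queue is empty: raises immediately
    except queue.Empty:
        events.append("empty")
    return events

if __name__ == "__main__":
    print(full_and_empty_demo())    # ['full', 'a', 'empty']
```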

3.3 Example

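#!/usr/bin/env python3

import time
import random
from multiprocessing import Process, Queue

def write(q):
    for value in "abcd":
        print(time.ctime(), "put %s to queue" % value)
        q.put(value)
        time.sleep(random.random())

def read(q):
    while True:    # loops forever; the main process has to stop this reader
        value = q.get()
        print(time.ctime(), "get %s from queue" % value)

if __name__ == "__main__":
    # the main process creates the Queue and passes it to the child processes
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # start the writer pw, which puts items into the Queue
    pw.start()
    # start the reader pr, which takes items from the Queue
    pr.start()
    # wait for the writer to finish
    pw.join()
    # the reader never exits on its own, so terminate it
    pr.terminate()
    pr.join()    # reap the terminated reader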

Output:

Fri Nov 18 15:04:13 2016 put a to queue
Fri Nov 18 15:04:13 2016 get a from queue
Fri Nov 18 15:04:13 2016 put b to queue
Fri Nov 18 15:04:13 2016 get b from queue
Fri Nov 18 15:04:13 2016 put c to queue
Fri Nov 18 15:04:13 2016 get c from queue
Fri Nov 18 15:04:13 2016 put d to queue
Fri Nov 18 15:04:13 2016 get d from queue
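The reader above never stops on its own, so it is killed with terminate(). The multiprocessing documentation warns that terminating a process while it is using a queue can leave the queue corrupted. One common alternative has the writer send a sentinel value that tells the reader to exit. The sketch below assumes that None never appears as real data. SENTINEL, write_with_sentinel, read_until_sentinel, sentinel_demo and the extra results queue are all hypothetical, and the timestamp printing is dropped for brevity:

```python
from multiprocessing import Process, Queue

SENTINEL = None    # hypothetical end-of-data marker; assumes None is never real data

def write_with_sentinel(q):
    for value in "abcd":
        q.put(value)
    q.put(SENTINEL)               # tell the reader that nothing else is coming

def read_until_sentinel(q, results):
    received = []
    while True:
        value = q.get()
        if value is SENTINEL:     # None unpickles as the same singleton
            break
        received.append(value)
    results.put(received)         # hand everything back to the parent

def sentinel_demo():
    q, results = Queue(), Queue()
    pw = Process(target=write_with_sentinel, args=(q,))
    pr = Process(target=read_until_sentinel, args=(q, results))
    pw.start()
    pr.start()
    received = results.get()      # read before join() so the reader can flush its queue
    pw.join()
    pr.join()                     # the reader exits by itself; no terminate() needed
    return received

if __name__ == "__main__":
    print(sentinel_demo())        # ['a', 'b', 'c', 'd']
```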

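4 The Manager class

Manager is a high-level interface for sharing data between processes.
Manager() returns a manager object that controls a server process. The server process holds Python objects, and other processes access them through proxies: every operation on a proxy is sent to the server process and carried out there. This lets several processes safely exchange data through the same objects. Manager supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
The following example uses a Manager to manage a dict shared by several processes: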

#!/usr/bin/env python3

import time
import random
from multiprocessing import Manager, Pool

def worker(d, key, value):
    print(time.ctime(), "insert the k-v pair to dict begin: {%d: %d}" % (key, value))
    time.sleep(random.uniform(1, 2))
    d[key] = value    # write to the shared dict through its proxy
    print(time.ctime(), "insert the k-v pair to dict end: {%d: %d}" % (key, value))


if __name__ == "__main__":
    print(time.ctime(), "process for manager begin")
    mgr = Manager()
    d = mgr.dict()    # dict proxy backed by the manager's server process
    pool = Pool(processes=4)
    for i in range(10):
        pool.apply_async(worker, args=(d, i, i*i))

    pool.close()
    pool.join()
    print("Result:")
    print(dict(sorted(d.items())))    # copy into a plain dict, ordered by key
    print(time.ctime(), "process for manager end")

Output:

Fri Nov 18 16:36:19 2016 process for manager begin
Fri Nov 18 16:36:19 2016 insert the k-v pair to dict begin: {0: 0}
Fri Nov 18 16:36:19 2016 insert the k-v pair to dict begin: {1: 1}
Fri Nov 18 16:36:19 2016 insert the k-v pair to dict begin: {2: 4}
Fri Nov 18 16:36:19 2016 insert the k-v pair to dict begin: {3: 9}
Fri Nov 18 16:36:20 2016 insert the k-v pair to dict end: {3: 9}
Fri Nov 18 16:36:20 2016 insert the k-v pair to dict begin: {4: 16}
Fri Nov 18 16:36:20 2016 insert the k-v pair to dict end: {0: 0}
Fri Nov 18 16:36:20 2016 insert the k-v pair to dict begin: {5: 25}
Fri Nov 18 16:36:21 2016 insert the k-v pair to dict end: {2: 4}
Fri Nov 18 16:36:21 2016 insert the k-v pair to dict begin: {6: 36}
Fri Nov 18 16:36:21 2016 insert the k-v pair to dict end: {1: 1}
Fri Nov 18 16:36:21 2016 insert the k-v pair to dict begin: {7: 49}
Fri Nov 18 16:36:21 2016 insert the k-v pair to dict end: {5: 25}
Fri Nov 18 16:36:21 2016 insert the k-v pair to dict begin: {8: 64}
Fri Nov 18 16:36:22 2016 insert the k-v pair to dict end: {4: 16}
Fri Nov 18 16:36:22 2016 insert the k-v pair to dict begin: {9: 81}
Fri Nov 18 16:36:23 2016 insert the k-v pair to dict end: {8: 64}
Fri Nov 18 16:36:23 2016 insert the k-v pair to dict end: {6: 36}
Fri Nov 18 16:36:23 2016 insert the k-v pair to dict end: {7: 49}
Fri Nov 18 16:36:23 2016 insert the k-v pair to dict end: {9: 81}
Result:
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
Fri Nov 18 16:36:23 2016 process for manager end
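The same mechanism works for the other Manager types. Proxies route each call through the server, but a compound update is still several separate calls. In the sketch below, each list.append() is a single call handled by the manager's server process. By contrast, counter.value += 1 is a read followed by a separate write, so two workers could read the same old value; a Manager Lock makes that update safe. append_and_count and manager_demo are hypothetical names:

```python
from multiprocessing import Manager, Pool

def append_and_count(shared_list, counter, lock, item):
    shared_list.append(item)    # one proxy call, executed by the manager's server process
    with lock:                  # counter.value += 1 is a read and a separate write
        counter.value += 1

def manager_demo():
    with Manager() as mgr:
        shared_list = mgr.list()
        counter = mgr.Value("i", 0)
        lock = mgr.Lock()
        with Pool(processes=4) as pool:
            pool.starmap(append_and_count,
                         [(shared_list, counter, lock, i) for i in range(10)])
        # shared_list[:] copies the proxy's contents into a plain list
        return sorted(shared_list[:]), counter.value

if __name__ == "__main__":
    print(manager_demo())
```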