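Multiprocessing
Because of the GIL (global interpreter lock) in CPython, multithreading is often not the best choice for CPU-bound programs. Multiprocessing runs code in fully independent process environments and can make fuller use of multiple processors. The isolation between processes, however, means that data is not shared by default, which is a problem of its own, and threads are more lightweight than processes.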
multiprocessing
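Process class
- In the multiprocessing module, a process is spawned by creating a Process object and calling its start() method. Process follows the same API as threading.Thread.
- Multiprocessing code should be placed inside an if __name__ == "__main__": block. With the spawn start method (the default on Windows), each child re-imports the main module, and unguarded code would run again.
The following example simply creates one process: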
from multiprocessing import Process

def fn(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=fn, args=('bob',))
    p.start()
    p.join()
-------------------------------------------------
hello bob
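To illustrate the isolation mentioned at the top of these notes, here is a minimal sketch. The names counter, increment and run_demo are made up for this illustration. The child changes its own copy of a module-level variable, and the parent's copy stays at 0.

```python
from multiprocessing import Process

counter = 0  # module-level state; every process works on its own copy


def increment():
    """Runs in the child process and changes only the child's copy."""
    global counter
    counter += 1
    print('child sees counter =', counter)


def run_demo():
    """Start one child, wait for it, and return the parent's counter."""
    p = Process(target=increment)
    p.start()
    p.join()
    return counter


if __name__ == '__main__':
    print('parent sees counter =', run_demo())
```

Sharing the value would need one of the mechanisms described later, such as shared memory or a Queue.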
from multiprocessing import Process
import datetime

def calc(i):
    total = 0
    for x in range(10000000):
        total += 1
    print(total)
    return i, total  # a Process discards the return value of its target

if __name__ == '__main__':
    start = datetime.datetime.now()
    ps = []
    for i in range(3):
        p = Process(target=calc, args=(i,), name='calc{}'.format(i + 1))
        ps.append(p)
        p.start()
    for p in ps:
        p.join()
        # process name and exit code: 0 means a normal exit, non-zero an abnormal one
        print(p.name, p.exitcode)
    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)
    print('===end===')
---------------------------------------------------------------
10000000 # each process performed 10000000 increments
calc1 0
10000000
10000000
calc2 0
calc3 0
1.468405
===end===
Name | Description |
---|---|
pid | Process ID |
exitcode | Exit status code of the process |
terminate() | Terminates the given process |
Usage:
p1.terminate()  # terminate process p1
print(p1.name, p1.exitcode)
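A slightly fuller sketch of the same calls, assuming a hypothetical long-running task named sleep_forever. One detail is worth noting: exitcode stays None until the process has actually ended, so join() is called after terminate() before reading it.

```python
import time
from multiprocessing import Process


def sleep_forever():
    """Hypothetical long-running task, used only for this demonstration."""
    while True:
        time.sleep(1)


def run_demo():
    p1 = Process(target=sleep_forever, name='sleeper')
    p1.start()
    print(p1.pid, p1.is_alive())
    p1.terminate()   # ask the operating system to stop the process
    p1.join()        # wait for it to end so that exitcode is populated
    return p1.name, p1.exitcode, p1.is_alive()


if __name__ == '__main__':
    print(run_demo())
```

A terminated process reports a non-zero exit code. On POSIX systems it is the negative of the signal number that stopped it.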
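Synchronization between processes
For synchronization between processes, Python provides the same classes as for thread synchronization; they are used the same way and have a similar effect.
Synchronizing processes costs more than synchronizing threads, however, and the underlying system implementation is different; Python simply hides these differences.
multiprocessing also provides shared memory and server processes for sharing data, as well as Queue and Pipe for communication between processes.
- The means of communication differ from those used between threads.
- Multiprocessing starts multiple interpreter processes, so data passed between processes must be serialized and deserialized.
- On thread safety of data: if a process does not use multiple threads itself, the GIL has practically no role to play.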
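The points above can be sketched with three of the primitives that multiprocessing exposes: Value (shared memory), Lock and Queue. This is only an illustration under assumed names (worker, run_demo, and the message format are invented here). Several processes add to one shared counter under a lock and report back through a queue.

```python
from multiprocessing import Lock, Process, Queue, Value


def worker(worker_id, counter, lock, results):
    """Add to a shared counter under a lock, then report through a queue."""
    for _ in range(1000):
        with lock:                    # serialize the read-modify-write
            counter.value += 1
    results.put((worker_id, 'done'))  # pickled here, unpickled by the reader


def run_demo(n_workers=3):
    counter = Value('i', 0)           # a C int living in shared memory
    lock = Lock()
    results = Queue()
    procs = [Process(target=worker, args=(i, counter, lock, results))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    # Read the queue before join() so no child blocks while flushing its data.
    messages = sorted(results.get() for _ in procs)
    for p in procs:
        p.join()
    return counter.value, messages


if __name__ == '__main__':
    print(run_demo())
```

Without the lock, the increments from different processes could interleave and some updates would be lost.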
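Process pool
multiprocessing.Pool is the process pool class.
Usage:
pool = multiprocessing.Pool(5)  # create a pool holding 5 worker processes
A process pool reuses its worker processes in order to avoid creating and destroying processes frequently.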
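Name | Description |
---|---|
apply(self, func, args=(), kwds={}) | Blocking call: the main process waits for each task, so tasks run one at a time |
apply_async(self, func, args=(), kwds={}, callback=None, error_callback=None) | Same usage as apply, but non-blocking and asynchronous; the callback runs once the result is available |
close() | Closes the pool: it accepts no new tasks, and the worker processes exit once all tasks are finished |
terminate() | Stops the worker processes immediately without processing outstanding tasks |
join() | Blocks the main process until the worker processes exit; must be called after close() or terminate() |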
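Usage (three alternative ways to submit a task):
for i in range(3):
    ret = pool.apply(calc, args=(i,))        # blocking
    ret = pool.apply_async(calc, args=(i,))  # non-blocking, returns an AsyncResult
    ret = pool.apply_async(calc, args=(i,), callback=lambda ret: logging.info("{} in callback".format(ret)))  # non-blocking with a callback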
import multiprocessing
import datetime
import logging

logging.basicConfig(level=logging.INFO, format="%(process)d %(processName)s %(thread)d %(message)s")

def calc(i):
    total = 0
    for x in range(1000000):
        total += 1
    logging.info(total)
    return total, i

if __name__ == '__main__':
    start = datetime.datetime.now()
    pool = multiprocessing.Pool(3)
    for i in range(3):
        # apply blocks: the next task is submitted only after the previous one has finished
        ret = pool.apply(calc, args=(i,))
        # print(ret, '++++++++++++++++++++')
    pool.close()  # no new tasks are accepted; workers exit once all tasks are done
    pool.join()   # wait for the worker processes to exit
    # for i in range(3):
    #     ret = pool.apply_async(calc, args=(i,))  # asynchronous, non-blocking
    #     print(ret, '~~~~~~~~~~~~~~~~~~~~~~~~~`')
    # pool.close()
    # pool.join()
    # for i in range(3):
    #     # asynchronous, non-blocking, with a callback; the callback runs in the
    #     # main process and receives the value returned by the worker
    #     ret = pool.apply_async(calc, args=(i,), callback=lambda ret: logging.info("{} in callback".format(ret)))
    #     # print(ret, '*******************')
    # pool.close()
    # pool.join()
    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)
    print('===========================')
----------------------------------------------------------------
8944 SpawnPoolWorker-2 7908 1000000
6836 SpawnPoolWorker-3 15236 1000000
9572 SpawnPoolWorker-1 7012 1000000
0.461763
===========================
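With apply_async, the object returned is an AsyncResult handle rather than the value itself, so printing it shows only the handle. A minimal sketch of collecting the actual return values with get() follows; the task square and the helper run_demo are stand-ins invented for this example.

```python
import multiprocessing


def square(i):
    """Small stand-in task; any picklable top-level function works."""
    return i, i * i


def run_demo():
    pool = multiprocessing.Pool(3)
    try:
        # apply_async returns immediately with an AsyncResult handle.
        handles = [pool.apply_async(square, args=(i,)) for i in range(5)]
        # get() blocks until that task's return value has arrived.
        results = [h.get(timeout=30) for h in handles]
    finally:
        pool.close()   # accept no further tasks
        pool.join()    # wait for the worker processes to exit
    return results


if __name__ == '__main__':
    print(run_demo())
```

Because all tasks are submitted before the first get(), they can run in parallel, and the results still come back in submission order.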
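Choosing between multiprocessing and multithreading
1. CPU-bound (mostly computation): CPython uses the GIL, so threads compete for the lock and multiple cores cannot be exploited. Python multiprocessing is more efficient here.
2. IO-bound: multithreading is a good fit in Python. It avoids the serialization overhead of passing data between processes, and while one thread waits on IO, execution switches to another thread, so efficiency is good.
- Applications
- Request/response model: a processing model common in web applications
- The master starts several worker processes, usually as many as there are CPUs, to take advantage of multiple cores.
- Worker processes often perform network IO and disk IO, so they start multiple threads to improve concurrency. Handling a user request often involves waiting for data, and the response then has to be returned over network IO. nginx uses a similar master/worker process layout, although its workers rely on event-driven IO rather than threads.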
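The IO-bound case can be sketched as follows, with time.sleep standing in for a network or disk wait (an assumption made for this example; fake_io and timed_run are invented names). Since a sleeping thread does not hold the GIL, four 0.2 second waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n):
    """Stand-in for a network or disk wait; sleeping releases the GIL."""
    time.sleep(0.2)
    return n


def timed_run(n_tasks=4):
    """Run n_tasks waits in as many threads; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        fs = [executor.submit(fake_io, i) for i in range(n_tasks)]
        results = [f.result() for f in fs]
    return results, time.perf_counter() - start


if __name__ == '__main__':
    results, elapsed = timed_run()
    print(results, round(elapsed, 2))
```

Run one after another, the same four waits would take about 0.8 seconds. For a CPU-bound function the threads would not overlap in this way, which is where a process pool becomes the better choice.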
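The concurrent package
concurrent.futures module
A module for asynchronous parallel task programming that provides a convenient high-level interface for asynchronous execution.
It provides two pool executors:
- ThreadPoolExecutor: an Executor backed by a thread pool for asynchronous calls
- ProcessPoolExecutor: an Executor backed by a process pool for asynchronous calls
ThreadPoolExecutor object
- First define a pool executor object, an instance of a subclass of Executor.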
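Method | Meaning |
---|---|
ThreadPoolExecutor(max_workers=1) | Creates a pool of at most max_workers threads to execute calls asynchronously; returns an Executor instance |
submit(fn, *args, **kwargs) | Submits the function to execute along with its arguments; returns a Future instance |
shutdown(wait=True) | Cleans up the pool |
Usage:
executer = futures.ThreadPoolExecutor(max_workers=3)  # create a pool of up to 3 threads
future = executer.submit(func, x)  # submit the function and its argument; returns a Future instance
executer.shutdown()  # clean up the pool and release its resources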
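A runnable version of the three calls above, using a hypothetical function add in place of the placeholder. It also shows that submit passes both positional and keyword arguments through to the function.

```python
from concurrent import futures


def add(x, y=0):
    """Hypothetical task used for illustration."""
    return x + y


executor = futures.ThreadPoolExecutor(max_workers=3)
future = executor.submit(add, 1, y=2)   # arguments are forwarded to add
value = future.result()                 # blocks until the call has finished
print(value)                            # prints 3
executor.shutdown(wait=True)            # wait for the workers, then clean up
```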
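Future class
Method | Meaning |
---|---|
done() | Returns True if the call was successfully cancelled or finished running |
cancelled() | Returns True if the call was successfully cancelled |
running() | Returns True if the call is currently running and cannot be cancelled |
cancel() | Attempts to cancel the call; returns False if it is already running or finished and cannot be cancelled, otherwise True |
result(timeout=None) | Returns the call's result; with timeout=None it waits indefinitely, and when a given timeout expires it raises concurrent.futures.TimeoutError |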
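The methods in the table can be tried out with a small sketch. The two threading.Event objects and the task wait_for_gate are assumptions introduced only to control the timing; they are not part of concurrent.futures. With a single worker thread, the first task runs while the second stays queued, so the two futures show the different outcomes of cancel().

```python
import threading
from concurrent import futures

started = threading.Event()   # set once the first task is actually running
gate = threading.Event()      # lets the demo decide when the task may finish


def wait_for_gate():
    started.set()
    gate.wait()
    return 'finished'


executor = futures.ThreadPoolExecutor(max_workers=1)
first = executor.submit(wait_for_gate)    # picked up by the only worker
second = executor.submit(wait_for_gate)   # stays queued behind the first
started.wait()

states = {
    'second_cancel': second.cancel(),       # still pending, so this succeeds
    'second_cancelled': second.cancelled(),
    'second_done': second.done(),           # a cancelled future counts as done
    'first_running': first.running(),
    'first_cancel': first.cancel(),         # already running, cannot be cancelled
}
try:
    first.result(timeout=0.1)               # the task is still blocked on the gate
except futures.TimeoutError:
    states['first_timed_out'] = True

gate.set()                                  # let the first task finish
states['first_result'] = first.result()
states['first_done'] = first.done()
executor.shutdown()
print(states)
```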
import threading
from concurrent import futures
import logging
import time

FORMAT = '%(asctime)-15s\t [%(processName)s:%(threadName)s, %(process)d:%(thread)8d] %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)

def worker(n):
    logging.info('begin to work-{}'.format(n))
    time.sleep(2)
    logging.info('finished {}'.format(n))

executer = futures.ThreadPoolExecutor(max_workers=2)
fs = []
for i in range(2):
    future = executer.submit(worker, i)
    fs.append(future)
for i in range(2, 4):
    future = executer.submit(worker, i)
    fs.append(future)
while True:
    time.sleep(2)
    # live threads, including the main thread and the pool's worker threads
    logging.info((threading.enumerate(), '+++++++++++++++++++++++++'))
    flag = True
    for f in fs:
        # done must be called; a bare f.done is a bound method and is always truthy
        # (the output recorded below logged the bare attribute, so its loop ran once)
        logging.info((f.done(), '~~~~~~~~~~~~~~~`'))
        flag = flag and f.done()
    print('---------------')
    if flag:
        executer.shutdown()
        logging.info((threading.enumerate(), '==================================='))
        break
------------------------------------------------------
2019-06-11 15:27:01,277 [MainProcess:ThreadPoolExecutor-0_0, 17652: 12020] begin to work-0
2019-06-11 15:27:01,278 [MainProcess:ThreadPoolExecutor-0_1, 17652: 13024] begin to work-1
---------------
2019-06-11 15:27:03,287 [MainProcess:ThreadPoolExecutor-0_1, 17652: 13024] finished 1
2019-06-11 15:27:03,288 [MainProcess:MainThread, 17652: 16656] ([<_MainThread(MainThread, started 16656)>, <Thread(ThreadPoolExecutor-0_0, started daemon 12020)>, <Thread(ThreadPoolExecutor-0_1, started daemon 13024)>], '+++++++++++++++++++++++++')
2019-06-11 15:27:03,288 [MainProcess:ThreadPoolExecutor-0_0, 17652: 12020] finished 0
2019-06-11 15:27:03,288 [MainProcess:ThreadPoolExecutor-0_1, 17652: 13024] begin to work-2
2019-06-11 15:27:03,289 [MainProcess:MainThread, 17652: 16656] (<bound method Future.done of <Future at 0x12bd5073d68 state=finished returned NoneType>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:27:03,290 [MainProcess:ThreadPoolExecutor-0_0, 17652: 12020] begin to work-3
2019-06-11 15:27:03,290 [MainProcess:MainThread, 17652: 16656] (<bound method Future.done of <Future at 0x12bd532e198 state=finished returned NoneType>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:27:03,291 [MainProcess:MainThread, 17652: 16656] (<bound method Future.done of <Future at 0x12bd53393c8 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:27:03,291 [MainProcess:MainThread, 17652: 16656] (<bound method Future.done of <Future at 0x12bd5339400 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:27:05,301 [MainProcess:ThreadPoolExecutor-0_1, 17652: 13024] finished 2
2019-06-11 15:27:05,301 [MainProcess:ThreadPoolExecutor-0_0, 17652: 12020] finished 3
2019-06-11 15:27:05,301 [MainProcess:MainThread, 17652: 16656] ([<_MainThread(MainThread, started 16656)>], '===================================')
Process finished with exit code 0
ProcessPoolExecutor object
Its methods are the same as those of ThreadPoolExecutor; the work is simply carried out by multiple processes.
import threading
from concurrent import futures
import logging
import time

FORMAT = '%(asctime)-15s\t [%(processName)s:%(threadName)s, %(process)d:%(thread)8d] %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)

def worker(n):
    logging.info('begin to work-{}'.format(n))
    time.sleep(2)
    logging.info('finished {}'.format(n))

if __name__ == '__main__':
    executer = futures.ProcessPoolExecutor(max_workers=2)
    fs = []
    for i in range(2):
        future = executer.submit(worker, i)
        fs.append(future)
    for i in range(2, 4):
        future = executer.submit(worker, i)
        fs.append(future)
    while True:
        time.sleep(2)
        logging.info((threading.enumerate(), '+++++++++++++++++++++++++'))
        flag = True
        for f in fs:
            # done must be called; a bare f.done is a bound method and is always truthy
            # (the output recorded below logged the bare attribute, so its loop ran once)
            logging.info((f.done(), '~~~~~~~~~~~~~~~`'))
            flag = flag and f.done()
        print('---------------')
        if flag:
            executer.shutdown()
            logging.info((threading.enumerate(), '==================================='))
            break
-------------------------------------------------------------------
2019-06-11 15:21:09,553 [Process-2:MainThread, 8036: 3356] begin to work-0
2019-06-11 15:21:09,553 [Process-1:MainThread, 8064: 13140] begin to work-1
2019-06-11 15:21:11,400 [MainProcess:MainThread, 17288: 8956] ([<_MainThread(MainThread, started 8956)>, <Thread(Thread-1, started daemon 17900)>, <Thread(QueueFeederThread, started daemon 12648)>], '+++++++++++++++++++++++++')
2019-06-11 15:21:11,400 [MainProcess:MainThread, 17288: 8956] (<bound method Future.done of <Future at 0x25ee72866d8 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:21:11,400 [MainProcess:MainThread, 17288: 8956] (<bound method Future.done of <Future at 0x25ee74b0ac8 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:21:11,400 [MainProcess:MainThread, 17288: 8956] (<bound method Future.done of <Future at 0x25ee74e0c50 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:21:11,401 [MainProcess:MainThread, 17288: 8956] (<bound method Future.done of <Future at 0x25ee74e0d68 state=pending>>, '~~~~~~~~~~~~~~~`')
---------------
2019-06-11 15:21:11,553 [Process-2:MainThread, 8036: 3356] finished 0
2019-06-11 15:21:11,553 [Process-2:MainThread, 8036: 3356] begin to work-2
2019-06-11 15:21:11,553 [Process-1:MainThread, 8064: 13140] finished 1
2019-06-11 15:21:11,554 [Process-1:MainThread, 8064: 13140] begin to work-3
2019-06-11 15:21:13,554 [Process-2:MainThread, 8036: 3356] finished 2
2019-06-11 15:21:13,554 [Process-1:MainThread, 8064: 13140] finished 3
2019-06-11 15:21:13,583 [MainProcess:MainThread, 17288: 8956] ([<_MainThread(MainThread, started 8956)>], '===================================')
Process finished with exit code 0
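Context manager support
- concurrent.futures.ProcessPoolExecutor inherits from concurrent.futures._base.Executor, and the parent class has __enter__ and __exit__ methods, so it supports context management and can be used in a with statement.
- The __exit__ method essentially calls shutdown(wait=True), which blocks until all running tasks have finished.
Usage:
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(func, x)
    print(future.result())
The example above rewritten with the with statement: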
import threading
from concurrent import futures
import logging
import time

FORMAT = '%(asctime)-15s\t [%(processName)s:%(threadName)s, %(process)d:%(thread)8d] %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)

def worker(n):
    logging.info('begin to work-{}'.format(n))
    time.sleep(2)
    logging.info('finished {}'.format(n))

if __name__ == '__main__':
    executer = futures.ProcessPoolExecutor(max_workers=2)
    with executer:
        fs = []
        for i in range(2):
            future = executer.submit(worker, i)
            fs.append(future)
        for i in range(2, 4):
            future = executer.submit(worker, i)
            fs.append(future)
        while True:
            time.sleep(2)
            logging.info((threading.enumerate(), '+++++++++++++++++++++++++'))
            flag = True
            for f in fs:  # check whether any task is still unfinished
                # done must be called; a bare f.done is a bound method and is always truthy
                # (the output recorded below logged the bare attribute, so its loop ran once)
                logging.info((f.done(), '~~~~~~~~~~~~~~~`'))
                flag = flag and f.done()
            print('---------------')
            if flag:
                break
    logging.info('======end======')
----------------------------------------------------------
2019-06-11 15:39:40,498 [Process-2:MainThread, 10760: 7960] begin to work-0
2019-06-11 15:39:40,498 [Process-1:MainThread, 16784: 17292] begin to work-1
2019-06-11 15:39:42,346 [MainProcess:MainThread, 18356: 5464] ([<_MainThread(MainThread, started 5464)>, <Thread(Thread-1, started daemon 7376)>, <Thread(QueueFeederThread, started daemon 13928)>], '+++++++++++++++++++++++++')
---------------
2019-06-11 15:39:42,346 [MainProcess:MainThread, 18356: 5464] (<bound method Future.done of <Future at 0x21a893c5780 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:39:42,346 [MainProcess:MainThread, 18356: 5464] (<bound method Future.done of <Future at 0x21a893cfb70 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:39:42,347 [MainProcess:MainThread, 18356: 5464] (<bound method Future.done of <Future at 0x21a8961ccf8 state=running>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:39:42,347 [MainProcess:MainThread, 18356: 5464] (<bound method Future.done of <Future at 0x21a8962d358 state=pending>>, '~~~~~~~~~~~~~~~`')
2019-06-11 15:39:42,500 [Process-2:MainThread, 10760: 7960] finished 0
2019-06-11 15:39:42,500 [Process-1:MainThread, 16784: 17292] finished 1
2019-06-11 15:39:42,500 [Process-2:MainThread, 10760: 7960] begin to work-2
2019-06-11 15:39:42,500 [Process-1:MainThread, 16784: 17292] begin to work-3
2019-06-11 15:39:44,509 [Process-2:MainThread, 10760: 7960] finished 2
2019-06-11 15:39:44,509 [Process-1:MainThread, 16784: 17292] finished 3
2019-06-11 15:39:44,540 [MainProcess:MainThread, 18356: 5464] ======end======
Process finished with exit code 0
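As an alternative to polling done() in a loop, concurrent.futures also offers as_completed, which yields each future as soon as it finishes. The sketch below is one possible simplification rather than a rewrite of the example above; the task worker here is a shortened stand-in that returns its argument.

```python
import time
from concurrent import futures


def worker(n):
    """Shortened stand-in task: later submissions sleep less."""
    time.sleep(0.1 * (3 - n))
    return n


with futures.ThreadPoolExecutor(max_workers=3) as executor:
    fs = [executor.submit(worker, i) for i in range(3)]
    # as_completed yields futures in the order in which they finish.
    finished = [f.result() for f in futures.as_completed(fs)]

print(finished)
```

Leaving the with block calls shutdown(wait=True), so no explicit shutdown is needed.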
Summary
This library unifies the way thread pools and process pools are called, which simplifies programming.
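A small sketch of what that unification looks like in practice: the same helper runs unchanged with either executor class. The names square and run_with are invented for this illustration.

```python
from concurrent import futures


def square(n):
    """Hypothetical task; it must be a top-level function for the process pool."""
    return n * n


def run_with(executor_cls, values):
    """Run square over values with whichever executor class is passed in."""
    with executor_cls(max_workers=2) as executor:
        fs = [executor.submit(square, v) for v in values]
        return [f.result() for f in fs]


if __name__ == '__main__':
    print(run_with(futures.ThreadPoolExecutor, [1, 2, 3]))
    print(run_with(futures.ProcessPoolExecutor, [1, 2, 3]))
```

Only the class passed in changes; the submit and result calls stay the same.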