Background
For multithreading problems in Java, the Disruptor framework is a convenient option. In Python, the two usual solutions are multiprocessing and multithreading: the former runs multiple processes (each with its own interpreter, so it can use multiple CPU cores), while the latter runs multiple threads inside one process.
Approaches
Process approach
from multiprocessing import Process, Manager

def f(x, ret):
    ret[x] = x * x

def task_multiprocessing_get():
    jobs = []
    manager = Manager()
    return_dict = manager.dict()  # shared dict visible to all processes
    for i in range(1, 8):
        p = Process(target=f, args=(i, return_dict))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()
    print(return_dict)

task_multiprocessing_get()
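A Manager dict is one way to collect results; a multiprocessing.Queue is a lighter-weight alternative. A minimal sketch (the `square` helper and the explicit fork start method are my own choices, not from the original; fork is POSIX-only):

```python
from multiprocessing import get_context

def square(x, q):
    # Each worker puts one (key, value) pair on the shared queue.
    q.put((x, x * x))

def task_multiprocessing_queue():
    ctx = get_context("fork")  # POSIX-only start method, chosen for a deterministic sketch
    q = ctx.Queue()
    jobs = [ctx.Process(target=square, args=(i, q)) for i in range(1, 8)]
    for p in jobs:
        p.start()
    # Drain the queue before join(): a child cannot exit cleanly while
    # its queue data is still buffered.
    results = dict(q.get() for _ in jobs)
    for p in jobs:
        p.join()
    return results

if __name__ == "__main__":
    print(task_multiprocessing_queue())
```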
Pool.map approach
from multiprocessing import Pool

def f(x):
    return x * x

def task_multiprocessing():
    with Pool(3) as p:
        ret = p.map(f, [1, 2, 3, 4, 5, 6, 7, 8])
    print(ret)

task_multiprocessing()
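Pool.map only passes a single argument to the worker function; for multi-argument functions, Pool.starmap unpacks each tuple. A small sketch (the `power` helper and the fork start method are illustrative choices; fork is POSIX-only):

```python
from multiprocessing import get_context

def power(base, exp):
    return base ** exp

def task_starmap():
    # starmap unpacks each tuple into the function's positional arguments.
    with get_context("fork").Pool(3) as p:
        return p.starmap(power, [(2, 3), (3, 2), (10, 0)])

if __name__ == "__main__":
    print(task_starmap())  # [8, 9, 1]
```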
Multithreading
# multi-threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
urls = [
'http://www.python.org',
'https://docs.python.org/3/',
'https://docs.python.org/3/whatsnew/3.7.html',
'https://docs.python.org/3/tutorial/index.html',
'https://docs.python.org/3/library/index.html',
'https://docs.python.org/3/reference/index.html',
'https://docs.python.org/3/using/index.html',
'https://docs.python.org/3/howto/index.html',
'https://docs.python.org/3/installing/index.html',
'https://docs.python.org/3/distributing/index.html',
'https://docs.python.org/3/extending/index.html',
'https://docs.python.org/3/c-api/index.html',
'https://docs.python.org/3/faq/index.html'
]
# %%time  (Jupyter cell magic for timing; remove outside a notebook)
with ThreadPoolExecutor(4) as executor:
    results = list(executor.map(urllib.request.urlopen, urls))
print(results)
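executor.map raises on the first failed URL and discards the rest; submit plus as_completed handles each result individually. A sketch with a stand-in `fetch` helper so it runs offline (urllib.request.urlopen would take its place; the helper is hypothetical, not from the original):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Stand-in for urllib.request.urlopen: returns the URL's length,
    # or raises for unsupported schemes, so the sketch needs no network.
    if not url.startswith(("http://", "https://")):
        raise ValueError("unsupported scheme: " + url)
    return len(url)

def fetch_all(urls):
    results, errors = {}, {}
    with ThreadPoolExecutor(4) as executor:
        futures = {executor.submit(fetch, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors

results, errors = fetch_all(["http://www.python.org", "ftp://broken"])
print(results, errors)
```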
Summary
- Process approach: after join(), read the results from the Manager's shared dict.
- Pool approach: straightforward; set the pool size, then map(function, tasks).
- ThreadPoolExecutor: also driven by map; straightforward.
The best way to absorb all this is hands-on practice, or solving a concrete problem in a real project.
Final choice
- For IO-bound tasks, using multithreading can improve performance.
- For IO-bound tasks, using multiprocessing can also improve performance, but the overhead tends to be higher than with multithreading.
- The Python GIL means that only one thread can execute Python bytecode at any given time.
- For CPU-bound tasks, using multithreading can actually worsen performance.
- For CPU-bound tasks, using multiprocessing can improve performance.
In one sentence: multiprocessing suits CPU-bound tasks, while a thread pool suits concurrent IO-bound tasks. Choose accordingly.
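These rules of thumb can be checked with a small benchmark: the same pure-Python CPU-bound function run through a thread pool and a process pool. The worker, pool sizes, and workload below are arbitrary choices for the sketch (fork start method, POSIX-only); on a multi-core machine the process pool should finish noticeably faster:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import get_context

def cpu_task(n):
    # Pure-Python loop: it holds the GIL, so threads cannot run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print("%s: %.2fs" % (label, time.perf_counter() - start))
    return result

def run_threads(work):
    with ThreadPoolExecutor(4) as ex:
        return list(ex.map(cpu_task, work))

def run_processes(work):
    with get_context("fork").Pool(4) as p:
        return p.map(cpu_task, work)

if __name__ == "__main__":
    work = [300_000] * 4
    timed("4 threads  ", lambda: run_threads(work))
    timed("4 processes", lambda: run_processes(work))
```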
References
- https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba
- https://stackoverflow.com/questions/29089282/multiprocessing-more-processes-than-cpu-count
- https://docs.python.org/3.7/library/multiprocessing.html?highlight=process
- https://www.ellicium.com/python-multiprocessing-pool-process/
- https://medium.com/towards-artificial-intelligence/the-why-when-and-how-of-using-python-multi-threading-and-multi-processing-afd1b8a8ecca
- https://docs.python.org/zh-cn/3/library/concurrent.futures.html#threadpoolexecutor