Python parallel speedup: multiprocessing and multithreading

Background

In Java, multithreading problems can conveniently be handled with frameworks such as the Disruptor. In Python, the two usual solutions are multiprocessing and multithreading: the former spreads work across multiple processes (and therefore multiple CPU cores), while the latter runs multiple threads inside a single process.

Approaches

Process approach

from multiprocessing import Process, Manager

def f(x, ret):
    # store the result in the shared dict, keyed by the input
    ret[x] = x * x

def task_multiprocessing_get():
    jobs = []
    manager = Manager()
    return_dict = manager.dict()  # dict shared across processes
    for i in range(1, 8):
        p = Process(target=f, args=(i, return_dict))
        jobs.append(p)
        p.start()

    for p in jobs:
        p.join()  # wait for every worker to finish

    print(return_dict)

if __name__ == '__main__':
    task_multiprocessing_get()
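
As a side note, results can also be passed back through a multiprocessing.Queue instead of a Manager dict. The sketch below is my own variant of the same squaring task, not from the original post.

from multiprocessing import Process, Queue

def f_queue(x, q):
    # push a (key, value) pair onto the shared queue
    q.put((x, x * x))

def task_multiprocessing_queue():
    q = Queue()
    jobs = [Process(target=f_queue, args=(i, q)) for i in range(1, 8)]
    for p in jobs:
        p.start()
    # drain the queue before joining, so no worker blocks on a full pipe
    results = dict(q.get() for _ in jobs)
    for p in jobs:
        p.join()
    print(results)

if __name__ == '__main__':
    task_multiprocessing_queue()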


Pool.map approach

from multiprocessing import Pool

def f(x):
    return x * x

def task_multiprocessing():
    # a pool of 3 worker processes; map distributes the inputs across them
    with Pool(3) as p:
        ret = p.map(f, [1, 2, 3, 4, 5, 6, 7, 8])
        print(ret)

if __name__ == '__main__':
    task_multiprocessing()
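
A common question (see reference 2) is how many workers to use. A small sketch, assuming os.cpu_count() as the sizing rule, which is also what Pool() defaults to when given no argument:

import os
from multiprocessing import Pool

def f(x):
    return x * x

def task_pool_cpu_count():
    # Pool() with no argument already uses os.cpu_count(); written out here for clarity
    with Pool(os.cpu_count()) as p:
        print(p.map(f, range(1, 9)))

if __name__ == '__main__':
    task_pool_cpu_count()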

Multithreading

# multi-threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

urls = [
  'http://www.python.org',
  'https://docs.python.org/3/',
  'https://docs.python.org/3/whatsnew/3.7.html',
  'https://docs.python.org/3/tutorial/index.html',
  'https://docs.python.org/3/library/index.html',
  'https://docs.python.org/3/reference/index.html',
  'https://docs.python.org/3/using/index.html',
  'https://docs.python.org/3/howto/index.html',
  'https://docs.python.org/3/installing/index.html',
  'https://docs.python.org/3/distributing/index.html',
  'https://docs.python.org/3/extending/index.html',
  'https://docs.python.org/3/c-api/index.html',
  'https://docs.python.org/3/faq/index.html',
  ]

# %%time  (Jupyter cell magic, used in the original notebook to time this cell)
with ThreadPoolExecutor(4) as executor:
    # map returns a lazy iterator of results in input order
    results = executor.map(urllib.request.urlopen, urls)
    for resp in results:
        print(resp.status, resp.url)
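
If each URL needs its own timeout and error handling, the same module's submit plus as_completed works too. This sketch reuses the urls list above; the fetch helper and the 10-second timeout are my own choices, not from the original post.

from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

def fetch(url):
    # return the URL together with its HTTP status code
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, resp.status

with ThreadPoolExecutor(4) as executor:
    futures = {executor.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        try:
            url, status = fut.result()
            print(status, url)
        except Exception as exc:
            print('failed:', futures[fut], exc)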

Summary

  1. Process approach: after joining the workers, collect the results from the Manager's shared dict.
  2. Pool approach: easy to follow; set the pool size, then call map(function, tasks).
  3. ThreadPoolExecutor also uses map and is just as easy to follow.
    The best way to absorb all this is hands-on practice, or solving a real problem in a project; it sinks in quickly.

Which one to choose

  • For IO-bound tasks, using multithreading can improve performance.
  • For IO-bound tasks, using multiprocessing can also improve performance,
    but the overhead tends to be higher than with multithreading. The
    Python GIL means that only one thread can execute at any given
    time in a Python program.
  • For CPU-bound tasks, using multithreading
    can actually worsen the performance.
  • For CPU-bound tasks, using
    multiprocessing can improve performance.

In one sentence: processes favor CPU-bound tasks, while a thread pool suits concurrent IO-bound work. Choose accordingly; the timing sketch below shows the difference on a CPU-bound workload.
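
To see this yourself, here is a minimal timing sketch comparing the two executors on a CPU-bound busy loop; the workload size and the pool size of 4 are arbitrary placeholders.

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # pure-Python loop: holds the GIL the whole time it runs
    return sum(i * i for i in range(n))

def timed(executor_cls, label):
    start = time.time()
    with executor_cls(4) as ex:
        list(ex.map(cpu_bound, [2_000_000] * 8))
    print(f'{label}: {time.time() - start:.2f}s')

if __name__ == '__main__':
    timed(ThreadPoolExecutor, 'threads')      # serialized by the GIL
    timed(ProcessPoolExecutor, 'processes')   # can use multiple cores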

References

  1. https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba
  2. https://stackoverflow.com/questions/29089282/multiprocessing-more-processes-than-cpu-count
  3. https://docs.python.org/3.7/library/multiprocessing.html?highlight=process
  4. https://www.ellicium.com/python-multiprocessing-pool-process/
  5. https://medium.com/towards-artificial-intelligence/the-why-when-and-how-of-using-python-multi-threading-and-multi-processing-afd1b8a8ecca
  6. https://docs.python.org/zh-cn/3/library/concurrent.futures.html#threadpoolexecutor