Background
For multithreading problems in Java, the Disruptor framework is a convenient option. In Python, the two usual solutions are multiprocessing and multithreading: the former runs multiple processes (each with its own interpreter, so it can use multiple CPU cores), while the latter runs multiple threads inside one process.
Approaches
Process approach
from multiprocessing import Process, Manager

def f(x, ret):
    ret[x] = x * x

def task_multiprocessing_get():
    jobs = []
    manager = Manager()
    return_dict = manager.dict()  # shared dict visible to all processes
    for i in range(1, 8):
        p = Process(target=f, args=(i, return_dict))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()
    print(return_dict)

task_multiprocessing_get()
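A Manager dict is one way to collect results; a multiprocessing.Queue is a lighter-weight alternative. A minimal sketch (the `square` helper and the explicit fork start method are my own choices, not from the original; fork is POSIX-only):

```python
from multiprocessing import get_context

def square(x, q):
    # Each worker puts one (key, value) pair on the shared queue.
    q.put((x, x * x))

def task_multiprocessing_queue():
    ctx = get_context("fork")  # POSIX-only start method, chosen for a deterministic sketch
    q = ctx.Queue()
    jobs = [ctx.Process(target=square, args=(i, q)) for i in range(1, 8)]
    for p in jobs:
        p.start()
    # Drain the queue before join(): a child cannot exit cleanly while
    # its queue data is still buffered.
    results = dict(q.get() for _ in jobs)
    for p in jobs:
        p.join()
    return results

if __name__ == "__main__":
    print(task_multiprocessing_queue())
```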
Pool.map approach
from multiprocessing import Pool

def f(x):
    return x * x

def task_multiprocessing():
    with Pool(3) as p:
        ret = p.map(f, [1, 2, 3, 4, 5, 6, 7, 8])
    print(ret)

task_multiprocessing()
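Pool.map only passes a single argument to the worker function; for multi-argument functions, Pool.starmap unpacks each tuple. A small sketch (the `power` helper and the fork start method are illustrative choices; fork is POSIX-only):

```python
from multiprocessing import get_context

def power(base, exp):
    return base ** exp

def task_starmap():
    # starmap unpacks each tuple into the function's positional arguments.
    with get_context("fork").Pool(3) as p:
        return p.starmap(power, [(2, 3), (3, 2), (10, 0)])

if __name__ == "__main__":
    print(task_starmap())  # [8, 9, 1]
```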
Multithreading
# multi-threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
urls = [
'http://www.python.org',
'https://docs.python.org/3/',
'https://docs.python.org/3/whatsnew/3.7.html',
'https://docs.python.org/3/tutorial/index.html',
'https://docs.python.org/3/library/index.html',
'https://docs.python.org/3/reference/index.html',
'https://docs.python.org/3/using/index.html',
'https://docs.python.org/3/howto/index.html',
'https://docs.python.org/3/installing/index.html',
'https://docs.python.org/3/distributing/index.html',
'https://docs.python.org/3/extending/index.html',
'https://docs.python.org/3/c-api/index.html',
'https://docs.python.org/3/faq/index.html'
]
# %%time  (Jupyter cell magic for timing; remove outside a notebook)
with ThreadPoolExecutor(4) as executor:
    results = list(executor.map(urllib.request.urlopen, urls))
print(results)
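executor.map raises on the first failed URL and discards the rest; submit plus as_completed handles each result individually. A sketch with a stand-in `fetch` helper so it runs offline (urllib.request.urlopen would take its place; the helper is hypothetical, not from the original):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Stand-in for urllib.request.urlopen: returns the URL's length,
    # or raises for unsupported schemes, so the sketch needs no network.
    if not url.startswith(("http://", "https://")):
        raise ValueError("unsupported scheme: " + url)
    return len(url)

def fetch_all(urls):
    results, errors = {}, {}
    with ThreadPoolExecutor(4) as executor:
        futures = {executor.submit(fetch, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors

results, errors = fetch_all(["http://www.python.org", "ftp://broken"])
print(results, errors)
```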
Summary
- Process approach: after join(), read the results from the Manager's shared dict.
- Pool approach: straightforward; set the pool size, then map(function, tasks).
- ThreadPoolExecutor: also driven by map; straightforward.
The best way to absorb all this is hands-on practice, or solving a concrete problem in a real project.
Final choice
- For IO-bound tasks, using multithreading can improve performance.
- For IO-bound tasks, using multiprocessing can also improve performance, but the overhead tends to be higher than with multithreading.
- The Python GIL means that only one thread can execute Python bytecode at any given time.
- For CPU-bound tasks, using multithreading can actually worsen performance.
- For CPU-bound tasks, using multiprocessing can improve performance.
In one sentence: multiprocessing suits CPU-bound tasks, while a thread pool suits concurrent IO-bound tasks. Choose accordingly.
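These rules of thumb can be checked with a small benchmark: the same pure-Python CPU-bound function run through a thread pool and a process pool. The worker, pool sizes, and workload below are arbitrary choices for the sketch (fork start method, POSIX-only); on a multi-core machine the process pool should finish noticeably faster:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import get_context

def cpu_task(n):
    # Pure-Python loop: it holds the GIL, so threads cannot run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print("%s: %.2fs" % (label, time.perf_counter() - start))
    return result

def run_threads(work):
    with ThreadPoolExecutor(4) as ex:
        return list(ex.map(cpu_task, work))

def run_processes(work):
    with get_context("fork").Pool(4) as p:
        return p.map(cpu_task, work)

if __name__ == "__main__":
    work = [300_000] * 4
    timed("4 threads  ", lambda: run_threads(work))
    timed("4 processes", lambda: run_processes(work))
```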
References
- https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba
- https://stackoverflow.com/questions/29089282/multiprocessing-more-processes-than-cpu-count
- https://docs.python.org/3.7/library/multiprocessing.html?highlight=process
- https://www.ellicium.com/python-multiprocessing-pool-process/
- https://medium.com/towards-artificial-intelligence/the-why-when-and-how-of-using-python-multi-threading-and-multi-processing-afd1b8a8ecca
- https://docs.python.org/zh-cn/3/library/concurrent.futures.html#threadpoolexecutor