A Detailed Look at Multiprocessing and Multithreading in Python

Modules covered:
threading
multiprocessing
concurrent.futures

Using threading
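
Below is a minimal sketch of the basic threading workflow: create Thread objects, call start() on each one, then call join() to wait for all of them. The worker function and the shared results list are hypothetical. A Lock guards the list to show the usual pattern for protecting shared state.

```python
import threading

results = []                     # shared list, filled by the worker threads
results_lock = threading.Lock()  # protects results


def worker(n):
    """Hypothetical worker: compute n * n and record (n, n * n)."""
    value = n * n
    with results_lock:           # only one thread at a time touches the list
        results.append((n, value))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()                    # begin running worker(i) in its own thread
for t in threads:
    t.join()                     # wait until every thread has finished

print(sorted(results))           # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```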

Using multiprocessing

Process pools

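# coding=utf-8
import time
from multiprocessing import Pool as ProcessPool


def task(t=5):
    print('Executing task...')
    time.sleep(t)
    print('Task finished.')
    return time.asctime()


# The main guard is required where child processes are started with "spawn"
# (the default on Windows and macOS); otherwise each child re-imports this
# module and tries to create the pool again.
if __name__ == '__main__':
    p = ProcessPool(processes=3)

    output = p.apply(task, (5,))  # blocks until the worker returns

    print(output)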
Out:
Executing task...
Task finished.
Mon Feb 18 17:18:28 2019

If the processes argument is omitted, it defaults to the number of CPU cores (os.cpu_count()).

Commonly used methods:

apply(func, args=(), kwds={})
Blocking: the call returns only after the worker process has finished running func.

apply_async(func, args=(), kwds={})
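Non-blocking: it immediately returns an ApplyResult object (documented as AsyncResult). Call ApplyResult.get() to obtain the return value. Note that get() itself blocks until the result is ready.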

ar = p.apply_async(task, (5,))  # returns immediately
print(ar.get())  # blocks until task() has returned

Out:
Mon Feb 18 17:31:20 2019
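
The next sketch submits several tasks with apply_async before collecting any results, so the tasks run concurrently instead of one after another as they would with apply. To keep it quick to run, it uses the thread-backed Pool from multiprocessing.dummy, which has the same methods. slow_square is a hypothetical task. With multiprocessing.Pool you would also need a module-level function and the `if __name__ == '__main__':` guard.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def slow_square(x):
    """Hypothetical task: sleep briefly, then return x * x."""
    time.sleep(0.1)
    return x * x


collected = []  # filled by the callback, which runs before get() can return

with Pool(processes=3) as pool:
    # apply_async returns at once, so all four tasks are queued immediately
    async_results = [pool.apply_async(slow_square, (i,), callback=collected.append)
                     for i in range(4)]
    # get() blocks until that task is done; with a timeout it raises
    # multiprocessing.TimeoutError if the result is not ready in time
    values = [r.get(timeout=5) for r in async_results]

print(values)             # [0, 1, 4, 9]
print(sorted(collected))  # [0, 1, 4, 9]
```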

map(func, iterable, chunksize=None)
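Pass the arguments in as an iterable (for example, a list). func is called once per element, and map blocks until every call has finished. It returns the results as a list in the same order as the input.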

outputs = p.map(task, [2, 3, 4, 5])
print(outputs)

Out:
['Mon Feb 18 17:28:15 2019', 'Mon Feb 18 17:28:16 2019', 'Mon Feb 18 17:28:17 2019', 'Mon Feb 18 17:28:20 2019']
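
The chunksize parameter controls how the iterable is split into batches. Each batch is sent to a single worker as one task. The sketch below makes this visible by tagging each item with the id of the thread that handled it. It assumes the thread-backed Pool from multiprocessing.dummy; the batching works the same way for a process Pool. tag_with_worker is a hypothetical helper.

```python
import threading
from multiprocessing.dummy import Pool  # thread-backed Pool, same map() signature


def tag_with_worker(x):
    """Hypothetical helper: return the item and the id of the thread that ran it."""
    return x, threading.get_ident()


with Pool(processes=3) as pool:
    # chunksize=4: the 12 items form 3 chunks of 4, and each chunk
    # is processed by one worker as a single task
    pairs = pool.map(tag_with_worker, range(12), chunksize=4)

for start in range(0, 12, 4):
    chunk = pairs[start:start + 4]
    workers = {worker_id for _, worker_id in chunk}
    print([x for x, _ in chunk], 'handled by', len(workers), 'worker(s)')
```

Larger chunks reduce the per-task communication overhead. The trade-off is coarser load balancing between workers.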

map_async(func, iterable, chunksize=None)
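Non-blocking: it immediately returns a MapResult object. Call get() to obtain the list of results; get() blocks until they are ready.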

mr = p.map_async(task, [2, 3, 4, 5])
print(mr.get())

Out:
['Mon Feb 18 17:31:20 2019', 'Mon Feb 18 17:31:21 2019', 'Mon Feb 18 17:31:22 2019', 'Mon Feb 18 17:31:25 2019']
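
Because map_async returns right away, the caller can do other work and check on the batch later. The sketch below assumes the thread-backed Pool from multiprocessing.dummy and a hypothetical slow_double task. It shows ready() before and after wait(), then get().

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def slow_double(x):
    """Hypothetical task: sleep, then return 2 * x."""
    time.sleep(0.2)
    return 2 * x


with Pool(processes=2) as pool:
    mr = pool.map_async(slow_double, [1, 2, 3, 4])
    # map_async has already returned; the tasks are still running
    print('ready right after submitting?', mr.ready())
    mr.wait()   # block until every item has been processed
    print('ready after wait()?', mr.ready())
    print(mr.get())  # [2, 4, 6, 8]
```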

imap(func, iterable, chunksize=1)
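Non-blocking: it immediately returns an IMapIterator object. Iterating over it yields results in input order, and each step blocks until the next result in that order is ready. In the output below, Task 1 is printed only after it finishes, even though tasks 2 to 4 had already finished.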

# coding=utf-8
import logging
import random
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread pool with the same API as multiprocessing.Pool

# Log the process id and thread id so you can see which worker runs each task
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - (%(process)d->%(thread)d): %(message)s')


def task(t=5):
    logging.info(f'Executing task -> {t}')
    time.sleep(random.randint(1, 10))  # simulate work lasting 1 to 10 seconds
    logging.info(f'Task finished = {t}')
    return f"Task {t} {time.asctime()}"


p = ThreadPool(processes=3)

outputs = p.imap(task, [1, 2, 3, 4, 5, 6, 7, 8, 9])  # returns an IMapIterator immediately

# Each iteration waits for the next result in input order
for o in outputs:
    print(o)

Out:
2020-05-15 16:01:37,393 - INFO - (27104->28196): Executing task -> 1
2020-05-15 16:01:37,393 - INFO - (27104->17552): Executing task -> 2
2020-05-15 16:01:37,394 - INFO - (27104->10172): Executing task -> 3
2020-05-15 16:01:39,395 - INFO - (27104->10172): Task finished = 3
2020-05-15 16:01:39,395 - INFO - (27104->10172): Executing task -> 4
2020-05-15 16:01:44,394 - INFO - (27104->17552): Task finished = 2
2020-05-15 16:01:44,394 - INFO - (27104->17552): Executing task -> 5
2020-05-15 16:01:46,396 - INFO - (27104->10172): Task finished = 4
2020-05-15 16:01:46,396 - INFO - (27104->10172): Executing task -> 6
Task 1 Fri May 15 16:01:47 2020
Task 2 Fri May 15 16:01:44 2020
Task 3 Fri May 15 16:01:39 2020
Task 4 Fri May 15 16:01:46 2020
2020-05-15 16:01:47,394 - INFO - (27104->28196): Task finished = 1
2020-05-15 16:01:47,394 - INFO - (27104->28196): Executing task -> 7
2020-05-15 16:01:48,395 - INFO - (27104->17552): Task finished = 5
2020-05-15 16:01:48,395 - INFO - (27104->17552): Executing task -> 8
Task 5 Fri May 15 16:01:48 2020
2020-05-15 16:01:51,395 - INFO - (27104->28196): Task finished = 7
2020-05-15 16:01:51,395 - INFO - (27104->28196): Executing task -> 9
2020-05-15 16:01:51,396 - INFO - (27104->17552): Task finished = 8
Task 6 Fri May 15 16:01:53 2020
Task 7 Fri May 15 16:01:51 2020
Task 8 Fri May 15 16:01:51 2020
2020-05-15 16:01:53,397 - INFO - (27104->10172): Task finished = 6
2020-05-15 16:01:58,396 - INFO - (27104->28196): Task finished = 9
Task 9 Fri May 15 16:01:58 2020

imap_unordered(func, iterable, chunksize=1)
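Works like imap except for the order of the results. Each result is yielded as soon as its task finishes, so results arrive in completion order rather than input order.

Out (the same code as above, with p.imap replaced by p.imap_unordered):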

2020-05-15 16:09:44,882 - INFO - (24492->22428): Executing task -> 1
2020-05-15 16:09:44,882 - INFO - (24492->22876): Executing task -> 2
2020-05-15 16:09:44,882 - INFO - (24492->6888): Executing task -> 3
Task 2 Fri May 15 16:09:47 2020
2020-05-15 16:09:47,883 - INFO - (24492->22876): Task finished = 2
2020-05-15 16:09:47,883 - INFO - (24492->22876): Executing task -> 4
Task 1 Fri May 15 16:09:47 2020
2020-05-15 16:09:47,883 - INFO - (24492->22428): Task finished = 1
2020-05-15 16:09:47,883 - INFO - (24492->22428): Executing task -> 5
2020-05-15 16:09:48,884 - INFO - (24492->22428): Task finished = 5
Task 5 Fri May 15 16:09:48 2020
2020-05-15 16:09:48,884 - INFO - (24492->22428): Executing task -> 6
Task 6 Fri May 15 16:09:50 2020
2020-05-15 16:09:50,885 - INFO - (24492->22428): Task finished = 6
2020-05-15 16:09:50,885 - INFO - (24492->22428): Executing task -> 7
Task 4 Fri May 15 16:09:51 2020
2020-05-15 16:09:51,884 - INFO - (24492->22876): Task finished = 4
2020-05-15 16:09:51,884 - INFO - (24492->22876): Executing task -> 8
2020-05-15 16:09:54,883 - INFO - (24492->6888): Task finished = 3
2020-05-15 16:09:54,883 - INFO - (24492->6888): Executing task -> 9
2020-05-15 16:09:54,885 - INFO - (24492->22876): Task finished = 8
Task 3 Fri May 15 16:09:54 2020
2020-05-15 16:09:54,886 - INFO - (24492->22428): Task finished = 7
Task 8 Fri May 15 16:09:54 2020
Task 7 Fri May 15 16:09:54 2020
2020-05-15 16:10:00,884 - INFO - (24492->6888): Task finished = 9
Task 9 Fri May 15 16:10:00 2020
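
The ordering difference can be reproduced deterministically with tasks whose durations are known in advance. The sketch below assumes the thread-backed Pool from multiprocessing.dummy and a hypothetical sleep_then_return task. The first items sleep the longest.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def sleep_then_return(delay):
    """Hypothetical task: sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay


delays = [0.6, 0.4, 0.2, 0.0]

with Pool(processes=4) as pool:
    # imap: results come back in input order, so the slow first item holds up the rest
    ordered = list(pool.imap(sleep_then_return, delays))
    # imap_unordered: each result is yielded as soon as its task finishes
    unordered = list(pool.imap_unordered(sleep_then_return, delays))

print(ordered)    # [0.6, 0.4, 0.2, 0.0]
print(unordered)  # typically [0.0, 0.2, 0.4, 0.6]
```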

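Thread pools
Usage is almost identical to process pools. The only change is to import Pool from multiprocessing.dummy instead of multiprocessing.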

from multiprocessing.dummy import Pool as ThreadPool

p = ThreadPool(processes=3)
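
In a thread pool, processes actually means the number of worker threads, and all of them live inside the current process. The sketch below checks this by reporting the process id and thread id for each call. where_am_i is a hypothetical helper.

```python
import os
import threading
from multiprocessing.dummy import Pool as ThreadPool


def where_am_i(_):
    """Hypothetical helper: report which process and thread ran this call."""
    return os.getpid(), threading.get_ident()


with ThreadPool(processes=3) as p:
    locations = p.map(where_am_i, range(6))

pids = {pid for pid, _ in locations}
worker_threads = {tid for _, tid in locations}
print('process ids:', pids)                       # only the current process's pid
print('distinct worker threads:', len(worker_threads))  # at most 3
```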

Using concurrent.futures
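
Below is a minimal sketch of concurrent.futures with ThreadPoolExecutor, using a hypothetical task function. The comparisons to the Pool methods above are loose analogies: submit() returns a Future much like apply_async, as_completed() yields in completion order much like imap_unordered, and executor.map() keeps input order much like imap. ProcessPoolExecutor has the same interface but, like multiprocessing.Pool, needs the `if __name__ == '__main__':` guard.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def task(t):
    """Hypothetical task: sleep for t/10 seconds, then return a label."""
    time.sleep(t / 10)
    return f'task {t} done'


completed = []  # task arguments, in the order their futures finished

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules one call and returns a Future right away
    futures = {executor.submit(task, t): t for t in [3, 1, 2]}
    # as_completed() yields each Future as soon as it finishes
    for future in as_completed(futures):
        completed.append(futures[future])
        print(future.result())

    # executor.map() returns results in input order
    ordered = list(executor.map(task, [3, 1, 2]))

print(completed)  # typically [1, 2, 3]
print(ordered)    # ['task 3 done', 'task 1 done', 'task 2 done']
```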

Thread safety

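Are Python's list, tuple and dict thread-safe?
Because of the GIL, only one thread executes Python bytecode at any given moment, but that alone does not guarantee thread safety. A statement that is not an atomic operation, such as x += 1 (read, add, write back), can be interrupted by a thread switch partway through. See [1] for details.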
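
The sketch below illustrates the non-atomic case with a shared counter incremented from several threads, once without a lock and once with threading.Lock. The thread and iteration counts are arbitrary choices. Whether the unlocked version actually loses updates on a given run depends on the interpreter version and on timing. Only the locked result is guaranteed.

```python
import threading

N_THREADS = 4
N_INCREMENTS = 100_000

counter = 0
lock = threading.Lock()


def unsafe_increment():
    global counter
    for _ in range(N_INCREMENTS):
        counter += 1          # read, add, write: a thread switch may occur in between


def safe_increment():
    global counter
    for _ in range(N_INCREMENTS):
        with lock:            # the whole read-add-write runs in one thread at a time
            counter += 1


def run(target):
    """Reset the counter, run `target` in N_THREADS threads, return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=target) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


expected = N_THREADS * N_INCREMENTS
print('without lock:', run(unsafe_increment), 'expected', expected)  # may be lower
print('with lock:   ', run(safe_increment), 'expected', expected)    # always equal
```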

[1] Are lists thread-safe?

©2020 CSDN