Main modules covered:
threading
multiprocessing
concurrent.futures
Using threading
Using multiprocessing
Process pool
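# coding=utf-8
import time
from multiprocessing import Pool as ProcessPool


def task(t=5):
    print('Executing task...')
    time.sleep(t)
    print('Task finished.')
    return time.asctime()


# The guard is needed wherever worker processes are spawned (e.g. on Windows).
if __name__ == '__main__':
    p = ProcessPool(processes=3)
    # apply() blocks until the worker has returned.
    output = p.apply(task, (5,))
    print(output)
Out: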
Executing task...
Task finished.
Mon Feb 18 17:18:28 2019
If the processes argument is not set, it defaults to the number of CPU cores.
Commonly used functions:
apply(func, args=(), kwds={})
Blocking: it returns only after the worker process has finished the call.
apply_async(func, args=(), kwds={})
Non-blocking: it returns an ApplyResult object immediately, and ApplyResult.get() gives you the return value. Note that get() itself blocks.
ar = p.apply_async(task, (5,))
print(ar.get())
Out:
Mon Feb 18 17:31:20 2019
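The point of apply_async shows up once several calls are in flight together. Below is a minimal sketch of that pattern. The names slow_square and run_concurrently are made up for illustration, and the thread-backed Pool from multiprocessing.dummy is used only so the snippet runs anywhere without a __main__ guard; the same calls should work on a process pool.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def slow_square(x, delay=0.2):
    """Hypothetical task: sleep briefly, then return x squared."""
    time.sleep(delay)
    return x * x


def run_concurrently(values, workers=3):
    """Submit every value with apply_async, then collect the results in order."""
    with Pool(processes=workers) as pool:
        # Each apply_async call returns a result handle right away.
        handles = [pool.apply_async(slow_square, (v,)) for v in values]
        # get() blocks until that particular task has finished.
        return [h.get(timeout=10) for h in handles]


if __name__ == '__main__':
    start = time.time()
    print(run_concurrently([1, 2, 3]))
    print(f'elapsed: {time.time() - start:.1f}s')
```

With three workers the three tasks sleep side by side, so the elapsed time should be close to one delay rather than three.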
map(func, iterable, chunksize=None)
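Blocking. The arguments are passed in as a list (any iterable works): func is called once per item, and the results come back as a list in the same order.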
outputs = p.map(task, [2, 3, 4, 5])
print(outputs)
Out:
['Mon Feb 18 17:28:15 2019', 'Mon Feb 18 17:28:16 2019', 'Mon Feb 18 17:28:17 2019', 'Mon Feb 18 17:28:20 2019']
map_async(func, iterable, chunksize=None)
Non-blocking: it returns a MapResult object immediately.
mr = p.map_async(task, [2, 3, 4, 5])
print(mr.get())
Out:
['Mon Feb 18 17:31:20 2019', 'Mon Feb 18 17:31:21 2019', 'Mon Feb 18 17:31:22 2019', 'Mon Feb 18 17:31:25 2019']
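Since map_async does not block, the caller can do other work and check on the result from time to time. A rough sketch of that idea follows, using the ready() and wait() methods of the returned result object. The helper names double and map_in_background are assumptions for this example, and the thread-backed Pool is used only to keep the snippet self-contained.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def double(x):
    """Hypothetical task: sleep briefly, then return 2 * x."""
    time.sleep(0.1)
    return 2 * x


def map_in_background(values, workers=3):
    """Start a map_async call and poll it until it is done.

    Returns the result list and the number of polls that were needed.
    """
    with Pool(processes=workers) as pool:
        result = pool.map_async(double, values)  # returns immediately
        polls = 0
        while not result.ready():
            polls += 1                  # other work could happen here
            result.wait(timeout=0.05)   # wait a little, then check again
        return result.get(), polls


if __name__ == '__main__':
    print(map_in_background([1, 2, 3, 4]))
```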
imap(func, iterable, chunksize=1)
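Non-blocking: it returns an IMapIterator object immediately. Iterating over it yields the results in input order, and each step blocks until the next result is ready.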
# coding=utf-8
import logging
import random
import time
from multiprocessing.dummy import Pool as ThreadPool

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - (%(process)d->%(thread)d): %(message)s',
)


def task(t=5):
    logging.info(f'Executing task -> {t}')
    time.sleep(random.randint(1, 10))
    logging.info(f'Task finished = {t}')
    return f"Task {t} {time.asctime()}"


p = ThreadPool(processes=3)
outputs = p.imap(task, [1, 2, 3, 4, 5, 6, 7, 8, 9])  # IMapIterator
for o in outputs:
    print(o)
2020-05-15 16:01:37,393 - INFO - (27104->28196): Executing task -> 1
2020-05-15 16:01:37,393 - INFO - (27104->17552): Executing task -> 2
2020-05-15 16:01:37,394 - INFO - (27104->10172): Executing task -> 3
2020-05-15 16:01:39,395 - INFO - (27104->10172): Task finished = 3
2020-05-15 16:01:39,395 - INFO - (27104->10172): Executing task -> 4
2020-05-15 16:01:44,394 - INFO - (27104->17552): Task finished = 2
2020-05-15 16:01:44,394 - INFO - (27104->17552): Executing task -> 5
2020-05-15 16:01:46,396 - INFO - (27104->10172): Task finished = 4
2020-05-15 16:01:46,396 - INFO - (27104->10172): Executing task -> 6
Task 1 Fri May 15 16:01:47 2020
Task 2 Fri May 15 16:01:44 2020
Task 3 Fri May 15 16:01:39 2020
Task 4 Fri May 15 16:01:46 2020
2020-05-15 16:01:47,394 - INFO - (27104->28196): Task finished = 1
2020-05-15 16:01:47,394 - INFO - (27104->28196): Executing task -> 7
2020-05-15 16:01:48,395 - INFO - (27104->17552): Task finished = 5
2020-05-15 16:01:48,395 - INFO - (27104->17552): Executing task -> 8
Task 5 Fri May 15 16:01:48 2020
2020-05-15 16:01:51,395 - INFO - (27104->28196): Task finished = 7
2020-05-15 16:01:51,395 - INFO - (27104->28196): Executing task -> 9
2020-05-15 16:01:51,396 - INFO - (27104->17552): Task finished = 8
Task 6 Fri May 15 16:01:53 2020
Task 7 Fri May 15 16:01:51 2020
Task 8 Fri May 15 16:01:51 2020
2020-05-15 16:01:53,397 - INFO - (27104->10172): Task finished = 6
2020-05-15 16:01:58,396 - INFO - (27104->28196): Task finished = 9
Task 9 Fri May 15 16:01:58 2020
imap_unordered(func, iterable, chunksize=1)
Much the same as imap. The difference is the order of the results: as soon as any task has a result, the iterator yields it.
2020-05-15 16:09:44,882 - INFO - (24492->22428): Executing task -> 1
2020-05-15 16:09:44,882 - INFO - (24492->22876): Executing task -> 2
2020-05-15 16:09:44,882 - INFO - (24492->6888): Executing task -> 3
Task 2 Fri May 15 16:09:47 2020
2020-05-15 16:09:47,883 - INFO - (24492->22876): Task finished = 2
2020-05-15 16:09:47,883 - INFO - (24492->22876): Executing task -> 4
Task 1 Fri May 15 16:09:47 2020
2020-05-15 16:09:47,883 - INFO - (24492->22428): Task finished = 1
2020-05-15 16:09:47,883 - INFO - (24492->22428): Executing task -> 5
2020-05-15 16:09:48,884 - INFO - (24492->22428): Task finished = 5
Task 5 Fri May 15 16:09:48 2020
2020-05-15 16:09:48,884 - INFO - (24492->22428): Executing task -> 6
Task 6 Fri May 15 16:09:50 2020
2020-05-15 16:09:50,885 - INFO - (24492->22428): Task finished = 6
2020-05-15 16:09:50,885 - INFO - (24492->22428): Executing task -> 7
Task 4 Fri May 15 16:09:51 2020
2020-05-15 16:09:51,884 - INFO - (24492->22876): Task finished = 4
2020-05-15 16:09:51,884 - INFO - (24492->22876): Executing task -> 8
2020-05-15 16:09:54,883 - INFO - (24492->6888): Task finished = 3
2020-05-15 16:09:54,883 - INFO - (24492->6888): Executing task -> 9
2020-05-15 16:09:54,885 - INFO - (24492->22876): Task finished = 8
Task 3 Fri May 15 16:09:54 2020
2020-05-15 16:09:54,886 - INFO - (24492->22428): Task finished = 7
Task 8 Fri May 15 16:09:54 2020
Task 7 Fri May 15 16:09:54 2020
2020-05-15 16:10:00,884 - INFO - (24492->6888): Task finished = 9
Task 9 Fri May 15 16:10:00 2020
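The code that produced the log above is not shown in these notes; presumably it is the imap example with imap_unordered swapped in. The sketch below puts the two side by side with fixed sleep times so the difference is easy to see. The function names are made up for this example.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def sleep_and_return(delay):
    """Hypothetical task: sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay


def input_order(delays, workers=3):
    """imap yields results in the order of the input."""
    with ThreadPool(processes=workers) as p:
        return list(p.imap(sleep_and_return, delays))


def completion_order(delays, workers=3):
    """imap_unordered yields each result as soon as it is ready."""
    with ThreadPool(processes=workers) as p:
        return list(p.imap_unordered(sleep_and_return, delays))


if __name__ == '__main__':
    delays = [0.9, 0.5, 0.1]
    print(input_order(delays))
    print(completion_order(delays))
```

With three workers and three tasks, the shortest sleep should come out first from completion_order, while input_order keeps the original sequence.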
Thread pool
Usage is almost identical to the process pool; just add dummy to the import.
from multiprocessing.dummy import Pool as ThreadPool
p = ThreadPool(processes=3)
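To see that these workers really are threads inside one process, a small sketch like the following can help. The fetch function is a stand-in for some I/O-bound call and is purely hypothetical.

```python
import os
import threading
import time
from multiprocessing.dummy import Pool as ThreadPool


def fetch(item):
    """Hypothetical I/O-bound task: report which process and thread ran it."""
    time.sleep(0.1)  # stands in for waiting on the network or disk
    return item, os.getpid(), threading.current_thread().name


def run(items, workers=3):
    """Map fetch over the items on a pool of threads."""
    with ThreadPool(processes=workers) as p:
        return p.map(fetch, items)


if __name__ == '__main__':
    for item, pid, thread_name in run(['a', 'b', 'c', 'd']):
        print(item, pid, thread_name)
```

Every row should show the same process id as the caller and one of at most three thread names.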
Using concurrent.futures
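This section has no notes yet. As a placeholder, here is a minimal sketch of the standard-library ThreadPoolExecutor, covering roughly the same ground as the pool examples above: Executor.map returns results in input order, and as_completed yields futures as they finish. The task and helper names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def task(delay):
    """Hypothetical task: sleep for `delay` seconds, then describe it."""
    time.sleep(delay)
    return f'slept {delay}'


def run_ordered(delays, workers=3):
    """Executor.map yields results in the order of the input."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(task, delays))


def run_as_completed(delays, workers=3):
    """submit() returns a Future; as_completed yields them as they finish."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(task, d) for d in delays]
        return [f.result() for f in as_completed(futures)]


if __name__ == '__main__':
    print(run_ordered([0.3, 0.2, 0.1]))
    print(run_as_completed([0.3, 0.2, 0.1]))
```

ProcessPoolExecutor offers the same interface for process-based workers.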
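Thread safety
Are Python's list, tuple, and dict thread-safe?
Because of the GIL, only one thread executes Python bytecode at any given moment, but that alone does not guarantee thread safety. Statements that are not atomic operations, such as n += 1 (a read, an add, and a write), can still be interleaved between threads. See [1] for details.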
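A common way to make a non-atomic update safe is to guard it with threading.Lock. The sketch below is only an illustration: the Counter class and run_threads helper are invented for this example. Whether the unguarded version actually loses updates on a given run depends on the interpreter version and on timing, so no output is claimed for it.

```python
import threading


class Counter:
    """Hypothetical shared counter with an unguarded and a guarded increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, add, write: another thread may run between these steps.
        self.value += 1

    def safe_increment(self):
        # The lock makes the read-modify-write sequence exclusive.
        with self._lock:
            self.value += 1


def run_threads(increment, n_threads=8, n_iter=10_000):
    """Call `increment` n_iter times from each of n_threads threads."""
    def worker():
        for _ in range(n_iter):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == '__main__':
    guarded = Counter()
    run_threads(guarded.safe_increment)
    print('with lock:', guarded.value)

    unguarded = Counter()
    run_threads(unguarded.unsafe_increment)
    print('without lock:', unguarded.value)  # may fall short of the expected total
```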