There is nothing wrong with your use of asyncio.
To prove it, let's try a simplified version of your script – no paramiko, just pure Python.
import asyncio, functools, sys, time

START_TIME = time.monotonic()

def log(msg):
    print('{:>7.3f} {}'.format(time.monotonic() - START_TIME, msg))

def dummy(thread_id):
    log('Thread {} started'.format(thread_id))
    time.sleep(1)
    log('Thread {} finished'.format(thread_id))

loop = asyncio.get_event_loop()
tasks = []
for i in range(0, int(sys.argv[1])):
    task = loop.run_in_executor(None, functools.partial(dummy, thread_id=i))
    tasks.append(task)
loop.run_until_complete(asyncio.gather(*tasks))
loop.close()
With two threads, this prints:
$ python3 async.py 2
0.001 Thread 0 started
0.002 Thread 1 started
1.003 Thread 0 finished
1.003 Thread 1 finished
This concurrency scales nicely up to 5 threads:
$ python3 async.py 5
0.001 Thread 0 started
...
0.003 Thread 4 started
1.002 Thread 0 finished
...
1.005 Thread 4 finished
If we add one more thread, we hit the thread pool limit:
$ python3 async.py 6
0.001 Thread 0 started
0.001 Thread 1 started
0.002 Thread 2 started
0.003 Thread 3 started
0.003 Thread 4 started
1.002 Thread 0 finished
1.003 Thread 5 started
1.003 Thread 1 finished
1.004 Thread 2 finished
1.004 Thread 3 finished
1.004 Thread 4 finished
2.005 Thread 5 finished
Everything works as expected: the total time grows by 1 second for every batch of 5 items. The magic number 5 is documented in the ThreadPoolExecutor docs:
Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
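If the default pool is too small for your workload, a minimal sketch (assuming Python 3.5+, with a hypothetical `dummy` sleep function standing in for the real work) is to pass an explicit ThreadPoolExecutor with a larger max_workers to run_in_executor instead of None:

```python
import asyncio, functools, time
from concurrent.futures import ThreadPoolExecutor

def dummy(thread_id):
    # Stand-in for a blocking I/O call (e.g. an SSH connection).
    time.sleep(0.2)
    return thread_id

loop = asyncio.new_event_loop()
# A pool sized for the workload instead of the CPU-based default of 5 * CPUs.
executor = ThreadPoolExecutor(max_workers=10)
start = time.monotonic()
tasks = [loop.run_in_executor(executor, functools.partial(dummy, thread_id=i))
         for i in range(10)]
results = loop.run_until_complete(asyncio.gather(*tasks))
elapsed = time.monotonic() - start
executor.shutdown()
loop.close()
print(results, round(elapsed, 1))  # all 10 sleeps overlap, so roughly 0.2s total
```

With 10 workers all 10 tasks start immediately, so the wall-clock time stays near a single sleep instead of growing in 1-second steps.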
How can a third-party library block my ThreadPoolExecutor?
> The library uses some kind of global lock. It means that the library does not support multithreading. Try using ProcessPoolExecutor, but with caution: the library may contain other anti-patterns, such as using the same hardcoded temporary file name.
> The function executes for a long time and doesn't release the GIL. It may indicate a bug in C extension code, but the most common reason for holding the GIL is doing CPU-intensive computations. Again, you can try ProcessPoolExecutor, since it isn't affected by the GIL.
None of this is expected to happen with a library like paramiko.
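A sketch of the ProcessPoolExecutor swap for the CPU-bound case (the `busy` function here is hypothetical, not from the original script):

```python
import asyncio, functools
from concurrent.futures import ProcessPoolExecutor

def busy(n):
    # CPU-bound loop that would hold the GIL if run in a thread.
    total = 0
    for i in range(n):
        total += i
    return total

def main():
    loop = asyncio.new_event_loop()
    # Each task runs in its own process, so one worker's GIL
    # cannot stall the others.
    executor = ProcessPoolExecutor(max_workers=2)
    tasks = [loop.run_in_executor(executor, functools.partial(busy, 100_000))
             for _ in range(4)]
    results = loop.run_until_complete(asyncio.gather(*tasks))
    executor.shutdown()
    loop.close()
    return results

if __name__ == '__main__':
    print(main())
```

Note that functions submitted to a ProcessPoolExecutor must be picklable (defined at module level), which is also why the pool setup lives in a `main()` guarded by `if __name__ == '__main__'`.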
How can a third-party library block my ProcessPoolExecutor?
It generally can't. Your tasks are executed in separate processes. If you see that two tasks in a ProcessPoolExecutor take twice as long as one, suspect a resource bottleneck (for example, consuming 100% of the network bandwidth).