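As is well known, because of the GIL (global interpreter lock), only one thread in a Python process is doing work at any given moment, even when multiple threads are used. Python therefore keeps switching among all the threads: each time it switches to a thread, it runs a stretch of bytecode instructions and then moves on to another thread. If many threads have been started but only a few of them have real work to do, and the idle ones are not put to sleep, every switch to an idle thread just spins and wastes resources, which drags down overall efficiency. For example, the sample code below starts 20 threads and randomly distributes 100 tasks among them, each task computing the factorial of 10000.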
import time
import random
import threading
from queue import Queue

random.seed(1234)
count = 0  # number of finished tasks, shared by all workers
lock = threading.Lock()  # protects count


def task(v: int):
    # CPU-bound work: compute v! iteratively (the result is discarded)
    res = 1
    for i in range(1, v + 1):
        res = res * i


def worker(input_queue: Queue):
    global count
    while True:
        if input_queue.empty():
            # Busy-wait: an idle worker spins here and keeps competing for the GIL
            continue
        v = input_queue.get()
        task(v)
        with lock:
            count += 1


if __name__ == '__main__':
    num_workers = 20
    num_tasks = 100
    # One queue per worker thread
    queues = [Queue() for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(queues[i],)) for i in range(num_workers)]
    for thread in threads:
        thread.daemon = True
    for thread in threads:
        thread.start()
    time.sleep(1)

    t0 = time.perf_counter()
    # Hand each task to a randomly chosen worker
    for _ in range(num_tasks):
        idx = random.randint(0, num_workers - 1)
        queues[idx].put(10000)
    t1 = time.perf_counter()
    print(f"put time: {t1 - t0:.5f}")

    # Wait until every task has been counted as done
    while count != num_tasks:
        continue
    t2 = time.perf_counter()
    print(f"total time: {t2 - t0:.5f}")
The terminal output is as follows:
put time: 24.91427
total time: 26.17514
If the continue statement in worker is replaced with time.sleep(0.02), running the program again produces the following output:
put time: 0.00038
total time: 1.03202
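As the numbers show, using time.sleep to let threads that currently have no work sleep for a while gives more running time to the threads that actually need to work, which improves overall efficiency. A sleeping thread releases the GIL and does not compete for it again until it wakes up.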
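The modified worker described above can be sketched as follows. This is a minimal illustration rather than the exact script that was timed: the STOP sentinel, the IDLE_SLEEP constant, the done list, and the run_demo helper are hypothetical additions, introduced only so that the threads can be shut down cleanly and the function can return a result.

```python
import random
import threading
import time
from queue import Queue

IDLE_SLEEP = 0.02  # seconds; the sleep value used in the text
STOP = None        # hypothetical shutdown sentinel, not in the original code


def task(v: int) -> int:
    """Compute v! iteratively, the same CPU-bound work as in the benchmark."""
    res = 1
    for i in range(1, v + 1):
        res *= i
    return res


def worker(input_queue: Queue, done: list, lock: threading.Lock) -> None:
    while True:
        if input_queue.empty():
            # Sleep instead of spinning: the idle thread gives up the GIL
            # for IDLE_SLEEP seconds so that busy threads can run.
            time.sleep(IDLE_SLEEP)
            continue
        v = input_queue.get()
        if v is STOP:
            break
        task(v)
        with lock:
            done.append(v)  # record one finished task


def run_demo(num_workers: int = 20, num_tasks: int = 100, n: int = 10000) -> int:
    """Distribute num_tasks factorial tasks at random; return how many finished."""
    rng = random.Random(1234)
    lock = threading.Lock()
    done: list = []
    queues = [Queue() for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(q, done, lock)) for q in queues]
    for t in threads:
        t.start()
    for _ in range(num_tasks):
        queues[rng.randrange(num_workers)].put(n)
    # Each queue is FIFO, so a worker sees STOP only after all of its tasks.
    for q in queues:
        q.put(STOP)
    for t in threads:
        t.join()
    return len(done)
```

Calling run_demo() with its defaults mirrors the 20-thread, 100-task setup from the text; wrapping the call in time.perf_counter() readings gives a total time to compare with the busy-wait version. Absolute timings will depend on the machine and the Python version.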