Python multithreading and multiprocessing (GIL, Queue, threading, multiprocessing, locks, semaphores)

Author: 老糊涂Lion

Last modified: 2023-02-22 10:02:32

Category: python学习笔记 (Python study notes). Tags: multithreading, multiprocessing, thread, queue

First published: 2020-07-10 09:20:50

Article link: https://blog.csdn.net/jiatong151/article/details/107242886

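I. Python's GIL (Global Interpreter Lock) in CPython

Overview: a Python thread corresponds to one native (C-level) thread. The GIL ensures that at any moment only one thread executes bytecode, so a single Python process cannot spread bytecode execution across multiple CPUs.
Releasing the GIL: older CPython versions released the GIL after a fixed number of bytecode instructions; CPython 3.2 and later release it after a time slice (the switch interval, 5 ms by default). The GIL is also released voluntarily around blocking IO operations.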
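
As a small sketch of the time-slice rule above: on CPython 3.2 and later, the current switch interval can be read with `sys.getswitchinterval()`. The variable name `interval` is only for illustration, and the value printed depends on whether the interval has been changed with `sys.setswitchinterval()`.

```python
import sys

# Interval (in seconds) after which CPython asks the running thread
# to give up the GIL so that another thread gets a chance to run.
interval = sys.getswitchinterval()
print("switch interval: {} s".format(interval))
```

A shorter interval makes threads take turns more often at the cost of more switching overhead; it does not let two threads execute bytecode at the same time.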

Let's look at an example:

import threading

number = 0

def add():
    global number
    for i in range(100000):
        number += 1  # read-modify-write, not a single atomic step

def desc():
    global number
    for i in range(100000):
        number -= 1

func1 = threading.Thread(target=add)
func2 = threading.Thread(target=desc)

func1.start()
func2.start()

func1.join()
func2.join()

print(number)

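The result can differ from run to run (how often this shows up depends on the CPython version). In other words, Python does not run one function to completion before starting the other: threads are switched in the middle of the loops, and `number += 1` is not an atomic operation.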
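
To see why the update is not atomic, the sketch below disassembles a one-line increment with the standard `dis` module. The function name `add_one` is made up for this illustration, and the exact opcode names between the load and the store vary across CPython versions.

```python
import dis

number = 0

def add_one():
    global number
    number += 1

# Collect the opcode names that make up the function body.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)
```

The listing contains a separate load of the global, an add, and a store back to the global. If a thread switch falls between the load and the store, the other thread's update is overwritten.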

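II. Multithreaded programming with threading

Overview: for IO-bound work, multithreading and multiprocessing perform about the same.

1. Multithreading by instantiating the Thread class

thread1.daemon = True  # make thread1 a daemon thread: when the main thread exits, daemon threads are terminated (setDaemon() is deprecated since Python 3.10)
thread1.join()  # block the calling thread until thread1 has finished

Here is an example: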

# One way to implement multithreading; the example below simulates a crawler

import threading
import time

def get_url(url):
    """Open the URL."""
    print('start get url')
    time.sleep(2)
    print('end get url')


def get_detail(name):
    """Fetch the page data."""
    print('start get detail')
    time.sleep(4)
    print('end get detail')

if __name__ == '__main__':
    thread1 = threading.Thread(target=get_url, args=('',))  # args holds the function's arguments as a tuple
    thread2 = threading.Thread(target=get_detail, args=('',))  # a one-element tuple needs the trailing comma

    thread1.daemon = True  # thread1 is now a daemon thread: it is killed when the main thread exits, finished or not
    thread2.daemon = True
    start_time = time.time()
    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()

    print('last time: {}'.format(time.time() - start_time))
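
The daemon flag and join() described above can also be combined with a timeout. The following is a minimal sketch, not part of the original example: the `worker` function, the `results` list and the 0.1 s sleep are placeholders chosen for illustration.

```python
import threading
import time

def worker(results):
    time.sleep(0.1)            # stand-in for real work
    results.append("done")

results = []
t = threading.Thread(target=worker, args=(results,), daemon=True)
t.start()
t.join(timeout=1.0)            # wait at most one second for the thread
finished = not t.is_alive()    # join() returns None, so check is_alive()
print(finished, results)
```

Because join() with a timeout returns whether or not the thread has ended, is_alive() is the way to tell the two cases apart.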

2. Multithreading by subclassing Thread

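import threading
import time

class GetDetailHtml(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)  # pass the thread name on to Thread.__init__
    def run(self):  # run() is what the new thread executes after start()
        print("start")
        time.sleep(2)
        print("end")

if __name__ == "__main__":
    thread1 = GetDetailHtml("test1")
    thread1.start()
    thread1.join()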

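III. Communication between threads: shared variables and Queue

1. Shared variables (declared with the global keyword), used much like state shared between function calls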
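
As a small sketch of communication through a shared variable: one thread fills a module-level list and another drains it. The names `detail_url_list`, `produce_urls` and `consume_urls` are hypothetical, and the URLs are only sample data.

```python
import threading

detail_url_list = []   # shared by both threads (hypothetical name)

def produce_urls():
    for i in range(5):
        detail_url_list.append("http://projectdu.com/{id}".format(id=i))

def consume_urls(found):
    # Check-then-pop is only safe here because a single consumer runs.
    while detail_url_list:
        found.append(detail_url_list.pop())

producer = threading.Thread(target=produce_urls)
producer.start()
producer.join()

found = []
consumer = threading.Thread(target=consume_urls, args=(found,))
consumer.start()
consumer.join()
print(len(found))  # → 5
```

With several consumers, the check `while detail_url_list` and the following pop() could interleave and raise IndexError, which is one reason a thread-safe queue is usually the safer choice.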

2. Queue: a safer way for threads to communicate

import threading
import time
from queue import Queue

def get_detail_html(queue):  # crawl article detail pages
    while True:
        url = queue.get()  # take a URL from the queue; blocks while the queue is empty
        print("url start")
        time.sleep(2)
        print("url end")
        queue.task_done()  # mark this item as processed; pairs with queue.join()

def get_detail_url(queue):  # crawl the article list page
    print("html start")
    for i in range(20):
        queue.put("http://projectdu.com/{id}".format(id=i))  # add the crawled URL to the queue
    time.sleep(4)
    print("html end")

if __name__ == "__main__":
    detail_url_queue = Queue(maxsize=1000)  # maxsize caps the queue length, which bounds memory use
    thread_detail_url = threading.Thread(target=get_detail_url, args=(detail_url_queue,))
    thread_detail_url.start()
    for i in range(10):
        html_thread = threading.Thread(target=get_detail_html, args=(detail_url_queue,), daemon=True)
        html_thread.start()
    thread_detail_url.join()  # wait until every URL has been queued
    detail_url_queue.join()  # block until task_done() has been called for every queued item
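
A common way to stop queue consumers cleanly is to send each one a sentinel value. The sketch below assumes `None` as the sentinel and uses `url.upper()` as a stand-in for fetching a page; the names `STOP`, `consumer` and `run` are hypothetical.

```python
import threading
from queue import Queue

STOP = None  # sentinel value (an assumption of this sketch)

def consumer(queue, results):
    while True:
        url = queue.get()
        if url is STOP:
            queue.task_done()
            break                     # leave the loop so the thread can end
        results.append(url.upper())   # stand-in for fetching the page
        queue.task_done()

def run(num_workers=3, num_urls=10):
    queue = Queue()
    results = []
    workers = [threading.Thread(target=consumer, args=(queue, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for i in range(num_urls):
        queue.put("http://projectdu.com/{id}".format(id=i))
    for _ in workers:
        queue.put(STOP)               # one sentinel per worker
    queue.join()                      # every item and sentinel has been processed
    for w in workers:
        w.join()
    return results

results = run()
print(len(results))  # → 10
```

Each worker consumes exactly one sentinel and exits, so no daemon threads are needed and the program ends on its own.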

IV. Thread synchronization: Lock and RLock

1. Lock

(1) Locking increases running time and costs some performance.
(2) Locks can cause deadlock.
import threading
from threading import Lock

total = 0
lock = Lock()

def add():
    global total
    for i in range(10000):
        lock.acquire()  # acquire the lock first
        total += 1  # races with the decrement in desc, so lock to keep the value consistent
        lock.release()  # always release: otherwise other threads can never acquire the lock and will stall

def desc():
    global total
    for i in range(10000):
        lock.acquire()  # acquire the lock first
        total -= 1
        lock.release()  # always release the lock

thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(total)
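
The acquire/release pair can also be written with a `with` statement, which releases the lock even if the body raises. This is a minimal sketch of the same counter; the helper name `change` and its parameters are made up for illustration.

```python
import threading

total = 0
lock = threading.Lock()

def change(delta, times):
    global total
    for _ in range(times):
        with lock:            # acquired on entry, released on exit
            total += delta

t1 = threading.Thread(target=change, args=(1, 10000))
t2 = threading.Thread(target=change, args=(-1, 10000))
t1.start()
t2.start()
t1.join()
t2.join()
print(total)  # → 0
```

Using `with` removes the most common cause of a stuck program here: a code path that forgets to call release().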

2. Using RLock

Within a single thread, acquire can be called several times, but release must then be called the same number of times.

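from threading import RLock

total = 0
lock = RLock()

def add():
    global total
    for i in range(10000):
        lock.acquire()  # acquire the lock first
        lock.acquire()  # the same thread may acquire an RLock again without blocking
        total += 1  # races with the decrement in desc, so lock to keep the value consistent
        lock.release()
        lock.release()  # one release for each acquire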
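
Re-entrant locking matters most when one locked function calls another that takes the same lock. The sketch below illustrates that case; `add_one` and `add_many` are hypothetical helper names.

```python
import threading

lock = threading.RLock()
total = 0

def add_one():
    global total
    with lock:              # second acquire by the same thread
        total += 1

def add_many(n):
    with lock:              # first acquire
        for _ in range(n):
            add_one()       # would block forever with a plain Lock

t = threading.Thread(target=add_many, args=(100,))
t.start()
t.join()
print(total)  # → 100
```

Replacing `RLock()` with `Lock()` in this sketch makes the thread deadlock on the first call to add_one(), because a plain Lock cannot be acquired twice by its holder.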

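3. Using Condition (a condition variable, for more complex communication and synchronization between threads)

(1) The start order of xiaoai.start() and tianmao.start() matters: start the thread that waits first, not the one that notifies first. A condition.wait() call can only be woken by a condition.notify() issued after the wait has begun, so if the notifying thread runs first, its notification is lost and the waiting thread waits forever.
(2) wait and notify may only be called inside a with condition block, that is, while holding the condition's lock.
(3) A Condition uses two layers of locks. The underlying lock is released when a thread calls wait. On top of that, every call to wait allocates a new lock and puts it in the condition's waiting queue, where it stays until notify wakes it.
(4) For a detailed analysis, see: https://www.cnblogs.com/yoyoketang/p/8337118.html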

import threading

class Xiaoai(threading.Thread):
    def __init__(self, condition):
        super().__init__(name="Xiao Ai")
        self.condition = condition
    def run(self):
        with self.condition:
            self.condition.wait()  # wait for Tmall Genie's notification before speaking
            print("{}: Here".format(self.name))
            self.condition.notify()  # Xiao Ai has answered; notify Tmall Genie

            self.condition.wait()
            print("{}: Sure".format(self.name))
            self.condition.notify()

class Tianmao(threading.Thread):
    def __init__(self, condition):
        super().__init__(name="Tmall Genie")
        self.condition = condition
    def run(self):
        with self.condition:
            print("{}: Xiao Ai".format(self.name))
            self.condition.notify()  # Tmall Genie has spoken; notify Xiao Ai

            self.condition.wait()  # wait for Xiao Ai's notification before replying
            print("{}: Let's trade poems".format(self.name))
            self.condition.notify()
            self.condition.wait()

if __name__ == "__main__":
    condition = threading.Condition()
    xiaoai = Xiaoai(condition)
    tianmao = Tianmao(condition)
    xiaoai.start()  # start the waiting thread first
    tianmao.start()

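How the dialogue proceeds:

The dialogue continues in the same pattern: when Tmall Genie finishes speaking it notifies Xiao Ai, who has been waiting; on receiving the notification Xiao Ai speaks and then notifies Tmall Genie, who is now waiting and, once notified, speaks again, and so on.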
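
One way to remove the dependence on start order is to wait on an explicit piece of shared state with `Condition.wait_for()`. The sketch below is a variation on the dialogue, not the original design: the `turn` variable, the `speak` function and the `transcript` list are assumptions introduced for illustration.

```python
import threading

condition = threading.Condition()
turn = "tianmao"        # whose turn it is to speak (hypothetical shared state)
transcript = []

def speak(me, other, lines):
    global turn
    for line in lines:
        with condition:
            # Returns immediately if it is already our turn, so a
            # notification sent before we started waiting is not lost.
            condition.wait_for(lambda: turn == me)
            transcript.append("{}: {}".format(me, line))
            turn = other
            condition.notify_all()

xiaoai = threading.Thread(target=speak, args=("xiaoai", "tianmao", ["Here", "Sure"]))
tianmao = threading.Thread(target=speak, args=("tianmao", "xiaoai", ["Xiao Ai", "Let's trade poems"]))
tianmao.start()         # either thread may be started first
xiaoai.start()
xiaoai.join()
tianmao.join()
print(transcript)
```

Because each thread checks `turn` rather than relying on catching a notification at the right moment, the lines always alternate, starting with the thread named in the initial value of `turn`.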

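4. Semaphore

Overview: a lock that limits how many threads may enter at once.
Example: with reads and writes, only one writer is allowed at a time, while several readers may read concurrently.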

import threading
import time

class HtmlSpider(threading.Thread):
    def __init__(self, url, sem):
        super().__init__()
        self.url = url
        self.sem = sem
    def run(self):
        time.sleep(2)
        print("got html text success")
        self.sem.release()  # give the slot back (counter + 1)

class UrlProducer(threading.Thread):
    def __init__(self, sem):
        super().__init__()
        self.sem = sem

    def run(self):
        for i in range(20):
            self.sem.acquire()  # take a slot (counter - 1), blocking at 0; matched by the release in HtmlSpider.run
            html_thread = HtmlSpider("http://baidu.com/{}".format(i), self.sem)
            html_thread.start()

if __name__ == "__main__":
    sem = threading.Semaphore(3)  # allow at most three spider threads at a time
    url_producer = UrlProducer(sem)
    url_producer.start()
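
To check that a semaphore really caps concurrency, the sketch below records the highest number of threads inside the guarded section at once. The counters `running` and `peak`, the `fetch` function and the 0.05 s sleep are assumptions made for this illustration.

```python
import threading
import time

sem = threading.Semaphore(3)
state_lock = threading.Lock()
running = 0
peak = 0

def fetch(url):
    global running, peak
    with sem:                         # at most 3 threads get past this line
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)              # stand-in for the download
        with state_lock:
            running -= 1

threads = [threading.Thread(target=fetch, args=("http://baidu.com/{}".format(i),))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)
```

Here each worker acquires and releases the semaphore itself through `with sem`, which keeps the acquire and the release in one place instead of splitting them across two classes.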

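5. Thread pool: ThreadPoolExecutor

(1) Key imports

from concurrent.futures import ThreadPoolExecutor, as_completed, wait

(2) The main thread can query the state of a thread or task and get its return value, and it knows as soon as a task has finished. futures also gives multithreaded and multiprocess code the same interface.
(3) Detailed explanation: https://www.jianshu.com/p/b9b3d66aa0be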

import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)

# (1) submit hands a function to the pool and returns a Future immediately, without blocking
task1 = executor.submit(get_html, 3)  # first argument: the function; the rest: its arguments
task2 = executor.submit(get_html, 2)
print(task1.done())  # done reports whether the task has finished
print(task1.result())  # result blocks until the task finishes and returns its value

# (2) Get the results of tasks as they complete
urls = [3, 2, 4]
all_task = [executor.submit(get_html, url) for url in urls]
wait(all_task)  # wait blocks until the return_when condition is met (all tasks completed by default)
print("all_task finished, starting other work")

for future in as_completed(all_task):  # as_completed yields each task as it finishes
    data = future.result()
    print("get {} page".format(data))

# (3) Get results through the executor's map
for data in executor.map(get_html, urls):  # map runs get_html on every item in urls and yields results in input order
    print("get {} page".format(data))
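
The return_when argument of wait mentioned above can be sketched as follows, using `FIRST_COMPLETED` so that the call returns as soon as any one task is done. The sleep durations are arbitrary small values chosen to keep the example fast.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def get_html(times):
    time.sleep(times)
    return times

with ThreadPoolExecutor(max_workers=3) as executor:
    all_task = [executor.submit(get_html, t) for t in (0.3, 0.05, 0.2)]
    # Returns two sets of futures: those finished so far and the rest.
    done, not_done = wait(all_task, return_when=FIRST_COMPLETED)
    first_results = sorted(f.result() for f in done)
    print(first_results)
# Leaving the with block waits for the remaining tasks to finish.
```

The other documented values are `FIRST_EXCEPTION` and `ALL_COMPLETED`, the latter being the default.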

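V. Multithreading versus multiprocessing

1. Use multiprocessing for CPU-bound work (computation). For IO-bound work, prefer multithreading, because switching between processes costs more than switching between threads.

2. For CPU-bound work, multiprocessing beats multithreading. (The code below checks this.)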

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    if n <= 2:
        return 1
    return fib(n-1) + fib(n-2)

"""
Note: the process-pool code must run under if __name__ == "__main__":. Without the
guard it raises an error on Windows, where child processes re-import the main
module; on Linux with the fork start method it does not.
"""
if __name__ == "__main__":
    # Multithreaded run
    with ThreadPoolExecutor(3) as executor:
        all_task = [executor.submit(fib, num) for num in range(25, 35)]
        start_time = time.time()
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {} ".format(data))
        print("last time is : {}".format(time.time()-start_time))

    # Multiprocess run
    with ProcessPoolExecutor(3) as executor:
        all_task = [executor.submit(fib, num) for num in range(25, 35)]
        start_time = time.time()
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {} ".format(data))
        print("last time is : {}".format(time.time()-start_time))

3. For IO-bound work, multithreading beats multiprocessing (sleep simulates IO)

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def random_sleep(n):
    time.sleep(n)
    return n

if __name__ == "__main__":
    with ThreadPoolExecutor(3) as executor:
        all_task = [executor.submit(random_sleep, num) for num in [2]*30]
        start_time = time.time()
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {} ".format(data))
        print("last time is : {}".format(time.time()-start_time))

VI. Multiprocess programming with multiprocessing

1. Multiprocess programming (similar to multithreading)

import time
import multiprocessing

def get_html(n):
    time.sleep(n)
    print("process run success")
    return n

if __name__ == "__main__":
    process = multiprocessing.Process(target=get_html, args=(2,))
    process.start()
    process.join()  # wait for the child process to finish
    print("process run end")

2. Using a process pool

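pool = multiprocessing.Pool(multiprocessing.cpu_count())  # cpu_count returns the number of CPUs, used here as the number of worker processes
result = pool.apply_async(get_html, args=(3,))  # submit asynchronously; returns an AsyncResult immediately
print(result.get())  # block until get_html finishes, then print its return value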

~~To be continued...~~
