Python queues and thread pools: multithreading and thread-pool programming in Python

Latest recommended article published on 2021-03-05 20:16:42

博他一年

Tags: Python queue, thread pool

Article link: https://blog.csdn.net/weixin_35544512/article/details/112406493

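This article introduces threads and thread pools in Python: thread states, the GIL (Global Interpreter Lock), creating and managing threads, and locking mechanisms such as Lock and RLock. It also covers how to use Queue, and the ThreadPoolExecutor and ProcessPoolExecutor classes in the concurrent.futures module, which simplify the management of thread pools and process pools and support asynchronous operation.

Summary generated automatically by CSDN.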

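1. Introduction to multithreading

A thread, also called a lightweight process, is the smallest unit of execution that the operating system can schedule. It is contained in a process and is the unit that actually does the work inside that process.
A thread owns no system resources of its own apart from the few that are essential for running (such as a program counter, a set of registers, and a stack), but it shares all the resources owned by its process with the other threads of the same process.
A thread can create and terminate another thread, and multiple threads in the same process can execute concurrently.
A thread has three basic states: ready, running, and blocked.

(1) Ready: the thread has everything it needs to run and is logically runnable, but is waiting for a processor.
(2) Running: the thread holds a processor and is executing.
(3) Blocked: the thread is waiting for an event (such as a semaphore) and logically cannot execute.

Conclusion: in Python, multiprocessing has the advantage for CPU-bound tasks, while multithreading has the advantage for I/O-bound tasks.

2. The GIL in Python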

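GIL stands for Global Interpreter Lock. It goes back to a decision made in Python's early design, taken to keep the interpreter's data safe.
Each CPU can execute only one thread at a time. On a single-core CPU, multithreading is therefore only concurrency, not parallelism. At a high level both terms describe handling several requests "at once", but they differ: parallelism means two or more events happen at the same instant, whereas concurrency means two or more events happen within the same time interval.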
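As a rough illustration of concurrency for work that mostly waits, the sketch below runs three simulated I/O calls first one after another and then in three threads. `fake_io` and the 0.2 s delays are hypothetical stand-ins for real I/O; `time.sleep` is used because it releases the GIL while it waits.

```python
import threading
import time


def fake_io(delay):
    # time.sleep stands in for a blocking I/O call (assumption for illustration)
    time.sleep(delay)


def run_sequential(delays):
    # Perform the waits one after another in the calling thread
    start = time.perf_counter()
    for delay in delays:
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(delays):
    # Perform each wait in its own thread so the waits overlap
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


delays = [0.2, 0.2, 0.2]
sequential_time = run_sequential(delays)
threaded_time = run_threaded(delays)
print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential_time, threaded_time))
```

The threaded run should take roughly as long as the longest single wait, because the waits overlap in time even though only one thread runs Python bytecode at any instant.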
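Under Python multithreading, each thread runs as follows:

(1) Acquire the GIL (Global Interpreter Lock).
(2) Execute code until the thread sleeps or the Python virtual machine suspends it.
(3) Release the GIL.

Notes:

(1) A Python thread corresponds to one thread in C. The GIL ensures that only one thread executes bytecode on one CPU at any moment, so threads cannot be mapped onto multiple CPUs to run in parallel.
(2) The GIL is released periodically, based on the number of bytecode instructions executed or on a time slice (since Python 3.2 the switch is time-based), and a thread releases it voluntarily when it performs an I/O operation.

The GIL does not make a compound operation such as total += 1 atomic, so the following program is not guaranteed to print 0:

    import threading

    total = 0


    def add():
        global total
        for i in range(1000000):
            total += 1


    def desc():
        global total
        for i in range(1000000):
            total -= 1


    thread1 = threading.Thread(target=add)
    thread2 = threading.Thread(target=desc)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    print(total)  # may be non-zero: the two threads race on total

3. Multithreading in Python

(1) The threading module

Method 1: create an instance of the threading.Thread class and call its start() method.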

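- threading.current_thread() (older alias: threading.currentThread()): returns the current Thread object.
- threading.enumerate(): returns a list of the threads that are currently running. "Running" means after the thread has started and before it has finished; threads that have not started or have already terminated are excluded.
- threading.active_count() (older alias: threading.activeCount()): returns the number of running threads, the same value as len(threading.enumerate()).

There are three ways to pass arguments to a thread:
- As a tuple: threading.Thread(target=func, args=(arg1, arg2, ...))
- As a dictionary: threading.Thread(target=func, kwargs={"name1": arg1, "name2": arg2, ...})
- As a mix of both: threading.Thread(target=func, args=(arg1, arg2, ...), kwargs={"name1": arg1, "name2": arg2, ...})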
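The three ways of passing arguments can be sketched as follows. The `greet` function, the thread names, and the argument values are hypothetical, chosen only for illustration. Note that a single positional argument still has to be written as a one-element tuple, for example `args=(value,)`.

```python
import threading

results = []


def greet(greeting, name, punctuation="!"):
    # Record which thread ran the call and what it produced
    thread_name = threading.current_thread().name
    results.append((thread_name, "{}, {}{}".format(greeting, name, punctuation)))


threads = [
    # Positional arguments as a tuple
    threading.Thread(target=greet, name="t-args", args=("Hello", "Alice")),
    # Keyword arguments as a dictionary
    threading.Thread(target=greet, name="t-kwargs",
                     kwargs={"greeting": "Hi", "name": "Bob", "punctuation": "?"}),
    # Both at once
    threading.Thread(target=greet, name="t-mixed",
                     args=("Hey", "Carol"), kwargs={"punctuation": "."}),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for thread_name, message in sorted(results):
    print(thread_name, "->", message)

# Number of Thread objects currently alive (at least the main thread)
print("active threads:", threading.active_count())
```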

Method 2: subclass Thread and override the run() and __init__() methods in the subclass.

The Thread class provides the following methods:

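- run(): the method that represents the thread's activity.
- start(): starts the thread's activity.
- join([timeout]): waits until the thread terminates. It blocks the calling thread until the thread whose join() was called terminates (either normally or through an unhandled exception) or until the optional timeout expires.
- is_alive() (called isAlive() in older versions; that spelling was removed in Python 3.9): returns whether the thread is alive.
- getName(): returns the thread's name (newer code reads the name attribute).
- setName(): sets the thread's name (newer code assigns to the name attribute).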
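A minimal sketch of method 2, assuming a hypothetical `Countdown` class: the subclass overrides `__init__()` to store its parameters and `run()` to do the work, and the caller uses `start()`, `join()` with and without a timeout, `is_alive()`, and the `name` attribute.

```python
import threading
import time


class Countdown(threading.Thread):
    def __init__(self, steps, delay):
        # Call the base-class initializer before using any thread state
        super().__init__(name="countdown-{}".format(steps))
        self.steps = steps
        self.delay = delay
        self.finished_steps = 0

    def run(self):
        # run() holds the thread's work; start() arranges for it to be called
        for _ in range(self.steps):
            time.sleep(self.delay)
            self.finished_steps += 1


worker = Countdown(steps=3, delay=0.1)
alive_before_start = worker.is_alive()
worker.start()
worker.join(timeout=0.05)   # gives up waiting after 0.05 s; the thread keeps running
alive_after_timeout = worker.is_alive()
worker.join()               # wait until run() has returned
alive_after_join = worker.is_alive()
print(worker.name, alive_before_start, alive_after_timeout, alive_after_join)
```

Because `join(timeout=...)` always returns None, checking `is_alive()` afterwards is the way to tell whether the wait timed out.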

(2) Thread synchronization with Lock (mutex) and RLock (reentrant lock)

When several threads modify the same data, the result can be unpredictable. In that case a mutex is needed to synchronize the threads and solve the thread-safety problem.

    # Create the lock
    lock = threading.Lock()
    # Acquire the lock (used for thread synchronization)
    lock.acquire()

    import threading
    from threading import Lock

    total = 0
    lock = Lock()


    def add():
        global total
        for i in range(1000000):
            lock.acquire()
            total += 1
            lock.release()


    def desc():
        global total
        for i in range(1000000):
            lock.acquire()
            total -= 1
            lock.release()


    thread1 = threading.Thread(target=add)
    thread2 = threading.Thread(target=desc)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    print(total)

Reentrant lock: RLock

    # RLock is a reentrant lock: the same thread may call acquire() several
    # times in a row, but the number of acquire() calls must equal the
    # number of release() calls.
    import threading
    from threading import RLock

    total = 0
    lock = RLock()


    def add():
        global total
        for i in range(1000000):
            lock.acquire()
            total += 1
            lock.release()


    def desc():
        global total
        for i in range(1000000):
            lock.acquire()
            total -= 1
            lock.release()


    thread1 = threading.Thread(target=add)
    thread2 = threading.Thread(target=desc)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    print(total)

The differences between Lock and RLock are as follows:

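- threading.Lock: a basic lock object that can be locked only once at a time. Any further lock request has to wait until the lock is released.
- threading.RLock: a reentrant lock. The same thread can lock it several times and release it several times. With an RLock, acquire() and release() must be paired: after n calls to acquire(), n calls to release() are needed before the lock is actually released.
- In other words, RLock is reentrant: a thread can lock an RLock that it already holds. The RLock object keeps a counter to track nested acquire() calls, and every acquire() must be followed by an explicit release(). As a result, a method protected by a lock can call another method protected by the same lock.
- Lock is a tool for controlling access to a shared resource by multiple threads. A lock normally gives exclusive access to the shared resource: only one thread at a time can hold the Lock object, and a thread should acquire it before touching the shared resource. When it has finished with the shared resource, the program releases the Lock.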
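The reentrancy described above can be sketched like this. The `Account` class and its methods are hypothetical; the point is that `transfer_in()` holds the lock while calling `deposit()`, which acquires the same lock again. The last few lines contrast a non-blocking second `acquire()` on a plain Lock with the same call on an RLock.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.RLock()

    def deposit(self, amount):
        # Protected by the lock
        with self._lock:
            self.balance += amount

    def transfer_in(self, amounts):
        # Also protected by the same lock, and it calls deposit() while
        # holding it. With a plain Lock this nested acquire would deadlock.
        with self._lock:
            for amount in amounts:
                self.deposit(amount)


account = Account(0)
threads = [threading.Thread(target=account.transfer_in, args=([1] * 1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.balance)

# A plain Lock cannot be taken twice, even by the thread that holds it
plain = threading.Lock()
plain.acquire()
second_attempt = plain.acquire(blocking=False)   # fails instead of blocking
plain.release()

# An RLock can, as long as every acquire() is matched by a release()
reentrant = threading.RLock()
reentrant.acquire()
nested_attempt = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()
print(second_attempt, nested_attempt)
```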

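(3) Thread synchronization with Semaphore

A mutex lets only one thread at a time access the shared data, whereas a semaphore lets a fixed number of threads access it at the same time.

    semaphore = threading.BoundedSemaphore()

    # Thread synchronization: using Semaphore
    # A Semaphore is a lock that limits how many threads may enter.
    # Example: file access, where usually a single thread writes while
    # several threads are allowed to read.
    # Example below: a crawler.
    import threading
    import time


    class HtmlSpider(threading.Thread):
        def __init__(self, url, sem):
            super().__init__()
            self.url = url
            self.sem = sem

        def run(self):
            time.sleep(2)
            print("got html text success")
            # Give the slot back so the producer can start another spider
            self.sem.release()


    class UrlProducer(threading.Thread):
        def __init__(self, sem):
            super().__init__()
            self.sem = sem

        def run(self):
            for i in range(20):
                # Blocks while two spiders are already running
                self.sem.acquire()
                html_thread = HtmlSpider("https://baidu.com/{}".format(i), self.sem)
                html_thread.start()


    if __name__ == "__main__":
        sem = threading.Semaphore(2)
        url_producer = UrlProducer(sem)
        url_producer.start()

(4) Thread synchronization with Condition

A Condition object lets thread A stop and wait for another thread B; once B has satisfied some condition, it notifies A to continue. A thread first acquires the condition variable's lock. If the condition is not met, the thread waits (wait) and releases the lock. If the condition is met, the thread carries on, and when it is done it can notify (notify) other threads that are in the waiting state. A waiting thread that receives the notification re-checks the condition to decide whether to continue.

    import threading

    cond = threading.Condition()


    # Use a Condition to let two threads recite a poem in turns
    class XiaoAi(threading.Thread):
        LINES = [
            "I'm here",
            "Sure",
            "You live at its end",
            "Yet we drink the same Yangtze water",
            "When will this sorrow end",
            "Then I will not betray this longing",
        ]

        def __init__(self, cond):
            super().__init__(name="XiaoAi")
            self.cond = cond

        def run(self):
            with self.cond:
                for line in self.LINES:
                    # Wait for the other speaker, reply, then hand the turn back
                    self.cond.wait()
                    print("{} : {}".format(self.name, line))
                    self.cond.notify()


    class TianMao(threading.Thread):
        LINES = [
            "Hey XiaoAi",
            "Let's recite a poem together",
            "I live at the head of the Yangtze",
            "Day after day I think of you but never see you",
            "When will this water cease to flow",
            "I only wish your heart were like mine",
        ]

        def __init__(self, cond):
            super().__init__(name="TianMao")
            self.cond = cond

        def run(self):
            with self.cond:
                for line in self.LINES:
                    # Speak first, wake the other speaker, then wait for the reply
                    print("{} : {}".format(self.name, line))
                    self.cond.notify()
                    self.cond.wait()


    if __name__ == "__main__":
        cond = threading.Condition()
        xiaoai = XiaoAi(cond)
        tianmao = TianMao(cond)
        # The start order matters: XiaoAi must already be waiting when
        # TianMao sends the first notify.
        # wait() and notify() may only be called after entering "with cond".
        # A Condition has two layers of locks. The underlying lock is released
        # when a thread calls wait(). On top of it, every call to wait()
        # allocates a new lock and puts it in the Condition's waiting queue,
        # where it stays until notify() wakes it up.
        xiaoai.start()
        tianmao.start()

Output:

    TianMao : Hey XiaoAi
    XiaoAi : I'm here
    TianMao : Let's recite a poem together
    XiaoAi : Sure
    TianMao : I live at the head of the Yangtze
    XiaoAi : You live at its end
    TianMao : Day after day I think of you but never see you
    XiaoAi : Yet we drink the same Yangtze water
    TianMao : When will this water cease to flow
    XiaoAi : When will this sorrow end
    TianMao : I only wish your heart were like mine
    XiaoAi : Then I will not betray this longing

(5) Communication between threads: Queue

Python's queue module (named Queue in Python 2) provides synchronized, thread-safe queue classes: the FIFO (first in, first out) queue Queue, the LIFO (last in, first out) queue LifoQueue, and the priority queue PriorityQueue. All of them implement the necessary locking primitives and can be used directly from multiple threads, so a queue can be used to synchronize threads.

Common methods of a Queue object: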

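- Queue.qsize(): returns the approximate size of the queue.
- Queue.empty(): returns True if the queue is empty, otherwise False.
- Queue.full(): returns True if the queue is full, otherwise False. "Full" is defined relative to the maxsize given when the queue was created.
- Queue.get([block[, timeout]]): removes and returns an item from the queue; timeout is how long to wait.
- Queue.get_nowait(): equivalent to Queue.get(False).
- Queue.put(item[, block[, timeout]]): puts an item into the queue; timeout is how long to wait.
- Queue.put_nowait(item): equivalent to Queue.put(item, False).
- Queue.task_done(): called after finishing one piece of work, to signal to the queue that the item previously fetched has been fully processed.
- Queue.join(): blocks until every item that was put into the queue has been fetched and marked with task_done(); only then does execution continue.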
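A small single-threaded sketch of several of the methods above, showing how the three queue classes order the same items. The `drain` helper and the sample values are hypothetical, introduced only for illustration.

```python
import queue


def drain(q):
    # Pull items with get_nowait() until the queue reports it is empty
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


fifo = queue.Queue(maxsize=3)
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()
for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    prio.put(item)

was_full = fifo.full()   # maxsize=3 has been reached
try:
    fifo.put_nowait(4)   # no free slot, so this raises instead of blocking
    overflow_rejected = False
except queue.Full:
    overflow_rejected = True

fifo_order = drain(fifo)   # insertion order
lifo_order = drain(lifo)   # reverse insertion order
prio_order = drain(prio)   # smallest item first
print(fifo_order, lifo_order, prio_order)
```

The non-blocking variants raise `queue.Empty` or `queue.Full` rather than waiting, which makes them convenient for polling loops like `drain`.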

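    import threading
    import time
    from queue import Queue


    def get_html(queue):
        # Consumer: take a URL from the queue and simulate downloading the page
        while True:
            url = queue.get()
            print("get html started")
            time.sleep(2)
            print("get html end")
            # Mark the item as processed so that queue.join() can return
            queue.task_done()


    def get_url(queue):
        # Producer: simulate collecting a batch of detail-page URLs
        print("get url started")
        time.sleep(5)
        for i in range(20):
            queue.put("http://www.baidu.com/{id}".format(id=i))
        print("get url end")


    if __name__ == "__main__":
        detail_url_queue = Queue(maxsize=1000)
        thread_detail_url = threading.Thread(target=get_url, args=(detail_url_queue,))
        thread_detail_url.start()
        for i in range(10):
            # Daemon threads do not keep the program alive after the main thread exits
            html_thread = threading.Thread(target=get_html, args=(detail_url_queue,), daemon=True)
            html_thread.start()
        start_time = time.time()
        thread_detail_url.join()   # wait until every URL has been queued
        detail_url_queue.join()    # wait until every queued URL has been processed
        print("last time:{}".format(time.time() - start_time))

(6) Thread pools with concurrent.futures

The Python standard library provides the threading and multiprocessing modules for writing multithreaded and multiprocess code. Once a project reaches a certain size, however, frequently creating and destroying processes or threads becomes very expensive, and at that point we would have to write our own thread pool or process pool, trading space for time. Since Python 3.2 the standard library has included the concurrent.futures module. It provides two classes, ThreadPoolExecutor and ProcessPoolExecutor, which are a further abstraction over threading and multiprocessing and give direct support for writing thread pools and process pools.

Executor and Future: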

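The concurrent.futures module is built on Executor. Executor is an abstract class and cannot be used directly, but its two subclasses, ThreadPoolExecutor and ProcessPoolExecutor, are very useful: they create thread pools and process pools respectively. Tasks can be submitted straight to the pool, with no need to maintain a Queue or worry about deadlocks, because the pool schedules the work for us automatically.
A Future can be thought of as an operation that will complete in the future, and it is the foundation of asynchronous programming. In the traditional style, a call such as queue.get blocks until the result is returned, and the calling thread cannot do anything else in the meantime. A Future lets us get on with other work while we wait.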
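A minimal sketch of that idea, assuming a hypothetical `slow_square` task: `submit()` returns a Future immediately, the main thread does other work, and `result()` blocks only at the point where the value is actually needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Simulate a slow operation (the sleep is a stand-in for real work)
    time.sleep(0.2)
    return n * n


with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(slow_square, 7)    # returns at once with a Future
    done_right_after_submit = future.done()     # the task is still sleeping here

    other_work = sum(range(100))                # the main thread stays free meanwhile

    result = future.result()                    # block only when the value is needed
    done_after_result = future.done()

print(done_right_after_submit, other_work, result, done_after_result)
```

Using the executor as a context manager also waits for outstanding tasks and shuts the pool down when the block exits.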

    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed, wait, FIRST_COMPLETED

    # A future is an object for a result that arrives later: the container
    # for a task's return value.
    # Why use a thread pool?
    # - The main thread can query the state of a thread or a task, and get
    #   its return value.
    # - The main thread knows immediately when a task has finished.
    # - futures give multithreaded and multiprocess code the same interface.


    def get_html(times):
        time.sleep(times)
        print("get page {} success".format(times))
        return times


    executor = ThreadPoolExecutor(max_workers=2)

    # submit() hands a function to the pool and returns immediately
    # task1 = executor.submit(get_html, 3)
    # task2 = executor.submit(get_html, 2)

    # done() tells whether a task has finished
    # print(task1.done())
    # print(task2.cancel())
    # time.sleep(3)
    # print(task1.done())

    # result() returns the task's result
    # print(task1.result())

    # Getting the results of tasks that have completed
    urls = [3, 2, 4]
    all_task = [executor.submit(get_html, url) for url in urls]
    # Block until the first task completes
    wait(all_task, return_when=FIRST_COMPLETED)
    print("main")

    # for future in as_completed(all_task):
    #     data = future.result()
    #     print("get {} page".format(data))

    # executor.map yields the results of the completed tasks
    # for data in executor.map(get_html, urls):
    #     print("get {} page".format(data))

Summary: Python multithreading is well suited to I/O-bound tasks. An I/O-bound task spends little time on CPU computation and most of its time on I/O, such as reading and writing files, web requests, and database requests. For CPU-bound tasks, use multiprocessing instead.