A Quick Guide to Python Multithreading and Multiprocessing

达达爱吃肉

Article link: https://blog.csdn.net/weixin_44706915/article/details/111876921

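Preface

Computer hardware has long since left the single-core era behind. In almost any language, multithreading and multiprocessing are core topics.

This article works through Python's multiprocessing and multithreading systematically, from the underlying concepts to practical use.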

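The GIL

Readers are assumed to be familiar with the basics of threads and processes.

Since this article focuses on practical usage, those basics are not covered in detail here.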

What is the GIL?

Strictly speaking, "GIL lock" is a redundant name, because the GIL is itself a lock: GIL stands for global interpreter lock.

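Characteristics of the GIL

In CPython, the GIL allows only one thread to execute Python bytecode at any moment, so the threads of one process cannot run Python code on several CPU cores in parallel. For CPU-bound code, extra cores therefore bring no benefit to threads.
A running thread gives up the GIL periodically: older versions counted executed bytecode instructions, while Python 3.2 and later use a time interval.
A thread releases the GIL voluntarily when it performs a blocking I/O operation.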
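
The periodic release can be inspected from Python itself. The sketch below assumes CPython 3.2 or later, where the standard-library calls `sys.getswitchinterval()` and `sys.setswitchinterval()` read and change the interval after which a running thread is asked to give up the GIL. The value 0.01 is arbitrary and chosen only for illustration.

```python
import sys

# Read the current switch interval in seconds (CPython's default is 0.005).
original_interval = sys.getswitchinterval()
print("switch interval: {} s".format(original_interval))

# Temporarily request less frequent switching, then restore the old value.
sys.setswitchinterval(0.01)
changed_interval = sys.getswitchinterval()
sys.setswitchinterval(original_interval)
```

Changing the interval is rarely needed in application code; the point here is only that switching is time-based and observable.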

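Threads are also cheaper to create than processes and quicker to switch between. Together with the points above, this tells us when to use each:

Multithreading: I/O-heavy workloads such as database access, API calls, file writes, and web crawlers.
Multiprocessing: CPU-heavy computation.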
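
The I/O-bound case can be illustrated with a small timing sketch. It assumes that `time.sleep` is a fair stand-in for a blocking I/O call (it releases the GIL while waiting); the names `fake_io` and `timed_threaded_run` are invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(seconds)
    return seconds


def timed_threaded_run(task_count=4, seconds=0.2):
    """Run task_count fake I/O calls in threads; return (elapsed, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=task_count) as executor:
        results = list(executor.map(fake_io, [seconds] * task_count))
    return time.perf_counter() - start, results


elapsed, results = timed_threaded_run()
print("4 waits of 0.2 s took {:.2f} s in threads".format(elapsed))
```

Run one after another, the four waits would take about 0.8 seconds. In threads they overlap, so the total stays close to a single wait.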

That covers the theory; the rest of the article shows how to use threads and processes in practice.

Multithreading

Multithreading with `threading.Thread`

import time
import threading


def start_model_first(line):
    # `line` is an unused placeholder argument
    print("Model 1 starting")
    time.sleep(2)
    print("Model 1 started")


def start_model_second(line):
    print("Model 2 starting")
    time.sleep(3)
    print("Model 2 started")


if __name__ == '__main__':
    thread1 = threading.Thread(target=start_model_first, args=("", ))
    thread2 = threading.Thread(target=start_model_second, args=("", ))
    start_time = time.time()
    thread1.start()
    thread2.start()
    # thread1.join()
    thread2.join()  # the main thread blocks here until thread2 has finished
    print("last time: {}".format(time.time() - start_time))

Model 1 starting
Model 2 starting
Model 1 started
Model 2 started
last time: 3.0019595623016357

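If we join thread1 instead, the result changes:

    thread1.start()
    thread2.start()
    thread1.join()
    # thread2.join()  # would make the main thread wait for thread2
    print("last time: {}".format(time.time() - start_time))

Model 1 starting
Model 2 starting
Model 1 started
last time: 2.000525712966919
Model 2 started

Now the main thread waits only for thread1, so it resumes after about two seconds, as soon as thread1 is done; thread2 finishes a second later.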
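
With more than a couple of threads, a common pattern is to keep them in a list and join each one, so the main thread continues only after all of them have finished. A minimal sketch; the `worker` function, the `finished` list, and the short sleep times are illustrative assumptions, not part of the example above.

```python
import threading
import time

# Names of workers that completed. Appending to a list from several
# threads is safe in CPython because list.append is atomic there.
finished = []


def worker(name, seconds):
    time.sleep(seconds)
    finished.append(name)


threads = [
    threading.Thread(target=worker, args=("model {}".format(i), 0.1 * i))
    for i in (1, 2, 3)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # returns immediately if that thread has already finished

print("all done:", finished)
```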

Multithreading by subclassing `threading.Thread`

import time
import threading


class StartModelFirst(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self) -> None:
        # run() is the code executed in the new thread after start()
        print("Model 1 starting")
        time.sleep(2)
        print("Model 1 started")


class StartModelSecond(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self) -> None:
        print("Model 2 starting")
        time.sleep(3)
        print("Model 2 started")


if __name__ == '__main__':
    thread1 = StartModelFirst("model 1")
    thread2 = StartModelSecond("model 2")
    start_time = time.time()
    thread1.start()
    thread2.start()
    # thread1.join()
    thread2.join()  # the main thread blocks here until thread2 has finished
    print("last time: {}".format(time.time() - start_time))

With subclassing, we only need to put the logic in the `run` method. Because real-world logic tends to be complex, the subclassing style is often the more practical choice.
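
One practical benefit of the subclass style is that the thread object can carry state, for example a result that `run` stores and the caller reads after `join`. A sketch under that assumption; the class name `SquareThread` and its `result` attribute are made up for illustration.

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical worker that stores its result on the thread object."""

    def __init__(self, value):
        super().__init__(name="square-{}".format(value))
        self.value = value
        self.result = None  # filled in by run()

    def run(self):
        self.result = self.value * self.value


workers = [SquareThread(n) for n in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # once join() returns, run() has finished and result is set

squares = [w.result for w in workers]
print(squares)
```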

Inter-thread communication with `Queue`

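Threads can communicate in many ways: shared variables, passed arguments, or an external message queue such as Redis. In multithreaded Python code, the most common choice is the standard library's `queue.Queue`.

import time
from queue import Queue
import threading


def start_model_first(queue):
    while True:
        print("Model 1 starting")
        time.sleep(2)
        for first in range(2):
            queue.put("Data to reconsider {}".format(first))
        print("Model 1 started")


def start_model_second(queue):
    while True:
        result = queue.get()  # blocks until an item is available
        print(result + " is being considered by model 2")
        print("Model 2 starting")
        time.sleep(3)
        print("Model 2 started")


if __name__ == '__main__':
    the_queue = Queue(maxsize=1000)
    thread1 = threading.Thread(target=start_model_first, args=(the_queue, ))
    thread1.start()
    for i in range(3):
        second_thread_model = threading.Thread(target=start_model_second, args=(the_queue, ))
        second_thread_model.start()

Model 1 starting
Model 1 startedData to reconsider 0 is being considered by model 2
Model 2 starting
Data to reconsider 1 is being considered by model 2
Model 2 starting

Suppose a stream of data has to be examined by model 1 before it is handed to model 2. A `Queue` lets the two stages communicate across threads. Both functions loop forever, so the program runs until it is interrupted, and output from different threads can run together on one line, as in the second line of the output above.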
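
The loops above never terminate. One common way to shut such a pipeline down cleanly is a sentinel value that tells each consumer there is no more data. The sketch below is one possible arrangement, not the only one; `STOP`, `stage_one`, `stage_two`, and the multiply-by-ten "processing" are all illustrative assumptions.

```python
import threading
from queue import Queue

STOP = object()  # sentinel: tells a consumer that no more data will arrive


def stage_one(work_queue, items, consumer_count):
    for item in items:
        work_queue.put(item)
    for _ in range(consumer_count):
        work_queue.put(STOP)  # one sentinel per consumer


def stage_two(work_queue, handled, lock):
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        with lock:  # protect the shared result list
            handled.append(item * 10)


work_queue = Queue(maxsize=1000)
handled, lock = [], threading.Lock()
consumers = [
    threading.Thread(target=stage_two, args=(work_queue, handled, lock))
    for _ in range(3)
]
producer = threading.Thread(
    target=stage_one, args=(work_queue, range(5), len(consumers))
)
for thread in consumers + [producer]:
    thread.start()
for thread in consumers + [producer]:
    thread.join()

print(sorted(handled))
```

Because every consumer receives exactly one sentinel, all threads exit and the final `join` calls return.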

Thread pools with `ThreadPoolExecutor`

Basic usage

from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def start_work(index):
    time.sleep(1)
    print("Thread task {}".format(index))
    return index+1


if __name__ == '__main__':
    # Create a pool with at most 2 worker threads. We only submit tasks;
    # the pool runs them on those workers.
    executor = ThreadPoolExecutor(max_workers=2)
    task1 = executor.submit(start_work, 1)
    task2 = executor.submit(start_work, 2)
    # result() blocks until the task is done and returns its return value
    print("Return value of task1: {}".format(task1.result()))
    print("Return value of task2: {}".format(task2.result()))

Thread task 1
Thread task 2
Return value of task1: 2 # the task's return value
Return value of task2: 3

A more convenient approach

In real code we rarely submit and collect tasks one by one like this; a more compact pattern is usual.

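    task_list = list(range(5))
    all_task = [executor.submit(start_work, t) for t in task_list]
    # as_completed yields each future as soon as it finishes
    for future in as_completed(all_task):
        data = future.result()
        # note: the label reuses the return value (index + 1), not the submitted index
        print("Return value of task{}: {}".format(data, data))

Thread task 0
Return value of task1: 1
Thread task 1
Return value of task2: 2
Thread task 2
Return value of task3: 3
Thread task 3
Return value of task4: 4
Thread task 4
Return value of task5: 5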

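We only need to put the tasks in a list and submit them with a list comprehension. There is an even simpler way, `executor.map`, which yields the results in the order of the inputs:

    for data in executor.map(start_work, task_list):
        print("Return value of task{}: {}".format(data, data))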
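
The difference between the two styles is the order in which results arrive. The following sketch contrasts them; the `slow_echo` helper and the chosen delays are assumptions for illustration, and the executor is used as a context manager so its threads are cleaned up when the block ends.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_echo(delay):
    """Sleep for delay seconds, then return the delay."""
    time.sleep(delay)
    return delay


delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map: results come back in the order of the inputs
    mapped = list(executor.map(slow_echo, delays))

    # as_completed: futures are yielded as they finish
    futures = [executor.submit(slow_echo, d) for d in delays]
    completed = [f.result() for f in as_completed(futures)]

print(mapped)
print(completed)
```

`mapped` always matches the input order, whereas `completed` typically starts with the shortest delay, since that task finishes first.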

Multiprocessing

A simple multi-process example with `multiprocessing`

import multiprocessing
import time


def start_work(index):
    time.sleep(1)
    print("Process task {}".format(index))
    return index+1  # discarded here: Process does not pass return values back


if __name__ == '__main__':
    progress = multiprocessing.Process(target=start_work, args=(1,) )
    progress2 = multiprocessing.Process(target=start_work, args=(2,))
    progress.start()
    progress2.start()
    print(progress.pid)
    print(progress2.pid)
    print("main process end")

15236
3096
main process end
Process task 1
Process task 2

Using a process pool

    pool = multiprocessing.Pool()
    result = pool.apply_async(start_work, args=(1,))
    result2 = pool.apply_async(start_work, args=(2,))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker processes to finish
    print(result.get())
    print(result2.get())

Process task 1
Process task 2
2
3

In practice, such tasks are usually submitted in batches.

    for result in pool.imap(start_work, [1, 2, 3]):
        print("work {} success end".format(result))

Process task 1
work 2 success end
Process task 2
work 3 success end
Process task 3
work 4 success end

`imap` yields results in the order the tasks were submitted. To get each result as soon as its task finishes, use `imap_unordered` instead:

    for result in pool.imap_unordered(start_work, [1, 2, 3]):
        print("work {} success end".format(result))
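
A `Pool` can also be used as a context manager, which avoids forgetting to clean it up. The sketch below assumes the batch fits `pool.map`, which blocks until every result is ready; the `square` and `run_batch` names are invented for the example.

```python
import multiprocessing


def square(n):
    return n * n


def run_batch(numbers, workers=2):
    """Square numbers in a pool of worker processes, keeping input order."""
    # Leaving the with-block calls pool.terminate(); that is safe here
    # because map() has already returned every result.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    print(run_batch([1, 2, 3, 4]))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes, and the `if __name__ == "__main__":` guard keeps child processes from re-running the pool setup on platforms that start them by re-importing the module.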

Inter-process communication

Using Queue

Processes do not share memory, so `queue.Queue` cannot be used between them; use the `Queue` provided by `multiprocessing` instead.

import multiprocessing
import time


def producer(queue):
    queue.put("pro")
    time.sleep(2)


def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    my_producer = multiprocessing.Process(target=producer, args=(queue, ))
    my_consumer = multiprocessing.Process(target=consumer, args=(queue, ))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

pro

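However, a `multiprocessing.Queue` cannot be passed to pool workers as a task argument. For process pools, use a queue created by a `Manager` instead.

Using a `Manager` queue for inter-process communication (process pools)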

    queue = multiprocessing.Manager().Queue(10)
    pool = multiprocessing.Pool()
    pool.apply_async(producer, args=(queue, ))
    pool.apply_async(consumer, args=(queue, ))
    pool.close()
    pool.join()

pro

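There are many other ways for processes to communicate, such as pipes, caches, and message queues; choose according to the actual workload and the hardware and software environment. The approaches above are the basic ones.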
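
As one example of the alternatives, a pipe connects exactly two endpoints. The sketch below uses the standard `multiprocessing.Pipe`; the `child` and `round_trip` functions and the upper-casing reply are assumptions made for illustration.

```python
import multiprocessing


def child(conn):
    """Receive one message, reply in upper case, then close this end."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def round_trip(text):
    """Send text to a child process over a pipe and return its reply."""
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    process = multiprocessing.Process(target=child, args=(child_conn,))
    process.start()
    parent_conn.send(text)
    reply = parent_conn.recv()
    process.join()
    return reply


if __name__ == "__main__":
    print(round_trip("pro"))
```

A pipe suits a fixed pair of processes; with several producers or consumers, a queue is usually the simpler choice.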

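Multiprocessing with `concurrent.futures`

We used this package above for multithreading. It makes multiprocessing just as easy, and is more convenient than the `multiprocessing` API used so far: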

from concurrent.futures import ProcessPoolExecutor
import time


def start_work(index):
    time.sleep(1)
    print("Process task {}".format(index))
    return index+1


if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=2)

    task_list = range(5)
    # map yields results in the order of task_list
    for data in executor.map(start_work, task_list):
        print("Return value of task{}: {}".format(data, data))

Process task 0
Return value of task1: 1
Process task 1
Return value of task2: 2
Process task 2
Return value of task3: 3
Process task 3
Return value of task4: 4
Process task 4
Return value of task5: 5
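
The example above only sleeps, which hides the real reason to reach for processes: CPU-bound work. The sketch below is one way to spread such work over processes with `submit` and `as_completed`; the trial-division `count_primes` function and the `count_in_processes` helper are illustrative assumptions, not an efficient prime counter.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits, workers=2):
    """Map each limit to its prime count using worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # Remember which limit each future belongs to.
        futures = {executor.submit(count_primes, lim): lim for lim in limits}
        return {futures[f]: f.result() for f in as_completed(futures)}


if __name__ == "__main__":
    print(count_in_processes([10, 100, 1000]))
```

Keeping a dictionary from future to input makes it possible to label each result correctly even though `as_completed` yields them in completion order.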
