Python Multithreading, Multiprocessing, and Thread Pool Programming

Latest recommended article published 2024-06-17 17:35:06

NickDeCodes


Article link: https://blog.csdn.net/NickDeCodes/article/details/138294760


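The GIL in Python

In Python, the GIL (Global Interpreter Lock) is a mechanism implemented in the CPython interpreter that ensures only one thread executes Python bytecode at any moment. Because of the GIL, a multithreaded Python program cannot run Python code in parallel, even on a multi-core processor; threads overlap only while one of them is blocked on I/O or is running C code that releases the GIL.

Purpose and impact of the GIL

Purpose:

Memory-management safety: Python manages memory with reference counting, which is error-prone when several threads update the same counts. The GIL prevents race conditions on reference counts when Python objects are created and destroyed.
Simpler CPython implementation: the GIL keeps CPython simpler, because its developers do not need to add a lock to every object operation.

Impact:

Limited multithreaded performance: in CPU-bound applications, a multithreaded Python program cannot take full advantage of a multi-core processor, because the GIL makes the threads run one at a time.
Applicability: because of the GIL, threads in Python are better suited to I/O-bound tasks than to CPU-bound tasks.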
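The I/O-bound case works because a thread blocked on I/O (or in `time.sleep`) releases the GIL, letting other threads run. A minimal sketch, using `time.sleep` as a stand-in for a blocking network or disk call; the helper names `fake_io` and `run_concurrently` are hypothetical:

```python
import threading
import time

def fake_io(delay: float) -> None:
    # time.sleep releases the GIL, just like a blocking socket read would
    time.sleep(delay)

def run_concurrently(n_threads: int, delay: float) -> float:
    """Start n_threads threads that each 'wait on I/O' for delay seconds; return wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_concurrently(5, 0.2)
    # Run one after another this would take about 1.0 s; the waits overlap instead
    print(f"5 x 0.2 s of simulated I/O took {elapsed:.2f} s")
```

Replacing `time.sleep` with a pure-Python CPU loop would not show the same overlap, because that loop holds the GIL.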

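Coping with the GIL

Despite these limitations, there are several ways to optimize Python applications or work around the GIL:

Multiprocessing:
- The multiprocessing module creates multiple processes, each with its own Python interpreter and memory space, so they are not constrained by a shared GIL.
- Suited to CPU-bound tasks.
Alternative interpreters:
- Use a Python implementation without a GIL, such as Jython or IronPython. (PyPy, by contrast, also has a GIL.)
C extensions:
- A C extension can manage the lock at the C level and release the GIL while it performs intensive computation.
- Tools such as Cython make writing C extensions easier.
Concurrency libraries:
- Libraries such as concurrent.futures simplify the management of threads and processes.
- Asynchronous programming (for example asyncio) is very effective for I/O-bound applications.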
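Processes and `concurrent.futures` combine naturally: `ThreadPoolExecutor` and `ProcessPoolExecutor` expose the same `map()`/`submit()` interface, so choosing threads or processes can be a single switch. A minimal sketch; the helper `run_parallel` and its `cpu_bound` flag are hypothetical names:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(x: int) -> int:
    return x * x

def run_parallel(fn, items, cpu_bound: bool, workers: int = 4) -> list:
    """Map fn over items with processes (CPU-bound) or threads (I/O-bound).

    Both executors share the same interface, so only the class changes.
    For the process path, fn and the items must be picklable and fn must be
    defined at module level.
    """
    executor_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(fn, items))
```

With `cpu_bound=True`, call it from under `if __name__ == "__main__":` on platforms that use the spawn start method. Either way, `run_parallel(square, range(5), cpu_bound=...)` returns `[0, 1, 4, 9, 16]`.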

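The future of the GIL

The GIL has long been a hot topic in the Python community. Removing it completely is a major challenge for the existing CPython ecosystem, but the community keeps exploring improvements. For example, Larry Hastings' Gilectomy project, presented at PyCon 2016, tried to remove the GIL; it never became mainstream, but it illustrated both the possibility and the difficulty of doing so. More recently, PEP 703 led to an experimental free-threaded (GIL-free) build of CPython starting with Python 3.13.

In short, although the GIL has its limitations, the right tools and strategies let you manage and optimize the parallel performance of Python applications effectively.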

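Multithreaded programming - threading

In Python, the threading module provides a high-level interface for creating and managing threads. With it you can run several threads in one program, which is very useful for I/O-bound or otherwise blocking tasks because it improves the program's overall efficiency and responsiveness.

Basic concepts

A thread is the smallest unit of execution that the operating system can schedule. It lives inside a process and is the part of the process that actually does the work. In Python, the threading module lets you create threads, synchronize their execution, and share data between them.

Creating threads

Use the threading.Thread class to create a new thread. You specify the thread's work either by passing a callable (such as a function) as the target argument of the Thread constructor, or by subclassing Thread and overriding its run() method.

Example: creating a thread from a function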

import threading

def print_numbers():
    for i in range(1, 11):
        print(i)

# Create the thread
thread = threading.Thread(target=print_numbers)

# Start the thread
thread.start()

# Wait for the thread to finish
thread.join()

print("Finished all threads")

Example: creating a thread by subclassing Thread

import threading

class MyThread(threading.Thread):
    def run(self):
        for i in range(1, 11):
            print(i)

# Create a thread instance
thread = MyThread()

# Start the thread
thread.start()

# Wait for the thread to finish
thread.join()

print("Finished all threads")
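Both examples above use a target without parameters. Arguments are passed with `args` (a tuple) and `kwargs` (a dict); because `run()` discards the target's return value, a common pattern is to let each thread write into its own slot of a preallocated list. A sketch with illustrative names (`worker`, `run_workers`):

```python
import threading

def worker(index: int, value: int, results: list, *, scale: int = 1) -> None:
    # A thread target cannot return a value, so each thread writes into its own slot.
    # Distinct indexes mean no two threads touch the same element, so no lock is needed.
    results[index] = value * scale

def run_workers(values, scale: int = 1) -> list:
    results = [None] * len(values)
    threads = [
        threading.Thread(
            target=worker,
            args=(i, v, results),       # positional arguments for the target
            kwargs={"scale": scale},    # keyword arguments for the target
            name=f"worker-{i}",         # optional; visible via threading.current_thread().name
        )
        for i, v in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_workers([1, 2, 3], scale=10))  # [10, 20, 30]
```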

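Thread synchronization

Synchronization is essential in a multithreaded environment because it prevents several threads from writing to the same data at the same time. Common synchronization mechanisms include locks (Lock), events (Event), conditions (Condition), and semaphores (Semaphore).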
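Of these, Event is the simplest: one thread sets an internal flag, and every thread blocked in `wait()` is released at once. A minimal sketch (the function names are illustrative):

```python
import threading

def wait_for_start(ready: threading.Event, log: list) -> None:
    # Block until another thread calls ready.set(); wait() returns True once the flag is set
    if ready.wait(timeout=5):
        log.append("started")

def demo_event() -> list:
    ready = threading.Event()
    log = []
    workers = [threading.Thread(target=wait_for_start, args=(ready, log)) for _ in range(3)]
    for t in workers:
        t.start()
    ready.set()  # releases every thread waiting on the event
    for t in workers:
        t.join()
    return log

print(demo_event())  # ['started', 'started', 'started']
```

`Event.clear()` resets the flag, and `wait(timeout)` returns `False` if the timeout expires before the flag is set.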

Using a lock (Lock)

import threading

# Create a lock object
lock = threading.Lock()

def print_numbers():
    lock.acquire()  # Acquire the lock
    try:
        for i in range(1, 11):
            print(i)
    finally:
        lock.release()  # Release the lock

# Create the threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_numbers)

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

print("Finished all threads")
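Here the lock only keeps the two runs of numbers from interleaving. A lock matters most for read-modify-write updates such as `counter += 1`, which is a load, an add and a store; two threads can read the same old value and lose an update. A sketch using the `with lock:` form, which is equivalent to the acquire/try/finally pattern above (names are illustrative):

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with counter_lock:  # acquire() on entry, release() on exit, even on exceptions
            counter += 1

def run(n_threads: int = 4, times: int = 100_000) -> int:
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run())  # 400000
```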

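Notes

Avoid deadlocks: make sure every lock is eventually released, otherwise the program may hang. When a thread needs several locks, acquire them in the same order everywhere.
Impact of the GIL: in CPython, because of the GIL, CPU-bound tasks may not get faster even with multiple threads. For such tasks, consider multiple processes (the multiprocessing module).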
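The classic deadlock occurs when two threads take the same two locks in opposite orders, each holding one and waiting forever for the other. One common remedy, sketched here under the assumption that every code path goes through the same helper, is to impose a fixed global order (here `id()`); the helper names are hypothetical:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def with_both(first: threading.Lock, second: threading.Lock, log: list, tag: str) -> None:
    # Sort by a global key so every thread locks in the same order,
    # no matter in which order the caller passed the locks.
    ordered = sorted((first, second), key=id)
    for lock in ordered:
        lock.acquire()
    try:
        log.append(tag)  # critical section that needs both locks
    finally:
        for lock in reversed(ordered):
            lock.release()

def worker(first, second, log, tag, rounds):
    for _ in range(rounds):
        with_both(first, second, log, tag)

def demo(rounds: int = 1000) -> int:
    log = []
    # The two threads request the locks in opposite orders; without sorting,
    # this pattern can deadlock.
    t1 = threading.Thread(target=worker, args=(lock_a, lock_b, log, "t1", rounds))
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a, log, "t2", rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return len(log)

print(demo())  # 2000
```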

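Used appropriately, the threading module lets you manage threads effectively in Python and improve your program's concurrency and responsiveness.

Inter-thread communication - shared variables and Queue

Communication between threads is an important topic in multithreaded programming. Python offers several ways to do it; the most common are shared variables and the queue.Queue class.

Shared variables

Threads can exchange information through shared variables. Because threads run inside the same process, they share the same memory space and can access the same data structures directly. However, access to shared variables needs careful synchronization to avoid race conditions and inconsistent data.

Example: using a shared variable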

import threading

# Shared variable
shared_data = []

# Thread function: mutating the list in place needs no "global" statement
def append_to_list(value):
    shared_data.append(value)

# Create the threads
thread1 = threading.Thread(target=append_to_list, args=(1,))
thread2 = threading.Thread(target=append_to_list, args=(2,))

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

print(shared_data)  # [1, 2] or [2, 1], depending on which thread runs first

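When working with shared variables, you may need a lock (such as threading.Lock) to ensure thread safety.

Using `queue.Queue`

queue.Queue is a thread-safe queue for multithreaded programs; it can be used to pass data safely from one thread to another. It provides basic FIFO (first-in, first-out) behavior, and the queue module also offers LIFO (last-in, first-out, i.e. a stack: queue.LifoQueue) and priority-queue (queue.PriorityQueue) variants.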
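The three variants differ only in the order in which `get()` returns items. A single-threaded sketch (the `drain` helper is illustrative):

```python
import queue

def drain(q) -> list:
    items = []
    while not q.empty():  # safe here: a single thread, no concurrent producers
        items.append(q.get())
    return items

fifo, lifo, prio = queue.Queue(), queue.LifoQueue(), queue.PriorityQueue()
for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    prio.put(item)

print(drain(fifo))  # [3, 1, 2]  first in, first out
print(drain(lifo))  # [2, 1, 3]  last in, first out
print(drain(prio))  # [1, 2, 3]  smallest first
```

With `PriorityQueue`, entries are commonly `(priority, data)` tuples so that the priority decides the order.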

Example: inter-thread communication with `queue.Queue`

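import threading
import queue

# Create the queue
q = queue.Queue()

# Producer thread function
def producer():
    for i in range(5):
        q.put(i)
        print(f"Produced {i}")

# Consumer thread function
def consumer():
    while True:
        item = q.get()
        if item is None:  # Sentinel: no more work, leave the loop
            q.task_done()
            break
        print(f"Consumed {item}")
        q.task_done()

# Create the threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
producer_thread.start()
consumer_thread.start()

# Wait for the producer thread to finish
producer_thread.join()

# Wait until every queued item has been processed
q.join()

# Stop the consumer thread: without the sentinel, its "while True" loop
# would block in q.get() forever and this join() would never return
q.put(None)
consumer_thread.join()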

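In this example, the producer thread puts data into the queue and the consumer thread takes it out. q.get() blocks while the queue is empty, until data becomes available.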
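The counterpart of a blocking `get()` is a bounded queue: with `Queue(maxsize=N)`, `put()` blocks while the queue is full, which throttles a producer that runs ahead of its consumers. The sketch below also stops several consumers cleanly by sending one sentinel per consumer; `SENTINEL`, `consumer` and `run_pipeline` are illustrative names:

```python
import queue
import threading

SENTINEL = object()  # unique stop marker; cannot be confused with real data

def consumer(q: queue.Queue, results: list) -> None:
    while True:
        item = q.get()
        try:
            if item is SENTINEL:
                return
            results.append(item * item)  # list.append is thread-safe in CPython
        finally:
            q.task_done()  # runs for data items and for the sentinel

def run_pipeline(items, n_consumers: int = 3, maxsize: int = 2) -> list:
    q = queue.Queue(maxsize=maxsize)  # put() blocks while maxsize items are waiting
    results = []
    consumers = [threading.Thread(target=consumer, args=(q, results)) for _ in range(n_consumers)]
    for t in consumers:
        t.start()
    for item in items:   # the main thread is the producer
        q.put(item)      # waits here whenever the consumers fall behind
    for _ in consumers:
        q.put(SENTINEL)  # one stop marker per consumer
    for t in consumers:
        t.join()
    return sorted(results)  # consumers finish in an arbitrary order

print(run_pipeline(range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```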

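Summary

Shared variables: simple and direct, but you must handle synchronization and thread safety yourself.
queue.Queue: a thread-safe queue and the preferred way for threads to communicate, especially in the producer-consumer pattern.

Appropriate synchronization mechanisms or thread-safe data structures solve the problem of communication between threads effectively.

Thread synchronization - Lock, RLock

In Python multithreaded programming, thread synchronization is the mechanism that lets multiple threads safely access shared resources or coordinate related operations. The threading module provides several synchronization primitives; Lock and RLock (reentrant lock) are the most basic.

Lock (mutex)

Lock is a basic synchronization primitive that provides mutual exclusion between threads (only one thread at a time can access the shared resource). A Lock object has two states, locked and unlocked, and supports two main methods:

acquire(): if the lock is unlocked, changes it to locked and returns immediately. If the lock is already locked, blocks the calling thread until the lock is released.
release(): changes the lock to unlocked and returns. Unlike RLock, a plain Lock may be released by any thread, not only the one that acquired it; releasing an unlocked Lock raises RuntimeError.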
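These rules can be checked directly. A short sketch using only documented `threading.Lock` methods (`acquire(blocking, timeout)`, `release()`, `locked()`):

```python
import threading

lock = threading.Lock()

# Non-blocking attempt: returns True/False immediately instead of waiting
assert lock.acquire(blocking=False) is True

# A plain Lock is not reentrant: the owning thread cannot take it again,
# so a timed attempt gives up after the timeout and returns False
assert lock.acquire(timeout=0.1) is False

# Any thread may release a Lock, not only the one that acquired it
releaser = threading.Thread(target=lock.release)
releaser.start()
releaser.join()
assert lock.locked() is False

# Releasing an unlocked Lock is an error
try:
    lock.release()
except RuntimeError as exc:
    print("RuntimeError:", exc)
```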

Example: using `Lock`

import threading

# Create a lock object
lock = threading.Lock()

def function_with_lock():
    lock.acquire()
    try:
        # Code that needs mutually exclusive access
        print(f"{threading.current_thread().name} has acquired the lock.")
    finally:
        lock.release()
        print(f"{threading.current_thread().name} has released the lock.")

# Create the threads
thread1 = threading.Thread(target=function_with_lock)
thread2 = threading.Thread(target=function_with_lock)

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

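RLock (reentrant lock)

RLock (Reentrant Lock) is a special kind of lock that the same thread may acquire multiple times. It does this by tracking a recursion count and the identity of the owning thread. If the owning thread acquires the lock again, the count increases and the thread does not block. Each release() decreases the count, and the lock is actually released only when the count reaches zero. Only the owning thread may release an RLock.

Example: using `RLock`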

import threading

# Create a reentrant lock
rlock = threading.RLock()

def recursive_locking(n):
    rlock.acquire()
    try:
        print(f"Recursion level {n}, {threading.current_thread().name} has acquired the lock.")
        if n < 3:
            recursive_locking(n + 1)
    finally:
        rlock.release()
        print(f"Recursion level {n}, {threading.current_thread().name} has released the lock.")

# Create the thread
thread = threading.Thread(target=recursive_locking, args=(1,))

# Start the thread
thread.start()

# Wait for the thread to finish
thread.join()
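Recursion is one use; a more common one is a class whose public methods all take the same lock and also call each other. A sketch with an illustrative `Account` class, plus a check that only the owner can release an RLock:

```python
import threading

class Account:
    """Illustrative class whose public methods share one RLock and call each other."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self.balance = 0

    def deposit(self, amount: int) -> None:
        with self._lock:
            self.balance += amount

    def deposit_all(self, amounts) -> None:
        # Hold the lock for the whole batch; deposit() re-enters it in the same thread.
        # With a plain Lock, the first deposit() call here would block forever.
        with self._lock:
            for amount in amounts:
                self.deposit(amount)

def release_from_other_thread(rlock) -> str:
    """Try to release rlock from a new thread and report what happened."""
    outcome = []

    def try_release() -> None:
        try:
            rlock.release()
            outcome.append("released")
        except RuntimeError:
            outcome.append("RuntimeError")  # only the owning thread may release an RLock

    t = threading.Thread(target=try_release)
    t.start()
    t.join()
    return outcome[0]

account = Account()
threads = [threading.Thread(target=account.deposit_all, args=([1] * 1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.balance)  # 4000

rlock = threading.RLock()
rlock.acquire()
print(release_from_other_thread(rlock))  # RuntimeError
rlock.release()
```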

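Summary

Lock: for simple mutual exclusion.
RLock: for cases where the same thread needs to acquire the same lock more than once, such as recursive calls.

Used correctly, these locks help avoid race conditions and deadlocks in multithreaded programs and keep data consistent and programs stable.

Thread synchronization - Condition usage and source analysis

Condition is a tool in Python's threading module for more complex thread synchronization. It wraps an underlying lock (an RLock by default, or the Lock or RLock you pass in) and supports waiting and notifying on that lock. A Condition lets one or more threads wait for some condition while other threads notify them that the condition has been met.

Using Condition

Condition mainly provides the following methods:

acquire() and release(): acquire and release the underlying lock.
wait(timeout=None): releases the underlying lock, blocks until notified or until the timeout expires, then reacquires the lock before returning. The caller must hold the lock.
notify(n=1): wakes up at most n threads waiting on this condition (one by default). The caller must hold the lock.
notify_all(): wakes up all threads waiting on this condition.

Example: using `Condition`

Consider a producer-consumer scenario in which a producer puts data into a buffer and a consumer takes data out of it.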

import threading
import time

# Create a Condition object
condition = threading.Condition()
buffer = []

def producer():
    with condition:
        print("Producer is producing data...")
        buffer.append("Data")
        print("Producer added data to buffer.")
        condition.notify()

def consumer():
    with condition:
        # Use "while", not "if": re-check the condition after every wake-up
        while not buffer:
            print("Consumer is waiting for data...")
            condition.wait()
        data = buffer.pop()
        print(f"Consumer consumed {data}.")

# Create the threads
producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)

# Start the threads
consumer_thread.start()
time.sleep(2)  # Let the consumer start waiting first (the while loop makes the order irrelevant for correctness)
producer_thread.start()

# Wait for the threads to finish
producer_thread.join()
consumer_thread.join()

Source analysis

Condition is built on a lower-level lock (Lock or RLock) and keeps a list of waiters, one per waiting thread. Below is an overview of a simplified Condition class:

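import threading

class Condition:
    # Simplified sketch of threading.Condition, not the actual CPython source
    def __init__(self, lock=None):
        if lock is None:
            lock = threading.RLock()
        self._lock = lock
        self._waiters = []  # one private lock per waiting thread

    def acquire(self, *args):
        return self._lock.acquire(*args)

    def release(self):
        return self._lock.release()

    def __enter__(self):
        return self._lock.__enter__()

    def __exit__(self, *args):
        return self._lock.__exit__(*args)

    def wait(self, timeout=None):
        waiter = threading.Lock()
        waiter.acquire()          # locked: the next acquire() blocks until notify() releases it
        self._waiters.append(waiter)
        self.release()            # let other threads use the condition while we wait
        # (CPython uses _release_save() so that an RLock held recursively is fully released)
        gotit = False
        try:
            if timeout is None:
                waiter.acquire()
                gotit = True
            else:
                gotit = waiter.acquire(timeout=timeout)
            return gotit
        finally:
            self.acquire()        # reacquire the lock before returning to the caller
            if not gotit:         # timed out: withdraw from the waiter list
                try:
                    self._waiters.remove(waiter)
                except ValueError:
                    pass

    def notify(self, n=1):
        all_waiters = self._waiters
        waiters_to_notify = all_waiters[:n]
        for waiter in waiters_to_notify:
            waiter.release()      # unblocks the corresponding wait()
            all_waiters.remove(waiter)

    def notify_all(self):
        self.notify(len(self._waiters))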

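Here the _waiters list holds one private lock per waiting thread. When wait() is called, the thread appends its own locked waiter to the list, releases the outer lock, and blocks on the waiter. notify() releases up to n waiters (one by default), and notify_all() releases all of them; each woken thread reacquires the outer lock before wait() returns.

In this way, Condition lets threads wait for some condition to occur and be notified when it is met, so that they can continue. It is an effective tool for complex synchronization and communication between threads.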
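The same primitives are enough for a bounded buffer. In CPython, `queue.Queue` is likewise built from one mutex and several `Condition` objects (`not_empty`, `not_full`, `all_tasks_done`). The sketch below is an illustrative class, not the standard-library code; it uses `Condition.wait_for()`, which re-checks a predicate in a loop after every wake-up:

```python
import collections
import threading

class BoundedBuffer:
    """Illustrative bounded FIFO built from one Lock and two Conditions."""

    def __init__(self, capacity: int) -> None:
        self._items = collections.deque()
        self._capacity = capacity
        lock = threading.Lock()
        # Both conditions share the same lock, so either one protects _items
        self._not_empty = threading.Condition(lock)
        self._not_full = threading.Condition(lock)

    def put(self, item) -> None:
        with self._not_full:
            self._not_full.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(item)
            self._not_empty.notify()  # allowed: the shared lock is held

    def get(self):
        with self._not_empty:
            self._not_empty.wait_for(lambda: len(self._items) > 0)
            item = self._items.popleft()
            self._not_full.notify()
            return item

def transfer(n: int, capacity: int = 2) -> list:
    """Move n items from the main thread to a consumer thread through the buffer."""
    buf = BoundedBuffer(capacity)
    received = []

    def consume() -> None:
        for _ in range(n):
            received.append(buf.get())

    consumer = threading.Thread(target=consume)
    consumer.start()
    for i in range(n):
        buf.put(i)  # blocks whenever the buffer already holds `capacity` items
    consumer.join()
    return received

print(transfer(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```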

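Thread synchronization - Semaphore usage and source analysis

Semaphore

A semaphore is a higher-level synchronization mechanism that limits how many threads may access a shared resource at the same time. In Python, the Semaphore class is provided by the threading module and helps control access to a limited number of resources.

Using `Semaphore`

A semaphore maintains an internal counter that each acquire() call decrements and each release() call increments. When the counter is zero, acquire() blocks until another thread calls release().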

import threading
import time

# Create a semaphore that lets at most 3 threads in at the same time
semaphore = threading.Semaphore(3)

def access_resource(i):
    print(f"Thread {i} is trying to access the resource.")
    with semaphore:  # acquire() on entry, release() on exit, even if an exception occurs
        print(f"Thread {i} has accessed the resource.")
        time.sleep(2)  # Simulate using the resource
        print(f"Thread {i} is releasing the resource.")

# Create several threads
threads = [threading.Thread(target=access_resource, args=(i,)) for i in range(5)]

# Start all threads
for thread in threads:
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

In this example, at most three threads can access the resource (for example a database connection or a file) at the same time.
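The limit can be verified by tracking how many threads are inside the protected region at once. The sketch also uses `threading.BoundedSemaphore`, which behaves like `Semaphore` but raises `ValueError` if it is released more times than it was acquired, catching that class of bug early; `measure_peak` is an illustrative name:

```python
import threading
import time

def measure_peak(limit: int = 3, n_threads: int = 8) -> int:
    """Run n_threads workers through a semaphore and return the peak concurrency seen."""
    sem = threading.BoundedSemaphore(limit)
    state_lock = threading.Lock()
    active = 0
    peak = 0

    def work() -> None:
        nonlocal active, peak
        with sem:
            with state_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)  # simulated use of the limited resource
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

print(measure_peak())  # never more than 3

extra = threading.BoundedSemaphore(1)
try:
    extra.release()  # one release() too many
except ValueError:
    print("BoundedSemaphore rejected the extra release()")
```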

Source analysis

Semaphore is implemented on top of lower-level synchronization primitives. Below is an overview of a simplified Semaphore class:

import threading

class Semaphore:
    # Simplified sketch, not the actual CPython source
    def __init__(self, value=1):
        if value < 0:
            raise ValueError("semaphore initial value must be >= 0")
        self._value = value
        self._lock = threading.Lock()
        self._waiters = []  # FIFO of per-thread Condition objects

    def acquire(self, blocking=True, timeout=None):
        with self._lock:
            if self._value > 0:
                self._value -= 1
                return True
            if not blocking:
                return False
            waiter = threading.Condition(self._lock)
            self._waiters.append(waiter)
            waiter.wait(timeout)
            if waiter in self._waiters:
                # Still queued: timed out without being handed a permit
                self._waiters.remove(waiter)
                return False
            # release() removed us from the queue and handed its permit
            # directly to us, so _value must NOT be decremented here
            return True

    def release(self):
        with self._lock:
            if self._waiters:
                waiter = self._waiters.pop(0)
                waiter.notify()  # hand the permit straight to the oldest waiter
            else:
                self._value += 1

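In this simplified version:

_value is the semaphore's current value.
_lock is a lock that protects access to the semaphore's state.
_waiters is a FIFO queue holding one Condition per waiting thread.

When a thread calls acquire() and the value is greater than zero, the value is decremented and the thread continues. If the value is zero, the thread adds itself to _waiters and waits on its condition. When another thread calls release(), it checks the waiting queue: if the queue is not empty, it wakes the oldest waiting thread and hands the permit directly to it (so _value is not incremented); otherwise it increments _value. (CPython's real Semaphore is simpler still: a single Condition plus a counter that waiters re-check in a loop.)

In this way, Semaphore provides an effective mechanism to limit concurrent access to a shared resource, protecting it from overuse or conflicts.

ThreadPoolExecutor thread pool

ThreadPoolExecutor is a class in Python's concurrent.futures module that provides a high-level interface for executing callables asynchronously. By using a thread pool, ThreadPoolExecutor manages the concurrently running threads for you, which simplifies working with threads and makes it more efficient.

Using ThreadPoolExecutor

ThreadPoolExecutor executes tasks mainly through two methods: submit() and map(). submit() schedules a callable with its arguments on the pool and returns a Future object that represents the pending operation. map() applies a function to every element of the given iterable and yields the results in input order.

Here is a basic example of ThreadPoolExecutor: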

import concurrent.futures
import time

def task(n):
    print(f"Processing {n}")
    time.sleep(n)
    return f"Task {n} completed"

# Create a ThreadPoolExecutor
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Use submit to schedule a single task
    future = executor.submit(task, 2)
    print(future.result())

    # Use map to process a series of tasks; results come back in input order
    results = executor.map(task, [1, 3, 2])
    for result in results:
        print(result)

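In this example, max_workers=3 sets the maximum number of threads that can run at the same time in the pool. submit() submits a single task, and map() runs the same task for every element of a list.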
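Besides `submit()` and `map()`, `concurrent.futures.as_completed()` yields futures as they finish, and `Future.result()` re-raises any exception raised in the worker. A sketch; the function names and the negative-input rule are illustrative:

```python
import concurrent.futures
import time

def double_after_delay(n: int) -> int:
    if n < 0:
        raise ValueError(f"negative input: {n}")
    time.sleep(n / 10)  # larger inputs take longer, so completion order differs from input order
    return n * 2

def collect(inputs):
    """Return ({input: result}, {input: error message}), gathered in completion order."""
    successes, failures = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Keep a mapping from each Future back to the input that produced it
        futures = {executor.submit(double_after_delay, n): n for n in inputs}
        for future in concurrent.futures.as_completed(futures):
            n = futures[future]
            try:
                successes[n] = future.result()  # re-raises the worker's exception, if any
            except ValueError as exc:
                failures[n] = str(exc)
    return successes, failures

successes, failures = collect([3, -1, 1, 2])
print(sorted(successes.items()))  # [(1, 2), (2, 4), (3, 6)]
print(failures)                   # {-1: 'negative input: -1'}
```

With `map()`, the same exception would instead be raised when iteration reaches that input's result.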

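How ThreadPoolExecutor works

ThreadPoolExecutor maintains a pool of threads and a task queue internally. Submitted tasks are placed on the queue, and the pool's threads take tasks from it and execute them.

Thread pool: a set of worker threads, started on demand up to max_workers and reused to execute any number of tasks.
Task queue: a thread-safe queue that stores tasks waiting to be executed.

Source analysis overview

The real ThreadPoolExecutor implementation is fairly involved and has several components, but the following simplified sketch shows the key points: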

import os
import queue
import threading
from concurrent.futures import Future

class ThreadPoolExecutor:
    # Simplified sketch, not the standard-library implementation
    def __init__(self, max_workers=None):
        # (Since Python 3.8 the real default is min(32, os.cpu_count() + 4))
        self._max_workers = max_workers if max_workers is not None else (os.cpu_count() or 1)
        self._work_queue = queue.Queue()
        self._threads = set()
        self._shutdown = False
        self._init_workers()

    def _init_workers(self):
        for _ in range(self._max_workers):
            t = threading.Thread(target=self._worker)
            t.start()
            self._threads.add(t)

    def _worker(self):
        while True:
            task = self._work_queue.get()
            if task is None:  # Sentinel from shutdown(): exit the thread
                self._work_queue.task_done()
                return
            func, args, kwargs, result_future = task
            if result_future.set_running_or_notify_cancel():  # skip cancelled futures
                try:
                    result_future.set_result(func(*args, **kwargs))
                except Exception as e:
                    result_future.set_exception(e)
            self._work_queue.task_done()

    def submit(self, fn, *args, **kwargs):
        if self._shutdown:
            raise RuntimeError("Cannot submit new tasks to a shutdown executor")
        future = Future()
        self._work_queue.put((fn, args, kwargs, future))
        return future

    def shutdown(self, wait=True):
        self._shutdown = True
        # One None per worker, so that every worker blocked in get() wakes up and exits
        for _ in self._threads:
            self._work_queue.put(None)
        if wait:
            for t in self._threads:
                t.join()

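This simplified version shows the core of ThreadPoolExecutor: initialize the thread pool, take tasks from the task queue, and execute them. Each task's result is delivered through a Future object, which lets the caller retrieve the result asynchronously.

ThreadPoolExecutor source analysis

To understand ThreadPoolExecutor's source in more depth, we need to look at how the concurrent.futures module in the Python standard library implements it. Here is a simplified analysis to help understand ThreadPoolExecutor's key components and how it works.

Main components of ThreadPoolExecutor

Thread pool management:
ThreadPoolExecutor manages a pool of threads that execute the tasks submitted to it.
Task queue:
Every submitted task first goes into a thread-safe queue (in recent CPython versions, a queue.SimpleQueue), where it waits for a pool thread to execute it.
Future objects:
Each submitted task is associated with a Future object that represents the asynchronous operation and can be used to obtain the task's result.

Source analysis

Below are simplified versions of ThreadPoolExecutor's core methods, showing how the pool is initialized, how tasks are processed, and how the pool is shut down.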

from concurrent.futures import Executor, Future
import threading
import queue
import os

class ThreadPoolExecutor(Executor):
    def __init__(self, max_workers=None):
        if max_workers is None:
            # Default used by Python 3.5-3.7; since 3.8 it is min(32, os.cpu_count() + 4)
            max_workers = (os.cpu_count() or 1) * 5
        self._max_workers = max_workers
        self._work_queue = queue.Queue()
        self._threads = set()
        self._shutdown = False
        self._shutdown_lock = threading.Lock()

        # Start the worker threads up front (the real class starts them lazily in submit())
        for _ in range(max_workers):
            thread = threading.Thread(target=self._worker)
            thread.start()
            self._threads.add(thread)

    def _worker(self):
        while True:
            task = self._work_queue.get(block=True)
            try:
                if task is None:
                    # A None task is the signal for this thread to exit
                    return
                func, args, kwargs, future = task
                if not future.set_running_or_notify_cancel():
                    continue  # The future was cancelled before it started
                try:
                    future.set_result(func(*args, **kwargs))
                except Exception as e:
                    future.set_exception(e)
            finally:
                self._work_queue.task_done()

    def submit(self, fn, *args, **kwargs):
        with self._shutdown_lock:  # No submit can slip in while shutdown() runs
            if self._shutdown:
                raise RuntimeError("Cannot schedule new futures after shutdown")
            future = Future()
            self._work_queue.put((fn, args, kwargs, future))
            return future

    def shutdown(self, wait=True):
        with self._shutdown_lock:
            self._shutdown = True
            # Send one None per thread as the signal to exit
            for _ in self._threads:
                self._work_queue.put(None)

        # Join outside the lock, so a running task that calls submit() cannot deadlock
        if wait:
            for thread in self._threads:
                thread.join()

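Key points

Initialization: the constructor creates max_workers threads and starts them. Each thread runs the _worker method.
Task execution: _worker takes a task from the task queue, executes it, and stores the result or exception on the corresponding Future object.
Task submission: submit creates a Future object, packages the task (function and arguments) with it, and puts it on the queue.
Shutdown: shutdown first marks the pool as shut down, then puts one None per thread on the queue as the exit signal. If wait is true, it waits for all threads to finish.

This simplified source gives a basic understanding of how ThreadPoolExecutor works and shows how threads and a queue manage concurrently executing tasks. The real ThreadPoolExecutor in the standard library is more complex, with more error handling and optimizations.

multiprocessing: multiprocess programming

multiprocessing is a Python module for creating multiple processes, which lets programmers make full use of multi-core processors. It offers an API similar to the threading module, but because each process runs its own interpreter with its own GIL, multiprocessing sidesteps the GIL and achieves true parallel computation.

Main features of multiprocessing

Processes (Process):
The multiprocessing.Process class, similar to threading.Thread, creates and manages separate processes.
Inter-process communication (IPC):
The module provides several ways for processes to communicate, most commonly pipes (Pipe) and queues (Queue). A Queue is both thread- and process-safe; each end of a Pipe should be used by only one process or thread at a time.
Shared data:
multiprocessing provides shared memory so that processes can share data, for example through Value or Array.
Process pools (Pool):
The Pool class manages a group of worker processes and offers a convenient way to run many tasks in parallel.

Example code

The following basic examples show how to create processes, communicate between them with a queue, and use a process pool.

Creating processes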

import multiprocessing

def worker(number):
    print(f'Worker {number} is working.')

if __name__ == '__main__':
    # Create the processes
    processes = []
    for i in range(5):
        process = multiprocessing.Process(target=worker, args=(i,))
        processes.append(process)
        process.start()

    # Wait for all processes to finish
    for process in processes:
        process.join()

Communicating between processes with a queue

from multiprocessing import Process, Queue

def sender(queue):
    queue.put("Hello from sender!")

def receiver(queue):
    msg = queue.get()
    print(f"Received message: {msg}")

if __name__ == '__main__':
    queue = Queue()
    p1 = Process(target=sender, args=(queue,))
    p2 = Process(target=receiver, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

Using a process pool

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(square, range(10))
    print(results)

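Notes

When multiprocessing uses the spawn start method (the default on Windows and macOS), the code that starts processes must be guarded by if __name__ == "__main__":, because each child re-imports the main module and would otherwise re-run the process-creating code.
Processes do not share memory by default, so changes to data in one process do not affect other processes. If you need shared state, use multiprocessing's shared-memory tools or a server process (Manager).
Synchronization (such as locks) is also necessary with multiprocessing, especially when shared resources are involved.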
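The synchronization note applies to shared memory as well: `counter.value += 1` on a `multiprocessing.Value` is a read-modify-write, and a `Value` created with the default `lock=True` carries a lock reachable through `get_lock()`. A minimal sketch (function names are illustrative):

```python
import multiprocessing as mp

def add_many(counter, times: int) -> None:
    for _ in range(times):
        # Without the lock, two processes can read the same old value and lose an update
        with counter.get_lock():
            counter.value += 1

def count_with_processes(n_procs: int = 4, times: int = 1000) -> int:
    counter = mp.Value("i", 0)  # 'i' = C int in shared memory, created with its own lock
    procs = [mp.Process(target=add_many, args=(counter, times)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_processes())  # 4000
```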

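With the multiprocessing module, Python programs can use multi-core processors effectively to improve performance, especially for compute-intensive work or highly concurrent scenarios.

Inter-process communication - Queue, Pipe, Manager

In Python's multiprocessing module, inter-process communication (IPC) is an important feature: each process has its own independent memory space, so variables cannot simply be shared. multiprocessing provides several IPC mechanisms; the most commonly used are Queue, Pipe, and Manager.

1. Queue

A queue is one of the most common IPC mechanisms; it lets multiple processes put data in and take data out. In multiprocessing, Queue is both thread- and process-safe.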

Example code

from multiprocessing import Process, Queue

def worker(queue, data):
    queue.put(data)
    print(f"Data {data} put into queue")

def consumer(queue):
    data = queue.get()
    print(f"Data {data} received from queue")

if __name__ == '__main__':
    queue = Queue()
    p1 = Process(target=worker, args=(queue, "Hello"))
    p2 = Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

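2. Pipe

A pipe provides two-way data communication: Pipe() returns a pair of Connection objects that are duplex by default; with Pipe(duplex=False), the first connection can only receive and the second can only send. Pipes are typically used between two processes.

Example code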

from multiprocessing import Process, Pipe

def sender(conn):
    conn.send("Hello from sender")
    conn.close()

def receiver(conn):
    print(f"Received message: {conn.recv()}")
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p1 = Process(target=sender, args=(parent_conn,))
    p2 = Process(target=receiver, args=(child_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
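Because `Pipe()` just returns two `Connection` objects, its semantics can be tried in a single process before handing one end to a child. A sketch using only `send()`, `recv()`, `poll()` and `close()`:

```python
from multiprocessing import Pipe

# Duplex (default): both ends can send and receive
left, right = Pipe()
left.send({"op": "ping"})   # any picklable object
print(right.recv())         # {'op': 'ping'}
right.send("pong")
print(left.recv())          # pong

# One-way: with duplex=False the first end can only receive, the second only send
reader, writer = Pipe(duplex=False)
writer.send([1, 2, 3])
print(reader.recv())        # [1, 2, 3]
print(reader.poll())        # False: nothing left to read

for conn in (left, right, reader, writer):
    conn.close()
```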

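3. Manager

Manager offers a higher-level way to share data between processes. It supports many types, including list, dict, Namespace, Lock, RLock, and Semaphore. It starts a server process that holds these objects, and other processes communicate with it through proxies.

Example code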

from multiprocessing import Process, Manager

def worker(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    with Manager() as manager:
        # Create a shared dict and list
        d = manager.dict()
        l = manager.list(range(10))
        
        p = Process(target=worker, args=(d, l))
        p.start()
        p.join()
        
        print(d)
        print(l)
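One common pitfall with these proxies, documented for multiprocessing managers: modifying a mutable value stored inside a `manager.dict()` does not propagate, because reading `d[key]` returns a copy. The value has to be reassigned. A sketch (`demo_nested_update` is an illustrative name):

```python
from multiprocessing import Manager

def demo_nested_update() -> tuple:
    with Manager() as manager:
        d = manager.dict()
        d["items"] = []
        # d["items"] returns a copy of the inner list, so this append is lost
        d["items"].append(1)
        lost = list(d["items"])
        # Modify a local copy and reassign the key to send it back to the manager
        inner = d["items"]
        inner.append(1)
        d["items"] = inner
        kept = list(d["items"])
    return lost, kept

if __name__ == "__main__":
    print(demo_nested_update())  # ([], [1])
```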

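Notes

When using Queue and Pipe, make sure the data objects are picklable, because they are serialized and deserialized as they travel between processes.
Manager objects can be slower than using Queue or Pipe directly, because every operation on a proxy is a round trip to the manager's server process.
When using these mechanisms, be careful to avoid deadlocks, especially with complex data exchange and multi-process collaboration.

These IPC mechanisms give strong support for multiprocess programming in Python and make passing and managing data between processes feasible and efficient.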
