Python多进程探究

最新推荐文章于 2024-02-21 17:31:39 发布

Z_J_Q_

最新推荐文章于 2024-02-21 17:31:39 发布

阅读量540

点赞数

本文探讨了Python中使用multiprocessing.Pool进行多进程任务时，如何有效处理日志记录的问题。详细分析了QueueHandler与QueueListener在多线程与多进程环境下的应用，以及在子进程中遇到的死锁问题，并提出了解决方案。

摘要由CSDN通过智能技术生成

Python的multiprocessing.Pool可以方便地创建进程池，提高程序并行性，可以用不同的进程来运行不同的服务程序。

处理日志是Web服务器的一个重要功能，Python的原生logging模块提供了多种日志处理方式（logging.handlers），其中
QueueHandler与QueueListener结合使用，可以让日志处理和写日志在不同的线程上运行，这对于Web应用非常重要，可以使服务客户端的线程尽可能快地响应，避免被日志处理这种比较慢的任务拖累。

但多个进程写日志容易出问题。

import logging
from threading import Thread
from queue import Queue
from logging.handlers import QueueListener, QueueHandler
from multiprocessing import Pool

def setup_logging():
    # 日志写在队列上，一个线程从队列中读取并写到文件中
    _log_queue = Queue()
    QueueListener(
        _log_queue, logging.FileHandler("out.log")).start()
    logging.getLogger().addHandler(QueueHandler(_log_queue))

    # 父进程有一个线程在不断写日志
    def write_logs():
        while True:
            logging.error("hello, I just did something")
    Thread(target=write_logs).start()

def runs_in_subprocess():
    print("About to log...")
    logging.error("hello, I did something")
    print("...logged")

if __name__ == '__main__':
    setup_logging()

    # 同时启动一个进程池写日志
    while True:
        with Pool() as pool:
            pool.apply(runs_in_subprocess)

这段程序的输出结果是

About to log...
...logged
About to log...
...logged
About to log...
（阻塞）

这里需要明白系统启动子进程的过程

用fork()系统调用复制进程
子进程通过execve()（execl()）系统调用启动新程序

但我们可以只调用fork()，比如：

from os import fork, getpid

print("I am parent process", getpid())
if fork():
    print("I am the parent process, with PID", getpid())
else:
    print("I am the child process, with PID", getpid())

输出为：

I am parent process 666
I am the parent process, with PID 666
I am the child process, with PID 681

我们可以看到父进程和子进程运行了同样的Python代码。

而Python创建进程池默认只是调用了fork()，于是子进程可以访问到父进程内存的拷贝，包括import的模块的对象。

import logging
from multiprocessing import Pool
from os import getpid

def runs_in_subprocess():
    logging.info(
        "I am the child, with PID {}".format(getpid()))

if __name__ == '__main__':
    logging.basicConfig(
        format='Niffler %(message)s', level=logging.DEBUG)

    logging.info(
        "I am the parent, with PID {}".format(getpid()))

    with Pool() as pool:
        pool.apply(runs_in_subprocess)

输出为：

Niffler I am the parent, with PID 1730
Niffler I am the child, with PID 1931

我们可以看到进程池中的子进程继承了父进程logging模块的设置。

但是fork()并不会复制父进程的线程。

from threading import Thread, enumerate
from os import fork
from time import sleep

# Start a thread:
Thread(target=lambda: sleep(60)).start()

if fork():
    print("The parent process has {} threads".format(
        len(enumerate())))
else:
    print("The child process has {} threads".format(
        len(enumerate())))

输出为：

The parent process has 2 threads
The child process has 1 threads

让我们回到第一个程序：

import logging
from threading import Thread
from queue import Queue
from logging.handlers import QueueListener, QueueHandler
from multiprocessing import Pool

def setup_logging():
    # 日志写在队列上，一个线程从队列中读取并写到文件中
    _log_queue = Queue()
    QueueListener(
        _log_queue, logging.FileHandler("out.log")).start()
    logging.getLogger().addHandler(QueueHandler(_log_queue))

    # 父进程有一个线程在不断写日志
    def write_logs():
        while True:
            logging.error("hello, I just did something")
    Thread(target=write_logs).start()

def runs_in_subprocess():
    print("About to log...")
    logging.error("hello, I did something")
    print("...logged")

if __name__ == '__main__':
    setup_logging()

    # 同时启动一个进程池写日志
    while True:
        with Pool() as pool:
            pool.apply(runs_in_subprocess)

父进程的write_logs线程每次写日志都会添加消息到队列_log_queue，需要获得锁，而pool.apply创建子进程调用fork()会把锁变量复制过来，可能会复制到状态为acquired的锁，而runs_in_subprocess写日志也要添加消息到队列_log_queue，需要等待锁的释放，但是父进程的write_logs线程并没有复制过来，子进程的锁永远不会被释放，造成永久阻塞。

from os import fork
from threading import Lock

lock = Lock()
# 父进程获得锁
lock.acquire()

if fork() == 0:
    print("Acquiring lock...")
    lock.acquire()
    # 子进程不会获得锁
    print("Lock acquired! (This code will never run)")

输出为：

Acquiring lock...

有折衷方法可以解决这个问题。logging模块支持创建子进程时重置logging模块的设置；Python支持调用fork()时将锁设回released状态，但不支持C库创建的锁。

还有一个更好的方法，Python3 multiprocessing模块添加了一些创建子进程的新方法，其中一个在fork()之后加上execve()，这样子进程就会创建新程序，父进程的模块的状态就不会被继承，只需要在创建进程池之前简单设置一下。

import multiprocessing
multiprocessing.set_start_method('spawn')

吐槽一下 Python 混乱的 threading 和 multiprocessing

Z_J_Q_

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫