Python Multiprocessing Is Much Better

Tim（杨霆）

Category: Timen_Python. Tags: python, multiprocessing data comparison test

Article link: https://blog.csdn.net/Temanm/article/details/53444365

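Abstract:

Unix/Linux operating systems provide a rather special system call, fork(). An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system copies the current process (the parent) to create a new one (the child), and the call then returns in both of them. In the child it always returns 0; in the parent it returns the child's process ID. The reason is that a parent may fork many children and so has to record each child's ID, whereas a child only needs to call getppid() to obtain its parent's ID.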
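
The two return values described above can be observed directly with os.fork(). The following is only a minimal sketch and runs on Unix-like systems only (os.fork() is not available on Windows). The helper name fork_once and the use of the exit code to report the getppid() check are choices made for this illustration, not part of the original post.

```python
import os


def fork_once():
    """Fork one child; return (child_pid, child_exit_code) in the parent."""
    parent_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0 here.
        # Report through the exit code whether getppid() names the parent.
        os._exit(0 if os.getppid() == parent_pid else 1)
    # Parent branch: fork() returned the child's process ID.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_code = fork_once()
    print("parent %d forked child %d (exit code %d)"
          % (os.getpid(), child_pid, exit_code))
```

The child leaves through os._exit() so that it never continues into the code that belongs to the parent.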

Review of the Previous Post
Python Multiprocessing
Multiprocessing Lock
Multiprocessing Semaphore
Multiprocessing Event
Multiprocessing Queue and Pipe
Multiprocessing Pool
Python Multiprocessing Data Comparison Test

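Body:
1. Review of the Previous Post
1.1 Preface
In the previous post, "Python Multithreading Is of Little Use" (Python 多线程是多鸡肋), my impression was that multithreading does not achieve concurrency in any real sense. This post therefore reruns the same data comparison test with multiple processes and analyzes the results.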

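2. Python Multiprocessing
2.1 Explanation
Python implements multiprocessing through the multiprocessing module. If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. Windows has no fork call, so does that mean multi-process programs cannot be written in Python on Windows? Since Python is cross-platform, it naturally provides cross-platform multi-process support as well: the multiprocessing module is the cross-platform version of the multi-process module.

The multiprocessing module provides a Process class that represents a process object. The following example starts a child process and waits for it to finish: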

# -*- coding:utf-8 -*-
from multiprocessing import Process
import os

# Code executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Process will start.')
    p.start()
    # Block until the child process has finished
    p.join()
    print('Process end.')

Output:

Parent process 928.
Process will start.
Run child process test (929)...
Process end.
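
Section 2.1 notes that Windows has no fork call. On Python 3.4 or later, the way multiprocessing starts a child can be inspected at run time. The sketch below assumes such a Python version; the helper name start_method_report is made up for this illustration.

```python
import multiprocessing


def start_method_report():
    """Return the default start method and all methods this platform offers."""
    return {
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    report = start_method_report()
    print("default start method:", report["default"])
    print("available start methods:", report["available"])
```

"spawn" is offered on every platform, while "fork" appears only on Unix-like systems, which is why the same script can behave differently on Windows and on Linux.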

3. Multiprocessing Lock
When several processes need to access a shared resource, a Lock can be used to avoid conflicting access. The main calls are lock.acquire() and lock.release().

# -*- coding:utf-8 -*-
import multiprocessing

def worker_with(lock, f):
    # The with statement acquires the lock and releases it on exit
    with lock:
        with open(f, "a+") as fs:
            fs.write('Lock acquired via with\n')

def worker_no_with(lock, f):
    # Explicit acquire/release; finally guarantees the release
    lock.acquire()
    try:
        with open(f, "a+") as fs:
            fs.write('Lock acquired directly\n')
    finally:
        lock.release()

if __name__ == "__main__":

    f = "file.txt"

    lock = multiprocessing.Lock()
    w = multiprocessing.Process(target=worker_with, args=(lock, f))
    nw = multiprocessing.Process(target=worker_no_with, args=(lock, f))

    w.start()
    nw.start()

    w.join()
    nw.join()
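
The file example above makes it hard to see what the lock protects. As a complementary sketch, the code below guards a shared integer (multiprocessing.Value) so that increments from several processes are not lost. The function name add_many and the numbers used (4 processes, 1000 increments each) are assumptions chosen for the illustration.

```python
import multiprocessing


def add_many(counter, lock, n):
    """Add 1 to the shared counter n times, holding the lock for each update."""
    for _ in range(n):
        with lock:
            # Read-modify-write on shared memory, done by one process at a time
            counter.value += 1


if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # shared C int, initial value 0
    lock = multiprocessing.Lock()
    workers = [
        multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("final count:", counter.value)
```

With the lock held around every update, the final count should come to 4000. Without it, some increments could overwrite each other and the total could end up lower.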

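4. Multiprocessing Semaphore
A Semaphore limits how many processes may access a shared resource at the same time, for example the maximum number of connections in a pool.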

# -*- coding:utf-8 -*-
import multiprocessing  
import time   

def worker(s,i):  
    s.acquire()  
    print(multiprocessing.current_process().name + " acquire")  
    time.sleep(i)  
    print(multiprocessing.current_process().name + " release")  
    s.release()  

if __name__ == "__main__":  

    s = multiprocessing.Semaphore(2)  
    for i in range(5):  
        p = multiprocessing.Process(target=worker, args=(s,i*2))  
        p.start()
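
To make the counting behaviour of Semaphore(2) visible without starting any child process, the sketch below tries to acquire the semaphore without blocking. The helper name try_acquire is hypothetical and exists only for this illustration.

```python
import multiprocessing


def try_acquire(sem, attempts):
    """Try to take the semaphore `attempts` times without blocking."""
    return [sem.acquire(False) for _ in range(attempts)]


if __name__ == "__main__":
    sem = multiprocessing.Semaphore(2)
    # Two slots are free, so the third attempt fails
    print(try_acquire(sem, 3))  # [True, True, False]
    # Giving one slot back lets the next attempt succeed
    sem.release()
    print(try_acquire(sem, 1))  # [True]
```

In the example above, the same limit means that at most two workers can be between their "acquire" and "release" messages at any moment.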

5. Multiprocessing Event
An Event is used to synchronize and signal between processes.

# -*- coding:utf-8 -*-
import multiprocessing  
import time  

def wait_for_event(e):  
    """Wait for the event to be set before doing anything"""  
    print ('wait_for_event: starting')  
    e.wait()  
    print ('wait_for_event: e.is_set()->' + str(e.is_set()))  

def wait_for_event_timeout(e, t):  
    """Wait t seconds and then timeout"""  
    print ('wait_for_event_timeout: starting')  
    e.wait(t)  
    print ('wait_for_event_timeout: e.is_set()->' + str(e.is_set()))  


if __name__ == '__main__':  
    e = multiprocessing.Event()  
    w1 = multiprocessing.Process(name='block',   
                                 target=wait_for_event,  
                                 args=(e,))  
    w1.start()  

    w2 = multiprocessing.Process(name='non-block',   
                                 target=wait_for_event_timeout,   
                                 args=(e, 2))  
    w2.start()  

    time.sleep(3)  
    e.set()  
    print ('main: event is set')
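
In the example above, the 'non-block' process waits with a timeout of 2 seconds while the event is only set after 3 seconds, so it reports is_set() as False. A small sketch of the same rule, using the return value of wait() and no child processes; the helper name wait_briefly is an assumption made for this illustration.

```python
import multiprocessing


def wait_briefly(event, timeout):
    """Wait up to `timeout` seconds; return True only if the event is set."""
    return event.wait(timeout)


if __name__ == "__main__":
    e = multiprocessing.Event()
    print(wait_briefly(e, 0.1))  # False: the wait timed out, nobody set the event
    e.set()
    print(wait_briefly(e, 0.1))  # True: the event is already set
```

Checking the return value of wait() is a compact way to tell a timeout apart from a real signal.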

6. Multiprocessing Queue and Pipe
Python's multiprocessing module wraps the underlying mechanisms and provides several ways to exchange data, including Queue and Pipe.

# -*- coding:utf-8 -*-
from multiprocessing import Process, Queue
import os, time, random

# Code executed by the writer process:
def write(q):
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates the Queue and passes it to each child:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw, which writes:
    pw.start()
    # Start the child process pr, which reads:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop, so it cannot be joined and has to be terminated:
    pr.terminate()

Output:

Put A to queue...
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
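
The example above covers Queue only. As a rough sketch of the Pipe side, the code below sends strings to a child over a two-way Pipe and stops the child with a None sentinel instead of terminate(). The function name echo_upper and the sentinel convention are assumptions made for this illustration.

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    """Send back each received string in upper case until None arrives."""
    while True:
        message = conn.recv()
        if message is None:
            # Sentinel value: the sender has nothing more to send
            break
        conn.send(message.upper())
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected, two-way endpoints
    worker = Process(target=echo_upper, args=(child_conn,))
    worker.start()
    for word in ["a", "b", "c"]:
        parent_conn.send(word)
        print(parent_conn.recv())
    parent_conn.send(None)  # ask the worker to stop
    worker.join()           # the worker exits by itself, so join() returns
```

Because the worker leaves its loop when it sees the sentinel, the parent can join it normally instead of terminating it.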

7. Multiprocessing Pool
To start a large number of child processes, use a process pool to create them in batches:

# -*- coding:utf-8 -*-
from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    # Pool() without arguments creates one worker per CPU core
    p = Pool()
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    # close() must be called before join(); no new tasks can be added afterwards
    p.close()
    p.join()
    print('All subprocesses done.')

Output:

Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.
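
The tasks above only print and return nothing. When results are needed, the pool can hand them back to the parent. The following is a minimal sketch assuming Python 3.3 or later (where Pool works as a context manager); the function square and the pool size of 4 are made up for the illustration.

```python
from multiprocessing import Pool


def square(n):
    """CPU-side work done in a worker process."""
    return n * n


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map() splits the inputs across the workers and keeps input order
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
        # apply_async() returns a handle; get() waits for the result
        # and re-raises any exception raised inside the worker
        handle = pool.apply_async(square, (6,))
        print(handle.get(timeout=10))  # 36
```

Calling get() on every apply_async() handle is also how errors inside a worker become visible; if the handle is ignored, a failing task goes unnoticed.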

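8. Python Multiprocessing Data Comparison Test
The multithreaded data comparison method from the example in the previous post is changed to compare the data with multiple processes: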

# -*- coding:utf-8 -*-
import multiprocessing
import TestCase
import CommonVariable


def test_data(excel_index):
    # One worker process per comparison in this batch
    pool = multiprocessing.Pool(processes=CommonVariable.multiprocess_number)
    result = []
    for i in range(CommonVariable.multiprocess_number):
        row = CommonVariable.result_excel[excel_index + i]
        result.append(pool.apply_async(TestCase.compare_data, (row[0], row[1])))
    pool.close()
    pool.join()
    # Concatenate the messages of every comparison that did not pass
    test_result = ""
    for i in result:
        outcome = i.get()
        if outcome != "all_pass":
            test_result += outcome
    return test_result

if __name__ == '__main__':
    print(test_data(0))
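
TestCase and CommonVariable are the author's own modules and are not shown in the post, so the code above cannot be run as it stands. The sketch below reproduces the same pattern in a self-contained form. compare_data here is a hypothetical stand-in that only checks two values for equality, and merge_results and run_comparisons are names invented for this illustration; the real comparison logic is not known.

```python
import multiprocessing


def compare_data(expected, actual):
    """Hypothetical stand-in for TestCase.compare_data."""
    if expected == actual:
        return "all_pass"
    return "mismatch: expected %r, got %r\n" % (expected, actual)


def merge_results(outcomes):
    """Keep only the failure messages, in order, as one string."""
    return "".join(o for o in outcomes if o != "all_pass")


def run_comparisons(pairs, processes=2):
    """Compare every (expected, actual) pair in a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # starmap() unpacks each pair into the two arguments of compare_data
        outcomes = pool.starmap(compare_data, pairs)
    return merge_results(outcomes)


if __name__ == "__main__":
    pairs = [(1, 1), ("a", "b"), (3.0, 3.0)]
    print(run_comparisons(pairs))
```

An empty result string means that every pair passed, which matches the convention of test_data() above.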

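Conclusion:
Run on Windows, the multi-process version took 6302.33 seconds. That is some improvement over the 8023.14 seconds of the single-threaded run, but still far short of the goal. On Unix/Linux, the multiprocessing module wraps the fork() call, so we do not need to care about the details of fork(). Windows has no fork call, so multiprocessing has to "simulate" the effect of fork: the Python objects that the parent process hands to the child must all be serialized with pickle and then passed to the child process. So if a multiprocessing call fails on Windows, first consider whether pickling failed. The test was then run on a Mac, where it took 3102.12 seconds, a much shorter time.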
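
The advice to suspect pickle first can be turned into a quick check that runs before any process is started. A minimal sketch, in which the helper name is_picklable is an assumption made for this illustration:

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(is_picklable(len))          # True: pickled by reference to its name
    print(is_picklable([1, "a"]))     # True: plain data
    print(is_picklable(lambda x: x))  # False: a lambda has no importable name
```

If a target function or one of its arguments fails this check, passing it to a child process that has to receive it through pickle is likely to fail as well.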

References:
"Python Multithreading Is of Little Use" (Python 多线程是多鸡肋)
"Python Multiprocessing Programming" (Python 多进程编程)

You are welcome to join the QQ group 阳台测试 (group number: 239547991).