Python thread pools and process pools: implementation, usage examples, and caveats

Walter_Silva

Category: Machine Learning Notes

Article link: https://blog.csdn.net/Gin077/article/details/86316461

1. Thread pools. For the internal implementation of the thread pool, first see https://www.jb51.net/article/139005.htm

A commented code example follows.

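# -*- coding: utf-8 -*-
# threadpool.ThreadPool is the thread pool class (threadpool is a third-party package)

import threadpool


def print_file_head(filename):
    print("begin read")
    print(filename)
    print("end read")


if __name__ == '__main__':
    pool_size = 3
    filename_list = ['a', 'b', 'c']

    # Standard usage
    # Create the thread pool
    pool = threadpool.ThreadPool(pool_size)
    # Build work requests that share one callable but take different arguments
    requests = threadpool.makeRequests(print_file_head, filename_list)
    # Put the work requests on the queue
    for req in requests:
        pool.putRequest(req)

    # A work request can also be built directly for a single call
    print("putting request to pool")
    pool.putRequest(threadpool.WorkRequest(print_file_head, args=['d']))

    # Process the queued requests: poll() handles results that are already
    # available without blocking, wait() blocks until every request is done
    pool.poll()
    pool.wait()
    print("destroy all threads before exit")
    # Dismiss the workers once the queued requests have finished
    pool.dismissWorkers(pool_size, do_join=True)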
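
For comparison, roughly the same workflow can be sketched with the standard library's concurrent.futures.ThreadPoolExecutor. This is a minimal sketch and not part of the original example: it assumes Python 3, run_all is a hypothetical helper name, and print_file_head returns a string here instead of printing so that the results can be collected.

```python
from concurrent.futures import ThreadPoolExecutor


def print_file_head(filename):
    # Stand-in for real work; returns the text instead of printing it
    return "read {}".format(filename)


def run_all(filenames, pool_size=3):
    # Leaving the with-block waits for submitted tasks and shuts the pool down
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() yields results in the same order as the inputs
        return list(pool.map(print_file_head, filenames))


if __name__ == "__main__":
    print(run_all(["a", "b", "c", "d"]))
```

The with-block here plays the role that wait() and dismissWorkers() play in the threadpool version.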

2. Process pools: multiprocessing

For a detailed introduction, see the official documentation: https://docs.python.org/2/library/multiprocessing.html
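
As a quick orientation before the caveats, a process pool can be used roughly as follows. This is a minimal sketch assuming Python 3 (the with-statement form of Pool is not available in Python 2); square is a hypothetical worker function.

```python
from multiprocessing import Pool


def square(x):
    # Defined at module level so worker processes can find it by name
    return x * x


if __name__ == "__main__":
    # map() blocks until every result is back; leaving the with-block
    # then terminates the worker processes
    with Pool(processes=3) as pool:
        print(pool.map(square, [1, 2, 3]))
```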

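Below are four caveats for using multiprocessing, taken from the book 编写高质量代码：改善Python程序的91个建议 (Writing High-Quality Code: 91 Suggestions for Improving Python Programs).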

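1) For communication between processes, prefer Pipe and Queue over synchronization primitives such as Lock, Event, Condition, and Semaphore. The multiprocessing Queue class is built on a pipe plus a few locks and semaphores and suits cases with more than two processes; for communication between exactly two processes, Pipe is faster. The test code is below.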

from multiprocessing import Process, Pipe, Queue
import time


def reader_pipe(pipe):
    output_p, input_p = pipe
    # Close the child's copy of the sending end so recv() can see EOF
    input_p.close()
    while True:
        try:
            msg = output_p.recv()
        except EOFError:
            break


def writer_pipe(count, input_p):
    for i in range(0, count):
        input_p.send(i)


def reader_queue(queue):
    while True:
        msg = queue.get()
        if msg == 'DONE':
            break


def writer_queue(count, queue):
    for ii in range(0, count):
        queue.put(ii)
    queue.put("DONE")


if __name__ == '__main__':
    print("testing for pipe")
    for count in [10 ** 3, 10 ** 4, 10 ** 5]:
        output_p, input_p = Pipe()
        reader_p = Process(target=reader_pipe, args=((output_p, input_p),))
        reader_p.start()
        output_p.close()

        _start = time.time()
        writer_pipe(count, input_p)
        input_p.close()
        reader_p.join()
        print("Sending {} numbers to Pipe() took {} seconds".format(str(count), str(time.time() - _start)))

    print("testing for queue")
    for count in [10 ** 3, 10 ** 4, 10 ** 5]:
        queue = Queue()
        reader_p = Process(target=reader_queue, args=(queue,))
        #reader_p.daemon = True
        reader_p.start()

        _start = time.time()
        writer_queue(count, queue)
        reader_p.join()
        print("Sending {} numbers to Queue took {} seconds".format(str(count), str(time.time() - _start)))
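
In the test above the two ends of Pipe() are named output_p and input_p, although a default Pipe() is duplex and either end can send or receive. If only one direction is needed, Pipe(duplex=False) makes the roles explicit. The following is a minimal sketch under that assumption; drain and reader are hypothetical helper names.

```python
from multiprocessing import Pipe, Process


def drain(conn):
    # Collect messages until every copy of the sending end has been closed
    received = []
    while True:
        try:
            received.append(conn.recv())
        except EOFError:
            return received


def reader(recv_conn, send_conn):
    # Close the child's copy of the sending end, otherwise EOF never arrives
    send_conn.close()
    print(len(drain(recv_conn)))


if __name__ == "__main__":
    # With duplex=False the first connection only receives, the second only sends
    recv_conn, send_conn = Pipe(duplex=False)
    child = Process(target=reader, args=(recv_conn, send_conn))
    child.start()
    recv_conn.close()  # the parent only sends
    for i in range(1000):
        send_conn.send(i)
    send_conn.close()  # signals end of stream to the reader
    child.join()
```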

2) Avoid sharing resources where possible

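If sharing is unavoidable, Python provides Value, Array, and sharedctypes for shared memory, and the server process Manager for sharing data and state. The trade-offs are clear: shared memory is faster, while a Manager is more flexible to use and can be shared by processes on both local and remote machines.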

import time
from multiprocessing import Process, Value,Manager


def func(val):
    time.sleep(1)
    # += on a shared Value is a read-modify-write, so hold its lock
    with val.get_lock():
        val.value += 10

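# This function does not change x and y in ns. A manager object only propagates
# changes made to the managed object itself: with a manager.list(), for example,
# any change to the managed list itself reaches the other processes, but if the
# container holds mutable objects, changes made inside those objects are not
# propagated.
def f(ns):
    ns.x.append(1)
    ns.y.append('a')


# For that reason, pass the mutable objects in as arguments as well,
# modify them, and assign them back to the namespace
def df_update(ns, x, y):
    x.append(1)
    y.append('a')
    ns.x = x
    ns.y = y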


if __name__ == '__main__':
    # v = Value('i', 0)
    # processList = [Process(target=func, args=(v,)) for i in range(10)]
    # for p in processList:
    #     p.start()
    # for p in processList:
    #     p.join()
    #
    # print(v.value)


    manager = Manager()
    ns = manager.Namespace()
    ns.x = []
    ns.y = []

    print("before process operation:{}".format(ns))
    #p = Process(target=f, args=(ns,))
    p = Process(target=df_update, args=(ns,ns.x,ns.y))
    p.start()
    p.join()
    print("after process operation:{}".format(ns))
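
The shared-memory route mentioned above (Value and Array) can be sketched as follows. This is a minimal illustration rather than the book's code; add_many and the variable names are hypothetical. Each update holds the lock that Value and Array create by default, because += on a shared value is not atomic.

```python
from multiprocessing import Array, Process, Value


def add_many(counter, squares, n):
    for _ in range(n):
        # Read-modify-write on shared memory: hold the lock for each update
        with counter.get_lock():
            counter.value += 1
    with squares.get_lock():
        for i in range(len(squares)):
            squares[i] = i * i


if __name__ == "__main__":
    counter = Value("i", 0)   # one shared C int
    squares = Array("i", 4)   # four shared C ints, zero-initialized
    workers = [Process(target=add_many, args=(counter, squares, 100))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value, squares[:])
```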

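3) Mind platform differences

Linux creates child processes with fork(), so the child inherits the parent's resources; on Windows, child processes are relatively independent of the parent.
To stay compatible, it is best to pass the resources a child needs as arguments to the child process constructor.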

from multiprocessing import Process


def child(f):
    print(f)


if __name__ == '__main__':
    # Pass what the child needs explicitly through args
    p = Process(target=child, args=('test',))
    p.start()
    p.join()

Finally, it is best to guard the script's entry point with if __name__ == '__main__':, which avoids a possible RuntimeError or deadlock.
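
The difference can be observed with a small experiment. This is a minimal sketch assuming Python 3.4 or later, where multiprocessing.get_context is available; CONFIG, report, and run_child are hypothetical names. The child reports both the module-level state it sees and the value it was handed explicitly.

```python
import multiprocessing as mp

CONFIG = {"mode": "default"}  # hypothetical module-level state


def report(queue, explicit_mode):
    # CONFIG is whatever this process sees; explicit_mode came in as an argument
    queue.put((CONFIG["mode"], explicit_mode))


def run_child(start_method):
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    child = ctx.Process(target=report, args=(queue, CONFIG["mode"]))
    child.start()
    result = queue.get(timeout=30)  # fail instead of hanging if the child dies
    child.join()
    return result


if __name__ == "__main__":
    CONFIG["mode"] = "changed in parent"
    method = mp.get_start_method()
    print(method, run_child(method))
```

With the fork start method the child should report the changed value twice, since it inherits the parent's memory. With spawn (the only option on Windows) the child re-imports the module, so the global should read "default" while the explicitly passed argument still carries the changed value. Passing "spawn" to run_child on a platform that supports it is one way to compare.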

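4) Avoid stopping processes with terminate() where possible, and make sure that whatever is passed to pool.map can be pickled.
Not every function or method can be pickled: functions defined inside another function, and lambdas, cannot.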

For example, the following raises AttributeError: Can't pickle local object 'calculate.run.<locals>.f'

from multiprocessing import Pool

class calculate(object):
	def run(self):
		def f(x):
			return x*x
		p = Pool()
		return p.map(f,[1,2,3])
if __name__ == '__main__':
	c = calculate()
	print(c.run())

A workable approach is:

from multiprocessing import Pool


def unwrap_self_f(args):
    # Module-level helper: pool.map passes one (instance, value) tuple per task
    obj, x = args
    return obj.f(x)


class calculate(object):
    def f(self, x):
        return x * x

    def run(self):
        with Pool() as p:
            return p.map(unwrap_self_f, zip([self] * 3, [1, 2, 3]))


if __name__ == '__main__':
    c = calculate()
    print(c.run())
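
Whether an object can cross a process boundary can be checked up front by trying to pickle it. The helper below is a minimal sketch; is_picklable and make_local are hypothetical names, and the set of exceptions caught is an assumption meant to cover the errors pickle commonly raises across Python 3 versions.

```python
import pickle


def is_picklable(obj):
    # Try to serialize the object the same way multiprocessing would
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_local():
    # Returns a nested function, like f inside calculate.run above
    def local(x):
        return x * x
    return local


if __name__ == "__main__":
    print(is_picklable(len))              # importable by name
    print(is_picklable(make_local()))     # nested function
    print(is_picklable(lambda x: x))      # lambda
```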
