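Using Python's multiprocessing module

Purpose: multiprocessing is Python's module for process-based parallelism. Threads created with the threading module cannot run Python code on several CPU cores at once; multiprocessing spreads the work across processes instead, which avoids the computational bottleneck caused by the GIL (global interpreter lock).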

1. A simple multi-process example: the worker function takes no arguments

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


def worker():
    print('Working')


if __name__ == '__main__':
    # Start five processes that each run worker()
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        p.start()

multiprocessing_simple.py

Output

[root@ mnt]# python3 multiprocessing_simple.py
Working
Working
Working
Working
Working

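2. A simple multi-process example: the worker function takes an argument

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


def worker(num):
    print('Worker id: %s' % num)


if __name__ == '__main__':
    for i in range(5):
        # args must be a tuple, hence the trailing comma
        p = multiprocessing.Process(target=worker, args=(i,))
        p.start()

multiprocessing_simple_args.py

Output (the order varies from run to run because the processes are scheduled independently)

[root@ mnt]# python3 multiprocessing_simple_args.py
Worker id: 1
Worker id: 2
Worker id: 3
Worker id: 4
Worker id: 0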

3. Running a task that lives in an imported module

#!/usr/bin/env python
# -*- coding: utf-8 -*-


def worker():
    print('Working')
    return

multiprocessing_import_worker.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import multiprocessing_import_worker

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(
            target=multiprocessing_import_worker.worker,
        )
        p.start()

multiprocessing_import_main.py

Output

[root@ mnt]# python3 multiprocessing_import_main.py
Working
Working
Working
Working
Working

4. Giving processes custom names

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import logging
import time

logging.basicConfig(
    level=logging.DEBUG,
    format="(%(threadName)-10s) %(message)s",
)


def worker():
    name = multiprocessing.current_process().name
    logging.debug('%s starting' % name)
    time.sleep(3)
    logging.debug('%s exiting' % name)


def my_service():
    name = multiprocessing.current_process().name
    logging.debug('%s starting' % name)
    time.sleep(3)
    logging.debug('%s exiting' % name)


if __name__ == '__main__':
    service = multiprocessing.Process(
        name='my_service',
        target=my_service,
    )
    worker_1 = multiprocessing.Process(
        name='worker_1',
        target=worker,
    )
    # No name given, so the default name Process-N is used
    worker_2 = multiprocessing.Process(
        target=worker,
    )

    service.start()
    worker_1.start()
    worker_2.start()

multiprocessing_names.py

Output

[root@ mnt]# python3 multiprocessing_names.py
(MainThread) worker_1 starting
(MainThread) Process-3 starting
(MainThread) my_service starting
(MainThread) worker_1 exiting
(MainThread) Process-3 exiting
(MainThread) my_service exiting

5. Daemon processes without waiting: the main process exits without waiting for the daemon, so the daemon's exit message never appears

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s starting' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s exiting' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s starting' % (p.name, p.pid))
    logging.debug('%s %s exiting' % (p.name, p.pid))


if __name__ == '__main__':
    daemon_obj = multiprocessing.Process(
        target=daemon,
        name='daemon')
    daemon_obj.daemon = True

    no_daemon_obj = multiprocessing.Process(
        target=no_daemon,
        name='no_daemon')
    no_daemon_obj.daemon = False

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

multiprocessing_daemon.py

Output

[root@ mnt]# python3 multiprocessing_daemon.py
(MainThread) daemon 21931 starting
(MainThread) no_daemon 21932 starting
(MainThread) no_daemon 21932 exiting

6. Daemon processes: waiting with join() until every process has finished

multiprocessing_daemon_join.py

Output

[root@ mnt]# python3 multiprocessing_daemon_join.py
(MainThread) daemon 21948 starting
(MainThread) no_daemon 21949 starting
(MainThread) no_daemon 21949 exiting
(MainThread) daemon 21948 exiting
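
The listing for multiprocessing_daemon_join.py is not reproduced in this step. A minimal sketch of what it may look like, assuming it is the step 5 script with a join() call added for each process; the wrapper function run_and_join is a hypothetical name introduced here so that the exit codes can be inspected:

```python
import logging
import multiprocessing
import time

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s starting', p.name, p.pid)
    time.sleep(2)
    logging.debug('%s %s exiting', p.name, p.pid)


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s starting', p.name, p.pid)
    logging.debug('%s %s exiting', p.name, p.pid)


def run_and_join():
    """Start both processes, wait for both, and return their exit codes."""
    daemon_obj = multiprocessing.Process(
        target=daemon, name='daemon', daemon=True)
    no_daemon_obj = multiprocessing.Process(
        target=no_daemon, name='no_daemon', daemon=False)

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

    # join() blocks until the process has exited, so the daemon gets to
    # finish instead of being killed when the main process ends.
    daemon_obj.join()
    no_daemon_obj.join()
    return daemon_obj.exitcode, no_daemon_obj.exitcode


if __name__ == '__main__':
    print(run_and_join())
```

Because the main process now waits for the daemon, the daemon's "exiting" line appears in the output above.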

7. Daemon processes: join() with a timeout

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s starting' % (p.name, p.pid))
    time.sleep(2)
    logging.debug('%s %s exiting' % (p.name, p.pid))


def no_daemon():
    p = multiprocessing.current_process()
    logging.debug('%s %s starting' % (p.name, p.pid))
    logging.debug('%s %s exiting' % (p.name, p.pid))


if __name__ == '__main__':
    daemon_obj = multiprocessing.Process(
        target=daemon,
        name='daemon')
    daemon_obj.daemon = True

    no_daemon_obj = multiprocessing.Process(
        target=no_daemon,
        name='no_daemon')
    no_daemon_obj.daemon = False

    daemon_obj.start()
    time.sleep(1)
    no_daemon_obj.start()

    # Wait at most 1 second; join() returns even if the process is still running
    daemon_obj.join(1)
    logging.debug('daemon_obj.is_alive(): %s' % daemon_obj.is_alive())
    no_daemon_obj.join()

multiprocessing_daemon_join_timeout.py

Output

[root@ mnt]# python3 multiprocessing_daemon_join_timeout.py
(MainThread) daemon 21997 starting
(MainThread) no_daemon 21998 starting
(MainThread) no_daemon 21998 exiting
(MainThread) daemon_obj.is_alive(): True

8. Terminating a process. Note: after calling terminate(), call join() on the process so that it has time to shut down and its status is updated; as the output shows, is_alive() can still report True immediately after terminate()

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)


def slow_worker():
    print('Starting work')
    time.sleep(0.1)
    print('Finished work')


if __name__ == '__main__':
    p = multiprocessing.Process(
        target=slow_worker
    )
    logging.debug('Before start, is_alive: %s' % p.is_alive())

    p.start()
    logging.debug('While running, is_alive: %s' % p.is_alive())

    p.terminate()
    logging.debug('After terminate(), is_alive: %s' % p.is_alive())

    p.join()
    logging.debug('After join(), is_alive: %s' % p.is_alive())

multiprocessing_terminate.py

Output

[root@ mnt]# python3 multiprocessing_terminate.py
(MainThread) Before start, is_alive: False
(MainThread) While running, is_alive: True
(MainThread) After terminate(), is_alive: True
(MainThread) After join(), is_alive: False

9. Process exit codes

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import sys
import time


def exit_error():
    sys.exit(1)


def exit_ok():
    return


def return_value():
    return 1


def raises():
    raise RuntimeError('A runtime error')


def terminated():
    time.sleep(3)


if __name__ == '__main__':
    jobs = []
    funcs = [
        exit_error,
        exit_ok,
        return_value,
        raises,
        terminated,
    ]
    for func in funcs:
        print('Starting process for %s' % func.__name__)
        j = multiprocessing.Process(
            target=func,
            name=func.__name__)
        jobs.append(j)
        j.start()

    # Kill the last process (terminated) while it is still sleeping
    jobs[-1].terminate()

    for j in jobs:
        j.join()
        print('{:>15}.exitcode={}'.format(j.name, j.exitcode))

multiprocessing_exitcode.py

Output

[root@ mnt]# python3 multiprocessing_exitcode.py
Starting process for exit_error
Starting process for exit_ok
Starting process for return_value
Starting process for raises
Starting process for terminated
Process raises:
     exit_error.exitcode=1
        exit_ok.exitcode=0
   return_value.exitcode=0
Traceback (most recent call last):
  File "/usr/local/Python-3.6.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Python-3.6.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "multiprocessing_exitcode.py", line 25, in raises
    raise RuntimeError('A runtime error')
RuntimeError: A runtime error
         raises.exitcode=1
     terminated.exitcode=-15

Note: a process that ends because of an uncaught exception gets exit code 1. The value a function returns (return_value) does not affect the exit code.
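
The exit codes above follow a simple convention: 0 for a normal return, a positive value for sys.exit(n) or an uncaught exception, and a negative value -N when the process was killed by signal N (so -15 is SIGTERM, which terminate() sends on Unix). A small sketch of a helper that turns a Process.exitcode into a readable description; the name describe_exitcode is hypothetical, and the signal names assume a Unix-like platform:

```python
import signal


def describe_exitcode(exitcode):
    """Return a human-readable description of a Process.exitcode value."""
    if exitcode is None:
        return 'still running (or not started)'
    if exitcode == 0:
        return 'finished normally'
    if exitcode < 0:
        # A negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = 'signal {}'.format(-exitcode)
        return 'killed by {}'.format(name)
    return 'exited with error code {}'.format(exitcode)


if __name__ == '__main__':
    for code in (None, 0, 1, -15):
        print('{!r:>5}: {}'.format(code, describe_exitcode(code)))
```

In the script above, such a helper could be applied to j.exitcode after j.join().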

10. Enabling multiprocessing's built-in logging

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import logging
import sys


def worker():
    print('Working...')
    sys.stdout.flush()


if __name__ == '__main__':
    # Send the module's internal log messages to stderr at DEBUG level
    multiprocessing.log_to_stderr(logging.DEBUG)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

multiprocessing_log_to_stderr.py

Output

[root@ mnt]# python3 multiprocessing_log_to_stderr.py
[INFO/Process-1] child process calling self.run()
Working...
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

11. Setting the level of multiprocessing's logger

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import logging
import sys


def worker():
    print('Working...')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr()
    # Fetch the module's logger and show INFO and above only
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

multiprocessing_get_logger.py

Output

[root@ mnt]# python3 multiprocessing_get_logger.py
[INFO/Process-1] child process calling self.run()
Working...
[INFO/Process-1] process shutting down
[INFO/Process-1] process exiting with exitcode 0
[INFO/MainProcess] process shutting down

12. Subclassing multiprocessing.Process to run a task that takes no arguments

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


class Worker(multiprocessing.Process):

    def run(self):
        # run() is what executes in the child process after start()
        print('Running in process: %s' % self.name)


if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = Worker()
        jobs.append(p)
        p.start()

    for j in jobs:
        j.join()

multiprocessing_subclass.py

Output

[root@ mnt]# python3 multiprocessing_subclass.py
Running in process: Worker-2
Running in process: Worker-3
Running in process: Worker-4
Running in process: Worker-5
Running in process: Worker-1

13. Passing objects between processes with multiprocessing.Queue()

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


class MyFancyClass(object):

    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Current process: %s, instance name: %s' % (proc_name, self.name))


def worker(q):
    obj = q.get()
    obj.do_something()


if __name__ == '__main__':
    queue = multiprocessing.Queue()

    # Start the worker and hand it the queue. The queue is still empty,
    # so the worker blocks in get() until data arrives.
    p = multiprocessing.Process(
        target=worker,
        args=(queue,)
    )
    p.start()

    # Put data on the queue
    queue.put(MyFancyClass('Mrs Suk'))

    # Close the queue and wait for its background thread to flush the data
    queue.close()
    queue.join_thread()
    p.join()

multiprocessing_queue.py

Output

[root@ mnt]# python3 multiprocessing_queue.py
Current process: Process-1, instance name: Mrs Suk
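
Objects placed on a multiprocessing.Queue are pickled by put() and unpickled by get(), so the receiver works on a copy rather than on the original object. A small single-process sketch of that behaviour; the function name round_trip and the sample payload are illustrative, not part of the script above:

```python
import multiprocessing


def round_trip(payload, timeout=5):
    """Send payload through a multiprocessing.Queue and return what comes out."""
    queue = multiprocessing.Queue()
    queue.put(payload)                     # pickled by a background feeder thread
    received = queue.get(timeout=timeout)  # unpickled into a new object
    queue.close()
    queue.join_thread()                    # wait for the feeder thread to finish
    return received


if __name__ == '__main__':
    original = {'name': 'Mrs Suk'}
    copy = round_trip(original)
    print('equal content:', copy == original)
    print('same object:  ', copy is original)
```

This is why anything sent through the queue, such as the MyFancyClass instance, has to be picklable.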

14. Using multiprocessing.JoinableQueue(). Example: consumers multiply numbers, put the results on a second queue, and the main process then reads the results and prints them

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time


class Consumer(multiprocessing.Process):
    """Consumer process."""

    def __init__(self, task_queue, result_queue, *args, **kwargs):
        super(Consumer, self).__init__(*args, **kwargs)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name  # name of this process
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                # None is the sentinel that tells this consumer to exit
                print('%s exiting' % proc_name)
                self.task_queue.task_done()
                break
            print('{}:{}'.format(proc_name, next_task))
            answer = next_task()  # invokes Task.__call__
            # Mark the task as done. tasks.join() keeps blocking until every
            # item put on the queue has been matched by a task_done() call.
            self.task_queue.task_done()
            self.result_queue.put(answer)  # store the result


class Task(object):

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self, *args, **kwargs):
        time.sleep(0.1)
        return '{self.a} * {self.b} = {product}'.format(
            self=self, product=self.a * self.b)

    def __str__(self):
        return '{self.a} * {self.b}'.format(self=self)


if __name__ == '__main__':
    # JoinableQueue adds two methods to Queue: task_done() and join()
    tasks = multiprocessing.JoinableQueue()
    # Queue that holds the results
    results = multiprocessing.Queue()

    # Two consumers per CPU core
    num_consumers = multiprocessing.cpu_count() * 2
    print('Creating {} consumers'.format(num_consumers))
    consumers = [
        Consumer(tasks, results)
        for i in range(num_consumers)
    ]

    # Start the consumer processes
    for w in consumers:
        w.start()

    # Put the jobs on the task queue
    num_jobs = 10
    for i in range(num_jobs):
        tasks.put(Task(i, i))

    # Add one None sentinel per consumer
    for i in range(num_consumers):
        tasks.put(None)

    # Wait for all tasks to finish
    tasks.join()

    # Print the results
    while num_jobs:
        result = results.get()
        print('Result:', result)
        num_jobs -= 1

multiprocessing_producer_consumer.py

Output

[root@ mnt]# python3 multiprocessing_producer_consumer.py
Creating 2 consumers    # the consumer count reported on the author's test machine
Consumer-1:0 * 0
Consumer-2:1 * 1
Consumer-1:2 * 2
Consumer-2:3 * 3
Consumer-1:4 * 4
Consumer-2:5 * 5
Consumer-1:6 * 6
Consumer-2:7 * 7
Consumer-1:8 * 8
Consumer-2:9 * 9
Consumer-1 exiting
Consumer-2 exiting
Result: 1 * 1 = 1
Result: 0 * 0 = 0
Result: 2 * 2 = 4
Result: 3 * 3 = 9
Result: 5 * 5 = 25
Result: 4 * 4 = 16
Result: 7 * 7 = 49
Result: 6 * 6 = 36
Result: 8 * 8 = 64
Result: 9 * 9 = 81

15. Signalling between processes with an Event

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time


def wait_for_event(event_obj):
    print('Waiting for the event without a timeout')
    event_obj.wait()
    print('Blocking wait, event is set:', event_obj.is_set())


def wait_for_event_timeout(event_obj, timeout):
    print('Waiting for the event with a timeout')
    event_obj.wait(timeout)
    print('Timed wait, event is set:', event_obj.is_set())


if __name__ == '__main__':
    event_obj = multiprocessing.Event()

    block_task = multiprocessing.Process(
        name='block_task',
        target=wait_for_event,
        args=(event_obj,)
    )
    block_task.start()

    non_block_task = multiprocessing.Process(
        name='non_block_task',
        target=wait_for_event_timeout,
        args=(event_obj, 2)
    )
    non_block_task.start()

    print('Sleeping 3 seconds so that both processes are running')
    time.sleep(3)
    event_obj.set()
    print('Event has been set')

multiprocessing_event.py

Output (the timed wait gives up after 2 seconds, before the event is set at 3 seconds, so it reports False)

[root@ mnt]# python3 multiprocessing_event.py
Sleeping 3 seconds so that both processes are running
Waiting for the event with a timeout
Waiting for the event without a timeout
Timed wait, event is set: False
Event has been set
Blocking wait, event is set: True
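
Event.wait(timeout) itself returns True if the event was set and False if the timeout expired first, so a separate is_set() call is not strictly needed after a timed wait. A single-process sketch of that return value; wait_result is a hypothetical helper name:

```python
import multiprocessing


def wait_result(event, timeout):
    """Wait on the event and report whether it was set or the wait timed out."""
    if event.wait(timeout):
        return 'set'
    return 'timed out'


if __name__ == '__main__':
    event = multiprocessing.Event()
    print(wait_result(event, 0.1))  # nothing has called set() yet
    event.set()
    print(wait_result(event, 0.1))  # returns at once because the event is set
```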

16. Controlling access to a shared resource with a lock

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import sys


def worker_with(lock, stream):
    # The with statement acquires the lock and releases it on exit
    with lock:
        stream.write('Lock acquired via with\n')


def worker_no_with(lock, stream):
    lock.acquire()
    try:
        stream.write('Lock acquired via lock.acquire()\n')
    finally:
        lock.release()


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    w = multiprocessing.Process(
        target=worker_with,
        args=(lock, sys.stdout,)
    )
    nw = multiprocessing.Process(
        target=worker_no_with,
        args=(lock, sys.stdout,)
    )

    w.start()
    nw.start()

    w.join()
    nw.join()

multiprocessing_lock.py

Output

[root@ mnt]# python3 multiprocessing_lock.py
Lock acquired via lock.acquire()
Lock acquired via with

17. Synchronizing processes with multiprocessing.Condition()

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time


def task_1(condition_obj):
    proc_name = multiprocessing.current_process().name
    print('Starting %s' % proc_name)
    with condition_obj:
        print('%s done, task_2 may now run' % proc_name)
        # Wake up every process waiting on the condition
        condition_obj.notify_all()


def task_2(condition_obj):
    proc_name = multiprocessing.current_process().name
    print('Starting %s' % proc_name)
    with condition_obj:
        # Block until task_1 calls notify_all()
        condition_obj.wait()
        print('task_2 %s done' % proc_name)


if __name__ == '__main__':
    condition_obj = multiprocessing.Condition()
    s1 = multiprocessing.Process(name='s1',
                                 target=task_1,
                                 args=(condition_obj,))
    s2_clients = [
        multiprocessing.Process(
            name='task_2[{}]'.format(i),
            target=task_2,
            args=(condition_obj,),
        )
        for i in range(1, 3)
    ]

    for c in s2_clients:
        c.start()
        time.sleep(1)
    s1.start()

    s1.join()
    for c in s2_clients:
        c.join()

multiprocessing_condition.py

Output

[root@ mnt]# python3 multiprocessing_condition.py
Starting task_2[1]
Starting task_2[2]
Starting s1
s1 done, task_2 may now run
task_2 task_2[1] done
task_2 task_2[2] done

18. Using multiprocessing.Semaphore() to limit concurrent access to a resource

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import random
import multiprocessing
import time


class ActivePool:
    """Tracks which processes currently hold the semaphore."""

    def __init__(self, *args, **kwargs):
        super(ActivePool, self).__init__(*args, **kwargs)
        self.mgr = multiprocessing.Manager()
        self.active = self.mgr.list()
        self.lock = multiprocessing.Lock()

    def makeActive(self, name):
        with self.lock:
            self.active.append(name)

    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)

    def __str__(self):
        with self.lock:
            return str(self.active)


def worker(s, pool):
    name = multiprocessing.current_process().name
    # At most three workers can be inside this block at the same time
    with s:
        pool.makeActive(name)
        print('Activating {} now running {}'.format(
            name, pool))
        time.sleep(random.random())
        pool.makeInactive(name)


if __name__ == '__main__':
    pool = ActivePool()
    s = multiprocessing.Semaphore(3)
    jobs = [
        multiprocessing.Process(
            target=worker,
            name=str(i),
            args=(s, pool),
        )
        for i in range(10)
    ]

    for j in jobs:
        j.start()

    while True:
        alive = 0
        for j in jobs:
            if j.is_alive():
                alive += 1
                j.join(timeout=0.1)
                print('Now running {}'.format(pool))
        if alive == 0:
            # all done
            break

multiprocessing_semaphore.py

Output

[root@ mnt]# python3 multiprocessing_semaphore.py
Activating 9 now running ['9']
Activating 5 now running ['9', '5']
Activating 4 now running ['9', '5', '4']
Activating 1 now running ['9', '5', '1']
Now running ['9', '5', '1']
Now running ['9', '5', '1']
Now running ['9', '5', '1']
Now running ['9', '5', '1']
Activating 2 now running ['9', '1', '2']
Now running ['9', '1', '2']
Now running ['9', '1', '2']
Now running ['9', '1', '2']
Now running ['9', '1', '2']
Activating 6 now running ['9', '2', '6']
Now running ['9', '2', '6']
Now running ['9', '2', '6']
Activating 7 now running ['2', '6', '7']
Activating 8 now running ['2', '7', '8']
Now running ['2', '7', '8']
Now running ['2', '7', '8']
Now running ['2', '7', '8']
Now running ['2', '7', '8']
Activating 3 now running ['7', '8', '3']
Now running ['7', '8', '3']
Now running ['7', '8', '3']
Activating 0 now running ['7', '3', '0']
Now running ['7', '0']
Now running ['7']
Now running ['7']
Now running ['7']
Now running []

19. Sharing a dict or list between processes with multiprocessing.Manager()

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


def worker(dict_obj, key, value):
    dict_obj[key] = value


if __name__ == '__main__':
    # Create a dict shared between processes; every process sees its contents
    mgr = multiprocessing.Manager()
    mgr_dict = mgr.dict()
    jobs = [
        multiprocessing.Process(
            target=worker,
            args=(mgr_dict, i, i * 2),
        )
        for i in range(10)
    ]

    # Start the workers
    for j in jobs:
        j.start()

    # Wait for the workers to finish
    for j in jobs:
        j.join()
    print('Result:', mgr_dict)

multiprocessing_manager_dict.py

Output

[root@ mnt]# python3 multiprocessing_manager_dict.py
Result: {5: 10, 6: 12, 1: 2, 2: 4, 3: 6, 7: 14, 8: 16, 9: 18, 4: 8, 0: 0}

20. Sharing a Namespace with multiprocessing.Manager(), string attribute: a value assigned in one process is visible to the others

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time


def producer(namespace_obj, event):
    """Producer."""
    namespace_obj.value = 'Value set on the namespace: 1234'
    event.set()


def consumer(namespace_obj, event):
    """Consumer."""
    # The consumer starts before the producer, so at first
    # namespace_obj.value does not exist and reading it raises an exception.
    # Once the producer calls event.set(), event.wait() returns and the
    # value can be read.
    try:
        print('Value before the event: {}'.format(namespace_obj.value))
    except Exception as err:
        print('Error before the event:', str(err))
    event.wait()
    print('Value after the event:', namespace_obj.value)


if __name__ == '__main__':
    # Create a shared manager
    mgr = multiprocessing.Manager()
    # Create a shared namespace
    namespace = mgr.Namespace()
    # Create the event used to signal between the processes
    event = multiprocessing.Event()

    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )

    c.start()
    time.sleep(1)
    p.start()

    c.join()
    p.join()

multiprocessing_namespace.py

Output

[root@ mnt]# python3 multiprocessing_namespace.py
Error before the event: 'Namespace' object has no attribute 'value'
Value after the event: Value set on the namespace: 1234

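21. Sharing a Namespace with multiprocessing.Manager(), list attribute: in-place changes to the list are not visible to other processes

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import time


def producer(namespace_obj, event):
    """Producer."""
    # This appends to a local copy of the list; the manager never sees it
    namespace_obj.my_list.append('Value set on the namespace: 1234')
    event.set()


def consumer(namespace_obj, event):
    """Consumer."""
    # Unlike the previous example, my_list already exists when the consumer
    # starts, so no exception is raised here. After the producer calls
    # event.set(), event.wait() returns, but the list is still empty.
    try:
        print('Value before the event: {}'.format(namespace_obj.my_list))
    except Exception as err:
        print('Error before the event:', str(err))
    event.wait()
    print('Value after the event:', namespace_obj.my_list)


if __name__ == '__main__':
    # Create a shared manager
    mgr = multiprocessing.Manager()
    # Create a shared namespace
    namespace = mgr.Namespace()
    # A plain list stored on the namespace is not updated globally
    # when it is modified in place
    namespace.my_list = []
    # Create the event used to signal between the processes
    event = multiprocessing.Event()

    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )

    c.start()
    p.start()

    c.join()
    p.join()

multiprocessing_namespace_mutable.py

Output

[root@ mnt]# python3 multiprocessing_namespace_mutable.py
Value before the event: []
Value after the event: []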
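
The list stays empty because reading namespace.my_list hands back a copy of the list held by the manager, and append() only changes that copy. A common workaround is to modify the copy and then assign it back to the attribute, since the assignment is what gets sent to the manager. A single-process sketch of the difference; the function name compare_updates is hypothetical:

```python
import multiprocessing


def compare_updates():
    """Return the namespace list after an in-place append and after reassignment."""
    with multiprocessing.Manager() as mgr:
        namespace = mgr.Namespace()
        namespace.my_list = []

        # In-place change: append() runs on a local copy and is lost.
        namespace.my_list.append('lost')
        after_append = list(namespace.my_list)

        # Reassignment: the attribute assignment is sent to the manager.
        items = namespace.my_list
        items.append('kept')
        namespace.my_list = items
        after_assign = list(namespace.my_list)

    return after_append, after_assign


if __name__ == '__main__':
    print(compare_updates())
```

Another option, following step 19, is to share a mgr.list() proxy instead of storing a plain list on the namespace.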

22. Process pools: applying a calculation to a list of numbers

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('Starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    print('inputs :', inputs)

    # Compute with the built-in map() for comparison
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,  # runs once in each worker at start-up
    )

    # Compute with the process pool
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print('Pool :', pool_outputs)

multiprocessing_pool.py

Output

[root@ mnt]# python3 multiprocessing_pool.py
inputs : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Starting ForkPoolWorker-2
Starting ForkPoolWorker-1
Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
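
Pool can also be used as a context manager, which saves the explicit close() and join() calls. A minimal sketch, assuming the same do_calculation function as above; double_all and its default of two worker processes are illustrative. Leaving the with block calls terminate() on the pool, so this form suits blocking calls such as map(), which have already returned all of their results by then:

```python
import multiprocessing


def do_calculation(data):
    return data * 2


def double_all(values, processes=2):
    """Double every value using a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() blocks until every result is ready and keeps the input order.
        return pool.map(do_calculation, values)


if __name__ == '__main__':
    print(double_all(range(10)))
```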

23. Process pools: limiting how many tasks a worker runs (maxtasksperchild) before it is replaced by a fresh process. Purpose: keeps long-running workers from tying up system resources

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print('Starting', multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    print('inputs :', inputs)

    # Compute with the built-in map() for comparison
    builtin_outputs = map(do_calculation, inputs)
    print('Built-in:', list(builtin_outputs))

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
        maxtasksperchild=2,  # replace each worker after it has run 2 tasks
    )

    # Compute with the process pool
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print('Pool :', pool_outputs)

multiprocessing_pool_maxtasksperchild.py

Output (more workers are started than in the previous example because workers are replaced after two tasks)

[root@ mnt]# python3 multiprocessing_pool_maxtasksperchild.py
inputs : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Starting ForkPoolWorker-2
Starting ForkPoolWorker-1
Starting ForkPoolWorker-4
Starting ForkPoolWorker-3
Pool : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
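
One way to observe the effect of maxtasksperchild is to have each task report the PID of the worker that ran it. A rough sketch; report_pid and pids_for are hypothetical names, and chunksize=1 is passed because, for the purposes of maxtasksperchild, a task is one chunk of the input rather than one item:

```python
import multiprocessing
import os


def report_pid(_):
    """Ignore the input and return the PID of the worker running the task."""
    return os.getpid()


def pids_for(num_tasks, maxtasksperchild):
    """Run num_tasks one-item tasks on a single worker slot and collect PIDs."""
    with multiprocessing.Pool(processes=1,
                              maxtasksperchild=maxtasksperchild) as pool:
        return pool.map(report_pid, range(num_tasks), chunksize=1)


if __name__ == '__main__':
    print('no limit :', len(set(pids_for(4, None))), 'distinct worker PID(s)')
    print('limit = 1:', len(set(pids_for(4, 1))), 'distinct worker PID(s)')
```

Without a limit the single worker handles every task; with a limit of 1 the pool has to start replacement workers, so several PIDs show up.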

24. Implementing a simple MapReduce with a process pool. The example reads files, splits them into words, and counts the words

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import collections
import itertools
import multiprocessing


class SimpleMapReduce:

    def __init__(self, map_func, reduce_func, num_workers=None):
        """
        :param map_func: map function; here file_to_words(filename)
        :param reduce_func: reduce function; here count_words(item)
        :param num_workers: number of pool workers (None lets Pool decide)
        """
        self.map_func = map_func
        self.reduce_func = reduce_func
        self.pool = multiprocessing.Pool(num_workers)

    def partition(self, mapped_values):
        """Group the mapped (key, value) pairs by key."""
        partitioned_data = collections.defaultdict(list)
        for key, value in mapped_values:
            partitioned_data[key].append(value)
        return partitioned_data.items()

    def __call__(self, inputs, chunksize=1):
        """
        :param inputs: the inputs to process (here, file names)
        :param chunksize: how many inputs are handed to a worker at a time
        :return: list of reduce_func results
        """
        # Each map_func call returns a list such as [(word, 1), ...]
        map_responses = self.pool.map(
            self.map_func,
            inputs,
            chunksize=chunksize,
        )
        # Flatten the per-input lists and group the pairs by key
        partitioned_data = self.partition(
            itertools.chain(*map_responses)
        )
        # Pass each (word, [1, 1, ...]) item to reduce_func,
        # which sums the occurrences with sum()
        reduced_values = self.pool.map(
            self.reduce_func,
            partitioned_data,
        )
        return reduced_values

multiprocessing_mapreduce.py
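
The partition step is the heart of the class: it groups the flat (key, value) pairs produced by the map phase so that each key reaches the reduce function together with all of its values. A single-process sketch of that grouping followed by a reduce; the sample pairs are made up for illustration, and the two functions mirror the partition() method above and the count_words() function of the word-count script:

```python
import collections


def partition(mapped_values):
    """Group (key, value) pairs into (key, [values...]) items."""
    partitioned_data = collections.defaultdict(list)
    for key, value in mapped_values:
        partitioned_data[key].append(value)
    return partitioned_data.items()


def count_words(item):
    """Reduce one (word, [1, 1, ...]) item to (word, total)."""
    word, occurrences = item
    return (word, sum(occurrences))


if __name__ == '__main__':
    # Illustrative map output, as if two files had been read
    pairs = [('child', 1), ('parent', 1), ('child', 1)]
    print([count_words(item) for item in partition(pairs)])
```

In SimpleMapReduce the same two phases run through pool.map(), so both the map and the reduce calls are spread across worker processes.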

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import multiprocessing
import string

from multiprocessing_mapreduce import SimpleMapReduce


def file_to_words(filename):
    """Read a file and return a list of (word, 1) pairs."""
    # Words that are left out of the count
    STOP_WORDS = set([
        'a', 'an', 'and', 'are', 'as', 'be', 'by', 'for', 'if',
        'in', 'is', 'it', 'of', 'or', 'py', 'rst', 'that', 'the',
        'to', 'with',
    ])
    # Translation table that maps every punctuation character to a space
    TR = str.maketrans({
        p: ' '
        for p in string.punctuation
    })

    print('Process: {} reading file: {}'.format(
        multiprocessing.current_process().name, filename))
    output = []

    with open(filename, 'rt', encoding='utf-8') as f:
        for line in f:
            # Skip comment lines, which start with '..'
            if line.lstrip().startswith('..'):
                continue
            line = line.translate(TR)  # strip the punctuation in TR
            for word in line.split():  # split on whitespace
                word = word.lower()
                if word.isalpha() and word not in STOP_WORDS:
                    output.append((word, 1))
    return output


def count_words(item):
    """Reduce step: sum the occurrences of one word."""
    word, occurrences = item
    return (word, sum(occurrences))


if __name__ == '__main__':
    import operator
    import glob

    # Find the files in the current directory whose names end in .rst
    input_files = glob.glob('*.rst')

    # Create the MapReduce object
    mapper = SimpleMapReduce(file_to_words, count_words)
    word_counts = mapper(input_files)  # invokes SimpleMapReduce.__call__
    word_counts.sort(key=operator.itemgetter(1))  # sort by the count (index 1)
    word_counts.reverse()  # highest count first

    print('\nTOP 20 WORDS BY FREQUENCY\n')
    top20 = word_counts[:20]
    longest = max(len(word) for word, count in top20)
    for word, count in top20:
        print('{word:<{len}}: {count:5}'.format(
            len=longest + 1,
            word=word,
            count=count)
        )

multiprocessing_wordcount.py

If there is a relationship() from Parent to Child, but there is not a reverse-relationship that links a particular Child to each Parent, SQLAlchemy will not have any awareness that when deleting this particular Child object, it needs to maintain the “secondary” table that links it to the Parent. No delete of the “secondary” table will occur.

If there is a relationship that links a particular Child to each Parent, suppose it’s called Child.parents, SQLAlchemy by default will load in the Child.parents collection to locate all Parent objects, and remove each row from the “secondary” table which establishes this link. Note that this relationship does not need to be bidirectional; SQLAlchemy is strictly looking at every relationship() associated with the Child object being deleted.

A higher performing option here is to use ON DELETE CASCADE directives with the foreign keys used by the database. Assuming the database supports this feature, the database itself can be made to automatically delete rows in the “secondary” table as referencing rows in “child” are deleted. SQLAlchemy can be instructed to forego actively loading in the Child.parents collection in this case using the passive_deletes directive on relationship(); see Using Passive Deletes for more details on this.

Note again, these behaviors are only relevant to the secondary option used with relationship(). If dealing with association tables that are mapped explicitly and are not present in the secondary option of a relevant relationship(), cascade rules can be used instead to automatically delete entities in reaction to a related entity being deleted - see Cascades for information on this feature.

Sample text used for the count: test.rst

Output

[root@python-mysql mnt]# python3 multiprocessing_wordcount.py
Process: SpawnPoolWorker-1 reading file: test.rst

TOP 20 WORDS BY FREQUENCY

child        :     8
relationship :     8
this         :     7
parent       :     5
on           :     4
delete       :     4
table        :     4
sqlalchemy   :     4
not          :     4
can          :     3
database     :     3
used         :     3
option       :     3
deleted      :     3
parents      :     3
will         :     3
each         :     3
particular   :     3
links        :     3
there        :     3
