Learning Python 3 Multiprocessing

2017.12.2 study notes ------------------------------------------------------------
import multiprocessing
import os
def process(num):
	print('Process:',num)

if __name__ == '__main__':
	for i in range(5):
		p = multiprocessing.Process(target=process,args=(i,))
		p.start()
	print('CPU number:'+str(multiprocessing.cpu_count()))
	for p in multiprocessing.active_children():
		print('Child process name: '+ p.name +' id: '+ str(p.pid) + ' parent id: '+ str(os.getpid()))
	print('Process Ended')

Output:

Why did this part of the code run first?

print('CPU number:'+str(multiprocessing.cpu_count()))
for p in multiprocessing.active_children():
	print('Child process name: '+ p.name +' id: '+ str(p.pid) + ' parent id: '+ str(os.getpid()))
print('Process Ended')
Stepping through it in the Eclipse debugger:

p = multiprocessing.Process(target=process,args=(i,))
At this step a Process object is constructed by calling the __init__ method of the Process class.


Next, step into

p.start()

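Still inside start(), the next step is Popen. This part is complicated and I did not fully follow it, but with the help of the API documentation the rough flow is clear. First the preparation data is gathered: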

prep_data = spawn.get_preparation_data(process_obj._name)
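This data is used to spawn the child process. With the spawn method the child does not inherit everything from the parent (that is what fork does); it starts a fresh interpreter and receives only what it needs to run. The API documentation explains: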

spawn
The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows.

Then the attributes of the newly created process are initialised. Here we can see that self.pid holds the child's pid.
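The attribute initialisation described above can be observed from the outside. The following is a minimal sketch, not the debugger session itself; `noop` is a hypothetical placeholder worker. It prints `pid` before and after `start()`, and `exitcode` after `join()`.

```python
import multiprocessing


def noop():
    # Hypothetical worker that does nothing; only the process lifecycle matters here.
    pass


if __name__ == '__main__':
    p = multiprocessing.Process(target=noop)
    # Before start() no OS process exists yet, so pid is None.
    print('before start: pid =', p.pid)
    p.start()
    # start() has filled in the pid of the newly launched child.
    print('after start : pid =', p.pid)
    p.join()
    # After join() the exit code of the child is available.
    print('after join  : exitcode =', p.exitcode)
```

The object exists as soon as it is constructed, but the pid only appears once `start()` has actually launched a process.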

The API documentation on sentinel:

sentinel

A numeric handle of a system object which will become “ready” when the process ends.

You can use this value if you want to wait on several events at once using multiprocessing.connection.wait(). Otherwise calling join() is simpler.
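To make the sentinel description concrete, here is a small sketch of waiting on several processes at once with `multiprocessing.connection.wait()`. The worker `nap` and the helper `wait_all` are hypothetical names introduced only for this illustration.

```python
import multiprocessing
import time
from multiprocessing.connection import wait


def nap(seconds):
    # Hypothetical worker: sleeps to stand in for real work.
    time.sleep(seconds)
    return seconds


def wait_all(procs):
    """Block until every started process in procs has ended; return names in finish order."""
    pending = {p.sentinel: p for p in procs}
    finished = []
    while pending:
        # wait() returns the sentinels that have become ready (their process ended).
        for s in wait(list(pending)):
            finished.append(pending.pop(s).name)
    return finished


if __name__ == '__main__':
    procs = [multiprocessing.Process(target=nap, args=(0.1 * i,)) for i in range(3)]
    for p in procs:
        p.start()
    print('finished in this order:', wait_all(procs))
    for p in procs:
        p.join()  # returns at once because the children have already ended
```

For the simple case of waiting for every child, a plain loop of `join()` calls does the same job, as the documentation says.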

At this point a child process has been created and started. Inspecting all of its variables:

The child's pid is 7560 and its _parent_pid (the parent process id) is 6816, which match the values of p.pid and os.getpid() respectively.
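The same relationship can be checked without a debugger. In this sketch (the worker name `report` is hypothetical) the child sends its own pid and its parent's pid back through a Queue. This assumes the spawn or fork start method; under forkserver the child's parent would be the fork server rather than the main process.

```python
import multiprocessing
import os


def report(q):
    # Runs in the child: send back its own pid and the pid of its parent.
    q.put((os.getpid(), os.getppid()))


if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(q,))
    p.start()
    child_pid, child_ppid = q.get()
    p.join()
    # p.pid (seen from the parent) should equal what the child reports about itself.
    print('p.pid      :', p.pid, ' child says:', child_pid)
    # os.getpid() in the parent should equal what the child sees as its parent.
    print('os.getpid():', os.getpid(), ' child says its parent is:', child_ppid)
```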

Back to the original question: single-stepping through the code did not seem to answer it, so I changed the code to:

#coding:utf-8
import multiprocessing
import os
import time
def process(num):
	print('Process:',num)

if __name__ == '__main__':
	for i in range(5):
		cstart = time.time()
		print('Creating process '+str(i)+':',cstart)
		p = multiprocessing.Process(target=process,args=(i,))
		print('Process '+str(i)+' created! Time taken:',(time.time()-cstart))
		sstart = time.time()
		print('Starting process '+str(i)+':',sstart)
		p.start()
		print('Process '+str(i)+' started! Time taken:',(time.time()-sstart))

	print('CPU number:'+str(multiprocessing.cpu_count()))
	cpu_time = time.time()
	for p in multiprocessing.active_children():
		print('Child process name: '+ p.name +' id: '+ str(p.pid) + ' parent id: '+ str(os.getpid()))
		print('cpu time '+p.name+':',time.time()-cpu_time)
	print('cpu time',time.time()-cpu_time)
	print('Process Ended')
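The output is as follows:

How mischievous... The result shows that creating a process takes on the order of 10^(-3) seconds, while starting one takes on the order of 10^(-2) seconds, so what happens is probably this: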

p = multiprocessing.Process(target=process,args=(i,))
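executes quickly. start() takes longer, but it returns as soon as the child has been launched; the child still has to boot its own interpreter before it can run anything. Meanwhile the parent jumps to the next iteration and creates the next process, and with the first child's startup still unfinished it carries on down to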

print('CPU number:'+str(multiprocessing.cpu_count()))
cpu_time = time.time()


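In the original code the creation steps contain no print statements, so the first line on the console is "CPU number:4". The statements that follow do almost no computation, only I/O, so the parent gets through them before any child has finished starting up, and what we see next is the output of

for p in multiprocessing.active_children():
	print('Child process name: '+ p.name +' id: '+ str(p.pid) + ' parent id: '+ str(os.getpid()))
	print('cpu time '+p.name+':',time.time()-cpu_time)
print('cpu time',time.time()-cpu_time)
print('Process Ended')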

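Only after that does the output of the started processes appear. This also shows that the function given as

target=process

is called only once start() has run, in the child, and not when the Process object is constructed.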
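If the parent's final lines should appear after the children's output, one option is to `join()` every child before printing them. The sketch below reuses the `process` function from the notes; the helper `run_and_wait` is a hypothetical name added for illustration.

```python
import multiprocessing


def process(num):
    print('Process:', num)


def run_and_wait(count):
    """Start `count` children, then block until all of them have ended."""
    children = [multiprocessing.Process(target=process, args=(i,))
                for i in range(count)]
    for p in children:
        p.start()   # returns as soon as the child has been launched
    for p in children:
        p.join()    # returns only after that child has exited
    return [p.exitcode for p in children]


if __name__ == '__main__':
    codes = run_and_wait(5)
    print('exit codes:', codes)
    print('Process Ended')  # printed only after every child has finished
```

The order of the `Process: n` lines among themselves is still up to the scheduler; `join()` only guarantees that the parent waits.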

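One question remains, though. I had assumed that a process occupies a CPU exclusively, yet I created 5 processes plus 1 main process, 6 in total, on a machine with only 4 CPUs. How are they allocated? (A process does not in fact own a core: the operating system scheduler time-slices all runnable processes across the available cores.)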
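A quick way to convince oneself is to start deliberately more CPU-bound processes than there are cores and check that all of them finish. This is only a sketch; the worker `burn` and the loop size are arbitrary choices made for illustration.

```python
import multiprocessing
import os


def burn(n, q):
    # Hypothetical CPU-bound task: sum the integers below n in a plain loop.
    total = 0
    for i in range(n):
        total += i
    q.put((os.getpid(), total))


if __name__ == '__main__':
    cores = multiprocessing.cpu_count()
    count = cores + 2          # deliberately more processes than cores
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=burn, args=(200000, q))
             for _ in range(count)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]   # drain the queue before joining
    for p in procs:
        p.join()
    print(cores, 'cores,', len(results), 'processes all finished')
```

Every process completes; they simply take turns on the cores instead of each holding one for itself.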


2017.12.24 study notes ------------------------------------------------------------

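Recently, while combining nested functions with multiprocessing in my 房天下 crawler, I ran into a problem: Pool apparently requires arguments to be passed. So I went back to study the multiprocessing module in the official Python documentation.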
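As a side note on that problem, Pool does not strictly need arguments: `apply_async` accepts a function whose `args` default to an empty tuple. The sketch below uses a hypothetical no-argument task `whoami`. It is also possible that the real cause was the nesting itself, since a function defined inside another function cannot be pickled and Pool has to pickle the function it sends to its workers.

```python
import os
from multiprocessing import Pool


def whoami():
    # Hypothetical task that takes no arguments; defined at module level so it can be pickled.
    return os.getpid()


if __name__ == '__main__':
    with Pool(2) as pool:
        # apply_async schedules a single call; args defaults to ().
        pending = [pool.apply_async(whoami) for _ in range(4)]
        pids = [r.get() for r in pending]
    print('tasks ran in worker pids:', pids)
```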

This is the introductory example given in the documentation:

A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

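This example confirms what 混沌鳄鱼 said: "If there are no arguments, don't use Pool; Pool exists to distribute the task queue given by the arguments among the processes in the pool so they can compute in parallel." p.map works like a one-to-one mapping: each of the three inputs 1, 2, 3 is handed to f in one of the pool's worker processes, and the results come back in input order. Note that Pool(5) starts five workers up front; it does not create one process per input. The output is: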

[1, 4, 9]
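To see how the inputs are spread over the workers, the mapped function can return the worker's pid alongside the result. This is a sketch built on the documentation example; `square_with_pid` is a hypothetical name.

```python
import os
from multiprocessing import Pool


def square_with_pid(x):
    # Return the result together with the pid of the worker that computed it.
    return x * x, os.getpid()


if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(square_with_pid, [1, 2, 3])
    # map() preserves input order, whichever worker handled each item.
    print([value for value, _ in results])
    print('distinct workers used:', len({pid for _, pid in results}))
```

The number of distinct workers can be anywhere from one to three here, depending on how the pool happens to hand out the tasks.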

Next is the Process class. The example given in the documentation is:

In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread.

Process creates a single process object, whereas the Pool above manages a whole batch of worker processes.

from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

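How a process is started is platform dependent, that is, the start method is implemented differently on Unix and Windows. Since I have not learned Linux yet, I only looked at Windows, where the start method is spawn: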

spawn

The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows.

fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.

forkserver

When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.

Available on Unix platforms which support passing file descriptors over Unix pipes.


That unnecessary handles are no longer inherited from the parent process is new in Python 3.4.
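Which of these start methods are available on the current machine, and which one is in use, can be queried directly. A minimal sketch:

```python
import multiprocessing as mp

if __name__ == '__main__':
    # All start methods supported on this platform; the first one is the default.
    print('available:', mp.get_all_start_methods())
    # The start method currently in effect.
    print('current  :', mp.get_start_method())
```

On Windows the list contains only spawn.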

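The documentation gives an example of setting the start method manually (of no use on Windows, of course, where spawn is the only choice):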

import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    mp.set_start_method('spawn')
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()

The start method can also be chosen through get_context(); omitted here.
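For completeness, here is a sketch of the get_context() variant, adapted from the set_start_method example above. A context object offers the same API as the multiprocessing module but carries its own start method, so the global setting is left untouched.

```python
import multiprocessing as mp


def foo(q):
    q.put('hello')


if __name__ == '__main__':
    # 'spawn' is available on every platform; the choice applies to this context only.
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

Queues and processes created from one context should be used together, since objects from different contexts may not be compatible.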

