From *Python爬虫开发与项目实践* (Python Crawler Development and Project Practice)
1. The multiprocessing module provides a Pool class that represents a pool of worker processes.
A Pool holds a specified number of processes for callers to use; the default size is the number of CPU cores. When a new request is submitted to the pool and the pool is not yet full, a new process is created to execute it. If the number of processes in the pool has already reached the configured maximum, the request waits until a process in the pool becomes free, which then handles it. The following example walks through the pool's workflow:
from multiprocessing import Pool
import os, time, random

def run_task(name):
    print('Task %s (pid = %s) is running...' % (name, os.getpid()))
    time.sleep(random.random() * 3)
    print('Task %s end.' % name)

if __name__ == '__main__':
    print('Current process %s.' % os.getpid())  # print the pid of the main process
    p = Pool(processes=3)  # create a process pool with three worker processes
    for i in range(5):
        p.apply_async(run_task, args=(i,))
        # apply is blocking: the next task starts only after the current one finishes
        # apply_async is asynchronous and non-blocking: it returns immediately and
        # lets the system schedule the tasks across the pool's processes
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')
The output looks like this:
Current process 11984.
Waiting for all subprocesses done...
Task 0 (pid = 11212) is running...
Task 1 (pid = 10736) is running...
Task 2 (pid = 6176) is running...
Task 2 end.
Task 3 (pid = 6176) is running...
Task 0 end.
Task 4 (pid = 11212) is running...
Task 1 end.
Task 4 end.
Task 3 end.
All subprocesses done.
Process finished with exit code 0
The program above creates a pool with a capacity of 3 and then submits 5 tasks to it in turn. As the output shows, although 5 tasks were added, only 3 run at first, and at most 3 processes ever run at the same time. When one task finishes, the next queued task starts, and it runs in one of the original processes rather than a new one, which you can verify from the pids.
Besides process pools, multiprocessing also provides a Queue class for passing data between processes. In the following example, two writer processes put URLs into a shared queue while one reader process takes them out:

from multiprocessing import Process, Queue
import os, time, random

# code executed by the writer processes
def proc_write(q, urls):
    print('Process {} is writing...'.format(os.getpid()))
    for url in urls:
        q.put(url)
        print('put {} to queue...'.format(url))
        time.sleep(random.random())

# code executed by the reader process
def proc_read(q):
    print('Process {} is reading...'.format(os.getpid()))
    while True:
        url = q.get(True)
        print('get {} from queue.'.format(url))

if __name__ == '__main__':
    # the parent process creates the Queue and passes it to each child
    q = Queue()
    proc_writer1 = Process(target=proc_write, args=(q, ['url_1', 'url_2', 'url_3']))
    proc_writer2 = Process(target=proc_write, args=(q, ['url_4', 'url_5', 'url_6']))
    proc_reader = Process(target=proc_read, args=(q,))
    # start the writer processes
    proc_writer1.start()
    proc_writer2.start()
    # start the reader process
    proc_reader.start()
    proc_writer1.join()
    proc_writer2.join()
    # proc_reader loops forever, so we cannot wait for it to finish; terminate it instead
    proc_reader.terminate()
The output looks like this:
Process 6316 is writing...
Process 6040 is reading...
put url_1 to queue...
get url_1 from queue.
Process 6888 is writing...
put url_4 to queue...
get url_4 from queue.
put url_2 to queue...
get url_2 from queue.
put url_5 to queue...
get url_5 from queue.
put url_3 to queue...
get url_3 from queue.
put url_6 to queue...
get url_6 from queue.
Finally, let's look at Pipe, which is commonly used for communication between two processes, one sitting at each end of the pipe.
The Pipe function returns a pair (conn1, conn2) representing the two ends of a pipe. It takes a duplex parameter: if duplex is True, the pipe works in full-duplex mode, meaning both conn1 and conn2 can send and receive; if duplex is False, conn1 can only receive messages and conn2 can only send them. The send and recv methods send and receive messages, respectively. For example, in full-duplex mode you can call conn1.send to send a message and conn1.recv to receive one. If there is no message to receive, recv blocks; if the pipe has been closed, recv raises EOFError.
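Both the duplex=False restriction and the EOFError behavior can be checked in a single process, since each end of a Pipe is usable from the process that created it. A small sketch with no child processes involved:

```python
from multiprocessing import Pipe

# with duplex=False, conn1 is receive-only and conn2 is send-only
conn1, conn2 = Pipe(duplex=False)
conn2.send('hello')
msg = conn1.recv()
print(msg)  # hello

# once the sending end is closed, recv raises EOFError
conn2.close()
try:
    conn1.recv()
    closed = False
except EOFError:
    closed = True
print('pipe closed:', closed)
```

Calling conn1.send on a duplex=False pipe would raise an OSError, since that end is read-only.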
The following example illustrates this: two child processes are created, one sending data through the pipe and the other receiving it. The code looks like this:
import multiprocessing
import os, time, random

def proc_send(pipe, urls):
    for url in urls:
        print('Process {} send: {}'.format(os.getpid(), url))
        pipe.send(url)
        time.sleep(random.random())

def proc_recv(pipe):
    while True:
        print('Process {} recv: {}'.format(os.getpid(), pipe.recv()))
        time.sleep(random.random())

if __name__ == '__main__':
    pipe = multiprocessing.Pipe()
    p1 = multiprocessing.Process(target=proc_send,
                                 args=(pipe[0], ['url_' + str(i) for i in range(10)]))
    p2 = multiprocessing.Process(target=proc_recv, args=(pipe[1],))
    p1.start()
    p2.start()
    p1.join()
    # proc_recv loops forever, so we cannot join it; give it a moment to
    # drain the pipe, then terminate it
    time.sleep(1)
    p2.terminate()
The output looks like this:
Process 8160 send: url_0
Process 5256 recv: url_0
Process 8160 send: url_1
Process 5256 recv: url_1
Process 8160 send: url_2
Process 5256 recv: url_2
Process 8160 send: url_3
Process 5256 recv: url_3
Process 8160 send: url_4
Process 5256 recv: url_4
Process 8160 send: url_5
Process 5256 recv: url_5
Process 8160 send: url_6
Process 8160 send: url_7
Process 5256 recv: url_6
Process 8160 send: url_8
Process 8160 send: url_9
Process 5256 recv: url_7
Process 5256 recv: url_8
Process 5256 recv: url_9