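Processes and threads are essential concepts in crawler development: both improving a crawler's efficiency and building a distributed crawler depend on them. The relevant techniques include multithreading, multiprocessing, coroutines, and distributed processes.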
Python offers two main ways to run multiple processes: the fork method in the os module, and the multiprocessing module.
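The first approach, os.fork, can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: it assumes a Unix-like system (os.fork is not available on Windows), and the helper names fork_child and fork_and_wait are hypothetical, introduced only for this example.

```python
import os
import sys


def fork_child(name):
    """Fork once (hypothetical helper); return the child's PID to the parent."""
    sys.stdout.flush()  # avoid duplicating the parent's buffered output in the child
    pid = os.fork()     # returns 0 in the child and the child's PID in the parent
    if pid == 0:
        try:
            print('Child process %s (%s) Running ...' % (name, os.getpid()), flush=True)
        finally:
            os._exit(0)  # leave the child at once so it never runs the parent's code
    return pid


def fork_and_wait(count):
    """Fork `count` children (hypothetical helper), then wait for all of them."""
    pids = [fork_child(str(i)) for i in range(count)]
    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)             # block until this child exits
        exit_codes.append(os.WEXITSTATUS(status))  # extract the exit code
    return pids, exit_codes


if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    fork_and_wait(5)
    print('Process end.')
```

The key point is that os.fork returns twice: once in the parent (with the child's PID) and once in the child (with 0), so the code branches on the return value. The multiprocessing module hides this detail and also works on Windows, which is why it is usually the more portable choice.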
import os
from multiprocessing import Process

# Code executed by each child process
def run_proc(name):
    print('Child process %s (%s) Running ...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    processes = []
    for i in range(5):
        p = Process(target=run_proc, args=(str(i),))
        print("Process will start")
        p.start()
        processes.append(p)
    # Wait for every child, not only the last one started
    for p in processes:
        p.join()
    print("Process end.")
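The multiprocessing code above produces the following output (the process IDs and the order of the child lines vary from run to run):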
Parent process 9892.
Process will start
Process will start
Process will start
Process will start
Process will start
Child process 0 (13208) Running ...
Child process 1 (19416) Running ...
Child process 2 (10016) Running ...
Child process 4 (17164) Running ...
Child process 3 (3492) Running ...
Process end.