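Python Crawler: multiprocessing Library in Practice (Process Simulation)

Python Crawler (Part 11)

Notes and takeaways from learning to write Python crawlers, organized so that I can look things up later; I also hope to exchange ideas with everyone.

multiprocessing library in practice: process simulation

1. Creating multiple processes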

import multiprocessing
import time

def worker_1(interval):
    print ("worker_1")
    time.sleep(interval)
    print ("end worker_1")

def worker_2(interval):
    print ("worker_2")
    time.sleep(interval)
    print ("end worker_2")

def worker_3(interval):
    print ("worker_3")
    time.sleep(interval)
    print ("end worker_3")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target = worker_1, args = (2,))
    p2 = multiprocessing.Process(target = worker_2, args = (3,))
    p3 = multiprocessing.Process(target = worker_3, args = (4,))

    p1.start()
    p2.start()
    p3.start()

    print("The number of CPU is:" + str(multiprocessing.cpu_count()))
    for p in multiprocessing.active_children():
        print("child   p.name:" + p.name + "\tp.pid:" + str(p.pid))
    print ("END!!!!!!!!!!!!!!!!!")

The result is shown in the figure:
[Figure: multiprocessing library in practice, process simulation]
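
The three workers above differ only in their name and sleep time, so the same experiment can also be written with one parameterized function and a loop. The following is a minimal sketch rather than the code used for the screenshot: `worker(name, interval)` and `build_processes` are hypothetical helpers introduced here, and the short intervals are arbitrary. It also calls join() so that the parent waits for every child before printing the exit codes.

```python
import multiprocessing
import time


def worker(name, interval):
    """Print a start message, sleep for `interval` seconds, then print an end message."""
    print("{0} start".format(name))
    time.sleep(interval)
    print("{0} end".format(name))


def build_processes(intervals):
    """Create (but do not start) one Process per sleep interval."""
    processes = []
    for index, interval in enumerate(intervals, start=1):
        name = "worker_{0}".format(index)
        p = multiprocessing.Process(target=worker, name=name, args=(name, interval))
        processes.append(p)
    return processes


if __name__ == "__main__":
    processes = build_processes([0.2, 0.3, 0.4])
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # wait for this child to finish
        print("{0} exited with code {1}".format(p.name, p.exitcode))
```

Starting all processes first and joining them afterwards keeps the children running concurrently; joining inside the first loop would run them one after another.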

2. The daemon attribute (child process as a daemon process)

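When the script is run from the Python 3 IDLE on Windows, the child process appears not to run (its output never shows up). One explanation found online for this problem is: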

Well, IDLE is a strange thing. In order to “capture” everything what you write using print statements or sys.stdout.write, IDLE “overrides” sys.stdout and replaces it with an object that passes everything back to IDLE so it can print it. I guess when you are starting a new process from multiprocessing, this hackery is not inherited by the child process, therefore you don’t see anything in IDLE. But I’m just guessing here, I don’t have a Windows machine at the moment to check it. – Tamás May 6 '10 at 9:10

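In other words, because of Windows security mechanisms and the way IDLE is designed, there is no fix for this inside IDLE; the code only runs properly from the command line. So in this section we run everything from the command line.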
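
If the missing output really is caused by the child's stdout not reaching IDLE, as the quoted comment guesses, one possible workaround is to let the child send its messages back and have the parent print them. This is only a sketch under that assumption and has not been checked in IDLE here; the `outbox` parameter and the `None` sentinel are conventions introduced for this example.

```python
import multiprocessing
import time


def worker(outbox, interval):
    """Send status messages to the parent instead of printing them in the child."""
    outbox.put("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    outbox.put("work end:{0}".format(time.ctime()))
    outbox.put(None)  # sentinel: tells the parent there is nothing more to read


if __name__ == "__main__":
    messages = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(messages, 1))
    p.start()
    # The parent does the printing, so the text goes through the parent's stdout.
    for message in iter(messages.get, None):
        print(message)
    p.join()
```

The queue is drained before join() is called, so the parent never waits on a child that is still trying to hand over its messages.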

When the child process is not set as a daemon process:

import multiprocessing
import time


# without daemon
def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))

if __name__ == "__main__":
    p = multiprocessing.Process(target = worker, args = (3,))
    p.start()
    print("end!")

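The result is shown in the figure:
[Figure: multiprocessing library in practice, process simulation]
We can see that the parent process's code finishes first ("end!" is printed), and the child process still runs to completion afterwards.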
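
The child still finishes here because it is not a daemon: when the parent exits, multiprocessing waits for its non-daemonic children. A small sketch of how this could be inspected follows; `make_process` is a hypothetical helper, and the one-second sleep is arbitrary.

```python
import multiprocessing
import time


def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))


def make_process(interval):
    """Create (but do not start) a child without touching its daemon flag."""
    return multiprocessing.Process(target=worker, args=(interval,))


if __name__ == "__main__":
    p = make_process(1)
    # The flag is inherited from the creating process; for a normal script it is False.
    print("daemon flag: {0}".format(p.daemon))
    p.start()
    print("children still running: {0}".format(
        [child.name for child in multiprocessing.active_children()]))
    print("end!")
    # No join() here: on exit, multiprocessing waits for non-daemonic children,
    # so "work end" is still printed after "end!".
```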
Now let's set the child process as a daemon process:

import multiprocessing
import time

# with daemon
def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))

if __name__ == "__main__":
    p = multiprocessing.Process(target = worker, args = (3,))
    p.daemon = True
    p.start()
    print("end!")

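The result is shown in the figure:
[Figure: multiprocessing library in practice, process simulation]
As we can see, when the child is a daemon process, it no longer runs once the parent process has exited. We can also make the parent wait until the daemon process has finished: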

import multiprocessing
import time

# let the daemon process run to completion before the parent exits
def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))

if __name__ == "__main__":
    p = multiprocessing.Process(target = worker, args = (3,))
    p.daemon = True
    p.start()
    p.join()
    print("end!")

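The result is shown in the figure:
[Figure: multiprocessing library in practice, process simulation]
p.join() makes the parent process wait until the child process has finished before it continues.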
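
join() also accepts an optional timeout in seconds, and the daemon flag can be passed as a keyword argument to Process (Python 3.3 and later) instead of being assigned before start(). The sketch below combines the two; `describe` is a hypothetical helper written for this example, and the timings are arbitrary.

```python
import multiprocessing
import time


def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))


def describe(p):
    """Summarize a process's state without blocking."""
    if p.is_alive():
        return "{0} is still running".format(p.name)
    if p.exitcode is None:
        return "{0} has not been started".format(p.name)
    return "{0} finished with exit code {1}".format(p.name, p.exitcode)


if __name__ == "__main__":
    # daemon=True here has the same effect as setting p.daemon = True before start().
    p = multiprocessing.Process(target=worker, name="sleeper", args=(1,), daemon=True)
    print(describe(p))
    p.start()
    p.join(0.2)  # wait at most 0.2 s; the child needs about 1 s
    print(describe(p))
    p.join()     # no timeout: block until the child exits
    print(describe(p))
    print("end!")
```

join() with a timeout returns whether or not the child has finished, so the state has to be checked afterwards with is_alive() or exitcode.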

3. Running multiple tasks with a process pool

import multiprocessing
import os, time, random

def Lee():
    print ("\nRun task Lee-%s" %(os.getpid())) # os.getpid() returns the ID of the current process
    start = time.time()
    time.sleep(random.random() * 10) # random.random() returns a random float in [0, 1)
    end = time.time()
    print ('Task Lee runs %0.2f seconds.' %(end - start))

def Marlon():
    print ("\nRun task Marlon-%s" %(os.getpid()))
    start = time.time()
    time.sleep(random.random() * 40)
    end=time.time()
    print ('Task Marlon runs %0.2f seconds.' %(end - start))

def Allen():
    print ("\nRun task Allen-%s" %(os.getpid()))
    start = time.time()
    time.sleep(random.random() * 30)
    end = time.time()
    print ('Task Allen runs %0.2f seconds.' %(end - start))

def Frank():
    print ("\nRun task Frank-%s" %(os.getpid()))
    start = time.time()
    time.sleep(random.random() * 20)
    end = time.time()
    print ('Task Frank runs %0.2f seconds.' %(end - start))

if __name__=='__main__':
    function_list=  [Lee, Marlon, Allen, Frank]
    print ("parent process %s" %(os.getpid()))

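    pool = multiprocessing.Pool(2)
    for func in function_list:
        # apply_async submits the task without blocking; with 2 workers, a queued
        # task starts as soon as one of the workers becomes free
        pool.apply_async(func)

    print ('Waiting for all subprocesses done...')
    # close() must be called before join(), otherwise an error is raised;
    # after close() no new tasks can be submitted, and join() waits for all worker processes to finish
    pool.close()
    pool.join()
    print ('All subprocesses done.')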

The result is shown in the figure:
[Figure: multiprocessing library in practice, process simulation]
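
apply_async also returns an AsyncResult object, so tasks can hand values back to the parent instead of only printing them. The following is a sketch under a few assumptions: `run_task` is a hypothetical function that replaces the four named functions, and the sleep limits are scaled down so the example finishes quickly.

```python
import multiprocessing
import os
import random
import time


def run_task(name, max_seconds):
    """Sleep for a random time up to `max_seconds` and report what happened."""
    start = time.time()
    time.sleep(random.random() * max_seconds)
    return name, os.getpid(), time.time() - start


if __name__ == "__main__":
    tasks = [("Lee", 0.1), ("Marlon", 0.4), ("Allen", 0.3), ("Frank", 0.2)]
    with multiprocessing.Pool(2) as pool:
        pending = [pool.apply_async(run_task, args=task) for task in tasks]
        # get() blocks until that task's result is available.
        for result in pending:
            name, pid, elapsed = result.get()
            print("Task %s ran in process %s for %0.2f seconds." % (name, pid, elapsed))
    print("All subprocesses done.")
```

The results are collected inside the with block because leaving the block terminates the pool; with only two workers, the printed process IDs should show the same worker processes being reused across the four tasks.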
