Distributed Processes

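A distributed process setup spreads Process instances across multiple machines so that their combined capacity can be used to complete a complex task. This approach applies directly to building a distributed crawler.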

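Here is a simple example. The server process puts tasks into task_queue and exposes both queues through a registered interface. The worker process calls the same interface, executes the tasks, and writes the results to result_queue.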
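
Before moving to the networked version, the task/result flow described above can be sketched inside a single process. This is only an illustration under stated assumptions: the names worker and run_demo are hypothetical and not part of the original scripts, and a thread stands in for the remote worker process.

```python
import queue
import threading

def worker(task_queue, result_queue):
    # Drain the task queue and report one result per task.
    while True:
        try:
            url = task_queue.get_nowait()
        except queue.Empty:
            break  # no tasks left
        result_queue.put('%s----->success' % url)

def run_demo(n=3):
    task_queue = queue.Queue()
    result_queue = queue.Queue()
    # The "server" side: load all tasks before the worker starts.
    for i in range(n):
        task_queue.put('ImageUrl_%d' % i)
    # The "worker" side: a thread standing in for a remote process.
    t = threading.Thread(target=worker, args=(task_queue, result_queue))
    t.start()
    t.join()
    return [result_queue.get() for _ in range(n)]

if __name__ == '__main__':
    for line in run_demo():
        print('result is %s' % line)
```

The two scripts that follow keep the same two-queue shape and replace the shared in-memory queues with queues served over the network by a BaseManager.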

taskManager.py: server process

from multiprocessing.managers import BaseManager
from multiprocessing import freeze_support
import queue

# Number of tasks
task_number = 10
# Queues for sending tasks and receiving results
task_queue = queue.Queue(task_number)
result_queue = queue.Queue(task_number)

def get_task():
    return task_queue

def get_result():
    return result_queue

# Manager subclass that exposes the two queues over the network
class QueueManager(BaseManager):
    pass

def win_run():
    # Register the functions that hand out the queues
    QueueManager.register('get_task_queue', callable=get_task)
    QueueManager.register('get_result_queue', callable=get_result)
    # authkey must be bytes in Python 3
    manager = QueueManager(address=('127.0.0.1', 8001), authkey=b'qiye')
    manager.start()
    try:
        task = manager.get_task_queue()
        result = manager.get_result_queue()
        for url in ["ImageUrl_" + str(i) for i in range(task_number)]:
            print('put task %s....' % url)
            task.put(url)
        print('try get result....')
        for i in range(task_number):
            print('result is %s' % result.get(timeout=10))
    except Exception as e:
        # Includes queue.Empty when no result arrives within 10 seconds
        print('Manager error: %r' % e)
    finally:
        manager.shutdown()


if __name__ == '__main__':
    # Needed on Windows when the script is frozen into an executable
    freeze_support()
    win_run()

taskWorker.py: worker process

import queue
import time
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

# Register only the names; the callables live in the server process
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

server_addr = '127.0.0.1'
print('Connect to server %s..' % server_addr)
# Port and authkey must match taskManager.py; authkey must be bytes in Python 3
m = QueueManager(address=(server_addr, 8001), authkey=b'qiye')
m.connect()
task = m.get_task_queue()
result = m.get_result_queue()

while not task.empty():
    try:
        image_url = task.get(True, timeout=5)
    except queue.Empty:
        # Another worker took the last task after the empty() check
        break
    print('run task download %s ....' % image_url)
    time.sleep(1)
    result.put('%s----->success' % image_url)

print('work exit.')

Run the server process first. The tasks are put into task_queue:

put task ImageUrl_0....
put task ImageUrl_1....
put task ImageUrl_2....
put task ImageUrl_3....
put task ImageUrl_4....
put task ImageUrl_5....
put task ImageUrl_6....
put task ImageUrl_7....
put task ImageUrl_8....
put task ImageUrl_9....
try get result....

While the server process is still running, start the worker process:

Connect to server 127.0.0.1..
run task download ImageUrl_0 ....
run task download ImageUrl_1 ....
run task download ImageUrl_2 ....
run task download ImageUrl_3 ....
run task download ImageUrl_4 ....
run task download ImageUrl_5 ....
run task download ImageUrl_6 ....
run task download ImageUrl_7 ....
run task download ImageUrl_8 ....
run task download ImageUrl_9 ....
work exit.

Once the worker process finishes, the server process output shows that the results were written to result_queue:

result is ImageUrl_0----->success
result is ImageUrl_1----->success
result is ImageUrl_2----->success
result is ImageUrl_3----->success
result is ImageUrl_4----->success
result is ImageUrl_5----->success
result is ImageUrl_6----->success
result is ImageUrl_7----->success
result is ImageUrl_8----->success
result is ImageUrl_9----->success
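
The same register/connect round trip can also be checked from a single script, which is handy as a smoke test before running the two programs separately. The sketch below is an assumption-laden variant and not the original scripts: it serves the manager from a background thread via get_server().serve_forever() instead of manager.start(), binds to port 0 so the operating system picks a free port, and uses the hypothetical helper names start_server, run_worker, and round_trip.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()
result_queue = queue.Queue()

def get_task():
    return task_queue

def get_result():
    return result_queue

class QueueManager(BaseManager):
    pass

QueueManager.register('get_task_queue', callable=get_task)
QueueManager.register('get_result_queue', callable=get_result)

def start_server(authkey=b'qiye'):
    # Port 0 lets the OS choose a free port (test convenience only).
    manager = QueueManager(address=('127.0.0.1', 0), authkey=authkey)
    server = manager.get_server()
    # Serve from a daemon thread so the script can continue and exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address

def run_worker(address, authkey=b'qiye'):
    # Connect like taskWorker.py does and process every queued task.
    m = QueueManager(address=address, authkey=authkey)
    m.connect()
    task = m.get_task_queue()
    result = m.get_result_queue()
    done = 0
    while True:
        try:
            url = task.get_nowait()
        except queue.Empty:
            break
        result.put('%s----->success' % url)
        done += 1
    return done

def round_trip(n=3):
    address = start_server()
    # The server thread shares these queue objects, so put directly.
    for i in range(n):
        task_queue.put('ImageUrl_%d' % i)
    run_worker(address)
    return [result_queue.get(timeout=5) for _ in range(n)]

if __name__ == '__main__':
    for line in round_trip():
        print('result is %s' % line)
```

To run workers on other machines, the server would need to bind to an address those machines can reach (for example the server's LAN IP rather than 127.0.0.1), and each worker would set server_addr to that address with the same port and authkey.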
