A Socket-Based Multi-Process Distributed Computing Demo in Python

Preface

This small socket-based distributed computing demo came out of studying multiprocessing.managers. What it does: the master generates the integers 0-19 and puts them into a task queue; the slaves on the network pull numbers from the task queue, sum them, and put the partial sums into a result queue; the master then prints the values it reads from the result queue.

Test

  1. The demo runs locally, simulating a distributed environment with multiple processes. To run it across different machines, change the loopback IP in client.py to the IP of the server/master machine (see the snippet after this list).
  2. Run server.py first, then run client.py, client_2.py, client_3.py, and so on. You can also start the clients before server.py, since the client keeps retrying the connection.
    All client-side scripts run the same code.
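
For a cross-machine run, only the connection constants at the bottom of client.py need to change. A minimal sketch, where 192.168.1.100 is a placeholder for the master machine's actual IP:

# client.py -- connection settings for a cross-machine run (placeholder IP)
host = '192.168.1.100'   # IP of the machine running server.py (was '127.0.0.1')
port = 5000              # must match the port server.py binds to
authkey = b'abc'         # must match the server's authkey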

1. Master / Server

# server.py
# -*- coding:utf-8 -*-

# Multi-process distributed computing demo
# Server side (master)
# How it works: the managers module exposes the Queues over the network so that
# processes on other machines can access them. The server process creates the
# Queues, registers them on the network, and then writes tasks into the task queue.

import Queue as queue
from multiprocessing.managers import BaseManager
import time
from jc import utils		# private utility library; can be replaced with the logging module

# initialize a custom logger
mlog = utils.my_logger("Server")

# queue for sending tasks to the workers
task_queue = queue.Queue()
# queue for receiving results from the workers
result_queue = queue.Queue()


# Use module-level functions instead of lambdas: in Python 2.7, pickle cannot
# serialize a lambda, so a lambda cannot be registered as the manager callable.
def get_task_queue():
    global task_queue
    return task_queue


def get_result_queue():
    global result_queue
    return result_queue


def startManager(host, port, authkey):
    # Register both Queues on the network; the callable argument binds each Queue.
    # Note: pass the function object itself, without parentheses.
    BaseManager.register('get_task_queue', callable=get_task_queue)
    BaseManager.register('get_result_queue', callable=get_result_queue)
    # bind the host and port, and set the authkey
    manager = BaseManager(address=(host, port), authkey=authkey)
    # start the manager server
    manager.start()
    return manager


def put_queue(manager, objs):
    # access the task queue through its network proxy
    task = manager.get_task_queue()
    for obj in objs:
        try:
            mlog.info("Put obj:{}".format(obj))
            task.put(obj)
            time.sleep(1)
        except queue.Full:
            mlog.info("put_queue: task queue full, exiting")
            break

def get_result(manager):
    # access the result queue through its network proxy
    result = manager.get_result_queue()
    while 1:
        try:
            n = result.get(timeout=10)
            mlog.info("Result get {}".format(n))
            time.sleep(1)
        except queue.Empty:
            mlog.info("get_result result empty...retring")
            continue


if __name__ == "__main__":

    host = '127.0.0.1'
    port = 5000
    authkey = b'abc'
    # start the manager server
    manager = startManager(host, port, authkey)

    # the data: integers 0-19
    data = range(0, 20)

    # put the data into the task queue
    put_queue(manager, data)

    # print results as the workers report them (loops forever)
    get_result(manager)

    # shut down the manager server
    # (never reached in this demo, because get_result() loops forever)
    manager.shutdown()
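
The demo targets Python 2.7 (note the Queue import and the interpreter path in the run log below). For readers on Python 3, here is a minimal sketch of the same server; treat it as an untested assumption. The main differences are the lowercase queue module and a stop condition on the result loop, while the BaseManager registration is unchanged.

# server_py3.py -- minimal Python 3 sketch of the same server (assumption, untested)
import queue
from multiprocessing.managers import BaseManager

# the queues live in the server process; other machines reach them via proxies
task_queue = queue.Queue()
result_queue = queue.Queue()


# module-level functions (not lambdas) so the manager can pickle them
def get_task_queue():
    return task_queue


def get_result_queue():
    return result_queue


if __name__ == "__main__":
    BaseManager.register('get_task_queue', callable=get_task_queue)
    BaseManager.register('get_result_queue', callable=get_result_queue)
    manager = BaseManager(address=('127.0.0.1', 5000), authkey=b'abc')
    manager.start()

    # push the tasks through the proxy, never through task_queue directly
    task = manager.get_task_queue()
    for i in range(20):
        task.put(i)

    # collect partial sums until every queued number is accounted for (0+...+19 = 190)
    result = manager.get_result_queue()
    total = 0
    while total < sum(range(20)):
        total += result.get()       # blocks until a worker reports a partial sum
        print('Running total:', total)

    manager.shutdown()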

2. Slave / Client

# client.py
# -*- coding:utf-8 -*-

# In a distributed multi-process setup, tasks must not be added to the original
# task_queue directly -- that would bypass the QueueManager wrapper. Always go
# through the Queue proxy returned by manager.get_task_queue().

import Queue as queue
import time
from multiprocessing.managers import BaseManager
from jc import utils		# private utility library; can be replaced with the logging module

# initialize a custom logger
mlog = utils.my_logger("Client")

# local buffer: each worker sums up to 3 task numbers at a time
cal_queue = queue.Queue(3)

def start_worker(host, port, authkey):
    # This BaseManager only fetches the Queues over the network,
    # so register the names only, without a callable.
    BaseManager.register('get_task_queue')
    BaseManager.register('get_result_queue')
    mlog.info('Connect to server %s' % host)
    # The port and authkey must match the manager server's settings exactly.
    worker = BaseManager(address=(host, port), authkey=authkey)
    # connect to the manager server, retrying on failure
    try:
        worker.connect()
    except Exception as e:
        mlog.exception(e)
        mlog.info("Trying to reconnect...")
        time.sleep(1)
        return start_worker(host, port, authkey)
    else:
        mlog.info('Connecting server %s' % host)
        return worker

def get_queue(worker):
    if not worker:
        mlog.info("worker is None, exit")
        return

    task = worker.get_task_queue()
    result = worker.get_result_queue()
    # Pull numbers from the task queue, buffer them locally in cal_queue,
    # and push the partial sums into the result queue.

    tag = 0
    while 1:
        tag = tag + 1
        time.sleep(1)
        # Flush the local buffer when it is full, or when a few iterations
        # have passed and it already holds something.
        if cal_queue.full() or (tag > 3 and not cal_queue.empty()):
            cal_sum = 0
            while not cal_queue.empty():
                cal_sum += cal_queue.get()
            result.put(cal_sum)
            mlog.info('result put %d' % cal_sum)
            tag = 0
        try:
            n = task.get(timeout=10)
            mlog.info('worker get %d' % n)
            cal_queue.put(n)
        except queue.Empty:
            mlog.info("get_queue task empty...retring")
            continue
        except queue.Full:
            mlog.info("cal_queue full...waiting")
            continue



if __name__ == "__main__":
    host = '127.0.0.1'
    port = 5000
    authkey = b'abc'

    # start the worker and connect to the manager server
    worker = start_worker(host, port, authkey)
    # process the task and result queues
    get_queue(worker)
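
A matching Python 3 worker sketch, under the same assumptions as the server sketch above (untested; it forwards each number as its own "partial sum" instead of batching three at a time):

# client_py3.py -- minimal Python 3 sketch of a worker (assumption, untested)
import queue
import time
from multiprocessing.managers import BaseManager

if __name__ == "__main__":
    # register the names only; the actual queues live in the server process
    BaseManager.register('get_task_queue')
    BaseManager.register('get_result_queue')
    worker = BaseManager(address=('127.0.0.1', 5000), authkey=b'abc')
    worker.connect()

    task = worker.get_task_queue()
    result = worker.get_result_queue()
    while True:
        try:
            n = task.get(timeout=10)
        except queue.Empty:
            print('task queue empty, retrying')
            continue
        print('worker get', n)
        result.put(n)        # no local batching in this sketch
        time.sleep(1)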

Run Log

Master 1x + Slave 2x
The results the master gets are the slaves' partial sums; they are not aggregated into the total of 0+1+...+19. This is expected. (The partial sums 6, 9, 27, 25, 40, 46 and 37 do add up to 190 = 0+1+...+19.)
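
If the master should aggregate, a small variant of get_result() in server.py can keep a running total and stop once it reaches 190. A sketch only, reusing server.py's queue import and mlog, and assuming exactly the numbers 0-19 were queued:

# sketch: replace the endless get_result() loop with an aggregating version
def get_result_total(manager, expected_total=sum(range(20))):   # 190 for this demo
    result = manager.get_result_queue()
    total = 0
    while total < expected_total:
        try:
            total += result.get(timeout=10)
        except queue.Empty:
            mlog.info("get_result_total: result queue empty, retrying")
            continue
    mlog.info("Total: {}".format(total))
    return total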

  1. Server:
/usr/bin/python2.7 /Users/gdlocal1/PycharmProjects/test/test.py
2019-08-27 18:36:23,439 - Server:put_queue - INFO - Put obj:0 
2019-08-27 18:36:24,444 - Server:put_queue - INFO - Put obj:1 
2019-08-27 18:36:25,448 - Server:put_queue - INFO - Put obj:2 
2019-08-27 18:36:26,453 - Server:put_queue - INFO - Put obj:3 
2019-08-27 18:36:27,457 - Server:put_queue - INFO - Put obj:4 
2019-08-27 18:36:28,461 - Server:put_queue - INFO - Put obj:5 
2019-08-27 18:36:29,466 - Server:put_queue - INFO - Put obj:6 
2019-08-27 18:36:30,467 - Server:put_queue - INFO - Put obj:7 
2019-08-27 18:36:31,471 - Server:put_queue - INFO - Put obj:8 
2019-08-27 18:36:32,476 - Server:put_queue - INFO - Put obj:9 
2019-08-27 18:36:33,479 - Server:put_queue - INFO - Put obj:10 
2019-08-27 18:36:34,484 - Server:put_queue - INFO - Put obj:11 
2019-08-27 18:36:35,488 - Server:put_queue - INFO - Put obj:12 
2019-08-27 18:36:36,492 - Server:put_queue - INFO - Put obj:13 
2019-08-27 18:36:37,497 - Server:put_queue - INFO - Put obj:14 
2019-08-27 18:36:38,497 - Server:put_queue - INFO - Put obj:15 
2019-08-27 18:36:39,502 - Server:put_queue - INFO - Put obj:16 
2019-08-27 18:36:40,507 - Server:put_queue - INFO - Put obj:17 
2019-08-27 18:36:41,509 - Server:put_queue - INFO - Put obj:18 
2019-08-27 18:36:42,512 - Server:put_queue - INFO - Put obj:19 
2019-08-27 18:36:43,518 - Server:get_result - INFO - Result get 6 
2019-08-27 18:36:44,521 - Server:get_result - INFO - Result get 9 
2019-08-27 18:36:45,525 - Server:get_result - INFO - Result get 27 
2019-08-27 18:36:46,528 - Server:get_result - INFO - Result get 25 
2019-08-27 18:36:47,533 - Server:get_result - INFO - Result get 40 
2019-08-27 18:36:48,537 - Server:get_result - INFO - Result get 46 
2019-08-27 18:36:59,545 - Server:get_result - INFO - get_result result empty...retring 
2019-08-27 18:37:05,564 - Server:get_result - INFO - Result get 37 
2019-08-27 18:37:16,569 - Server:get_result - INFO - get_result result empty...retring 
  2. Client 1:
2019-08-27 18:36:20,573 - Client_1:start_worker - INFO - Connect to server 127.0.0.1 
2019-08-27 18:36:23,504 - Client_1:start_worker - INFO - Connecting server 127.0.0.1 
2019-08-27 18:36:24,514 - Client_1:get_queue - INFO - worker get 0 
2019-08-27 18:36:25,517 - Client_1:get_queue - INFO - worker get 2 
2019-08-27 18:36:27,505 - Client_1:get_queue - INFO - worker get 4 
2019-08-27 18:36:28,507 - Client_1:get_queue - INFO - result put 6 
2019-08-27 18:36:29,485 - Client_1:get_queue - INFO - worker get 6 
2019-08-27 18:36:30,489 - Client_1:get_queue - INFO - worker get 7 
2019-08-27 18:36:35,501 - Client_1:get_queue - INFO - worker get 12 
2019-08-27 18:36:36,506 - Client_1:get_queue - INFO - result put 25 
2019-08-27 18:36:36,507 - Client_1:get_queue - INFO - worker get 13 
2019-08-27 18:36:39,508 - Client_1:get_queue - INFO - worker get 16 
2019-08-27 18:36:40,513 - Client_1:get_queue - INFO - worker get 17 
2019-08-27 18:36:41,517 - Client_1:get_queue - INFO - result put 46 
2019-08-27 18:36:41,518 - Client_1:get_queue - INFO - worker get 18 
2019-08-27 18:36:42,519 - Client_1:get_queue - INFO - worker get 19 
2019-08-27 18:36:53,524 - Client_1:get_queue - INFO - get_queue task empty...retring 
2019-08-27 18:37:04,533 - Client_1:get_queue - INFO - get_queue task empty...retring 
2019-08-27 18:37:05,538 - Client_1:get_queue - INFO - result put 37 
2019-08-27 18:37:15,541 - Client_1:get_queue - INFO - get_queue task empty...retring 
2019-08-27 18:37:26,547 - Client_1:get_queue - INFO - get_queue task empty...retring 
  3. Client 2:
2019-08-27 18:36:16,474 - Client_2:start_worker - INFO - Connect to server 127.0.0.1 
2019-08-27 18:36:23,516 - Client_2:start_worker - INFO - Connecting server 127.0.0.1 
2019-08-27 18:36:24,526 - Client_2:get_queue - INFO - worker get 1 
2019-08-27 18:36:26,508 - Client_2:get_queue - INFO - worker get 3 
2019-08-27 18:36:28,482 - Client_2:get_queue - INFO - worker get 5 
2019-08-27 18:36:29,487 - Client_2:get_queue - INFO - result put 9 
2019-08-27 18:36:31,483 - Client_2:get_queue - INFO - worker get 8 
2019-08-27 18:36:32,484 - Client_2:get_queue - INFO - worker get 9 
2019-08-27 18:36:33,485 - Client_2:get_queue - INFO - worker get 10 
2019-08-27 18:36:34,486 - Client_2:get_queue - INFO - result put 27 
2019-08-27 18:36:34,486 - Client_2:get_queue - INFO - worker get 11 
2019-08-27 18:36:37,503 - Client_2:get_queue - INFO - worker get 14 
2019-08-27 18:36:38,505 - Client_2:get_queue - INFO - worker get 15 
2019-08-27 18:36:39,509 - Client_2:get_queue - INFO - result put 40 
2019-08-27 18:36:49,513 - Client_2:get_queue - INFO - get_queue task empty...retring 

GitLab

The code is maintained on GitLab:

https://gitlab.com/cyril_j/mutils/tree/master/Python/Distributed_Computer_demo

References

  1. Socket-based Python distributed computing: communication between multiple servers (基于socket的python分布式运算中多服务器间的通信问题)
  2. Distributed Process (分布式进程) - Liao Xuefeng's Python tutorial