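Distributed Processing
In the next example we use ZMQ for a small supercomputing application, that is, a parallel processing model:
A task ventilator distributes a large number of tasks that can be processed in parallel;
A set of workers processes these tasks;
A result collector, the sink, receives the results from all workers at the far end and aggregates them.
In practice, the workers might be spread across different machines and use GPUs (graphics processing units) for the heavy computation. Below is the code for the task ventilator. It generates 100 tasks, each of which tells the worker that receives it to sleep for a given number of milliseconds.
taskvent: Parallel task ventilator in Java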
package guide;

import java.util.Random;

import org.zeromq.ZMQ;

//
//  Task ventilator in Java
//  Binds PUSH socket to tcp://localhost:5557
//  Sends batch of tasks to workers via that socket
//
public class taskvent {

    public static void main (String[] args) throws Exception {
        ZMQ.Context context = ZMQ.context(1);

        //  Socket to send messages on
        ZMQ.Socket sender = context.socket(ZMQ.PUSH);
        sender.bind("tcp://*:5557");

        //  Socket to send the start-of-batch message on
        ZMQ.Socket sink = context.socket(ZMQ.PUSH);
        sink.connect("tcp://localhost:5558");

        System.out.println("Press Enter when the workers are ready: ");
        System.in.read();
        System.out.println("Sending tasks to workers\n");

        //  The first message is "0" and signals start of batch
        sink.send("0", 0);

        //  Initialize random number generator
        Random srandom = new Random(System.currentTimeMillis());

        //  Send 100 tasks
        int task_nbr;
        int total_msec = 0;     //  Total expected cost in msecs
        for (task_nbr = 0; task_nbr < 100; task_nbr++) {
            int workload;
            //  Random workload from 1 to 100 msecs
            workload = srandom.nextInt(100) + 1;
            total_msec += workload;
            System.out.print(workload + ".");
            String string = String.format("%d", workload);
            sender.send(string, 0);
        }
        System.out.println("Total expected cost: " + total_msec + " msec");
        Thread.sleep(1000);     //  Give 0MQ time to deliver

        sink.close();
        sender.close();
        context.term();
    }
}
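Below is the worker code. It receives a message, sleeps for the specified number of milliseconds, and then signals that it has finished:
taskwork: Parallel task worker in Java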
package guide;

import org.zeromq.ZMQ;

//
//  Task worker in Java
//  Connects PULL socket to tcp://localhost:5557
//  Collects workloads from ventilator via that socket
//  Connects PUSH socket to tcp://localhost:5558
//  Sends results to sink via that socket
//
public class taskwork {

    public static void main (String[] args) throws Exception {
        ZMQ.Context context = ZMQ.context(1);

        //  Socket to receive messages on
        ZMQ.Socket receiver = context.socket(ZMQ.PULL);
        receiver.connect("tcp://localhost:5557");

        //  Socket to send messages to
        ZMQ.Socket sender = context.socket(ZMQ.PUSH);
        sender.connect("tcp://localhost:5558");

        //  Process tasks forever
        while (!Thread.currentThread().isInterrupted()) {
            String string = new String(receiver.recv(0), ZMQ.CHARSET).trim();
            long msec = Long.parseLong(string);

            //  Simple progress indicator for the viewer;
            //  flush after printing so the dot appears immediately
            System.out.print(string + '.');
            System.out.flush();

            //  Do the work
            Thread.sleep(msec);

            //  Send results to sink (an empty message is enough)
            sender.send(ZMQ.MESSAGE_SEPARATOR, 0);
        }
        sender.close();
        receiver.close();
        context.term();
    }
}
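Below is the code for the sink. It collects the 100 results and measures the total elapsed time, which lets us check whether the tasks really ran in parallel.
tasksink: Parallel task sink in Java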
package guide;

import org.zeromq.ZMQ;

//
//  Task sink in Java
//  Binds PULL socket to tcp://localhost:5558
//  Collects results from workers via that socket
//
public class tasksink {

    public static void main (String[] args) throws Exception {
        //  Prepare our context and socket
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket receiver = context.socket(ZMQ.PULL);
        receiver.bind("tcp://*:5558");

        //  Wait for start of batch
        String string = new String(receiver.recv(0), ZMQ.CHARSET);

        //  Start our clock now
        long tstart = System.currentTimeMillis();

        //  Process 100 confirmations
        int task_nbr;
        int total_msec = 0;     //  Total calculated cost in msecs
        for (task_nbr = 0; task_nbr < 100; task_nbr++) {
            string = new String(receiver.recv(0), ZMQ.CHARSET).trim();
            //  Print ":" for every tenth confirmation, "." otherwise
            if ((task_nbr / 10) * 10 == task_nbr) {
                System.out.print(":");
            } else {
                System.out.print(".");
            }
        }

        //  Calculate and report duration of batch
        long tend = System.currentTimeMillis();
        System.out.println("\nTotal elapsed time: " + (tend - tstart) + " msec");

        receiver.close();
        context.term();
    }
}
The average cost of a batch is about 5 seconds (100 tasks at roughly 50 msec each). Here are the results with 1, 2, and 5 workers:
# 1 worker
Total elapsed time: 5311 msec
# 2 workers
Total elapsed time: 2421 msec
# 5 workers
Total elapsed time: 1177 msec
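The scaling in these figures can be checked with a little arithmetic. Each task costs between 1 and 100 msec, so 100 tasks average about 100 × 50.5 = 5050 msec, and N workers that share the tasks evenly should finish in roughly 1/N of that. The sketch below is a toy model that does not use ZMQ at all and is not part of the original example: the class `BatchEstimate`, its method names, and the fixed seed are assumptions made for illustration. It deals tasks to workers in turn and reports the busiest worker's total as the batch time.

```java
import java.util.Random;

// Toy model (not ZMQ code): estimates the batch time when tasks are dealt
// in turn to N workers that are all connected before sending starts.
public class BatchEstimate {

    // Generate `count` workloads of 1..100 msec, as the ventilator does.
    public static int[] workloads(int count, long seed) {
        Random random = new Random(seed);
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = random.nextInt(100) + 1;
        }
        return result;
    }

    // Sum of all workloads: the elapsed time with a single worker.
    public static long total(int[] workloads) {
        long sum = 0;
        for (int w : workloads) {
            sum += w;
        }
        return sum;
    }

    // Deal task i to worker (i mod workers); the batch ends when the
    // busiest worker has finished all of its tasks.
    public static long batchTime(int[] workloads, int workers) {
        long[] busy = new long[workers];
        for (int i = 0; i < workloads.length; i++) {
            busy[i % workers] += workloads[i];
        }
        long max = 0;
        for (long b : busy) {
            max = Math.max(max, b);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] tasks = workloads(100, 42L);  // seed 42 is an arbitrary choice
        System.out.println("Total cost: " + total(tasks) + " msec");
        for (int workers : new int[] {1, 2, 5}) {
            System.out.println(workers + " worker(s): about "
                    + batchTime(tasks, workers) + " msec");
        }
    }
}
```

The model ignores network latency and connection time, so measured times such as those above will be somewhat higher than the estimate.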
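A few details about this code:
The workers connect upstream to the ventilator and downstream to the sink, which means you can start as many workers as you like. If the workers bound to endpoints instead of connecting, we would need more endpoints and would have to reconfigure the ventilator and the sink every time we added a worker. The ventilator and the sink are the stable parts of this architecture, so they should bind to the endpoints; the workers are the dynamic parts, so they connect.
We have to synchronize the start of the batch with all workers being up and running, which is why the ventilator waits for you to press Enter once the workers have started. This matters a great deal in ZMQ and is not easy to solve. ZMQ is very fast, while connecting a socket takes a certain amount of time, so as soon as the first worker connects it receives a whole load of tasks at once. If we do not synchronize, the tasks will not run in parallel at all, because the first worker gets all of them. Try it yourself.
The ventilator's PUSH socket distributes tasks to the workers evenly (assuming they are all connected before the batch goes out). This is called load balancing, and we will see more of it later.
The sink's PULL socket collects messages from the workers evenly. This is called fair queuing.
The pipeline pattern also exhibits the slow joiner problem, which can make it look as though PUSH sockets do not load-balance at all. If one worker in your program receives more requests than the others, that is because its PULL socket connected faster and picked up extra messages before the other workers managed to connect.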
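The load balancing and slow joiner behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions, not how ZMQ is implemented: it assumes the sender emits one task per abstract "tick" and hands each task to the next worker, in turn, among those already connected. The class `SlowJoiner`, the method `distribute`, and the `connectTick` parameter are hypothetical names introduced for this illustration.

```java
import java.util.Arrays;

// Toy model (not ZMQ code) of dealing tasks in turn to the workers that
// are connected at the moment each task is sent.
public class SlowJoiner {

    // tasks:          number of tasks, sent one per tick starting at tick 0
    // connectTick[k]: tick at which worker k has finished connecting
    // returns:        how many tasks each worker received
    public static int[] distribute(int tasks, int[] connectTick) {
        int workers = connectTick.length;
        int[] received = new int[workers];
        int next = 0;   // where the round-robin scan starts
        for (int tick = 0; tick < tasks; tick++) {
            int chosen = -1;
            for (int step = 0; step < workers; step++) {
                int candidate = (next + step) % workers;
                if (connectTick[candidate] <= tick) {
                    chosen = candidate;
                    break;
                }
            }
            if (chosen < 0) {
                // Assumption: with nobody connected, the task waits for
                // whichever worker connects first.
                chosen = earliest(connectTick);
            }
            received[chosen]++;
            next = (chosen + 1) % workers;
        }
        return received;
    }

    // Index of the worker with the smallest connect tick.
    private static int earliest(int[] connectTick) {
        int best = 0;
        for (int k = 1; k < connectTick.length; k++) {
            if (connectTick[k] < connectTick[best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // All four workers connected before the batch starts
        System.out.println(Arrays.toString(
                distribute(100, new int[] {0, 0, 0, 0})));    // [25, 25, 25, 25]
        // Three workers join after 60 tasks have already gone out
        System.out.println(Arrays.toString(
                distribute(100, new int[] {0, 60, 60, 60}))); // [70, 10, 10, 10]
    }
}
```

In the second case the sender is load balancing correctly at every tick; the first worker ends up with 70 tasks only because it was the sole connected peer while the first 60 were sent. Waiting until every worker is connected before sending, as the ventilator does with its Enter prompt, corresponds to the first case.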