Description
BIT has recently taken delivery of their new supercomputer, a 32-processor Apollo Odyssey distributed shared memory machine with a hierarchical communication subsystem. Valentine McKee’s research advisor, Jack Swigert, has asked her to benchmark the new system.
“Since the Apollo is a distributed shared memory machine, memory access and communication times are not uniform,” Valentine told Swigert. “Communication is fast between processors that share the same memory subsystem, but it is slower between processors that are not on the same subsystem. Communication between the Apollo and machines in our lab is slower yet.”
“How is Apollo’s port of the Message Passing Interface (MPI) working out?” Swigert asked.
“Not so well,” Valentine replied. “To do a broadcast of a message from one processor to all the other n-1 processors, they just do a sequence of n-1 sends. That really serializes things and kills the performance.”
“Is there anything you can do to fix that?”
“Yes,” smiled Valentine. “There is. Once the first processor has sent the message to another, those two can then send messages to two other hosts at the same time. Then there will be four hosts that can send, and so on.”
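The doubling idea Valentine describes can be sketched as follows. This is a minimal illustration, not the Apollo's MPI code: it assumes every send takes exactly one time step regardless of which processors are involved (the story says this is not true on the Apollo), and the function names and the choice of which processor sends to which are hypothetical.

```python
def sequential_rounds(n):
    """Time steps when processor 0 sends to each of the other n-1 in turn."""
    return n - 1


def doubling_schedule(n):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Assumption: uniform cost, one send per processor per round.
    Processors are numbered 0..n-1 and processor 0 starts with the message.
    """
    rounds = []
    have = 1  # processors 0..have-1 already hold the message
    while have < n:
        # Every processor that holds the message sends to one that does not.
        sends = [(s, s + have) for s in range(have) if s + have < n]
        rounds.append(sends)
        have += len(sends)
    return rounds


if __name__ == "__main__":
    n = 32
    schedule = doubling_schedule(n)
    print("sequential steps:", sequential_rounds(n))
    print("doubling rounds:", len(schedule))
    for i, sends in enumerate(schedule, start=1):
        print(f"round {i}: {len(sends)} sends")
```

Under the uniform-cost assumption, the number of processors holding the message doubles each round, so 32 processors are covered in 5 rounds instead of 31 sequential sends.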
“Ah, so you can do the broadcast as a binary tree!”
“Not really