1. MPI_Barrier
1.1 Two algorithms for Barrier Synchronization.
P.S.: this is the paper referenced in the MPI source code.
This paper introduces two good algorithms:
the dissemination algorithm and the tournament algorithm.
The dissemination algorithm:
The main idea of dissemination:
During round i, process p sends all of the information that it knows to process (p + 2^i) mod n. If a process waits to receive the message sent to it during round i and incorporates that message into its own message for all subsequent rounds starting with round i+1, then all processes receive information originating at all other processes in exactly ceil(log2 n) rounds.
The tournament algorithm:
its performance is also O(log2 n).
1.2 A survey of Barrier Algorithms for Coarse grained supercomputers.
This paper gives an overview of all currently known algorithms that are suitable for distributed shared-memory architectures and message-passing-based computer systems.
The following introduces the performance of every algorithm:
1.3 Scalability evaluation of barrier algorithms for OpenMP
This paper evaluated the algorithms mentioned above; the following picture shows the result:
1.4 Source code for the tournament algorithm:
you can get it from the following address:
https://github.com/jedivind/barriersync/blob/master/MP-MPI/tournament.c
But it still has a problem: in the tournament algorithm some variables must be initialized only once, and I am still wondering how best to handle that here.
2. MPI_Allgather
2.1 The default method: every process i receives a message from process i-1 and
sends the message it received to process i+1 (wrap-around). The performance is O(p).
2.2 The algorithm the MPI source code uses is a dissemination-like method
proposed by Jehoshua Bruck (O(log p)).
The following picture is an example.
2.3 Recursive doubling (O(log p))
Note: a. the recursive doubling algorithm is straightforward for a power-of-two number of processes but is a little tricky to get right for a non-power-of-two number of processes.
3. MPI_Reduce
3.1 The algorithm the MPI source code uses is the binomial tree.
The idea can be described as below:
for instance, there are 8 nodes, and their ranks range from 0 to 7.
mask = 1: 0 <---- 1   2 <---- 3   4 <---- 5   6 <---- 7
mask = 2: 0 <---- 2   4 <---- 6
mask = 4: 0 <---- 4
3.2 Rabenseifner's reduce algorithm.
See http://www.hlrs.de/mpi/myreduce.html, where you can download the file myreduce.c.
Note:
a. the binomial tree algorithm is better for short messages.
b. Rabenseifner's reduce algorithm is better for long messages.
c. Rabenseifner's algorithm is not suitable for user-defined reduction
operations, only predefined reduction operations. This means only basic datatypes are allowed, not derived datatypes.
d. if the message is larger than 2KB, use Rabenseifner's algorithm; otherwise use the binomial tree algorithm.
4. MPI_Bcast
4.1 The algorithm in the MPICH2 source is the recursive subdivision algorithm.
The root sends to the process comm_size/2 away; the receiver becomes the root for a subtree and applies the same process. This is also called the binomial algorithm.
4.2 Scatter + allgather = broadcast
Note: a. binomial broadcast is better for small messages (<12KB).
b. (scatter + allgather) is better for long messages (>512KB).
c. the MPI source code has already implemented both algorithms and uses
the message length to determine which algorithm to call.
5. MPI_Allreduce
5.1 For predefined operations, we can use the recursive doubling (short messages)
and Rabenseifner (long messages) algorithms.
5.2 For user-defined operations, just use recursive doubling.
Note: both algorithms have been implemented in the MPI source.
6. MPI_Alltoall
6.1. Bruck's algorithm (for <256 bytes).
6.2. Tony Ladd's method: post all irecvs and isends (for medium-size messages, 256 bytes - 32KB).
6.3. Pairwise exchange (for long messages and a power-of-two number of processes).
6.4. For a non-power-of-two number of processes, an algorithm in which, in each step k, process i sends data to (i+k) and receives from (i-k).
Note: all of these have been implemented in the source.
7. MPI_Scatter
7.1. The binomial tree algorithm, for both short and long messages.
References:
1. Two algorithms for Barrier Synchronization.
2. A survey of Barrier Algorithms for Coarse grained supercomputers.
3. Scalability evaluation of barrier algorithms for OpenMP.
4. On optimizing collective communication.
5. Improving the Performance of Collective Operations in MPICH.