1. MPI_Barrier
1.1 Two algorithms for Barrier Synchronization.
P.S.: this is the paper referenced in the MPI source code.
This paper introduces two good algorithms:
the dissemination algorithm and the tournament algorithm.
The dissemination algorithm:
The main idea of dissemination:
During round i, process p sends all of the information that it knows to process (p + 2^i) mod n. If a process waits to receive the message sent to it during round i and incorporates that message into its own message for all subsequent rounds starting with round i+1, then all processes receive information originating at all other processes in exactly ceil(log2 n) rounds.
The tournament algorithm:
its performance is also O(log2 n).
1.2 A survey of Barrier Algorithms for Coarse grained supercomputers.
This paper gives an overview of all currently known algorithms that are suitable for distributed shared-memory architectures and message-passing-based computer systems.
The following introduces the performance of every algorithm:
1.3 Scalability evaluation of barrier algorithms for OpenMP
This paper evaluated the algorithms mentioned above; the following picture shows the result:
1.4 Source code for the tournament algorithm:
you can get it from the following address:
https://github.com/jedivind/barriersync/blob/master/MP-MPI/tournament.c
But it still has a problem: in the tournament algorithm some variables must be initialized only once, and I am still wondering how best to handle that here.
2. MPI_Allgather
2.1 The default method: every process i receives a message from process i-1 and
sends the message it received to process i+1 (wrap-around). The performance is O(p).
2.2 The algorithm the MPI source code uses is a dissemination-like method
proposed by Jehoshua Bruck (O(log p)).
The following picture is an example.
2.3 Recursive doubling (O(log p))
Note: a. the recursive doubling algorithm is straightforward for a power-of-two number of processes but is a little tricky to get right for a non-power-of-two number of processes.
3. MPI_Reduce
3.1 The algorithm the MPI source code uses is the binomial tree.
The idea can be described as below:
for instance, there are 8 nodes, and their ranks range from 0 to 7.
mask = 1: 0 <---- 1   2 <---- 3   4 <---- 5   6 <---- 7
mask = 2: 0 <---- 2   4 <---- 6
mask = 4: 0 <---- 4
3.2 Rabenseifner's reduce algorithm.
See http://www.hlrs.de/mpi/myreduce.html, where you can download the file myreduce.c.
Note:
a. the binomial tree algorithm is better for short messages.
b. Rabenseifner's reduce algorithm is better for long messages.
c. Rabenseifner's algorithm is not suitable for user-defined reduction
operations, only predefined reduction operations. This means only basic datatypes are allowed, not derived datatypes.
d. if the message is larger than 2KB, use Rabenseifner's algorithm; otherwise use the binomial tree algorithm.
4. MPI_Bcast
4.1 The algorithm in the MPICH2 source is the recursive subdivision algorithm.
The root sends to the process comm_size/2 away; the receiver becomes the root for a subtree and applies the same process. This is also called the binomial algorithm.
4.2 Scatter + allgather = broadcast
Note: a. binomial broadcast is better for small messages (<12KB).
b. (scatter + allgather) is better for long messages (>512KB).
c. the MPI source code has already implemented both algorithms and uses
the message length to determine which algorithm to call.
5. MPI_Allreduce
5.1 For predefined operations, we can use the recursive doubling (short messages)
and Rabenseifner (long messages) algorithms.
5.2 For user-defined operations, just use recursive doubling.
Note: both algorithms have been implemented in the MPI source.
6. MPI_Alltoall
6.1. Bruck's algorithm (for <256 bytes).
6.2. Tony Ladd's method: post all irecvs and isends (for medium-size messages, 256 bytes - 32KB).
6.3. Pairwise exchange (for long messages and a power-of-two number of processes).
6.4. For a non-power-of-two number of processes, an algorithm in which, in each step k, process i sends data to (i+k) and receives from (i-k).
Note: all of these have been implemented in the source.
7. MPI_Scatter
7.1. The binomial tree algorithm, for both short and long messages.
References:
1. Two algorithms for Barrier Synchronization.
2. A survey of Barrier Algorithms for Coarse grained supercomputers.
3. Scalability evaluation of barrier algorithms for OpenMP.
4. On optimizing collective communication.
5. Improving the Performance of Collective Operations in MPICH.