OpenMP + MPI Programming (6): Hybrid Program Template & Types of MPI Calls Among Threads

Hybrid Program Template

  • Start with MPI initialization (serial regions are executed by the main thread of the MPI process)
  • Create OpenMP parallel regions within each MPI process
    • MPI calls may be allowed here too
    • The MPI rank is known to all threads
  • Call MPI in single-threaded regions
  • Finalize MPI

MPI_Init
...
MPI_Call
...
OMP parallel
  ...
  MPI_Call
  ...
end parallel
...
MPI_Call
...
MPI_Finalize
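
A minimal, compilable version of this skeleton might look as follows; it is only a sketch, with the printf standing in for real computation and communication:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, size;

  MPI_Init(&argc, &argv);                // serial region: executed by the main thread
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  #pragma omp parallel                   // OpenMP parallel region inside each MPI process
  {
    // The MPI rank is visible to every thread of this process
    printf("rank %d of %d, thread %d of %d\n",
           rank, size, omp_get_thread_num(), omp_get_num_threads());
  }

  MPI_Barrier(MPI_COMM_WORLD);           // MPI call back in a single-threaded region
  MPI_Finalize();
  return 0;
}

Compile with the usual MPI wrapper plus the OpenMP flag, e.g. mpicc -fopenmp; the exact wrapper and flag depend on your MPI library and compiler.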

Types of MPI Calls Among Threads

Single-threaded messaging

  • Call MPI from a serial region
  • Call MPI from a single thread within a parallel region

Multi-threaded messaging

  • Call MPI from multiple threads
    within a parallel region
  • Requires an implementation of
    MPI that is thread-safe

MPI and Thread Safety

Consider thread safety when calling MPI from threads
Use MPI_Init_thread to determine/select MPI’s level of thread support:

  • Substitute MPI_Init_thread() for the usual call to MPI_Init()
  • MPI_Init_thread() is supported in MPI-2 and up

Thread safety is identified/controlled by MPI’s provided types:

  • Single means no multithreading
  • Funneled means only the main thread can call MPI
  • Serialized means multiple threads can call MPI, but only 1 call can be in progress at a time
  • Multiple means MPI is thread safe

Monotonically increasing values are assigned to the corresponding MPI constants, so support levels can be compared numerically:

MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE

MPI-2 MPI_Init_thread

Fortran: call MPI_Init_thread(irqd, ipvd, ierr)
C:       int  MPI_Init_thread(int *argc, char ***argv, int rqd, int *pvd)
C++:     int  MPI::Init_thread(int& argc, char**& argv, int rqd)
  • Input: rqd, or “required” (integer): the desired level of thread support
  • Output: pvd, or “provided” (integer): the level of thread support actually available
  • If thread level rqd is supported, the call returns pvd = rqd
  • Otherwise, pvd returns the highest level of support the implementation provides

Possible values for rqd and pvd:

  • MPI_THREAD_SINGLE: Only one thread will execute.
  • MPI_THREAD_FUNNELED: The process may be multi-threaded, but only the main thread will make MPI calls (calls are “funneled” to the main thread). (Default)
  • MPI_THREAD_SERIALIZED: The process may be multi-threaded, and any thread can make MPI calls, but threads cannot execute MPI calls concurrently; they must take turns (MPI calls are “serialized”).
  • MPI_THREAD_MULTIPLE: Multiple threads may call MPI, with no restriction.
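
Putting this together, a typical usage sketch requests a level and then relies on the monotonic ordering of the constants to check that the provided level is sufficient; the choice of MPI_THREAD_FUNNELED below is only an example:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rqd = MPI_THREAD_FUNNELED;   // level this program needs
  int pvd;

  MPI_Init_thread(&argc, &argv, rqd, &pvd);

  // The monotonic ordering lets us compare support levels directly
  if (pvd < rqd) {
    fprintf(stderr, "MPI provides thread level %d, but %d is required\n", pvd, rqd);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  // ... hybrid MPI + OpenMP work goes here ...

  MPI_Finalize();
  return 0;
}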

Example: Single-Threaded MPI

#include <mpi.h>
int main(int argc, char **argv) {
  int rank, size, ie, i, n;

  ie = MPI_Init(&argc, &argv);
  ie = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ie = MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Set up shared memory, computation/communication (n = loop bound)

  #pragma omp parallel for
  for (i = 0; i < n; i++) {
    <work>
  }

  // compute & communicate

  ie = MPI_Finalize();
  return 0;
}

Funneled MPI via Main Thread

  • Must have support for MPI_THREAD_FUNNELED or higher
  • Only the main thread may make MPI calls; in OpenMP, work is restricted to the main thread with the omp master construct
  • This OpenMP construct does not create implicit barriers for the other threads, so it may be necessary to use omp barrier as well
  • As illustrated in the example, barriers may be required in two places:
    • Before the MPI call, in case the MPI call needs to wait on the results of other threads
    • After the MPI call, in case other threads immediately need the results of the MPI call

In the example below, the main thread executes a single MPI call, while the other threads are idle.

#include <mpi.h>
int main(int argc, char **argv) {
int ie;
...
#pragma omp parallel
{
  #pragma omp barrier   // in case the MPI call needs results from other threads
  #pragma omp master    // funneled: only the main thread makes the MPI call
  {
    ie = MPI_<Whatever>(...);
  }
  #pragma omp barrier   // in case other threads need the results of the MPI call
}
...

Serialized MPI and OpenMP

  • Must have support for MPI_THREAD_SERIALIZED or higher
  • Any thread may make MPI calls, but not concurrently with other threads
  • Works with OpenMP constructs such as omp critical and omp single
  • It may be necessary to use omp barrier before omp single and the MPI call(s), but not after, as there is an implicit barrier at the end of the omp single worksharing construct

As discussed on the previous page for MPI_THREAD_FUNNELED, a barrier should only be necessary if it is important that MPI wait on the results of other threads. The following example is the simplest one: any thread (not necessarily the main thread) executes an MPI call within the omp single construct, while the other threads are idle.

#include <mpi.h>
int main(int argc, char **argv) {
int ie, ipvd;
...
ie = MPI_Init_thread(&argc, &argv,
    MPI_THREAD_SERIALIZED, &ipvd);
#pragma omp parallel
{
  #pragma omp barrier   // in case the MPI call needs results from other threads
  #pragma omp single    // any one thread makes the MPI call
  {
    ie = MPI_<Whatever>(...);
  }
  // No omp barrier needed here: omp single ends with an implicit barrier
}
...

Overlapping Work and MPI

  • Just a few processes or threads should be capable of saturating the Omni-Path network link (according to Intel Omni-Path documentation, Sec. 4.9.4)

    • Why use all cores to communicate?
    • Instead, communicate using just one core or several cores
    • Can do work with the rest during communication
  • Must have support for MPI_THREAD_FUNNELED or higher to do this

  • Can be difficult to manage and load-balance!

#include <mpi.h>
#include <omp.h>
int main(int argc, char **argv) {
int ie;
...
#pragma omp parallel
{
  if (omp_get_thread_num() == 0) {
    ie = MPI_<Whatever>(...);   // one thread communicates
  } else {
    <work>                      // the remaining threads compute meanwhile
  }
}
...
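
For a more concrete picture, here is a simplified sketch of the pattern; the array interior, the ring neighbors, and the one-element halo buffers are made up for illustration, and MPI_THREAD_FUNNELED support is assumed. Thread 0 (the main thread) exchanges halo data while the other threads split the interior update among themselves by hand:

#include <mpi.h>
#include <omp.h>

#define N 1000000

int main(int argc, char **argv) {
  static double interior[N];            // illustrative local work array (shared)
  double halo_send = 1.0, halo_recv = 0.0;
  int rank, size, pvd;

  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &pvd);
  // (A real code should verify pvd >= MPI_THREAD_FUNNELED here)
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int right = (rank + 1) % size;        // ring neighbors, just for illustration
  int left  = (rank - 1 + size) % size;

  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    int nth = omp_get_num_threads();

    if (tid == 0) {
      // Thread 0 is the main thread, so FUNNELED support is sufficient
      MPI_Sendrecv(&halo_send, 1, MPI_DOUBLE, right, 0,
                   &halo_recv, 1, MPI_DOUBLE, left,  0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
      // The other threads divide the interior loop manually; a worksharing
      // "omp for" cannot be used here because thread 0 never reaches it
      for (int i = tid - 1; i < N; i += nth - 1)
        interior[i] += 1.0;
    }
  }
  // Implicit barrier: halo_recv and interior are now both up to date

  MPI_Finalize();
  return 0;
}

Load-balancing the split between the communicating thread and the working threads is the difficulty the last bullet warns about.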

Example: Multi-Threaded MPI Calls

  • Thread ID as well as rank can be used in communication
  • The technique is illustrated in the multi-thread “ping” (send/receive) example sketched below:

[Figure: multi-threaded “ping”: each thread of the sending rank sends a message tagged with its thread number, and the thread with the matching number on the receiving rank receives it]

  • A total of nthreads messages are sent and received
  • From the sending process: each thread sends a message tagged with its thread number
  • Recall that an MPI message can be received only by an MPI_Recv with a matching tag
  • At the receiving process: each thread looks for a message tagged with its thread number
  • Therefore, communication occurs pairwise, between threads whose numbers match
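
A runnable sketch of this pairwise pattern follows. It assumes exactly two ranks, the same number of threads on each rank, and MPI_THREAD_MULTIPLE support; the payload is simply the thread number.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, pvd;

  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &pvd);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (pvd < MPI_THREAD_MULTIPLE) {
    if (rank == 0) fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    int msg;

    if (rank == 0) {
      // Each thread on rank 0 sends one message tagged with its thread number
      msg = tid;
      MPI_Send(&msg, 1, MPI_INT, 1, tid, MPI_COMM_WORLD);
    } else if (rank == 1) {
      // Each thread on rank 1 receives only the message whose tag matches its number
      MPI_Recv(&msg, 1, MPI_INT, 0, tid, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("rank 1, thread %d received %d\n", tid, msg);
    }
  }

  MPI_Finalize();
  return 0;
}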

NUMA Control in Code

Within a code, scheduling affinity and memory policy can be examined and changed through APIs to Linux system routines.

(The details are not copied here; you can bind CPUs yourself, as in the sketch below.)
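
As a minimal, Linux-specific sketch (GNU extensions required), each OpenMP thread can pin itself with sched_setaffinity; binding thread i to core i, as done here, only makes sense when the thread count does not exceed the core count:

#define _GNU_SOURCE
#include <sched.h>
#include <omp.h>
#include <stdio.h>

int main(void) {
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(tid, &mask);          // allow this thread to run only on core 'tid'

    // pid 0 means "the calling thread"
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
      perror("sched_setaffinity");
    else
      printf("thread %d pinned to core %d\n", tid, tid);
  }
  return 0;
}

Memory policy can be adjusted in a similar spirit through the set_mempolicy/mbind system calls or libnuma (not shown).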

Quick Guide to KMP_AFFINITY
