Hybrid Program Template
- Start with MPI initialization
  (serial regions are executed by the main thread of the MPI process)
- Create OMP parallel regions within each MPI process
  - MPI calls may be allowed here too
  - MPI rank is known to all threads
- Call MPI in single-threaded regions
- Finalize MPI
MPI_Init
...
MPI_Call
...
OMP parallel
    ...
    MPI_Call
    ...
end parallel
...
MPI_Call
...
MPI_Finalize
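One possible fleshing-out of this template in C, as a hypothetical sketch: MPI_Barrier stands in for the generic MPI_Call, and the printf is placeholder work.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);                /* serial region: main thread only */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank is visible to all threads  */

    #pragma omp parallel
    {
        /* every thread of every MPI process executes this region */
        printf("rank %d, thread %d\n", rank, omp_get_thread_num());
    }

    MPI_Barrier(MPI_COMM_WORLD);           /* MPI call in a serial region */
    MPI_Finalize();
    return 0;
}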
Types of MPI Calls Among Threads
Single-threaded messaging
- Call MPI from a serial region
- Call MPI from a single thread within a parallel region
Multi-threaded messaging
- Call MPI from multiple threads within a parallel region
- Requires an implementation of MPI that is thread-safe
MPI and Thread Safety
Consider thread safety when calling MPI from threads
Use MPI_Init_thread to determine/select MPI’s level of thread support:
- Substitute MPI_Init_thread() for the usual call to MPI_Init()
- MPI_Init_thread() is supported in MPI-2 and up
Thread safety is identified/controlled by MPI’s provided types:
- Single means no multithreading
- Funneled means only the main thread can call MPI
- Serialized means multiple threads can call MPI, but only 1 call can be in progress at a time
- Multiple means MPI is thread safe
Monotonically increasing values are assigned to the corresponding MPI constants, so support levels can be compared numerically:
MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE
MPI-2 MPI_Init_thread
Fortran: call MPI_Init_thread(irqd, ipvd, ierr)
C:       int MPI_Init_thread(int *argc, char ***argv, int rqd, int *pvd)
C++:     int MPI::Init_thread(int& argc, char**& argv, int rqd)
- Input: rqd, or “required” (integer)
  - Indicates the desired level of thread support
- Output: pvd, or “provided” (integer)
  - Indicates the available level of thread support
- If thread level rqd is supported, the call returns pvd = rqd
- Otherwise, pvd returns the highest provided level of support
Possible values for rqd and pvd:
Support Levels | Description
---|---
MPI_THREAD_SINGLE | Only one thread will execute.
MPI_THREAD_FUNNELED | Process may be multi-threaded, but only the main thread will make MPI calls (calls are “funneled” to the main thread). Default.
MPI_THREAD_SERIALIZED | Process may be multi-threaded, and any thread can make MPI calls, but threads cannot execute MPI calls concurrently; they must take turns (MPI calls are “serialized”).
MPI_THREAD_MULTIPLE | Multiple threads may call MPI, with no restriction.
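A brief usage sketch: request a level with MPI_Init_thread and verify the provided level, relying on the monotonic ordering of the constants. Aborting on insufficient support is just one possible policy.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rqd = MPI_THREAD_FUNNELED;  /* desired level of thread support */
    int pvd;                        /* level actually provided */

    MPI_Init_thread(&argc, &argv, rqd, &pvd);

    /* The constants are monotonically ordered, so a simple comparison works */
    if (pvd < rqd) {
        fprintf(stderr, "Insufficient thread support: got %d, need %d\n", pvd, rqd);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... hybrid MPI+OpenMP work ... */
    MPI_Finalize();
    return 0;
}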
Example: Single-Threaded MPI
#include <mpi.h>
int main(int argc, char **argv) {
  int rank, size, ie, i, n = 100; /* n is an illustrative problem size */
  ie = MPI_Init(&argc, &argv);
  ie = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ie = MPI_Comm_size(MPI_COMM_WORLD, &size);
  // Set up shared memory; compute/communicate
  #pragma omp parallel for
  for (i = 0; i < n; i++) {
    <work>
  }
  // compute & communicate
  ie = MPI_Finalize();
}
Funneled MPI via Main Thread
- Must have support for MPI_THREAD_FUNNELED or higher
- Only the main thread may make MPI calls; in OpenMP, the omp master construct designates a block that only the main thread executes
- The omp master construct does not create implicit barriers for the other threads, so it may be necessary to use omp barrier as well
- As illustrated in the example, barriers may be required in two places:
- Before the MPI call, in case the MPI call needs to wait on the results of other threads
- After the MPI call, in case other threads immediately need the results of the MPI call
In the example, the main thread executes a single MPI call, while the other threads are idle:
#include <mpi.h>
int main(int argc, char **argv) {
  int ie;
  ... // MPI_Init_thread requesting MPI_THREAD_FUNNELED (or higher)
  #pragma omp parallel
  {
    #pragma omp barrier   // in case MPI must wait on other threads' results
    #pragma omp master
    {
      ie = MPI_<Whatever>(...);
    }
    #pragma omp barrier   // in case other threads need the MPI result
  }
  ...
Serialized MPI and OpenMP
- Must have support for MPI_THREAD_SERIALIZED or higher
- Any thread may make MPI calls, but not concurrently with other threads
- Works with OpenMP constructs such as omp critical and omp single
- It may be necessary to use omp barrier before omp single and the MPI call(s), but not after, as there is an implicit barrier at the end of the omp single worksharing construct
As discussed on the previous page for MPI_THREAD_FUNNELED, a barrier should only be necessary if it is important that MPI wait on the results of other threads. The following example is the simplest one: any thread (not necessarily the main thread) executes an MPI call within the omp single construct, while the other threads are idle.
#include <mpi.h>
int main(int argc, char **argv) {
  int ie, ipvd;
  ...
  ie = MPI_Init_thread(&argc, &argv,
         MPI_THREAD_SERIALIZED, &ipvd);
  #pragma omp parallel
  {
    #pragma omp barrier   // in case MPI must wait on other threads' results
    #pragma omp single
    {
      ie = MPI_<Whatever>(...);
    }
    // No omp barrier needed here: omp single has an implicit barrier
  }
  ...
Overlapping Work and MPI
- Just a few processes or threads should be capable of saturating the Omni-Path network link (according to Intel Omni-Path documentation, Sec. 4.9.4)
- Why use all cores to communicate?
  - Instead, communicate using just one core or several cores
  - Can do work with the rest during communication
- Must have support for MPI_THREAD_FUNNELED or higher to do this
- Can be difficult to manage and load-balance!
#include <mpi.h>
#include <omp.h>
int main(int argc, char **argv) {
  int ie;
  ...
  #pragma omp parallel
  {
    if (omp_get_thread_num() == 0) {
      ie = MPI_<Whatever>(...);   // thread 0 communicates...
    } else {
      <work>                      // ...while the other threads compute
    }
  }
  ...
Example: Multi-Threaded MPI Calls
- Thread ID as well as rank can be used in communication
- Technique is illustrated in the multi-thread “ping” (send/receive) example (see the sketch after this list):
- A total of nthreads messages are sent and received
- From the sending process: each thread sends a message tagged with its thread number
- Recall that an MPI message can be received only by an MPI_Recv with a matching tag
- At the receiving process: each thread looks for a message tagged with its thread number
- Therefore, communication occurs pairwise, between threads whose numbers match
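A hedged sketch of the ping pattern described above. It assumes MPI_THREAD_MULTIPLE support, at least two MPI ranks, and the same number of OpenMP threads on each rank.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, pvd;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &pvd);
    /* a real code would check pvd >= MPI_THREAD_MULTIPLE here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();  /* thread number doubles as the tag */
        int msg;

        if (rank == 0) {
            msg = tid;
            /* each thread of rank 0 sends a message tagged with its thread number */
            MPI_Send(&msg, 1, MPI_INT, 1, tid, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* each thread of rank 1 receives only the message with the matching tag */
            MPI_Recv(&msg, 1, MPI_INT, 0, tid, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1, thread %d got message %d\n", tid, msg);
        }
    }

    MPI_Finalize();
    return 0;
}

Because each receive names a matching tag, the nthreads messages pair up between threads with the same number, exactly as described above.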
NUMA Control in Code
Within a code, scheduling affinity and memory policy can be examined and changed via APIs to Linux system routines
(Details not transcribed here; you can bind CPUs yourself, as sketched below.)
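A minimal sketch of the affinity side, using the Linux sched_setaffinity/sched_getaffinity routines; the choice of CPU 0 is purely illustrative, and memory policy would be adjusted analogously with the get_mempolicy/set_mempolicy system calls.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t mask;

    CPU_ZERO(&mask);    /* clear the CPU set */
    CPU_SET(0, &mask);  /* add CPU 0 (illustrative choice) */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* verify by reading the affinity back */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
        printf("bound to CPU 0: %d\n", CPU_ISSET(0, &mask) != 0);

    return 0;
}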