Hybrid Program Template
- Start with MPI initialization
  (serial regions are executed by the main thread of the MPI process)
- Create OMP parallel regions within each MPI process
  - MPI calls may be allowed here too
  - MPI rank is known to all threads
- Call MPI in single-threaded regions
- Finalize MPI
MPI_Init
...
MPI_Call
...
OMP parallel
    ...
    MPI_Call
    ...
end parallel
...
MPI_Call
...
MPI_Finalize
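One possible fleshing-out of this template in C, as a hypothetical sketch: MPI_Barrier stands in for the generic MPI_Call, and the printf is placeholder work.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);                /* serial region: main thread only */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank is visible to all threads  */

    #pragma omp parallel
    {
        /* every thread of every MPI process executes this region */
        printf("rank %d, thread %d\n", rank, omp_get_thread_num());
    }

    MPI_Barrier(MPI_COMM_WORLD);           /* MPI call in a serial region */
    MPI_Finalize();
    return 0;
}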
Types of MPI Calls Among Threads
Single-threaded messaging
- Call MPI from a serial region
- Call MPI from a single thread within a parallel region
Multi-threaded messaging
- Call MPI from multiple threads within a parallel region
- Requires an implementation of MPI that is thread-safe
MPI and Thread Safety
Consider thread safety when calling MPI from threads
Use MPI_Init_thread to determine/select MPI’s level of thread support:
- Substitute MPI_Init_thread() for the usual call to MPI_Init()
- MPI_Init_thread() is supported in MPI-2 and up
Thread safety is identified/controlled by MPI’s provided types:
- Single means no multithreading
- Funneled means only the main thread can call MPI
- Serialized means multiple threads can call MPI, but only 1 call can be in progress at a time
- Multiple means MPI is thread safe
Monotonically increasing values are assigned to the corresponding MPI constants, so support levels can be compared numerically:
MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE
MPI-2 MPI_Init_thread
Fortran: call MPI_Init_thread(irqd, ipvd, ierr)
C:       int MPI_Init_thread(int *argc, char ***argv, int rqd, int *pvd)
C++:     int MPI::Init_thread(int& argc, char**& argv, int rqd)
- Input: rqd, or “required” (integer)
  - Indicates the desired level of thread support
- Output: pvd, or “provided” (integer)
  - Indicates the available level of thread support
- If thread level rqd is supported, the call returns pvd = rqd
- Otherwise, pvd returns the highest provided level of support
Possible values for rqd and pvd:
Support Levels | Description
---|---
MPI_THREAD_SINGLE | Only one thread will execute.
MPI_THREAD_FUNNELED | Process may be multi-threaded, but only the main thread will make MPI calls (calls are “funneled” to the main thread). Default.
MPI_THREAD_SERIALIZED | Process may be multi-threaded, and any thread can make MPI calls, but threads cannot execute MPI calls concurrently; they must take turns (MPI calls are “serialized”).
MPI_THREAD_MULTIPLE | Multiple threads may call MPI, with no restriction.
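A brief usage sketch: request a level with MPI_Init_thread and verify the provided level, relying on the monotonic ordering of the constants. Aborting on insufficient support is just one possible policy.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rqd = MPI_THREAD_FUNNELED;  /* desired level of thread support */
    int pvd;                        /* level actually provided */

    MPI_Init_thread(&argc, &argv, rqd, &pvd);

    /* The constants are monotonically ordered, so a simple comparison works */
    if (pvd < rqd) {
        fprintf(stderr, "Insufficient thread support: got %d, need %d\n", pvd, rqd);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... hybrid MPI+OpenMP work ... */
    MPI_Finalize();
    return 0;
}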
Example: Single-Threaded MPI
#include <mpi.h>
int main(int argc, char **argv) {
  int rank, size, ie, i, n = 100; /* n is an illustrative problem size */
  ie = MPI_Init(&argc, &argv);
  ie = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ie = MPI_Comm_size(MPI_COMM_WORLD, &size);
  // Set up shared memory; compute/communicate
  #pragma omp parallel for
  for (i = 0; i < n; i++) {
    <work>
  }
  // compute & communicate
  ie = MPI_Finalize();
}
Funneled MPI via Main Thread
- Must have support for MPI_THREAD_FUNNELED or higher
- Only the main thread may make MPI calls; in OpenMP, the omp master construct designates a block that only the main thread executes
- The omp master construct does not create implicit barriers for the other threads, so it may be necessary to use omp barrier as well
- As illustrated in the example, barriers may be required in two places:
- Before the MPI call, in case the MPI call needs to wait on the results of other threads
- After the MPI call, in case other threads immediately need the results of the MPI call
In the example, the main thread executes a single MPI call, while the other threads are idle:
#include <mpi.h>
int main(int argc, char **argv) {
  int ie;
  ... // MPI_Init_thread requesting MPI_THREAD_FUNNELED (or higher)
  #pragma omp parallel
  {
    #pragma omp barrier   // in case MPI must wait on other threads' results
    #pragma omp master
    {
      ie = MPI_<Whatever>(...);
    }
    #pragma omp barrier   // in case other threads need the MPI result
  }
  ...
Serialized MPI and OpenMP
- Must have support for MPI_THREAD_SERIALIZED or higher
- Any thread may make MPI calls, but not concurrently with other threads
- Works with OpenMP constructs such as omp critical and omp single
- It may be necessary to use omp barrier before omp single and the MPI call(s), but not after, as there is an implicit barrier at the end of the omp single worksharing construct
As discussed on the previous page for MPI_THREAD_FUNNELED, a barrier should only be necessary if it is important that MPI wait on the results of other threads. The following example is the simplest one: any thread (not necessarily the main thread) executes an MPI call within the omp single construct, while the other threads are idle.
#include <mpi.h>
int main(int argc, char **argv) {
  int ie, ipvd;
  ...
  ie = MPI_Init_thread(&argc, &argv,
         MPI_THREAD_SERIALIZED, &ipvd);
  #pragma omp parallel
  {
    #pragma omp barrier   // in case MPI must wait on other threads' results
    #pragma omp single
    {
      ie = MPI_<Whatever>(...);
    }
    // No omp barrier needed here: omp single has an implicit barrier
  }
  ...
Overlapping Work and MPI
- Just a few processes or threads should be capable of saturating the Omni-Path network link (according to Intel Omni-Path documentation, Sec. 4.9.4)
- Why use all cores to communicate?
  - Instead, communicate using just one core or several cores
  - Can do work with the rest during communication
- Must have support for MPI_THREAD_FUNNELED or higher to do this
- Can be difficult to manage and load-balance!
#include <mpi.h>
#include <omp.h>
int main(int argc, char **argv) {
  int ie;
  ...
  #pragma omp parallel
  {
    if (omp_get_thread_num() == 0) {
      ie = MPI_<Whatever>(...);   // thread 0 communicates...
    } else {
      <work>                      // ...while the other threads compute
    }
  }
  ...
Example: Multi-Threaded MPI Calls
- Thread ID as well as rank can be used in communication
- Technique is illustrated in the multi-thread “ping” (send/receive) example (see the sketch after this list):
- A total of nthreads messages are sent and received
- From the sending process: each thread sends a message tagged with its thread number
- Recall that an MPI message can be received only by an MPI_Recv with a matching tag
- At the receiving process: each thread looks for a message tagged with its thread number
- Therefore, communication occurs pairwise, between threads whose numbers match
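A hedged sketch of the ping pattern described above. It assumes MPI_THREAD_MULTIPLE support, at least two MPI ranks, and the same number of OpenMP threads on each rank.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, pvd;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &pvd);
    /* a real code would check pvd >= MPI_THREAD_MULTIPLE here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();  /* thread number doubles as the tag */
        int msg;

        if (rank == 0) {
            msg = tid;
            /* each thread of rank 0 sends a message tagged with its thread number */
            MPI_Send(&msg, 1, MPI_INT, 1, tid, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* each thread of rank 1 receives only the message with the matching tag */
            MPI_Recv(&msg, 1, MPI_INT, 0, tid, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1, thread %d got message %d\n", tid, msg);
        }
    }

    MPI_Finalize();
    return 0;
}

Because each receive names a matching tag, the nthreads messages pair up between threads with the same number, exactly as described above.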
NUMA Control in Code
Within a code, scheduling affinity and memory policy can be examined and changed via APIs to Linux system routines
(Details not transcribed here; you can bind CPUs yourself, as sketched below.)
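A minimal sketch of the affinity side, using the Linux sched_setaffinity/sched_getaffinity routines; the choice of CPU 0 is purely illustrative, and memory policy would be adjusted analogously with the get_mempolicy/set_mempolicy system calls.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t mask;

    CPU_ZERO(&mask);    /* clear the CPU set */
    CPU_SET(0, &mask);  /* add CPU 0 (illustrative choice) */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* verify by reading the affinity back */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
        printf("bound to CPU 0: %d\n", CPU_ISSET(0, &mask) != 0);

    return 0;
}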