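4.2 Controlling Nested Parallelism
Nested parallelism can be controlled at runtime by setting various environment variables before the program is executed.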
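4.2.1 OMP_NESTED
Nested parallelism can be enabled or disabled by setting the OMP_NESTED environment variable or by calling omp_set_nested().
The nested parallel construct in the following example has three levels.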
Example 4-1 Nested Parallelism Example
#include <omp.h>
#include <stdio.h>
/* Print the team size once per team; the single construct selects one thread. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

int main(void)
{
    omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
    #pragma omp parallel num_threads(2)
    {
        report_num_threads(1);
        #pragma omp parallel num_threads(2)
        {
            report_num_threads(2);
            #pragma omp parallel num_threads(2)
            {
                report_num_threads(3);
            }
        }
    }
    return 0;
}
With nested parallelism enabled, compiling and running this program produces the following (sorted) output:
% setenv OMP_NESTED TRUE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
The following example runs the same program with nested parallelism disabled:
% setenv OMP_NESTED FALSE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
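The omp_set_nested() call mentioned at the start of this section can also be exercised from inside a program. The following is only a minimal sketch, not part of Example 4-1: the helper name inner_team_size and the serial fallback stubs are assumptions made for illustration, while omp_set_dynamic(), omp_set_nested(), and omp_get_num_threads() are the standard OpenMP routines. It returns the largest team size seen in an inner parallel region with nesting switched on or off.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) so that it still
   builds when the compiler is run without OpenMP support. */
static void omp_set_dynamic(int on) { (void)on; }
static void omp_set_nested(int on) { (void)on; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the largest team size observed in the
   inner parallel region when nesting is enabled (nonzero) or disabled (0). */
int inner_team_size(int enable_nested)
{
    int largest = 0;

    omp_set_dynamic(0);              /* keep requested team sizes fixed */
    omp_set_nested(enable_nested);   /* same effect as OMP_NESTED */

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            int n = omp_get_num_threads();   /* size of this inner team */

            /* Several inner teams may run at once, so guard the update. */
            #pragma omp critical
            {
                if (n > largest)
                    largest = n;
            }
        }
    }
    return largest;
}
```

Calling inner_team_size(0) should return 1, because a disabled inner region is executed by a single thread. With nesting enabled and enough threads available, inner_team_size(1) would be expected to return 2. Note that omp_set_nested() is marked deprecated as of OpenMP 5.0, so newer compilers may warn about it.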
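4.2.2 OMP_THREAD_LIMIT
The OpenMP runtime library maintains a pool of threads that can be used as worker threads in parallel regions. The number of threads in the pool can be controlled by setting the OMP_THREAD_LIMIT environment variable. By default, the pool holds at most 1023 threads.
The thread pool consists only of non-user threads created by the runtime library. It does not include the initial thread or any thread created explicitly by the user program.
If OMP_THREAD_LIMIT is set to 1 (or SUNW_MP_MAX_POOL_THREADS is set to 0), the thread pool is empty and all parallel regions are executed by a single thread.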
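The following example shows that a parallel region gets fewer worker threads when the pool does not contain enough threads. The code is the same as in Example 4-1. Eight threads are needed for all of the parallel regions to be active at the same time, so the pool must contain at least 7 threads. If OMP_THREAD_LIMIT is set to 6 (or SUNW_MP_MAX_POOL_THREADS is set to 5), the pool contains at most 5 worker threads. As a result, two of the four innermost parallel regions might not get all the worker threads they request. The following example shows one possible result:
% setenv OMP_NESTED TRUE
% setenv OMP_THREAD_LIMIT 6
% a.out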
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
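The thread counts quoted in this section follow from simple arithmetic, sketched below. The function names are hypothetical and the code only models the counting; it does not query the runtime. It assumes that every region at a given level requests the same team size, and that the pool holds one thread fewer than OMP_THREAD_LIMIT because the initial thread is not part of the pool.

```c
#include <stddef.h>

/* Threads needed for every nested region to be active at once:
   the product of the team sizes requested at each level. */
long threads_for_full_nesting(const int *team_sizes, size_t levels)
{
    long total = 1;
    for (size_t i = 0; i < levels; i++)
        total *= team_sizes[i];
    return total;
}

/* The pool excludes the initial thread, so it needs one thread fewer. */
long pool_threads_needed(const int *team_sizes, size_t levels)
{
    return threads_for_full_nesting(team_sizes, levels) - 1;
}

/* Pool threads available under a given OMP_THREAD_LIMIT value
   (modeling assumption: limit minus the initial thread). */
long pool_size_for_limit(long thread_limit)
{
    return thread_limit > 0 ? thread_limit - 1 : 0;
}
```

For the three levels of two threads each in Example 4-1, this gives 8 threads in total and a pool of 7. A limit of 6 leaves 5 pool threads, which is 2 short of what the four innermost regions need.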
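4.2.3 OMP_MAX_ACTIVE_LEVELS
The OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum number of nested active parallel regions. A parallel region is active if it is executed by a team consisting of more than one thread. The default maximum number of nested active parallel regions is 4.
Note that setting this environment variable alone does not enable nested parallelism; it only limits how many nested parallel regions can be active. To enable nested parallelism, set OMP_NESTED to TRUE or call omp_set_nested() with an argument that evaluates to true.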
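The two settings described above can also be applied together from program code. The sketch below is an illustration only: the helper name enable_nested_levels and the serial fallback stubs are assumptions, while omp_set_nested(), omp_set_max_active_levels(), and omp_get_max_active_levels() are the standard OpenMP runtime routines that correspond to the two environment variables.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) so that it still
   builds when the compiler is run without OpenMP support. */
static int fallback_levels = 1;
static void omp_set_nested(int on) { (void)on; }
static void omp_set_max_active_levels(int n) { if (n >= 0) fallback_levels = n; }
static int omp_get_max_active_levels(void) { return fallback_levels; }
#endif

/* Hypothetical helper: enable nesting, cap the number of nested active
   regions, and return the cap that the runtime reports afterwards. */
int enable_nested_levels(int levels)
{
    omp_set_nested(1);                   /* like OMP_NESTED=TRUE */
    omp_set_max_active_levels(levels);   /* like OMP_MAX_ACTIVE_LEVELS */
    return omp_get_max_active_levels();
}
```

Calling enable_nested_levels(2) before the first parallel region would be expected to have the same effect as setting OMP_NESTED to TRUE and OMP_MAX_ACTIVE_LEVELS to 2 in the environment.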
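The following code creates four levels of nested parallel regions. If OMP_MAX_ACTIVE_LEVELS is set to 2, the nested parallel regions at nesting depths 3 and 4 are executed by a single thread.
#include <omp.h>
#include <stdio.h>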
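#define DEPTH 5

/* Print the team size once per team; the single construct selects one thread. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

/* Open one parallel region per level, recursing until DEPTH is reached. */
void nested(int depth)
{
    if (depth == DEPTH)
        return;
    #pragma omp parallel num_threads(2)
    {
        report_num_threads(depth);
        nested(depth + 1);
    }
}

int main(void)
{
    omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
    omp_set_nested(1);    /* enable nested parallelism */
    nested(1);
    return 0;
}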
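The following example shows a possible result of compiling and running this program with a maximum nesting level of 4. (Actual results depend on how the OS schedules threads.)
% setenv OMP_MAX_ACTIVE_LEVELS 4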
% a.out |sort
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
The following example shows a possible result of running the program with the nesting level set to 2:
% setenv OMP_MAX_ACTIVE_LEVELS 2
% a.out |sort
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Again, these examples show only some of the possible results. Actual results depend on how the OS schedules threads.
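The line counts in the sorted listings above can be checked with a small counting model. This is a sketch under stated assumptions, not a description of how the runtime schedules threads: the function names are hypothetical, every region is assumed to request the same team size, and every active region is assumed to receive all the threads it requests. Each region prints exactly one line because of the single construct, so the number of lines for a level equals the number of regions encountered at that level.

```c
/* Team size the model assigns to a region at 1-based nesting level
   `level`: the requested size while the level is within the active
   limit, otherwise a single thread. */
int model_team_size(int level, int max_active, int requested)
{
    return level <= max_active ? requested : 1;
}

/* Regions encountered at `level`: one per thread running at the
   enclosing level, which is the product of the team sizes of all
   outer levels. */
long model_region_count(int level, int max_active, int requested)
{
    long regions = 1;
    for (int k = 1; k < level; k++)
        regions *= model_team_size(k, max_active, requested);
    return regions;
}
```

With a requested team size of 2, the model gives 8 regions at level 4 when the limit is 4, and 4 regions at level 4 when the limit is 2, which matches the number of "Level 4" lines in the two listings.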