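Scheduling in OpenMP mainly concerns parallel for loops. When the iterations of a loop differ in computational cost, simply giving every thread the same number of iterations leaves the load unbalanced. On top of that, on a machine that is busy with other work at the same time, the cores are not equally occupied. To address this, OpenMP provides several ways to schedule loop iterations across threads.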
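To make the imbalance concrete, here is a small arithmetic sketch. It assumes a hypothetical workload in which iteration i costs i + 1 units, and it assumes the iteration count divides evenly among the threads; the function name block_cost is made up for illustration and is not part of OpenMP.

```c
/* Total cost of the contiguous block that thread `tid` would receive when
 * `n` iterations are split evenly among `nthreads` threads.
 * Assumption (hypothetical workload): iteration i costs i + 1 units.
 * Assumption: n is divisible by nthreads. */
long block_cost(int tid, int n, int nthreads) {
    int per_thread = n / nthreads;
    int begin = tid * per_thread;
    long cost = 0;
    for (int i = begin; i < begin + per_thread; ++i) {
        cost += i + 1;
    }
    return cost;
}
```

With 20 iterations and 4 threads, every thread gets 5 iterations, yet under this cost model thread 0 does 15 units of work while thread 3 does 90, so the loop finishes only when the slowest thread is done.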
1. Basic usage
#include <stdio.h>
#include <omp.h>

const int NUMS = 20;

int main() {
    omp_set_num_threads(4);
    #pragma omp parallel for
    for (int i = 0; i < NUMS; ++i) {
        printf("id is %3d thread is %d\n", i, omp_get_thread_num());
    }
    return 0;
}
Output:
id is 0 thread is 0
id is 1 thread is 0
id is 2 thread is 0
id is 3 thread is 0
id is 15 thread is 3
id is 16 thread is 3
id is 17 thread is 3
id is 18 thread is 3
id is 19 thread is 3
id is 4 thread is 0
id is 10 thread is 2
id is 11 thread is 2
id is 12 thread is 2
id is 13 thread is 2
id is 14 thread is 2
id is 5 thread is 1
id is 6 thread is 1
id is 7 thread is 1
id is 8 thread is 1
id is 9 thread is 1
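As the output shows, with 20 iterations and four threads each thread receives 5 iterations, assigned as contiguous blocks in index order: thread 0 gets 0 to 4, thread 1 gets 5 to 9, and so on. The lines appear interleaved because the threads print concurrently, so the order of the printed lines can change from run to run.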
2. The schedule clause
Syntax: schedule(type[, size])
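(1) type: the scheduling type, one of static, dynamic, guided, or runtime (OpenMP 3.0 and later also accept auto). runtime is rarely used in practice; it defers the choice to a run-time setting such as the OMP_SCHEDULE environment variable, and size must not be specified together with it.
(2) size: the number of loop iterations handed to a thread at a time (the chunk size). It is optional and, when given, must be a positive integer.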
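As a rough illustration of the clause syntax, the sketch below uses schedule(runtime), so the type and size are picked up when the program starts rather than being written in the source. The function name sum_squares is hypothetical; the loop only uses a reduction, so it gives the same result whichever schedule is selected.

```c
/* Sums i*i for i in [0, n). The schedule is chosen at run time, for example
 * through the OMP_SCHEDULE environment variable. No size is given in the
 * clause because schedule(runtime) does not allow one. */
long sum_squares(int n) {
    long sum = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += (long)i * i;
    }
    return sum;
}
```

Assuming the program is built with OpenMP enabled, it could then be launched as OMP_SCHEDULE="dynamic,4" ./a.out or OMP_SCHEDULE="static" ./a.out to compare schedules without recompiling.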
3. Static scheduling: static
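When a parallel loop has no schedule clause, the schedule is implementation-defined; common implementations such as GCC use static. With static, the mapping of iterations to threads is fixed before the loop starts running and does not adapt to how fast each thread progresses. If size is not specified, the iterations are divided into roughly equal contiguous blocks of about N/num_threads iterations, one block per thread in thread order. If size is specified, the iterations are split into chunks of size iterations that are dealt out round-robin: the first chunk goes to the first thread, and so on through the last thread, after which assignment wraps around to the first thread again. (The OpenMP specification requires this round-robin order, and the outputs below match it.)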
#include <stdio.h>
#include <omp.h>

const int NUMS = 20;

int main() {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < NUMS; ++i) {
        printf("id is %3d thread is %d\n", i, omp_get_thread_num());
    }
    return 0;
}
Output:
id is 0 thread is 0
id is 1 thread is 0
id is 2 thread is 0
id is 3 thread is 0
id is 4 thread is 0
id is 5 thread is 1
id is 6 thread is 1
id is 7 thread is 1
id is 8 thread is 1
id is 9 thread is 1
id is 10 thread is 2
id is 11 thread is 2
id is 12 thread is 2
id is 13 thread is 2
id is 14 thread is 2
id is 15 thread is 3
id is 16 thread is 3
id is 17 thread is 3
id is 18 thread is 3
id is 19 thread is 3
Using size:
#include <stdio.h>
#include <omp.h>

const int NUMS = 20;

int main() {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 2)
    for (int i = 0; i < NUMS; ++i) {
        printf("id is %3d thread is %d\n", i, omp_get_thread_num());
    }
    return 0;
}
Output:
id is 0 thread is 0
id is 1 thread is 0
id is 8 thread is 0
id is 9 thread is 0
id is 16 thread is 0
id is 17 thread is 0
id is 4 thread is 2
id is 5 thread is 2
id is 12 thread is 2
id is 13 thread is 2
id is 2 thread is 1
id is 3 thread is 1
id is 10 thread is 1
id is 11 thread is 1
id is 18 thread is 1
id is 19 thread is 1
id is 6 thread is 3
id is 7 thread is 3
id is 14 thread is 3
id is 15 thread is 3
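The pattern in the schedule(static, 2) output above can be reproduced with one line of arithmetic. The helper below is a sketch, not an OpenMP API: the name static_chunk_owner is made up, and it assumes the loop starts at 0 with a step of 1.

```c
/* Thread number that executes iteration `i` under schedule(static, chunk)
 * with `nthreads` threads. Chunks are numbered i / chunk and dealt out to
 * the threads round-robin in thread-number order.
 * Assumption: the loop runs from 0 upward with step 1. */
int static_chunk_owner(int i, int chunk, int nthreads) {
    return (i / chunk) % nthreads;
}
```

For example, static_chunk_owner(8, 2, 4) is 0 and static_chunk_owner(18, 2, 4) is 1, which agrees with the lines "id is 8 thread is 0" and "id is 18 thread is 1" in the output.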
4. Dynamic scheduling: dynamic
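At run time some threads finish sooner than others. With dynamic, a thread that has finished its current chunk goes back and takes the next unassigned one. If size is not specified, iterations are handed out one at a time (the default chunk size is 1); if size is specified, each request receives size iterations.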
#include <stdio.h>
#include <omp.h>

const int NUMS = 20;

int main() {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < NUMS; ++i) {
        printf("id is %3d thread is %d\n", i, omp_get_thread_num());
    }
    return 0;
}
Output:
id is 0 thread is 0
id is 4 thread is 0
id is 5 thread is 0
id is 6 thread is 0
id is 7 thread is 0
id is 8 thread is 0
id is 3 thread is 3
id is 10 thread is 3
id is 11 thread is 3
id is 12 thread is 3
id is 13 thread is 3
id is 1 thread is 2
id is 15 thread is 2
id is 16 thread is 2
id is 17 thread is 2
id is 18 thread is 2
id is 19 thread is 2
id is 9 thread is 0
id is 2 thread is 1
id is 14 thread is 3
Using size:
#include <stdio.h>
#include <omp.h>

const int NUMS = 20;

int main() {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(dynamic, 3)
    for (int i = 0; i < NUMS; ++i) {
        printf("id is %3d thread is %d\n", i, omp_get_thread_num());
    }
    return 0;
}
Output:
id is 0 thread is 0
id is 1 thread is 0
id is 2 thread is 0
id is 3 thread is 3
id is 4 thread is 3
id is 5 thread is 3
id is 15 thread is 3
id is 16 thread is 3
id is 17 thread is 3
id is 18 thread is 3
id is 19 thread is 3
id is 12 thread is 0
id is 13 thread is 0
id is 14 thread is 0
id is 6 thread is 1
id is 7 thread is 1
id is 8 thread is 1
id is 9 thread is 2
id is 10 thread is 2
id is 11 thread is 2
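The printing programs above do the same work in every iteration, so they do not show where dynamic helps. The sketch below assumes a hypothetical workload, busy_work, whose cost grows with the iteration index; both function names are made up for illustration. The chunk size of 2 is an arbitrary choice.

```c
/* Hypothetical uneven workload: iteration i loops i + 1 times,
 * so later iterations cost more than earlier ones. */
static long busy_work(int i) {
    long acc = 0;
    for (int k = 0; k <= i; ++k) {
        acc += k;
    }
    return acc;
}

/* Sums busy_work(i) for i in [0, n). With schedule(dynamic, 2), a thread
 * that finishes its two iterations early takes the next unassigned chunk
 * instead of idling. The reduction keeps the result independent of which
 * thread ran which chunk. */
long total_work(int n) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 2) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += busy_work(i);
    }
    return total;
}
```

Calling total_work(20) from main returns the same sum under any schedule; only the running time is expected to differ, and the benefit depends on how uneven the iterations really are, since each dynamic request carries some overhead.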
5. Guided scheduling: guided
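guided uses a heuristic: the number of iterations handed to a thread differs from one assignment to the next, starting large and then shrinking. size is the minimum number of iterations per assignment; once the shrinking chunk reaches size it stops getting smaller (only the final chunk may be smaller, when fewer iterations remain). If size is not specified, it defaults to 1, so the chunks keep shrinking down to 1.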
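Chunks go from large to small, shrinking roughly exponentially, because each chunk is proportional to the number of iterations still unassigned divided by the number of threads. When a loop has many iterations, threads receive large chunks at the start, which keeps scheduling overhead low, and small chunks near the end, which lets the threads finish at about the same time.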
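The shrinking chunk sizes can be sketched with a simplified model. The code below assumes each chunk is the number of remaining iterations divided by the thread count, rounded up, and never smaller than the minimum; real OpenMP runtimes are free to use a different proportionality, so the exact numbers are illustrative only. The function name guided_chunks is hypothetical.

```c
/* Simulates guided chunk sizes under a simplified model:
 *   chunk = max(ceil(remaining / nthreads), min_chunk), capped at remaining.
 * Writes up to `cap` chunk sizes into `sizes` and returns how many were
 * written. This models the idea only; it is not what any particular
 * OpenMP runtime is guaranteed to do. */
int guided_chunks(int n, int nthreads, int min_chunk, int *sizes, int cap) {
    int count = 0;
    int remaining = n;
    while (remaining > 0 && count < cap) {
        int chunk = (remaining + nthreads - 1) / nthreads;
        if (chunk < min_chunk) {
            chunk = min_chunk;
        }
        if (chunk > remaining) {
            chunk = remaining; /* the last chunk may fall below min_chunk */
        }
        sizes[count++] = chunk;
        remaining -= chunk;
    }
    return count;
}
```

Under this model, 20 iterations on 4 threads with a minimum of 1 yield chunks of 5, 4, 3, 2, 2, 1, 1, 1, 1; raising the minimum to 3 yields 5, 4, 3, 3, 3, 2, where only the final chunk drops below the minimum.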